From the earliest days of parallel computing, the fundamental goal was to accelerate algorithms that ran too slowly on sequential machines. As has been described in many other places in this book, the effort to do basic research in computer science was always secondary to the need for algorithms that solved practical problems more quickly than was possible on other machines.
One might think that an important prerequisite for this would be advanced profiling technology. In fact, about the most advanced piece of equipment then in use was a wristwatch! Most algorithms were timed on one node, then on two, then on four, and so on. The results of this analysis were then compared with the theoretically derived models for the applications. If all was well, one proceeded to number-crunch; if not, one inserted print statements and timed the gaps between them to see what pieces of code were behaving in ways not predicted by the models.
Even the breakthrough of having a function that a program could call to obtain timing information was a long time coming, and even then it proved somewhat unpopular, since it had different names on different machines and didn't exist at all on the sequential machines. As a result, people tended simply not to bother with it rather than clutter their codes with many different machine-specific timing routines.