Of course, this was all perfectly adequate for the first few applications that were parallelized, since their behavior was so simple to model. A program solving Laplace's equation on a square grid, for example, has a performance model so simple that one would actually have to work quite hard not to observe it in a parallel code. As time passed, however, more complex problems were attempted that were not so easy to model, and tools had to be invented.
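The model in question is essentially the familiar surface-to-volume argument; the sketch below assumes a square subdomain of $n$ grid points on each node and uses illustrative symbols ($t_{\rm calc}$, $t_{\rm comm}$) rather than measured constants:
\[
\varepsilon \;=\; \frac{1}{1 + f_C}, \qquad
f_C \;\approx\; \frac{4}{\sqrt{n}}\,\frac{t_{\rm comm}}{t_{\rm calc}},
\]
where $t_{\rm calc}$ is the time to update one grid point, $t_{\rm comm}$ is the time to exchange one edge value with a neighboring node, and the factor $4/\sqrt{n}$ is just the perimeter-to-area ratio of each node's square subdomain. Efficiency therefore rises toward one as the subdomain held by each node grows.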
Of course, this discussion has missed a rather important point which we also tended to overlook in the early days.
When comparing the performance of a problem on one, two, four, eight, or more nodes, one is really only assessing the efficiency of the parallel version of the code. However, an algorithm that achieves 100 percent efficiency on a parallel computer may still be worthless if its absolute performance is lower than that of a sequential code running on another machine.
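To make the distinction explicit, define the speedup and efficiency on $N$ nodes in the usual way (the symbols here are for illustration):
\[
S(N) \;=\; \frac{T(1)}{T(N)}, \qquad \varepsilon(N) \;=\; \frac{S(N)}{N},
\]
where $T(N)$ is the execution time of the same code on $N$ nodes. Both quantities are measured relative to the code's own one-node time, so $\varepsilon(N)\approx 1$ says nothing about how $T(N)$ compares with the best sequential time on a different architecture.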
Again, this was not so important in the earliest days, since the big battle over architectures had not yet arisen. Nowadays, however, when there is a multitude of sequential and parallel supercomputers, it is extremely important to know whether a parallel version of a code is going to outperform a sequential version running on another architecture. It is becoming increasingly important to understand what complex algorithms are doing and why, so that the performance of both the software and the hardware can be tuned to achieve the best results.
This section discusses some of the issues surrounding algorithm visualization, parallelization, and performance optimization, and the tools which CP developed to help in this area. A major recent general tool, PABLO [Reed:91a], has been developed at Illinois by Reed's group, but here we describe only the CP activity. One of the earliest such tools was Seecube [Couch:88a;88b].