Hopefully, the visualization system goes some way towards the development of a parallel algorithm. One must then code and debug the application which, as has been described previously, can be a reasonably time-consuming process. Finally, one comes to the ``crisis'' point of actually running the parallel code and seeing how fast it goes.
One of our major concerns in developing performance analysis tools was to make them easy to use. The standard UNIX method of taking the completed program, deleting all its object files, and then recompiling them with special switches seemed to be asking too much for parallel programs because the process is so iterative. On a sequential machine, the profiler may be run once or twice, usually just to check that the authors' impressions of performance are correct. On a parallel computer, we feel that the optimization phase should more correctly be included in the development cycle than as an afterthought, because we believe that few parallel applications perform at their best immediately after debugging is complete. We wanted, therefore, to have a system that could give important information about an algorithm without any undue effort.
The system to be described works with the simple addition of either a runtime switch or the definition of an ``environment'' variable, and makes available about 90% of the capabilities of the entire package. To use some of the most exotic features, one must recompile code.