Our assumption that parallel algorithms are complex entities seems to be borne out by the fact that nearly everyone who has invested the (minimal) time needed to use the profiling tools on their application has come away understanding it better than before. In some cases, the insights have been profound enough to make significant performance enhancements possible.
In general, the system has been found easy to use, given a basic understanding of the parallel algorithm being profiled, and most users have no difficulty recognizing their applications in the various displays. On the other hand, the integration between the different profiling views is not yet as tight as one might wish, and we are currently working to improve it.
Another issue that comes up with great regularity is a request from users for a button marked ``Why?'', which would automatically analyze the profile data being presented and then point out a block of source code together with a suggestion for how to improve its performance. In general, this is clearly too difficult, but it is interesting to note that certain types of runtime system are more amenable to this kind of analysis than others. The ``distribution profiler,'' for instance, possesses enough information to perform quite complex communication and I/O optimizations on an algorithm, and we are currently exploring ways of implementing these strategies. It is possible that this line of thought may eventually lead us to a more complete programming model than the one in use now, one which will be more amenable to the automation of parallel processing that has long been our goal.
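To make the idea concrete, a ``Why?'' heuristic might rank profiled source blocks by how much of their time goes to communication and emit a suggestion for the worst offender. The sketch below is purely illustrative: the record format, field names, and thresholds are assumptions, not the actual profiler's interface.

```python
# Hypothetical sketch of a ``Why?'' heuristic. Each profile record is
# assumed to carry per-block compute, communication, and I/O times;
# this format is an illustration, not the real profiler's output.

def why(records):
    """Return (block name, suggestion) for the most communication-bound block."""
    def comm_share(r):
        total = r["compute"] + r["comm"] + r["io"]
        return r["comm"] / total

    worst = max(records, key=comm_share)
    if comm_share(worst) > 0.5:
        hint = "communication dominates; consider batching or overlapping messages"
    else:
        hint = "no single dominant cost; profile at finer granularity"
    return worst["block"], hint

# Example profile data (invented values for illustration).
profile = [
    {"block": "init",   "compute": 1.0, "comm": 0.1, "io": 0.2},
    {"block": "solve",  "compute": 2.0, "comm": 6.0, "io": 0.1},
    {"block": "output", "compute": 0.2, "comm": 0.1, "io": 1.5},
]
print(why(profile))
```

Even a simple ranking of this kind already hints at why the distribution profiler, which sees communication and I/O behavior directly, is better placed for automated suggestions than profilers that record timing alone.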