Fortran Parallel Programming Systems

Ken Kennedy (director), Christian Bischof, Preston Briggs, Alan Carle, Alok Choudhary, Keith Cooper, Geoffrey Fox, Andreas Griewank, Tom Haupt, Seema Hiranandani, Charles Koelbel, John Mellor-Crummey, Ravi Ponnusamy, Sanjay Ranka, Joel Saltz, Alan Sussman, Linda Torczon, and Scott Warren

The objective of the Fortran Parallel Programming Systems project is to make parallel computer systems usable by programmers working in Fortran, a language widely used in the scientific community. In this effort, special emphasis is placed on data-parallel programming and scalable parallelism.

To achieve this objective, the research group is developing a coordinated programming system that includes compilers and tools for Fortran D, an extended dialect of Fortran that supports machine-independent data-parallel programming. The tools support a variety of parallel programming activities, including intelligent editing and program transformation, parallel debugging, performance estimation, performance visualization and tuning, and automatic data partitioning.

Research efforts also include validation of the compilers and tools on realistic applications, as well as investigations of new functionality to handle irregular computations, parallel I/O, and automatic differentiation using the program analysis infrastructure developed for the project.

Ken Kennedy's research interests include parallel computing in science and engineering, scientific programming environments, and optimization of compiled code. He has published more than 80 technical articles and supervised more than 20 Ph.D. dissertations on programming support software for high-performance computer systems. He has supervised the construction of two substantial software systems for programming parallel machines: an automatic vectorizer for Fortran 77 and an integrated scientific programming environment. His current work focuses on extending techniques developed for automatic vectorization to programming tools for parallel computer systems and high-performance microprocessors. Through the CRPC, he is seeking to develop new strategies for supporting architecture-independent parallel programming. Kennedy was elected to the National Academy of Engineering in 1990 and currently serves on the Computer Science and Telecommunications Board of the National Research Council.

Fortran D Language and Compilers

Existing languages for parallel programming on scalable parallel systems are primitive and hard to use. They are primitive in the sense that each one reflects the architecture of the target machine for which it is intended, making programs written for current parallel systems highly machine-dependent. As a result, there is no protection of the programming investment on parallel machines: a program written for one target machine may need to be completely rewritten when the next-generation machine becomes available. This situation is the principal impediment to widespread use of scalable parallel systems for science and engineering problems.

To address this problem, CRPC researchers have developed Fortran D, a set of extensions to Fortran 77 and Fortran 90 that permit the programmer to specify, in a machine-independent way, how to distribute a program's principal data structures among the processors of a parallel system. In addition, Fortran D makes programming easier than it is with explicit message-passing, because programmers can write codes that use a shared name space, independent of the target architecture. Programmers find a shared name space easier to use than a distributed name space because data placement and access issues can be ignored. Using sophisticated compiler techniques, these "high-level" programs can be compiled for both SIMD and MIMD parallel architectures.
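
A short sketch suggests the flavor of these extensions (the array name and sizes here are hypothetical, and the sketch follows the published Fortran D syntax rather than any one compiler's input language):

      REAL X(1024,1024)
C     Declare an abstract problem domain with the same shape as X.
      DECOMPOSITION D(1024,1024)
C     Map each element of X onto the corresponding element of D.
      ALIGN X(I,J) WITH D(I,J)
C     Partition the columns of D into contiguous blocks, one block
C     per processor; the compiler derives all data placement and
C     interprocessor communication from these declarations.
      DISTRIBUTE D(:,BLOCK)

Because the distribution is declared rather than coded, retargeting a program to a different machine or a different number of processors requires changing only these declarations, not the computational code.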

The Fortran D research effort has led to prototype compilers for the Intel Paragon and Thinking Machines CM-5 for both Fortran 77D and Fortran 90D. In addition, the Fortran 90D compiler has been ported to a number of other platforms, including the nCube/2 and networks of workstations. Compilers for other machines, such as the SIMD MasPar MP-2, are under development. The strategy for all these compilers is based upon deep program analysis, aggressive communication optimization, advanced code-generation techniques, and the use of sophisticated computation and communication libraries. The effectiveness of these methods is being evaluated using a suite of scientific programs developed by CRPC researchers at Syracuse University.
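
To suggest what such a compiler does, consider a simple relaxation loop over a BLOCK-distributed array. The node program below is a hand-written sketch of the owner-computes code a distributed-memory compiler might emit, not actual compiler output; SEND and RECV stand in for a machine-specific message-passing library, and MYID, NPROC, LO, and HI (the bounds of this processor's block) are assumed to be set up by the compiler's runtime support:

C     Source loop:  DO I = 2, N-1
C                      B(I) = (A(I-1) + A(I+1)) / 2.0
C                   ENDDO
C     Exchange boundary elements with neighboring processors so
C     that the overlap cells A(LO-1) and A(HI+1) hold fresh copies.
      IF (MYID .GT. 0)       CALL SEND(MYID-1, A(LO), 1)
      IF (MYID .LT. NPROC-1) CALL SEND(MYID+1, A(HI), 1)
      IF (MYID .GT. 0)       CALL RECV(MYID-1, A(LO-1), 1)
      IF (MYID .LT. NPROC-1) CALL RECV(MYID+1, A(HI+1), 1)
C     Owner-computes rule: each processor updates only the
C     elements of B that it owns.
      DO I = MAX(LO,2), MIN(HI,N-1)
         B(I) = (A(I-1) + A(I+1)) / 2.0
      ENDDO

The communication optimizations mentioned above operate on exactly this kind of code, for example by hoisting the boundary exchange out of an enclosing time-step loop or by combining small messages into larger ones.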

Fortran D was a major impetus behind the recently distributed informal standard for High Performance Fortran (HPF). The High Performance Fortran Forum, which produced the standard, was convened by the CRPC and included representatives from industry, academia, and government laboratories. The Fortran D compilers produced by the CRPC are being used as models for several commercial HPF compilers. Thus, the project has established an efficient technology transfer mechanism by which new features in Fortran D, once demonstrated, may be included in a future round of HPF specification.

The Fortran group also works closely with applications scientists and engineers working on "irregular" scientific problems, such as computational fluid dynamics, computational chemistry, computational biology, structural mechanics, and electrical power grid calculations. The research associated with irregular scientific problems focuses on the development of portable runtime support libraries that (1) coordinate interprocessor data movement, (2) manage the storage of, and access to, copies of off-processor data, (3) support a shared name space, and (4) couple runtime data and workload partitioners to compilers. These runtime support libraries are being used to port application codes to a variety of multiprocessor architectures and are being incorporated into the Fortran D distributed-memory compilers.
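
Such libraries typically follow an inspector/executor discipline: a one-time inspector phase analyzes the runtime indirection pattern and builds a communication schedule, and a repeated executor phase carries out the data movement it describes. The sketch below is illustrative only; the routine names BUILD_SCHED and GATHER are hypothetical stand-ins for library entry points:

C     Irregular reference: IA(I) is an indirection array whose
C     contents are known only at runtime, so the communication
C     pattern cannot be determined at compile time.
C
C     Inspector (run once): translate the global indices in IA
C     into local indices, and record in a schedule which
C     off-processor elements of X this processor must fetch.
      CALL BUILD_SCHED(IA, NLOCAL, SCHED, LOCIA)
C
C     Executor (run every time step): move off-processor copies
C     of X into local buffer space according to the schedule,
C     then execute the loop entirely on local data.
      CALL GATHER(X, SCHED)
      DO I = 1, NLOCAL
         Y(I) = Y(I) + X(LOCIA(I))
      ENDDO

Because the schedule is built once and reused across time steps, the inspector's cost is amortized over the whole computation.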

Keith Cooper's research interests include programming environments, interprocedural analysis and optimization, and compiling for advanced-architecture microprocessors. He is one of the principal implementors of the Rn/ParaScope compiler for Fortran, which serves as a testbed for research in optimization and code-generation techniques. In a long-established collaboration with Ken Kennedy, he has developed the fastest known algorithms for solving the flow-insensitive interprocedural summary and aliasing problems. His current work in this area is aimed at developing practical and effective techniques for cross-procedural optimization. In collaboration with other CRPC researchers, he has developed a set of improvements to previous techniques for register allocation based on the graph-coloring paradigm. That investigation has now shifted to compiler management of latency, with particular focus on optimization for deep memory hierarchies and aggressive forms of instruction scheduling.

ParaScope Programming Environment

ParaScope is a collection of tools that support parallel programming at the level of whole programs. It was initially designed to support development of Fortran programs with explicit parallelism in the form of parallel loops. The fundamental issue addressed in the system is understanding how memory values are shared among parallel tasks when constructing parallel programs that use a single shared name space. At the heart of ParaScope are its facilities for interprocedural analysis and optimization of Fortran programs and an intelligent editor that provides programmers with information about how memory is shared among loop iterations, helping them make decisions about how parallelism can best be exploited. For Fortran programs in which parallelism is expressed explicitly in the form of parallel loops, detecting when two or more processors access memory in a conflicting manner is a central issue for program correctness. The Fortran group has developed a technique to automatically pinpoint such accesses, known as data races, during execution of programs with nested parallel loops. The technique has been implemented in a prototype debugging system as part of the ParaScope parallel programming environment.
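
A small example suggests what the race detector looks for; the PARALLEL DO notation is illustrative rather than the environment's exact input syntax:

      PARALLEL DO I = 1, N-1
C        Iteration I reads A(I+1) while iteration I+1 writes it:
C        two iterations touch the same location, at least one of
C        the accesses is a write, and no synchronization orders
C        the two accesses.  A debugger of this kind would report
C        the pair of conflicting references as a data race.
         A(I) = A(I+1) + 1.0
      ENDDO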

Joel Saltz came to the University of Maryland after spending three years at ICASE at the NASA Langley Research Center as Lead Computer Scientist and three years at Yale University as an Assistant Professor. He leads a research group at the University of Maryland, College Park, whose goal is to develop methods that make it possible to produce portable compilers that generate efficient multiprocessor code for irregular scientific problems, i.e., problems that are unstructured, sparse, adaptive, or block-structured. He collaborates with a wide variety of applications researchers from areas such as computational fluid dynamics, computational chemistry, computational biology, and structural mechanics.

Related Projects

The design and construction of a successor to ParaScope, known as the D system, has begun under ARPA funding. The D system will support the development of programs written in Fortran 90D. The central issue in the development of these tools will be to bridge the gap between the high-level source program and the resulting code for a particular target parallel architecture. These tools will help programmers select data distributions and provide support for debugging code executing on a parallel machine in terms of the original high-level source. As the D system and Fortran D compiler technology mature, the Fortran D compiler will be integrated into the D system and retargeted to non-uniform memory-access shared-memory machines.

Members of the Fortran group are involved in several additional collaborations that capitalize on the available software infrastructure. For instance, researchers at Rice University and Argonne National Laboratory are continuing to enhance ADIFOR, an automatic differentiation tool for Fortran built upon the ParaScope infrastructure, to support sensitivity analysis of large simulation codes for use in multidisciplinary design optimization by members of the CRPC Parallel Optimization group. The Massively Scalar Compiler Project at Rice is exploiting the interprocedural analysis engine developed for ParaScope and is also investigating interactions between parallelizing transformations and scalar node performance. The Fortran group is also collaborating with the CRPC Parallel Paradigm Integration project to investigate ways of integrating ParaScope and Fortran D-style data decomposition directives into Fortran M, a modular version of Fortran. Syracuse University is coordinating an ARPA activity to set up a Parallel Compiler Runtime Consortium, which involves other CRPC sites and aims to design and implement common runtime support for parallel Fortran, C++, and Ada for both data and task parallelism. Finally, the CRPC is an active collaborator in a project by the Intel Delta Consortium to develop software support for parallel I/O. The Fortran project researchers will develop and implement extensions to Fortran D that support "out-of-core" arrays, which are too large to fit into the main memory of even a massively parallel computer system.
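
As a sketch of what automatic differentiation by source transformation produces, consider the assignment Y = X1*X2 + SIN(X1). The transformed code below is illustrative only, not actual ADIFOR output; it assumes each active variable V is paired with an array G_V holding its partial derivatives with respect to NG independent variables:

C     Forward mode: augment the assignment with its chain-rule
C     derivative.  The partials of Y are
C        dY/dX1 = X2 + COS(X1),   dY/dX2 = X1.
      DO K = 1, NG
         G_Y(K) = (X2 + COS(X1))*G_X1(K) + X1*G_X2(K)
      ENDDO
      Y = X1*X2 + SIN(X1)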

Linda Torczon's research interests include code generation, interprocedural data-flow analysis and optimization, and programming environments. In the code generation realm, she has published a set of improvements to graph-coloring register allocation. In the area of interprocedural analysis and optimization, she has developed techniques for interprocedural constant propagation and recompilation analysis. She has also completed studies on the effectiveness of interprocedural optimization and on the relative effectiveness of several interprocedural constant propagation techniques. In the programming environment arena, she is one of the driving forces behind the ParaScope programming environment project. She is a principal architect of the framework for whole-program analysis in the ParaScope programming environment and one of the key implementors of an optimizing compiler for Fortran. Torczon is involved in one collaboration to determine the overall effectiveness of interprocedural constant propagation techniques and in another examining a set of problems that arise in building an optimizing compiler for the Intel iPSC/860 and the IBM RS/6000.