Our discussion of parallel I/O for dense matrices assumes that in-core matrices are distributed over processes using a block-cyclic data distribution, as in ScaLAPACK [4, 2]. Processes are viewed as being laid out in a two-dimensional logical topology, forming a process mesh. Our approach to parallel I/O for dense matrices hinges on the number of file pointers, and on which processes have access to those file pointers. We divide parallel I/O modes into two broad classes.
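As a concrete illustration (not taken from the text), the following C sketch shows the ownership map implied by a two-dimensional block-cyclic distribution: global element (i, j) is assigned to the process at grid coordinates ((i / mb) mod Pr, (j / nb) mod Pc), where mb-by-nb is the distribution block size and Pr-by-Pc is the process grid. The matrix size, block sizes, and grid shape used below are illustrative assumptions, not values from the paper.

#include <stdio.h>

typedef struct { int pr, pc; } ProcCoord;

/* Process-grid coordinates owning global element (i, j) under a
 * two-dimensional block-cyclic distribution with mb-by-nb blocks
 * on a Pr-by-Pc process grid (zero-based indices throughout). */
static ProcCoord owner(int i, int j, int mb, int nb, int Pr, int Pc)
{
    ProcCoord p;
    p.pr = (i / mb) % Pr;   /* row blocks dealt cyclically over process rows    */
    p.pc = (j / nb) % Pc;   /* column blocks dealt cyclically over process cols */
    return p;
}

int main(void)
{
    /* Illustrative parameters: 8-by-8 matrix, 2-by-2 blocks, 2-by-2 process grid. */
    int mb = 2, nb = 2, Pr = 2, Pc = 2;
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++) {
            ProcCoord p = owner(i, j, mb, nb, Pr, Pc);
            printf("(%d,%d) ", p.pr, p.pc);
        }
        printf("\n");
    }
    return 0;
}

Printing the ownership map in this way makes the cyclic pattern visible: each 2-by-2 block of the matrix maps to one process, and the blocks are dealt out round-robin in both dimensions.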
Modes 1(a) and 1(b) correspond to the case in which there is no parallel I/O system, and all I/O is necessarily sequential. Modes 1(c), 2(a), and 2(b) correspond to different ways of doing parallel I/O. The shared file mode is the most general, since a shared file can be written using one particular process grid and block size and read later using a different process grid and block size. A distributed file can only be read using the same process grid and block size with which it was written. However, a major drawback of a shared file is that, in general, each process can read or write only a small number of contiguous elements at a time. This results in very poor performance unless the block sizes are very large, or unless the process grid is chosen to be 1 × P (for Fortran codes) so that each column of the matrix lies in one process. The potential for poor performance arises because most I/O systems work best when reading large blocks. Furthermore, if only a small amount of data is written at a time, systems such as the Intel Paragon will not stripe the data across disks, so I/O is essentially serialized.
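To make the contiguity drawback concrete, the sketch below (an illustration under assumed parameters, not code from the paper) enumerates the contiguous runs that one process row owns within a single column of a column-major shared file. With Pr > 1 process rows, each run is only mb elements long, so a process must issue many small, strided transfers; with a 1 × P grid (Pr = 1), the whole column is a single contiguous run.

#include <stdio.h>
#include <stddef.h>

/* Contiguous element runs owned by process row my_pr within one column of
 * an m-row matrix stored column-major in a single shared file, under a
 * block-cyclic distribution with row block size mb over Pr process rows. */
static void column_runs(int m, int mb, int Pr, int my_pr)
{
    int nruns = 0, run_start = -1;
    for (int i0 = 0; i0 < m; i0 += mb) {
        int mine = ((i0 / mb) % Pr == my_pr);
        if (mine && run_start < 0)
            run_start = i0;                        /* a new contiguous run begins  */
        if (!mine && run_start >= 0) {             /* run ends at this block start */
            printf("  rows [%4d, %4d): %zu contiguous bytes\n",
                   run_start, i0, (size_t)(i0 - run_start) * sizeof(double));
            nruns++;
            run_start = -1;
        }
    }
    if (run_start >= 0) {                          /* run reaching the column end  */
        printf("  rows [%4d, %4d): %zu contiguous bytes\n",
               run_start, m, (size_t)(m - run_start) * sizeof(double));
        nruns++;
    }
    printf("  -> %d contiguous run(s) in this column\n", nruns);
}

int main(void)
{
    int m = 1000, mb = 32;           /* illustrative column length and block size */

    printf("Pr = 4 (2D grid): many short transfers per column\n");
    column_runs(m, mb, 4, 0);

    printf("Pr = 1 (1 x P grid): one long transfer per column\n");
    column_runs(m, mb, 1, 0);
    return 0;
}

Under these assumed parameters, the Pr = 4 case produces eight separate 256-byte transfers per column, while the Pr = 1 case produces a single 8000-byte transfer, which is why a 1 × P grid (or a very large block size) is needed for good shared-file performance.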