
Introduction

The in-core solution of dense linear systems typically takes less than one hour on the largest parallel computers, even when the system occupies all of memory. For example, on 1,000 processors of an Intel Paragon supercomputer, each with 16 Mbytes of memory, it takes about 22 minutes to factor and solve, at 64-bit precision, a dense linear system of order 40,000 that fills all the memory available to applications. This indicates that the processing power of such machines is underutilized in problems that require the solution of a single linear system, in the sense that much larger systems could be solved before the run time became prohibitively large. In the absence of substantial increases in the ratio of memory to processing power, it is natural to develop out-of-core solvers to tackle very large linear systems. Such systems arise, for example, in three-dimensional electromagnetic scattering problems and in fluid flow past complex objects [10, 11].
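The figures quoted above can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: the function name is hypothetical, a Mbyte is assumed to be 10^6 bytes, and the operation count uses the standard leading-order estimate of (2/3)n^3 floating-point operations for LU factorization, which the text itself does not state.

```python
def dense_system_estimate(n, bytes_per_word=8, seconds=None):
    """Rough storage and work estimate for a dense n-by-n system.

    Hypothetical helper for illustration; not part of any library.
    """
    # Storage for the full matrix at 64-bit (8-byte) precision.
    matrix_bytes = n * n * bytes_per_word
    # Leading-order flop count for LU factorization (assumption:
    # the lower-order triangular-solve cost is ignored).
    flops = (2.0 / 3.0) * n ** 3
    # Sustained rate implied by the observed run time, if one is given.
    rate = flops / seconds if seconds else None
    return matrix_bytes, flops, rate


# Values taken from the example in the text.
n = 40_000
run_time = 22 * 60                  # about 22 minutes, in seconds
total_memory = 1_000 * 16 * 10**6   # 1,000 processors x 16 Mbytes (decimal Mbytes assumed)

matrix_bytes, flops, rate = dense_system_estimate(n, seconds=run_time)

print(f"matrix storage : {matrix_bytes / 1e9:.1f} Gbytes")
print(f"total memory   : {total_memory / 1e9:.1f} Gbytes")
print(f"memory fraction: {matrix_bytes / total_memory:.2f}")
print(f"implied rate   : {rate / 1e9:.1f} Gflop/s")
```

Under these assumptions the matrix alone needs 12.8 Gbytes of the 16 Gbytes of aggregate memory, which is consistent with the statement that it fills the memory available to applications once system software and workspace are accounted for. Because the work grows as n^3 while storage grows only as n^2, doubling n would quadruple the memory requirement but multiply the run time by roughly eight, which is the imbalance that motivates out-of-core solvers.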

This paper presents a prototype for the design of a parallel software library for the out-of-core solution of dense linear systems. In section 2, we consider left- and right-looking out-of-core parallel LU factorization routines and propose a hybrid version that balances the degree of parallelism against the amount of I/O. In section 4, different approaches to parallel I/O are discussed. Section 5 outlines the main components of a library of routines for performing I/O on dense matrices. A complete parallel, out-of-core LU factorization routine is described in section 6. This algorithm is implemented in terms of the BLACS [9], PBLAS [3], and ScaLAPACK [2] routines. Section 7 presents some preliminary performance results on the Intel Paragon. A summary and conclusions are presented in section 8.



Jack Dongarra
Thu Apr 18 21:51:24 EDT 1996