The out-of-core prototype code consists of roughly three components:

(1) An I/O and file management component.

(2) The left-looking variants of the LU, QR, and Cholesky factorization algorithms.

(3) Support routines for operations on out-of-core matrices.



(1) I/O Component.
=================

A high-level interface is provided for reading and writing
sections of a ScaLAPACK array. For example,

call ZLAREAD( iodev, m,n,  ia,ja,  
		B,ib,jb,descB,   info )


will read an m by n submatrix from an out-of-core matrix
	A( ia:(ia+m-1), ja:(ja+n-1) )
into a ScaLAPACK matrix B,
	B( ib:(ib+m-1), jb:(jb+n-1) )

There are no alignment constraints on (ia,ja) or (ib,jb); however,
performance is best when A and B are processor and block aligned.


Similar to Fortran I/O, each out-of-core matrix is associated with a
device unit number (between 1 and 99). At the lowest level, disk input
and output is record oriented. Each record is an mmb x nnb ScaLAPACK
matrix, where mmb is a multiple of mb*nprow and nnb is a multiple of
nb*npcol (i.e., mod(mmb, mb*nprow) == 0 and mod(nnb, nb*npcol) == 0).
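The record-size constraint can be checked, or a desired record shape
rounded up to satisfy it, before a descriptor is built. The Python sketch
below only illustrates the arithmetic; the helper names round_up,
record_shape and is_valid_record are hypothetical and are not part of the
library.

```python
def round_up(value, unit):
    """Smallest multiple of unit that is >= value (both positive integers)."""
    if value <= 0 or unit <= 0:
        raise ValueError("value and unit must be positive")
    return ((value + unit - 1) // unit) * unit


def record_shape(mmb_wanted, nnb_wanted, mb, nb, nprow, npcol):
    """Round a desired record shape up so that
    mod(mmb, mb*nprow) == 0 and mod(nnb, nb*npcol) == 0."""
    mmb = round_up(mmb_wanted, mb * nprow)
    nnb = round_up(nnb_wanted, nb * npcol)
    return mmb, nnb


def is_valid_record(mmb, nnb, mb, nb, nprow, npcol):
    """True when the record shape meets both divisibility conditions."""
    return mmb % (mb * nprow) == 0 and nnb % (nb * npcol) == 0
```

For example, with mb = nb = 50 on a 2 x 3 process grid, a requested
1000 x 100 record becomes 1000 x 150, since nnb must be a multiple of
nb*npcol = 150.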


Each out-of-core matrix, like a 2-D block-cyclically distributed
ScaLAPACK matrix, is associated with a descriptor. The
descriptor is constructed by


call PFDESCINIT( descA, m,n, mb,nb,   rsrc,csrc, ictxt,
		iodev,  filetype, mmb,nnb, Asize,  filename, info )

where  

iodev	integer  (1 <= iodev <= 99)
	I/O unit number associated with the out-of-core matrix

filetype character*1

	'D'	data is distributed in many files,

		This option is best in an environment where
		each processor has access to a fast local disk.

	'S'	data is shared in one file.

		This option is best in an environment where
		a parallel/concurrent file system, such as the
		Paragon Parallel File System, supports concurrent
		read/write requests.

		Note that some NFS implementations may not support
		concurrent read/write.

	'I'	similar to 'S': data is shared and interleaved
		in one file.


filename	character*(*)

	file to be associated with the out-of-core matrix.

	If filename starts with '/' (filename(1:1).eq.'/'),
	it is assumed to be a full absolute path name.

	Otherwise, the file is assumed to be on a fast disk partition
	such as '/tmp' or '/pfs'.
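The file name rule above amounts to a simple test on the first character.
The Python sketch below restates it; the function name resolve_matrix_path
and the fast_dir argument are hypothetical, and how the library itself
picks the fast partition is not described here.

```python
def resolve_matrix_path(filename, fast_dir="/tmp"):
    """Absolute names are used as given; any other name is placed
    under fast_dir (an assumed fast partition such as /tmp or /pfs)."""
    if not filename:
        raise ValueError("filename must be non-empty")
    if filename[0] == "/":  # same test as filename(1:1).eq.'/'
        return filename
    return fast_dir.rstrip("/") + "/" + filename
```

Under these assumptions, resolve_matrix_path("a.dat") gives "/tmp/a.dat",
while "/scratch/a.dat" is returned unchanged.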


Asize	integer

	The size of temporary work space/buffer  to be used
	in accessing the out-of-core matrix.
	


The I/O unit can be closed by

call LACLOSE( iodev, 'NoKeep', myid, nproc, info ) 

to remove the file or

call LACLOSE( iodev, 'Keep', myid, nproc, info ) 

to keep the file.

Note that the data layout on disk is tied to  the processor grid 
(nprow,npcol) and block size (mb,nb).
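To see why the grid shape and block size matter, recall how a 2-D
block-cyclic distribution assigns a global entry to a process. The Python
sketch below uses the same formula as ScaLAPACK's INDXG2P tool routine;
the function names owner and owner_2d are hypothetical. It shows only the
in-memory mapping, since the exact on-disk layout is not specified here.

```python
def owner(i_global, blk, src, nprocs):
    """Process coordinate owning 1-based global index i_global in a
    1-D block-cyclic distribution with block size blk, starting at
    process src, over nprocs processes."""
    return (src + (i_global - 1) // blk) % nprocs


def owner_2d(i, j, mb, nb, rsrc, csrc, nprow, npcol):
    """(process row, process column) owning global entry (i, j)."""
    return owner(i, mb, rsrc, nprow), owner(j, nb, csrc, npcol)
```

Entry (5,5) with mb = nb = 2 belongs to process (0,2) on a 2 x 3 grid but
to process (0,0) on a 2 x 2 grid, and to process (1,1) on a 2 x 2 grid
when mb = nb = 4. The same entry moves whenever the grid or block size
changes.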





(2) 'Left-looking' variants of LU, QR, and Cholesky factorizations.
===================================================================

LAPACK and ScaLAPACK implement a 'Right-looking' variant of the
LU, QR, and Cholesky factorizations. A 'Left-looking' variant
can reduce the volume of I/O for out-of-core algorithms.
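The left-looking ordering can be illustrated with a small in-core,
unblocked Cholesky factorization. The Python sketch below is not the
prototype's algorithm (which works on distributed panels held on disk);
the function name and the counters are hypothetical and serve only to
show the access pattern: column j is updated with all previously factored
columns and is then written exactly once.

```python
import math


def left_looking_cholesky(a):
    """Return (L, stats) with A = L * L^T for a symmetric positive
    definite matrix a given as a list of rows."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    stats = {"factor_column_reads": 0, "column_writes": 0}
    for j in range(n):
        # Load the lower part of column j of A.
        col = [float(a[i][j]) for i in range(j, n)]
        # Apply the updates from every previously factored column k < j.
        for k in range(j):
            stats["factor_column_reads"] += 1
            ljk = low[j][k]
            for i in range(j, n):
                col[i - j] -= low[i][k] * ljk
        if col[0] <= 0.0:
            raise ValueError("matrix is not positive definite")
        diag = math.sqrt(col[0])
        # Column j is now final and is stored once.
        for i in range(j, n):
            low[i][j] = col[i - j] / diag
        stats["column_writes"] += 1
    return low, stats
```

In this sketch every column is written once, whereas a right-looking
ordering would update (and, out of core, rewrite) the trailing submatrix
after each step.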

A column-oriented implementation is chosen so that most of the
ScaLAPACK routines for pivoting and for applying Householder
elementary transformations can be reused.

For best performance, the factorization routines need
a minimum of two m by nnb ScaLAPACK array panels. The algorithm
attempts to use a variable-width panel (e.g., in Cholesky factorization)
to make full use of in-core memory.
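A rough way to size the panels is to pick the widest nnb for which two
m by nnb panels fit in the available memory. The Python sketch below
assumes the budget is counted in matrix elements summed over the whole
process grid, ignores workspace and local padding, and keeps nnb a
multiple of nb*npcol as required for records; the function name
max_panel_width is hypothetical.

```python
def max_panel_width(m, budget_elements, nb, npcol, n_panels=2):
    """Largest nnb that is a multiple of nb*npcol and satisfies
    n_panels * m * nnb <= budget_elements."""
    unit = nb * npcol
    width = (budget_elements // (n_panels * m)) // unit * unit
    if width < unit:
        raise ValueError("budget too small for even one block column set")
    return width
```

For m = 10000, nb = 50, npcol = 4 and a budget of 50,000,000 elements,
this gives nnb = 2400.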

PFxGEQRF	--- QR factorization
PFxGEQRS	--- solve with QR factorization

PFxTRF		--- LU factorization
PFxTRS		--- solve with LU factorization

PFxPOTRF	--- Cholesky factorization
PFxPOTRS	--- solve with Cholesky factorization


(3) Support routines.
=====================

A number of support routines are written to operate
on out-of-core matrices.

PFxGEMM		--- matrix-matrix operation where at most one
		    descriptor is associated with an out-of-core matrix.

PFxTRSM		--- triangular solve where the triangular factor
		    is out-of-core.

PFxORMQR
(PFxUNMQR)	--- apply Householder elementary transformations.

PFxMATGEN	--- generate a random out-of-core matrix.

PFxLAPRNT	--- print the contents of an out-of-core matrix.