##############################################################################
##############################################################################
##                                                                          ##
##                            SCASY README                                  ## 
##                                                                          ## 
## Version 1.0                                                              ##
##                                                                          ##
## Contributors: Robert Granat and Bo Kågström                              ##
##               Department of Computing Science and HPC2N                  ##
##               Umeå University, Sweden                                    ##
##                                                                          ##
## Email: {granat,bokg}@cs.umu.se                                           ##
##                                                                          ##
## Date: August 31, 2009.                                                   ##
##                                                                          ##
##                                                                          ##
##############################################################################
##############################################################################

1. Introduction
===============
SCASY is a parallel ScaLAPACK-style HPC library for solving 44 sign and
transpose variants of eight common Sylvester-type matrix equations.

The latest version will always be available at the library homepage
	
	http://www.cs.umu.se/research/parallel/scasy

2. Quick installation instructions
==================================
First of all, you need a Fortran 90/95 compiler. To install SCASY on your 
system you also need the following software libraries:
	
	a) ScaLAPACK, see http://www.netlib.org/scalapack, including the
	PBLAS and BLACS libraries.
	b) LAPACK, see http://www.netlib.org/lapack
	c) a high-performance BLAS, e.g., ATLAS or GotoBLAS. The reference
	BLAS is found here: http://www.netlib.org/blas.
	d) RECSY, see http://www.cs.umu.se/research/parallel/recsy, and its
	dependencies from SLICOT (Software Library In Control), see
	http://www.slicot.org.

To build the library, follow these steps:

	1. Unpack the SCASY archive
	2. Modify the "Make_include" file to suit your system; see the
	example files for the Portland, PathScale, gfortran, and NAG compilers.
	3. Type "make" to build the library in the subdirectory "lib". If you
	wish, type "make all" to also build a test program in the subdirectory
	"test". More information about the test program can be found below,
	in the README files, and in test/TESTSCASY.f.

3. Available functionality
==========================
The following main contributions are available via SCASY:

	a) Parallel solvers for dense Sylvester-type matrix equations
	b) Parallel condition estimators for Sylvester-type matrix equations
	c) Some parallel auxiliary routines

For more information, see the documentation in docs/ and the SCASY homepage 
	
	http://www.cs.umu.se/research/parallel/scasy

4. Comments, questions and bug reports
======================================
Please send comments, questions and bug reports to {granat,bokg}@cs.umu.se

5. How to use the supplied test program TESTSCASY
=================================================
Instructions on how to use the test program are in the first 736 lines of the
source lib/TESTSCASY.f. For troubleshooting, please send an e-mail to
granat@cs.umu.se.

6. The meaning of the different pre-processing flags
====================================================
The following pre-processing options are available at compile time.
Note that pre-processing must be explicitly activated via a suitable flag to
the chosen Fortran compiler.

USE_DYNAMIC - if this flag is activated, the test program TESTSCASY uses 
dynamic allocation (recommended). If this flag is not activated, all 
allocations are performed statically in the test program.

USE_INTEGER8 - if this flag is activated, the test program uses 8-byte integers
to allocate its memory area (recommended, especially when large matrices are
considered).

LOOPGRID - if this flag is activated, the test program loops through the
process grid dimensions specified by [ACRO]_NPROW_MIN, [ACRO]_NPROW_MAX,
[ACRO]_NPCOL_MIN, and [ACRO]_NPCOL_MAX, using the steps [ACRO]_NPROW_STEP and
[ACRO]_NPCOL_STEP. The user is responsible for allocating enough MPI
processes when running the test program, such that
	# MPI processes >= [ACRO]_NPROW_MAX*[ACRO]_NPCOL_MAX
otherwise the test program aborts with an error message. If this flag is
not activated, the test program ignores the process grid dimensions listed
above and simply creates a rectangular process grid such that NPROW*NPCOL is
as close to # MPI processes as possible. By default, the test program is
set up for # MPI processes = 4.
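
The two grid-selection modes described above can be sketched as follows. This
is an illustration only, written in Python rather than Fortran, and it is not
taken from TESTSCASY.f. The function names are hypothetical, the near-square
heuristic (NPROW = floor(sqrt(# MPI processes))) is an assumption about how
"as close as possible" might be realized, and the loop order over rows and
columns is likewise assumed.

```python
import math


def default_grid(nprocs):
    """Grid used without LOOPGRID (assumed near-square heuristic).

    Returns (nprow, npcol) with nprow * npcol <= nprocs.
    """
    if nprocs < 1:
        raise ValueError("need at least one MPI process")
    nprow = math.isqrt(nprocs)   # largest nprow with nprow**2 <= nprocs
    npcol = nprocs // nprow      # widest column count that still fits
    return nprow, npcol


def loopgrid_dims(nprocs, row_range, col_range):
    """Grids visited with LOOPGRID active.

    row_range and col_range are (min, max, step) triples standing in for
    [ACRO]_NPROW_MIN/MAX/STEP and [ACRO]_NPCOL_MIN/MAX/STEP.
    """
    row_min, row_max, row_step = row_range
    col_min, col_max, col_step = col_range
    # The README requires # MPI processes >= NPROW_MAX * NPCOL_MAX.
    if nprocs < row_max * col_max:
        raise ValueError("not enough MPI processes for the largest grid")
    # Assumed order: rows in the outer loop, columns in the inner loop.
    return [(r, c)
            for r in range(row_min, row_max + 1, row_step)
            for c in range(col_min, col_max + 1, col_step)]
```

For example, default_grid(4) gives a 2 x 2 grid, and
loopgrid_dims(4, (1, 2, 1), (1, 2, 1)) visits the grids 1 x 1, 1 x 2, 2 x 1
and 2 x 2, while the same ranges with only 3 MPI processes are rejected.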

USE_OMP - if this flag is activated, both the library and the test program
expand OpenMP directives, producing a library that can run in a distributed
memory environment with multithreaded nodes. The number of nodes in the
process grid is specified as before, by the number of allocated MPI
processes, with or without the LOOPGRID flag. The number of threads per node
is specified by the environment variable OMP_NUM_THREADS.

USE_NEWPQR - if this flag is activated, the current ScaLAPACK implementation
of the unsymmetric QR algorithm is replaced with a new preliminary multishift
version with advanced deflation techniques, which should speed up the general
solvers for the standard equations considerably. For more information on this
new implementation of the QR algorithm, see the references.

USE_AED_RES - if this flag is activated together with USE_NEWPQR, the new 
parallel QR algorithm used in SCASY checks the residual of each Schur 
decomposition computed in each stage of aggressive early deflation and saves 
the maximum results to an output array. This flag is mostly used for 
debugging purposes, and is not recommended for non-expert users. 

7. Planned future functionality
===============================
The following contributions are planned for future releases:

	d) Parallel reduction routines for the standard and generalized
	Schur reductions
	e) Parallel software for computing invariant/deflating subspaces
	f) Parallel Schur-based Riccati matrix equation solvers

8. Selected references
======================
The following references, and the references cited therein, provide all the 
information you need to understand and use the SCASY library, in particular
references 5-7.

1. Robert Granat, A Parallel ScaLAPACK-style Sylvester Solver, Master's Thesis,
Report UMNAD 435/03, Dept. Computing Science, Umeå University, Sweden, January
2003.

2. Robert Granat, Bo Kågström and Peter Poromaa, Parallel ScaLAPACK-style
Algorithms for Solving Continuous-Time Sylvester Equations. In H. Kosch et al.
(Eds), Euro-Par 2003 Parallel Processing. Lecture Notes in Computer Science,
Springer Verlag, Vol. 2790, pp. 800-809, 2003.

3. Robert Granat and Bo Kågström, Evaluating Parallel Algorithms for Solving
Sylvester-Type Matrix Equations: Direct Transformation-Based versus Iterative
Matrix-Sign-Function-Based Methods. To appear in PARA'04 State-of-the-Art in
Scientific Computing Conference Proceedings, LNCS, Springer Verlag, 2004.

4. Robert Granat, Isak Jonsson and Bo Kågström, Combining Explicit and
Recursive Blocking for Solving Triangular Sylvester-Type Matrix Equations on 
Distributed Memory Platforms. In M. Danelutto, D. Laforenza, M. Vanneschi 
(Eds), Euro-Par 2004. Lecture Notes in Computer Science, Springer Verlag, 
Vol. 3149, pp. 742-750, 2004. 

5. Robert Granat and Bo Kågström, Parallel Solvers for Sylvester-Type Matrix
Equations with Applications in Condition Estimation, Part I: Theory and 
Algorithms, ACM TOMS, submitted June 2007, revised January and August 2009. 

6. Robert Granat and Bo Kågström, ALGORITHM XXX: The SCASY Library -- Parallel
Solvers for Sylvester-Type Matrix Equations with Applications in Condition 
Estimation, Part II, ACM TOMS, submitted June 2007, revised January and 
August 2009. 

7. Robert Granat and Bo Kågström, SCASY Users' Guide, Tech Report UMINF-09.10,
Dept. Computing Science, Umeå University, Sweden, August 2009.
