LAPACK Implementation
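      DO 10 J = 1, N
*        Solve U11**T * x = A(1:J-1,J), overwriting that part of column J
         CALL STRSV( 'Upper', 'Transpose', 'Non-Unit', J-1, A, LDA,
     $               A( 1, J ), 1 )
*        Diagonal entry: S = A(J,J) - x**T * x
         S = A( J, J ) - SDOT( J-1, A( 1, J ), 1, A( 1, J ), 1 )
*        S must be positive; otherwise exit to label 20 (not shown)
         IF( S.LE.ZERO ) GO TO 20
         A( J, J ) = SQRT( S )
   10 CONTINUE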
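The two BLAS calls in the loop can be read off from a partitioned form of the factorization. This is a sketch, assuming the loop computes A = U^T U with U upper triangular (which the 'Upper', 'Transpose' arguments suggest). The symbols a_j and u_j are introduced here for the first j-1 entries of column j of A and of U; they do not appear in the code.

```latex
% Leading j-by-j block of A = U^T U (requires amsmath for pmatrix)
\[
\begin{pmatrix} A_{11} & a_j \\ a_j^T & a_{jj} \end{pmatrix}
=
\begin{pmatrix} U_{11}^T & 0 \\ u_j^T & u_{jj} \end{pmatrix}
\begin{pmatrix} U_{11} & u_j \\ 0 & u_{jj} \end{pmatrix}
\]
% Equating the last column on both sides:
\[
a_j = U_{11}^T u_j, \qquad a_{jj} = u_j^T u_j + u_{jj}^2
\]
% Hence the two steps of the loop body:
\[
u_j = U_{11}^{-T} a_j \quad \text{(STRSV)}, \qquad
u_{jj} = \sqrt{a_{jj} - u_j^T u_j} \quad \text{(SDOT, then SQRT)}
\]
```

If the quantity under the square root is not positive, the matrix is not positive definite, which is the case the IF test catches.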
This change by itself is sufficient to significantly improve the performance on a number of machines.
For example, performance rises from 72 to 251 Mflop/s for a matrix of order 500 on one processor of a CRAY Y-MP.
However, it reaches only 378 Mflop/s on 8 processors of a CRAY Y-MP.
This suggests that further work is needed.