sgejsv.f
1*> \brief \b SGEJSV
2*
3* =========== DOCUMENTATION ===========
4*
5* Online html documentation available at
6* http://www.netlib.org/lapack/explore-html/
7*
8*> Download SGEJSV + dependencies
9*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/sgejsv.f">
10*> [TGZ]</a>
11*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/sgejsv.f">
12*> [ZIP]</a>
13*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/sgejsv.f">
14*> [TXT]</a>
15*
16* Definition:
17* ===========
18*
19* SUBROUTINE SGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
20* M, N, A, LDA, SVA, U, LDU, V, LDV,
21* WORK, LWORK, IWORK, INFO )
22*
23* .. Scalar Arguments ..
24* IMPLICIT NONE
25* INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
26* ..
27* .. Array Arguments ..
28* REAL A( LDA, * ), SVA( N ), U( LDU, * ), V( LDV, * ),
29* $ WORK( LWORK )
30* INTEGER IWORK( * )
31* CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
32* ..
33*
34*
35*> \par Purpose:
36* =============
37*>
38*> \verbatim
39*>
40*> SGEJSV computes the singular value decomposition (SVD) of a real M-by-N
41*> matrix [A], where M >= N. The SVD of [A] is written as
42*>
43*> [A] = [U] * [SIGMA] * [V]^t,
44*>
45*> where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
46*> diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
47*> [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
48*> the singular values of [A]. The columns of [U] and [V] are the left and
49*> the right singular vectors of [A], respectively. The matrices [U] and [V]
50*> are computed and stored in the arrays U and V, respectively. The diagonal
51*> of [SIGMA] is computed and stored in the array SVA.
52*> SGEJSV can sometimes compute tiny singular values and their singular vectors much
53*> more accurately than other SVD routines; see below under Further Details.
54*> \endverbatim
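*>
*> \par Example:
*  =============
*>
*> \verbatim
*>
*>  The following free form program is a minimal illustrative sketch (not
*>  part of the LAPACK interface or test suite) of a call that requests the
*>  full SVD with high relative accuracy (JOBA = 'C') and the restricted
*>  singular value range (JOBR = 'R'). The workspace sizes follow the
*>  minimal requirements quoted in the descriptions of LWORK and IWORK
*>  below; the program name and the random test matrix are placeholders.
*>
*>      program sgejsv_example
*>         implicit none
*>         integer, parameter :: m = 5, n = 3
*>         integer, parameter :: lwork = max( 2*m+n, 6*n+2*n*n )
*>         real    :: a(m,n), sva(n), u(m,n), v(n,n), work(lwork)
*>         integer :: iwork(m+3*n), info
*>         call random_number( a )
*>         call sgejsv( 'C', 'U', 'V', 'R', 'N', 'N', m, n, a, m, sva,  &
*>                      u, m, v, n, work, lwork, iwork, info )
*>         ! the computed singular values, including the scaling factor
*>         ! stored in WORK(1) and WORK(2) (see the descriptions below)
*>         if ( info == 0 ) print *, 'sigma = ', ( work(1)/work(2) )*sva
*>      end program sgejsv_example
*>
*> \endverbatim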
55*
56* Arguments:
57* ==========
58*
59*> \param[in] JOBA
60*> \verbatim
61*> JOBA is CHARACTER*1
62*> Specifies the level of accuracy:
63*> = 'C': This option works well (high relative accuracy) if A = B * D,
64*> with well-conditioned B and arbitrary diagonal matrix D.
65*> The accuracy cannot be spoiled by COLUMN scaling. The
66*> accuracy of the computed output depends on the condition of
67*> B, and the procedure aims at the best theoretical accuracy.
68*> The relative error max_{i=1:N}|d sigma_i| / sigma_i is
69*> bounded by f(M,N)*epsilon* cond(B), independent of D.
70*> The input matrix is preprocessed with the QRF with column
71*> pivoting. This initial preprocessing and preconditioning by
72*> a rank revealing QR factorization is common for all values of
73*> JOBA. Additional actions are specified as follows:
74*> = 'E': Computation as with 'C' with an additional estimate of the
75*> condition number of B. It provides a realistic error bound.
76*> = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
77*> D1, D2, and well-conditioned matrix C, this option gives
78*> higher accuracy than the 'C' option. If the structure of the
79*> input matrix is not known, and relative accuracy is
80*> desirable, then this option is advisable. The input matrix A
81*> is preprocessed with QR factorization with FULL (row and
82*> column) pivoting.
83*> = 'G': Computation as with 'F' with an additional estimate of the
84*> condition number of B, where A=D*B. If A has heavily weighted
85*>             rows, then using this condition number gives an overly
86*>             pessimistic error bound.
87*> = 'A': Small singular values are the noise and the matrix is treated
88*> as numerically rank deficient. The error in the computed
89*> singular values is bounded by f(m,n)*epsilon*||A||.
90*> The computed SVD A = U * S * V^t restores A up to
91*> f(m,n)*epsilon*||A||.
92*> This gives the procedure the licence to discard (set to zero)
93*> all singular values below N*epsilon*||A||.
94*>       = 'R': Similar to 'A'. The rank revealing property of the initial
95*>             QR factorization is used to reveal (using the triangular factor)
96*> a gap sigma_{r+1} < epsilon * sigma_r in which case the
97*> numerical RANK is declared to be r. The SVD is computed with
98*> absolute error bounds, but more accurately than with 'A'.
99*> \endverbatim
100*>
101*> \param[in] JOBU
102*> \verbatim
103*> JOBU is CHARACTER*1
104*> Specifies whether to compute the columns of U:
105*> = 'U': N columns of U are returned in the array U.
106*> = 'F': full set of M left sing. vectors is returned in the array U.
107*> = 'W': U may be used as workspace of length M*N. See the description
108*> of U.
109*> = 'N': U is not computed.
110*> \endverbatim
111*>
112*> \param[in] JOBV
113*> \verbatim
114*> JOBV is CHARACTER*1
115*> Specifies whether to compute the matrix V:
116*> = 'V': N columns of V are returned in the array V; Jacobi rotations
117*> are not explicitly accumulated.
118*> = 'J': N columns of V are returned in the array V, but they are
119*> computed as the product of Jacobi rotations. This option is
120*> allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
121*> = 'W': V may be used as workspace of length N*N. See the description
122*> of V.
123*> = 'N': V is not computed.
124*> \endverbatim
125*>
126*> \param[in] JOBR
127*> \verbatim
128*> JOBR is CHARACTER*1
129*> Specifies the RANGE for the singular values. Issues the licence to
130*> set to zero small positive singular values if they are outside
131*>         the specified range. If A .NE. 0 is scaled by c so that the largest
132*>         singular value of c*A is around SQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
133*> the licence to kill columns of A whose norm in c*A is less than
134*> SQRT(SFMIN) (for JOBR = 'R'), or less than SMALL=SFMIN/EPSLN,
135*> where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
136*> = 'N': Do not kill small columns of c*A. This option assumes that
137*> BLAS and QR factorizations and triangular solvers are
138*> implemented to work in that range. If the condition of A
139*> is greater than BIG, use SGESVJ.
140*> = 'R': RESTRICTED range for sigma(c*A) is [SQRT(SFMIN), SQRT(BIG)]
141*> (roughly, as described above). This option is recommended.
142*> ===========================
143*> For computing the singular values in the FULL range [SFMIN,BIG]
144*> use SGESVJ.
145*> \endverbatim
146*>
147*> \param[in] JOBT
148*> \verbatim
149*> JOBT is CHARACTER*1
150*>          If the matrix is square, then the procedure may decide to use
151*> transposed A if A^t seems to be better with respect to convergence.
152*> If the matrix is not square, JOBT is ignored. This is subject to
153*> changes in the future.
154*> The decision is based on two values of entropy over the adjoint
155*> orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
156*> = 'T': transpose if entropy test indicates possibly faster
157*> convergence of Jacobi process if A^t is taken as input. If A is
158*> replaced with A^t, then the row pivoting is included automatically.
159*> = 'N': do not speculate.
160*> This option can be used to compute only the singular values, or the
161*> full SVD (U, SIGMA and V). For only one set of singular vectors
162*> (U or V), the caller should provide both U and V, as one of the
163*> matrices is used as workspace if the matrix A is transposed.
164*> The implementer can easily remove this constraint and make the
165*> code more complicated. See the descriptions of U and V.
166*> \endverbatim
167*>
168*> \param[in] JOBP
169*> \verbatim
170*> JOBP is CHARACTER*1
171*> Issues the licence to introduce structured perturbations to drown
172*> denormalized numbers. This licence should be active if the
173*> denormals are poorly implemented, causing slow computation,
174*> especially in cases of fast convergence (!). For details see [1,2].
175*>          For the sake of simplicity, these perturbations are included only
176*> when the full SVD or only the singular values are requested. The
177*> implementer/user can easily add the perturbation for the cases of
178*> computing one set of singular vectors.
179*> = 'P': introduce perturbation
180*> = 'N': do not perturb
181*> \endverbatim
182*>
183*> \param[in] M
184*> \verbatim
185*> M is INTEGER
186*> The number of rows of the input matrix A. M >= 0.
187*> \endverbatim
188*>
189*> \param[in] N
190*> \verbatim
191*> N is INTEGER
192*> The number of columns of the input matrix A. M >= N >= 0.
193*> \endverbatim
194*>
195*> \param[in,out] A
196*> \verbatim
197*> A is REAL array, dimension (LDA,N)
198*> On entry, the M-by-N matrix A.
199*> \endverbatim
200*>
201*> \param[in] LDA
202*> \verbatim
203*> LDA is INTEGER
204*> The leading dimension of the array A. LDA >= max(1,M).
205*> \endverbatim
206*>
207*> \param[out] SVA
208*> \verbatim
209*> SVA is REAL array, dimension (N)
210*> On exit,
211*> - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
212*> computation SVA contains Euclidean column norms of the
213*> iterated matrices in the array A.
214*> - For WORK(1) .NE. WORK(2): The singular values of A are
215*> (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
216*> sigma_max(A) overflows or if small singular values have been
217*> saved from underflow by scaling the input matrix A.
218*> - If JOBR='R' then some of the singular values may be returned
219*> as exact zeros obtained by "set to zero" because they are
220*> below the numerical rank threshold or are denormalized numbers.
221*> \endverbatim
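*>
*>          Illustrative note: whether or not WORK(1) = WORK(2), the singular
*>          values can be assembled on the caller side as
*> \verbatim
*>             SIGMA(1:N) = ( WORK(1) / WORK(2) ) * SVA(1:N)
*> \endverbatim
*>          (SIGMA is a caller-side array introduced only for illustration);
*>          the factor equals one when no extra scaling was applied.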
222*>
223*> \param[out] U
224*> \verbatim
225*> U is REAL array, dimension ( LDU, N ) or ( LDU, M )
226*> If JOBU = 'U', then U contains on exit the M-by-N matrix of
227*> the left singular vectors.
228*> If JOBU = 'F', then U contains on exit the M-by-M matrix of
229*> the left singular vectors, including an ONB
230*> of the orthogonal complement of the Range(A).
231*> If JOBU = 'W' .AND. (JOBV = 'V' .AND. JOBT = 'T' .AND. M = N),
232*> then U is used as workspace if the procedure
233*> replaces A with A^t. In that case, [V] is computed
234*> in U as left singular vectors of A^t and then
235*> copied back to the V array. This 'W' option is just
236*> a reminder to the caller that in this case U is
237*> reserved as workspace of length N*N.
238*> If JOBU = 'N' U is not referenced, unless JOBT='T'.
239*> \endverbatim
240*>
241*> \param[in] LDU
242*> \verbatim
243*> LDU is INTEGER
244*> The leading dimension of the array U, LDU >= 1.
245*> IF JOBU = 'U' or 'F' or 'W', then LDU >= M.
246*> \endverbatim
247*>
248*> \param[out] V
249*> \verbatim
250*> V is REAL array, dimension ( LDV, N )
251*> If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
252*> the right singular vectors;
253*> If JOBV = 'W', AND (JOBU = 'U' AND JOBT = 'T' AND M = N),
254*> then V is used as workspace if the procedure
255*> replaces A with A^t. In that case, [U] is computed
256*> in V as right singular vectors of A^t and then
257*> copied back to the U array. This 'W' option is just
258*> a reminder to the caller that in this case V is
259*> reserved as workspace of length N*N.
260*> If JOBV = 'N' V is not referenced, unless JOBT='T'.
261*> \endverbatim
262*>
263*> \param[in] LDV
264*> \verbatim
265*> LDV is INTEGER
266*> The leading dimension of the array V, LDV >= 1.
267*> If JOBV = 'V' or 'J' or 'W', then LDV >= N.
268*> \endverbatim
269*>
270*> \param[out] WORK
271*> \verbatim
272*> WORK is REAL array, dimension (MAX(7,LWORK))
273*> On exit,
274*> WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
275*> that SCALE*SVA(1:N) are the computed singular values
276*> of A. (See the description of SVA().)
277*> WORK(2) = See the description of WORK(1).
278*> WORK(3) = SCONDA is an estimate for the condition number of
279*> column equilibrated A. (If JOBA = 'E' or 'G')
280*> SCONDA is an estimate of SQRT(||(R^t * R)^(-1)||_1).
281*> It is computed using SPOCON. It holds
282*> N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
283*> where R is the triangular factor from the QRF of A.
284*> However, if R is truncated and the numerical rank is
285*> determined to be strictly smaller than N, SCONDA is
286*> returned as -1, thus indicating that the smallest
287*> singular values might be lost.
288*>
289*> If full SVD is needed, the following two condition numbers are
290*> useful for the analysis of the algorithm. They are provided for
291*> a developer/implementer who is familiar with the details of
292*> the method.
293*>
294*> WORK(4) = an estimate of the scaled condition number of the
295*> triangular factor in the first QR factorization.
296*> WORK(5) = an estimate of the scaled condition number of the
297*> triangular factor in the second QR factorization.
298*> The following two parameters are computed if JOBT = 'T'.
299*> They are provided for a developer/implementer who is familiar
300*> with the details of the method.
301*>
302*> WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
303*> of diag(A^t*A) / Trace(A^t*A) taken as point in the
304*> probability simplex.
305*> WORK(7) = the entropy of A*A^t.
306*> \endverbatim
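*>
*>          For orientation, matching the computation in the code below,
*>          WORK(6) is the normalized Shannon entropy
*> \verbatim
*>             WORK(6) = - ( sum_{i=1:N} p_i * log(p_i) ) / log(N),
*>             p_i     = (A^t*A)_ii / Trace(A^t*A),
*> \endverbatim
*>          and WORK(7) is computed analogously from diag(A*A^t), with
*>          log(M) in place of log(N).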
307*>
308*> \param[in] LWORK
309*> \verbatim
310*> LWORK is INTEGER
311*> Length of WORK to confirm proper allocation of work space.
312*> LWORK depends on the job:
313*>
314*> If only SIGMA is needed ( JOBU = 'N', JOBV = 'N' ) and
315*>       -> .. no scaled condition estimate required (JOBA .NE. 'E' .AND. JOBA .NE. 'G'):
316*> LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
317*> ->> For optimal performance (blocked code) the optimal value
318*> is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
319*> block size for SGEQP3 and SGEQRF.
320*> In general, optimal LWORK is computed as
321*> LWORK >= max(2*M+N,N+LWORK(SGEQP3),N+LWORK(SGEQRF), 7).
322*> -> .. an estimate of the scaled condition number of A is
323*> required (JOBA='E', 'G'). In this case, LWORK is the maximum
324*> of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
325*> ->> For optimal performance (blocked code) the optimal value
326*> is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
327*> In general, the optimal length LWORK is computed as
328*> LWORK >= max(2*M+N,N+LWORK(SGEQP3),N+LWORK(SGEQRF),
329*> N+N*N+LWORK(SPOCON),7).
330*>
331*> If SIGMA and the right singular vectors are needed (JOBV = 'V'),
332*> -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
333*> -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
334*> where NB is the optimal block size for SGEQP3, SGEQRF, SGELQF,
335*> SORMLQ. In general, the optimal length LWORK is computed as
336*> LWORK >= max(2*M+N,N+LWORK(SGEQP3), N+LWORK(SPOCON),
337*> N+LWORK(SGELQF), 2*N+LWORK(SGEQRF), N+LWORK(SORMLQ)).
338*>
339*> If SIGMA and the left singular vectors are needed
340*> -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
341*> -> For optimal performance:
342*> if JOBU = 'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
343*> if JOBU = 'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
344*> where NB is the optimal block size for SGEQP3, SGEQRF, SORMQR.
345*> In general, the optimal length LWORK is computed as
346*> LWORK >= max(2*M+N,N+LWORK(SGEQP3),N+LWORK(SPOCON),
347*> 2*N+LWORK(SGEQRF), N+LWORK(SORMQR)).
348*> Here LWORK(SORMQR) equals N*NB (for JOBU = 'U') or
349*> M*NB (for JOBU = 'F').
350*>
351*> If the full SVD is needed: (JOBU = 'U' or JOBU = 'F') and
352*> -> if JOBV = 'V'
353*> the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N).
354*> -> if JOBV = 'J' the minimal requirement is
355*> LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
356*> -> For optimal performance, LWORK should be additionally
357*> larger than N+M*NB, where NB is the optimal block size
358*> for SORMQR.
359*> \endverbatim
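*>
*>          As an illustration (a sketch derived from the cases above, not an
*>          interface guarantee), a caller requesting the full SVD could size
*>          the workspace as
*> \verbatim
*>             IF ( LSAME( JOBV, 'J' ) ) THEN
*>                LWORK = MAX( 2*M+N, 4*N+N*N, 2*N+N*N+6 )
*>             ELSE
*>                LWORK = MAX( 2*M+N, 6*N+2*N*N )
*>             END IF
*> \endverbatim
*>          with larger values (e.g. adding N+M*NB for SORMQR) improving
*>          performance.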
360*>
361*> \param[out] IWORK
362*> \verbatim
363*> IWORK is INTEGER array, dimension (MAX(3,M+3*N)).
364*> On exit,
365*> IWORK(1) = the numerical rank determined after the initial
366*> QR factorization with pivoting. See the descriptions
367*> of JOBA and JOBR.
368*> IWORK(2) = the number of the computed nonzero singular values
369*> IWORK(3) = if nonzero, a warning message:
370*> If IWORK(3) = 1 then some of the column norms of A
371*> were denormalized floats. The requested high accuracy
372*> is not warranted by the data.
373*> \endverbatim
374*>
375*> \param[out] INFO
376*> \verbatim
377*> INFO is INTEGER
378*> < 0: if INFO = -i, then the i-th argument had an illegal value.
379*> = 0: successful exit;
380*> > 0: SGEJSV did not converge in the maximal allowed number
381*> of sweeps. The computed values may be inaccurate.
382*> \endverbatim
383*
384* Authors:
385* ========
386*
387*> \author Univ. of Tennessee
388*> \author Univ. of California Berkeley
389*> \author Univ. of Colorado Denver
390*> \author NAG Ltd.
391*
392*> \ingroup gejsv
393*
394*> \par Further Details:
395* =====================
396*>
397*> \verbatim
398*>
399*> SGEJSV implements a preconditioned Jacobi SVD algorithm. It uses SGEQP3,
400*> SGEQRF, and SGELQF as preprocessors and preconditioners. Optionally, an
401*> additional row pivoting can be used as a preprocessor, which in some
402*> cases results in much higher accuracy. An example is a matrix A with the
403*> structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
404*> diagonal matrices and C is a well-conditioned matrix. In that case, complete
405*> pivoting in the first QR factorization provides accuracy dependent on the
406*> condition number of C, and independent of D1, D2. Such higher accuracy is
407*> not completely understood theoretically, but it works well in practice.
408*> Further, if A can be written as A = B*D, with well-conditioned B and some
409*> diagonal D, then the high accuracy is guaranteed, both theoretically and
410*> in software, independent of D. For more details see [1], [2].
411*> The computational range for the singular values can be the full range
412*> ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
413*> & LAPACK routines called by SGEJSV are implemented to work in that range.
414*> If that is not the case, then the restriction for safe computation with
415*> the singular values in the range of normalized IEEE numbers is that the
416*> spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
417*> overflow. This code (SGEJSV) is best used in this restricted range,
418*> meaning that singular values of magnitude below ||A||_2 / SLAMCH('O') are
419*> returned as zeros. See JOBR for details on this.
420*> Further, this implementation is somewhat slower than the one described
421*> in [1,2] due to replacement of some non-LAPACK components, and because
422*> the choice of some tuning parameters in the iterative part (SGESVJ) is
423*> left to the implementer on a particular machine.
424*> The rank revealing QR factorization (in this code: SGEQP3) should be
425*> implemented as in [3]. We have a new version of SGEQP3 under development
426*> that is more robust than the current one in LAPACK, with a cleaner cut in
427*> rank deficient cases. It will be available in the SIGMA library [4].
428*> If M is much larger than N, it is obvious that the initial QRF with
429*> column pivoting can be preprocessed by the QRF without pivoting. That
430*> well known trick is not used in SGEJSV because in some cases heavy row
431*> weighting can be treated with complete pivoting. The overhead in cases
432*> M much larger than N is then only due to pivoting, but the benefits in
433*> terms of accuracy have prevailed. The implementer/user can incorporate
434*> this extra QRF step easily. The implementer can also improve data movement
435*> (matrix transpose, matrix copy, matrix transposed copy) - this
436*> implementation of SGEJSV uses only the simplest, naive data movement.
437*> \endverbatim
438*
439*> \par Contributors:
440* ==================
441*>
442*> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
443*
444*> \par References:
445* ================
446*>
447*> \verbatim
448*>
449*> [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
450*>     SIAM J. Matrix Anal. Appl. Vol. 29, No. 4 (2008), pp. 1322-1342.
451*> LAPACK Working note 169.
452*> [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
453*>     SIAM J. Matrix Anal. Appl. Vol. 29, No. 4 (2008), pp. 1343-1362.
454*> LAPACK Working note 170.
455*> [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
456*> factorization software - a case study.
457*> ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
458*> LAPACK Working note 176.
459*> [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
460*> QSVD, (H,K)-SVD computations.
461*> Department of Mathematics, University of Zagreb, 2008.
462*> \endverbatim
463*
464*> \par Bugs, examples and comments:
465* =================================
466*>
467*> Please report all bugs and send interesting examples and/or comments to
468*> drmac@math.hr. Thank you.
469*>
470* =====================================================================
471 SUBROUTINE sgejsv( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
472 $ M, N, A, LDA, SVA, U, LDU, V, LDV,
473 $ WORK, LWORK, IWORK, INFO )
474*
475* -- LAPACK computational routine --
476* -- LAPACK is a software package provided by Univ. of Tennessee, --
477* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
478*
479* .. Scalar Arguments ..
480 IMPLICIT NONE
481 INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
482* ..
483* .. Array Arguments ..
484 REAL A( LDA, * ), SVA( N ), U( LDU, * ), V( LDV, * ),
485 $ WORK( LWORK )
486 INTEGER IWORK( * )
487 CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
488* ..
489*
490* ===========================================================================
491*
492* .. Local Parameters ..
493 REAL ZERO, ONE
494 PARAMETER ( ZERO = 0.0e0, one = 1.0e0 )
495* ..
496* .. Local Scalars ..
497 REAL AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1, COND_OK,
498 $ CONDR1, CONDR2, ENTRA, ENTRAT, EPSLN, MAXPRJ, SCALEM,
499 $ sconda, sfmin, small, temp1, uscal1, uscal2, xsc
500 INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING
501 LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LSVEC,
502 $ l2aber, l2kill, l2pert, l2rank, l2tran,
503 $ noscal, rowpiv, rsvec, transp
504* ..
505* .. Intrinsic Functions ..
506 INTRINSIC abs, alog, max, min, float, nint, sign, sqrt
507* ..
508* .. External Functions ..
509 REAL SLAMCH, SNRM2
510 INTEGER ISAMAX
511 LOGICAL LSAME
512 EXTERNAL isamax, lsame, slamch, snrm2
513* ..
514* .. External Subroutines ..
515       EXTERNAL  scopy, sgelqf, sgeqp3, sgeqrf, slacpy,
516      $          slascl, slaset, slassq, slaswp, sorgqr,
517      $          sormlq, sormqr, spocon, sscal, strsm,
518      $          xerbla
519*
520 EXTERNAL sgesvj
521* ..
522*
523* Test the input arguments
524*
525 lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
526 jracc = lsame( jobv, 'J' )
527 rsvec = lsame( jobv, 'V' ) .OR. jracc
528 rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
529 l2rank = lsame( joba, 'R' )
530 l2aber = lsame( joba, 'A' )
531 errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
532 l2tran = lsame( jobt, 'T' )
533 l2kill = lsame( jobr, 'R' )
534 defr = lsame( jobr, 'N' )
535 l2pert = lsame( jobp, 'P' )
536*
537 IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
538 $ errest .OR. lsame( joba, 'C' ) )) THEN
539 info = - 1
540 ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
541 $ lsame( jobu, 'W' )) ) THEN
542 info = - 2
543 ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
544 $ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
545 info = - 3
546 ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
547 info = - 4
548 ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
549 info = - 5
550 ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
551 info = - 6
552 ELSE IF ( m .LT. 0 ) THEN
553 info = - 7
554 ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
555 info = - 8
556 ELSE IF ( lda .LT. m ) THEN
557 info = - 10
558 ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
559 info = - 13
560 ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
561 info = - 15
562 ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
563 $ (lwork .LT. max(7,4*n+1,2*m+n))) .OR.
564 $ (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
565 $ (lwork .LT. max(7,4*n+n*n,2*m+n))) .OR.
566 $ (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
567 $ .OR.
568 $ (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
569 $ .OR.
570 $ (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
571 $ (lwork.LT.max(2*m+n,6*n+2*n*n)))
572 $ .OR. (lsvec .AND. rsvec .AND. jracc .AND.
573 $ lwork.LT.max(2*m+n,4*n+n*n,2*n+n*n+6)))
574 $ THEN
575 info = - 17
576 ELSE
577* #:)
578 info = 0
579 END IF
580*
581 IF ( info .NE. 0 ) THEN
582* #:(
583 CALL xerbla( 'SGEJSV', - info )
584 RETURN
585 END IF
586*
587* Quick return for void matrix (Y3K safe)
588* #:)
589 IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) THEN
590 iwork(1:3) = 0
591 work(1:7) = 0
592 RETURN
593 ENDIF
594*
595* Determine whether the matrix U should be M x N or M x M
596*
597 IF ( lsvec ) THEN
598 n1 = n
599 IF ( lsame( jobu, 'F' ) ) n1 = m
600 END IF
601*
602* Set numerical parameters
603*
604*! NOTE: Make sure SLAMCH() does not fail on the target architecture.
605*
606 epsln = slamch('Epsilon')
607 sfmin = slamch('SafeMinimum')
608 small = sfmin / epsln
609 big = slamch('O')
610* BIG = ONE / SFMIN
611*
612* Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
613*
614*(!) If necessary, scale SVA() to protect the largest norm from
615* overflow. It is possible that this scaling pushes the smallest
616*    column norm below the underflow threshold (extreme case).
617*
618 scalem = one / sqrt(float(m)*float(n))
619 noscal = .true.
620 goscal = .true.
621 DO 1874 p = 1, n
622 aapp = zero
623 aaqq = one
624 CALL slassq( m, a(1,p), 1, aapp, aaqq )
625 IF ( aapp .GT. big ) THEN
626 info = - 9
627 CALL xerbla( 'SGEJSV', -info )
628 RETURN
629 END IF
630 aaqq = sqrt(aaqq)
631 IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
632 sva(p) = aapp * aaqq
633 ELSE
634 noscal = .false.
635 sva(p) = aapp * ( aaqq * scalem )
636 IF ( goscal ) THEN
637 goscal = .false.
638 CALL sscal( p-1, scalem, sva, 1 )
639 END IF
640 END IF
641 1874 CONTINUE
642*
643 IF ( noscal ) scalem = one
644*
645 aapp = zero
646 aaqq = big
647 DO 4781 p = 1, n
648 aapp = max( aapp, sva(p) )
649 IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
650 4781 CONTINUE
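*
*     At this point AAPP and AAQQ hold the largest and the smallest
*     nonzero scaled column norms of A; they drive the scaling of A
*     and the treatment of small columns below.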
651*
652* Quick return for zero M x N matrix
653* #:)
654 IF ( aapp .EQ. zero ) THEN
655 IF ( lsvec ) CALL slaset( 'G', m, n1, zero, one, u, ldu )
656 IF ( rsvec ) CALL slaset( 'G', n, n, zero, one, v, ldv )
657 work(1) = one
658 work(2) = one
659 IF ( errest ) work(3) = one
660 IF ( lsvec .AND. rsvec ) THEN
661 work(4) = one
662 work(5) = one
663 END IF
664 IF ( l2tran ) THEN
665 work(6) = zero
666 work(7) = zero
667 END IF
668 iwork(1) = 0
669 iwork(2) = 0
670 iwork(3) = 0
671 RETURN
672 END IF
673*
674* Issue warning if denormalized column norms detected. Override the
675* high relative accuracy request. Issue licence to kill columns
676* (set them to zero) whose norm is less than sigma_max / BIG (roughly).
677* #:(
678 warning = 0
679 IF ( aaqq .LE. sfmin ) THEN
680 l2rank = .true.
681 l2kill = .true.
682 warning = 1
683 END IF
684*
685* Quick return for one-column matrix
686* #:)
687 IF ( n .EQ. 1 ) THEN
688*
689 IF ( lsvec ) THEN
690 CALL slascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
691 CALL slacpy( 'A', m, 1, a, lda, u, ldu )
692* computing all M left singular vectors of the M x 1 matrix
693 IF ( n1 .NE. n ) THEN
694 CALL sgeqrf( m, n, u,ldu, work, work(n+1),lwork-n,
695 $ ierr )
696 CALL sorgqr( m,n1,1, u,ldu,work,work(n+1),lwork-n,
697 $ ierr )
698 CALL scopy( m, a(1,1), 1, u(1,1), 1 )
699 END IF
700 END IF
701 IF ( rsvec ) THEN
702 v(1,1) = one
703 END IF
704 IF ( sva(1) .LT. (big*scalem) ) THEN
705 sva(1) = sva(1) / scalem
706 scalem = one
707 END IF
708 work(1) = one / scalem
709 work(2) = one
710 IF ( sva(1) .NE. zero ) THEN
711 iwork(1) = 1
712 IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
713 iwork(2) = 1
714 ELSE
715 iwork(2) = 0
716 END IF
717 ELSE
718 iwork(1) = 0
719 iwork(2) = 0
720 END IF
721 iwork(3) = 0
722 IF ( errest ) work(3) = one
723 IF ( lsvec .AND. rsvec ) THEN
724 work(4) = one
725 work(5) = one
726 END IF
727 IF ( l2tran ) THEN
728 work(6) = zero
729 work(7) = zero
730 END IF
731 RETURN
732*
733 END IF
734*
735 transp = .false.
736 l2tran = l2tran .AND. ( m .EQ. n )
737*
738 aatmax = -one
739 aatmin = big
740 IF ( rowpiv .OR. l2tran ) THEN
741*
742* Compute the row norms, needed to determine row pivoting sequence
743* (in the case of heavily row weighted A, row pivoting is strongly
744* advised) and to collect information needed to compare the
745* structures of A * A^t and A^t * A (in the case L2TRAN.EQ..TRUE.).
746*
747 IF ( l2tran ) THEN
748 DO 1950 p = 1, m
749 xsc = zero
750 temp1 = one
751 CALL slassq( n, a(p,1), lda, xsc, temp1 )
752* SLASSQ gets both the ell_2 and the ell_infinity norm
753* in one pass through the vector
754 work(m+n+p) = xsc * scalem
755 work(n+p) = xsc * (scalem*sqrt(temp1))
756 aatmax = max( aatmax, work(n+p) )
757 IF (work(n+p) .NE. zero) aatmin = min(aatmin,work(n+p))
758 1950 CONTINUE
759 ELSE
760 DO 1904 p = 1, m
761 work(m+n+p) = scalem*abs( a(p,isamax(n,a(p,1),lda)) )
762 aatmax = max( aatmax, work(m+n+p) )
763 aatmin = min( aatmin, work(m+n+p) )
764 1904 CONTINUE
765 END IF
766*
767 END IF
768*
769* For a square matrix A, try to determine whether A^t would be better
770* input for the preconditioned Jacobi SVD, with faster convergence.
771* The decision is based on an O(N) function of the vector of column
772* and row norms of A, based on the Shannon entropy. This should give
773* the right choice in most cases when the difference actually matters.
774* It may fail and pick the slower converging side.
775*
776 entra = zero
777 entrat = zero
778 IF ( l2tran ) THEN
779*
780 xsc = zero
781 temp1 = one
782 CALL slassq( n, sva, 1, xsc, temp1 )
783 temp1 = one / temp1
784*
785 entra = zero
786 DO 1113 p = 1, n
787 big1 = ( ( sva(p) / xsc )**2 ) * temp1
788 IF ( big1 .NE. zero ) entra = entra + big1 * alog(big1)
789 1113 CONTINUE
790 entra = - entra / alog(float(n))
791*
792* Now, SVA().^2/Trace(A^t * A) is a point in the probability simplex.
793* It is derived from the diagonal of A^t * A. Do the same with the
794* diagonal of A * A^t, compute the entropy of the corresponding
795* probability distribution. Note that A * A^t and A^t * A have the
796* same trace.
797*
798 entrat = zero
799 DO 1114 p = n+1, n+m
800 big1 = ( ( work(p) / xsc )**2 ) * temp1
801 IF ( big1 .NE. zero ) entrat = entrat + big1 * alog(big1)
802 1114 CONTINUE
803 entrat = - entrat / alog(float(m))
804*
805* Analyze the entropies and decide A or A^t. Smaller entropy
806* usually means better input for the algorithm.
807*
808 transp = ( entrat .LT. entra )
809*
810* If A^t is better than A, transpose A.
811*
812 IF ( transp ) THEN
813* In an optimal implementation, this trivial transpose
814* should be replaced with faster transpose.
815 DO 1115 p = 1, n - 1
816 DO 1116 q = p + 1, n
817 temp1 = a(q,p)
818 a(q,p) = a(p,q)
819 a(p,q) = temp1
820 1116 CONTINUE
821 1115 CONTINUE
822 DO 1117 p = 1, n
823 work(m+n+p) = sva(p)
824 sva(p) = work(n+p)
825 1117 CONTINUE
826 temp1 = aapp
827 aapp = aatmax
828 aatmax = temp1
829 temp1 = aaqq
830 aaqq = aatmin
831 aatmin = temp1
832 kill = lsvec
833 lsvec = rsvec
834 rsvec = kill
835 IF ( lsvec ) n1 = n
836*
837 rowpiv = .true.
838 END IF
839*
840 END IF
841* END IF L2TRAN
842*
843* Scale the matrix so that its maximal singular value remains less
844* than SQRT(BIG) -- the matrix is scaled so that its maximal column
845* has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
846* SQRT(BIG) instead of BIG is the fact that SGEJSV uses LAPACK and
847* BLAS routines that, in some implementations, are not capable of
848* working in the full interval [SFMIN,BIG] and that they may provoke
849* overflows in the intermediate results. If the singular values spread
850* from SFMIN to BIG, then SGESVJ will compute them. So, in that case,
851* one should use SGESVJ instead of SGEJSV.
852*
853 big1 = sqrt( big )
854 temp1 = sqrt( big / float(n) )
855*
856 CALL slascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
857 IF ( aaqq .GT. (aapp * sfmin) ) THEN
858 aaqq = ( aaqq / aapp ) * temp1
859 ELSE
860 aaqq = ( aaqq * temp1 ) / aapp
861 END IF
862 temp1 = temp1 * scalem
863 CALL slascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
864*
865* To undo scaling at the end of this procedure, multiply the
866* computed singular values with USCAL2 / USCAL1.
867*
868 uscal1 = temp1
869 uscal2 = aapp
870*
871 IF ( l2kill ) THEN
872* L2KILL enforces computation of nonzero singular values in
873* the restricted range of condition number of the initial A,
874* sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
875 xsc = sqrt( sfmin )
876 ELSE
877 xsc = small
878*
879* Now, if the condition number of A is too big,
880* sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
881* as a precautionary measure, the full SVD is computed using SGESVJ
882* with accumulated Jacobi rotations. This provides numerically
883* more robust computation, at the cost of slightly increased run
884* time. Depending on the concrete implementation of BLAS and LAPACK
885* (i.e. how they behave in the presence of extreme ill-conditioning) the
886* implementer may decide to remove this switch.
887 IF ( ( aaqq.LT.sqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
888 jracc = .true.
889 END IF
890*
891 END IF
892 IF ( aaqq .LT. xsc ) THEN
893 DO 700 p = 1, n
894 IF ( sva(p) .LT. xsc ) THEN
895 CALL slaset( 'A', m, 1, zero, zero, a(1,p), lda )
896 sva(p) = zero
897 END IF
898 700 CONTINUE
899 END IF
900*
901* Preconditioning using QR factorization with pivoting
902*
903 IF ( rowpiv ) THEN
904* Optional row permutation (Bjoerck row pivoting):
905*           A result by Cox and Higham shows that Bjoerck's
906*           row pivoting combined with standard column pivoting
907*           has a similar effect to Powell-Reid complete pivoting.
908* The ell-infinity norms of A are made nonincreasing.
909 DO 1952 p = 1, m - 1
910 q = isamax( m-p+1, work(m+n+p), 1 ) + p - 1
911 iwork(2*n+p) = q
912 IF ( p .NE. q ) THEN
913 temp1 = work(m+n+p)
914 work(m+n+p) = work(m+n+q)
915 work(m+n+q) = temp1
916 END IF
917 1952 CONTINUE
918 CALL slaswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
919 END IF
920*
921* End of the preparation phase (scaling, optional sorting and
922* transposing, optional flushing of small columns).
923*
924* Preconditioning
925*
926* If the full SVD is needed, the right singular vectors are computed
927* from a matrix equation, and for that we need theoretical analysis
928* of the Businger-Golub pivoting. So we use SGEQP3 as the first RR QRF.
929* In all other cases the first RR QRF can be chosen by other criteria
930* (eg speed by replacing global with restricted window pivoting, such
931* as in SGEQPX from TOMS # 782). Good results will be obtained using
932* SGEQPX with properly (!) chosen numerical parameters.
933* Any improvement of SGEQP3 improves overall performance of SGEJSV.
934*
935* A * P1 = Q1 * [ R1^t 0]^t:
936 DO 1963 p = 1, n
937* .. all columns are free columns
938 iwork(p) = 0
939 1963 CONTINUE
940 CALL sgeqp3( m,n,a,lda, iwork,work, work(n+1),lwork-n, ierr )
941*
942* The upper triangular matrix R1 from the first QRF is inspected for
943* rank deficiency and possibilities for deflation, or possible
944* ill-conditioning. Depending on the user specified flag L2RANK,
945* the procedure explores possibilities to reduce the numerical
946* rank by inspecting the computed upper triangular factor. If
947* L2RANK or L2ABER are up, then SGEJSV will compute the SVD of
948* A + dA, where ||dA|| <= f(M,N)*EPSLN*||A||.
949*
950 nr = 1
951 IF ( l2aber ) THEN
952* Standard absolute error bound suffices. All sigma_i with
953* sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
954* aggressive enforcement of lower numerical rank by introducing a
955* backward error of the order of N*EPSLN*||A||.
956 temp1 = sqrt(float(n))*epsln
957 DO 3001 p = 2, n
958 IF ( abs(a(p,p)) .GE. (temp1*abs(a(1,1))) ) THEN
959 nr = nr + 1
960 ELSE
961 GO TO 3002
962 END IF
963 3001 CONTINUE
964 3002 CONTINUE
965 ELSE IF ( l2rank ) THEN
966* .. similar to the above, only slightly more gentle (less aggressive).
967*    A sudden drop on the diagonal of R1 is used as the criterion for
968* close-to-rank-deficient.
969 temp1 = sqrt(sfmin)
970 DO 3401 p = 2, n
971 IF ( ( abs(a(p,p)) .LT. (epsln*abs(a(p-1,p-1))) ) .OR.
972 $ ( abs(a(p,p)) .LT. small ) .OR.
973 $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3402
974 nr = nr + 1
975 3401 CONTINUE
976 3402 CONTINUE
977*
978 ELSE
979* The goal is high relative accuracy. However, if the matrix
980* has a high scaled condition number, the relative accuracy is in
981* general not feasible. Later on, a condition number estimator
982* will be deployed to estimate the scaled condition number.
983* Here we just remove the underflowed part of the triangular
984* factor. This prevents the situation in which the code is
985* working hard to get the accuracy not warranted by the data.
986 temp1 = sqrt(sfmin)
987 DO 3301 p = 2, n
988 IF ( ( abs(a(p,p)) .LT. small ) .OR.
989 $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3302
990 nr = nr + 1
991 3301 CONTINUE
992 3302 CONTINUE
993*
994 END IF
995*
996 almort = .false.
997 IF ( nr .EQ. n ) THEN
998 maxprj = one
999 DO 3051 p = 2, n
1000 temp1 = abs(a(p,p)) / sva(iwork(p))
1001 maxprj = min( maxprj, temp1 )
1002 3051 CONTINUE
1003 IF ( maxprj**2 .GE. one - float(n)*epsln ) almort = .true.
1004 END IF
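*
*     ALMORT ("almost orthogonal") records that the scaled diagonal of
*     the triangular factor is close to one in every column, i.e. the
*     columns of A are nearly orthogonal; in that case the second
*     preconditioning QR factorization is skipped below.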
1005*
1006*
1007 sconda = - one
1008 condr1 = - one
1009 condr2 = - one
1010*
1011 IF ( errest ) THEN
1012 IF ( n .EQ. nr ) THEN
1013 IF ( rsvec ) THEN
1014* .. V is available as workspace
1015 CALL slacpy( 'U', n, n, a, lda, v, ldv )
1016 DO 3053 p = 1, n
1017 temp1 = sva(iwork(p))
1018 CALL sscal( p, one/temp1, v(1,p), 1 )
1019 3053 CONTINUE
1020 CALL spocon( 'U', n, v, ldv, one, temp1,
1021 $ work(n+1), iwork(2*n+m+1), ierr )
1022 ELSE IF ( lsvec ) THEN
1023* .. U is available as workspace
1024 CALL slacpy( 'U', n, n, a, lda, u, ldu )
1025 DO 3054 p = 1, n
1026 temp1 = sva(iwork(p))
1027 CALL sscal( p, one/temp1, u(1,p), 1 )
1028 3054 CONTINUE
1029 CALL spocon( 'U', n, u, ldu, one, temp1,
1030 $ work(n+1), iwork(2*n+m+1), ierr )
1031 ELSE
1032 CALL slacpy( 'U', n, n, a, lda, work(n+1), n )
1033 DO 3052 p = 1, n
1034 temp1 = sva(iwork(p))
1035 CALL sscal( p, one/temp1, work(n+(p-1)*n+1), 1 )
1036 3052 CONTINUE
1037* .. the columns of R are scaled to have unit Euclidean lengths.
1038 CALL spocon( 'U', n, work(n+1), n, one, temp1,
1039 $ work(n+n*n+1), iwork(2*n+m+1), ierr )
1040 END IF
1041 sconda = one / sqrt(temp1)
1042* SCONDA is an estimate of SQRT(||(R^t * R)^(-1)||_1).
1043* N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1044 ELSE
1045 sconda = - one
1046 END IF
1047 END IF
1048*
1049 l2pert = l2pert .AND. ( abs( a(1,1)/a(nr,nr) ) .GT. sqrt(big1) )
1050* If there is no violent scaling, artificial perturbation is not needed.
1051*
1052* Phase 3:
1053*
1054 IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1055*
1056* Singular Values only
1057*
1058* .. transpose A(1:NR,1:N)
1059 DO 1946 p = 1, min( n-1, nr )
1060 CALL scopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1061 1946 CONTINUE
1062*
1063* The following two DO-loops introduce small relative perturbation
1064* into the strict upper triangle of the lower triangular matrix.
1065* Small entries below the main diagonal are also changed.
1066* This modification is useful if the computing environment does not
1067* provide/allow FLUSH TO ZERO underflow, for it prevents many
1068* annoying denormalized numbers in case of strongly scaled matrices.
1069* The perturbation is structured so that it does not introduce any
1070* new perturbation of the singular values, and it does not destroy
1071* the job done by the preconditioner.
1072* The licence for this perturbation is in the variable L2PERT, which
1073* should be .FALSE. if FLUSH TO ZERO underflow is active.
1074*
1075 IF ( .NOT. almort ) THEN
1076*
1077 IF ( l2pert ) THEN
1078* XSC = SQRT(SMALL)
1079 xsc = epsln / float(n)
1080 DO 4947 q = 1, nr
1081 temp1 = xsc*abs(a(q,q))
1082 DO 4949 p = 1, n
1083 IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1084 $ .OR. ( p .LT. q ) )
1085 $ a(p,q) = sign( temp1, a(p,q) )
1086 4949 CONTINUE
1087 4947 CONTINUE
1088 ELSE
1089 CALL slaset( 'U', nr-1,nr-1, zero,zero, a(1,2),lda )
1090 END IF
1091*
1092* .. second preconditioning using the QR factorization
1093*
1094 CALL sgeqrf( n,nr, a,lda, work, work(n+1),lwork-n, ierr )
1095*
1096* .. and transpose upper to lower triangular
1097 DO 1948 p = 1, nr - 1
1098 CALL scopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1099 1948 CONTINUE
1100*
1101 END IF
1102*
1103* Row-cyclic Jacobi SVD algorithm with column pivoting
1104*
1105* .. again some perturbation (a "background noise") is added
1106* to drown denormals
1107 IF ( l2pert ) THEN
1108* XSC = SQRT(SMALL)
1109 xsc = epsln / float(n)
1110 DO 1947 q = 1, nr
1111 temp1 = xsc*abs(a(q,q))
1112 DO 1949 p = 1, nr
1113 IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1114 $ .OR. ( p .LT. q ) )
1115 $ a(p,q) = sign( temp1, a(p,q) )
1116 1949 CONTINUE
1117 1947 CONTINUE
1118 ELSE
1119 CALL slaset( 'U', nr-1, nr-1, zero, zero, a(1,2),
1120 $ lda )
1121 END IF
1122*
1123* .. and one-sided Jacobi rotations are started on a lower
1124* triangular matrix (plus perturbation which is ignored in
1125* the part which destroys triangular form (confusing?!))
1126*
1127 CALL sgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1128 $ n, v, ldv, work, lwork, info )
1129*
1130 scalem = work(1)
1131 numrank = nint(work(2))
1132*
1133*
1134 ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1135*
1136* -> Singular Values and Right Singular Vectors <-
1137*
1138 IF ( almort ) THEN
1139*
1140* .. in this case NR equals N
1141 DO 1998 p = 1, nr
1142 CALL scopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1143 1998 CONTINUE
1144 CALL slaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2),
1145 $ ldv )
1146*
1147 CALL sgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1148 $ work, lwork, info )
1149 scalem = work(1)
1150 numrank = nint(work(2))
1151
1152 ELSE
1153*
1154* .. two more QR factorizations ( one QRF is not enough, two require
1155* accumulated product of Jacobi rotations, three are perfect )
1156*
1157 CALL slaset( 'Lower', nr-1, nr-1, zero, zero, a(2,1),
1158 $ lda )
1159 CALL sgelqf( nr, n, a, lda, work, work(n+1), lwork-n,
1160 $ ierr)
1161 CALL slacpy( 'Lower', nr, nr, a, lda, v, ldv )
1162 CALL slaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2),
1163 $ ldv )
1164 CALL sgeqrf( nr, nr, v, ldv, work(n+1), work(2*n+1),
1165 $ lwork-2*n, ierr )
1166 DO 8998 p = 1, nr
1167 CALL scopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1168 8998 CONTINUE
1169 CALL slaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2),
1170 $ ldv )
1171*
1172 CALL sgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1173 $ ldu, work(n+1), lwork-n, info )
1174 scalem = work(n+1)
1175 numrank = nint(work(n+2))
1176 IF ( nr .LT. n ) THEN
1177 CALL slaset( 'A',n-nr, nr, zero,zero, v(nr+1,1),
1178 $ ldv )
1179 CALL slaset( 'A',nr, n-nr, zero,zero, v(1,nr+1),
1180 $ ldv )
1181 CALL slaset( 'A',n-nr,n-nr,zero,one, v(nr+1,nr+1),
1182 $ ldv )
1183 END IF
1184*
1185 CALL sormlq( 'Left', 'Transpose', n, n, nr, a, lda, work,
1186 $ v, ldv, work(n+1), lwork-n, ierr )
1187*
1188 END IF
1189*
1190 DO 8991 p = 1, n
1191 CALL scopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1192 8991 CONTINUE
1193 CALL slacpy( 'All', n, n, a, lda, v, ldv )
1194*
1195 IF ( transp ) THEN
1196 CALL slacpy( 'All', n, n, v, ldv, u, ldu )
1197 END IF
1198*
1199 ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1200*
1201* .. Singular Values and Left Singular Vectors ..
1202*
1203* .. second preconditioning step to avoid need to accumulate
1204* Jacobi rotations in the Jacobi iterations.
1205 DO 1965 p = 1, nr
1206 CALL scopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1207 1965 CONTINUE
1208 CALL slaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1209*
1210 CALL sgeqrf( n, nr, u, ldu, work(n+1), work(2*n+1),
1211 $ lwork-2*n, ierr )
1212*
1213 DO 1967 p = 1, nr - 1
1214 CALL scopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1215 1967 CONTINUE
1216 CALL slaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1217*
1218 CALL sgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1219 $ lda, work(n+1), lwork-n, info )
1220 scalem = work(n+1)
1221 numrank = nint(work(n+2))
1222*
1223 IF ( nr .LT. m ) THEN
1224 CALL slaset( 'A', m-nr, nr,zero, zero, u(nr+1,1), ldu )
1225 IF ( nr .LT. n1 ) THEN
1226 CALL slaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1),
1227 $ ldu )
1228 CALL slaset( 'A',m-nr,n1-nr,zero,one,u(nr+1,nr+1),
1229 $ ldu )
1230 END IF
1231 END IF
1232*
1233 CALL sormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1234 $ ldu, work(n+1), lwork-n, ierr )
1235*
1236 IF ( rowpiv )
1237 $ CALL slaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1238*
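*  .. scale the columns of U to unit Euclidean norm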
1239 DO 1974 p = 1, n1
1240 xsc = one / snrm2( m, u(1,p), 1 )
1241 CALL sscal( m, xsc, u(1,p), 1 )
1242 1974 CONTINUE
1243*
1244 IF ( transp ) THEN
1245 CALL slacpy( 'All', n, n, u, ldu, v, ldv )
1246 END IF
1247*
1248 ELSE
1249*
1250* .. Full SVD ..
1251*
1252 IF ( .NOT. jracc ) THEN
1253*
1254 IF ( .NOT. almort ) THEN
1255*
1256* Second Preconditioning Step (QRF [with pivoting])
1257* Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1258* equivalent to an LQF CALL. Since in many libraries the QRF
1259* seems to be better optimized than the LQF, we do explicit
1260* transpose and use the QRF. This is subject to changes in an
1261* optimized implementation of SGEJSV.
1262*
1263 DO 1968 p = 1, nr
1264 CALL scopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1265 1968 CONTINUE
1266*
1267* .. the following two loops perturb small entries to avoid
1268* denormals in the second QR factorization, where they are
1269* as good as zeros. This is done to avoid painfully slow
1270* computation with denormals. The relative size of the perturbation
1271* is a parameter that can be changed by the implementer.
1272* This perturbation device will be obsolete on machines with
1273* properly implemented arithmetic.
1274* To switch it off, set L2PERT=.FALSE. To remove it from the
1275* code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1276* The following two loops should be blocked and fused with the
1277* transposed copy above.
1278*
1279 IF ( l2pert ) THEN
1280 xsc = sqrt(small)
1281 DO 2969 q = 1, nr
1282 temp1 = xsc*abs( v(q,q) )
1283 DO 2968 p = 1, n
1284 IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1285 $ .OR. ( p .LT. q ) )
1286 $ v(p,q) = sign( temp1, v(p,q) )
1287 IF ( p .LT. q ) v(p,q) = - v(p,q)
1288 2968 CONTINUE
1289 2969 CONTINUE
1290 ELSE
1291 CALL slaset( 'U', nr-1, nr-1, zero, zero, v(1,2),
1292 $ ldv )
1293 END IF
1294*
1295* Estimate the row scaled condition number of R1
1296* (If R1 is rectangular, N > NR, then the condition number
1297* of the leading NR x NR submatrix is estimated.)
1298*
1299 CALL slacpy( 'L', nr, nr, v, ldv, work(2*n+1), nr )
1300 DO 3950 p = 1, nr
1301 temp1 = snrm2(nr-p+1,work(2*n+(p-1)*nr+p),1)
1302 CALL sscal(nr-p+1,one/temp1,work(2*n+(p-1)*nr+p),1)
1303 3950 CONTINUE
1304 CALL spocon('Lower',nr,work(2*n+1),nr,one,temp1,
1305 $ work(2*n+nr*nr+1),iwork(m+2*n+1),ierr)
1306 condr1 = one / sqrt(temp1)
1307* .. here need a second opinion on the condition number
1308* .. then assume worst case scenario
1309* R1 is OK for inverse <=> CONDR1 .LT. FLOAT(N)
1310* more conservative <=> CONDR1 .LT. SQRT(FLOAT(N))
1311*
1312 cond_ok = sqrt(float(nr))
1313*[TP] COND_OK is a tuning parameter.
1314
1315 IF ( condr1 .LT. cond_ok ) THEN
1316* .. the second QRF without pivoting. Note: in an optimized
1317* implementation, this QRF should be implemented as the QRF
1318* of a lower triangular matrix.
1319* R1^t = Q2 * R2
1320 CALL sgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1321 $ lwork-2*n, ierr )
1322*
1323 IF ( l2pert ) THEN
1324 xsc = sqrt(small)/epsln
1325 DO 3959 p = 2, nr
1326 DO 3958 q = 1, p - 1
1327 temp1 = xsc * min(abs(v(p,p)),abs(v(q,q)))
1328 IF ( abs(v(q,p)) .LE. temp1 )
1329 $ v(q,p) = sign( temp1, v(q,p) )
1330 3958 CONTINUE
1331 3959 CONTINUE
1332 END IF
1333*
1334 IF ( nr .NE. n )
1335 $ CALL slacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1336* .. save ...
1337*
1338* .. this transposed copy should be better than naive
1339 DO 1969 p = 1, nr - 1
1340 CALL scopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1341 1969 CONTINUE
1342*
1343 condr2 = condr1
1344*
1345 ELSE
1346*
1347* .. ill-conditioned case: second QRF with pivoting
1348* Note that windowed pivoting would be equally good
1349* numerically, and more run-time efficient. So, in
1350* an optimal implementation, the next call to SGEQP3
1351* should be replaced with eg. CALL SGEQPX (ACM TOMS #782)
1352* with properly (carefully) chosen parameters.
1353*
1354* R1^t * P2 = Q2 * R2
1355 DO 3003 p = 1, nr
1356 iwork(n+p) = 0
1357 3003 CONTINUE
1358 CALL sgeqp3( n, nr, v, ldv, iwork(n+1), work(n+1),
1359 $ work(2*n+1), lwork-2*n, ierr )
1360** CALL SGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1361** $ LWORK-2*N, IERR )
1362 IF ( l2pert ) THEN
1363 xsc = sqrt(small)
1364 DO 3969 p = 2, nr
1365 DO 3968 q = 1, p - 1
1366 temp1 = xsc * min(abs(v(p,p)),abs(v(q,q)))
1367 IF ( abs(v(q,p)) .LE. temp1 )
1368 $ v(q,p) = sign( temp1, v(q,p) )
1369 3968 CONTINUE
1370 3969 CONTINUE
1371 END IF
1372*
1373 CALL slacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1374*
1375 IF ( l2pert ) THEN
1376 xsc = sqrt(small)
1377 DO 8970 p = 2, nr
1378 DO 8971 q = 1, p - 1
1379 temp1 = xsc * min(abs(v(p,p)),abs(v(q,q)))
1380 v(p,q) = - sign( temp1, v(q,p) )
1381 8971 CONTINUE
1382 8970 CONTINUE
1383 ELSE
1384 CALL slaset( 'L',nr-1,nr-1,zero,zero,v(2,1),ldv )
1385 END IF
1386* Now, compute R2 = L3 * Q3, the LQ factorization.
1387 CALL sgelqf( nr, nr, v, ldv, work(2*n+n*nr+1),
1388 $ work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1389* .. and estimate the condition number
1390 CALL slacpy( 'L',nr,nr,v,ldv,work(2*n+n*nr+nr+1),nr )
1391 DO 4950 p = 1, nr
1392 temp1 = snrm2( p, work(2*n+n*nr+nr+p), nr )
1393 CALL sscal( p, one/temp1, work(2*n+n*nr+nr+p), nr )
1394 4950 CONTINUE
1395 CALL spocon( 'L',nr,work(2*n+n*nr+nr+1),nr,one,temp1,
1396 $ work(2*n+n*nr+nr+nr*nr+1),iwork(m+2*n+1),ierr )
1397 condr2 = one / sqrt(temp1)
1398*
1399 IF ( condr2 .GE. cond_ok ) THEN
1400* .. save the Householder vectors used for Q3
1401* (this overwrites the copy of R2, as it will not be
1402* needed in this branch, but it does not overwrite the
1403*                 Householder vectors of Q2.).
1404 CALL slacpy( 'U', nr, nr, v, ldv, work(2*n+1), n )
1405* .. and the rest of the information on Q3 is in
1406* WORK(2*N+N*NR+1:2*N+N*NR+N)
1407 END IF
1408*
1409 END IF
1410*
1411 IF ( l2pert ) THEN
1412 xsc = sqrt(small)
1413 DO 4968 q = 2, nr
1414 temp1 = xsc * v(q,q)
1415 DO 4969 p = 1, q - 1
1416* V(p,q) = - SIGN( TEMP1, V(q,p) )
1417 v(p,q) = - sign( temp1, v(p,q) )
1418 4969 CONTINUE
1419 4968 CONTINUE
1420 ELSE
1421 CALL slaset( 'U', nr-1,nr-1, zero,zero, v(1,2), ldv )
1422 END IF
1423*
1424* Second preconditioning finished; continue with Jacobi SVD
1425* The input matrix is lower triangular.
1426*
1427* Recover the right singular vectors as solution of a well
1428* conditioned triangular matrix equation.
1429*
1430 IF ( condr1 .LT. cond_ok ) THEN
1431*
1432 CALL sgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u,
1433 $ ldu,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,info )
1434 scalem = work(2*n+n*nr+nr+1)
1435 numrank = nint(work(2*n+n*nr+nr+2))
1436 DO 3970 p = 1, nr
1437 CALL scopy( nr, v(1,p), 1, u(1,p), 1 )
1438 CALL sscal( nr, sva(p), v(1,p), 1 )
1439 3970 CONTINUE
1440
1441* .. pick the right matrix equation and solve it
1442*
1443 IF ( nr .EQ. n ) THEN
1444* :)) .. best case, R1 is inverted. The solution of this matrix
1445* equation is Q2*V2 = the product of the Jacobi rotations
1446* used in SGESVJ, premultiplied with the orthogonal matrix
1447* from the second QR factorization.
1448 CALL strsm( 'L','U','N','N', nr,nr,one, a,lda, v,
1449 $ ldv )
1450 ELSE
1451* .. R1 is well conditioned, but non-square. Transpose(R2)
1452* is inverted to get the product of the Jacobi rotations
1453* used in SGESVJ. The Q-factor from the second QR
1454* factorization is then built in explicitly.
1455 CALL strsm('L','U','T','N',nr,nr,one,work(2*n+1),
1456 $ n,v,ldv)
1457 IF ( nr .LT. n ) THEN
1458 CALL slaset('A',n-nr,nr,zero,zero,v(nr+1,1),ldv)
1459 CALL slaset('A',nr,n-nr,zero,zero,v(1,nr+1),ldv)
1460 CALL slaset('A',n-nr,n-nr,zero,one,v(nr+1,nr+1),
1461 $ ldv)
1462 END IF
1463 CALL sormqr('L','N',n,n,nr,work(2*n+1),n,work(n+1),
1464 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1465 END IF
1466*
1467 ELSE IF ( condr2 .LT. cond_ok ) THEN
1468*
1469* :) .. the input matrix A is very likely a relative of
1470* the Kahan matrix :)
1471* The matrix R2 is inverted. The solution of the matrix equation
1472* is Q3^T*V3 = the product of the Jacobi rotations (applied to
1473* the lower triangular L3 from the LQ factorization of
1474* R2=L3*Q3), pre-multiplied with the transposed Q3.
1475 CALL sgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr,
1476 $ u,
1477 $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1478 scalem = work(2*n+n*nr+nr+1)
1479 numrank = nint(work(2*n+n*nr+nr+2))
1480 DO 3870 p = 1, nr
1481 CALL scopy( nr, v(1,p), 1, u(1,p), 1 )
1482 CALL sscal( nr, sva(p), u(1,p), 1 )
1483 3870 CONTINUE
1484 CALL strsm('L','U','N','N',nr,nr,one,work(2*n+1),n,u,
1485 $ ldu)
1486* .. apply the permutation from the second QR factorization
1487 DO 873 q = 1, nr
1488 DO 872 p = 1, nr
1489 work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1490 872 CONTINUE
1491 DO 874 p = 1, nr
1492 u(p,q) = work(2*n+n*nr+nr+p)
1493 874 CONTINUE
1494 873 CONTINUE
1495 IF ( nr .LT. n ) THEN
1496 CALL slaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1497 CALL slaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1498 CALL slaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),
1499 $ ldv )
1500 END IF
1501 CALL sormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1502 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1503 ELSE
1504* Last line of defense.
1505* #:( This is a rather pathological case: no scaled condition
1506*        improvement after two pivoted QR factorizations. Another
1507* possibility is that the rank revealing QR factorization
1508* or the condition estimator has failed, or the COND_OK
1509* is set very close to ONE (which is unnecessary). Normally,
1510* this branch should never be executed, but in rare cases of
1511* failure of the RRQR or condition estimator, the last line of
1512* defense ensures that SGEJSV completes the task.
1513* Compute the full SVD of L3 using SGESVJ with explicit
1514* accumulation of Jacobi rotations.
1515 CALL sgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr,
1516 $ u,
1517 $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1518 scalem = work(2*n+n*nr+nr+1)
1519 numrank = nint(work(2*n+n*nr+nr+2))
1520 IF ( nr .LT. n ) THEN
1521 CALL slaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1522 CALL slaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1523 CALL slaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),
1524 $ ldv )
1525 END IF
1526 CALL sormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1527 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1528*
1529 CALL sormlq( 'L', 'T', nr, nr, nr, work(2*n+1), n,
1530 $ work(2*n+n*nr+1), u, ldu, work(2*n+n*nr+nr+1),
1531 $ lwork-2*n-n*nr-nr, ierr )
1532 DO 773 q = 1, nr
1533 DO 772 p = 1, nr
1534 work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1535 772 CONTINUE
1536 DO 774 p = 1, nr
1537 u(p,q) = work(2*n+n*nr+nr+p)
1538 774 CONTINUE
1539 773 CONTINUE
1540*
1541 END IF
1542*
1543* Permute the rows of V using the (column) permutation from the
1544* first QRF. Also, scale the columns to make them unit in
1545* Euclidean norm. This applies to all cases.
1546*
1547 temp1 = sqrt(float(n)) * epsln
1548 DO 1972 q = 1, n
1549 DO 972 p = 1, n
1550 work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1551 972 CONTINUE
1552 DO 973 p = 1, n
1553 v(p,q) = work(2*n+n*nr+nr+p)
1554 973 CONTINUE
1555 xsc = one / snrm2( n, v(1,q), 1 )
1556 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1557 $ CALL sscal( n, xsc, v(1,q), 1 )
1558 1972 CONTINUE
1559* At this moment, V contains the right singular vectors of A.
1560* Next, assemble the left singular vector matrix U (M x N).
1561 IF ( nr .LT. m ) THEN
1562 CALL slaset( 'A', m-nr, nr, zero, zero, u(nr+1,1),
1563 $ ldu )
1564 IF ( nr .LT. n1 ) THEN
1565 CALL slaset('A',nr,n1-nr,zero,zero,u(1,nr+1),ldu)
1566 CALL slaset('A',m-nr,n1-nr,zero,one,u(nr+1,nr+1),
1567 $ ldu)
1568 END IF
1569 END IF
1570*
1571* The Q matrix from the first QRF is built into the left singular
1572* matrix U. This applies to all cases.
1573*
1574 CALL sormqr( 'Left', 'No_Tr', m, n1, n, a, lda, work, u,
1575 $ ldu, work(n+1), lwork-n, ierr )
1576
1577* The columns of U are normalized. The cost is O(M*N) flops.
1578 temp1 = sqrt(float(m)) * epsln
1579 DO 1973 p = 1, nr
1580 xsc = one / snrm2( m, u(1,p), 1 )
1581 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1582 $ CALL sscal( m, xsc, u(1,p), 1 )
1583 1973 CONTINUE
1584*
1585* If the initial QRF is computed with row pivoting, the left
1586* singular vectors must be adjusted.
1587*
1588 IF ( rowpiv )
1589 $ CALL slaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
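* With increment -1, SLASWP applies the recorded interchanges
* iwork(2*n+1:2*n+m-1) to the rows of U in reverse order, which
* returns the left singular vectors to the original row ordering of A.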
1590*
1591 ELSE
1592*
1593* .. the initial matrix A has almost orthogonal columns and
1594* the second QRF is not needed
1595*
1596 CALL slacpy( 'Upper', n, n, a, lda, work(n+1), n )
1597 IF ( l2pert ) THEN
1598 xsc = sqrt(small)
1599 DO 5970 p = 2, n
1600 temp1 = xsc * work( n + (p-1)*n + p )
1601 DO 5971 q = 1, p - 1
1602 work(n+(q-1)*n+p)=-sign(temp1,work(n+(p-1)*n+q))
1603 5971 CONTINUE
1604 5970 CONTINUE
1605 ELSE
1606 CALL slaset( 'Lower',n-1,n-1,zero,zero,work(n+2),n )
1607 END IF
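* If L2PERT is requested (JOBP = 'P'), the strict lower triangle of
* the copy in work(n+1) is filled with entries of magnitude
* SQRT(SMALL)*|diagonal| instead of exact zeros; otherwise SLASET
* zeroes it explicitly.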
1608*
1609 CALL sgesvj( 'Upper', 'U', 'N', n, n, work(n+1), n, sva,
1610 $ n, u, ldu, work(n+n*n+1), lwork-n-n*n, info )
1611*
1612 scalem = work(n+n*n+1)
1613 numrank = nint(work(n+n*n+2))
1614 DO 6970 p = 1, n
1615 CALL scopy( n, work(n+(p-1)*n+1), 1, u(1,p), 1 )
1616 CALL sscal( n, sva(p), work(n+(p-1)*n+1), 1 )
1617 6970 CONTINUE
1618*
1619 CALL strsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1620 $ one, a, lda, work(n+1), n )
1621 DO 6972 p = 1, n
1622 CALL scopy( n, work(n+p), n, v(iwork(p),1), ldv )
1623 6972 CONTINUE
1624 temp1 = sqrt(float(n))*epsln
1625 DO 6971 p = 1, n
1626 xsc = one / snrm2( n, v(1,p), 1 )
1627 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1628 $ CALL sscal( n, xsc, v(1,p), 1 )
1629 6971 CONTINUE
1630*
1631* Assemble the left singular vector matrix U (M x N).
1632*
1633 IF ( n .LT. m ) THEN
1634 CALL slaset( 'A', m-n, n, zero, zero, u(n+1,1), ldu )
1635 IF ( n .LT. n1 ) THEN
1636 CALL slaset( 'A',n, n1-n, zero, zero, u(1,n+1),
1637 $ ldu )
1638 CALL slaset( 'A',m-n,n1-n, zero, one,u(n+1,n+1),
1639 $ ldu )
1640 END IF
1641 END IF
1642 CALL sormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1643 $ ldu, work(n+1), lwork-n, ierr )
1644 temp1 = sqrt(float(m))*epsln
1645 DO 6973 p = 1, n1
1646 xsc = one / snrm2( m, u(1,p), 1 )
1647 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1648 $ CALL sscal( m, xsc, u(1,p), 1 )
1649 6973 CONTINUE
1650*
1651 IF ( rowpiv )
1652 $ CALL slaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1653*
1654 END IF
1655*
1656* end of the >> almost orthogonal case << in the full SVD
1657*
1658 ELSE
1659*
1660* This branch deploys a preconditioned Jacobi SVD with explicitly
1661* accumulated rotations. It is included as an option, mainly for
1662* experimental purposes. It performs well and can also be used.
1663* In this implementation, this branch will be automatically activated
1664* if the condition number sigma_max(A) / sigma_min(A) is predicted
1665* to be greater than the overflow threshold. This is because the
1666* a posteriori computation of the singular vectors assumes a robust
1667* implementation of the BLAS and of some LAPACK procedures, capable
1668* of working in the presence of extreme values. Since that is not always the case, ...
1669*
1670 DO 7968 p = 1, nr
1671 CALL scopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1672 7968 CONTINUE
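* SCOPY with stride LDA reads row p of the triangular factor stored
* in A and writes it into column p of V, so V holds the transpose of
* that factor as a lower triangular matrix.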
1673*
1674 IF ( l2pert ) THEN
1675 xsc = sqrt(small/epsln)
1676 DO 5969 q = 1, nr
1677 temp1 = xsc*abs( v(q,q) )
1678 DO 5968 p = 1, n
1679 IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1680 $ .OR. ( p .LT. q ) )
1681 $ v(p,q) = sign( temp1, v(p,q) )
1682 IF ( p .LT. q ) v(p,q) = - v(p,q)
1683 5968 CONTINUE
1684 5969 CONTINUE
1685 ELSE
1686 CALL slaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1687 END IF
1688
1689 CALL sgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1690 $ lwork-2*n, ierr )
1691 CALL slacpy( 'L', n, nr, v, ldv, work(2*n+1), n )
1692*
1693 DO 7969 p = 1, nr
1694 CALL scopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1695 7969 CONTINUE
1696
1697 IF ( l2pert ) THEN
1698 xsc = sqrt(small/epsln)
1699 DO 9970 q = 2, nr
1700 DO 9971 p = 1, q - 1
1701 temp1 = xsc * min(abs(u(p,p)),abs(u(q,q)))
1702 u(p,q) = - sign( temp1, u(q,p) )
1703 9971 CONTINUE
1704 9970 CONTINUE
1705 ELSE
1706 CALL slaset('U', nr-1, nr-1, zero, zero, u(1,2), ldu )
1707 END IF
1708
1709 CALL sgesvj( 'L', 'U', 'V', nr, nr, u, ldu, sva,
1710 $ n, v, ldv, work(2*n+n*nr+1), lwork-2*n-n*nr, info )
1711 scalem = work(2*n+n*nr+1)
1712 numrank = nint(work(2*n+n*nr+2))
1713
1714 IF ( nr .LT. n ) THEN
1715 CALL slaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1716 CALL slaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1717 CALL slaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1718 END IF
1719
1720 CALL sormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1721 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1722*
1723* Permute the rows of V using the (column) permutation from the
1724* first QRF. Also, scale the columns so that each has unit
1725* Euclidean norm. This applies to all cases.
1726*
1727 temp1 = sqrt(float(n)) * epsln
1728 DO 7972 q = 1, n
1729 DO 8972 p = 1, n
1730 work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1731 8972 CONTINUE
1732 DO 8973 p = 1, n
1733 v(p,q) = work(2*n+n*nr+nr+p)
1734 8973 CONTINUE
1735 xsc = one / snrm2( n, v(1,q), 1 )
1736 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1737 $ CALL sscal( n, xsc, v(1,q), 1 )
1738 7972 CONTINUE
1739*
1740* At this moment, V contains the right singular vectors of A.
1741* Next, assemble the left singular vector matrix U (M x N).
1742*
1743 IF ( nr .LT. m ) THEN
1744 CALL slaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1745 IF ( nr .LT. n1 ) THEN
1746 CALL slaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1),
1747 $ ldu )
1748 CALL slaset( 'A',m-nr,n1-nr, zero, one,u(nr+1,nr+1),
1749 $ ldu )
1750 END IF
1751 END IF
1752*
1753 CALL sormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1754 $ ldu, work(n+1), lwork-n, ierr )
1755*
1756 IF ( rowpiv )
1757 $ CALL slaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1758*
1759*
1760 END IF
1761 IF ( transp ) THEN
1762* .. swap U and V because the procedure worked on A^t
1763 DO 6974 p = 1, n
1764 CALL sswap( n, u(1,p), 1, v(1,p), 1 )
1765 6974 CONTINUE
1766 END IF
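* When the routine has worked on A^t (JOBT), the roles of the left
* and right singular vectors are exchanged: A = U*SIGMA*V^t implies
* A^t = V*SIGMA*U^t, so the leading N rows of U and V are swapped
* column by column (the matrix is square in this case).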
1767*
1768 END IF
1769* end of the full SVD
1770*
1771* Undo scaling, if necessary (and possible)
1772*
1773 IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1774 CALL slascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n,
1775 $ ierr )
1776 uscal1 = one
1777 uscal2 = one
1778 END IF
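* SLASCL multiplies SVA by USCAL2/USCAL1. The guard
* USCAL2 .LE. (BIG/SVA(1))*USCAL1 guarantees that the largest rescaled
* value SVA(1)*USCAL2/USCAL1 cannot overflow; if the rescaling is
* unsafe, the factored form is kept and returned via WORK(1) and
* WORK(2).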
1779*
1780 IF ( nr .LT. n ) THEN
1781 DO 3004 p = nr+1, n
1782 sva(p) = zero
1783 3004 CONTINUE
1784 END IF
1785*
1786 work(1) = uscal2 * scalem
1787 work(2) = uscal1
1788 IF ( errest ) work(3) = sconda
1789 IF ( lsvec .AND. rsvec ) THEN
1790 work(4) = condr1
1791 work(5) = condr2
1792 END IF
1793 IF ( l2tran ) THEN
1794 work(6) = entra
1795 work(7) = entrat
1796 END IF
1797*
1798 iwork(1) = nr
1799 iwork(2) = numrank
1800 iwork(3) = warning
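* On exit the singular values of A are (WORK(1)/WORK(2)) * SVA(1:N),
* as described in the interface documentation; IWORK(1) is the rank
* determined by the initial pivoted QRF, IWORK(2) the number of
* computed nonzero singular values, and IWORK(3) a warning flag.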
1801*
1802 RETURN
1803* ..
1804* .. END OF SGEJSV
1805* ..
1806 END
1807*
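The following is a minimal, hypothetical calling sketch, not part of sgejsv.f.
It illustrates how a caller might invoke SGEJSV and read back the packed
outputs set at the end of the routine above: the singular values are recovered
as (WORK(1)/WORK(2)) * SVA(1:N), and IWORK(2) holds the number of computed
nonzero singular values. The test matrix, the workspace size
LWORK = 2*M + 7*N + 2*N*N, and the IWORK dimension M + 3*N are illustrative
choices intended to satisfy the documented minimal requirements; consult the
LWORK and IWORK descriptions in the routine header for the exact bounds.

      PROGRAM SGEJSV_EXAMPLE
      IMPLICIT NONE
      INTEGER            M, N, LDA, LDU, LDV, LWORK
      PARAMETER        ( M = 4, N = 3 )
      PARAMETER        ( LDA = M, LDU = M, LDV = N )
      PARAMETER        ( LWORK = 2*M + 7*N + 2*N*N )
      REAL               A( LDA, N ), SVA( N ), U( LDU, N ),
     $                   V( LDV, N ), WORK( LWORK )
      INTEGER            IWORK( M + 3*N ), INFO, I, J
      EXTERNAL           SGEJSV
*     Fill A with a 4-by-3 section of the Hilbert matrix (full rank).
      DO 20 J = 1, N
         DO 10 I = 1, M
            A( I, J ) = 1.0E0 / REAL( I + J - 1 )
   10    CONTINUE
   20 CONTINUE
*     JOBA = 'C': high relative accuracy under column scaling;
*     JOBU = 'U': N left singular vectors; JOBV = 'V': right vectors;
*     no rank restriction, transposition, or perturbation.
      CALL SGEJSV( 'C', 'U', 'V', 'N', 'N', 'N', M, N, A, LDA, SVA,
     $             U, LDU, V, LDV, WORK, LWORK, IWORK, INFO )
      IF( INFO.EQ.0 ) THEN
*        The singular values of A are (WORK(1)/WORK(2)) * SVA(1:N).
         DO 30 I = 1, N
            SVA( I ) = ( WORK( 1 ) / WORK( 2 ) )*SVA( I )
   30    CONTINUE
         WRITE( *, * ) 'singular values :', ( SVA( I ), I = 1, N )
         WRITE( *, * ) 'nonzero sigmas  :', IWORK( 2 )
      ELSE
         WRITE( *, * ) 'SGEJSV failed, INFO =', INFO
      END IF
      END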
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine scopy(n, sx, incx, sy, incy)
SCOPY
Definition scopy.f:82
subroutine sgejsv(joba, jobu, jobv, jobr, jobt, jobp, m, n, a, lda, sva, u, ldu, v, ldv, work, lwork, iwork, info)
SGEJSV
Definition sgejsv.f:474
subroutine sgelqf(m, n, a, lda, tau, work, lwork, info)
SGELQF
Definition sgelqf.f:142
subroutine sgeqp3(m, n, a, lda, jpvt, tau, work, lwork, info)
SGEQP3
Definition sgeqp3.f:149
subroutine sgeqrf(m, n, a, lda, tau, work, lwork, info)
SGEQRF
Definition sgeqrf.f:144
subroutine sgesvj(joba, jobu, jobv, m, n, a, lda, sva, mv, v, ldv, work, lwork, info)
SGESVJ
Definition sgesvj.f:326
subroutine slacpy(uplo, m, n, a, lda, b, ldb)
SLACPY copies all or part of one two-dimensional array to another.
Definition slacpy.f:101
subroutine slascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition slascl.f:142
subroutine slaset(uplo, m, n, alpha, beta, a, lda)
SLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition slaset.f:108
subroutine slassq(n, x, incx, scale, sumsq)
SLASSQ updates a sum of squares represented in scaled form.
Definition slassq.f90:122
subroutine slaswp(n, a, lda, k1, k2, ipiv, incx)
SLASWP performs a series of row interchanges on a general rectangular matrix.
Definition slaswp.f:113
subroutine spocon(uplo, n, a, lda, anorm, rcond, work, iwork, info)
SPOCON
Definition spocon.f:119
subroutine sscal(n, sa, sx, incx)
SSCAL
Definition sscal.f:79
subroutine sswap(n, sx, incx, sy, incy)
SSWAP
Definition sswap.f:82
subroutine strsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
STRSM
Definition strsm.f:181
subroutine sorgqr(m, n, k, a, lda, tau, work, lwork, info)
SORGQR
Definition sorgqr.f:126
subroutine sormlq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
SORMLQ
Definition sormlq.f:166
subroutine sormqr(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
SORMQR
Definition sormqr.f:166