LAPACK 3.12.0
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches
zgejsv.f
Go to the documentation of this file.
1*> \brief \b ZGEJSV
2*
3* =========== DOCUMENTATION ===========
4*
5* Online html documentation available at
6* http://www.netlib.org/lapack/explore-html/
7*
8*> \htmlonly
9*> Download ZGEJSV + dependencies
10*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/zgejsv.f">
11*> [TGZ]</a>
12*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/zgejsv.f">
13*> [ZIP]</a>
14*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/zgejsv.f">
15*> [TXT]</a>
16*> \endhtmlonly
17*
18* Definition:
19* ===========
20*
21* SUBROUTINE ZGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
22* M, N, A, LDA, SVA, U, LDU, V, LDV,
23* CWORK, LWORK, RWORK, LRWORK, IWORK, INFO )
24*
25* .. Scalar Arguments ..
26* IMPLICIT NONE
27* INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
28* ..
29* .. Array Arguments ..
30* COMPLEX*16 A( LDA, * ), U( LDU, * ), V( LDV, * ), CWORK( LWORK )
31* DOUBLE PRECISION SVA( N ), RWORK( LRWORK )
32* INTEGER IWORK( * )
33* CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
34* ..
35*
36*
37*> \par Purpose:
38* =============
39*>
40*> \verbatim
41*>
42*> ZGEJSV computes the singular value decomposition (SVD) of a complex M-by-N
43*> matrix [A], where M >= N. The SVD of [A] is written as
44*>
45*> [A] = [U] * [SIGMA] * [V]^*,
46*>
47*> where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
48*> diagonal elements, [U] is an M-by-N (or M-by-M) unitary matrix, and
49*> [V] is an N-by-N unitary matrix. The diagonal elements of [SIGMA] are
50*> the singular values of [A]. The columns of [U] and [V] are the left and
51*> the right singular vectors of [A], respectively. The matrices [U] and [V]
52*> are computed and stored in the arrays U and V, respectively. The diagonal
53*> of [SIGMA] is computed and stored in the array SVA.
54*> \endverbatim
55*
56* Arguments:
57* ==========
58*
59*> \param[in] JOBA
60*> \verbatim
61*> JOBA is CHARACTER*1
62*> Specifies the level of accuracy:
63*> = 'C': This option works well (high relative accuracy) if A = B * D,
64*> with well-conditioned B and arbitrary diagonal matrix D.
65*> The accuracy cannot be spoiled by COLUMN scaling. The
66*> accuracy of the computed output depends on the condition of
67*> B, and the procedure aims at the best theoretical accuracy.
68*> The relative error max_{i=1:N}|d sigma_i| / sigma_i is
69*> bounded by f(M,N)*epsilon* cond(B), independent of D.
70*> The input matrix is preprocessed with the QRF with column
71*> pivoting. This initial preprocessing and preconditioning by
72*> a rank revealing QR factorization is common for all values of
73*> JOBA. Additional actions are specified as follows:
74*> = 'E': Computation as with 'C' with an additional estimate of the
75*> condition number of B. It provides a realistic error bound.
76*> = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
77*> D1, D2, and well-conditioned matrix C, this option gives
78*> higher accuracy than the 'C' option. If the structure of the
79*> input matrix is not known, and relative accuracy is
80*> desirable, then this option is advisable. The input matrix A
81*> is preprocessed with QR factorization with FULL (row and
82*> column) pivoting.
83*> = 'G': Computation as with 'F' with an additional estimate of the
84*> condition number of B, where A=B*D. If A has heavily weighted
85*> rows, then using this condition number gives too pessimistic
86*> error bound.
87*> = 'A': Small singular values are not well determined by the data
88*> and are considered as noisy; the matrix is treated as
89*> numerically rank deficient. The error in the computed
90*> singular values is bounded by f(m,n)*epsilon*||A||.
91*> The computed SVD A = U * S * V^* restores A up to
92*> f(m,n)*epsilon*||A||.
93*> This gives the procedure the licence to discard (set to zero)
94*> all singular values below N*epsilon*||A||.
95*> = 'R': Similar as in 'A'. Rank revealing property of the initial
96*> QR factorization is used do reveal (using triangular factor)
97*> a gap sigma_{r+1} < epsilon * sigma_r in which case the
98*> numerical RANK is declared to be r. The SVD is computed with
99*> absolute error bounds, but more accurately than with 'A'.
100*> \endverbatim
101*>
102*> \param[in] JOBU
103*> \verbatim
104*> JOBU is CHARACTER*1
105*> Specifies whether to compute the columns of U:
106*> = 'U': N columns of U are returned in the array U.
107*> = 'F': full set of M left sing. vectors is returned in the array U.
108*> = 'W': U may be used as workspace of length M*N. See the description
109*> of U.
110*> = 'N': U is not computed.
111*> \endverbatim
112*>
113*> \param[in] JOBV
114*> \verbatim
115*> JOBV is CHARACTER*1
116*> Specifies whether to compute the matrix V:
117*> = 'V': N columns of V are returned in the array V; Jacobi rotations
118*> are not explicitly accumulated.
119*> = 'J': N columns of V are returned in the array V, but they are
120*> computed as the product of Jacobi rotations, if JOBT = 'N'.
121*> = 'W': V may be used as workspace of length N*N. See the description
122*> of V.
123*> = 'N': V is not computed.
124*> \endverbatim
125*>
126*> \param[in] JOBR
127*> \verbatim
128*> JOBR is CHARACTER*1
129*> Specifies the RANGE for the singular values. Issues the licence to
130*> set to zero small positive singular values if they are outside
131*> specified range. If A .NE. 0 is scaled so that the largest singular
132*> value of c*A is around SQRT(BIG), BIG=DLAMCH('O'), then JOBR issues
133*> the licence to kill columns of A whose norm in c*A is less than
134*> SQRT(SFMIN) (for JOBR = 'R'), or less than SMALL=SFMIN/EPSLN,
135*> where SFMIN=DLAMCH('S'), EPSLN=DLAMCH('E').
136*> = 'N': Do not kill small columns of c*A. This option assumes that
137*> BLAS and QR factorizations and triangular solvers are
138*> implemented to work in that range. If the condition of A
139*> is greater than BIG, use ZGESVJ.
140*> = 'R': RESTRICTED range for sigma(c*A) is [SQRT(SFMIN), SQRT(BIG)]
141*> (roughly, as described above). This option is recommended.
142*> ===========================
143*> For computing the singular values in the FULL range [SFMIN,BIG]
144*> use ZGESVJ.
145*> \endverbatim
146*>
147*> \param[in] JOBT
148*> \verbatim
149*> JOBT is CHARACTER*1
150*> If the matrix is square then the procedure may determine to use
151*> transposed A if A^* seems to be better with respect to convergence.
152*> If the matrix is not square, JOBT is ignored.
153*> The decision is based on two values of entropy over the adjoint
154*> orbit of A^* * A. See the descriptions of RWORK(6) and RWORK(7).
155*> = 'T': transpose if entropy test indicates possibly faster
156*> convergence of Jacobi process if A^* is taken as input. If A is
157*> replaced with A^*, then the row pivoting is included automatically.
158*> = 'N': do not speculate.
159*> The option 'T' can be used to compute only the singular values, or
160*> the full SVD (U, SIGMA and V). For only one set of singular vectors
161*> (U or V), the caller should provide both U and V, as one of the
162*> matrices is used as workspace if the matrix A is transposed.
163*> The implementer can easily remove this constraint and make the
164*> code more complicated. See the descriptions of U and V.
165*> In general, this option is considered experimental, and 'N'; should
166*> be preferred. This is subject to changes in the future.
167*> \endverbatim
168*>
169*> \param[in] JOBP
170*> \verbatim
171*> JOBP is CHARACTER*1
172*> Issues the licence to introduce structured perturbations to drown
173*> denormalized numbers. This licence should be active if the
174*> denormals are poorly implemented, causing slow computation,
175*> especially in cases of fast convergence (!). For details see [1,2].
176*> For the sake of simplicity, this perturbations are included only
177*> when the full SVD or only the singular values are requested. The
178*> implementer/user can easily add the perturbation for the cases of
179*> computing one set of singular vectors.
180*> = 'P': introduce perturbation
181*> = 'N': do not perturb
182*> \endverbatim
183*>
184*> \param[in] M
185*> \verbatim
186*> M is INTEGER
187*> The number of rows of the input matrix A. M >= 0.
188*> \endverbatim
189*>
190*> \param[in] N
191*> \verbatim
192*> N is INTEGER
193*> The number of columns of the input matrix A. M >= N >= 0.
194*> \endverbatim
195*>
196*> \param[in,out] A
197*> \verbatim
198*> A is COMPLEX*16 array, dimension (LDA,N)
199*> On entry, the M-by-N matrix A.
200*> \endverbatim
201*>
202*> \param[in] LDA
203*> \verbatim
204*> LDA is INTEGER
205*> The leading dimension of the array A. LDA >= max(1,M).
206*> \endverbatim
207*>
208*> \param[out] SVA
209*> \verbatim
210*> SVA is DOUBLE PRECISION array, dimension (N)
211*> On exit,
212*> - For RWORK(1)/RWORK(2) = ONE: The singular values of A. During
213*> the computation SVA contains Euclidean column norms of the
214*> iterated matrices in the array A.
215*> - For RWORK(1) .NE. RWORK(2): The singular values of A are
216*> (RWORK(1)/RWORK(2)) * SVA(1:N). This factored form is used if
217*> sigma_max(A) overflows or if small singular values have been
218*> saved from underflow by scaling the input matrix A.
219*> - If JOBR='R' then some of the singular values may be returned
220*> as exact zeros obtained by "set to zero" because they are
221*> below the numerical rank threshold or are denormalized numbers.
222*> \endverbatim
223*>
224*> \param[out] U
225*> \verbatim
226*> U is COMPLEX*16 array, dimension ( LDU, N )
227*> If JOBU = 'U', then U contains on exit the M-by-N matrix of
228*> the left singular vectors.
229*> If JOBU = 'F', then U contains on exit the M-by-M matrix of
230*> the left singular vectors, including an ONB
231*> of the orthogonal complement of the Range(A).
232*> If JOBU = 'W' .AND. (JOBV = 'V' .AND. JOBT = 'T' .AND. M = N),
233*> then U is used as workspace if the procedure
234*> replaces A with A^*. In that case, [V] is computed
235*> in U as left singular vectors of A^* and then
236*> copied back to the V array. This 'W' option is just
237*> a reminder to the caller that in this case U is
238*> reserved as workspace of length N*N.
239*> If JOBU = 'N' U is not referenced, unless JOBT='T'.
240*> \endverbatim
241*>
242*> \param[in] LDU
243*> \verbatim
244*> LDU is INTEGER
245*> The leading dimension of the array U, LDU >= 1.
246*> IF JOBU = 'U' or 'F' or 'W', then LDU >= M.
247*> \endverbatim
248*>
249*> \param[out] V
250*> \verbatim
251*> V is COMPLEX*16 array, dimension ( LDV, N )
252*> If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
253*> the right singular vectors;
254*> If JOBV = 'W', AND (JOBU = 'U' AND JOBT = 'T' AND M = N),
255*> then V is used as workspace if the procedure
256*> replaces A with A^*. In that case, [U] is computed
257*> in V as right singular vectors of A^* and then
258*> copied back to the U array. This 'W' option is just
259*> a reminder to the caller that in this case V is
260*> reserved as workspace of length N*N.
261*> If JOBV = 'N' V is not referenced, unless JOBT='T'.
262*> \endverbatim
263*>
264*> \param[in] LDV
265*> \verbatim
266*> LDV is INTEGER
267*> The leading dimension of the array V, LDV >= 1.
268*> If JOBV = 'V' or 'J' or 'W', then LDV >= N.
269*> \endverbatim
270*>
271*> \param[out] CWORK
272*> \verbatim
273*> CWORK is COMPLEX*16 array, dimension (MAX(2,LWORK))
274*> If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or
275*> LRWORK=-1), then on exit CWORK(1) contains the required length of
276*> CWORK for the job parameters used in the call.
277*> \endverbatim
278*>
279*> \param[in] LWORK
280*> \verbatim
281*> LWORK is INTEGER
282*> Length of CWORK to confirm proper allocation of workspace.
283*> LWORK depends on the job:
284*>
285*> 1. If only SIGMA is needed ( JOBU = 'N', JOBV = 'N' ) and
286*> 1.1 .. no scaled condition estimate required (JOBA.NE.'E'.AND.JOBA.NE.'G'):
287*> LWORK >= 2*N+1. This is the minimal requirement.
288*> ->> For optimal performance (blocked code) the optimal value
289*> is LWORK >= N + (N+1)*NB. Here NB is the optimal
290*> block size for ZGEQP3 and ZGEQRF.
291*> In general, optimal LWORK is computed as
292*> LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF), LWORK(ZGESVJ)).
293*> 1.2. .. an estimate of the scaled condition number of A is
294*> required (JOBA='E', or 'G'). In this case, LWORK the minimal
295*> requirement is LWORK >= N*N + 2*N.
296*> ->> For optimal performance (blocked code) the optimal value
297*> is LWORK >= max(N+(N+1)*NB, N*N+2*N)=N**2+2*N.
298*> In general, the optimal length LWORK is computed as
299*> LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF), LWORK(ZGESVJ),
300*> N*N+LWORK(ZPOCON)).
301*> 2. If SIGMA and the right singular vectors are needed (JOBV = 'V'),
302*> (JOBU = 'N')
303*> 2.1 .. no scaled condition estimate requested (JOBE = 'N'):
304*> -> the minimal requirement is LWORK >= 3*N.
305*> -> For optimal performance,
306*> LWORK >= max(N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
307*> where NB is the optimal block size for ZGEQP3, ZGEQRF, ZGELQF,
308*> ZUNMLQ. In general, the optimal length LWORK is computed as
309*> LWORK >= max(N+LWORK(ZGEQP3), N+LWORK(ZGESVJ),
310*> N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)).
311*> 2.2 .. an estimate of the scaled condition number of A is
312*> required (JOBA='E', or 'G').
313*> -> the minimal requirement is LWORK >= 3*N.
314*> -> For optimal performance,
315*> LWORK >= max(N+(N+1)*NB, 2*N,2*N+N*NB)=2*N+N*NB,
316*> where NB is the optimal block size for ZGEQP3, ZGEQRF, ZGELQF,
317*> ZUNMLQ. In general, the optimal length LWORK is computed as
318*> LWORK >= max(N+LWORK(ZGEQP3), LWORK(ZPOCON), N+LWORK(ZGESVJ),
319*> N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)).
320*> 3. If SIGMA and the left singular vectors are needed
321*> 3.1 .. no scaled condition estimate requested (JOBE = 'N'):
322*> -> the minimal requirement is LWORK >= 3*N.
323*> -> For optimal performance:
324*> if JOBU = 'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
325*> where NB is the optimal block size for ZGEQP3, ZGEQRF, ZUNMQR.
326*> In general, the optimal length LWORK is computed as
327*> LWORK >= max(N+LWORK(ZGEQP3), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)).
328*> 3.2 .. an estimate of the scaled condition number of A is
329*> required (JOBA='E', or 'G').
330*> -> the minimal requirement is LWORK >= 3*N.
331*> -> For optimal performance:
332*> if JOBU = 'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
333*> where NB is the optimal block size for ZGEQP3, ZGEQRF, ZUNMQR.
334*> In general, the optimal length LWORK is computed as
335*> LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZPOCON),
336*> 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)).
337*> 4. If the full SVD is needed: (JOBU = 'U' or JOBU = 'F') and
338*> 4.1. if JOBV = 'V'
339*> the minimal requirement is LWORK >= 5*N+2*N*N.
340*> 4.2. if JOBV = 'J' the minimal requirement is
341*> LWORK >= 4*N+N*N.
342*> In both cases, the allocated CWORK can accommodate blocked runs
343*> of ZGEQP3, ZGEQRF, ZGELQF, SUNMQR, ZUNMLQ.
344*>
345*> If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or
346*> LRWORK=-1), then on exit CWORK(1) contains the optimal and CWORK(2) contains the
347*> minimal length of CWORK for the job parameters used in the call.
348*> \endverbatim
349*>
350*> \param[out] RWORK
351*> \verbatim
352*> RWORK is DOUBLE PRECISION array, dimension (MAX(7,LRWORK))
353*> On exit,
354*> RWORK(1) = Determines the scaling factor SCALE = RWORK(2) / RWORK(1)
355*> such that SCALE*SVA(1:N) are the computed singular values
356*> of A. (See the description of SVA().)
357*> RWORK(2) = See the description of RWORK(1).
358*> RWORK(3) = SCONDA is an estimate for the condition number of
359*> column equilibrated A. (If JOBA = 'E' or 'G')
360*> SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
361*> It is computed using ZPOCON. It holds
362*> N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
363*> where R is the triangular factor from the QRF of A.
364*> However, if R is truncated and the numerical rank is
365*> determined to be strictly smaller than N, SCONDA is
366*> returned as -1, thus indicating that the smallest
367*> singular values might be lost.
368*>
369*> If full SVD is needed, the following two condition numbers are
370*> useful for the analysis of the algorithm. They are provided for
371*> a developer/implementer who is familiar with the details of
372*> the method.
373*>
374*> RWORK(4) = an estimate of the scaled condition number of the
375*> triangular factor in the first QR factorization.
376*> RWORK(5) = an estimate of the scaled condition number of the
377*> triangular factor in the second QR factorization.
378*> The following two parameters are computed if JOBT = 'T'.
379*> They are provided for a developer/implementer who is familiar
380*> with the details of the method.
381*> RWORK(6) = the entropy of A^* * A :: this is the Shannon entropy
382*> of diag(A^* * A) / Trace(A^* * A) taken as point in the
383*> probability simplex.
384*> RWORK(7) = the entropy of A * A^*. (See the description of RWORK(6).)
385*> If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or
386*> LRWORK=-1), then on exit RWORK(1) contains the required length of
387*> RWORK for the job parameters used in the call.
388*> \endverbatim
389*>
390*> \param[in] LRWORK
391*> \verbatim
392*> LRWORK is INTEGER
393*> Length of RWORK to confirm proper allocation of workspace.
394*> LRWORK depends on the job:
395*>
396*> 1. If only the singular values are requested i.e. if
397*> LSAME(JOBU,'N') .AND. LSAME(JOBV,'N')
398*> then:
399*> 1.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
400*> then: LRWORK = max( 7, 2 * M ).
401*> 1.2. Otherwise, LRWORK = max( 7, N ).
402*> 2. If singular values with the right singular vectors are requested
403*> i.e. if
404*> (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) .AND.
405*> .NOT.(LSAME(JOBU,'U').OR.LSAME(JOBU,'F'))
406*> then:
407*> 2.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
408*> then LRWORK = max( 7, 2 * M ).
409*> 2.2. Otherwise, LRWORK = max( 7, N ).
410*> 3. If singular values with the left singular vectors are requested, i.e. if
411*> (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
412*> .NOT.(LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
413*> then:
414*> 3.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
415*> then LRWORK = max( 7, 2 * M ).
416*> 3.2. Otherwise, LRWORK = max( 7, N ).
417*> 4. If singular values with both the left and the right singular vectors
418*> are requested, i.e. if
419*> (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
420*> (LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
421*> then:
422*> 4.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
423*> then LRWORK = max( 7, 2 * M ).
424*> 4.2. Otherwise, LRWORK = max( 7, N ).
425*>
426*> If, on entry, LRWORK = -1 or LWORK=-1, a workspace query is assumed and
427*> the length of RWORK is returned in RWORK(1).
428*> \endverbatim
429*>
430*> \param[out] IWORK
431*> \verbatim
432*> IWORK is INTEGER array, of dimension at least 4, that further depends
433*> on the job:
434*>
435*> 1. If only the singular values are requested then:
436*> If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') )
437*> then the length of IWORK is N+M; otherwise the length of IWORK is N.
438*> 2. If the singular values and the right singular vectors are requested then:
439*> If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') )
440*> then the length of IWORK is N+M; otherwise the length of IWORK is N.
441*> 3. If the singular values and the left singular vectors are requested then:
442*> If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') )
443*> then the length of IWORK is N+M; otherwise the length of IWORK is N.
444*> 4. If the singular values with both the left and the right singular vectors
445*> are requested, then:
446*> 4.1. If LSAME(JOBV,'J') the length of IWORK is determined as follows:
447*> If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') )
448*> then the length of IWORK is N+M; otherwise the length of IWORK is N.
449*> 4.2. If LSAME(JOBV,'V') the length of IWORK is determined as follows:
450*> If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') )
451*> then the length of IWORK is 2*N+M; otherwise the length of IWORK is 2*N.
452*>
453*> On exit,
454*> IWORK(1) = the numerical rank determined after the initial
455*> QR factorization with pivoting. See the descriptions
456*> of JOBA and JOBR.
457*> IWORK(2) = the number of the computed nonzero singular values
458*> IWORK(3) = if nonzero, a warning message:
459*> If IWORK(3) = 1 then some of the column norms of A
460*> were denormalized floats. The requested high accuracy
461*> is not warranted by the data.
462*> IWORK(4) = 1 or -1. If IWORK(4) = 1, then the procedure used A^* to
463*> do the job as specified by the JOB parameters.
464*> If the call to ZGEJSV is a workspace query (indicated by LWORK = -1 or
465*> LRWORK = -1), then on exit IWORK(1) contains the required length of
466*> IWORK for the job parameters used in the call.
467*> \endverbatim
468*>
469*> \param[out] INFO
470*> \verbatim
471*> INFO is INTEGER
472*> < 0: if INFO = -i, then the i-th argument had an illegal value.
473*> = 0: successful exit;
474*> > 0: ZGEJSV did not converge in the maximal allowed number
475*> of sweeps. The computed values may be inaccurate.
476*> \endverbatim
477*
478* Authors:
479* ========
480*
481*> \author Univ. of Tennessee
482*> \author Univ. of California Berkeley
483*> \author Univ. of Colorado Denver
484*> \author NAG Ltd.
485*
486*> \ingroup gejsv
487*
488*> \par Further Details:
489* =====================
490*>
491*> \verbatim
492*>
493*> ZGEJSV implements a preconditioned Jacobi SVD algorithm. It uses ZGEQP3,
494*> ZGEQRF, and ZGELQF as preprocessors and preconditioners. Optionally, an
495*> additional row pivoting can be used as a preprocessor, which in some
496*> cases results in much higher accuracy. An example is matrix A with the
497*> structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
498*> diagonal matrices and C is well-conditioned matrix. In that case, complete
499*> pivoting in the first QR factorizations provides accuracy dependent on the
500*> condition number of C, and independent of D1, D2. Such higher accuracy is
501*> not completely understood theoretically, but it works well in practice.
502*> Further, if A can be written as A = B*D, with well-conditioned B and some
503*> diagonal D, then the high accuracy is guaranteed, both theoretically and
504*> in software, independent of D. For more details see [1], [2].
505*> The computational range for the singular values can be the full range
506*> ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
507*> & LAPACK routines called by ZGEJSV are implemented to work in that range.
508*> If that is not the case, then the restriction for safe computation with
509*> the singular values in the range of normalized IEEE numbers is that the
510*> spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
511*> overflow. This code (ZGEJSV) is best used in this restricted range,
512*> meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
513*> returned as zeros. See JOBR for details on this.
514*> Further, this implementation is somewhat slower than the one described
515*> in [1,2] due to replacement of some non-LAPACK components, and because
516*> the choice of some tuning parameters in the iterative part (ZGESVJ) is
517*> left to the implementer on a particular machine.
518*> The rank revealing QR factorization (in this code: ZGEQP3) should be
519*> implemented as in [3]. We have a new version of ZGEQP3 under development
520*> that is more robust than the current one in LAPACK, with a cleaner cut in
521*> rank deficient cases. It will be available in the SIGMA library [4].
522*> If M is much larger than N, it is obvious that the initial QRF with
523*> column pivoting can be preprocessed by the QRF without pivoting. That
524*> well known trick is not used in ZGEJSV because in some cases heavy row
525*> weighting can be treated with complete pivoting. The overhead in cases
526*> M much larger than N is then only due to pivoting, but the benefits in
527*> terms of accuracy have prevailed. The implementer/user can incorporate
528*> this extra QRF step easily. The implementer can also improve data movement
529*> (matrix transpose, matrix copy, matrix transposed copy) - this
530*> implementation of ZGEJSV uses only the simplest, naive data movement.
531*> \endverbatim
532*
533*> \par Contributor:
534* ==================
535*>
536*> Zlatko Drmac, Department of Mathematics, Faculty of Science,
537*> University of Zagreb (Zagreb, Croatia); drmac@math.hr
538*
539*> \par References:
540* ================
541*>
542*> \verbatim
543*>
544*> [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
545*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
546*> LAPACK Working note 169.
547*> [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
548*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
549*> LAPACK Working note 170.
550*> [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
551*> factorization software - a case study.
552*> ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
553*> LAPACK Working note 176.
554*> [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
555*> QSVD, (H,K)-SVD computations.
556*> Department of Mathematics, University of Zagreb, 2008, 2016.
557*> \endverbatim
558*
559*> \par Bugs, examples and comments:
560* =================================
561*>
562*> Please report all bugs and send interesting examples and/or comments to
563*> drmac@math.hr. Thank you.
564*>
565* =====================================================================
566 SUBROUTINE zgejsv( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
567 $ M, N, A, LDA, SVA, U, LDU, V, LDV,
568 $ CWORK, LWORK, RWORK, LRWORK, IWORK, INFO )
569*
570* -- LAPACK computational routine --
571* -- LAPACK is a software package provided by Univ. of Tennessee, --
572* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
573*
574* .. Scalar Arguments ..
575 IMPLICIT NONE
576 INTEGER INFO, LDA, LDU, LDV, LWORK, LRWORK, M, N
577* ..
578* .. Array Arguments ..
579 COMPLEX*16 A( LDA, * ), U( LDU, * ), V( LDV, * ),
580 $ CWORK( LWORK )
581 DOUBLE PRECISION SVA( N ), RWORK( LRWORK )
582 INTEGER IWORK( * )
583 CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
584* ..
585*
586* ===========================================================================
587*
588* .. Local Parameters ..
589 DOUBLE PRECISION ZERO, ONE
590 PARAMETER ( ZERO = 0.0d0, one = 1.0d0 )
591 COMPLEX*16 CZERO, CONE
592 parameter( czero = ( 0.0d0, 0.0d0 ), cone = ( 1.0d0, 0.0d0 ) )
593* ..
594* .. Local Scalars ..
595 COMPLEX*16 CTEMP
596 DOUBLE PRECISION AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1,
597 $ cond_ok, condr1, condr2, entra, entrat, epsln,
598 $ maxprj, scalem, sconda, sfmin, small, temp1,
599 $ uscal1, uscal2, xsc
600 INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING
601 LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LQUERY,
602 $ lsvec, l2aber, l2kill, l2pert, l2rank, l2tran, noscal,
603 $ rowpiv, rsvec, transp
604*
605 INTEGER OPTWRK, MINWRK, MINRWRK, MINIWRK
606 INTEGER LWCON, LWLQF, LWQP3, LWQRF, LWUNMLQ, LWUNMQR, LWUNMQRM,
607 $ lwsvdj, lwsvdjv, lrwqp3, lrwcon, lrwsvdj, iwoff
608 INTEGER LWRK_ZGELQF, LWRK_ZGEQP3, LWRK_ZGEQP3N, LWRK_ZGEQRF,
609 $ LWRK_ZGESVJ, LWRK_ZGESVJV, LWRK_ZGESVJU, LWRK_ZUNMLQ,
610 $ lwrk_zunmqr, lwrk_zunmqrm
611* ..
612* .. Local Arrays
613 COMPLEX*16 CDUMMY(1)
614 DOUBLE PRECISION RDUMMY(1)
615*
616* .. Intrinsic Functions ..
617 INTRINSIC abs, dcmplx, conjg, dlog, max, min, dble, nint, sqrt
618* ..
619* .. External Functions ..
620 DOUBLE PRECISION DLAMCH, DZNRM2
621 INTEGER IDAMAX, IZAMAX
622 LOGICAL LSAME
623 EXTERNAL idamax, izamax, lsame, dlamch, dznrm2
624* ..
625* .. External Subroutines ..
629 $ xerbla
630*
631 EXTERNAL zgesvj
632* ..
633*
634* Test the input arguments
635*
636 lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
637 jracc = lsame( jobv, 'J' )
638 rsvec = lsame( jobv, 'V' ) .OR. jracc
639 rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
640 l2rank = lsame( joba, 'R' )
641 l2aber = lsame( joba, 'A' )
642 errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
643 l2tran = lsame( jobt, 'T' ) .AND. ( m .EQ. n )
644 l2kill = lsame( jobr, 'R' )
645 defr = lsame( jobr, 'N' )
646 l2pert = lsame( jobp, 'P' )
647*
648 lquery = ( lwork .EQ. -1 ) .OR. ( lrwork .EQ. -1 )
649*
650 IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
651 $ errest .OR. lsame( joba, 'C' ) )) THEN
652 info = - 1
653 ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
654 $ ( lsame( jobu, 'W' ) .AND. rsvec .AND. l2tran ) ) ) THEN
655 info = - 2
656 ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
657 $ ( lsame( jobv, 'W' ) .AND. lsvec .AND. l2tran ) ) ) THEN
658 info = - 3
659 ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
660 info = - 4
661 ELSE IF ( .NOT. ( lsame(jobt,'T') .OR. lsame(jobt,'N') ) ) THEN
662 info = - 5
663 ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
664 info = - 6
665 ELSE IF ( m .LT. 0 ) THEN
666 info = - 7
667 ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
668 info = - 8
669 ELSE IF ( lda .LT. m ) THEN
670 info = - 10
671 ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
672 info = - 13
673 ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
674 info = - 15
675 ELSE
676* #:)
677 info = 0
678 END IF
679*
680 IF ( info .EQ. 0 ) THEN
681* .. compute the minimal and the optimal workspace lengths
682* [[The expressions for computing the minimal and the optimal
683* values of LCWORK, LRWORK are written with a lot of redundancy and
684* can be simplified. However, this verbose form is useful for
685* maintenance and modifications of the code.]]
686*
687* .. minimal workspace length for ZGEQP3 of an M x N matrix,
688* ZGEQRF of an N x N matrix, ZGELQF of an N x N matrix,
689* ZUNMLQ for computing N x N matrix, ZUNMQR for computing N x N
690* matrix, ZUNMQR for computing M x N matrix, respectively.
691 lwqp3 = n+1
692 lwqrf = max( 1, n )
693 lwlqf = max( 1, n )
694 lwunmlq = max( 1, n )
695 lwunmqr = max( 1, n )
696 lwunmqrm = max( 1, m )
697* .. minimal workspace length for ZPOCON of an N x N matrix
698 lwcon = 2 * n
699* .. minimal workspace length for ZGESVJ of an N x N matrix,
700* without and with explicit accumulation of Jacobi rotations
701 lwsvdj = max( 2 * n, 1 )
702 lwsvdjv = max( 2 * n, 1 )
703* .. minimal REAL workspace length for ZGEQP3, ZPOCON, ZGESVJ
704 lrwqp3 = 2 * n
705 lrwcon = n
706 lrwsvdj = n
707 IF ( lquery ) THEN
708 CALL zgeqp3( m, n, a, lda, iwork, cdummy, cdummy, -1,
709 $ rdummy, ierr )
710 lwrk_zgeqp3 = int( cdummy(1) )
711 CALL zgeqrf( n, n, a, lda, cdummy, cdummy,-1, ierr )
712 lwrk_zgeqrf = int( cdummy(1) )
713 CALL zgelqf( n, n, a, lda, cdummy, cdummy,-1, ierr )
714 lwrk_zgelqf = int( cdummy(1) )
715 END IF
716 minwrk = 2
717 optwrk = 2
718 miniwrk = n
719 IF ( .NOT. (lsvec .OR. rsvec ) ) THEN
720* .. minimal and optimal sizes of the complex workspace if
721* only the singular values are requested
722 IF ( errest ) THEN
723 minwrk = max( n+lwqp3, n**2+lwcon, n+lwqrf, lwsvdj )
724 ELSE
725 minwrk = max( n+lwqp3, n+lwqrf, lwsvdj )
726 END IF
727 IF ( lquery ) THEN
728 CALL zgesvj( 'L', 'N', 'N', n, n, a, lda, sva, n, v,
729 $ ldv, cdummy, -1, rdummy, -1, ierr )
730 lwrk_zgesvj = int( cdummy(1) )
731 IF ( errest ) THEN
732 optwrk = max( n+lwrk_zgeqp3, n**2+lwcon,
733 $ n+lwrk_zgeqrf, lwrk_zgesvj )
734 ELSE
735 optwrk = max( n+lwrk_zgeqp3, n+lwrk_zgeqrf,
736 $ lwrk_zgesvj )
737 END IF
738 END IF
739 IF ( l2tran .OR. rowpiv ) THEN
740 IF ( errest ) THEN
741 minrwrk = max( 7, 2*m, lrwqp3, lrwcon, lrwsvdj )
742 ELSE
743 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
744 END IF
745 ELSE
746 IF ( errest ) THEN
747 minrwrk = max( 7, lrwqp3, lrwcon, lrwsvdj )
748 ELSE
749 minrwrk = max( 7, lrwqp3, lrwsvdj )
750 END IF
751 END IF
752 IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
753 ELSE IF ( rsvec .AND. (.NOT.lsvec) ) THEN
754* .. minimal and optimal sizes of the complex workspace if the
755* singular values and the right singular vectors are requested
756 IF ( errest ) THEN
757 minwrk = max( n+lwqp3, lwcon, lwsvdj, n+lwlqf,
758 $ 2*n+lwqrf, n+lwsvdj, n+lwunmlq )
759 ELSE
760 minwrk = max( n+lwqp3, lwsvdj, n+lwlqf, 2*n+lwqrf,
761 $ n+lwsvdj, n+lwunmlq )
762 END IF
763 IF ( lquery ) THEN
764 CALL zgesvj( 'L', 'U', 'N', n,n, u, ldu, sva, n, a,
765 $ lda, cdummy, -1, rdummy, -1, ierr )
766 lwrk_zgesvj = int( cdummy(1) )
767 CALL zunmlq( 'L', 'C', n, n, n, a, lda, cdummy,
768 $ v, ldv, cdummy, -1, ierr )
769 lwrk_zunmlq = int( cdummy(1) )
770 IF ( errest ) THEN
771 optwrk = max( n+lwrk_zgeqp3, lwcon, lwrk_zgesvj,
772 $ n+lwrk_zgelqf, 2*n+lwrk_zgeqrf,
773 $ n+lwrk_zgesvj, n+lwrk_zunmlq )
774 ELSE
775 optwrk = max( n+lwrk_zgeqp3, lwrk_zgesvj,n+lwrk_zgelqf,
776 $ 2*n+lwrk_zgeqrf, n+lwrk_zgesvj,
777 $ n+lwrk_zunmlq )
778 END IF
779 END IF
780 IF ( l2tran .OR. rowpiv ) THEN
781 IF ( errest ) THEN
782 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
783 ELSE
784 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
785 END IF
786 ELSE
787 IF ( errest ) THEN
788 minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
789 ELSE
790 minrwrk = max( 7, lrwqp3, lrwsvdj )
791 END IF
792 END IF
793 IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
794 ELSE IF ( lsvec .AND. (.NOT.rsvec) ) THEN
795* .. minimal and optimal sizes of the complex workspace if the
796* singular values and the left singular vectors are requested
797 IF ( errest ) THEN
798 minwrk = n + max( lwqp3,lwcon,n+lwqrf,lwsvdj,lwunmqrm )
799 ELSE
800 minwrk = n + max( lwqp3, n+lwqrf, lwsvdj, lwunmqrm )
801 END IF
802 IF ( lquery ) THEN
803 CALL zgesvj( 'L', 'U', 'N', n,n, u, ldu, sva, n, a,
804 $ lda, cdummy, -1, rdummy, -1, ierr )
805 lwrk_zgesvj = int( cdummy(1) )
806 CALL zunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
807 $ ldu, cdummy, -1, ierr )
808 lwrk_zunmqrm = int( cdummy(1) )
809 IF ( errest ) THEN
810 optwrk = n + max( lwrk_zgeqp3, lwcon, n+lwrk_zgeqrf,
811 $ lwrk_zgesvj, lwrk_zunmqrm )
812 ELSE
813 optwrk = n + max( lwrk_zgeqp3, n+lwrk_zgeqrf,
814 $ lwrk_zgesvj, lwrk_zunmqrm )
815 END IF
816 END IF
817 IF ( l2tran .OR. rowpiv ) THEN
818 IF ( errest ) THEN
819 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
820 ELSE
821 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
822 END IF
823 ELSE
824 IF ( errest ) THEN
825 minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
826 ELSE
827 minrwrk = max( 7, lrwqp3, lrwsvdj )
828 END IF
829 END IF
830 IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
831 ELSE
832* .. minimal and optimal sizes of the complex workspace if the
833* full SVD is requested
834 IF ( .NOT. jracc ) THEN
835 IF ( errest ) THEN
836 minwrk = max( n+lwqp3, n+lwcon, 2*n+n**2+lwcon,
837 $ 2*n+lwqrf, 2*n+lwqp3,
838 $ 2*n+n**2+n+lwlqf, 2*n+n**2+n+n**2+lwcon,
839 $ 2*n+n**2+n+lwsvdj, 2*n+n**2+n+lwsvdjv,
840 $ 2*n+n**2+n+lwunmqr,2*n+n**2+n+lwunmlq,
841 $ n+n**2+lwsvdj, n+lwunmqrm )
842 ELSE
843 minwrk = max( n+lwqp3, 2*n+n**2+lwcon,
844 $ 2*n+lwqrf, 2*n+lwqp3,
845 $ 2*n+n**2+n+lwlqf, 2*n+n**2+n+n**2+lwcon,
846 $ 2*n+n**2+n+lwsvdj, 2*n+n**2+n+lwsvdjv,
847 $ 2*n+n**2+n+lwunmqr,2*n+n**2+n+lwunmlq,
848 $ n+n**2+lwsvdj, n+lwunmqrm )
849 END IF
850 miniwrk = miniwrk + n
851 IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
852 ELSE
853 IF ( errest ) THEN
854 minwrk = max( n+lwqp3, n+lwcon, 2*n+lwqrf,
855 $ 2*n+n**2+lwsvdjv, 2*n+n**2+n+lwunmqr,
856 $ n+lwunmqrm )
857 ELSE
858 minwrk = max( n+lwqp3, 2*n+lwqrf,
859 $ 2*n+n**2+lwsvdjv, 2*n+n**2+n+lwunmqr,
860 $ n+lwunmqrm )
861 END IF
862 IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
863 END IF
864 IF ( lquery ) THEN
865 CALL zunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
866 $ ldu, cdummy, -1, ierr )
867 lwrk_zunmqrm = int( cdummy(1) )
868 CALL zunmqr( 'L', 'N', n, n, n, a, lda, cdummy, u,
869 $ ldu, cdummy, -1, ierr )
870 lwrk_zunmqr = int( cdummy(1) )
871 IF ( .NOT. jracc ) THEN
872 CALL zgeqp3( n,n, a, lda, iwork, cdummy,cdummy, -1,
873 $ rdummy, ierr )
874 lwrk_zgeqp3n = int( cdummy(1) )
875 CALL zgesvj( 'L', 'U', 'N', n, n, u, ldu, sva,
876 $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
877 lwrk_zgesvj = int( cdummy(1) )
878 CALL zgesvj( 'U', 'U', 'N', n, n, u, ldu, sva,
879 $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
880 lwrk_zgesvju = int( cdummy(1) )
881 CALL zgesvj( 'L', 'U', 'V', n, n, u, ldu, sva,
882 $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
883 lwrk_zgesvjv = int( cdummy(1) )
884 CALL zunmlq( 'L', 'C', n, n, n, a, lda, cdummy,
885 $ v, ldv, cdummy, -1, ierr )
886 lwrk_zunmlq = int( cdummy(1) )
887 IF ( errest ) THEN
888 optwrk = max( n+lwrk_zgeqp3, n+lwcon,
889 $ 2*n+n**2+lwcon, 2*n+lwrk_zgeqrf,
890 $ 2*n+lwrk_zgeqp3n,
891 $ 2*n+n**2+n+lwrk_zgelqf,
892 $ 2*n+n**2+n+n**2+lwcon,
893 $ 2*n+n**2+n+lwrk_zgesvj,
894 $ 2*n+n**2+n+lwrk_zgesvjv,
895 $ 2*n+n**2+n+lwrk_zunmqr,
896 $ 2*n+n**2+n+lwrk_zunmlq,
897 $ n+n**2+lwrk_zgesvju,
898 $ n+lwrk_zunmqrm )
899 ELSE
900 optwrk = max( n+lwrk_zgeqp3,
901 $ 2*n+n**2+lwcon, 2*n+lwrk_zgeqrf,
902 $ 2*n+lwrk_zgeqp3n,
903 $ 2*n+n**2+n+lwrk_zgelqf,
904 $ 2*n+n**2+n+n**2+lwcon,
905 $ 2*n+n**2+n+lwrk_zgesvj,
906 $ 2*n+n**2+n+lwrk_zgesvjv,
907 $ 2*n+n**2+n+lwrk_zunmqr,
908 $ 2*n+n**2+n+lwrk_zunmlq,
909 $ n+n**2+lwrk_zgesvju,
910 $ n+lwrk_zunmqrm )
911 END IF
912 ELSE
913 CALL zgesvj( 'L', 'U', 'V', n, n, u, ldu, sva,
914 $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
915 lwrk_zgesvjv = int( cdummy(1) )
916 CALL zunmqr( 'L', 'N', n, n, n, cdummy, n, cdummy,
917 $ v, ldv, cdummy, -1, ierr )
918 lwrk_zunmqr = int( cdummy(1) )
919 CALL zunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
920 $ ldu, cdummy, -1, ierr )
921 lwrk_zunmqrm = int( cdummy(1) )
922 IF ( errest ) THEN
923 optwrk = max( n+lwrk_zgeqp3, n+lwcon,
924 $ 2*n+lwrk_zgeqrf, 2*n+n**2,
925 $ 2*n+n**2+lwrk_zgesvjv,
926 $ 2*n+n**2+n+lwrk_zunmqr,n+lwrk_zunmqrm )
927 ELSE
928 optwrk = max( n+lwrk_zgeqp3, 2*n+lwrk_zgeqrf,
929 $ 2*n+n**2, 2*n+n**2+lwrk_zgesvjv,
930 $ 2*n+n**2+n+lwrk_zunmqr,
931 $ n+lwrk_zunmqrm )
932 END IF
933 END IF
934 END IF
935 IF ( l2tran .OR. rowpiv ) THEN
936 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
937 ELSE
938 minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
939 END IF
940 END IF
941 minwrk = max( 2, minwrk )
942 optwrk = max( minwrk, optwrk )
943 IF ( lwork .LT. minwrk .AND. (.NOT.lquery) ) info = - 17
944 IF ( lrwork .LT. minrwrk .AND. (.NOT.lquery) ) info = - 19
945 END IF
946*
947 IF ( info .NE. 0 ) THEN
948* #:(
949 CALL xerbla( 'ZGEJSV', - info )
950 RETURN
951 ELSE IF ( lquery ) THEN
952 cwork(1) = optwrk
953 cwork(2) = minwrk
954 rwork(1) = minrwrk
955 iwork(1) = max( 4, miniwrk )
956 RETURN
957 END IF
958*
959* Quick return for void matrix (Y3K safe)
960* #:)
961 IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) THEN
962 iwork(1:4) = 0
963 rwork(1:7) = 0
964 RETURN
965 ENDIF
966*
967* Determine whether the matrix U should be M x N or M x M
968*
969 IF ( lsvec ) THEN
970 n1 = n
971 IF ( lsame( jobu, 'F' ) ) n1 = m
972 END IF
973*
974* Set numerical parameters
975*
976*! NOTE: Make sure DLAMCH() does not fail on the target architecture.
977*
978 epsln = dlamch('Epsilon')
979 sfmin = dlamch('SafeMinimum')
980 small = sfmin / epsln
981 big = dlamch('O')
982* BIG = ONE / SFMIN
983*
984* Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
985*
986*(!) If necessary, scale SVA() to protect the largest norm from
987* overflow. It is possible that this scaling pushes the smallest
988* column norm left from the underflow threshold (extreme case).
989*
990 scalem = one / sqrt(dble(m)*dble(n))
991 noscal = .true.
992 goscal = .true.
993 DO 1874 p = 1, n
994 aapp = zero
995 aaqq = one
996 CALL zlassq( m, a(1,p), 1, aapp, aaqq )
997 IF ( aapp .GT. big ) THEN
998 info = - 9
999 CALL xerbla( 'ZGEJSV', -info )
1000 RETURN
1001 END IF
1002 aaqq = sqrt(aaqq)
1003 IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
1004 sva(p) = aapp * aaqq
1005 ELSE
1006 noscal = .false.
1007 sva(p) = aapp * ( aaqq * scalem )
1008 IF ( goscal ) THEN
1009 goscal = .false.
1010 CALL dscal( p-1, scalem, sva, 1 )
1011 END IF
1012 END IF
1013 1874 CONTINUE
1014*
1015 IF ( noscal ) scalem = one
1016*
1017 aapp = zero
1018 aaqq = big
1019 DO 4781 p = 1, n
1020 aapp = max( aapp, sva(p) )
1021 IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
1022 4781 CONTINUE
1023*
1024* Quick return for zero M x N matrix
1025* #:)
1026 IF ( aapp .EQ. zero ) THEN
1027 IF ( lsvec ) CALL zlaset( 'G', m, n1, czero, cone, u, ldu )
1028 IF ( rsvec ) CALL zlaset( 'G', n, n, czero, cone, v, ldv )
1029 rwork(1) = one
1030 rwork(2) = one
1031 IF ( errest ) rwork(3) = one
1032 IF ( lsvec .AND. rsvec ) THEN
1033 rwork(4) = one
1034 rwork(5) = one
1035 END IF
1036 IF ( l2tran ) THEN
1037 rwork(6) = zero
1038 rwork(7) = zero
1039 END IF
1040 iwork(1) = 0
1041 iwork(2) = 0
1042 iwork(3) = 0
1043 iwork(4) = -1
1044 RETURN
1045 END IF
1046*
1047* Issue warning if denormalized column norms detected. Override the
1048* high relative accuracy request. Issue licence to kill nonzero columns
1049* (set them to zero) whose norm is less than sigma_max / BIG (roughly).
1050* #:(
1051 warning = 0
1052 IF ( aaqq .LE. sfmin ) THEN
1053 l2rank = .true.
1054 l2kill = .true.
1055 warning = 1
1056 END IF
1057*
1058* Quick return for one-column matrix
1059* #:)
1060 IF ( n .EQ. 1 ) THEN
1061*
1062 IF ( lsvec ) THEN
1063 CALL zlascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
1064 CALL zlacpy( 'A', m, 1, a, lda, u, ldu )
1065* computing all M left singular vectors of the M x 1 matrix
1066 IF ( n1 .NE. n ) THEN
1067 CALL zgeqrf( m, n, u,ldu, cwork, cwork(n+1),lwork-n,ierr )
1068 CALL zungqr( m,n1,1, u,ldu,cwork,cwork(n+1),lwork-n,ierr )
1069 CALL zcopy( m, a(1,1), 1, u(1,1), 1 )
1070 END IF
1071 END IF
1072 IF ( rsvec ) THEN
1073 v(1,1) = cone
1074 END IF
1075 IF ( sva(1) .LT. (big*scalem) ) THEN
1076 sva(1) = sva(1) / scalem
1077 scalem = one
1078 END IF
1079 rwork(1) = one / scalem
1080 rwork(2) = one
1081 IF ( sva(1) .NE. zero ) THEN
1082 iwork(1) = 1
1083 IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
1084 iwork(2) = 1
1085 ELSE
1086 iwork(2) = 0
1087 END IF
1088 ELSE
1089 iwork(1) = 0
1090 iwork(2) = 0
1091 END IF
1092 iwork(3) = 0
1093 iwork(4) = -1
1094 IF ( errest ) rwork(3) = one
1095 IF ( lsvec .AND. rsvec ) THEN
1096 rwork(4) = one
1097 rwork(5) = one
1098 END IF
1099 IF ( l2tran ) THEN
1100 rwork(6) = zero
1101 rwork(7) = zero
1102 END IF
1103 RETURN
1104*
1105 END IF
1106*
1107 transp = .false.
1108*
1109 aatmax = -one
1110 aatmin = big
1111 IF ( rowpiv .OR. l2tran ) THEN
1112*
1113* Compute the row norms, needed to determine row pivoting sequence
1114* (in the case of heavily row weighted A, row pivoting is strongly
1115* advised) and to collect information needed to compare the
1116* structures of A * A^* and A^* * A (in the case L2TRAN.EQ..TRUE.).
1117*
1118 IF ( l2tran ) THEN
1119 DO 1950 p = 1, m
1120 xsc = zero
1121 temp1 = one
1122 CALL zlassq( n, a(p,1), lda, xsc, temp1 )
1123* ZLASSQ gets both the ell_2 and the ell_infinity norm
1124* in one pass through the vector
1125 rwork(m+p) = xsc * scalem
1126 rwork(p) = xsc * (scalem*sqrt(temp1))
1127 aatmax = max( aatmax, rwork(p) )
1128 IF (rwork(p) .NE. zero)
1129 $ aatmin = min(aatmin,rwork(p))
1130 1950 CONTINUE
1131 ELSE
1132 DO 1904 p = 1, m
1133 rwork(m+p) = scalem*abs( a(p,izamax(n,a(p,1),lda)) )
1134 aatmax = max( aatmax, rwork(m+p) )
1135 aatmin = min( aatmin, rwork(m+p) )
1136 1904 CONTINUE
1137 END IF
1138*
1139 END IF
1140*
1141* For square matrix A try to determine whether A^* would be better
1142* input for the preconditioned Jacobi SVD, with faster convergence.
1143* The decision is based on an O(N) function of the vector of column
1144* and row norms of A, based on the Shannon entropy. This should give
1145* the right choice in most cases when the difference actually matters.
1146* It may fail and pick the slower converging side.
1147*
1148 entra = zero
1149 entrat = zero
1150 IF ( l2tran ) THEN
1151*
1152 xsc = zero
1153 temp1 = one
1154 CALL dlassq( n, sva, 1, xsc, temp1 )
1155 temp1 = one / temp1
1156*
1157 entra = zero
1158 DO 1113 p = 1, n
1159 big1 = ( ( sva(p) / xsc )**2 ) * temp1
1160 IF ( big1 .NE. zero ) entra = entra + big1 * dlog(big1)
1161 1113 CONTINUE
1162 entra = - entra / dlog(dble(n))
1163*
1164* Now, SVA().^2/Trace(A^* * A) is a point in the probability simplex.
1165* It is derived from the diagonal of A^* * A. Do the same with the
1166* diagonal of A * A^*, compute the entropy of the corresponding
1167* probability distribution. Note that A * A^* and A^* * A have the
1168* same trace.
1169*
1170 entrat = zero
1171 DO 1114 p = 1, m
1172 big1 = ( ( rwork(p) / xsc )**2 ) * temp1
1173 IF ( big1 .NE. zero ) entrat = entrat + big1 * dlog(big1)
1174 1114 CONTINUE
1175 entrat = - entrat / dlog(dble(m))
1176*
1177* Analyze the entropies and decide A or A^*. Smaller entropy
1178* usually means better input for the algorithm.
1179*
1180 transp = ( entrat .LT. entra )
1181*
1182* If A^* is better than A, take the adjoint of A. This is allowed
1183* only for square matrices, M=N.
1184 IF ( transp ) THEN
1185* In an optimal implementation, this trivial transpose
1186* should be replaced with faster transpose.
1187 DO 1115 p = 1, n - 1
1188 a(p,p) = conjg(a(p,p))
1189 DO 1116 q = p + 1, n
1190 ctemp = conjg(a(q,p))
1191 a(q,p) = conjg(a(p,q))
1192 a(p,q) = ctemp
1193 1116 CONTINUE
1194 1115 CONTINUE
1195 a(n,n) = conjg(a(n,n))
1196 DO 1117 p = 1, n
1197 rwork(m+p) = sva(p)
1198 sva(p) = rwork(p)
1199* previously computed row 2-norms are now column 2-norms
1200* of the transposed matrix
1201 1117 CONTINUE
1202 temp1 = aapp
1203 aapp = aatmax
1204 aatmax = temp1
1205 temp1 = aaqq
1206 aaqq = aatmin
1207 aatmin = temp1
1208 kill = lsvec
1209 lsvec = rsvec
1210 rsvec = kill
1211 IF ( lsvec ) n1 = n
1212*
1213 rowpiv = .true.
1214 END IF
1215*
1216 END IF
1217* END IF L2TRAN
1218*
1219* Scale the matrix so that its maximal singular value remains less
1220* than SQRT(BIG) -- the matrix is scaled so that its maximal column
1221* has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
1222* SQRT(BIG) instead of BIG is the fact that ZGEJSV uses LAPACK and
1223* BLAS routines that, in some implementations, are not capable of
1224* working in the full interval [SFMIN,BIG] and that they may provoke
1225* overflows in the intermediate results. If the singular values spread
1226* from SFMIN to BIG, then ZGESVJ will compute them. So, in that case,
1227* one should use ZGESVJ instead of ZGEJSV.
1228* >> change in the April 2016 update: allow bigger range, i.e. the
1229* largest column is allowed up to BIG/N and ZGESVJ will do the rest.
1230 big1 = sqrt( big )
1231 temp1 = sqrt( big / dble(n) )
1232* TEMP1 = BIG/DBLE(N)
1233*
1234 CALL dlascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
1235 IF ( aaqq .GT. (aapp * sfmin) ) THEN
1236 aaqq = ( aaqq / aapp ) * temp1
1237 ELSE
1238 aaqq = ( aaqq * temp1 ) / aapp
1239 END IF
1240 temp1 = temp1 * scalem
1241 CALL zlascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
1242*
1243* To undo scaling at the end of this procedure, multiply the
1244* computed singular values with USCAL2 / USCAL1.
1245*
1246 uscal1 = temp1
1247 uscal2 = aapp
1248*
1249 IF ( l2kill ) THEN
1250* L2KILL enforces computation of nonzero singular values in
1251* the restricted range of condition number of the initial A,
1252* sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
1253 xsc = sqrt( sfmin )
1254 ELSE
1255 xsc = small
1256*
1257* Now, if the condition number of A is too big,
1258* sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
1259* as a precaution measure, the full SVD is computed using ZGESVJ
1260* with accumulated Jacobi rotations. This provides numerically
1261* more robust computation, at the cost of slightly increased run
1262* time. Depending on the concrete implementation of BLAS and LAPACK
1263* (i.e. how they behave in presence of extreme ill-conditioning) the
1264* implementor may decide to remove this switch.
1265 IF ( ( aaqq.LT.sqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
1266 jracc = .true.
1267 END IF
1268*
1269 END IF
1270 IF ( aaqq .LT. xsc ) THEN
1271 DO 700 p = 1, n
1272 IF ( sva(p) .LT. xsc ) THEN
1273 CALL zlaset( 'A', m, 1, czero, czero, a(1,p), lda )
1274 sva(p) = zero
1275 END IF
1276 700 CONTINUE
1277 END IF
1278*
1279* Preconditioning using QR factorization with pivoting
1280*
1281 IF ( rowpiv ) THEN
1282* Optional row permutation (Bjoerck row pivoting):
1283* A result by Cox and Higham shows that the Bjoerck's
1284* row pivoting combined with standard column pivoting
1285* has similar effect as Powell-Reid complete pivoting.
1286* The ell-infinity norms of A are made nonincreasing.
1287 IF ( ( lsvec .AND. rsvec ) .AND. .NOT.( jracc ) ) THEN
1288 iwoff = 2*n
1289 ELSE
1290 iwoff = n
1291 END IF
1292 DO 1952 p = 1, m - 1
1293 q = idamax( m-p+1, rwork(m+p), 1 ) + p - 1
1294 iwork(iwoff+p) = q
1295 IF ( p .NE. q ) THEN
1296 temp1 = rwork(m+p)
1297 rwork(m+p) = rwork(m+q)
1298 rwork(m+q) = temp1
1299 END IF
1300 1952 CONTINUE
1301 CALL zlaswp( n, a, lda, 1, m-1, iwork(iwoff+1), 1 )
1302 END IF
1303*
1304* End of the preparation phase (scaling, optional sorting and
1305* transposing, optional flushing of small columns).
1306*
1307* Preconditioning
1308*
1309* If the full SVD is needed, the right singular vectors are computed
1310* from a matrix equation, and for that we need theoretical analysis
1311* of the Businger-Golub pivoting. So we use ZGEQP3 as the first RR QRF.
1312* In all other cases the first RR QRF can be chosen by other criteria
1313* (eg speed by replacing global with restricted window pivoting, such
1314* as in xGEQPX from TOMS # 782). Good results will be obtained using
1315* xGEQPX with properly (!) chosen numerical parameters.
1316* Any improvement of ZGEQP3 improves overall performance of ZGEJSV.
1317*
1318* A * P1 = Q1 * [ R1^* 0]^*:
1319 DO 1963 p = 1, n
1320* .. all columns are free columns
1321 iwork(p) = 0
1322 1963 CONTINUE
1323 CALL zgeqp3( m, n, a, lda, iwork, cwork, cwork(n+1), lwork-n,
1324 $ rwork, ierr )
1325*
1326* The upper triangular matrix R1 from the first QRF is inspected for
1327* rank deficiency and possibilities for deflation, or possible
1328* ill-conditioning. Depending on the user specified flag L2RANK,
1329* the procedure explores possibilities to reduce the numerical
1330* rank by inspecting the computed upper triangular factor. If
1331* L2RANK or L2ABER are up, then ZGEJSV will compute the SVD of
1332* A + dA, where ||dA|| <= f(M,N)*EPSLN.
1333*
1334 nr = 1
1335 IF ( l2aber ) THEN
1336* Standard absolute error bound suffices. All sigma_i with
1337* sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
1338* aggressive enforcement of lower numerical rank by introducing a
1339* backward error of the order of N*EPSLN*||A||.
1340 temp1 = sqrt(dble(n))*epsln
1341 DO 3001 p = 2, n
1342 IF ( abs(a(p,p)) .GE. (temp1*abs(a(1,1))) ) THEN
1343 nr = nr + 1
1344 ELSE
1345 GO TO 3002
1346 END IF
1347 3001 CONTINUE
1348 3002 CONTINUE
1349 ELSE IF ( l2rank ) THEN
1350* .. similarly as above, only slightly more gentle (less aggressive).
1351* Sudden drop on the diagonal of R1 is used as the criterion for
1352* close-to-rank-deficient.
1353 temp1 = sqrt(sfmin)
1354 DO 3401 p = 2, n
1355 IF ( ( abs(a(p,p)) .LT. (epsln*abs(a(p-1,p-1))) ) .OR.
1356 $ ( abs(a(p,p)) .LT. small ) .OR.
1357 $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3402
1358 nr = nr + 1
1359 3401 CONTINUE
1360 3402 CONTINUE
1361*
1362 ELSE
1363* The goal is high relative accuracy. However, if the matrix
1364* has high scaled condition number the relative accuracy is in
1365* general not feasible. Later on, a condition number estimator
1366* will be deployed to estimate the scaled condition number.
1367* Here we just remove the underflowed part of the triangular
1368* factor. This prevents the situation in which the code is
1369* working hard to get the accuracy not warranted by the data.
1370 temp1 = sqrt(sfmin)
1371 DO 3301 p = 2, n
1372 IF ( ( abs(a(p,p)) .LT. small ) .OR.
1373 $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3302
1374 nr = nr + 1
1375 3301 CONTINUE
1376 3302 CONTINUE
1377*
1378 END IF
1379*
1380 almort = .false.
1381 IF ( nr .EQ. n ) THEN
1382 maxprj = one
1383 DO 3051 p = 2, n
1384 temp1 = abs(a(p,p)) / sva(iwork(p))
1385 maxprj = min( maxprj, temp1 )
1386 3051 CONTINUE
1387 IF ( maxprj**2 .GE. one - dble(n)*epsln ) almort = .true.
1388 END IF
1389*
1390*
1391 sconda = - one
1392 condr1 = - one
1393 condr2 = - one
1394*
1395 IF ( errest ) THEN
1396 IF ( n .EQ. nr ) THEN
1397 IF ( rsvec ) THEN
1398* .. V is available as workspace
1399 CALL zlacpy( 'U', n, n, a, lda, v, ldv )
1400 DO 3053 p = 1, n
1401 temp1 = sva(iwork(p))
1402 CALL zdscal( p, one/temp1, v(1,p), 1 )
1403 3053 CONTINUE
1404 IF ( lsvec )THEN
1405 CALL zpocon( 'U', n, v, ldv, one, temp1,
1406 $ cwork(n+1), rwork, ierr )
1407 ELSE
1408 CALL zpocon( 'U', n, v, ldv, one, temp1,
1409 $ cwork, rwork, ierr )
1410 END IF
1411*
1412 ELSE IF ( lsvec ) THEN
1413* .. U is available as workspace
1414 CALL zlacpy( 'U', n, n, a, lda, u, ldu )
1415 DO 3054 p = 1, n
1416 temp1 = sva(iwork(p))
1417 CALL zdscal( p, one/temp1, u(1,p), 1 )
1418 3054 CONTINUE
1419 CALL zpocon( 'U', n, u, ldu, one, temp1,
1420 $ cwork(n+1), rwork, ierr )
1421 ELSE
1422 CALL zlacpy( 'U', n, n, a, lda, cwork, n )
1423*[] CALL ZLACPY( 'U', N, N, A, LDA, CWORK(N+1), N )
1424* Change: here index shifted by N to the left, CWORK(1:N)
1425* not needed for SIGMA only computation
1426 DO 3052 p = 1, n
1427 temp1 = sva(iwork(p))
1428*[] CALL ZDSCAL( p, ONE/TEMP1, CWORK(N+(p-1)*N+1), 1 )
1429 CALL zdscal( p, one/temp1, cwork((p-1)*n+1), 1 )
1430 3052 CONTINUE
1431* .. the columns of R are scaled to have unit Euclidean lengths.
1432*[] CALL ZPOCON( 'U', N, CWORK(N+1), N, ONE, TEMP1,
1433*[] $ CWORK(N+N*N+1), RWORK, IERR )
1434 CALL zpocon( 'U', n, cwork, n, one, temp1,
1435 $ cwork(n*n+1), rwork, ierr )
1436*
1437 END IF
1438 IF ( temp1 .NE. zero ) THEN
1439 sconda = one / sqrt(temp1)
1440 ELSE
1441 sconda = - one
1442 END IF
1443* SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
1444* N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1445 ELSE
1446 sconda = - one
1447 END IF
1448 END IF
1449*
1450 l2pert = l2pert .AND. ( abs( a(1,1)/a(nr,nr) ) .GT. sqrt(big1) )
1451* If there is no violent scaling, artificial perturbation is not needed.
1452*
1453* Phase 3:
1454*
1455 IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1456*
1457* Singular Values only
1458*
1459* .. transpose A(1:NR,1:N)
1460 DO 1946 p = 1, min( n-1, nr )
1461 CALL zcopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1462 CALL zlacgv( n-p+1, a(p,p), 1 )
1463 1946 CONTINUE
1464 IF ( nr .EQ. n ) a(n,n) = conjg(a(n,n))
1465*
1466* The following two DO-loops introduce small relative perturbation
1467* into the strict upper triangle of the lower triangular matrix.
1468* Small entries below the main diagonal are also changed.
1469* This modification is useful if the computing environment does not
1470* provide/allow FLUSH TO ZERO underflow, for it prevents many
1471* annoying denormalized numbers in case of strongly scaled matrices.
1472* The perturbation is structured so that it does not introduce any
1473* new perturbation of the singular values, and it does not destroy
1474* the job done by the preconditioner.
1475* The licence for this perturbation is in the variable L2PERT, which
1476* should be .FALSE. if FLUSH TO ZERO underflow is active.
1477*
1478 IF ( .NOT. almort ) THEN
1479*
1480 IF ( l2pert ) THEN
1481* XSC = SQRT(SMALL)
1482 xsc = epsln / dble(n)
1483 DO 4947 q = 1, nr
1484 ctemp = dcmplx(xsc*abs(a(q,q)),zero)
1485 DO 4949 p = 1, n
1486 IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1487 $ .OR. ( p .LT. q ) )
1488* $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1489 $ a(p,q) = ctemp
1490 4949 CONTINUE
1491 4947 CONTINUE
1492 ELSE
1493 CALL zlaset( 'U', nr-1,nr-1, czero,czero, a(1,2),lda )
1494 END IF
1495*
1496* .. second preconditioning using the QR factorization
1497*
1498 CALL zgeqrf( n,nr, a,lda, cwork, cwork(n+1),lwork-n, ierr )
1499*
1500* .. and transpose upper to lower triangular
1501 DO 1948 p = 1, nr - 1
1502 CALL zcopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1503 CALL zlacgv( nr-p+1, a(p,p), 1 )
1504 1948 CONTINUE
1505*
1506 END IF
1507*
1508* Row-cyclic Jacobi SVD algorithm with column pivoting
1509*
1510* .. again some perturbation (a "background noise") is added
1511* to drown denormals
1512 IF ( l2pert ) THEN
1513* XSC = SQRT(SMALL)
1514 xsc = epsln / dble(n)
1515 DO 1947 q = 1, nr
1516 ctemp = dcmplx(xsc*abs(a(q,q)),zero)
1517 DO 1949 p = 1, nr
1518 IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1519 $ .OR. ( p .LT. q ) )
1520* $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1521 $ a(p,q) = ctemp
1522 1949 CONTINUE
1523 1947 CONTINUE
1524 ELSE
1525 CALL zlaset( 'U', nr-1, nr-1, czero, czero, a(1,2), lda )
1526 END IF
1527*
1528* .. and one-sided Jacobi rotations are started on a lower
1529* triangular matrix (plus perturbation which is ignored in
1530* the part which destroys triangular form (confusing?!))
1531*
1532 CALL zgesvj( 'L', 'N', 'N', nr, nr, a, lda, sva,
1533 $ n, v, ldv, cwork, lwork, rwork, lrwork, info )
1534*
1535 scalem = rwork(1)
1536 numrank = nint(rwork(2))
1537*
1538*
1539 ELSE IF ( ( rsvec .AND. ( .NOT. lsvec ) .AND. ( .NOT. jracc ) )
1540 $ .OR.
1541 $ ( jracc .AND. ( .NOT. lsvec ) .AND. ( nr .NE. n ) ) ) THEN
1542*
1543* -> Singular Values and Right Singular Vectors <-
1544*
1545 IF ( almort ) THEN
1546*
1547* .. in this case NR equals N
1548 DO 1998 p = 1, nr
1549 CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1550 CALL zlacgv( n-p+1, v(p,p), 1 )
1551 1998 CONTINUE
1552 CALL zlaset( 'U', nr-1,nr-1, czero, czero, v(1,2), ldv )
1553*
1554 CALL zgesvj( 'L','U','N', n, nr, v, ldv, sva, nr, a, lda,
1555 $ cwork, lwork, rwork, lrwork, info )
1556 scalem = rwork(1)
1557 numrank = nint(rwork(2))
1558
1559 ELSE
1560*
1561* .. two more QR factorizations ( one QRF is not enough, two require
1562* accumulated product of Jacobi rotations, three are perfect )
1563*
1564 CALL zlaset( 'L', nr-1,nr-1, czero, czero, a(2,1), lda )
1565 CALL zgelqf( nr,n, a, lda, cwork, cwork(n+1), lwork-n, ierr)
1566 CALL zlacpy( 'L', nr, nr, a, lda, v, ldv )
1567 CALL zlaset( 'U', nr-1,nr-1, czero, czero, v(1,2), ldv )
1568 CALL zgeqrf( nr, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1569 $ lwork-2*n, ierr )
1570 DO 8998 p = 1, nr
1571 CALL zcopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1572 CALL zlacgv( nr-p+1, v(p,p), 1 )
1573 8998 CONTINUE
1574 CALL zlaset('U', nr-1, nr-1, czero, czero, v(1,2), ldv)
1575*
1576 CALL zgesvj( 'L', 'U','N', nr, nr, v,ldv, sva, nr, u,
1577 $ ldu, cwork(n+1), lwork-n, rwork, lrwork, info )
1578 scalem = rwork(1)
1579 numrank = nint(rwork(2))
1580 IF ( nr .LT. n ) THEN
1581 CALL zlaset( 'A',n-nr, nr, czero,czero, v(nr+1,1), ldv )
1582 CALL zlaset( 'A',nr, n-nr, czero,czero, v(1,nr+1), ldv )
1583 CALL zlaset( 'A',n-nr,n-nr,czero,cone, v(nr+1,nr+1),ldv )
1584 END IF
1585*
1586 CALL zunmlq( 'L', 'C', n, n, nr, a, lda, cwork,
1587 $ v, ldv, cwork(n+1), lwork-n, ierr )
1588*
1589 END IF
1590* .. permute the rows of V
1591* DO 8991 p = 1, N
1592* CALL ZCOPY( N, V(p,1), LDV, A(IWORK(p),1), LDA )
1593* 8991 CONTINUE
1594* CALL ZLACPY( 'All', N, N, A, LDA, V, LDV )
1595 CALL zlapmr( .false., n, n, v, ldv, iwork )
1596*
1597 IF ( transp ) THEN
1598 CALL zlacpy( 'A', n, n, v, ldv, u, ldu )
1599 END IF
1600*
1601 ELSE IF ( jracc .AND. (.NOT. lsvec) .AND. ( nr.EQ. n ) ) THEN
1602*
1603 CALL zlaset( 'L', n-1,n-1, czero, czero, a(2,1), lda )
1604*
1605 CALL zgesvj( 'U','N','V', n, n, a, lda, sva, n, v, ldv,
1606 $ cwork, lwork, rwork, lrwork, info )
1607 scalem = rwork(1)
1608 numrank = nint(rwork(2))
1609 CALL zlapmr( .false., n, n, v, ldv, iwork )
1610*
1611 ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1612*
1613* .. Singular Values and Left Singular Vectors ..
1614*
1615* .. second preconditioning step to avoid need to accumulate
1616* Jacobi rotations in the Jacobi iterations.
1617 DO 1965 p = 1, nr
1618 CALL zcopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1619 CALL zlacgv( n-p+1, u(p,p), 1 )
1620 1965 CONTINUE
1621 CALL zlaset( 'U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1622*
1623 CALL zgeqrf( n, nr, u, ldu, cwork(n+1), cwork(2*n+1),
1624 $ lwork-2*n, ierr )
1625*
1626 DO 1967 p = 1, nr - 1
1627 CALL zcopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1628 CALL zlacgv( n-p+1, u(p,p), 1 )
1629 1967 CONTINUE
1630 CALL zlaset( 'U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1631*
1632 CALL zgesvj( 'L', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1633 $ lda, cwork(n+1), lwork-n, rwork, lrwork, info )
1634 scalem = rwork(1)
1635 numrank = nint(rwork(2))
1636*
1637 IF ( nr .LT. m ) THEN
1638 CALL zlaset( 'A', m-nr, nr,czero, czero, u(nr+1,1), ldu )
1639 IF ( nr .LT. n1 ) THEN
1640 CALL zlaset( 'A',nr, n1-nr, czero, czero, u(1,nr+1),ldu )
1641 CALL zlaset( 'A',m-nr,n1-nr,czero,cone,u(nr+1,nr+1),ldu )
1642 END IF
1643 END IF
1644*
1645 CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
1646 $ ldu, cwork(n+1), lwork-n, ierr )
1647*
1648 IF ( rowpiv )
1649 $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
1650*
1651 DO 1974 p = 1, n1
1652 xsc = one / dznrm2( m, u(1,p), 1 )
1653 CALL zdscal( m, xsc, u(1,p), 1 )
1654 1974 CONTINUE
1655*
1656 IF ( transp ) THEN
1657 CALL zlacpy( 'A', n, n, u, ldu, v, ldv )
1658 END IF
1659*
1660 ELSE
1661*
1662* .. Full SVD ..
1663*
1664 IF ( .NOT. jracc ) THEN
1665*
1666 IF ( .NOT. almort ) THEN
1667*
1668* Second Preconditioning Step (QRF [with pivoting])
1669* Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1670* equivalent to an LQF CALL. Since in many libraries the QRF
1671* seems to be better optimized than the LQF, we do explicit
1672* transpose and use the QRF. This is subject to changes in an
1673* optimized implementation of ZGEJSV.
1674*
1675 DO 1968 p = 1, nr
1676 CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1677 CALL zlacgv( n-p+1, v(p,p), 1 )
1678 1968 CONTINUE
1679*
1680* .. the following two loops perturb small entries to avoid
1681* denormals in the second QR factorization, where they are
1682* as good as zeros. This is done to avoid painfully slow
1683* computation with denormals. The relative size of the perturbation
1684* is a parameter that can be changed by the implementer.
1685* This perturbation device will be obsolete on machines with
1686* properly implemented arithmetic.
1687* To switch it off, set L2PERT=.FALSE. To remove it from the
1688* code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1689* The following two loops should be blocked and fused with the
1690* transposed copy above.
1691*
1692 IF ( l2pert ) THEN
1693 xsc = sqrt(small)
1694 DO 2969 q = 1, nr
1695 ctemp = dcmplx(xsc*abs( v(q,q) ),zero)
1696 DO 2968 p = 1, n
1697 IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1698 $ .OR. ( p .LT. q ) )
1699* $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1700 $ v(p,q) = ctemp
1701 IF ( p .LT. q ) v(p,q) = - v(p,q)
1702 2968 CONTINUE
1703 2969 CONTINUE
1704 ELSE
1705 CALL zlaset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1706 END IF
1707*
1708* Estimate the row scaled condition number of R1
1709* (If R1 is rectangular, N > NR, then the condition number
1710* of the leading NR x NR submatrix is estimated.)
1711*
1712 CALL zlacpy( 'L', nr, nr, v, ldv, cwork(2*n+1), nr )
1713 DO 3950 p = 1, nr
1714 temp1 = dznrm2(nr-p+1,cwork(2*n+(p-1)*nr+p),1)
1715 CALL zdscal(nr-p+1,one/temp1,cwork(2*n+(p-1)*nr+p),1)
1716 3950 CONTINUE
1717 CALL zpocon('L',nr,cwork(2*n+1),nr,one,temp1,
1718 $ cwork(2*n+nr*nr+1),rwork,ierr)
1719 condr1 = one / sqrt(temp1)
1720* .. here need a second opinion on the condition number
1721* .. then assume worst case scenario
1722* R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1723* more conservative <=> CONDR1 .LT. SQRT(DBLE(N))
1724*
1725 cond_ok = sqrt(sqrt(dble(nr)))
1726*[TP] COND_OK is a tuning parameter.
1727*
1728 IF ( condr1 .LT. cond_ok ) THEN
1729* .. the second QRF without pivoting. Note: in an optimized
1730* implementation, this QRF should be implemented as the QRF
1731* of a lower triangular matrix.
1732* R1^* = Q2 * R2
1733 CALL zgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1734 $ lwork-2*n, ierr )
1735*
1736 IF ( l2pert ) THEN
1737 xsc = sqrt(small)/epsln
1738 DO 3959 p = 2, nr
1739 DO 3958 q = 1, p - 1
1740 ctemp=dcmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1741 $ zero)
1742 IF ( abs(v(q,p)) .LE. temp1 )
1743* $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1744 $ v(q,p) = ctemp
1745 3958 CONTINUE
1746 3959 CONTINUE
1747 END IF
1748*
1749 IF ( nr .NE. n )
1750 $ CALL zlacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1751* .. save ...
1752*
1753* .. this transposed copy should be better than naive
1754 DO 1969 p = 1, nr - 1
1755 CALL zcopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1756 CALL zlacgv(nr-p+1, v(p,p), 1 )
1757 1969 CONTINUE
1758 v(nr,nr)=conjg(v(nr,nr))
1759*
1760 condr2 = condr1
1761*
1762 ELSE
1763*
1764* .. ill-conditioned case: second QRF with pivoting
1765* Note that windowed pivoting would be equally good
1766* numerically, and more run-time efficient. So, in
1767* an optimal implementation, the next call to ZGEQP3
1768* should be replaced with eg. CALL ZGEQPX (ACM TOMS #782)
1769* with properly (carefully) chosen parameters.
1770*
1771* R1^* * P2 = Q2 * R2
1772 DO 3003 p = 1, nr
1773 iwork(n+p) = 0
1774 3003 CONTINUE
1775 CALL zgeqp3( n, nr, v, ldv, iwork(n+1), cwork(n+1),
1776 $ cwork(2*n+1), lwork-2*n, rwork, ierr )
1777** CALL ZGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1778** $ LWORK-2*N, IERR )
1779 IF ( l2pert ) THEN
1780 xsc = sqrt(small)
1781 DO 3969 p = 2, nr
1782 DO 3968 q = 1, p - 1
1783 ctemp=dcmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1784 $ zero)
1785 IF ( abs(v(q,p)) .LE. temp1 )
1786* $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1787 $ v(q,p) = ctemp
1788 3968 CONTINUE
1789 3969 CONTINUE
1790 END IF
1791*
1792 CALL zlacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1793*
1794 IF ( l2pert ) THEN
1795 xsc = sqrt(small)
1796 DO 8970 p = 2, nr
1797 DO 8971 q = 1, p - 1
1798 ctemp=dcmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1799 $ zero)
1800* V(p,q) = - TEMP1*( V(q,p) / ABS(V(q,p)) )
1801 v(p,q) = - ctemp
1802 8971 CONTINUE
1803 8970 CONTINUE
1804 ELSE
1805 CALL zlaset( 'L',nr-1,nr-1,czero,czero,v(2,1),ldv )
1806 END IF
1807* Now, compute R2 = L3 * Q3, the LQ factorization.
1808 CALL zgelqf( nr, nr, v, ldv, cwork(2*n+n*nr+1),
1809 $ cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1810* .. and estimate the condition number
1811 CALL zlacpy( 'L',nr,nr,v,ldv,cwork(2*n+n*nr+nr+1),nr )
1812 DO 4950 p = 1, nr
1813 temp1 = dznrm2( p, cwork(2*n+n*nr+nr+p), nr )
1814 CALL zdscal( p, one/temp1, cwork(2*n+n*nr+nr+p), nr )
1815 4950 CONTINUE
1816 CALL zpocon( 'L',nr,cwork(2*n+n*nr+nr+1),nr,one,temp1,
1817 $ cwork(2*n+n*nr+nr+nr*nr+1),rwork,ierr )
1818 condr2 = one / sqrt(temp1)
1819*
1820*
1821 IF ( condr2 .GE. cond_ok ) THEN
1822* .. save the Householder vectors used for Q3
1823* (this overwrites the copy of R2, as it will not be
1824* needed in this branch, but it does not overwrite the
1825* Huseholder vectors of Q2.).
1826 CALL zlacpy( 'U', nr, nr, v, ldv, cwork(2*n+1), n )
1827* .. and the rest of the information on Q3 is in
1828* WORK(2*N+N*NR+1:2*N+N*NR+N)
1829 END IF
1830*
1831 END IF
1832*
1833 IF ( l2pert ) THEN
1834 xsc = sqrt(small)
1835 DO 4968 q = 2, nr
1836 ctemp = xsc * v(q,q)
1837 DO 4969 p = 1, q - 1
1838* V(p,q) = - TEMP1*( V(p,q) / ABS(V(p,q)) )
1839 v(p,q) = - ctemp
1840 4969 CONTINUE
1841 4968 CONTINUE
1842 ELSE
1843 CALL zlaset( 'U', nr-1,nr-1, czero,czero, v(1,2), ldv )
1844 END IF
1845*
1846* Second preconditioning finished; continue with Jacobi SVD
1847* The input matrix is lower triangular.
1848*
1849* Recover the right singular vectors as solution of a well
1850* conditioned triangular matrix equation.
1851*
1852 IF ( condr1 .LT. cond_ok ) THEN
1853*
1854 CALL zgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u, ldu,
1855 $ cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,rwork,
1856 $ lrwork, info )
1857 scalem = rwork(1)
1858 numrank = nint(rwork(2))
1859 DO 3970 p = 1, nr
1860 CALL zcopy( nr, v(1,p), 1, u(1,p), 1 )
1861 CALL zdscal( nr, sva(p), v(1,p), 1 )
1862 3970 CONTINUE
1863
1864* .. pick the right matrix equation and solve it
1865*
1866 IF ( nr .EQ. n ) THEN
1867* :)) .. best case, R1 is inverted. The solution of this matrix
1868* equation is Q2*V2 = the product of the Jacobi rotations
1869* used in ZGESVJ, premultiplied with the orthogonal matrix
1870* from the second QR factorization.
1871 CALL ztrsm('L','U','N','N', nr,nr,cone, a,lda, v,ldv)
1872 ELSE
1873* .. R1 is well conditioned, but non-square. Adjoint of R2
1874* is inverted to get the product of the Jacobi rotations
1875* used in ZGESVJ. The Q-factor from the second QR
1876* factorization is then built in explicitly.
1877 CALL ztrsm('L','U','C','N',nr,nr,cone,cwork(2*n+1),
1878 $ n,v,ldv)
1879 IF ( nr .LT. n ) THEN
1880 CALL zlaset('A',n-nr,nr,czero,czero,v(nr+1,1),ldv)
1881 CALL zlaset('A',nr,n-nr,czero,czero,v(1,nr+1),ldv)
1882 CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1883 END IF
1884 CALL zunmqr('L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1885 $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1886 END IF
1887*
1888 ELSE IF ( condr2 .LT. cond_ok ) THEN
1889*
1890* The matrix R2 is inverted. The solution of the matrix equation
1891* is Q3^* * V3 = the product of the Jacobi rotations (applied to
1892* the lower triangular L3 from the LQ factorization of
1893* R2=L3*Q3), pre-multiplied with the transposed Q3.
1894 CALL zgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1895 $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1896 $ rwork, lrwork, info )
1897 scalem = rwork(1)
1898 numrank = nint(rwork(2))
1899 DO 3870 p = 1, nr
1900 CALL zcopy( nr, v(1,p), 1, u(1,p), 1 )
1901 CALL zdscal( nr, sva(p), u(1,p), 1 )
1902 3870 CONTINUE
1903 CALL ztrsm('L','U','N','N',nr,nr,cone,cwork(2*n+1),n,
1904 $ u,ldu)
1905* .. apply the permutation from the second QR factorization
1906 DO 873 q = 1, nr
1907 DO 872 p = 1, nr
1908 cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1909 872 CONTINUE
1910 DO 874 p = 1, nr
1911 u(p,q) = cwork(2*n+n*nr+nr+p)
1912 874 CONTINUE
1913 873 CONTINUE
1914 IF ( nr .LT. n ) THEN
1915 CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1916 CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1917 CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1918 END IF
1919 CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1920 $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1921 ELSE
1922* Last line of defense.
1923* #:( This is a rather pathological case: no scaled condition
1924* improvement after two pivoted QR factorizations. Other
1925* possibility is that the rank revealing QR factorization
1926* or the condition estimator has failed, or the COND_OK
1927* is set very close to ONE (which is unnecessary). Normally,
1928* this branch should never be executed, but in rare cases of
1929* failure of the RRQR or condition estimator, the last line of
1930* defense ensures that ZGEJSV completes the task.
1931* Compute the full SVD of L3 using ZGESVJ with explicit
1932* accumulation of Jacobi rotations.
1933 CALL zgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1934 $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1935 $ rwork, lrwork, info )
1936 scalem = rwork(1)
1937 numrank = nint(rwork(2))
1938 IF ( nr .LT. n ) THEN
1939 CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1940 CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1941 CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1942 END IF
1943 CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1944 $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1945*
1946 CALL zunmlq( 'L', 'C', nr, nr, nr, cwork(2*n+1), n,
1947 $ cwork(2*n+n*nr+1), u, ldu, cwork(2*n+n*nr+nr+1),
1948 $ lwork-2*n-n*nr-nr, ierr )
1949 DO 773 q = 1, nr
1950 DO 772 p = 1, nr
1951 cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1952 772 CONTINUE
1953 DO 774 p = 1, nr
1954 u(p,q) = cwork(2*n+n*nr+nr+p)
1955 774 CONTINUE
1956 773 CONTINUE
1957*
1958 END IF
1959*
1960* Permute the rows of V using the (column) permutation from the
1961* first QRF. Also, scale the columns to make them unit in
1962* Euclidean norm. This applies to all cases.
1963*
1964 temp1 = sqrt(dble(n)) * epsln
1965 DO 1972 q = 1, n
1966 DO 972 p = 1, n
1967 cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1968 972 CONTINUE
1969 DO 973 p = 1, n
1970 v(p,q) = cwork(2*n+n*nr+nr+p)
1971 973 CONTINUE
1972 xsc = one / dznrm2( n, v(1,q), 1 )
1973 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1974 $ CALL zdscal( n, xsc, v(1,q), 1 )
1975 1972 CONTINUE
1976* At this moment, V contains the right singular vectors of A.
1977* Next, assemble the left singular vector matrix U (M x N).
1978 IF ( nr .LT. m ) THEN
1979 CALL zlaset('A', m-nr, nr, czero, czero, u(nr+1,1), ldu)
1980 IF ( nr .LT. n1 ) THEN
1981 CALL zlaset('A',nr,n1-nr,czero,czero,u(1,nr+1),ldu)
1982 CALL zlaset('A',m-nr,n1-nr,czero,cone,
1983 $ u(nr+1,nr+1),ldu)
1984 END IF
1985 END IF
1986*
1987* The Q matrix from the first QRF is built into the left singular
1988* matrix U. This applies to all cases.
1989*
1990 CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
1991 $ ldu, cwork(n+1), lwork-n, ierr )
1992
1993* The columns of U are normalized. The cost is O(M*N) flops.
1994 temp1 = sqrt(dble(m)) * epsln
1995 DO 1973 p = 1, nr
1996 xsc = one / dznrm2( m, u(1,p), 1 )
1997 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1998 $ CALL zdscal( m, xsc, u(1,p), 1 )
1999 1973 CONTINUE
2000*
2001* If the initial QRF is computed with row pivoting, the left
2002* singular vectors must be adjusted.
2003*
2004 IF ( rowpiv )
2005 $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2006*
2007 ELSE
2008*
2009* .. the initial matrix A has almost orthogonal columns and
2010* the second QRF is not needed
2011*
2012 CALL zlacpy( 'U', n, n, a, lda, cwork(n+1), n )
2013 IF ( l2pert ) THEN
2014 xsc = sqrt(small)
2015 DO 5970 p = 2, n
2016 ctemp = xsc * cwork( n + (p-1)*n + p )
2017 DO 5971 q = 1, p - 1
2018* CWORK(N+(q-1)*N+p)=-TEMP1 * ( CWORK(N+(p-1)*N+q) /
2019* $ ABS(CWORK(N+(p-1)*N+q)) )
2020 cwork(n+(q-1)*n+p)=-ctemp
2021 5971 CONTINUE
2022 5970 CONTINUE
2023 ELSE
2024 CALL zlaset( 'L',n-1,n-1,czero,czero,cwork(n+2),n )
2025 END IF
2026*
2027 CALL zgesvj( 'U', 'U', 'N', n, n, cwork(n+1), n, sva,
2028 $ n, u, ldu, cwork(n+n*n+1), lwork-n-n*n, rwork, lrwork,
2029 $ info )
2030*
2031 scalem = rwork(1)
2032 numrank = nint(rwork(2))
2033 DO 6970 p = 1, n
2034 CALL zcopy( n, cwork(n+(p-1)*n+1), 1, u(1,p), 1 )
2035 CALL zdscal( n, sva(p), cwork(n+(p-1)*n+1), 1 )
2036 6970 CONTINUE
2037*
2038 CALL ztrsm( 'L', 'U', 'N', 'N', n, n,
2039 $ cone, a, lda, cwork(n+1), n )
2040 DO 6972 p = 1, n
2041 CALL zcopy( n, cwork(n+p), n, v(iwork(p),1), ldv )
2042 6972 CONTINUE
2043 temp1 = sqrt(dble(n))*epsln
2044 DO 6971 p = 1, n
2045 xsc = one / dznrm2( n, v(1,p), 1 )
2046 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2047 $ CALL zdscal( n, xsc, v(1,p), 1 )
2048 6971 CONTINUE
2049*
2050* Assemble the left singular vector matrix U (M x N).
2051*
2052 IF ( n .LT. m ) THEN
2053 CALL zlaset( 'A', m-n, n, czero, czero, u(n+1,1), ldu )
2054 IF ( n .LT. n1 ) THEN
2055 CALL zlaset('A',n, n1-n, czero, czero, u(1,n+1),ldu)
2056 CALL zlaset( 'A',m-n,n1-n, czero, cone,u(n+1,n+1),ldu)
2057 END IF
2058 END IF
2059 CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
2060 $ ldu, cwork(n+1), lwork-n, ierr )
2061 temp1 = sqrt(dble(m))*epsln
2062 DO 6973 p = 1, n1
2063 xsc = one / dznrm2( m, u(1,p), 1 )
2064 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2065 $ CALL zdscal( m, xsc, u(1,p), 1 )
2066 6973 CONTINUE
2067*
2068 IF ( rowpiv )
2069 $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2070*
2071 END IF
2072*
2073* end of the >> almost orthogonal case << in the full SVD
2074*
2075 ELSE
2076*
2077* This branch deploys a preconditioned Jacobi SVD with explicitly
2078* accumulated rotations. It is included as optional, mainly for
2079* experimental purposes. It does perform well, and can also be used.
2080* In this implementation, this branch will be automatically activated
2081* if the condition number sigma_max(A) / sigma_min(A) is predicted
2082* to be greater than the overflow threshold. This is because the
2083* a posteriori computation of the singular vectors assumes robust
2084* implementation of BLAS and some LAPACK procedures, capable of working
2085* in presence of extreme values, e.g. when the singular values spread from
2086* the underflow to the overflow threshold.
2087*
2088 DO 7968 p = 1, nr
2089 CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
2090 CALL zlacgv( n-p+1, v(p,p), 1 )
2091 7968 CONTINUE
2092*
2093 IF ( l2pert ) THEN
2094 xsc = sqrt(small/epsln)
2095 DO 5969 q = 1, nr
2096 ctemp = dcmplx(xsc*abs( v(q,q) ),zero)
2097 DO 5968 p = 1, n
2098 IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
2099 $ .OR. ( p .LT. q ) )
2100* $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
2101 $ v(p,q) = ctemp
2102 IF ( p .LT. q ) v(p,q) = - v(p,q)
2103 5968 CONTINUE
2104 5969 CONTINUE
2105 ELSE
2106 CALL zlaset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
2107 END IF
2108
2109 CALL zgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
2110 $ lwork-2*n, ierr )
2111 CALL zlacpy( 'L', n, nr, v, ldv, cwork(2*n+1), n )
2112*
2113 DO 7969 p = 1, nr
2114 CALL zcopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
2115 CALL zlacgv( nr-p+1, u(p,p), 1 )
2116 7969 CONTINUE
2117
2118 IF ( l2pert ) THEN
2119 xsc = sqrt(small/epsln)
2120 DO 9970 q = 2, nr
2121 DO 9971 p = 1, q - 1
2122 ctemp = dcmplx(xsc * min(abs(u(p,p)),abs(u(q,q))),
2123 $ zero)
2124* U(p,q) = - TEMP1 * ( U(q,p) / ABS(U(q,p)) )
2125 u(p,q) = - ctemp
2126 9971 CONTINUE
2127 9970 CONTINUE
2128 ELSE
2129 CALL zlaset('U', nr-1, nr-1, czero, czero, u(1,2), ldu )
2130 END IF
2131
2132 CALL zgesvj( 'L', 'U', 'V', nr, nr, u, ldu, sva,
2133 $ n, v, ldv, cwork(2*n+n*nr+1), lwork-2*n-n*nr,
2134 $ rwork, lrwork, info )
2135 scalem = rwork(1)
2136 numrank = nint(rwork(2))
2137
2138 IF ( nr .LT. n ) THEN
2139 CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
2140 CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
2141 CALL zlaset( 'A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv )
2142 END IF
2143
2144 CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
2145 $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
2146*
2147* Permute the rows of V using the (column) permutation from the
2148* first QRF. Also, scale the columns to make them unit in
2149* Euclidean norm. This applies to all cases.
2150*
2151 temp1 = sqrt(dble(n)) * epsln
2152 DO 7972 q = 1, n
2153 DO 8972 p = 1, n
2154 cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
2155 8972 CONTINUE
2156 DO 8973 p = 1, n
2157 v(p,q) = cwork(2*n+n*nr+nr+p)
2158 8973 CONTINUE
2159 xsc = one / dznrm2( n, v(1,q), 1 )
2160 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2161 $ CALL zdscal( n, xsc, v(1,q), 1 )
2162 7972 CONTINUE
2163*
2164* At this moment, V contains the right singular vectors of A.
2165* Next, assemble the left singular vector matrix U (M x N).
2166*
2167 IF ( nr .LT. m ) THEN
2168 CALL zlaset( 'A', m-nr, nr, czero, czero, u(nr+1,1), ldu )
2169 IF ( nr .LT. n1 ) THEN
2170 CALL zlaset('A',nr, n1-nr, czero, czero, u(1,nr+1),ldu)
2171 CALL zlaset('A',m-nr,n1-nr, czero, cone,u(nr+1,nr+1),ldu)
2172 END IF
2173 END IF
2174*
2175 CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
2176 $ ldu, cwork(n+1), lwork-n, ierr )
2177*
2178 IF ( rowpiv )
2179 $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2180*
2181*
2182 END IF
2183 IF ( transp ) THEN
2184* .. swap U and V because the procedure worked on A^*
2185 DO 6974 p = 1, n
2186 CALL zswap( n, u(1,p), 1, v(1,p), 1 )
2187 6974 CONTINUE
2188 END IF
2189*
2190 END IF
2191* end of the full SVD
2192*
2193* Undo scaling, if necessary (and possible)
2194*
2195 IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
2196 CALL dlascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
2197 uscal1 = one
2198 uscal2 = one
2199 END IF
2200*
2201 IF ( nr .LT. n ) THEN
2202 DO 3004 p = nr+1, n
2203 sva(p) = zero
2204 3004 CONTINUE
2205 END IF
2206*
2207 rwork(1) = uscal2 * scalem
2208 rwork(2) = uscal1
2209 IF ( errest ) rwork(3) = sconda
2210 IF ( lsvec .AND. rsvec ) THEN
2211 rwork(4) = condr1
2212 rwork(5) = condr2
2213 END IF
2214 IF ( l2tran ) THEN
2215 rwork(6) = entra
2216 rwork(7) = entrat
2217 END IF
2218*
2219 iwork(1) = nr
2220 iwork(2) = numrank
2221 iwork(3) = warning
2222 IF ( transp ) THEN
2223 iwork(4) = 1
2224 ELSE
2225 iwork(4) = -1
2226 END IF
2227
2228*
2229 RETURN
2230* ..
2231* .. END OF ZGEJSV
2232* ..
2233 END
2234*
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine zcopy(n, zx, incx, zy, incy)
ZCOPY
Definition zcopy.f:81
subroutine zgejsv(joba, jobu, jobv, jobr, jobt, jobp, m, n, a, lda, sva, u, ldu, v, ldv, cwork, lwork, rwork, lrwork, iwork, info)
ZGEJSV
Definition zgejsv.f:569
subroutine zgelqf(m, n, a, lda, tau, work, lwork, info)
ZGELQF
Definition zgelqf.f:143
subroutine zgeqp3(m, n, a, lda, jpvt, tau, work, lwork, rwork, info)
ZGEQP3
Definition zgeqp3.f:159
subroutine zgeqrf(m, n, a, lda, tau, work, lwork, info)
ZGEQRF
Definition zgeqrf.f:146
subroutine zgesvj(joba, jobu, jobv, m, n, a, lda, sva, mv, v, ldv, cwork, lwork, rwork, lrwork, info)
ZGESVJ
Definition zgesvj.f:351
subroutine zlacgv(n, x, incx)
ZLACGV conjugates a complex vector.
Definition zlacgv.f:74
subroutine zlacpy(uplo, m, n, a, lda, b, ldb)
ZLACPY copies all or part of one two-dimensional array to another.
Definition zlacpy.f:103
subroutine zlapmr(forwrd, m, n, x, ldx, k)
ZLAPMR rearranges rows of a matrix as specified by a permutation vector.
Definition zlapmr.f:104
subroutine zlascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
ZLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition zlascl.f:143
subroutine dlascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition dlascl.f:143
subroutine zlaset(uplo, m, n, alpha, beta, a, lda)
ZLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition zlaset.f:106
subroutine zlassq(n, x, incx, scale, sumsq)
ZLASSQ updates a sum of squares represented in scaled form.
Definition zlassq.f90:124
subroutine dlassq(n, x, incx, scale, sumsq)
DLASSQ updates a sum of squares represented in scaled form.
Definition dlassq.f90:124
subroutine zlaswp(n, a, lda, k1, k2, ipiv, incx)
ZLASWP performs a series of row interchanges on a general rectangular matrix.
Definition zlaswp.f:115
subroutine zpocon(uplo, n, a, lda, anorm, rcond, work, rwork, info)
ZPOCON
Definition zpocon.f:121
subroutine dscal(n, da, dx, incx)
DSCAL
Definition dscal.f:79
subroutine zdscal(n, da, zx, incx)
ZDSCAL
Definition zdscal.f:78
subroutine zswap(n, zx, incx, zy, incy)
ZSWAP
Definition zswap.f:81
subroutine ztrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
ZTRSM
Definition ztrsm.f:180
subroutine zungqr(m, n, k, a, lda, tau, work, lwork, info)
ZUNGQR
Definition zungqr.f:128
subroutine zunmlq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
ZUNMLQ
Definition zunmlq.f:167
subroutine zunmqr(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
ZUNMQR
Definition zunmqr.f:167