LAPACK  3.6.1
LAPACK: Linear Algebra PACKage
cgejsv.f
Go to the documentation of this file.
1 *> \brief \b CGEJSV
2 *
3 * =========== DOCUMENTATION ===========
4 *
5 * Online html documentation available at
6 * http://www.netlib.org/lapack/explore-html/
7 *
8 *> \htmlonly
9 *> Download CGEJSV + dependencies
10 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/cgejsv.f">
11 *> [TGZ]</a>
12 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/cgejsv.f">
13 *> [ZIP]</a>
14 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/cgejsv.f">
15 *> [TXT]</a>
16 *> \endhtmlonly
17 *
18 * Definition:
19 * ===========
20 *
21 * SUBROUTINE CGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
22 * M, N, A, LDA, SVA, U, LDU, V, LDV,
23 * CWORK, LWORK, RWORK, LRWORK, IWORK, INFO )
24 *
25 * .. Scalar Arguments ..
26 * IMPLICIT NONE
27 * INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
28 * ..
29 * .. Array Arguments ..
30 * COMPLEX A( LDA, * ), U( LDU, * ), V( LDV, * ), CWORK( LWORK )
31 * REAL SVA( N ), RWORK( LRWORK )
32 * INTEGER IWORK( * )
33 * CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
34 * ..
35 *
36 *
37 *> \par Purpose:
38 * =============
39 *>
40 *> \verbatim
41 *>
42 *> CGEJSV computes the singular value decomposition (SVD) of a complex M-by-N
43 *> matrix [A], where M >= N. The SVD of [A] is written as
44 *>
45 *> [A] = [U] * [SIGMA] * [V]^*,
46 *>
47 *> where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
48 *> diagonal elements, [U] is an M-by-N (or M-by-M) unitary matrix, and
49 *> [V] is an N-by-N unitary matrix. The diagonal elements of [SIGMA] are
50 *> the singular values of [A]. The columns of [U] and [V] are the left and
51 *> the right singular vectors of [A], respectively. The matrices [U] and [V]
52 *> are computed and stored in the arrays U and V, respectively. The diagonal
53 *> of [SIGMA] is computed and stored in the array SVA.
54 *> \endverbatim
55 *>
56 *> Arguments:
57 *> ==========
58 *>
59 *> \param[in] JOBA
60 *> \verbatim
61 *> JOBA is CHARACTER*1
62 *> Specifies the level of accuracy:
63 *> = 'C': This option works well (high relative accuracy) if A = B * D,
64 *> with well-conditioned B and arbitrary diagonal matrix D.
65 *> The accuracy cannot be spoiled by COLUMN scaling. The
66 *> accuracy of the computed output depends on the condition of
67 *> B, and the procedure aims at the best theoretical accuracy.
68 *> The relative error max_{i=1:N}|d sigma_i| / sigma_i is
69 *> bounded by f(M,N)*epsilon* cond(B), independent of D.
70 *> The input matrix is preprocessed with the QRF with column
71 *> pivoting. This initial preprocessing and preconditioning by
72 *> a rank revealing QR factorization is common for all values of
73 *> JOBA. Additional actions are specified as follows:
74 *> = 'E': Computation as with 'C' with an additional estimate of the
75 *> condition number of B. It provides a realistic error bound.
76 *> = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
77 *> D1, D2, and well-conditioned matrix C, this option gives
78 *> higher accuracy than the 'C' option. If the structure of the
79 *> input matrix is not known, and relative accuracy is
80 *> desirable, then this option is advisable. The input matrix A
81 *> is preprocessed with QR factorization with FULL (row and
82 *> column) pivoting.
83 *> = 'G' Computation as with 'F' with an additional estimate of the
84 *> condition number of B, where A=D*B. If A has heavily weighted
85 *> rows, then using this condition number gives too pessimistic
86 *> error bound.
87 *> = 'A': Small singular values are the noise and the matrix is treated
88 *> as numerically rank defficient. The error in the computed
89 *> singular values is bounded by f(m,n)*epsilon*||A||.
90 *> The computed SVD A = U * S * V^* restores A up to
91 *> f(m,n)*epsilon*||A||.
92 *> This gives the procedure the licence to discard (set to zero)
93 *> all singular values below N*epsilon*||A||.
94 *> = 'R': Similar as in 'A'. Rank revealing property of the initial
95 *> QR factorization is used do reveal (using triangular factor)
96 *> a gap sigma_{r+1} < epsilon * sigma_r in which case the
97 *> numerical RANK is declared to be r. The SVD is computed with
98 *> absolute error bounds, but more accurately than with 'A'.
99 *> \endverbatim
100 *>
101 *> \param[in] JOBU
102 *> \verbatim
103 *> JOBU is CHARACTER*1
104 *> Specifies whether to compute the columns of U:
105 *> = 'U': N columns of U are returned in the array U.
106 *> = 'F': full set of M left sing. vectors is returned in the array U.
107 *> = 'W': U may be used as workspace of length M*N. See the description
108 *> of U.
109 *> = 'N': U is not computed.
110 *> \endverbatim
111 *>
112 *> \param[in] JOBV
113 *> \verbatim
114 *> JOBV is CHARACTER*1
115 *> Specifies whether to compute the matrix V:
116 *> = 'V': N columns of V are returned in the array V; Jacobi rotations
117 *> are not explicitly accumulated.
118 *> = 'J': N columns of V are returned in the array V, but they are
119 *> computed as the product of Jacobi rotations. This option is
120 *> allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
121 *> = 'W': V may be used as workspace of length N*N. See the description
122 *> of V.
123 *> = 'N': V is not computed.
124 *> \endverbatim
125 *>
126 *> \param[in] JOBR
127 *> \verbatim
128 *> JOBR is CHARACTER*1
129 *> Specifies the RANGE for the singular values. Issues the licence to
130 *> set to zero small positive singular values if they are outside
131 *> specified range. If A .NE. 0 is scaled so that the largest singular
132 *> value of c*A is around SQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
133 *> the licence to kill columns of A whose norm in c*A is less than
134 *> SQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
135 *> where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
136 *> = 'N': Do not kill small columns of c*A. This option assumes that
137 *> BLAS and QR factorizations and triangular solvers are
138 *> implemented to work in that range. If the condition of A
139 *> is greater than BIG, use CGESVJ.
140 *> = 'R': RESTRICTED range for sigma(c*A) is [SQRT(SFMIN), SQRT(BIG)]
141 *> (roughly, as described above). This option is recommended.
142 *> ===========================
143 *> For computing the singular values in the FULL range [SFMIN,BIG]
144 *> use CGESVJ.
145 *> \endverbatim
146 *>
147 *> \param[in] JOBT
148 *> \verbatim
149 *> JOBT is CHARACTER*1
150 *> If the matrix is square then the procedure may determine to use
151 *> transposed A if A^* seems to be better with respect to convergence.
152 *> If the matrix is not square, JOBT is ignored. This is subject to
153 *> changes in the future.
154 *> The decision is based on two values of entropy over the adjoint
155 *> orbit of A^* * A. See the descriptions of WORK(6) and WORK(7).
156 *> = 'T': transpose if entropy test indicates possibly faster
157 *> convergence of Jacobi process if A^* is taken as input. If A is
158 *> replaced with A^*, then the row pivoting is included automatically.
159 *> = 'N': do not speculate.
160 *> This option can be used to compute only the singular values, or the
161 *> full SVD (U, SIGMA and V). For only one set of singular vectors
162 *> (U or V), the caller should provide both U and V, as one of the
163 *> matrices is used as workspace if the matrix A is transposed.
164 *> The implementer can easily remove this constraint and make the
165 *> code more complicated. See the descriptions of U and V.
166 *> \endverbatim
167 *>
168 *> \param[in] JOBP
169 *> \verbatim
170 *> JOBP is CHARACTER*1
171 *> Issues the licence to introduce structured perturbations to drown
172 *> denormalized numbers. This licence should be active if the
173 *> denormals are poorly implemented, causing slow computation,
174 *> especially in cases of fast convergence (!). For details see [1,2].
175 *> For the sake of simplicity, this perturbations are included only
176 *> when the full SVD or only the singular values are requested. The
177 *> implementer/user can easily add the perturbation for the cases of
178 *> computing one set of singular vectors.
179 *> = 'P': introduce perturbation
180 *> = 'N': do not perturb
181 *> \endverbatim
182 *>
183 *> \param[in] M
184 *> \verbatim
185 *> M is INTEGER
186 *> The number of rows of the input matrix A. M >= 0.
187 *> \endverbatim
188 *>
189 *> \param[in] N
190 *> \verbatim
191 *> N is INTEGER
192 *> The number of columns of the input matrix A. M >= N >= 0.
193 *> \endverbatim
194 *>
195 *> \param[in,out] A
196 *> \verbatim
197 *> A is COMPLEX array, dimension (LDA,N)
198 *> On entry, the M-by-N matrix A.
199 *> \endverbatim
200 *>
201 *> \param[in] LDA
202 *> \verbatim
203 *> LDA is INTEGER
204 *> The leading dimension of the array A. LDA >= max(1,M).
205 *> \endverbatim
206 *>
207 *> \param[out] SVA
208 *> \verbatim
209 *> SVA is REAL array, dimension (N)
210 *> On exit,
211 *> - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
212 *> computation SVA contains Euclidean column norms of the
213 *> iterated matrices in the array A.
214 *> - For WORK(1) .NE. WORK(2): The singular values of A are
215 *> (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
216 *> sigma_max(A) overflows or if small singular values have been
217 *> saved from underflow by scaling the input matrix A.
218 *> - If JOBR='R' then some of the singular values may be returned
219 *> as exact zeros obtained by "set to zero" because they are
220 *> below the numerical rank threshold or are denormalized numbers.
221 *> \endverbatim
222 *>
223 *> \param[out] U
224 *> \verbatim
225 *> U is COMPLEX array, dimension ( LDU, N ) or ( LDU, M )
226 *> If JOBU = 'U', then U contains on exit the M-by-N matrix of
227 *> the left singular vectors.
228 *> If JOBU = 'F', then U contains on exit the M-by-M matrix of
229 *> the left singular vectors, including an ONB
230 *> of the orthogonal complement of the Range(A).
231 *> If JOBU = 'W' .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
232 *> then U is used as workspace if the procedure
233 *> replaces A with A^*. In that case, [V] is computed
234 *> in U as left singular vectors of A^* and then
235 *> copied back to the V array. This 'W' option is just
236 *> a reminder to the caller that in this case U is
237 *> reserved as workspace of length N*N.
238 *> If JOBU = 'N' U is not referenced, unless JOBT='T'.
239 *> \endverbatim
240 *>
241 *> \param[in] LDU
242 *> \verbatim
243 *> LDU is INTEGER
244 *> The leading dimension of the array U, LDU >= 1.
245 *> IF JOBU = 'U' or 'F' or 'W', then LDU >= M.
246 *> \endverbatim
247 *>
248 *> \param[out] V
249 *> \verbatim
250 *> V is COMPLEX array, dimension ( LDV, N )
251 *> If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
252 *> the right singular vectors;
253 *> If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
254 *> then V is used as workspace if the pprocedure
255 *> replaces A with A^*. In that case, [U] is computed
256 *> in V as right singular vectors of A^* and then
257 *> copied back to the U array. This 'W' option is just
258 *> a reminder to the caller that in this case V is
259 *> reserved as workspace of length N*N.
260 *> If JOBV = 'N' V is not referenced, unless JOBT='T'.
261 *> \endverbatim
262 *>
263 *> \param[in] LDV
264 *> \verbatim
265 *> LDV is INTEGER
266 *> The leading dimension of the array V, LDV >= 1.
267 *> If JOBV = 'V' or 'J' or 'W', then LDV >= N.
268 *> \endverbatim
269 *>
270 *> \param[out] CWORK
271 *> \verbatim
272 *> CWORK is COMPLEX array, dimension at least LWORK.
273 *> \endverbatim
274 *>
275 *> \param[in] LWORK
276 *> \verbatim
277 *> LWORK is INTEGER
278 *> Length of CWORK to confirm proper allocation of workspace.
279 *> LWORK depends on the job:
280 *>
281 *> 1. If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
282 *> 1.1 .. no scaled condition estimate required (JOBA.NE.'E'.AND.JOBA.NE.'G'):
283 *> LWORK >= 2*N+1. This is the minimal requirement.
284 *> ->> For optimal performance (blocked code) the optimal value
285 *> is LWORK >= N + (N+1)*NB. Here NB is the optimal
286 *> block size for CGEQP3 and CGEQRF.
287 *> In general, optimal LWORK is computed as
288 *> LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CGEQRF)).
289 *> 1.2. .. an estimate of the scaled condition number of A is
290 *> required (JOBA='E', or 'G'). In this case, LWORK the minimal
291 *> requirement is LWORK >= N*N + 3*N.
292 *> ->> For optimal performance (blocked code) the optimal value
293 *> is LWORK >= max(N+(N+1)*NB, N*N+3*N).
294 *> In general, the optimal length LWORK is computed as
295 *> LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CGEQRF),
296 *> N+N*N+LWORK(CPOCON)).
297 *>
298 *> 2. If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
299 *> (JOBU.EQ.'N')
300 *> -> the minimal requirement is LWORK >= 3*N.
301 *> -> For optimal performance, LWORK >= max(N+(N+1)*NB, 3*N,2*N+N*NB),
302 *> where NB is the optimal block size for CGEQP3, CGEQRF, CGELQF,
303 *> CUNMLQ. In general, the optimal length LWORK is computed as
304 *> LWORK >= max(N+LWORK(CGEQP3), N+LWORK(CPOCON), N+LWORK(CGESVJ),
305 *> N+LWORK(CGELQF), 2*N+LWORK(CGEQRF), N+LWORK(CUNMLQ)).
306 *>
307 *> 3. If SIGMA and the left singular vectors are needed
308 *> -> the minimal requirement is LWORK >= 3*N.
309 *> -> For optimal performance:
310 *> if JOBU.EQ.'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB),
311 *> where NB is the optimal block size for CGEQP3, CGEQRF, CUNMQR.
312 *> In general, the optimal length LWORK is computed as
313 *> LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CPOCON),
314 *> 2*N+LWORK(CGEQRF), N+LWORK(CUNMQR)).
315 *>
316 *> 4. If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and
317 *> 4.1. if JOBV.EQ.'V'
318 *> the minimal requirement is LWORK >= 5*N+2*N*N.
319 *> 4.2. if JOBV.EQ.'J' the minimal requirement is
320 *> LWORK >= 4*N+N*N.
321 *> In both cases, the allocated CWORK can accommodate blocked runs
322 *> of CGEQP3, CGEQRF, CGELQF, CUNMQR, CUNMLQ.
323 *> \endverbatim
324 *>
325 *> \param[out] RWORK
326 *> \verbatim
327 *> RWORK is REAL array, dimension at least LRWORK.
328 *> On exit,
329 *> RWORK(1) = Determines the scaling factor SCALE = RWORK(2) / RWORK(1)
330 *> such that SCALE*SVA(1:N) are the computed singular values
331 *> of A. (See the description of SVA().)
332 *> RWORK(2) = See the description of RWORK(1).
333 *> RWORK(3) = SCONDA is an estimate for the condition number of
334 *> column equilibrated A. (If JOBA .EQ. 'E' or 'G')
335 *> SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
336 *> It is computed using SPOCON. It holds
337 *> N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
338 *> where R is the triangular factor from the QRF of A.
339 *> However, if R is truncated and the numerical rank is
340 *> determined to be strictly smaller than N, SCONDA is
341 *> returned as -1, thus indicating that the smallest
342 *> singular values might be lost.
343 *>
344 *> If full SVD is needed, the following two condition numbers are
345 *> useful for the analysis of the algorithm. They are provied for
346 *> a developer/implementer who is familiar with the details of
347 *> the method.
348 *>
349 *> RWORK(4) = an estimate of the scaled condition number of the
350 *> triangular factor in the first QR factorization.
351 *> RWORK(5) = an estimate of the scaled condition number of the
352 *> triangular factor in the second QR factorization.
353 *> The following two parameters are computed if JOBT .EQ. 'T'.
354 *> They are provided for a developer/implementer who is familiar
355 *> with the details of the method.
356 *> RWORK(6) = the entropy of A^* * A :: this is the Shannon entropy
357 *> of diag(A^* * A) / Trace(A^* * A) taken as point in the
358 *> probability simplex.
359 *> RWORK(7) = the entropy of A * A^*. (See the description of RWORK(6).)
360 *> \endverbatim
361 *>
362 *> \param[in] LRWORK
363 *> \verbatim
364 *> LRWORK is INTEGER
365 *> Length of RWORK to confirm proper allocation of workspace.
366 *> LRWORK depends on the job:
367 *>
368 *> 1. If only singular values are requested i.e. if
369 *> LSAME(JOBU,'N') .AND. LSAME(JOBV,'N')
370 *> then:
371 *> 1.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
372 *> then LRWORK = max( 7, N + 2 * M ).
373 *> 1.2. Otherwise, LRWORK = max( 7, 2 * N ).
374 *> 2. If singular values with the right singular vectors are requested
375 *> i.e. if
376 *> (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) .AND.
377 *> .NOT.(LSAME(JOBU,'U').OR.LSAME(JOBU,'F'))
378 *> then:
379 *> 2.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
380 *> then LRWORK = max( 7, N + 2 * M ).
381 *> 2.2. Otherwise, LRWORK = max( 7, 2 * N ).
382 *> 3. If singular values with the left singular vectors are requested, i.e. if
383 *> (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
384 *> .NOT.(LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
385 *> then:
386 *> 3.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
387 *> then LRWORK = max( 7, N + 2 * M ).
388 *> 3.2. Otherwise, LRWORK = max( 7, 2 * N ).
389 *> 4. If singular values with both the left and the right singular vectors
390 *> are requested, i.e. if
391 *> (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
392 *> (LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
393 *> then:
394 *> 4.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
395 *> then LRWORK = max( 7, N + 2 * M ).
396 *> 4.2. Otherwise, LRWORK = max( 7, 2 * N ).
397 *> \endverbatim
398 *>
399 *> \param[out] IWORK
400 *> \verbatim
401 *> IWORK is INTEGER array, of dimension:
402 *> If LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), then
403 *> the dimension of IWORK is max( 3, 2 * N + M ).
404 *> Otherwise, the dimension of IWORK is
405 *> -> max( 3, 2*N ) for full SVD
406 *> -> max( 3, N ) for singular values only or singular
407 *> values with one set of singular vectors (left or right)
408 *> On exit,
409 *> IWORK(1) = the numerical rank determined after the initial
410 *> QR factorization with pivoting. See the descriptions
411 *> of JOBA and JOBR.
412 *> IWORK(2) = the number of the computed nonzero singular values
413 *> IWORK(3) = if nonzero, a warning message:
414 *> If IWORK(3).EQ.1 then some of the column norms of A
415 *> were denormalized floats. The requested high accuracy
416 *> is not warranted by the data.
417 *> \endverbatim
418 *>
419 *> \param[out] INFO
420 *> \verbatim
421 *> INFO is INTEGER
422 *> < 0 : if INFO = -i, then the i-th argument had an illegal value.
423 *> = 0 : successfull exit;
424 *> > 0 : CGEJSV did not converge in the maximal allowed number
425 *> of sweeps. The computed values may be inaccurate.
426 *> \endverbatim
427 *
428 * Authors:
429 * ========
430 *
431 *> \author Univ. of Tennessee
432 *> \author Univ. of California Berkeley
433 *> \author Univ. of Colorado Denver
434 *> \author NAG Ltd.
435 *
436 *> \date June 2016
437 *
438 *> \ingroup complexGEsing
439 *
440 *> \par Further Details:
441 * =====================
442 *>
443 *> \verbatim
444 *>
445 *> CGEJSV implements a preconditioned Jacobi SVD algorithm. It uses CGEQP3,
446 *> CGEQRF, and CGELQF as preprocessors and preconditioners. Optionally, an
447 *> additional row pivoting can be used as a preprocessor, which in some
448 *> cases results in much higher accuracy. An example is matrix A with the
449 *> structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
450 *> diagonal matrices and C is well-conditioned matrix. In that case, complete
451 *> pivoting in the first QR factorizations provides accuracy dependent on the
452 *> condition number of C, and independent of D1, D2. Such higher accuracy is
453 *> not completely understood theoretically, but it works well in practice.
454 *> Further, if A can be written as A = B*D, with well-conditioned B and some
455 *> diagonal D, then the high accuracy is guaranteed, both theoretically and
456 *> in software, independent of D. For more details see [1], [2].
457 *> The computational range for the singular values can be the full range
458 *> ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
459 *> & LAPACK routines called by CGEJSV are implemented to work in that range.
460 *> If that is not the case, then the restriction for safe computation with
461 *> the singular values in the range of normalized IEEE numbers is that the
462 *> spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
463 *> overflow. This code (CGEJSV) is best used in this restricted range,
464 *> meaning that singular values of magnitude below ||A||_2 / SLAMCH('O') are
465 *> returned as zeros. See JOBR for details on this.
466 *> Further, this implementation is somewhat slower than the one described
467 *> in [1,2] due to replacement of some non-LAPACK components, and because
468 *> the choice of some tuning parameters in the iterative part (CGESVJ) is
469 *> left to the implementer on a particular machine.
470 *> The rank revealing QR factorization (in this code: CGEQP3) should be
471 *> implemented as in [3]. We have a new version of CGEQP3 under development
472 *> that is more robust than the current one in LAPACK, with a cleaner cut in
473 *> rank defficient cases. It will be available in the SIGMA library [4].
474 *> If M is much larger than N, it is obvious that the inital QRF with
475 *> column pivoting can be preprocessed by the QRF without pivoting. That
476 *> well known trick is not used in CGEJSV because in some cases heavy row
477 *> weighting can be treated with complete pivoting. The overhead in cases
478 *> M much larger than N is then only due to pivoting, but the benefits in
479 *> terms of accuracy have prevailed. The implementer/user can incorporate
480 *> this extra QRF step easily. The implementer can also improve data movement
481 *> (matrix transpose, matrix copy, matrix transposed copy) - this
482 *> implementation of CGEJSV uses only the simplest, naive data movement.
483 *
484 *> \par Contributors:
485 * ==================
486 *>
487 *> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
488 *
489 *> \par References:
490 * ================
491 *>
492 *> \verbatim
493 *>
494 *> [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
495 *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
496 *> LAPACK Working note 169.
497 *> [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
498 *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
499 *> LAPACK Working note 170.
500 *> [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
501 *> factorization software - a case study.
502 *> ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
503 *> LAPACK Working note 176.
504 *> [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
505 *> QSVD, (H,K)-SVD computations.
506 *> Department of Mathematics, University of Zagreb, 2008.
507 *> \endverbatim
508 *
509 *> \par Bugs, examples and comments:
510 * =================================
511 *>
512 *> Please report all bugs and send interesting examples and/or comments to
513 *> drmac@math.hr. Thank you.
514 *>
515 * =====================================================================
516  SUBROUTINE cgejsv( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
517  $ m, n, a, lda, sva, u, ldu, v, ldv,
518  $ cwork, lwork, rwork, lrwork, iwork, info )
519 *
520 * -- LAPACK computational routine (version 3.6.1) --
521 * -- LAPACK is a software package provided by Univ. of Tennessee, --
522 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
523 * June 2016
524 *
525 * .. Scalar Arguments ..
526  IMPLICIT NONE
527  INTEGER INFO, LDA, LDU, LDV, LWORK, LRWORK, M, N
528 * ..
529 * .. Array Arguments ..
530  COMPLEX A( lda, * ), U( ldu, * ), V( ldv, * ), CWORK( lwork )
531  REAL SVA( n ), RWORK( * )
532  INTEGER IWORK( * )
533  CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
534 * ..
535 *
536 * ===========================================================================
537 *
538 * .. Local Parameters ..
539  REAL ZERO, ONE
540  parameter( zero = 0.0e0, one = 1.0e0 )
541  COMPLEX CZERO, CONE
542  parameter( czero = ( 0.0e0, 0.0e0 ), cone = ( 1.0e0, 0.0e0 ) )
543 * ..
544 * .. Local Scalars ..
545  COMPLEX CTEMP
546  REAL AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1, COND_OK,
547  $ condr1, condr2, entra, entrat, epsln, maxprj, scalem,
548  $ sconda, sfmin, small, temp1, uscal1, uscal2, xsc
549  INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING
550  LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LSVEC,
551  $ l2aber, l2kill, l2pert, l2rank, l2tran,
552  $ noscal, rowpiv, rsvec, transp
553 * ..
554 * .. Intrinsic Functions ..
555  INTRINSIC abs, conjg, alog, amax1, amin1, cmplx, float,
556  $ max0, min0, nint, sign, sqrt
557 * ..
558 * .. External Functions ..
559  REAL SLAMCH, SCNRM2
560  INTEGER ISAMAX, ICAMAX
561  LOGICAL LSAME
562  EXTERNAL isamax, icamax, lsame, slamch, scnrm2
563 * ..
564 * .. External Subroutines ..
565  EXTERNAL ccopy, cgelqf, cgeqp3, cgeqrf, clacpy, clascl,
568 *
569  EXTERNAL cgesvj
570 * ..
571 *
572 * Test the input arguments
573 *
574  lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
575  jracc = lsame( jobv, 'J' )
576  rsvec = lsame( jobv, 'V' ) .OR. jracc
577  rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
578  l2rank = lsame( joba, 'R' )
579  l2aber = lsame( joba, 'A' )
580  errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
581  l2tran = lsame( jobt, 'T' )
582  l2kill = lsame( jobr, 'R' )
583  defr = lsame( jobr, 'N' )
584  l2pert = lsame( jobp, 'P' )
585 *
586  IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
587  $ errest .OR. lsame( joba, 'C' ) )) THEN
588  info = - 1
589  ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
590  $ lsame( jobu, 'W' )) ) THEN
591  info = - 2
592  ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
593  $ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
594  info = - 3
595  ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
596  info = - 4
597  ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
598  info = - 5
599  ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
600  info = - 6
601  ELSE IF ( m .LT. 0 ) THEN
602  info = - 7
603  ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
604  info = - 8
605  ELSE IF ( lda .LT. m ) THEN
606  info = - 10
607  ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
608  info = - 13
609  ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
610  info = - 15
611  ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
612  $ (lwork .LT. 2*n+1)) .OR.
613  $ (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
614  $ (lwork .LT. n*n+3*n)) .OR.
615  $ (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. 3*n))
616  $ .OR.
617  $ (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. 3*n))
618  $ .OR.
619  $ (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
620  $ (lwork.LT.5*n+2*n*n))
621  $ .OR. (lsvec .AND. rsvec .AND. jracc .AND.
622  $ lwork.LT.4*n+n*n))
623  $ THEN
624  info = - 17
625  ELSE IF ( lrwork.LT. max0(n+2*m,7)) THEN
626  info = -19
627  ELSE
628 * #:)
629  info = 0
630  END IF
631 *
632  IF ( info .NE. 0 ) THEN
633 * #:(
634  CALL xerbla( 'CGEJSV', - info )
635  RETURN
636  END IF
637 *
638 * Quick return for void matrix (Y3K safe)
639 * #:)
640  IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) THEN
641  iwork(1:3) = 0
642  rwork(1:7) = 0
643  RETURN
644  ENDIF
645 *
646 * Determine whether the matrix U should be M x N or M x M
647 *
648  IF ( lsvec ) THEN
649  n1 = n
650  IF ( lsame( jobu, 'F' ) ) n1 = m
651  END IF
652 *
653 * Set numerical parameters
654 *
655 *! NOTE: Make sure SLAMCH() does not fail on the target architecture.
656 *
657  epsln = slamch('Epsilon')
658  sfmin = slamch('SafeMinimum')
659  small = sfmin / epsln
660  big = slamch('O')
661 * BIG = ONE / SFMIN
662 *
663 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
664 *
665 *(!) If necessary, scale SVA() to protect the largest norm from
666 * overflow. It is possible that this scaling pushes the smallest
667 * column norm left from the underflow threshold (extreme case).
668 *
669  scalem = one / sqrt(float(m)*float(n))
670  noscal = .true.
671  goscal = .true.
672  DO 1874 p = 1, n
673  aapp = zero
674  aaqq = one
675  CALL classq( m, a(1,p), 1, aapp, aaqq )
676  IF ( aapp .GT. big ) THEN
677  info = - 9
678  CALL xerbla( 'CGEJSV', -info )
679  RETURN
680  END IF
681  aaqq = sqrt(aaqq)
682  IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
683  sva(p) = aapp * aaqq
684  ELSE
685  noscal = .false.
686  sva(p) = aapp * ( aaqq * scalem )
687  IF ( goscal ) THEN
688  goscal = .false.
689  CALL sscal( p-1, scalem, sva, 1 )
690  END IF
691  END IF
692  1874 CONTINUE
693 *
694  IF ( noscal ) scalem = one
695 *
696  aapp = zero
697  aaqq = big
698  DO 4781 p = 1, n
699  aapp = amax1( aapp, sva(p) )
700  IF ( sva(p) .NE. zero ) aaqq = amin1( aaqq, sva(p) )
701  4781 CONTINUE
702 *
703 * Quick return for zero M x N matrix
704 * #:)
705  IF ( aapp .EQ. zero ) THEN
706  IF ( lsvec ) CALL claset( 'G', m, n1, czero, cone, u, ldu )
707  IF ( rsvec ) CALL claset( 'G', n, n, czero, cone, v, ldv )
708  rwork(1) = one
709  rwork(2) = one
710  IF ( errest ) rwork(3) = one
711  IF ( lsvec .AND. rsvec ) THEN
712  rwork(4) = one
713  rwork(5) = one
714  END IF
715  IF ( l2tran ) THEN
716  rwork(6) = zero
717  rwork(7) = zero
718  END IF
719  iwork(1) = 0
720  iwork(2) = 0
721  iwork(3) = 0
722  RETURN
723  END IF
724 *
725 * Issue warning if denormalized column norms detected. Override the
726 * high relative accuracy request. Issue licence to kill columns
727 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
728 * #:(
729  warning = 0
730  IF ( aaqq .LE. sfmin ) THEN
731  l2rank = .true.
732  l2kill = .true.
733  warning = 1
734  END IF
735 *
736 * Quick return for one-column matrix
737 * #:)
738  IF ( n .EQ. 1 ) THEN
739 *
740  IF ( lsvec ) THEN
741  CALL clascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
742  CALL clacpy( 'A', m, 1, a, lda, u, ldu )
743 * computing all M left singular vectors of the M x 1 matrix
744  IF ( n1 .NE. n ) THEN
745  CALL cgeqrf( m, n, u,ldu, cwork, cwork(n+1),lwork-n,ierr )
746  CALL cungqr( m,n1,1, u,ldu,cwork,cwork(n+1),lwork-n,ierr )
747  CALL ccopy( m, a(1,1), 1, u(1,1), 1 )
748  END IF
749  END IF
750  IF ( rsvec ) THEN
751  v(1,1) = cone
752  END IF
753  IF ( sva(1) .LT. (big*scalem) ) THEN
754  sva(1) = sva(1) / scalem
755  scalem = one
756  END IF
757  rwork(1) = one / scalem
758  rwork(2) = one
759  IF ( sva(1) .NE. zero ) THEN
760  iwork(1) = 1
761  IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
762  iwork(2) = 1
763  ELSE
764  iwork(2) = 0
765  END IF
766  ELSE
767  iwork(1) = 0
768  iwork(2) = 0
769  END IF
770  iwork(3) = 0
771  IF ( errest ) rwork(3) = one
772  IF ( lsvec .AND. rsvec ) THEN
773  rwork(4) = one
774  rwork(5) = one
775  END IF
776  IF ( l2tran ) THEN
777  rwork(6) = zero
778  rwork(7) = zero
779  END IF
780  RETURN
781 *
782  END IF
783 *
784  transp = .false.
785  l2tran = l2tran .AND. ( m .EQ. n )
786 *
787  aatmax = -one
788  aatmin = big
789  IF ( rowpiv .OR. l2tran ) THEN
790 *
791 * Compute the row norms, needed to determine row pivoting sequence
792 * (in the case of heavily row weighted A, row pivoting is strongly
793 * advised) and to collect information needed to compare the
794 * structures of A * A^* and A^* * A (in the case L2TRAN.EQ..TRUE.).
795 *
796  IF ( l2tran ) THEN
797  DO 1950 p = 1, m
798  xsc = zero
799  temp1 = one
800  CALL classq( n, a(p,1), lda, xsc, temp1 )
801 * CLASSQ gets both the ell_2 and the ell_infinity norm
802 * in one pass through the vector
803  rwork(m+n+p) = xsc * scalem
804  rwork(n+p) = xsc * (scalem*sqrt(temp1))
805  aatmax = amax1( aatmax, rwork(n+p) )
806  IF (rwork(n+p) .NE. zero)
807  $ aatmin = amin1(aatmin,rwork(n+p))
808  1950 CONTINUE
809  ELSE
810  DO 1904 p = 1, m
811  rwork(m+n+p) = scalem*abs( a(p,icamax(n,a(p,1),lda)) )
812  aatmax = amax1( aatmax, rwork(m+n+p) )
813  aatmin = amin1( aatmin, rwork(m+n+p) )
814  1904 CONTINUE
815  END IF
816 *
817  END IF
818 *
819 * For square matrix A try to determine whether A^* would be better
820 * input for the preconditioned Jacobi SVD, with faster convergence.
821 * The decision is based on an O(N) function of the vector of column
822 * and row norms of A, based on the Shannon entropy. This should give
823 * the right choice in most cases when the difference actually matters.
824 * It may fail and pick the slower converging side.
825 *
826  entra = zero
827  entrat = zero
828  IF ( l2tran ) THEN
829 *
830  xsc = zero
831  temp1 = one
832  CALL slassq( n, sva, 1, xsc, temp1 )
833  temp1 = one / temp1
834 *
835  entra = zero
836  DO 1113 p = 1, n
837  big1 = ( ( sva(p) / xsc )**2 ) * temp1
838  IF ( big1 .NE. zero ) entra = entra + big1 * alog(big1)
839  1113 CONTINUE
840  entra = - entra / alog(float(n))
841 *
842 * Now, SVA().^2/Trace(A^* * A) is a point in the probability simplex.
843 * It is derived from the diagonal of A^* * A. Do the same with the
844 * diagonal of A * A^*, compute the entropy of the corresponding
845 * probability distribution. Note that A * A^* and A^* * A have the
846 * same trace.
847 *
848  entrat = zero
849  DO 1114 p = n+1, n+m
850  big1 = ( ( rwork(p) / xsc )**2 ) * temp1
851  IF ( big1 .NE. zero ) entrat = entrat + big1 * alog(big1)
852  1114 CONTINUE
853  entrat = - entrat / alog(float(m))
854 *
855 * Analyze the entropies and decide A or A^*. Smaller entropy
856 * usually means better input for the algorithm.
857 *
858  transp = ( entrat .LT. entra )
859  transp = .true.
860 *
861 * If A^* is better than A, take the adjoint of A.
862 *
863  IF ( transp ) THEN
864 * In an optimal implementation, this trivial transpose
865 * should be replaced with faster transpose.
866  DO 1115 p = 1, n - 1
867  a(p,p) = conjg(a(p,p))
868  DO 1116 q = p + 1, n
869  ctemp = conjg(a(q,p))
870  a(q,p) = conjg(a(p,q))
871  a(p,q) = ctemp
872  1116 CONTINUE
873  1115 CONTINUE
874  a(n,n) = conjg(a(n,n))
875  DO 1117 p = 1, n
876  rwork(m+n+p) = sva(p)
877  sva(p) = rwork(n+p)
878 * previously computed row 2-norms are now column 2-norms
879 * of the transposed matrix
880  1117 CONTINUE
881  temp1 = aapp
882  aapp = aatmax
883  aatmax = temp1
884  temp1 = aaqq
885  aaqq = aatmin
886  aatmin = temp1
887  kill = lsvec
888  lsvec = rsvec
889  rsvec = kill
890  IF ( lsvec ) n1 = n
891 *
892  rowpiv = .true.
893  END IF
894 *
895  END IF
896 * END IF L2TRAN
897 *
898 * Scale the matrix so that its maximal singular value remains less
899 * than SQRT(BIG) -- the matrix is scaled so that its maximal column
900 * has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
901 * SQRT(BIG) instead of BIG is the fact that CGEJSV uses LAPACK and
902 * BLAS routines that, in some implementations, are not capable of
903 * working in the full interval [SFMIN,BIG] and that they may provoke
904 * overflows in the intermediate results. If the singular values spread
905 * from SFMIN to BIG, then CGESVJ will compute them. So, in that case,
906 * one should use CGESVJ instead of CGEJSV.
907 *
908  big1 = sqrt( big )
909  temp1 = sqrt( big / float(n) )
910 *
911  CALL slascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
912  IF ( aaqq .GT. (aapp * sfmin) ) THEN
913  aaqq = ( aaqq / aapp ) * temp1
914  ELSE
915  aaqq = ( aaqq * temp1 ) / aapp
916  END IF
917  temp1 = temp1 * scalem
918  CALL clascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
919 *
920 * To undo scaling at the end of this procedure, multiply the
921 * computed singular values with USCAL2 / USCAL1.
922 *
923  uscal1 = temp1
924  uscal2 = aapp
925 *
926  IF ( l2kill ) THEN
927 * L2KILL enforces computation of nonzero singular values in
928 * the restricted range of condition number of the initial A,
929 * sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
930  xsc = sqrt( sfmin )
931  ELSE
932  xsc = small
933 *
934 * Now, if the condition number of A is too big,
935 * sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
936 * as a precaution measure, the full SVD is computed using CGESVJ
937 * with accumulated Jacobi rotations. This provides numerically
938 * more robust computation, at the cost of slightly increased run
939 * time. Depending on the concrete implementation of BLAS and LAPACK
940 * (i.e. how they behave in presence of extreme ill-conditioning) the
941 * implementor may decide to remove this switch.
942  IF ( ( aaqq.LT.sqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
943  jracc = .true.
944  END IF
945 *
946  END IF
947  IF ( aaqq .LT. xsc ) THEN
948  DO 700 p = 1, n
949  IF ( sva(p) .LT. xsc ) THEN
950  CALL claset( 'A', m, 1, czero, czero, a(1,p), lda )
951  sva(p) = zero
952  END IF
953  700 CONTINUE
954  END IF
955 *
956 * Preconditioning using QR factorization with pivoting
957 *
958  IF ( rowpiv ) THEN
959 * Optional row permutation (Bjoerck row pivoting):
960 * A result by Cox and Higham shows that the Bjoerck's
961 * row pivoting combined with standard column pivoting
962 * has similar effect as Powell-Reid complete pivoting.
963 * The ell-infinity norms of A are made nonincreasing.
964  DO 1952 p = 1, m - 1
965  q = isamax( m-p+1, rwork(m+n+p), 1 ) + p - 1
966  iwork(2*n+p) = q
967  IF ( p .NE. q ) THEN
968  temp1 = rwork(m+n+p)
969  rwork(m+n+p) = rwork(m+n+q)
970  rwork(m+n+q) = temp1
971  END IF
972  1952 CONTINUE
973  CALL claswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
974  END IF
975 *
976 * End of the preparation phase (scaling, optional sorting and
977 * transposing, optional flushing of small columns).
978 *
979 * Preconditioning
980 *
981 * If the full SVD is needed, the right singular vectors are computed
982 * from a matrix equation, and for that we need theoretical analysis
983 * of the Businger-Golub pivoting. So we use CGEQP3 as the first RR QRF.
984 * In all other cases the first RR QRF can be chosen by other criteria
985 * (eg speed by replacing global with restricted window pivoting, such
986 * as in xGEQPX from TOMS # 782). Good results will be obtained using
987 * xGEQPX with properly (!) chosen numerical parameters.
988 * Any improvement of CGEQP3 improves overal performance of CGEJSV.
989 *
990 * A * P1 = Q1 * [ R1^* 0]^*:
991  DO 1963 p = 1, n
992 * .. all columns are free columns
993  iwork(p) = 0
994  1963 CONTINUE
995  CALL cgeqp3( m, n, a, lda, iwork, cwork, cwork(n+1), lwork-n,
996  $ rwork, ierr )
997 *
998 * The upper triangular matrix R1 from the first QRF is inspected for
999 * rank deficiency and possibilities for deflation, or possible
1000 * ill-conditioning. Depending on the user specified flag L2RANK,
1001 * the procedure explores possibilities to reduce the numerical
1002 * rank by inspecting the computed upper triangular factor. If
1003 * L2RANK or L2ABER are up, then CGEJSV will compute the SVD of
1004 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
1005 *
1006  nr = 1
1007  IF ( l2aber ) THEN
1008 * Standard absolute error bound suffices. All sigma_i with
1009 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
1010 * agressive enforcement of lower numerical rank by introducing a
1011 * backward error of the order of N*EPSLN*||A||.
1012  temp1 = sqrt(float(n))*epsln
1013  DO 3001 p = 2, n
1014  IF ( abs(a(p,p)) .GE. (temp1*abs(a(1,1))) ) THEN
1015  nr = nr + 1
1016  ELSE
1017  GO TO 3002
1018  END IF
1019  3001 CONTINUE
1020  3002 CONTINUE
1021  ELSE IF ( l2rank ) THEN
1022 * .. similarly as above, only slightly more gentle (less agressive).
1023 * Sudden drop on the diagonal of R1 is used as the criterion for
1024 * close-to-rank-defficient.
1025  temp1 = sqrt(sfmin)
1026  DO 3401 p = 2, n
1027  IF ( ( abs(a(p,p)) .LT. (epsln*abs(a(p-1,p-1))) ) .OR.
1028  $ ( abs(a(p,p)) .LT. small ) .OR.
1029  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3402
1030  nr = nr + 1
1031  3401 CONTINUE
1032  3402 CONTINUE
1033 *
1034  ELSE
1035 * The goal is high relative accuracy. However, if the matrix
1036 * has high scaled condition number the relative accuracy is in
1037 * general not feasible. Later on, a condition number estimator
1038 * will be deployed to estimate the scaled condition number.
1039 * Here we just remove the underflowed part of the triangular
1040 * factor. This prevents the situation in which the code is
1041 * working hard to get the accuracy not warranted by the data.
1042  temp1 = sqrt(sfmin)
1043  DO 3301 p = 2, n
1044  IF ( ( abs(a(p,p)) .LT. small ) .OR.
1045  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3302
1046  nr = nr + 1
1047  3301 CONTINUE
1048  3302 CONTINUE
1049 *
1050  END IF
1051 *
1052  almort = .false.
1053  IF ( nr .EQ. n ) THEN
1054  maxprj = one
1055  DO 3051 p = 2, n
1056  temp1 = abs(a(p,p)) / sva(iwork(p))
1057  maxprj = amin1( maxprj, temp1 )
1058  3051 CONTINUE
1059  IF ( maxprj**2 .GE. one - float(n)*epsln ) almort = .true.
1060  END IF
1061 *
1062 *
1063  sconda = - one
1064  condr1 = - one
1065  condr2 = - one
1066 *
1067  IF ( errest ) THEN
1068  IF ( n .EQ. nr ) THEN
1069  IF ( rsvec ) THEN
1070 * .. V is available as workspace
1071  CALL clacpy( 'U', n, n, a, lda, v, ldv )
1072  DO 3053 p = 1, n
1073  temp1 = sva(iwork(p))
1074  CALL csscal( p, one/temp1, v(1,p), 1 )
1075  3053 CONTINUE
1076  CALL cpocon( 'U', n, v, ldv, one, temp1,
1077  $ cwork(n+1), rwork, ierr )
1078 *
1079  ELSE IF ( lsvec ) THEN
1080 * .. U is available as workspace
1081  CALL clacpy( 'U', n, n, a, lda, u, ldu )
1082  DO 3054 p = 1, n
1083  temp1 = sva(iwork(p))
1084  CALL csscal( p, one/temp1, u(1,p), 1 )
1085  3054 CONTINUE
1086  CALL cpocon( 'U', n, u, ldu, one, temp1,
1087  $ cwork(n+1), rwork, ierr )
1088  ELSE
1089  CALL clacpy( 'U', n, n, a, lda, cwork(n+1), n )
1090  DO 3052 p = 1, n
1091  temp1 = sva(iwork(p))
1092  CALL csscal( p, one/temp1, cwork(n+(p-1)*n+1), 1 )
1093  3052 CONTINUE
1094 * .. the columns of R are scaled to have unit Euclidean lengths.
1095  CALL cpocon( 'U', n, cwork(n+1), n, one, temp1,
1096  $ cwork(n+n*n+1), rwork, ierr )
1097 *
1098  END IF
1099  sconda = one / sqrt(temp1)
1100 * SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
1101 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1102  ELSE
1103  sconda = - one
1104  END IF
1105  END IF
1106 *
1107  l2pert = l2pert .AND. ( abs( a(1,1)/a(nr,nr) ) .GT. sqrt(big1) )
1108 * If there is no violent scaling, artificial perturbation is not needed.
1109 *
1110 * Phase 3:
1111 *
1112  IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1113 *
1114 * Singular Values only
1115 *
1116 * .. transpose A(1:NR,1:N)
1117  DO 1946 p = 1, min0( n-1, nr )
1118  CALL ccopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1119  CALL clacgv( n-p+1, a(p,p), 1 )
1120  1946 CONTINUE
1121  IF ( nr .EQ. n ) a(n,n) = conjg(a(n,n))
1122 *
1123 * The following two DO-loops introduce small relative perturbation
1124 * into the strict upper triangle of the lower triangular matrix.
1125 * Small entries below the main diagonal are also changed.
1126 * This modification is useful if the computing environment does not
1127 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1128 * annoying denormalized numbers in case of strongly scaled matrices.
1129 * The perturbation is structured so that it does not introduce any
1130 * new perturbation of the singular values, and it does not destroy
1131 * the job done by the preconditioner.
1132 * The licence for this perturbation is in the variable L2PERT, which
1133 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1134 *
1135  IF ( .NOT. almort ) THEN
1136 *
1137  IF ( l2pert ) THEN
1138 * XSC = SQRT(SMALL)
1139  xsc = epsln / float(n)
1140  DO 4947 q = 1, nr
1141  ctemp = cmplx(xsc*abs(a(q,q)),zero)
1142  DO 4949 p = 1, n
1143  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1144  $ .OR. ( p .LT. q ) )
1145 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1146  $ a(p,q) = ctemp
1147  4949 CONTINUE
1148  4947 CONTINUE
1149  ELSE
1150  CALL claset( 'U', nr-1,nr-1, czero,czero, a(1,2),lda )
1151  END IF
1152 *
1153 * .. second preconditioning using the QR factorization
1154 *
1155  CALL cgeqrf( n,nr, a,lda, cwork, cwork(n+1),lwork-n, ierr )
1156 *
1157 * .. and transpose upper to lower triangular
1158  DO 1948 p = 1, nr - 1
1159  CALL ccopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1160  CALL clacgv( nr-p+1, a(p,p), 1 )
1161  1948 CONTINUE
1162 *
1163  END IF
1164 *
1165 * Row-cyclic Jacobi SVD algorithm with column pivoting
1166 *
1167 * .. again some perturbation (a "background noise") is added
1168 * to drown denormals
1169  IF ( l2pert ) THEN
1170 * XSC = SQRT(SMALL)
1171  xsc = epsln / float(n)
1172  DO 1947 q = 1, nr
1173  ctemp = cmplx(xsc*abs(a(q,q)),zero)
1174  DO 1949 p = 1, nr
1175  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1176  $ .OR. ( p .LT. q ) )
1177 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1178  $ a(p,q) = ctemp
1179  1949 CONTINUE
1180  1947 CONTINUE
1181  ELSE
1182  CALL claset( 'U', nr-1, nr-1, czero, czero, a(1,2), lda )
1183  END IF
1184 *
1185 * .. and one-sided Jacobi rotations are started on a lower
1186 * triangular matrix (plus perturbation which is ignored in
1187 * the part which destroys triangular form (confusing?!))
1188 *
1189  CALL cgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1190  $ n, v, ldv, cwork, lwork, rwork, lrwork, info )
1191 *
1192  scalem = rwork(1)
1193  numrank = nint(rwork(2))
1194 *
1195 *
1196  ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1197 *
1198 * -> Singular Values and Right Singular Vectors <-
1199 *
1200  IF ( almort ) THEN
1201 *
1202 * .. in this case NR equals N
1203  DO 1998 p = 1, nr
1204  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1205  CALL clacgv( n-p+1, v(p,p), 1 )
1206  1998 CONTINUE
1207  CALL claset( 'Upper', nr-1,nr-1, czero, czero, v(1,2), ldv )
1208 *
1209  CALL cgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1210  $ cwork, lwork, rwork, lrwork, info )
1211  scalem = rwork(1)
1212  numrank = nint(rwork(2))
1213 
1214  ELSE
1215 *
1216 * .. two more QR factorizations ( one QRF is not enough, two require
1217 * accumulated product of Jacobi rotations, three are perfect )
1218 *
1219  CALL claset( 'Lower', nr-1,nr-1, czero, czero, a(2,1), lda )
1220  CALL cgelqf( nr,n, a, lda, cwork, cwork(n+1), lwork-n, ierr)
1221  CALL clacpy( 'Lower', nr, nr, a, lda, v, ldv )
1222  CALL claset( 'Upper', nr-1,nr-1, czero, czero, v(1,2), ldv )
1223  CALL cgeqrf( nr, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1224  $ lwork-2*n, ierr )
1225  DO 8998 p = 1, nr
1226  CALL ccopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1227  CALL clacgv( nr-p+1, v(p,p), 1 )
1228  8998 CONTINUE
1229  CALL claset('Upper', nr-1, nr-1, czero, czero, v(1,2), ldv)
1230 *
1231  CALL cgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1232  $ ldu, cwork(n+1), lwork-n, rwork, lrwork, info )
1233  scalem = rwork(1)
1234  numrank = nint(rwork(2))
1235  IF ( nr .LT. n ) THEN
1236  CALL claset( 'A',n-nr, nr, czero,czero, v(nr+1,1), ldv )
1237  CALL claset( 'A',nr, n-nr, czero,czero, v(1,nr+1), ldv )
1238  CALL claset( 'A',n-nr,n-nr,czero,cone, v(nr+1,nr+1),ldv )
1239  END IF
1240 *
1241  CALL cunmlq( 'Left', 'C', n, n, nr, a, lda, cwork,
1242  $ v, ldv, cwork(n+1), lwork-n, ierr )
1243 *
1244  END IF
1245 *
1246  DO 8991 p = 1, n
1247  CALL ccopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1248  8991 CONTINUE
1249  CALL clacpy( 'All', n, n, a, lda, v, ldv )
1250 *
1251  IF ( transp ) THEN
1252  CALL clacpy( 'All', n, n, v, ldv, u, ldu )
1253  END IF
1254 *
1255  ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1256 *
1257 * .. Singular Values and Left Singular Vectors ..
1258 *
1259 * .. second preconditioning step to avoid need to accumulate
1260 * Jacobi rotations in the Jacobi iterations.
1261  DO 1965 p = 1, nr
1262  CALL ccopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1263  CALL clacgv( n-p+1, u(p,p), 1 )
1264  1965 CONTINUE
1265  CALL claset( 'Upper', nr-1, nr-1, czero, czero, u(1,2), ldu )
1266 *
1267  CALL cgeqrf( n, nr, u, ldu, cwork(n+1), cwork(2*n+1),
1268  $ lwork-2*n, ierr )
1269 *
1270  DO 1967 p = 1, nr - 1
1271  CALL ccopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1272  CALL clacgv( n-p+1, u(p,p), 1 )
1273  1967 CONTINUE
1274  CALL claset( 'Upper', nr-1, nr-1, czero, czero, u(1,2), ldu )
1275 *
1276  CALL cgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1277  $ lda, cwork(n+1), lwork-n, rwork, lrwork, info )
1278  scalem = rwork(1)
1279  numrank = nint(rwork(2))
1280 *
1281  IF ( nr .LT. m ) THEN
1282  CALL claset( 'A', m-nr, nr,czero, czero, u(nr+1,1), ldu )
1283  IF ( nr .LT. n1 ) THEN
1284  CALL claset( 'A',nr, n1-nr, czero, czero, u(1,nr+1),ldu )
1285  CALL claset( 'A',m-nr,n1-nr,czero,cone,u(nr+1,nr+1),ldu )
1286  END IF
1287  END IF
1288 *
1289  CALL cunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1290  $ ldu, cwork(n+1), lwork-n, ierr )
1291 *
1292  IF ( rowpiv )
1293  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1294 *
1295  DO 1974 p = 1, n1
1296  xsc = one / scnrm2( m, u(1,p), 1 )
1297  CALL csscal( m, xsc, u(1,p), 1 )
1298  1974 CONTINUE
1299 *
1300  IF ( transp ) THEN
1301  CALL clacpy( 'All', n, n, u, ldu, v, ldv )
1302  END IF
1303 *
1304  ELSE
1305 *
1306 * .. Full SVD ..
1307 *
1308  IF ( .NOT. jracc ) THEN
1309 *
1310  IF ( .NOT. almort ) THEN
1311 *
1312 * Second Preconditioning Step (QRF [with pivoting])
1313 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1314 * equivalent to an LQF CALL. Since in many libraries the QRF
1315 * seems to be better optimized than the LQF, we do explicit
1316 * transpose and use the QRF. This is subject to changes in an
1317 * optimized implementation of CGEJSV.
1318 *
1319  DO 1968 p = 1, nr
1320  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1321  CALL clacgv( n-p+1, v(p,p), 1 )
1322  1968 CONTINUE
1323 *
1324 * .. the following two loops perturb small entries to avoid
1325 * denormals in the second QR factorization, where they are
1326 * as good as zeros. This is done to avoid painfully slow
1327 * computation with denormals. The relative size of the perturbation
1328 * is a parameter that can be changed by the implementer.
1329 * This perturbation device will be obsolete on machines with
1330 * properly implemented arithmetic.
1331 * To switch it off, set L2PERT=.FALSE. To remove it from the
1332 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1333 * The following two loops should be blocked and fused with the
1334 * transposed copy above.
1335 *
1336  IF ( l2pert ) THEN
1337  xsc = sqrt(small)
1338  DO 2969 q = 1, nr
1339  ctemp = cmplx(xsc*abs( v(q,q) ),zero)
1340  DO 2968 p = 1, n
1341  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1342  $ .OR. ( p .LT. q ) )
1343 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1344  $ v(p,q) = ctemp
1345  IF ( p .LT. q ) v(p,q) = - v(p,q)
1346  2968 CONTINUE
1347  2969 CONTINUE
1348  ELSE
1349  CALL claset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1350  END IF
1351 *
1352 * Estimate the row scaled condition number of R1
1353 * (If R1 is rectangular, N > NR, then the condition number
1354 * of the leading NR x NR submatrix is estimated.)
1355 *
1356  CALL clacpy( 'L', nr, nr, v, ldv, cwork(2*n+1), nr )
1357  DO 3950 p = 1, nr
1358  temp1 = scnrm2(nr-p+1,cwork(2*n+(p-1)*nr+p),1)
1359  CALL csscal(nr-p+1,one/temp1,cwork(2*n+(p-1)*nr+p),1)
1360  3950 CONTINUE
1361  CALL cpocon('Lower',nr,cwork(2*n+1),nr,one,temp1,
1362  $ cwork(2*n+nr*nr+1),rwork,ierr)
1363  condr1 = one / sqrt(temp1)
1364 * .. here need a second oppinion on the condition number
1365 * .. then assume worst case scenario
1366 * R1 is OK for inverse <=> CONDR1 .LT. FLOAT(N)
1367 * more conservative <=> CONDR1 .LT. SQRT(FLOAT(N))
1368 *
1369  cond_ok = sqrt(sqrt(float(nr)))
1370 *[TP] COND_OK is a tuning parameter.
1371 *
1372  IF ( condr1 .LT. cond_ok ) THEN
1373 * .. the second QRF without pivoting. Note: in an optimized
1374 * implementation, this QRF should be implemented as the QRF
1375 * of a lower triangular matrix.
1376 * R1^* = Q2 * R2
1377  CALL cgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1378  $ lwork-2*n, ierr )
1379 *
1380  IF ( l2pert ) THEN
1381  xsc = sqrt(small)/epsln
1382  DO 3959 p = 2, nr
1383  DO 3958 q = 1, p - 1
1384  ctemp=cmplx(xsc*amin1(abs(v(p,p)),abs(v(q,q))),
1385  $ zero)
1386  IF ( abs(v(q,p)) .LE. temp1 )
1387 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1388  $ v(q,p) = ctemp
1389  3958 CONTINUE
1390  3959 CONTINUE
1391  END IF
1392 *
1393  IF ( nr .NE. n )
1394  $ CALL clacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1395 * .. save ...
1396 *
1397 * .. this transposed copy should be better than naive
1398  DO 1969 p = 1, nr - 1
1399  CALL ccopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1400  CALL clacgv(nr-p+1, v(p,p), 1 )
1401  1969 CONTINUE
1402  v(nr,nr)=conjg(v(nr,nr))
1403 *
1404  condr2 = condr1
1405 *
1406  ELSE
1407 *
1408 * .. ill-conditioned case: second QRF with pivoting
1409 * Note that windowed pivoting would be equaly good
1410 * numerically, and more run-time efficient. So, in
1411 * an optimal implementation, the next call to CGEQP3
1412 * should be replaced with eg. CALL CGEQPX (ACM TOMS #782)
1413 * with properly (carefully) chosen parameters.
1414 *
1415 * R1^* * P2 = Q2 * R2
1416  DO 3003 p = 1, nr
1417  iwork(n+p) = 0
1418  3003 CONTINUE
1419  CALL cgeqp3( n, nr, v, ldv, iwork(n+1), cwork(n+1),
1420  $ cwork(2*n+1), lwork-2*n, rwork, ierr )
1421 ** CALL CGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1422 ** $ LWORK-2*N, IERR )
1423  IF ( l2pert ) THEN
1424  xsc = sqrt(small)
1425  DO 3969 p = 2, nr
1426  DO 3968 q = 1, p - 1
1427  ctemp=cmplx(xsc*amin1(abs(v(p,p)),abs(v(q,q))),
1428  $ zero)
1429  IF ( abs(v(q,p)) .LE. temp1 )
1430 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1431  $ v(q,p) = ctemp
1432  3968 CONTINUE
1433  3969 CONTINUE
1434  END IF
1435 *
1436  CALL clacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1437 *
1438  IF ( l2pert ) THEN
1439  xsc = sqrt(small)
1440  DO 8970 p = 2, nr
1441  DO 8971 q = 1, p - 1
1442  ctemp=cmplx(xsc*amin1(abs(v(p,p)),abs(v(q,q))),
1443  $ zero)
1444 * V(p,q) = - TEMP1*( V(q,p) / ABS(V(q,p)) )
1445  v(p,q) = - ctemp
1446  8971 CONTINUE
1447  8970 CONTINUE
1448  ELSE
1449  CALL claset( 'L',nr-1,nr-1,czero,czero,v(2,1),ldv )
1450  END IF
1451 * Now, compute R2 = L3 * Q3, the LQ factorization.
1452  CALL cgelqf( nr, nr, v, ldv, cwork(2*n+n*nr+1),
1453  $ cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1454 * .. and estimate the condition number
1455  CALL clacpy( 'L',nr,nr,v,ldv,cwork(2*n+n*nr+nr+1),nr )
1456  DO 4950 p = 1, nr
1457  temp1 = scnrm2( p, cwork(2*n+n*nr+nr+p), nr )
1458  CALL csscal( p, one/temp1, cwork(2*n+n*nr+nr+p), nr )
1459  4950 CONTINUE
1460  CALL cpocon( 'L',nr,cwork(2*n+n*nr+nr+1),nr,one,temp1,
1461  $ cwork(2*n+n*nr+nr+nr*nr+1),rwork,ierr )
1462  condr2 = one / sqrt(temp1)
1463 *
1464 *
1465  IF ( condr2 .GE. cond_ok ) THEN
1466 * .. save the Householder vectors used for Q3
1467 * (this overwrittes the copy of R2, as it will not be
1468 * needed in this branch, but it does not overwritte the
1469 * Huseholder vectors of Q2.).
1470  CALL clacpy( 'U', nr, nr, v, ldv, cwork(2*n+1), n )
1471 * .. and the rest of the information on Q3 is in
1472 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1473  END IF
1474 *
1475  END IF
1476 *
1477  IF ( l2pert ) THEN
1478  xsc = sqrt(small)
1479  DO 4968 q = 2, nr
1480  ctemp = xsc * v(q,q)
1481  DO 4969 p = 1, q - 1
1482 * V(p,q) = - SIGN( TEMP1, V(q,p) )
1483 * V(p,q) = - TEMP1*( V(p,q) / ABS(V(p,q)) )
1484  v(p,q) = - ctemp
1485  4969 CONTINUE
1486  4968 CONTINUE
1487  ELSE
1488  CALL claset( 'U', nr-1,nr-1, czero,czero, v(1,2), ldv )
1489  END IF
1490 *
1491 * Second preconditioning finished; continue with Jacobi SVD
1492 * The input matrix is lower trinagular.
1493 *
1494 * Recover the right singular vectors as solution of a well
1495 * conditioned triangular matrix equation.
1496 *
1497  IF ( condr1 .LT. cond_ok ) THEN
1498 *
1499  CALL cgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u, ldu,
1500  $ cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,rwork,
1501  $ lrwork, info )
1502  scalem = rwork(1)
1503  numrank = nint(rwork(2))
1504  DO 3970 p = 1, nr
1505  CALL ccopy( nr, v(1,p), 1, u(1,p), 1 )
1506  CALL csscal( nr, sva(p), v(1,p), 1 )
1507  3970 CONTINUE
1508 
1509 * .. pick the right matrix equation and solve it
1510 *
1511  IF ( nr .EQ. n ) THEN
1512 * :)) .. best case, R1 is inverted. The solution of this matrix
1513 * equation is Q2*V2 = the product of the Jacobi rotations
1514 * used in CGESVJ, premultiplied with the orthogonal matrix
1515 * from the second QR factorization.
1516  CALL ctrsm('L','U','N','N', nr,nr,cone, a,lda, v,ldv)
1517  ELSE
1518 * .. R1 is well conditioned, but non-square. Adjoint of R2
1519 * is inverted to get the product of the Jacobi rotations
1520 * used in CGESVJ. The Q-factor from the second QR
1521 * factorization is then built in explicitly.
1522  CALL ctrsm('L','U','C','N',nr,nr,cone,cwork(2*n+1),
1523  $ n,v,ldv)
1524  IF ( nr .LT. n ) THEN
1525  CALL claset('A',n-nr,nr,czero,czero,v(nr+1,1),ldv)
1526  CALL claset('A',nr,n-nr,czero,czero,v(1,nr+1),ldv)
1527  CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1528  END IF
1529  CALL cunmqr('L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1530  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1531  END IF
1532 *
1533  ELSE IF ( condr2 .LT. cond_ok ) THEN
1534 *
1535 * The matrix R2 is inverted. The solution of the matrix equation
1536 * is Q3^* * V3 = the product of the Jacobi rotations (appplied to
1537 * the lower triangular L3 from the LQ factorization of
1538 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1539  CALL cgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1540  $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1541  $ rwork, lrwork, info )
1542  scalem = rwork(1)
1543  numrank = nint(rwork(2))
1544  DO 3870 p = 1, nr
1545  CALL ccopy( nr, v(1,p), 1, u(1,p), 1 )
1546  CALL csscal( nr, sva(p), u(1,p), 1 )
1547  3870 CONTINUE
1548  CALL ctrsm('L','U','N','N',nr,nr,cone,cwork(2*n+1),n,
1549  $ u,ldu)
1550 * .. apply the permutation from the second QR factorization
1551  DO 873 q = 1, nr
1552  DO 872 p = 1, nr
1553  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1554  872 CONTINUE
1555  DO 874 p = 1, nr
1556  u(p,q) = cwork(2*n+n*nr+nr+p)
1557  874 CONTINUE
1558  873 CONTINUE
1559  IF ( nr .LT. n ) THEN
1560  CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1561  CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1562  CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1563  END IF
1564  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1565  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1566  ELSE
1567 * Last line of defense.
1568 * #:( This is a rather pathological case: no scaled condition
1569 * improvement after two pivoted QR factorizations. Other
1570 * possibility is that the rank revealing QR factorization
1571 * or the condition estimator has failed, or the COND_OK
1572 * is set very close to ONE (which is unnecessary). Normally,
1573 * this branch should never be executed, but in rare cases of
1574 * failure of the RRQR or condition estimator, the last line of
1575 * defense ensures that CGEJSV completes the task.
1576 * Compute the full SVD of L3 using CGESVJ with explicit
1577 * accumulation of Jacobi rotations.
1578  CALL cgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1579  $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1580  $ rwork, lrwork, info )
1581  scalem = rwork(1)
1582  numrank = nint(rwork(2))
1583  IF ( nr .LT. n ) THEN
1584  CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1585  CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1586  CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1587  END IF
1588  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1589  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1590 *
1591  CALL cunmlq( 'L', 'C', nr, nr, nr, cwork(2*n+1), n,
1592  $ cwork(2*n+n*nr+1), u, ldu, cwork(2*n+n*nr+nr+1),
1593  $ lwork-2*n-n*nr-nr, ierr )
1594  DO 773 q = 1, nr
1595  DO 772 p = 1, nr
1596  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1597  772 CONTINUE
1598  DO 774 p = 1, nr
1599  u(p,q) = cwork(2*n+n*nr+nr+p)
1600  774 CONTINUE
1601  773 CONTINUE
1602 *
1603  END IF
1604 *
1605 * Permute the rows of V using the (column) permutation from the
1606 * first QRF. Also, scale the columns to make them unit in
1607 * Euclidean norm. This applies to all cases.
1608 *
1609  temp1 = sqrt(float(n)) * epsln
1610  DO 1972 q = 1, n
1611  DO 972 p = 1, n
1612  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1613  972 CONTINUE
1614  DO 973 p = 1, n
1615  v(p,q) = cwork(2*n+n*nr+nr+p)
1616  973 CONTINUE
1617  xsc = one / scnrm2( n, v(1,q), 1 )
1618  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1619  $ CALL csscal( n, xsc, v(1,q), 1 )
1620  1972 CONTINUE
1621 * At this moment, V contains the right singular vectors of A.
1622 * Next, assemble the left singular vector matrix U (M x N).
1623  IF ( nr .LT. m ) THEN
1624  CALL claset('A', m-nr, nr, czero, czero, u(nr+1,1), ldu)
1625  IF ( nr .LT. n1 ) THEN
1626  CALL claset('A',nr,n1-nr,czero,czero,u(1,nr+1),ldu)
1627  CALL claset('A',m-nr,n1-nr,czero,cone,
1628  $ u(nr+1,nr+1),ldu)
1629  END IF
1630  END IF
1631 *
1632 * The Q matrix from the first QRF is built into the left singular
1633 * matrix U. This applies to all cases.
1634 *
1635  CALL cunmqr( 'Left', 'No_Tr', m, n1, n, a, lda, cwork, u,
1636  $ ldu, cwork(n+1), lwork-n, ierr )
1637 
1638 * The columns of U are normalized. The cost is O(M*N) flops.
1639  temp1 = sqrt(float(m)) * epsln
1640  DO 1973 p = 1, nr
1641  xsc = one / scnrm2( m, u(1,p), 1 )
1642  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1643  $ CALL csscal( m, xsc, u(1,p), 1 )
1644  1973 CONTINUE
1645 *
1646 * If the initial QRF is computed with row pivoting, the left
1647 * singular vectors must be adjusted.
1648 *
1649  IF ( rowpiv )
1650  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1651 *
1652  ELSE
1653 *
1654 * .. the initial matrix A has almost orthogonal columns and
1655 * the second QRF is not needed
1656 *
1657  CALL clacpy( 'Upper', n, n, a, lda, cwork(n+1), n )
1658  IF ( l2pert ) THEN
1659  xsc = sqrt(small)
1660  DO 5970 p = 2, n
1661  ctemp = xsc * cwork( n + (p-1)*n + p )
1662  DO 5971 q = 1, p - 1
1663 * CWORK(N+(q-1)*N+p)=-TEMP1 * ( CWORK(N+(p-1)*N+q) /
1664 * $ ABS(CWORK(N+(p-1)*N+q)) )
1665  cwork(n+(q-1)*n+p)=-ctemp
1666  5971 CONTINUE
1667  5970 CONTINUE
1668  ELSE
1669  CALL claset( 'Lower',n-1,n-1,czero,czero,cwork(n+2),n )
1670  END IF
1671 *
1672  CALL cgesvj( 'Upper', 'U', 'N', n, n, cwork(n+1), n, sva,
1673  $ n, u, ldu, cwork(n+n*n+1), lwork-n-n*n, rwork, lrwork,
1674  $ info )
1675 *
1676  scalem = rwork(1)
1677  numrank = nint(rwork(2))
1678  DO 6970 p = 1, n
1679  CALL ccopy( n, cwork(n+(p-1)*n+1), 1, u(1,p), 1 )
1680  CALL csscal( n, sva(p), cwork(n+(p-1)*n+1), 1 )
1681  6970 CONTINUE
1682 *
1683  CALL ctrsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1684  $ cone, a, lda, cwork(n+1), n )
1685  DO 6972 p = 1, n
1686  CALL ccopy( n, cwork(n+p), n, v(iwork(p),1), ldv )
1687  6972 CONTINUE
1688  temp1 = sqrt(float(n))*epsln
1689  DO 6971 p = 1, n
1690  xsc = one / scnrm2( n, v(1,p), 1 )
1691  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1692  $ CALL csscal( n, xsc, v(1,p), 1 )
1693  6971 CONTINUE
1694 *
1695 * Assemble the left singular vector matrix U (M x N).
1696 *
1697  IF ( n .LT. m ) THEN
1698  CALL claset( 'A', m-n, n, czero, czero, u(n+1,1), ldu )
1699  IF ( n .LT. n1 ) THEN
1700  CALL claset('A',n, n1-n, czero, czero, u(1,n+1),ldu)
1701  CALL claset( 'A',m-n,n1-n, czero, cone,u(n+1,n+1),ldu)
1702  END IF
1703  END IF
1704  CALL cunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1705  $ ldu, cwork(n+1), lwork-n, ierr )
1706  temp1 = sqrt(float(m))*epsln
1707  DO 6973 p = 1, n1
1708  xsc = one / scnrm2( m, u(1,p), 1 )
1709  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1710  $ CALL csscal( m, xsc, u(1,p), 1 )
1711  6973 CONTINUE
1712 *
1713  IF ( rowpiv )
1714  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1715 *
1716  END IF
1717 *
1718 * end of the >> almost orthogonal case << in the full SVD
1719 *
1720  ELSE
1721 *
1722 * This branch deploys a preconditioned Jacobi SVD with explicitly
1723 * accumulated rotations. It is included as optional, mainly for
1724 * experimental purposes. It does perfom well, and can also be used.
1725 * In this implementation, this branch will be automatically activated
1726 * if the condition number sigma_max(A) / sigma_min(A) is predicted
1727 * to be greater than the overflow threshold. This is because the
1728 * a posteriori computation of the singular vectors assumes robust
1729 * implementation of BLAS and some LAPACK procedures, capable of working
1730 * in presence of extreme values. Since that is not always the case, ...
1731 *
1732  DO 7968 p = 1, nr
1733  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1734  CALL clacgv( n-p+1, v(p,p), 1 )
1735  7968 CONTINUE
1736 *
1737  IF ( l2pert ) THEN
1738  xsc = sqrt(small/epsln)
1739  DO 5969 q = 1, nr
1740  ctemp = cmplx(xsc*abs( v(q,q) ),zero)
1741  DO 5968 p = 1, n
1742  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1743  $ .OR. ( p .LT. q ) )
1744 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1745  $ v(p,q) = ctemp
1746  IF ( p .LT. q ) v(p,q) = - v(p,q)
1747  5968 CONTINUE
1748  5969 CONTINUE
1749  ELSE
1750  CALL claset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1751  END IF
1752 
1753  CALL cgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1754  $ lwork-2*n, ierr )
1755  CALL clacpy( 'L', n, nr, v, ldv, cwork(2*n+1), n )
1756 *
1757  DO 7969 p = 1, nr
1758  CALL ccopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1759  CALL clacgv( nr-p+1, u(p,p), 1 )
1760  7969 CONTINUE
1761 
1762  IF ( l2pert ) THEN
1763  xsc = sqrt(small/epsln)
1764  DO 9970 q = 2, nr
1765  DO 9971 p = 1, q - 1
1766  ctemp = cmplx(xsc * amin1(abs(u(p,p)),abs(u(q,q))),
1767  $ zero)
1768 * U(p,q) = - TEMP1 * ( U(q,p) / ABS(U(q,p)) )
1769  u(p,q) = - ctemp
1770  9971 CONTINUE
1771  9970 CONTINUE
1772  ELSE
1773  CALL claset('U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1774  END IF
1775 
1776  CALL cgesvj( 'L', 'U', 'V', nr, nr, u, ldu, sva,
1777  $ n, v, ldv, cwork(2*n+n*nr+1), lwork-2*n-n*nr,
1778  $ rwork, lrwork, info )
1779  scalem = rwork(1)
1780  numrank = nint(rwork(2))
1781 
1782  IF ( nr .LT. n ) THEN
1783  CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1784  CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1785  CALL claset( 'A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv )
1786  END IF
1787 
1788  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1789  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1790 *
1791 * Permute the rows of V using the (column) permutation from the
1792 * first QRF. Also, scale the columns to make them unit in
1793 * Euclidean norm. This applies to all cases.
1794 *
1795  temp1 = sqrt(float(n)) * epsln
1796  DO 7972 q = 1, n
1797  DO 8972 p = 1, n
1798  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1799  8972 CONTINUE
1800  DO 8973 p = 1, n
1801  v(p,q) = cwork(2*n+n*nr+nr+p)
1802  8973 CONTINUE
1803  xsc = one / scnrm2( n, v(1,q), 1 )
1804  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1805  $ CALL csscal( n, xsc, v(1,q), 1 )
1806  7972 CONTINUE
1807 *
1808 * At this moment, V contains the right singular vectors of A.
1809 * Next, assemble the left singular vector matrix U (M x N).
1810 *
1811  IF ( nr .LT. m ) THEN
1812  CALL claset( 'A', m-nr, nr, czero, czero, u(nr+1,1), ldu )
1813  IF ( nr .LT. n1 ) THEN
1814  CALL claset('A',nr, n1-nr, czero, czero, u(1,nr+1),ldu)
1815  CALL claset('A',m-nr,n1-nr, czero, cone,u(nr+1,nr+1),ldu)
1816  END IF
1817  END IF
1818 *
1819  CALL cunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1820  $ ldu, cwork(n+1), lwork-n, ierr )
1821 *
1822  IF ( rowpiv )
1823  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1824 *
1825 *
1826  END IF
1827  IF ( transp ) THEN
1828 * .. swap U and V because the procedure worked on A^*
1829  DO 6974 p = 1, n
1830  CALL cswap( n, u(1,p), 1, v(1,p), 1 )
1831  6974 CONTINUE
1832  END IF
1833 *
1834  END IF
1835 * end of the full SVD
1836 *
1837 * Undo scaling, if necessary (and possible)
1838 *
1839  IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1840  CALL slascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
1841  uscal1 = one
1842  uscal2 = one
1843  END IF
1844 *
1845  IF ( nr .LT. n ) THEN
1846  DO 3004 p = nr+1, n
1847  sva(p) = zero
1848  3004 CONTINUE
1849  END IF
1850 *
1851  rwork(1) = uscal2 * scalem
1852  rwork(2) = uscal1
1853  IF ( errest ) rwork(3) = sconda
1854  IF ( lsvec .AND. rsvec ) THEN
1855  rwork(4) = condr1
1856  rwork(5) = condr2
1857  END IF
1858  IF ( l2tran ) THEN
1859  rwork(6) = entra
1860  rwork(7) = entrat
1861  END IF
1862 *
1863  iwork(1) = nr
1864  iwork(2) = numrank
1865  iwork(3) = warning
1866 *
1867  RETURN
1868 * ..
1869 * .. END OF CGEJSV
1870 * ..
1871  END
1872 *
subroutine slassq(N, X, INCX, SCALE, SUMSQ)
SLASSQ updates a sum of squares represented in scaled form.
Definition: slassq.f:105
subroutine clascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
CLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: clascl.f:145
subroutine slascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: slascl.f:145
subroutine classq(N, X, INCX, SCALE, SUMSQ)
CLASSQ updates a sum of squares represented in scaled form.
Definition: classq.f:108
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine cpocon(UPLO, N, A, LDA, ANORM, RCOND, WORK, RWORK, INFO)
CPOCON
Definition: cpocon.f:123
subroutine ctrsm(SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
CTRSM
Definition: ctrsm.f:182
subroutine cunmqr(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
CUNMQR
Definition: cunmqr.f:170
subroutine cgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
CGELQF
Definition: cgelqf.f:137
subroutine claset(UPLO, M, N, ALPHA, BETA, A, LDA)
CLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: claset.f:108
subroutine cgeqp3(M, N, A, LDA, JPVT, TAU, WORK, LWORK, RWORK, INFO)
CGEQP3
Definition: cgeqp3.f:161
subroutine cgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
CGEQRF
Definition: cgeqrf.f:138
subroutine cgejsv(JOBA, JOBU, JOBV, JOBR, JOBT, JOBP, M, N, A, LDA, SVA, U, LDU, V, LDV, CWORK, LWORK, RWORK, LRWORK, IWORK, INFO)
CGEJSV
Definition: cgejsv.f:519
subroutine cgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, CWORK, LWORK, RWORK, LRWORK, INFO)
CGESVJ
Definition: cgesvj.f:332
subroutine clacpy(UPLO, M, N, A, LDA, B, LDB)
CLACPY copies all or part of one two-dimensional array to another.
Definition: clacpy.f:105
subroutine sscal(N, SA, SX, INCX)
SSCAL
Definition: sscal.f:55
subroutine cunmlq(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
CUNMLQ
Definition: cunmlq.f:170
subroutine ccopy(N, CX, INCX, CY, INCY)
CCOPY
Definition: ccopy.f:52
subroutine clacgv(N, X, INCX)
CLACGV conjugates a complex vector.
Definition: clacgv.f:76
subroutine cswap(N, CX, INCX, CY, INCY)
CSWAP
Definition: cswap.f:52
subroutine cungqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
CUNGQR
Definition: cungqr.f:130
subroutine claswp(N, A, LDA, K1, K2, IPIV, INCX)
CLASWP performs a series of row interchanges on a general rectangular matrix.
Definition: claswp.f:116
subroutine csscal(N, SA, CX, INCX)
CSSCAL
Definition: csscal.f:54