The templates are ready for immediate use or as a departure point for
generalization, e.g., to problems over multiple variables with
orthogonality constraints, or for code optimizations.
In the simplest mode, the user needs only to supply a function to
minimize in F.m (and, optionally, its first and second derivatives in
dF.m and ddF.m) and an initial guess Y0. The calling sequence is then a
single call to sg_min (named in honor of Stiefel and Grassmann):

    [fopt, yopt] = sg_min(Y0)
For example, if the function F.m has the form

    function f = F(Y)
      f = trace( Y' * diag(1:10) * Y * diag(1:3) );

we can call sg_min(rand(10,3)), which specifies a random starting point.
We strongly recommend also providing first derivative information:

    function df = dF(Y)
      df = 2 * diag(1:10) * Y * diag(1:3);

The code can fall back on finite differences, but that is very slow and
can be numerically problematic. Second derivative information can also
be provided by the user (this is not nearly as crucial for speed as
providing first derivative information, but may improve accuracy):
    function ddf = ddF(Y,H)
      ddf = 2 * diag(1:10) * H * diag(1:3);

A sample test call, where F is known to have optimal value 10, is
    >> rand('state',0);   % initialize random number generator
    >> fmin = sg_min(rand(10,3))
    iter    grad            F(Y)            flops     step type
    0       1.877773e+01    3.132748e+01    2470      none
    invdgrad: Hessian not positive definite, CG terminating early
    1       1.342879e+01    2.011465e+01    163639    Newton step
    invdgrad: Hessian not positive definite, CG terminating early
    2       1.130915e+01    1.368044e+01    344198    Newton step
    invdgrad: Hessian not positive definite, CG terminating early
    3       5.974819e+00    1.063045e+01    515919    Newton step
    invdgrad: max iterations reached inverting the Hessian by CG
    4       1.135417e+00    1.006835e+01    789520    Newton step
    5       5.526359e-02    1.000009e+01    1049594   Newton step
    6       5.072118e-05    1.000000e+01    1337540   Newton step
    7       1.276025e-06    1.000000e+01    1762455   Newton step

    fmin =

       10.0000
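Before trusting such a run, it can be worth checking dF.m against a
finite-difference approximation of F.m. The following is a minimal
sketch of such a check (the probe direction H, step size h, and
variable names are ours, not part of the package):

    % Compare the analytic derivative dF against a central difference
    % of F along a random direction H (real case).
    Y  = rand(10,3);
    H  = randn(10,3);                        % random probe direction
    h  = 1e-6;                               % differencing step size
    fd = (F(Y + h*H) - F(Y - h*H)) / (2*h);  % numeric directional derivative
    an = sum(sum(dF(Y) .* H));               % <dF(Y), H> inner product
    fprintf('numeric %g vs analytic %g\n', fd, an);

The two values should agree to many significant digits if dF.m is
correct.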
The full calling sequence to sg_min is

    [fopt, yopt] = sg_min(Y0, rc, mode, metric, motion, verbose, ...
                          gradtol, ftol, partition)

where Y0 is required and the optional arguments are as follows (see
Table 9.1):
Table 9.1: Optional arguments to sg_min.

    Argument  | Description
    ----------+----------------------------------------------------
    rc        | {'real', 'complex'}
    mode      | {'newton', 'dog', 'prcg', 'frcg'}
    metric    | {'flat', 'euclidean', 'canonical'}
    motion    | {'approximate', 'exact'}
    verbose   | {'verbose', 'quiet'}
    gradtol   | Convergence tolerance on the relative gradient
    ftol      | Convergence tolerance on the relative change in F
    partition | Partition of 1:p indicating the symmetries of F
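For illustration, here is one fully explicit call for the 10-by-3
example above; the string values are choices from Table 9.1 (we do not
assert that they coincide with the package defaults), the tolerances
are the documented defaults, and num2cell(1:3) is the no-symmetry
partition appropriate for diag(1:3):

    % Every optional argument spelled out explicitly.
    [fopt, yopt] = sg_min(rand(10,3), 'real', 'newton', 'euclidean', ...
                          'approximate', 'verbose', 1e-7, 1e-10, ...
                          num2cell(1:3));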
rc is vital for counting the dimension of the problem correctly. When
omitted, sg_min guesses based on whether Y0 has a nonzero imaginary
part.
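For example, to declare a complex problem explicitly rather than rely
on that guess (a sketch; the F.m above remains real-valued for complex
Y, so it can serve here unchanged):

    % Complex starting point with rc stated explicitly.
    Y0   = rand(10,3) + 1i*rand(10,3);
    fmin = sg_min(Y0, 'complex');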
mode selects the optimization method from the four listed in Table 9.1
(second derivative information, when a method requires it, comes from
ddF.m). While 'newton' is the default, this is by no means our
endorsement of it over the other methods, which might be more accurate
and less expensive for some problems.
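For example, to try the Fletcher-Reeves conjugate gradient method on
the same test problem (a sketch; arguments to the left of mode must be
supplied positionally):

    % Same 10-by-3 test problem, solved by Fletcher-Reeves CG.
    fmin = sg_min(rand(10,3), 'real', 'frcg');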
verbose determines whether convergence data is displayed at each
iteration and recorded in the global variable SGdata. When this
argument is 'quiet' then no convergence data is displayed or recorded.
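Assuming SGdata simply accumulates the per-iteration quantities shown
in the verbose display (an assumption on our part; its exact layout is
not documented here), it can be inspected after a run:

    % Hypothetical inspection of the recorded convergence history.
    global SGdata
    fmin = sg_min(rand(10,3));   % the sample run above shows verbose
                                 % output, so recording is assumed on
    disp(SGdata)                 % assumed: one row per iteration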
gradtol and ftol set the termination criteria: the iteration stops when
grad/gradinit < gradtol (default 1e-7) or (f-fold)/f < ftol (default
1e-10), where gradinit is the initial magnitude of the gradient and
fold is the value of F at the previous iteration.
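Written out as a predicate, the test reads as follows (a sketch with
our own names, saved as a function file; we take the magnitude of the
relative change in F, reading (f-fold)/f as the size of the
per-iteration decrease):

    function done = sg_converged(grad, gradinit, f, fold, gradtol, ftol)
      % Stop when the gradient has shrunk by gradtol relative to its
      % initial magnitude, or F has stopped decreasing relative to ftol.
      done = (grad/gradinit < gradtol) || (abs(f - fold)/abs(f) < ftol);
    end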
partition is a partition of 1:p (a cell array of index vectors)
indicating the symmetries of F. If F has no symmetry, then the
partition is num2cell(1:p). If F(Y*Q) = F(Y) for all p-by-p orthogonal
Q, then the partition is {1:p}. The partition is {1:2,3:5} if the
symmetry in F is F(Y*Q) = F(Y) for orthogonal Q with the block-diagonal
sparsity structure

    Q = [ Q1  0
          0   Q2 ]

where Q1 is 2-by-2 and Q2 is 3-by-3 orthogonal. One could equivalently
write {3:5,1:2} or {[5 3 4],[2 1]} for the partition; i.e., a partition
is a set of sets.
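Concretely, the partitions just described can be built as follows (for
p = 5):

    % Partitions of 1:5, as cell arrays of index vectors.
    part_none  = num2cell(1:5);     % F has no symmetry
    part_full  = {1:5};             % F(Y*Q) = F(Y) for all orthogonal Q
    part_block = {1:2, 3:5};        % Q = blkdiag(Q1,Q2), Q1 2x2, Q2 3x3
    part_alt   = {[5 3 4], [2 1]};  % the same partition as part_block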
The purpose of the partition is to project away the null space of the
Hessian resulting from any block-diagonal orthogonal symmetries of F.
If the argument is omitted, our code will pick a partition by
determining the symmetries of F (using its partition.m function).
However, the user should be aware that a wrong partition can destroy
convergence.