Singular value problems

Singular values and singular vectors

It is possible to iteratively compute a few singular values and corresponding left and right singular vectors using the function svdsolve:

KrylovKit.svdsolveFunction
svdsolve(A::AbstractMatrix, [x₀, howmany = 1, which = :LR, T = eltype(A)]; kwargs...)
svdsolve(f, m::Int, [howmany = 1, which = :LR, T = Float64]; kwargs...)
svdsolve(f, x₀, [howmany = 1, which = :LR]; kwargs...)
# expert version:
svdsolve(f, x₀, howmany, which, algorithm; alg_rrule=...)

Compute howmany singular values from the linear map encoded in the matrix A or by the function f. Return singular values, left and right singular vectors and a ConvergenceInfo structure.

Arguments:

The linear map can be an AbstractMatrix (dense or sparse) or a general function or callable object. Since both the action of the linear map and its adjoint are required in order to compute singular values, f can either be a tuple of two callable objects (each accepting a single argument), representing the linear map and its adjoint respectively, or, f can be a single callable object that accepts two input arguments, where the second argument is a flag of type Val{true} or Val{false} that indicates whether the adjoint or the normal action of the linear map needs to be computed. The latter form still combines well with the do block syntax of Julia, as in

vals, lvecs, rvecs, info = svdsolve(x₀, howmany, which; kwargs...) do x, flag
    if flag === Val(true)
        # y = compute action of adjoint map on x
    else
        # y = compute action of linear map on x
    end
    return y
end

For a general linear map encoded using either the tuple or the two-argument form, the best approach is to provide a start vector x₀ (in the codomain, i.e. column space, of the linear map). Alternatively, one can specify the number m of rows of the linear map, in which case x₀ = rand(T, m) is used, where the default value of T is Float64, unless specified differently. If an AbstractMatrix is used, a starting vector x₀ does not need to be provided; it is chosen as rand(T, size(A, 1)).

The next arguments are optional, but should typically be specified. howmany specifies how many singular values and vectors should be computed; which specifies which singular values should be targeted. Valid specifications of which are

  • LR: largest singular values
  • SR: smallest singular values However, the largest singular values tend to converge more rapidly.

Return values:

The return value is always of the form vals, lvecs, rvecs, info = svdsolve(...) with

  • vals: a Vector{<:Real} containing the singular values, of length at least howmany, but could be longer if more singular values were converged at the same cost.
  • lvecs: a Vector of corresponding left singular vectors, of the same length as vals.
  • rvecs: a Vector of corresponding right singular vectors, of the same length as vals. Note that singular vectors are not returned as a matrix, as the linear map could act on any custom Julia type with vector like behavior, i.e. the elements of the

lists lvecs(rvecs) are objects that are typically similar to the starting guess x₀(A' * x₀), up to a possibly different eltype. When the linear map is a simple AbstractMatrix, lvecs and rvecs will be Vector{Vector{<:Number}}.

  • info: an object of type [ConvergenceInfo], which has the following fields

    • info.converged::Int: indicates how many singular values and vectors were actually converged to the specified tolerance tol (see below under keyword arguments)
    • info.residual::Vector: a list of the same length as vals containing the residuals info.residual[i] = A * rvecs[i] - vals[i] * lvecs[i].
    • info.normres::Vector{<:Real}: list of the same length as vals containing the norm of the residual info.normres[i] = norm(info.residual[i])
    • info.numops::Int: number of times the linear map was applied, i.e. number of times f was called, or a vector was multiplied with A or A'.
    • info.numiter::Int: number of times the Krylov subspace was restarted (see below)
Check for convergence

No warning is printed if not all requested singular values were converged, so always check if info.converged >= howmany.

Keyword arguments:

Keyword arguments and their default values are given by:

  • verbosity::Int = 0: verbosity level, i.e. 0 (no messages), 1 (single message at the end), 2 (information after every iteration), 3 (information per Krylov step)
  • krylovdim: the maximum dimension of the Krylov subspace that will be constructed. Note that the dimension of the vector space is not known or checked, e.g. x₀ should not necessarily support the Base.length function. If you know the actual problem dimension is smaller than the default value, it is useful to reduce the value of krylovdim, though in principle this should be detected.
  • tol: the requested accuracy according to normres as defined above. If you work in e.g. single precision (Float32), you should definitely change the default value.
  • maxiter: the number of times the Krylov subspace can be rebuilt; see below for further details on the algorithms.
  • orth: the orthogonalization method to be used, see Orthogonalizer
  • eager::Bool = false: if true, eagerly compute the SVD after every expansion of the Krylov subspace to test for convergence, otherwise wait until the Krylov subspace has dimension krylovdim

The final keyword argument alg_rrule is relevant only when svdsolve is used in a setting where reverse-mode automatic differentation will be used. A custom ChainRulesCore.rrule is defined for svdsolve, which can be evaluated using different algorithms that can be specified via alg_rrule. A suitable default is chosen, so this keyword argument should only be used when this default choice is failing or not performing efficiently. Check the documentation for more information on the possible values for alg_rrule and their implications on the algorithm being used.

Algorithm

The last method, without default values and keyword arguments, is the one that is finally called, and can also be used directly. Here the algorithm is specified, though currently only GKL is available. GKL refers to the the partial Golub-Kahan-Lanczos bidiagonalization which forms the basis for computing the approximation to the singular values. This factorization is dynamically shrunk and expanded (i.e. thick restart) similar to the Krylov-Schur factorization for eigenvalues.

source

Automatic differentation

The svdsolve routine can be used in conjunction with reverse-mode automatic differentiation, using AD engines that are compatible with the ChainRules ecosystem. The adjoint problem of a singular value problem contains a linear problem, although it can also be formulated as an eigenvalue problem. Details about this approach will be published in a forthcoming manuscript.

Both svdsolve and the adjoint problem associated with it require the action of the linear map as well as of its adjoint[1]. Hence, no new information about the linear map is required for the adjoint problem. However, the linear map is the only argument that affects the svdsolve output (from a theoretical perspective, the starting vector and algorithm parameters should have no effect), so that this is where the adjoint variables need to be propagated to.

The adjoint problem (also referred to as cotangent problem) can thus be solved as a linear problem or as an eigenvalue problem. Note that this eigenvalue problem is never symmetric or Hermitian. The different implementations of the rrule can be selected using the alg_rrule keyword argument. If a linear solver such as GMRES or BiCGStab is specified, the adjoint problem requires solving a] number of linear problems equal to the number of requested singular values and vectors. If an eigenvalue solver is specified, for which Arnoldi is essentially the only option, then the adjoint problem is solved as a single (but larger) eigenvalue problem.

Note that the common pair of left and right singular vectors has an arbitrary phase freedom. Hence, a well-defined cost function constructed from singular should depend on these in such a way that its value is not affected by simultaneously changing the left and right singular vector with a common phase factor, i.e. the cost function should be 'gauge invariant'. If this is not the case, the cost function is said to be 'gauge dependent', and this can be detected in the resulting adjoint variables for those singular vectors. The KrylovKit rrule for svdsolve will print a warning if it detects from the incoming adjoint variables that the cost function is gauge dependent. This warning can be suppressed by passing alg_rrule an algorithm with verbosity=-1.

with its (conjugate) transpose, at least with respect to the standard Euclidean inner product.

  • 1For a linear map, the adjoint or pullback required in the reverse-order chain rule coincides