This lecture is the most important lecture in Math 202A: we will obtain a remarkably transparent description of morphisms in total generality. It is perhaps surprising that this can be done; it is perhaps not surprising that it takes a couple of tries to get it right.
Take any two objects $V$ and $W$ and any morphism $A \colon V \to W$ between them. Our job is to understand $A$ without making any special assumptions on its structure. A natural strategy – perhaps the only strategy in such generality – is to find a canonical (in this case, meaning defined in the same way for all spaces) subspace $V' \subseteq V$ on which we can understand the behavior of $A$ completely. The hope is that we can bootstrap this local understanding of $A$ on the special subspace $V'$ to a global understanding of $A$ on all of $V$.
This is yet another opportunity to point out that the quantum category is better than the classical one. Indeed, suppose we can find a subspace $V' \subseteq V$ on which we can calculate the outputs of $A$ exactly. If we think set-theoretically, then in order to complete our understanding of $A$ we would have to find a way to understand it on the set-theoretic complement $V \setminus V'$. But linear-algebraically, in order to complete our understanding of $A$ it is sufficient to bootstrap our understanding to the orthogonal complement $(V')^\perp$. For example, in three dimensions, this is the difference between extending from the line of understanding to the whole space with that line deleted, and extending only to the plane orthogonal to the line of understanding. The latter is a dimension reduction, the former is not.
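To see the linear-algebraic advantage concretely: by linearity, the values of a transformation on a line and on its orthogonal plane determine it everywhere. Here is a minimal numerical sketch; the matrix and vectors are hypothetical, chosen only for illustration.

```python
import numpy as np

# A hypothetical 3x3 transformation, chosen only for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 1.0]])

u = np.array([1.0, 2.0, 2.0]) / 3.0   # unit vector spanning the "line of understanding"
x = np.array([5.0, -1.0, 4.0])        # an arbitrary input vector

# Orthogonal decomposition x = (x.u)u + p, with p in the plane orthogonal to u.
coeff = x @ u
p = x - coeff * u

# By linearity, Ax is recovered from A on the line plus A on the orthogonal plane.
reconstructed = coeff * (A @ u) + A @ p
assert np.allclose(reconstructed, A @ x)
```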
There is one obvious candidate for an “exactly solvable” subspace canonically associated to any $A \colon V \to W$: namely, the kernel $\mathrm{Ker}(A)$. We know exactly how to compute $Av$ for every $v \in \mathrm{Ker}(A)$: the output is $Av = 0_W$. Unfortunately, the kernel is not a good starting point for a bootstrap understanding of an arbitrary linear transformation. First of all, if $A$ is injective, then $\mathrm{Ker}(A) = \{0_V\}$, and the statement $A0_V = 0_W$ is just part of what it means to be a linear transformation. Furthermore, even in the case where $\mathrm{Ker}(A)$ is nontrivial we learn nothing, and attempting to bootstrap from $\mathrm{Ker}(A)$ to all of $V$ just brings us back to the injective situation, with no drop in complexity. The image of $\mathrm{Ker}(A)$ under $A$ is the zero space $\{0_W\}$ in $W$. Decomposing $V = \mathrm{Ker}(A) \oplus \mathrm{Ker}(A)^\perp$, we tautologically have that $A$ maps $\mathrm{Ker}(A)^\perp$ into $W$. Restricting $A$ to a map $\mathrm{Ker}(A)^\perp \to W$, we are back in the injective situation. The dimension of the codomain $W$ did not go down, and neither did the rank of $A$.
In terms of matrices, if we choose an orthonormal basis $E_0$ of $V_0 = \mathrm{Ker}(A)$ and an orthonormal basis $E_1$ of $V_0^\perp$, and take $F$ to be any orthonormal basis of $W$, then with respect to the bases $E = E_0 \cup E_1$ and $F$ we have
$$[A]_{E,F} = \begin{bmatrix} 0 & B \end{bmatrix},$$
where $B$ is the matrix of the restriction of $A$ to $V_0^\perp$ with respect to $E_1$ and $F$; the nested matrix $B$ has the same row echelon form as the external matrix $[A]_{E,F}$, up to the removal of zero columns.
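This block structure is easy to exhibit numerically. In the sketch below the matrix is hypothetical, and NumPy's `svd` is used only as a convenient source of orthonormal bases for $V_0$ and $V_0^\perp$.

```python
import numpy as np

# Hypothetical 3x4 matrix with a nontrivial kernel (rank 2).
A = np.array([[1.0, 2.0, 0.0, 2.0],
              [0.0, 1.0, 1.0, 1.0],
              [1.0, 3.0, 1.0, 3.0]])

# Orthonormal basis of the domain adapted to Ker(A), extracted from an SVD.
_, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))             # numerical rank
E1 = Vt[:r].T                          # orthonormal basis of Ker(A)^perp
E0 = Vt[r:].T                          # orthonormal basis of V_0 = Ker(A)
E = np.hstack([E0, E1])                # change of basis, kernel vectors first

M = A @ E                              # matrix of A w.r.t. E and the standard basis of W
assert np.allclose(M[:, :E0.shape[1]], 0)   # the kernel columns are zero
```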
The basic lesson to be learned from this failure is that in order to implement our bootstrap strategy we need to identify an exactly solvable subspace canonically associated to every morphism $A$ by means of algebra and geometry, not just pure algebra. One such space is $V_1$, the space
$$V_1 = \{ v \in V \colon \|Av\| = \|A\| \, \|v\| \}$$
whose nonzero vectors maximize the norm ratio $\|Av\|/\|v\|$, a quantity bounded above by the operator norm $\|A\|$. At the beginning of this week’s lectures we proved the following nontrivial theorem using the Parallelogram Law.

Proposition 16.1. The set $V_1$ is a nonzero subspace of $V$.
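To make this concrete, here is a small numerical sketch (the matrix is hypothetical, and NumPy's `svd` is used only as an oracle for the operator norm): it exhibits the maximizer set as a genuine subspace, of dimension two in this example.

```python
import numpy as np

# Hypothetical matrix whose top "stretch factor" is achieved on a 2-dimensional space.
A = np.diag([3.0, 3.0, 1.0])

opnorm = np.linalg.norm(A, 2)          # operator norm ||A|| = largest singular value

# Right singular vectors achieving ||Av|| = ||A|| ||v|| span the maximizer space V_1.
_, s, Vt = np.linalg.svd(A)
V1 = Vt[np.isclose(s, opnorm)].T       # orthonormal basis of V_1

for v in V1.T:
    assert np.isclose(np.linalg.norm(A @ v), opnorm * np.linalg.norm(v))
print(V1.shape[1])                     # dim V_1 = 2
```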
The key idea of the Singular Value Decomposition is that $V_1$ is a space on which the behavior of $A$ can be completely understood, and this understanding can be bootstrapped to all of $V$. That is really what I want you to appreciate at a level beyond formulas. An arbitrary linear transformation can be completely understood in terms of the vectors which it maximally stretches.
First, let us explain why $A$ is “exactly solvable” on $V_1$, no matter what $A$ is. There are really two distinct cases: $\|A\| = 0$ and $\|A\| > 0$. The first case means $A$ is the zero transformation, and nothing more needs to be said. In the second case, let $W_1 = A(V_1)$ be the image of $V_1$ under $A$, and consider the restriction of $A$ to $V_1$, which is a morphism $A_1 \colon V_1 \to W_1$. Since $W_1 = A(V_1)$, the map $A_1$ is surjective by definition, and furthermore $\|A_1 v\| = \|A\| \, \|v\|$ for every $v \in V_1$.
Let us write $A_1$ in the form $A_1 = \sigma_1 U_1$, where $\sigma_1 = \|A\|$ and $U_1 = \sigma_1^{-1} A_1$. Since $U_1$ is a nonzero scalar multiple of $A_1$, it is also surjective. Moreover, $U_1$ has operator norm equal to one; better, since every vector of $V_1$ is maximally stretched by $A$, we have $\|U_1 v\| = \sigma_1^{-1} \|A_1 v\| = \|v\|$ for every $v \in V_1$, so $U_1$ is an isometry and hence injective. Thus, $U_1$ is an isometric isomorphism: the image $U_1 E$ of any orthonormal basis $E$ of $V_1$ under $U_1$ is an orthonormal basis of $W_1$, and the matrix of $U_1$ with respect to these bases is the square identity matrix $I$. Thus, the matrix of $A_1$ with respect to the bases $E$ and $U_1 E$ is $\sigma_1 I$, so considering the restriction of $A$ to $V_1$ to be “exactly solved” should be noncontroversial. If you want a Euclidean image to hold in your mind, one option is the following: you can imagine $V_1$ and $W_1$ as two copies of the same space, and visualize $A_1$ as mapping $V_1$ onto $W_1$ by first performing a rotation and then a dilation.
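To make the rotation-then-dilation picture concrete, here is a small numerical check (hypothetical matrix; NumPy's `svd` is used only to locate $V_1$) that $U_1 = \sigma_1^{-1} A_1$ carries an orthonormal basis of $V_1$ to an orthonormal basis of $W_1$.

```python
import numpy as np

# Hypothetical matrix: on V_1 = span{e1, e2} it acts as 4 times an orthogonal map.
A = np.array([[0.0, 4.0, 0.0],
              [4.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])

_, s, Vt = np.linalg.svd(A)
sigma1 = s[0]                          # operator norm ||A||
E = Vt[np.isclose(s, sigma1)].T        # orthonormal basis E of V_1 (2-dimensional here)

U1E = (A @ E) / sigma1                 # apply U_1 = sigma_1^{-1} A_1 to the basis E

# U_1 E is again orthonormal: U_1 is an isometric isomorphism of V_1 onto W_1.
assert np.allclose(U1E.T @ U1E, np.eye(E.shape[1]))
```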
Now we want to bootstrap our local understanding of $A$ on $V_1$ to a global understanding of $A$ on $V$. It may happen that we get lucky, and no bootstrapping is required: $V_1 = V$. This means exactly that the transformation $A$ is of the form $A = \sigma_1 U_1$, with $\sigma_1 = \|A\|$ and $U_1 \colon V \to W_1$ an isometric isomorphism.
We now proceed with the case where $V_1$ is a proper subspace of $V$ (note that this implies $A$ is not the zero transformation). In this case we have to figure out how $A$ acts on the complementary space $V_1^\perp$, which is not the zero subspace of $V$. Structurally, this means we need to figure out how $A$ behaves with respect to the decompositions
$$V = V_1 \oplus V_1^\perp, \qquad W = W_1 \oplus W_1^\perp,$$
where by definition $W_1 = A(V_1)$ is the image of $V_1$ under $A$.
A good starting point is to wonder how $V_1$ relates to the decomposition $V = \mathrm{Ker}(A) \oplus \mathrm{Ker}(A)^\perp$, which is a point of intrinsic interest: what is the relationship between the vectors which $A$ maximally stretches and those which it annihilates? It is easy to see that $V_1 \cap \mathrm{Ker}(A) = \{0_V\}$. Indeed, if $v \in V_1 \cap \mathrm{Ker}(A)$, then we have $\|A\| \, \|v\| = \|Av\| = 0$, and since $\|A\| > 0$, the condition $\|A\| \, \|v\| = 0$ forces $v = 0_V$.
Proposition 16.2. We have $\mathrm{Ker}(A) \subseteq V_1^\perp$.

Proof: A more geometrically suggestive equivalent statement is to take complements and write the claim as $V_1 \subseteq \mathrm{Ker}(A)^\perp$. This says that any vector in $V$ which is maximally stretched by $A$ is orthogonal to every vector annihilated by $A$, and indeed this can be established using elementary geometric reasoning. (Note to self: somehow I never noticed this striking fact before – this is a good exam problem).
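A quick numerical sanity check of the proposition (hypothetical matrix; the maximizer space and kernel are both read off from NumPy's `svd`):

```python
import numpy as np

# Hypothetical matrix with both a nontrivial kernel and a 1-dimensional maximizer space.
A = np.array([[3.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

_, s, Vt = np.linalg.svd(A)
V1  = Vt[np.isclose(s, s[0])].T        # orthonormal basis of the maximizer space V_1
ker = Vt[np.isclose(s, 0.0)].T         # orthonormal basis of Ker(A)

# Every maximally stretched vector is orthogonal to every annihilated vector.
assert np.allclose(V1.T @ ker, 0)
```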
The counterpart of Proposition 16.2 in the target space decomposition is very straightforward: since $W_1 = A(V_1) \subseteq A(V)$, we have $W_1 \subseteq \mathrm{Im}(A)$, which is equivalent to $\mathrm{Im}(A)^\perp \subseteq W_1^\perp$.
Now consider the extreme case of Proposition 16.2, where $\mathrm{Ker}(A)$ exhausts $V_1^\perp$. Then, our decompositions become
$$V = V_1 \oplus \mathrm{Ker}(A), \qquad W = W_1 \oplus \mathrm{Im}(A)^\perp,$$
and we have achieved our goal of understanding how $A$ acts on all of its domain $V$ – it acts as the scaled isometric isomorphism $\sigma_1 U_1$ on $V_1$, and it acts as the zero operator on $V_1^\perp = \mathrm{Ker}(A)$. This situation is important enough to warrant special terminology.
Definition 16.3. A morphism $U \colon V \to W$ is said to be a partial isometry if it acts isometrically on a nonzero subspace $V_1$ of $V$ and annihilates the complement $V_1^\perp$. Equivalently, $U$ is a partial isometry if $\|Uv\| = \|v\|$ for all $v \in V_1$ and $Uv = 0_W$ for all $v \in V_1^\perp$.

What we have shown is that any nonzero morphism $A$ such that $V_1$ and $\mathrm{Ker}(A)$ are complementary subspaces of $V$ is a positive multiple of a partial isometry.
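Here is a minimal sketch of Definition 16.3 in coordinates, with a hypothetical $3 \times 3$ example: a rotation of the plane $V_1 = \mathrm{span}\{e_1, e_2\}$ padded by a zero row, so that the complementary line is annihilated.

```python
import numpy as np

# A hypothetical partial isometry on R^3: act isometrically (here, as a rotation)
# on the plane V_1 = span{e1, e2} and annihilate the line V_1^perp = span{e3}.
c, s_ = np.cos(0.7), np.sin(0.7)
U = np.array([[c,  -s_, 0.0],
              [s_,  c,  0.0],
              [0.0, 0.0, 0.0]])

v_in  = np.array([1.0, 2.0, 0.0])      # a vector in V_1
v_out = np.array([0.0, 0.0, 5.0])      # a vector in V_1^perp

assert np.isclose(np.linalg.norm(U @ v_in), np.linalg.norm(v_in))   # isometric on V_1
assert np.allclose(U @ v_out, 0)                                    # annihilates V_1^perp

# The extreme case of the text: A = 3U is a positive multiple of a partial isometry.
A = 3.0 * U
assert np.isclose(np.linalg.norm(A, 2), 3.0)
```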
The extremal case of Proposition 16.2 is just that – an extreme case. In general, $\mathrm{Ker}(A)$ may be a proper subspace of $V_1^\perp$, and it remains to analyze this situation. This is where the following result (which we already proved using a perturbation argument) is essential.

Proposition 16.4. We have $A(V_1^\perp) \subseteq W_1^\perp$.
The force of Proposition 16.4 is that, in conjunction with the decompositions $V = V_1 \oplus V_1^\perp$ and $W = W_1 \oplus W_1^\perp$, it reduces our problem to a genuinely lower-complexity version of the same problem: we have
$$A = A_1 \oplus A',$$
where $A_1 \colon V_1 \to W_1$ is a morphism of the form $A_1 = \sigma_1 U_1$, with $\sigma_1 = \|A\|$ and $U_1$ an isometric isomorphism, and $A' \colon V_1^\perp \to W_1^\perp$, which remains to be analyzed, but has lower complexity than our original morphism $A$ because $\dim V_1^\perp < \dim V$ and $\mathrm{rank}(A') < \mathrm{rank}(A)$. We now repeat our analysis of $A$ for the lower-complexity situation $A' \colon V_1^\perp \to W_1^\perp$, where both the dimension of the domain and the rank of the morphism have strictly decreased – this is what failed to happen when we tried to bootstrap starting with $\mathrm{Ker}(A)$.
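Here is a minimal numerical sketch of one reduction step (hypothetical matrix; NumPy's `svd` is used only to locate $V_1$): both the domain dimension and the rank visibly drop.

```python
import numpy as np

# Hypothetical 3x3 matrix of rank 3 with a 1-dimensional maximizer space.
A = np.diag([5.0, 2.0, 1.0])

_, s, Vt = np.linalg.svd(A)
top = np.isclose(s, s[0])
V1p = Vt[~top].T                       # orthonormal basis of V_1^perp

Aprime = A @ V1p                       # matrix of the restriction A' of A to V_1^perp
print(V1p.shape[1], "<", A.shape[1])                                 # dim V_1^perp < dim V
print(np.linalg.matrix_rank(Aprime), "<", np.linalg.matrix_rank(A))  # rank(A') < rank(A)
```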
Let us carry out this iteration. Let $\sigma_2 = \|A'\|$, i.e.
$$\sigma_2 = \max_{v \in V_1^\perp \setminus \{0_V\}} \frac{\|A'v\|}{\|v\|},$$
where $A' \colon V_1^\perp \to W_1^\perp$ is as above. Note that since $A'$ is nothing but the restriction of $A$ to $V_1^\perp$, we have $\sigma_2 \leq \sigma_1$, and the inequality is in fact strict, as pointed out to me by Evan Young at least twice. If $\sigma_2 = 0$, we are done: $A'$ is the zero operator and we are back in the situation where $A$ is a positive multiple of a partial isometry. Otherwise, $\sigma_2 > 0$ and we define
$$V_2 = \{ v \in V_1^\perp \colon \|A'v\| = \sigma_2 \|v\| \}, \qquad W_2 = A'(V_2).$$
Then, the restriction of $A'$ to $V_2$ is a morphism $A_2 \colon V_2 \to W_2$ of the form $A_2 = \sigma_2 U_2$, with $U_2$ an isometric isomorphism. As with our first pass, it remains to understand how $A'$ acts on the orthogonal complement of $V_2$ in $V_1^\perp$. Structurally, we need to understand the decompositions
$$V_1^\perp = V_2 \oplus V_2^\perp, \qquad W_1^\perp = W_2 \oplus W_2^\perp,$$
where the complements are taken within $V_1^\perp$ and $W_1^\perp$.
As in our first iteration, $\mathrm{Ker}(A')$ is a subspace of $V_2^\perp$. If $\mathrm{Ker}(A')$ exhausts $V_2^\perp$, then we are done: we have $A' = \sigma_2 U_2 \oplus 0$, which lifts back up to the level of $A$ as
$$A = \sigma_1 U_1 \oplus \sigma_2 U_2 \oplus 0.$$
If not, we repeat the process on the restriction of $A'$ to $V_2^\perp$, which is a morphism $A'' \colon V_2^\perp \to W_2^\perp$ of rank $\mathrm{rank}(A) - \dim V_1 - \dim V_2$. Continuing the process until we exhaust the rank of $A$, we arrive at the Singular Value Decomposition.
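Before stating the theorem, here is a toy numerical rendering of the recursion, for real matrices in coordinates; `numpy.linalg.eigh` applied to $A^{\mathsf{T}}A$ plays the role of the Parallelogram Law proposition, handing us the operator norm and its maximizer space at each step. The function name `chunky_svd` and all details are hypothetical, a sketch rather than production code.

```python
import numpy as np

def chunky_svd(A, tol=1e-10):
    """Toy recursion behind the SVD: repeatedly peel off the maximizer space V_1.

    Returns a list of blocks (sigma_i, V_i, W_i), with strictly decreasing
    sigma_i > 0 and the columns of V_i, W_i orthonormal. eigh(A^T A) serves
    as the oracle for the operator norm and its maximizer space.
    """
    if A.shape[1] == 0:                      # the domain has been exhausted
        return []
    lam, Q = np.linalg.eigh(A.T @ A)         # ascending eigenvalues of A^T A
    sigma = np.sqrt(max(lam[-1], 0.0))       # operator norm ||A||
    if sigma < tol:                          # zero morphism: no singular blocks
        return []
    top = np.isclose(lam, lam[-1])
    V1 = Q[:, top]                           # maximizer space V_1
    W1 = (A @ V1) / sigma                    # W_1 = U_1(V_1), orthonormal columns
    V1p = Q[:, ~top]                         # orthonormal basis of V_1^perp
    # Recurse on A' = (restriction of A to V_1^perp), then lift V-blocks back up.
    blocks = [(sigma, V1, W1)]
    for sig, Vloc, Wb in chunky_svd(A @ V1p, tol):
        blocks.append((sig, V1p @ Vloc, Wb))
    return blocks

# Hypothetical test matrix: should yield sigma = 3 on a 2-dim block, then sigma = 1.
A = np.diag([3.0, 1.0, 3.0])
for sigma, Vb, Wb in chunky_svd(A):
    print(sigma, Vb.shape[1])                # singular value and block dimension
    assert np.allclose(A @ Vb, sigma * Wb)   # A acts as sigma * (isometry) on each block
```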
Theorem 16.5. (SVD, chunky geometric form) For any morphism $A \colon V \to W$ we have orthogonal decompositions
$$V = V_1 \oplus \dots \oplus V_q \oplus \mathrm{Ker}(A), \qquad W = W_1 \oplus \dots \oplus W_q \oplus \mathrm{Im}(A)^\perp,$$
and numbers $\sigma_1 > \sigma_2 > \dots > \sigma_q > 0$, such that, for each $1 \leq i \leq q$, the spaces $V_i, W_i$ are nonzero and the restriction of $A$ to $V_i$ is a morphism $A_i \colon V_i \to W_i$ of the form $A_i = \sigma_i U_i$, with $U_i$ an isometric isomorphism.
The paired spaces $V_i, W_i$ in the SVD are called the left and right singular spaces of $A$. Although these spaces are nonzero, the number $q$ of singular spaces may be zero. Indeed, $0 \leq q \leq \min(\dim V, \dim W)$, and the case $q = 0$ occurs precisely when $A$ is the zero transformation.
In general, the worst case scenario for the runtime of the recursive routine which establishes the SVD occurs when $A$ has a one-dimensional optimizer space $V_1$, and the restriction of $A$ to $V_1^\perp$ again has a one-dimensional optimizer space, and so on until the rank $r$ of $A$ is exhausted after $r$ iterations. In this case, $A$ has $r$ pairs of one-dimensional singular spaces $V_1, W_1, \dots, V_r, W_r$ and $r$ singular values $\sigma_1 > \dots > \sigma_r > 0$. Even when the worst case scenario is not what occurs, we can write the Singular Value Decomposition in a form where all subspaces involved are one-dimensional, simply by choosing an orthonormal basis within each $V_i$ and writing it as a direct sum of lines. This leads to an alternative formulation of the SVD with possibly repeated singular values, which really means singular values with multiplicity.
Theorem 16.6. (SVD, fine-grained geometric version) For any morphism $A \colon V \to W$ of rank $r$, we have orthogonal decompositions
$$V = V_1 \oplus \dots \oplus V_r \oplus \mathrm{Ker}(A), \qquad W = W_1 \oplus \dots \oplus W_r \oplus \mathrm{Im}(A)^\perp,$$
and numbers $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$, such that, for each $1 \leq i \leq r$, the spaces $V_i, W_i$ are one-dimensional and the restriction of $A$ to $V_i$ is a morphism $A_i \colon V_i \to W_i$ of the form $A_i = \sigma_i U_i$, with $U_i$ an isometric isomorphism of the lines $V_i$ and $W_i$.
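In coordinates, `numpy.linalg.svd` returns exactly the data of this fine-grained version: weakly decreasing $\sigma_i$, together with orthonormal vectors spanning the lines $V_i$ and $W_i$. A quick check on a hypothetical example:

```python
import numpy as np

# Hypothetical 2x3 example; rows of Vt span the lines V_i, columns of U the lines W_i.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 0.0]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))             # rank of A

for i in range(r):
    v, w = Vt[i], U[:, i]              # unit vectors spanning the lines V_i, W_i
    assert np.isclose(np.linalg.norm(A @ v), s[i])   # A stretches V_i by sigma_i
    assert np.allclose(A @ v, s[i] * w)              # and carries V_i isometrically onto W_i
```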
What is confusing in the way the two versions of the SVD are stated is that the symbols $V_i$, $W_i$, $\sigma_i$ represent different things in each version, but these things are not unrelated. Nevertheless, I will leave our two versions as stated above instead of introducing different letters for each version.