Math 202A: Review Lecture 5

A key concept is that there is a third fundamentally important subspace associated to every morphism A \in \mathrm{Hom}(V,W). Unlike \mathrm{Ker}(A) \leq V and \mathrm{Im}(A) \leq W, this subspace is geometric rather than algebraic in nature: it is defined as the set

\mathrm{Opt}(A) = \{v \in V \colon \|Av\|=\|A\|\|v\|\}

of vectors in V which saturate the operator norm inequality for A. It is not obvious that this actually is a vector space, and the following result is both crucial and nontrivial.
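Before stating the result, here is a quick numerical look at \mathrm{Opt}(A) (an illustration of mine, not part of the development; it relies on the standard fact that the operator norm of a matrix is its largest singular value): for a generic matrix, the unit vectors saturating the operator norm inequality are exactly the top right singular vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

U, s, Vh = np.linalg.svd(A)
op_norm = s[0]                 # operator norm = largest singular value

v = Vh[0].conj()               # top right singular vector: saturates the bound
print(np.isclose(np.linalg.norm(A @ v), op_norm))   # True

w = Vh[-1].conj()              # bottom singular vector: strict inequality
print(np.linalg.norm(A @ w) < op_norm)              # True (generically)
```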

Theorem 5.1. \mathrm{Opt}(A) is a nonzero subspace of V.

The fact that \mathrm{Opt}(A) contains a unit vector follows from the extreme value theorem. We found a way to use the parallelogram law to show that \mathrm{Opt}(A) is a subspace of V. The essence of the singular value decomposition is understanding the relationship between the three fundamental subspaces

\mathrm{Ker}(A),\ \mathrm{Opt}(A),\ \mathrm{Im}(A),

a geometric endeavor which goes beyond the algebraic relationship between kernel and image given by the rank-nullity theorem. In order to succeed in this endeavor, one needs the following fact.

Orthogonality Lemma. Two nonzero vectors v_1,v_2 \in V are orthogonal if and only if \|v_1\| \leq \|v_1+\lambda v_2\| for all \lambda \in \mathbb{C}.

Proof: Geometrically, the Orthogonality Lemma says that v_1,v_2 are orthogonal if and only if the point on the affine line v_1 + \mathbb{C}v_2 closest to 0_V is v_1. In fact, this characterization remains correct even in the infinite-dimensional setting.

Suppose first that v_1 and v_2 are orthogonal. Then, for any scalar \lambda we have

\|v_1+\lambda v_2\|^2 = \|v_1\|^2 + |\lambda|^2\|v_2\|^2 \geq \|v_1\|^2.

Conversely, suppose that the nonnegative function

f(\lambda) =\|v_1+\lambda v_2\|^2

is minimized by taking \lambda =0, the minimum being f(0)=\|v_1\|^2. Suppose \alpha = \langle v_1,v_2 \rangle \neq 0, and look at

\lambda_0 := -\frac{\overline{\alpha}}{\|v_2\|^2}.

We then compute

f(\lambda_0) = \|v_1\|^2 -\frac{|\alpha|^2}{\|v_2\|^2} < \|v_1\|^2,

a contradiction. \square

Armed with the Orthogonality Lemma, we turn to an examination of the geometric relationships between the three fundamental subspaces of a linear transformation. The first such relationship is the nice fact that maximally stretched vectors are orthogonal to squashed ones.

Theorem 5.2. Assuming A \neq 0_{\mathrm{Hom}(V,W)}, we have

\mathrm{Opt}(A) \leq \mathrm{Ker}(A)^\perp.

Proof: Since \mathrm{Opt}(A) is a nonzero subspace of V, we can choose a nonzero vector v_1 \in \mathrm{Opt}(A). If the kernel of A is trivial, the statement we want to prove is obvious. If not, we can choose a nonzero vector v_2 \in \mathrm{Ker}(A). Since Av_2=0_W, for every \lambda \in \mathbb{C} we have \|A\|\|v_1\|=\|Av_1\|=\|A(v_1+\lambda v_2)\| \leq \|A\|\|v_1+\lambda v_2\|, so that \|v_1\| \leq \|v_1+\lambda v_2\|. Now the Orthogonality Lemma, applied on the affine line v_1+\mathbb{C}v_2, gives v_1 \perp v_2. \square

Now let us see how Theorem 5.2 refines what the rank-nullity theorem tells us. We have the orthogonal decompositions

V= \mathrm{Ker}(A)^\perp \oplus \mathrm{Ker}(A) \quad\text{ and }\quad W = \mathrm{Im}(A) \oplus \mathrm{Im}(A)^\perp.

Rank-nullity says that the restriction of A to a morphism A_1 \in \mathrm{Hom}(\mathrm{Ker}(A)^\perp, \mathrm{Im}(A)) is an isomorphism. Theorem 5.2 tells us that sitting inside \mathrm{Ker}(A)^\perp we have the space \mathrm{Opt}(A) of maximally stretched vectors. The best case scenario is that \mathrm{Opt}(A) exhausts \mathrm{Ker}(A)^\perp. Then, we learn that the isomorphism A_1 \in \mathrm{Hom}(\mathrm{Ker}(A)^\perp,\mathrm{Im}(A)) is almost an isometric isomorphism: it is \sigma_1=\|A\| times the isometric isomorphism U_1 \in \mathrm{Hom}(\mathrm{Ker}(A)^\perp,\mathrm{Im}(A)) given by U_1=\frac{1}{\sigma_1}A_1. If we are not so lucky, then we have

\mathrm{Ker}(A)^\perp = \mathrm{Opt}(A) \oplus V_2,

where V_2, the orthogonal complement of \mathrm{Opt}(A) inside \mathrm{Ker}(A)^\perp, is a nontrivial subspace of \mathrm{Ker}(A)^\perp. Now we have to try again: this is the start of an iterative process which terminates after finitely many steps, the output being the Singular Value Decomposition, which is adequately treated in the main notes. If you understand this process, you understand everything about linear algebra that you need to know for Math 202A.
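If you like to compute, here is a minimal numpy sketch of this iterative process (my own illustration, which cheats by using the numerical SVD to locate a norm-saturating vector at each step; with repeated singular values it peels off lines rather than blocks):

```python
import numpy as np

def peel_singular_lines(A, tol=1e-10):
    """Illustrative sketch: repeatedly find the operator norm sigma and a
    maximally stretched unit vector v, record (sigma, v, Av/sigma), then
    subtract the rank-one piece and repeat on what remains."""
    A = np.array(A, dtype=complex)
    pieces = []
    while True:
        U, s, Vh = np.linalg.svd(A)
        if s[0] <= tol:                  # nothing left but the kernel
            return pieces
        sigma, v = s[0], Vh[0].conj()    # operator norm and a maximizer
        u = A @ v / sigma
        pieces.append((sigma, v, u))
        A = A - sigma * np.outer(u, v.conj())   # remove the top line

print([round(p[0], 3) for p in peel_singular_lines([[3, 0], [0, 4]])])  # [4.0, 3.0]
```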

Math 202A: Review Lecture 4

The next stage in our comparison of the classical computational category \mathbf{FSet} and the quantum computational category \mathbf{FHil} focused on contrasting \mathrm{Hom}(X,Y) with \mathrm{Hom}(V,W) = \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)). We have seen already that

|\mathrm{Hom}(X,Y)|= |Y|^{|X|},

which is a polynomial function of |Y| and an exponential function of |X|. Of course, the cardinality of \mathrm{Hom}(V,W) is infinite, but its quantum cardinality,

\dim \mathrm{Hom}(V,W) = |X||Y|,

is a linear function of |X| and a linear function of |Y|.

We now consider subobjects canonically associated to morphisms in \mathbf{FSet} and \mathbf{FHil}.

Associated to a morphism f \in \mathrm{Hom}(X,Y) is its image,

\mathrm{Im}(f) = \{f(x) \colon x \in X\},

which is a subset of the target Y. The cardinality of \mathrm{Im}(f) may be called the rank of the set function f. The rank of f is bounded by the cardinality of its source X, and we have \mathrm{rank}(f)=|X| if and only if f is injective. The rank of f is also bounded by the cardinality of its target Y, and we have \mathrm{rank}(f)=|Y| if and only if f is surjective. Thus,

\mathrm{rank}(f) \leq \min\left( |X|,|Y|\right).

A morphism f \in \mathrm{Hom}(X,Y) is said to have full rank if

\mathrm{rank}(f) = \min\left(|X|,|Y|\right).

In particular, a full rank function is either injective or surjective, but we cannot say which without knowing which of |X|,|Y| is larger. On the other hand, the concept of rank itself is an elementary combinatorial notion, and one can find an explicit formula for the number of rank r morphisms in \mathrm{Hom}(X,Y) in terms of Stirling numbers. Using this formula, one can show that the set of full rank morphisms in \mathrm{Hom}(X,Y) is small in comparison to the total cardinality |Y|^{|X|}.
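For the skeptical, a brute-force tabulation (the helper name and the small cardinalities are mine; the counts agree with the Stirling number formula \binom{|Y|}{r} r!\, S(|X|,r) for the number of rank r morphisms):

```python
from itertools import product

def count_by_rank(n, m):
    """Brute-force count of the functions [n] -> [m] by rank (image size)."""
    counts = {}
    for f in product(range(m), repeat=n):
        r = len(set(f))
        counts[r] = counts.get(r, 0) + 1
    return counts

# For |X| = 4, |Y| = 3: ranks 1, 2, 3 occur 3, 42, 36 times (total 81 = 3^4),
# so only 36/81 of all morphisms have full rank, and the fraction shrinks
# as the cardinalities grow.
print(count_by_rank(4, 3))
```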

Associated to a morphism A \in \mathrm{Hom}(V,W) is its image,

\mathrm{Im}(A) = \{Av \colon v \in V\},

which is a subspace of the target space W. The rank of the linear transformation A is the dimension of \mathrm{Im}(A). The rank of A is bounded by the dimension of V, and this bound is achieved if and only if A is injective. The rank of A is bounded by the dimension of W, and this bound is achieved if and only if A is surjective. Thus,

\mathrm{rank}(A) \leq \min(\dim V,\dim W).

A morphism A \in \mathrm{Hom}(V,W) is said to have full rank if this upper bound is achieved, and just like in the classical situation this means either that A is injective or surjective, but we cannot determine which without more information. What is very different in the quantum world is that a generic morphism in \mathrm{Hom}(V,W) has full rank.

Theorem 4.1. The set \mathrm{Hom}(V,W)_\text{f} of full rank morphisms in \mathrm{Hom}(V,W) is open and dense in the norm topology.

Note that we have two norms on \mathrm{Hom}(V,W), the operator norm and the Frobenius norm, and we did not specify which is being used in Theorem 4.1. This is because these two norms generate the same topology, as do all norms on a finite-dimensional vector space.

Theorem 4.1 is another way to motivate the long journey we took towards the Singular Value Decomposition. The first step on this journey is to recognize that there is a second subobject associated to A \in \mathrm{Hom}(V,W) which has no classical counterpart, namely the kernel

\mathrm{Ker}(A)=\{v \in V \colon Av=0_W\},

which is a subspace of V. The dimension of \mathrm{Ker}(A) is called the nullity of A and denoted \mathrm{null}(A).

For an arbitrary morphism A \in \mathrm{Hom}(V,W), we have two affiliated spaces: \mathrm{Ker}(A) \leq V and \mathrm{Im}(A) \leq W. Consider the associated orthogonal decompositions,

V= \mathrm{Ker}(A)^\perp \oplus \mathrm{Ker}(A) \quad\text{and}\quad W = \mathrm{Im}(A) \oplus \mathrm{Im}(A)^\perp.

The rank-nullity theorem says that the restriction of A to \mathrm{Ker}(A)^\perp is an isomorphism onto \mathrm{Im}(A). This result is purely algebraic, and does not involve the Hilbert space structure on V and W. The Singular Value Decomposition may be viewed as a remarkable refinement of the rank-nullity theorem which uses the geometric structure of Hilbert space to achieve a much more precise description of morphisms in \mathbf{FHil}.

A nice observation we made is that one can guess what this mysterious enhanced decomposition looks like for morphisms A \in \mathrm{Hom}(V,W)=\mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)) which are quantizations of set maps, i.e. the case where A = \mathcal{F}(f) for f \in \mathrm{Hom}(X,Y). In this case, using purely combinatorial reasoning, one can construct an orthonormal basis for \mathrm{Ker}(A)^\perp and an orthonormal basis for \mathrm{Im}(A) such that the restriction of A to an isomorphism \mathrm{Ker}(A)^\perp \to \mathrm{Im}(A) has a very explicit description in terms of the combinatorial statistics of the underlying set function f. It is useful to review this combinatorial “bare hands” special case of the SVD for quantized set functions before starting down the geometric road to developing it for arbitrary morphisms in \mathrm{Hom}(V,W).
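As a numerical rendering of that combinatorial reasoning (my own sketch, not the notes' derivation): the matrix of \mathcal{F}(f) in the elementary bases is the binary matrix of f, and its nonzero singular values come out to be the square roots of the fiber sizes of f.

```python
import numpy as np

# f: [4] -> [2] with fibers f^{-1}(0) = {0,1,2} and f^{-1}(1) = {3}
f = {0: 0, 1: 0, 2: 0, 3: 1}
M = np.zeros((2, 4))
for x, y in f.items():
    M[y, x] = 1.0                 # binary matrix of the quantization F(f)

print(np.linalg.svd(M, compute_uv=False))  # [1.732..., 1.0] = sqrt(3), sqrt(1)
```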

Math 202A: Review Lecture 3

After setting up the dichotomy between the classical computational category \mathbf{FSet} and the quantum computational category \mathbf{FHil}, we started to compare the two. This comparison begins with matrices: matrices over B=\{0,1\} suffice to encode both objects and morphisms in \mathbf{FSet}, whereas matrices over \mathbb{C} are needed to encode objects and morphisms in \mathbf{FHil} but the encoding works the same way. Change of ordering in the one-hot encoding of a finite set X is implemented by left multiplication by permutation matrices, whereas change of orthonormal basis in a finite-dimensional Hilbert space is implemented by left multiplication by unitary matrices; in this sense, unitary matrices are the quantum generalization of permutation matrices. Change of ordering in the source and target sets of \mathrm{Hom}(X,Y) corresponds to left and right multiplying the matrix encoding of morphisms by permutation matrices, while change of ordered orthonormal bases in the source and target spaces of \mathrm{Hom}(V,W) corresponds to left and right multiplying the matrix encoding of morphisms by unitary matrices.

A more interesting line of investigation is to consider the extent to which \mathrm{Hom}(V,W) can be enriched beyond \mathrm{Hom}(X,Y), i.e. made into something more than a set. First, the existence of a vector space structure on \mathrm{Hom}(V,W) is very straightforward, and in fact a special case of the fact that the set of functions from any set into a vector space is again a vector space under the pointwise operations.

It is also true that \mathrm{Hom}(V,W) is finite-dimensional. Let X \subset V and Y \subset W be orthonormal bases, which is the same thing as saying V is isomorphic to \mathcal{F}(X) and W is isomorphic to \mathcal{F}(Y). What we showed is that \mathrm{Hom}(V,W) is isomorphic to \mathcal{F}(Y \times X). Indeed, for each (y,x) \in Y \times X, define a corresponding elementary transformation E_{yx} \in \mathrm{Hom}(V,W) by

E_{yx}v = y\langle x,v\rangle, \quad v \in V.

We proved that \{E_{yx} \colon (y,x) \in Y \times X\} is a basis of \mathrm{Hom}(V,W), and indeed that for any A \in \mathrm{Hom}(V,W) we have

A = \sum\limits_{y \in Y}\sum\limits_{x \in X} \langle y,Ax\rangle E_{yx}.

This is a good way to present the information contained in the matrix of A relative to the orthonormal bases X \subset V and Y \subset W without having to choose orderings just to write things in an array.
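Here is a small numerical check of this expansion (an illustration of mine, taking X and Y to both be the standard basis of \mathbb{C}^3, with the scalar product linear in the second slot):

```python
import numpy as np

n = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
I = np.eye(n)                     # standard orthonormal basis

# E_{yx} v = y <x, v>; as a matrix this is the outer product y x^*
E = {(i, j): np.outer(I[i], I[j].conj()) for i in range(n) for j in range(n)}

# np.vdot conjugates its first argument, matching <y, Ax>
B = sum(np.vdot(I[i], A @ I[j]) * E[(i, j)] for i in range(n) for j in range(n))
print(np.allclose(A, B))          # True: A = sum_y sum_x <y,Ax> E_{yx}
```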

The elementary basis E_{yx} of \mathrm{Hom}(V,W) relative to a pair of orthonormal bases X \subset V and Y \subset W also gives us a convenient way to analyze topological properties of arbitrary morphisms A \in \mathrm{Hom}(V,W). Indeed, for any v \in V we have that

\|E_{yx}v\|=\|y\langle x,v\rangle\|=\|y\||\langle x,v\rangle| \leq \|y\|\|x\|\|v\| = \|v\|,

where the inequality is Cauchy-Schwarz and we used the fact that x \in X and y \in Y are unit vectors. Thus, each morphism in the elementary basis \{E_{yx} \colon (y,x) \in Y \times X\} of \mathrm{Hom}(V,W) is a contractive mapping. Consequently, for an arbitrary morphism A \in \mathrm{Hom}(V,W),

A=\sum\limits_{y \in Y}\sum\limits_{x \in X} \alpha_{yx}E_{yx},

the triangle inequality gives us

\|Av\| \leq \|A\|_1 \|v\|, \quad v \in V,

where

\|A\|_1 = \sum\limits_{y \in Y}\sum\limits_{x \in X} |\alpha_{yx}|

is a norm on \mathrm{Hom}(V,W), specifically the 1-norm relative to the elementary basis E_{yx} of \mathrm{Hom}(V,W), which is itself defined relative to the orthonormal bases X \subset V and Y \subset W. This points us to a much better norm on \mathrm{Hom}(V,W), since the above tells us that every A \in \mathrm{Hom}(V,W) is not just continuous but Lipschitz, with Lipschitz constant bounded by \|A\|_1.

Definition 3.1. The operator norm \|A\| of A \in \mathrm{Hom}(V,W) is by definition its smallest Lipschitz constant.

One of the many virtues of the operator norm is that it is basis independent: to define it we only need to know that all morphisms in \mathrm{Hom}(V,W) are Lipschitz functions. Definition 3.1 does not care that we used orthonormal bases of V and W to establish Lipschitz continuity of linear transformations. On the other hand, one could argue that a demerit of the operator norm is that it is not induced by a scalar product.

Problem 3.1. Prove that there is no scalar product on \mathrm{Hom}(V,W) such that \|A\|=\sqrt{\langle A,A\rangle} coincides with the operator norm.

On the other hand, it is completely straightforward to promote \mathrm{Hom}(V,W) to a Hilbert space: we simply equip it with the scalar product in which \{E_{yx} \colon (y,x) \in Y \times X\} forms an orthonormal basis. This is the Frobenius scalar product on \mathrm{Hom}(V,W), and it is given explicitly as follows: for any A,B \in \mathrm{Hom}(V,W) we have

\langle A,B \rangle = \sum\limits_{y \in Y}\sum\limits_{x \in X} \overline{\langle y,Ax\rangle}\langle y,Bx\rangle.

Problem 3.2. Show that for any A,B \in \mathrm{Hom}(V,W) we have

\langle A,B \rangle = \sum\limits_{x \in X} \langle Ax,Bx\rangle.
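A numerical verification of the identity in Problem 3.2 on a random example (it verifies, it does not prove; in matrix terms both sides equal \mathrm{tr}(A^*B)):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
I = np.eye(3)                     # X = Y = standard orthonormal basis

# double-sum definition of the Frobenius scalar product
double = sum(np.vdot(I[i], A @ I[j]).conjugate() * np.vdot(I[i], B @ I[j])
             for i in range(3) for j in range(3))
# single-sum formula from Problem 3.2, and the matrix-trace form
single = sum(np.vdot(A @ I[j], B @ I[j]) for j in range(3))
print(np.allclose(double, single), np.allclose(single, np.trace(A.conj().T @ B)))
```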

The norm \|\cdot\|_2 on \mathrm{Hom}(V,W) obtained from the Frobenius scalar product is called the Frobenius norm; it is given by

\|A\|_2^2 = \sum\limits_{y \in Y}\sum\limits_{x \in X} |\langle y,Ax\rangle|^2 = \sum\limits_{x \in X} \|Ax\|^2.

The problem with the above is that we don’t know whether we have built the Frobenius scalar product or a Frobenius scalar product: if we had performed this construction using two different orthonormal bases X^\prime \subset V and Y^\prime \subset W, would we have arrived at the same Hermitian form on \mathrm{Hom}(V,W)? We would like to know whether \|\cdot\|_2 is basis independent, like the operator norm \|\cdot\|, or whether it is basis dependent, like \|\cdot\|_1.

The answer is that the Frobenius scalar product is basis independent. A morphism U \in \mathrm{Hom}(V,W) such that \|Uv\|=\|v\| for every v \in V is called an isometry. Polarization shows that norm preservation is the same as scalar product preservation: U \in \mathrm{Hom}(V,W) is an isometry if and only if

\langle Uv_1,Uv_2 \rangle = \langle v_1,v_2 \rangle, \quad v_1,v_2 \in V.

Furthermore, every isometry is injective, so that an isometry in \mathrm{End}(V) is automatically an automorphism. Isometric automorphisms of V are also referred to as unitary operators. That’s right, there are two names for the same thing. While it might be better practice to never ever override general categorical terminology in category-specific settings, this is just one of many things about the world that could be improved. Even worse, if V and W are possibly different finite-dimensional Hilbert spaces with \dim V=\dim W, then again an isometry U \in \mathrm{Hom}(V,W) is automatically an isomorphism, and such isometric isomorphisms are also often called unitary transformations. Saints preserve us. The reason for this abuse of terminology is that the matrix of an isometric isomorphism U \in \mathrm{Hom}(V,W) is a unitary matrix even when the spaces V,W are not the same, just like the matrix of a set isomorphism f \in \mathrm{Hom}(X,Y) is a permutation matrix even when X and Y are distinct sets.

Problem 3.3. Show that U \in \mathrm{End}(V) is a unitary operator if and only if the image \{Ux \colon x \in X\} of an orthonormal basis X \subset V under U is an orthonormal basis of V.

Problem 3.2 already removes the dependence of the Frobenius scalar product on a choice of orthonormal basis in the target space. Thus what is left to check is that for any orthonormal basis X \subset V and any unitary operator U \in \mathrm{End}(V) we have

\sum\limits_{x \in X} \langle AUx,AUx \rangle = \sum\limits_{x \in X} \langle Ax,Ax \rangle.

In other words, thanks to polarization, we only need to check that the Frobenius norm is unitarily invariant. We have

\left\langle AUx, AUx \right\rangle = \sum\limits_{y \in X}\sum\limits_{z \in X} \overline{\langle y,Ux\rangle}\langle z,Ux\rangle \langle Ay,Az\rangle.

Thus,

\sum\limits_{x \in X} \langle AUx,AUx \rangle = \sum\limits_{y,z \in X} \langle Ay,Az\rangle \sum\limits_{x \in X} \overline{\langle y,Ux\rangle}\langle z,Ux\rangle,

and to finish it remains only to establish the following, which is a consequence of Problem 3.3.

Problem 3.4. For any unitary operator U \in \mathrm{End}(V) and any orthonormal basis X \subset V, we have

\sum\limits_{x \in X} \overline{\langle y,Ux\rangle}\langle z,Ux\rangle=\delta_{yz}, \quad y,z \in X.
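Numerically, the contrast between the two norms looks as follows (my own illustration): the Frobenius norm is unitarily invariant, while the basis-dependent 1-norm generally is not.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Z = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(Z)            # a random unitary operator

fro = lambda M: np.linalg.norm(M, 'fro')
print(np.allclose(fro(U @ A), fro(A)))                     # True
print(np.allclose(np.abs(U @ A).sum(), np.abs(A).sum()))   # False in general
```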

In the lectures I presented a different proof of unitary invariance of the Frobenius scalar product on \mathrm{Hom}(V,W) which was inductive in the dimension of W. The reason for doing this was that in the base case, \dim W=1, we are dealing with the dual space of V and this leads to a discussion of vectors and covectors (Riesz duality). However, I did not do a very good job with this discussion and therefore have decided to leave it out of the exam topics.

Math 202A: Review Lecture 2

Let \mathbf{FHil} be the category whose objects are complex vector spaces V equipped with a scalar product \langle \cdot,\cdot \rangle and whose morphisms are linear transformations. We call pairs (V,\langle \cdot,\cdot \rangle) Hilbert spaces, and our convention is that the scalar product is linear in the second slot. The standard definition of Hilbert space includes an extra clause which we do not need and will omit (see below).

Problem 2.1. Prove that A \in \mathrm{Hom}(V,W) is an isomorphism if and only if it is a bijection. That is, isomorphisms in \mathbf{FHil} are precisely linear bijections.

The Hilbert space norm on V is defined by

\|v\| = \sqrt{\langle v,v \rangle}.

To prove that this really is a norm, one needs the Cauchy-Schwarz inequality in order to verify the triangle inequality,

\|v+w\|^2 = \|v\|^2 + 2\Re \langle v,w\rangle + \|w\|^2 \leq \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 = (\|v\|+\|w\|)^2.

Problem 2.2. Prove the reverse triangle inequality: \left| \|v\|-\|w\| \right|\leq \|v-w\|. Make sure you understand why this implies that the function V \to \mathbb{R} defined by v \mapsto \|v\| is continuous, in fact 1-Lipschitz.

An important fact is that the scalar product in a Hilbert space V can be recovered from the norm.

Theorem 2.3. (Polarization) For any v,w \in V, we have

\langle v,w \rangle = \frac{1}{4}\left( \|v+w\|^2 - \|v-w\|^2 -i\|v+iw\|^2+i\|v-iw\|^2 \right).

Two vectors v,w in Hilbert space V are said to be orthogonal if \langle v,w \rangle=0.

Theorem 2.4. (Pythagoras) For any orthogonal vectors v,w \in V, we have \|v+w\|^2=\|v\|^2 + \|w\|^2.

An important generalization of the Pythagorean theorem is the following.

Theorem 2.5 (Parallelogram Law) For any v,w \in V, we have \|v+w\|^2+\|v-w\|^2=2\|v\|^2+2\|w\|^2.

Much of the importance of the Parallelogram Law is that it characterizes when a normed vector space is a Hilbert space. This remarkable fact was first noticed by von Neumann.

Problem 2.3. Prove that if V is a complex vector space equipped with a norm \|\cdot\|, then there exists a scalar product on V such that \|v\|^2=\langle v,v \rangle for all v \in V if and only if the Parallelogram Law holds.

The standard definition of Hilbert space includes an extra completeness condition which was not used above, and will not be used going forward. Therefore, we omit this condition from our definition of what constitutes a Hilbert space.

We define the quantization functor

\mathcal{F} \colon \mathbf{FSet} \longrightarrow \mathbf{FHil}

as follows. First, for every finite set X we declare

\mathcal{F}(X) = \{a \colon X \to \mathbb{C}\}

to be the vector space of complex-valued functions on X together with the scalar product

\langle a,b\rangle = \sum\limits_{x \in X} \overline{a(x)}b(x).

Second, if X,Y are finite sets and f \in \mathrm{Hom}(X,Y) is a function, then we declare \mathcal{F}(f) \in \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)) to be the linear transformation defined by

[\mathcal{F}(f)a](y) = \sum\limits_{x \in f^{-1}(y)} a(x).

In words, [\mathcal{F}(f)a](y) is the sum of the values of a \in \mathcal{F}(X) over the points in the fiber of f over y. Assuming X,Y are disjoint, you can visualize this by associating to f \in \mathrm{Hom}(X,Y) the bipartite graph with vertex set X \sqcup Y and edges \{x,f(x)\} given by input-output pairs of f. This bipartite graph is a disjoint union of star graphs: each star consists of a hub vertex y \in Y, and the remaining vertices of the star are the points of X which map to y via f. The function \mathcal{F}(f)a assigns to each hub vertex y the sum of the values of a over the non-hub vertices of its star.
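A direct implementation of this fiber-summing action, as a sketch (the dictionary representation and the function names are mine):

```python
def quantize(f, X, Y):
    """Sketch: F(f) sends a: X -> C to the function on Y whose value at y
    is the sum of a over the fiber f^{-1}(y)."""
    def Ff(a):
        b = {y: 0j for y in Y}
        for x in X:
            b[f[x]] += a[x]       # accumulate a(x) at the hub f(x)
        return b
    return Ff

X, Y = [0, 1, 2], ['u', 'v']
f = {0: 'u', 1: 'u', 2: 'v'}
print(quantize(f, X, Y)({0: 1.0, 1: 2.0, 2: 3.0}))  # {'u': (3+0j), 'v': (3+0j)}
```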

Problem 2.4. Prove that finite sets X and Y are isomorphic if and only if the Hilbert spaces \mathcal{F}(X) and \mathcal{F}(Y) are isomorphic.

Now let us explain why we refer to \mathcal{F} as the quantization functor. For each x \in X, the corresponding elementary function e_x \in \mathcal{F}(X) is defined by

e_x(x^\prime) = \delta_{xx^\prime}.

A basic feature of these functions is their orthogonality,

\langle e_{x_1},e_{x_2} \rangle = \delta_{x_1x_2}.

Theorem 2.6. For any a \in \mathcal{F}(X), there exist unique scalars \alpha_x, x \in X such that

a = \sum\limits_{x \in X} \alpha_xe_x.

Proof: To prove existence, check that \alpha_x=a(x) works. For uniqueness, imagine we have two such representations of a and then use evaluations of a to show these representations are the same. \square

Theorem 2.6 allows us to develop an alternative perspective which is widely used in algebra: we view complex-valued functions on X as formal \mathbb{C}-linear combinations of its points,

a=\sum\limits_{x \in X} \alpha_x x.

This is really just notation, and one should remember that the expression above is just a different way of writing down the function a(x)=\alpha_x, x \in X. The formal linear combination perspective on functions is also widely used in physics. A set X, for example the natural set X=[n], can be viewed as the state space of a particle \bullet on the integer lattice which may be located at any of the sites 1,\dots,n. The particle \bullet is “classical” in the sense of classical mechanics: it has a definite location, which may be any of the sites 1,\dots,n. A quantum particle does not have a definite location: before it is observed, it exists simultaneously in each of the states 1,\dots,n, and only after it is observed does its location become definite. A (pure) quantum state is a formal linear combination

a = \sum\limits_{x \in X} \alpha_x x

of classical states x such that

\sum\limits_{x \in X} |\alpha_x|^2 =1,

where |\alpha_x|^2, the squared modulus of the amplitude \alpha_x, is the probability that \bullet will be found in state x once it is observed. Thus, (pure) quantum states are the same thing as unit vectors in the Hilbert space \mathcal{F}(X). In applying the functor \mathcal{F}, we are passing from the state space X of a classical particle to the state space \mathcal{F}(X) of a quantum particle.

Mathematically, many features of Hilbert space can be viewed as quantum generalizations of familiar set-theoretic properties. For example, let X_1 and X_2 be two subsets of X, and let

a_1 = \sum\limits_{x \in X_1} x \text{ and } a_2=\sum\limits_{x \in X_2} x

be their indicator functions in \mathcal{F}(X), whose squared norms are \|a_1\|^2=|X_1| and \|a_2\|^2=|X_2|. If X_1 and X_2 are disjoint, then a_1 and a_2 are orthogonal in \mathcal{F}(X), and the Pythagorean theorem

\|a_1 + a_2\|^2 = \|a_1\|^2 + \|a_2\|^2

reproduces |X_1 \sqcup X_2|=|X_1|+|X_2|. However, the Pythagorean theorem holds for all pairs of orthogonal vectors in \mathcal{F}(X), not just indicator functions of disjoint subsets of X. Similarly, the Parallelogram Law can be seen as a quantum generalization of the inclusion-exclusion principle.
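To make the last claim concrete, here is a small numerical check (mine, and of course not a proof) that the Parallelogram Law for indicator functions encodes counting identities for X_1 and X_2:

```python
import numpy as np

X = range(6)
a1 = np.array([1.0 if x in {0, 1, 2} else 0.0 for x in X])  # indicator of X1
a2 = np.array([1.0 if x in {2, 3} else 0.0 for x in X])     # indicator of X2

# ||a1+a2||^2 = |X1 u X2| + 3|X1 n X2| and ||a1-a2||^2 = |X1 u X2| - |X1 n X2|,
# so their sum is 2|X1| + 2|X2| by inclusion-exclusion.
lhs = np.linalg.norm(a1 + a2)**2 + np.linalg.norm(a1 - a2)**2
print(np.isclose(lhs, 2 * 3 + 2 * 2))    # True: 2|X1| + 2|X2|
```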

Definition 2.7. A Hilbert space V is said to be finite-dimensional if there exists a finite set X such that V is isomorphic to \mathcal{F}(X). Otherwise, it is said to be infinite-dimensional.

This definition is non-traditional, but it has certain advantages. In particular, thanks to Problem 2.4 we can define the dimension of a finite-dimensional Hilbert space V to be the cardinality of any finite set X such that V is isomorphic to \mathcal{F}(X).

Problem 2.5. Prove that finite-dimensional Hilbert spaces V and W are isomorphic if and only if they have the same dimension.

A more traditional approach to dimension is based on the following definition.

Definition 2.8. A finite set S of vectors in a Hilbert space V is said to be linearly independent if

\sum\limits_{s \in S} \alpha_s s=0_V \implies \alpha_s=0 \text{ for all }s \in S.

It is easy to see that any finite orthonormal set X \subset V is linearly independent. A converse, at the level of cardinalities, also holds.

Proposition 2.9. (Gram-Schmidt) If V contains a linearly independent set of cardinality n, then it contains an orthonormal set of cardinality n.
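A minimal numpy sketch of the Gram-Schmidt procedure (assuming, as in the proposition, that the input list is linearly independent):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list; the scalar product is
    linear in the second slot, so np.vdot(e, w) = <e, w>."""
    ortho = []
    for v in vectors:
        w = np.array(v, dtype=complex)
        for e in ortho:
            w = w - np.vdot(e, w) * e        # remove the component along e
        ortho.append(w / np.linalg.norm(w))  # nonzero by linear independence
    return ortho

e1, e2 = gram_schmidt([[1.0, 1.0], [1.0, 0.0]])
print(np.isclose(np.vdot(e1, e2), 0))        # True: the output is orthonormal
```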

From here we establish the equivalence of Definition 2.7 and the standard definition of dimension.

Theorem 2.10. A Hilbert space V has dimension n \in \mathbb{N} if and only if it contains an orthonormal set of cardinality n and does not contain an orthonormal set of cardinality n+1.

Math 202A: Review Lecture 1

Let \mathbf{Set} be the category whose objects are sets and whose morphisms are functions. In more detail, for any two sets X and Y, let X \times Y be their Cartesian product and let \mathcal{P}(X \times Y) be the power set of X \times Y. Then \mathrm{Hom}(X,Y) is the subset of \mathcal{P}(X \times Y) defined by the condition

f \in \mathrm{Hom}(X,Y) \iff \left[\forall x \in X\ \exists y \in Y \colon (x,y) \in f\right] \text{ and } \left[ (x,y_1)\in f \text{ and } (x,y_2) \in f \implies y_1=y_2\right].

In particular, \mathbf{Set} is a locally small category. Going forward we will use the standard functional notation f(x)=y to denote (x,y) \in f.

A function f \in \mathrm{Hom}(X,Y) is said to be injective if

f(x_1)=f(x_2) \implies x_1=x_2,

and surjective if

f(X)=Y,

where f(X) = \{f(x) \colon x \in X\}. A function which is both injective and surjective is called bijective.

Problem 1.1. Prove that two sets X,Y are isomorphic if and only if \mathrm{Hom}(X,Y) contains a bijection.

A natural set is a set of the form [n]=\{1,\dots,n\}, where n \in \mathbb{N} is a natural number. By Problem 1.1, two natural sets [m] and [n] are isomorphic if and only if m=n, in which case [m]=[n]. Thus, the full subcategory \mathbf{NSet} of \mathbf{Set} whose objects are natural sets is a skeletal category.

Definition 1.1. A set X is said to be finite if it is isomorphic to a natural set.

Since \mathbf{NSet} is skeletal, this means that there is one and only one natural number n \in \mathbb{N} such that \mathrm{Hom}(X,[n]) contains an isomorphism. This natural number is called the cardinality of the finite set X, and we write |X|=n. Let \mathbf{FSet} be the full subcategory of \mathbf{Set} whose objects are finite sets.

Problem 1.2. Given finite sets X and Y with |X|=n and |Y|=m, prove that \mathrm{Hom}(X,Y) is a finite set and calculate its cardinality.

Problem 1.2 shows that the category \mathbf{FSet} is enriched over itself. \mathbf{FSet} is called the classical computational category because its objects and morphisms can be encoded using classical bits, and thus instantiated on a computer. The adjective “classical” is used to contrast \mathbf{FSet} with the quantum computational category, whose objects and morphisms can presumably be instantiated on a quantum computer.

We will construct the quantum computational category in Lecture 2. In the remainder of Lecture 1, we focus on encoding objects and morphisms in \mathbf{FSet}. Let m and n be natural numbers, and let Z be a set, not necessarily finite.

Definition 1.2. An m\times n matrix over Z is an element of \mathrm{Hom}([m] \times [n],Z).

Given a matrix M \in \mathrm{Hom}([m] \times [n],Z) , the notation M_{ij}:=M(i,j) is used for its values, and we write Z^{m \times n} instead of \mathrm{Hom}([m] \times [n],Z) for the set of m \times n matrices over Z. This is because matrices M \in Z^{m \times n} are typically thought of as two-dimensional arrays populated by elements of Z,

M = \begin{bmatrix} {} & \vdots & {} \\ \dots & M_{ij} & \dots \\ {} & \vdots & {} \end{bmatrix}.

Matrices M \in Z^{m \times 1} are called column matrices because they are visualized as arrays consisting of a single column,

M= \begin{bmatrix} \vdots \\ M_{i1} \\ \vdots \end{bmatrix},

while matrices M \in Z^{1 \times n} are called row matrices because they are visualized as arrays consisting of a single row,

M=\begin{bmatrix} \dots & M_{1j} & \dots \end{bmatrix}.

While these visualizations are very convenient, one should remember that an m \times n matrix over Z is really a function in \mathrm{Hom}([m] \times [n],Z).

Classical computation utilizes matrices over B=\{0,1\}, which are called binary matrices. Since B is a finite set, binary matrices are morphisms in \mathbf{FSet}. Let X be a set of cardinality n. Then \mathrm{Hom}(X,[n]) contains an isomorphism f, and hence \mathrm{Hom}([n],X) contains an isomorphism g. An isomorphism g \in \mathrm{Hom}([n],X) is referred to as an ordering of X, and we write

x_1:=g(1),\dots,x_n:=g(n).

Using this enumeration of the elements of X, we can define an injective function [] \in \mathrm{Hom}(X,B^{n \times 1}) by

[x_1]=\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0\end{bmatrix}, [x_2]=\begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0\end{bmatrix},\dots,[x_n]=\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1\end{bmatrix}.

The injection [] \in \mathrm{Hom}(X,B^{n \times 1}) is called the one-hot encoding of X relative to the ordering x_1,\dots,x_n, and the output [x] is called the column matrix of x relative to this ordering.

Now let X and Y be two sets of cardinalities n and m, respectively, and let x_1,\dots,x_n and y_1,\dots,y_m be orderings of these sets. Using these orderings, we can define an injective function [] \in \mathrm{Hom}(\mathrm{Hom}(X,Y),B^{m \times n}) by

[f]_{ij} = \begin{cases} 1, & \text{if }f(x_j)=y_i, \\ 0, & \text{otherwise}. \end{cases}

The injection [] \in \mathrm{Hom}(\mathrm{Hom}(X,Y),B^{m \times n}) is called the binary encoding of \mathrm{Hom}(X,Y) relative to the orderings x_1,\dots,x_n and y_1,\dots,y_m. The definition of [f] may be equivalently stated as follows: for each 1 \leq j \leq n, column j of [f] is the one-hot encoding [f(x_j)] of f(x_j) relative to the ordering y_1,\dots,y_m of Y.

Now let X,Y,Z be three finite sets of cardinalities n,m,l, respectively. Let x_1,\dots,x_n and y_1,\dots,y_m and z_1,\dots,z_l be orderings of these sets. Let f \in \mathrm{Hom}(X,Y) and g \in \mathrm{Hom}(Y,Z) be given functions, and let g \circ f \in \mathrm{Hom}(X,Z) be their composition. We want to understand the relationship between the matrices

[f] \in B^{m \times n},\ [g] \in B^{l \times m},\ [g \circ f] \in B^{l \times n},

of the functions f,g and g \circ f relative to the given orderings of X,Y,Z.

Problem 1.3. Prove that the matrix M \in \mathbb{N}^{l \times n} defined by

M_{ij}=\sum\limits_{k=1}^m [g]_{ik}[f]_{kj}, \quad 1 \leq i \leq l,\ 1 \leq j \leq n,

in fact belongs to B^{l \times n}, and that [g \circ f]=M.
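A quick check of this formula on a small example (0-indexed for convenience; the helper name is mine):

```python
import numpy as np

def binary_matrix(f, m, n):
    """Binary encoding of f: [n] -> [m]; column j is one-hot at f(j)."""
    M = np.zeros((m, n), dtype=int)
    for j in range(n):
        M[f[j], j] = 1
    return M

f = {0: 1, 1: 1, 2: 0}            # f: [3] -> [2]
g = {0: 2, 1: 0}                  # g: [2] -> [3]
gf = {x: g[f[x]] for x in f}      # g o f: [3] -> [3]

print(np.array_equal(binary_matrix(g, 3, 2) @ binary_matrix(f, 2, 3),
                     binary_matrix(gf, 3, 3)))   # True
```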

Math 202A: Lecture 22

Let A \in \mathrm{Hom}(V,W) be a morphism with left and right singular blocks

V=V_1 \oplus \dots \oplus V_k \oplus \mathrm{Ker}(A) \quad\text{and}\quad W=W_1 \oplus \dots \oplus W_k \oplus \mathrm{Im}(A)^\perp,

and let \sigma_1>\dots>\sigma_k>0 be the corresponding singular values: the restriction of A to V_i has the form \sigma_iU_i, where U_i \in \mathrm{Hom}(V_i,W_i) is an isometric isomorphism. As a decomposition of A itself, we have

A=\sigma_1U_1P_1 + \dots + \sigma_kU_kP_k,

where P_i is the orthogonal projection of V onto the left singular block V_i. We have seen previously that \{U_1P_1,\dots,U_kP_k\} is an orthogonal set in \mathrm{Hom}(V,W) with respect to the Frobenius scalar product: we have

\langle U_iP_i,U_jP_j \rangle = \delta_{ij} \dim V_i = \delta_{ij} \dim W_i.

In particular, we have the squared Frobenius norm

\|A\|^2= \sum\limits_{i=1}^k \sigma_i^2 \dim V_i.

Previously, we wrote this in terms of repeated singular values, meaning that we decomposed each V_i and W_i into a direct sum of orthogonal lines instead of incorporating the multiplicity factor \dim V_i. The two ways of thinking about the SVD — blocks and lines — are both correct, but we have to keep in mind that the block decomposition is unique whereas the line decomposition is only unique when the number k of singular values \sigma_1> \dots > \sigma_k is equal to the rank of A. In particular, if we want to use the SVD to define the adjoint, we should use the block version.

Definition 22.1. The adjoint of A\in \mathrm{Hom}(V,W) is the morphism A^* \in \mathrm{Hom}(W,V) defined by

A^*=\sigma_1U_1^{-1}Q_1 + \dots + \sigma_kU_k^{-1}Q_k,

where Q_i \in \mathrm{End}(W) is the orthogonal projection of W onto the right singular block W_i.

A particularly simple (but important) situation is when A=U \in \mathrm{Hom}(V,W) comes to us as an isometric isomorphism. Then, there is a single singular value \sigma_1=1, the singular block decompositions of V and W are

V=V \quad\text{and}\quad W=W,

the singular value decomposition of U is

U=U,

and U^*=U^{-1}.

It is clear from Definition 22.1 that the adjoint is an involution, (A^*)^*=A. Geometrically, this just says that swapping left and right singular blocks twice is the same as not swapping them at all.

Now we consider the case where the source and target spaces of A \in \mathrm{Hom}(V,W) coincide, meaning that V=W and \mathrm{Hom}(V,W)=\mathrm{Hom}(V,V)= \mathrm{End}(V). In this setting there are two further coincidences that could potentially occur: the left and right singular blocks of A could coincide, V_i=W_i, or A could coincide with its adjoint, A^*=A. In the first case we say that A is normal, and in the second we say that A is selfadjoint. A priori, it is not clear what the relationship between normality and selfadjointness might be. Of course, we already know the answer from prolonged exposure to matrix algebra.

Theorem 22.2. Selfadjoint implies normal.

I would like to give a geometric proof of the fact that selfadjointness,

A=\sum\limits_{i=1}^k \sigma_i U_iP_i = \sum\limits_{i=1}^k \sigma_iU_i^{-1}Q_i=A^*

implies normality,

V_i=W_i, \quad 1 \leq i \leq k.

This amounts to showing that

\sum\limits_{i=1}^k \sigma_i(U_iP_i-U_i^{-1}Q_i)=0_{\mathrm{End}(V)}

forces P_i=Q_i in \mathrm{End}(V), for each 1 \leq i \leq k. I have not yet succeeded in finding such an argument, and while I plan to try again on my own time you may be getting tired of me constantly turning things upside down and inside out and generally treating linear algebra results we can look up as if they were open research problems. If so, you may have given up on Math 202A, which is fine with me provided you are working on the mathematics that matters to you. I mean this sincerely and will extend the following provision: if you have fallen behind on 202A homework sets because you are composing your mathematical opus, you can turn in that work for homework credit in 202A.

Look up the definition of a normal matrix in any textbook and you’ll get an algebraic characterization of normality which makes Theorem 22.2 obvious.

Theorem 22.3. A is normal if and only if A^*A=AA^*.

Problem 22.1. Prove that selfadjoint implies normal. You have two options: either find the direct geometric argument I failed to produce, or prove the standard result Theorem 22.3 and deduce the claimed implication from it.

Now, on to the spectral theorem. Assume A \in \mathrm{End}(V) is selfadjoint. Then, because A is normal, its left and right singular blocks coincide. The SVD tells us that the restriction of A to the singular block V_i is

A|_{V_i} = \sigma_iU_i,

where U_i \in \mathrm{End}(V_i) is an isometric automorphism of V_i, aka a unitary operator on V_i. The same reasoning applied to A^* gives

A^*|_{V_i} = \sigma_i U_i^{-1}=\sigma_iU_i^*.

Since A=A^*, we have

\sigma_iU_i = \sigma_iU_i^*,

and since \sigma_i>0 we can cancel it to get

U_i^*=U_i.

In summary, the SVD is reducing the study of selfadjoint operators to the study of unitary selfadjoint operators: we have shown that A being selfadjoint means that

A=\sum\limits_{i=1}^k \sigma_iU_i,

where U_i \in \mathrm{End}(V_i) is both selfadjoint and unitary. So let us step back and consider the highly constrained problem of analyzing a selfadjoint unitary operator U \in \mathrm{End}(V).

Since U is unitary, U^*=U^{-1}, and since U is selfadjoint, U^*=U. Combining these gives U=U^{-1}, and we conclude that a selfadjoint unitary operator U \in \mathrm{End}(V) satisfies

U^2=I_V,

where I_V \in \mathrm{End}(V) is the identity operator on V. Two obvious solutions to this equation are U=I_V and U=-I_V, and in fact all solutions are mixtures of these two. More precisely, let

V^+ = \mathrm{Ker}(U-I_V) =\{v \in V \colon Uv=v\}

and

V^{-}=\mathrm{Ker}(U+I_V)=\{v \in V \colon Uv=-v\}.

Theorem 22.4. We have V=V^+ \oplus V^{-}.

Note that it is *not* necessarily true that both V^+ and V^- are nonzero subspaces, as the examples U=I_V and U=-I_V already show.
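A concrete instance (my illustration): a Householder reflection is a selfadjoint unitary operator, and the orthogonal projections onto V^+ and V^- are (I_V+U)/2 and (I_V-U)/2.

```python
import numpy as np

# A selfadjoint unitary operator: the Householder reflection U = I - 2 u u*
u = np.array([1.0, 1.0]) / np.sqrt(2)
U = np.eye(2) - 2 * np.outer(u, u.conj())

print(np.allclose(U @ U, np.eye(2)))                 # U^2 = I_V
Pp, Pm = (np.eye(2) + U) / 2, (np.eye(2) - U) / 2    # projections onto V^+, V^-
print(np.allclose(Pp + Pm, np.eye(2)), np.allclose(Pp @ Pm, 0))  # True True
```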

Now return to the general situation, where we know that A \in \mathrm{End}(V) is selfadjoint, but we are not assuming it is also unitary. What we do know is that

V = V_1 \oplus \dots \oplus V_k \oplus \mathrm{Ker}(A),

where the restriction of A to V_i is \sigma_iU_i with U_i \in \mathrm{End}(V_i) selfadjoint and unitary. This gives us the decomposition

V= \bigoplus_{i=1}^k \left(V_i^+ \oplus V_i^-\right) \oplus \mathrm{Ker}(A),

where A acts on V_i^+ as multiplication by \sigma_i and on V_i^- as multiplication by -\sigma_i. In words, the condition that A \in \mathrm{End}(V) is selfadjoint allows us to refine its SVD by decomposing each singular block V_i into a positive part V_i^+ and a negative part V_i^-, and since V_i is not the zero space at least one of V_i^+,V_i^- is nonzero.

Theorem 22.5. (Spectral Theorem) Given A \in \mathrm{End}(V) selfadjoint, there exists an orthogonal decomposition

V = E_1 \oplus \dots \oplus E_l \oplus \mathrm{Ker}(A)

where E_1,\dots,E_l are nonzero subspaces such that

A|_{E_i} = \lambda_i I_{E_i},

where \lambda_1,\dots,\lambda_l are nonzero real numbers. Each E_i is a signed singular block of A, and each \lambda_i is a signed singular value of A.

Math 202A: Lecture 21

Given a morphism A \in \mathrm{Hom}(V,W) we have discussed extensively the singular value decomposition of A.

Theorem 21.1. (Geometric SVD, block form) There exists a positive integer k, distinct positive numbers \sigma_1> \dots > \sigma_k>0, and orthogonal decompositions

V=V_1 \oplus \dots \oplus V_k \oplus \mathrm{Ker}(A)\quad\text{and}\quad W=W_1\oplus \dots \oplus W_k \oplus \mathrm{Im}(A)^\perp

such that the restriction of A to V_i equals \sigma_iU_i, where U_i \in \mathrm{Hom}(V_i,W_i) is an isometric isomorphism.

The spaces V_1,\dots,V_k are the left singular blocks of A, the spaces W_1,\dots,W_k are the right singular blocks of A, and the numbers \sigma_1>\dots>\sigma_k are the singular values of A; the set \{\sigma_1,\dots,\sigma_k\} is the singular spectrum of A.

Problem 21.1. Give a precise formulation and proof of the following statement: the block SVD of a linear transformation is unique. (Hint: one way to write this down formally is using Fermat’s method of infinite descent. Suppose there do exist morphisms which admit two distinct SVDs. Let r be the minimal rank of such a morphism. Prove that the first singular block and singular value in these two decompositions must be the same. Then, the non-uniqueness comes from the restriction of this morphism to the orthogonal complement of the first singular block, which has lower rank; contradiction).

For each 1 \leq i \leq k, the positive number \dim V_i = \dim W_i is the multiplicity of the singular value \sigma_i, and if this number is equal to one we say that \sigma_i is a simple singular value of A. If all singular values of A are simple, then we say that A has simple singular spectrum; this means precisely that A has r=\mathrm{rank}(A) distinct singular values \sigma_1> \dots > \sigma_r. In this case, the left singular blocks V_1,\dots,V_r are one-dimensional subspaces and we call them the left singular lines of A. Likewise, W_1,\dots,W_r are one-dimensional subspaces of W, the right singular lines of A. It is often useful to employ this perspective even when A does not have simple singular spectrum.

Theorem 21.2. (Geometric SVD, line form) There exist r = \mathrm{rank}(A) positive numbers \sigma_1 \geq \dots \geq \sigma_r > 0 and orthogonal decompositions

V=V_1 \oplus \dots \oplus V_r \oplus \mathrm{Ker}(A)\quad\text{and}\quad W=W_1\oplus \dots \oplus W_r \oplus \mathrm{Im}(A)^\perp

such that the restriction of A to V_i equals \sigma_iU_i, where U_i \in \mathrm{Hom}(V_i,W_i) is an isometric isomorphism.

When k=r, the block and line forms of the SVD are literally the same statement. However, when k <r, they are not, because the line form becomes ambiguous: there is at least one i \in \{1,\dots,k\} with \dim V_i>1, and we are then decomposing the singular blocks V_i,W_i into direct sums of orthogonal lines, which can be done arbitrarily and independently. I have generally been pretty sloppy about accounting for this, because it almost never happens.

Theorem 21.3. (Generic simplicity) The set \mathrm{Hom}(V,W)^\circ of morphisms with simple singular spectrum is a dense open set in \mathrm{Hom}(V,W).

Proof: Proving this is definitely my job, and I am still thinking about how to present the simplest and most elegant argument. Here is the basic idea. First, notice that I did not specify a norm on \mathrm{Hom}(V,W) in the statement. This is intentional: the vector space \mathrm{Hom}(V,W) is finite-dimensional, so all norms are equivalent (note to self: go back and prove this in the discussion of norms on Hilbert space), and since density is a topological as opposed to geometric property (closure equals whole space) we are free to choose whichever norm is the most convenient. The best choice is operator norm, since this allows us to construct an infinite descent argument compatible with the iterative construction of the SVD based on repeatedly saturating the operator norm; this is similar in spirit to the one used to verify uniqueness of the block SVD. Namely, since the block SVD is constructed iteratively by looking for vectors which saturate the operator norm inequality, we can reduce to verifying that the set of transformations A \in \mathrm{Hom}(V,W) such that V_1(A) = \{v \in V \colon \|Av\|=\sigma_1(A)\|v\|\} is one-dimensional is an operator norm dense open set in \mathrm{Hom}(V,W), which is actually not too difficult. (Note to student: if you can either fine-tune this argument, or find a better one, please show me your approach). \square

Theorem 21.3 means that every time I am being sloppy about non-uniqueness of the line SVD, I am being sloppy on the complement of a dense open set, and such sets do not have civil rights. In the remainder of the lecture I will continue doing this instead of altering all statements to account for the possibility of repeated singular values. The actual meaning of this is that all theorems from this point forward a priori only hold for A \in \mathrm{Hom}(V,W)^\circ, a morphism with simple singular spectrum. Theorem 21.3 is an extra incantation which mystically causes these results to hold on all of \mathrm{Hom}(V,W).

For the remainder of the lecture, fix a rank r morphism A \in \mathrm{Hom}(V,W)^\circ with simple singular spectrum \sigma_1>\dots>\sigma_r>0. Let V_1,\dots,V_r be its uniquely determined left singular lines, and let W_1,\dots,W_r be the corresponding right singular lines.

Let us describe the polar decomposition of A \in \mathrm{Hom}(V,W), which is an almost automatic consequence of the upstairs SVD of A,

A = \sigma_1E_1+\dots+\sigma_rE_r,

where E_i=U_iP_i and P_i is the orthogonal projection of V onto the left singular line V_i.

Proposition 21.4. (Left polar decomposition) We have

A = UP,

where P \in \mathrm{End}(V) is the operator

P=\sigma_1P_1 + \dots + \sigma_rP_r

and U \in \mathrm{Hom}(V,W) is the partial isometry

U=E_1+\dots+E_r=U_1P_1+\dots+U_rP_r.

Proof: Because the left singular lines V_1,\dots,V_r of A are pairwise orthogonal, the projections P_1,\dots,P_r satisfy

P_iP_j = \delta_{ij}P_i.

We thus have

UP = \sum\limits_{i,j=1}^r U_iP_i\sigma_jP_j=\sum\limits_{i=1}^r\sigma_iU_iP_i = \sum\limits_{i=1}^r \sigma_iE_i = A,

as claimed. \square

Problem 21.2. Prove that A is a partial isometry if and only if all its singular values are equal to 1.

As the name “left polar decomposition” suggests, every A \in \mathrm{Hom}(V,W) also admits a right polar decomposition. This is obtained by observing that each of the rank one partial isometries E_i \in \mathrm{Hom}(V,W) in the upstairs SVD of A can be written as

E_i=Q_iU_iP_i,

where Q_i \in \mathrm{End}(W) is the orthogonal projection of W onto the right singular line W_i.

Proposition 21.5. (Right polar decomposition) We have

A=QU,

where U \in \mathrm{Hom}(V,W) is as in Proposition 21.4 and Q \in \mathrm{End}(W) is the operator

Q=\sigma_1Q_1 + \dots + \sigma_rQ_r.

Proof: Because the right singular lines W_1,\dots,W_r of A are pairwise orthogonal, we have

Q_iU_jP_j = \delta_{ij} U_iP_i.

Thus,

QU = \sum\limits_{i,j=1}^r\sigma_iQ_iU_jP_j=\sum\limits_{i=1}^r \sigma_iU_iP_i = A,

as claimed. \square

Now we come to the geometric definition of the adjoint, which relies on uniqueness of the SVD.

Definition 21.6. The adjoint of A \in \mathrm{Hom}(V,W) is the morphism A^* \in \mathrm{Hom}(W,V) defined by the conditions:

  1. The left singular lines of A^* are the right singular lines of A;
  2. The right singular lines of A^* are the left singular lines of A;
  3. The singular values of A^* coincide with the singular values of A.

As automatic consequences of the definition, we have that

\mathrm{rank}(A^*)=\mathrm{rank}(A) \quad\text{and}\quad \mathrm{Ker}(A^*)=\mathrm{Im}(A)^\perp.

Equivalently, Definition 21.6 says the following: given the “upstairs” SVD of A \in \mathrm{Hom}(V,W),

A=\sigma_1U_1P_1+\dots+\sigma_rU_rP_r,

we define A^* \in \mathrm{Hom}(W,V) to be the morphism whose upstairs SVD is

A^* = \sigma_1U_1^{-1}Q_1+\dots+\sigma_rU_r^{-1}Q_r.

From here, we see that the left polar decomposition of A^* \in \mathrm{Hom}(W,V) is

A^*=U^*Q,

where U^* \in \mathrm{Hom}(W,V) is the morphism

U^*= U_1^{-1}Q_1 + \dots + U_r^{-1}Q_r

and Q \in \mathrm{End}(W) is the morphism

Q=\sigma_1Q_1+\dots+\sigma_rQ_r.

The right polar decomposition of A^* \in \mathrm{Hom}(W,V) is

A^*=PU^*,

where P =\sigma_1P_1+\dots+\sigma_rP_r \in \mathrm{End}(V) is once again the SV-weighted sum of orthogonal projections onto the left singular lines of A, which are the right singular lines of A^*.

Problem 21.3. Show that the operators A^*A \in \mathrm{End}(V) and AA^* \in \mathrm{End}(W) are given by weighted projection sums

A^*A=\sum\limits_{i=1}^r \sigma_i^2P_i \quad\text{and}\quad AA^*=\sum\limits_{i=1}^r \sigma_i^2Q_i.
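Since in matrix form (relative to orthonormal bases) the adjoint defined above is the conjugate transpose, Problem 21.3 admits a quick numerical sanity check (an illustration of mine):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
As = A.conj().T                   # the adjoint, in matrix form

s = np.linalg.svd(A, compute_uv=False)
# A*A = sum_i sigma_i^2 P_i, so its eigenvalues are the squared singular values
print(np.allclose(np.sort(np.linalg.eigvalsh(As @ A))[::-1], s**2))  # True
```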

The rest of the course (and virtually all of Math 202B) is about the Hilbert space \mathrm{End}(V)=\mathrm{Hom}(V,V) of endomorphisms of an underlying Hilbert space V. The reason for this is that \mathrm{End}(V) is an algebra: in addition to scalar multiplication (Frobenius), it carries a notion of vector multiplication (composition), and this additional structure is the core content of Math 202B. But even ignoring this additional structure for the moment and simply thinking of \mathrm{End}(V) as a special case of \mathrm{Hom}(V,W) in which V and W happen to coincide, there are interesting things to be said wrt SVD.

Definition 21.7. An operator A \in \mathrm{End}(V) is said to be normal if its left and right singular lines coincide.

Perhaps you know that this is not the normal definition of normal – if so, forget that corrupt knowledge and accept in its place the geometric truth that is Definition 21.7.

Problem 21.4. Show that every normal operator A \in \mathrm{End}(V) commutes with its adjoint, A^*A=AA^*.

Math 202A: Lecture 19

Let V,W \in \mathbf{FHil} be any two finite-dimensional Hilbert spaces, and let A \in \mathrm{Hom}(V,W) be an arbitrary morphism.

Theorem 19.1 (Geometric SVD) We have orthogonal decompositions

V=V_1 \oplus \dots \oplus V_r \oplus \mathrm{Ker}(A) \quad\text{and}\quad W=W_1 \oplus \dots \oplus W_r \oplus \mathrm{Im}(A)^\perp,

where r = \dim \mathrm{Im}(A) is the rank of A, the spaces V_1,W_1,\dots,V_r,W_r are lines, the numbers \sigma_1 \geq \dots \geq \sigma_r >0 are positive, and the restrictions A_i=A|_{V_i} are of the form A_i = \sigma_iU_i with U_i \in \mathrm{Hom}(V_i,W_i) an isometric isomorphism for 1 \leq i \leq r.

In this theorem, “decomposition” really means two things: first, it refers to simultaneous orthogonal decompositions of the spaces V and W into orthogonal lines with chunky remainders \mathrm{Ker}(A) \leq V and \mathrm{Im}(A)^\perp \leq W, and second it refers to a corresponding decomposition of the transformation A into a combination of line-to-line mappings and a terminal collapsing of the (possibly non-existent) chunk \mathrm{Ker}(A) into the zero vector of W. The decomposition of A may be written compactly as

A= \sum\limits_{i=1}^r \sigma_iU_iP_i,

where P_i \in \mathrm{End}(V) is the orthogonal projection of V onto the subspace V_i \leq V. The best way to view this operator decomposition is as an orthogonal decomposition of A in a Hilbert space of transformations, which makes the singular values \sigma_1,\dots,\sigma_r the coordinates of A.

More precisely, \mathrm{Hom}(V,W) is itself a Hilbert space: letting X \subset V and Y \subset W be orthonormal bases, we defined the Frobenius scalar product on \mathrm{Hom}(V,W) by

\langle A,B \rangle =\sum\limits_{x \in X}\sum\limits_{y \in Y} \overline{\langle y,Ax\rangle}\langle y,Bx \rangle,

or equivalently

\langle A,B \rangle= \sum\limits_{x \in X} \langle Ax,Bx\rangle.

This gives the Frobenius norm

\|A\|^2 = \sum\limits_{x \in X}\sum\limits_{y \in Y}|\langle y,Ax\rangle|^2,

or equivalently

\|A\|^2 = \sum\limits_{x \in X} \|Ax\|^2.

We had a nice time proving early on, from first principles, that the Frobenius scalar product is in fact independent of the Cartesian coordinate systems in V and W used to construct it. Equivalently, if T \in \mathrm{End}(V) and U \in \mathrm{End}(W) are unitary operators then the Frobenius scalar product satisfies

\langle UAT,UBT\rangle=\langle A,B \rangle.

Now let us consider Theorem 19.1 from the point of view of Hilbert space geometry in \mathrm{Hom}(V,W) equipped with the Frobenius scalar product. Let us write the SVD of A \in \mathrm{Hom}(V,W) as

A = \sum\limits_{i=1}^r \sigma_iE_i,

where E_i=U_iP_i \in \mathrm{Hom}(V,W).

Proposition 19.2. \{E_1,\dots,E_r\} is an orthonormal set in \mathrm{Hom}(V,W).

Proof: Let X \subset V be an ordered orthonormal basis adapted to the decomposition V=\oplus_iV_i, and let Y \subset W be an ordered orthonormal basis adapted to the decomposition W=\oplus_i W_i. This means that, for each 1 \leq i \leq r, we have

Ax_i = \sigma_iy_i.

Equivalently, the transformation E_i=U_iP_i, which acts on V by first orthogonally projecting onto V_i and then mapping this line isometrically onto the line W_i, is a rank one partial isometry.

Now compute the Frobenius norm of E_i using the adapted basis X \subset V,

\|E_i\|^2 = \sum\limits_{x \in X} \|E_ix\|^2 = \|E_ix_i\|^2 = \|y_i\|^2 = 1,

since E_i annihilates every vector in X other than x_i.

Similarly, suppose i\neq j and consider the scalar product

\langle E_i,E_j\rangle = \sum_{x \in X} \langle E_ix,E_jx\rangle.

There are two mutually exclusive cases for the term in this sum corresponding to a given x \in X. First case: x lies on neither of the lines V_i,V_j, so E_ix and E_jx are both the zero vector in W. Second case: x lies on exactly one of the lines V_i,V_j, so at least one of the vectors E_ix,E_jx is the zero vector in W. In either case \langle E_ix,E_jx\rangle=0. \square

Now, the rank r of A\in \mathrm{Hom}(V,W) satisfies r \leq \min(\dim V,\dim W), so we can extend the orthonormal set \{E_1,\dots,E_r\} \subset \mathrm{Hom}(V,W) to an orthonormal basis of \mathrm{Hom}(V,W) relative to the Frobenius scalar product. Then, the SVD simply says that the singular values of A are its nonzero coordinates in this basis. In particular, we have the following.

Proposition 19.3. For any A \in \mathrm{Hom}(V,W), we have

\|A\|^2=\sum\limits_{i=1}^r \sigma_i^2.

Proof: By Proposition 19.2, this is an instance of the Pythagorean theorem. \square

A difficult and interesting question is to try to understand how the singular values of a sum A+B of two morphisms A,B \in \mathrm{Hom}(V,W) are determined by the singular values of the summands A and B. More conceptually, this means that we want to define functionals \sigma_i on \mathrm{Hom}(V,W) which send a morphism A \in \mathrm{Hom}(V,W) to its ith largest singular value (so there are \min(\dim V,\dim W) of these functionals). These functionals are nonlinear and it is difficult to say much about them in general. One thing we can say immediately is that

\sigma_1(A+B) \leq \sigma_1(A)+\sigma_1(B),

which is a consequence of the fact that \sigma_1(A) is the operator norm of A, and indeed most statements that can be made are inequalities rather than identities. However, the following identity is an immediate consequence of the Hilbert space structure on \mathrm{Hom}(V,W).

Proposition 19.4. For any A,B \in \mathrm{Hom}(V,W) we have

\sum_i \sigma_i(A+B)^2 + \sum_i \sigma_i(A-B)^2 = 2 \sum_i \sigma_i(A)^2 + 2 \sum_i\sigma_i(B)^2.

Proof: By Proposition 19.2, this is an instance of the Parallelogram Law. \square

Math 202A: Lecture 20

***Happy Thanksgiving – all problems here due 12/03, 23:59.***

In Lecture 19, we interpreted the SVD of a morphism A \in \mathrm{Hom}(V,W) in terms of the Hilbert space geometry of \mathrm{Hom}(V,W) equipped with the Frobenius scalar product. Let us start “downstairs,” with the orthogonal decompositions

V=\underbrace{V_1 \oplus \dots \oplus V_r}_{\mathrm{Ker}(A)^\perp} \oplus \mathrm{Ker}(A) \quad\text{and} \quad W=\underbrace{W_1 \oplus \dots \oplus W_r}_{\mathrm{Im}(A)} \oplus \mathrm{Im}(A)^\perp.

The left singular lines V_1,\dots,V_r are orthogonal one-dimensional subspaces of V, and the right singular lines W_1,\dots,W_r are orthogonal one-dimensional subspaces in W. The restriction of A to V_i for 1 \leq i \leq r has the form \sigma_iU_i, with U_i \in \mathrm{Hom}(V_i,W_i) a line-to-line isometry and \sigma_1 \geq \dots \geq \sigma_r >0. Corresponding to this, we have an “upstairs” decomposition

A=\sigma_1E_1 + \dots + \sigma_rE_r,

happening in the Hilbert space \mathrm{Hom}(V,W), where E_i = U_iP_i and P_i \in \mathrm{End}(V) is the orthogonal projection of V onto the line V_i. In Lecture 19, we proved that \{E_1,\dots,E_r\} \subset \mathrm{Hom}(V,W) is an orthonormal set wrt the Frobenius scalar product, hence it can be extended to an orthonormal basis \{E_1,\dots,E_r,\dots,E_{mn}\} of the Hilbert space \mathrm{Hom}(V,W) such that

[A]_{\{E_i\}}=\begin{bmatrix} \sigma_1 \\ \vdots \\ \sigma_r \\ 0 \\ \vdots \\ 0 \end{bmatrix}.

Thus, for any 1 \leq q \leq r, the vector A_q \in \mathrm{Hom}(V,W) defined by

A_q = \sigma_1E_1+\dots+\sigma_qE_q

has coordinates

[A_q]_{\{E_i\}}=\begin{bmatrix} \sigma_1 \\ \vdots \\ \sigma_q \\ 0 \\ \vdots \\ 0 \end{bmatrix},

and is simply the orthogonal projection of the vector A \in \mathrm{Hom}(V,W) onto the coordinate space \mathrm{Span}\{E_1,\dots,E_q\}. Remembering that A,A_q \in \mathrm{Hom}(V,W) are morphisms and not just inert vectors, we ask: what is the significance of A_q as a linear transformation from V into W?

Theorem (Eckart-Young-Mirsky): Among all rank q morphisms in \mathrm{Hom}(V,W), the transformation A_q is an optimal approximation to A. More precisely, if B \in \mathrm{Hom}(V,W) is any morphism of rank q, then \|A-B\| \geq \|A-A_q\| in Frobenius norm.

At a recent conference, my colleague Jorge Garza-Vargas emphasized to me that this result is crucial in applications of linear algebra to data compression, and that I would be doing you the student a grave injustice if I skipped it. So we will now prove the EYM theorem via a geometric argument which hopefully gives some understanding of why this result matters in data compression.

First of all, the upstairs SVD of A_q \in \mathrm{Hom}(V,W) is by construction

A_q = \sigma_1E_1 + \dots + \sigma_qE_q,

so it is clear that the rank of A_q is indeed q. Our task is to demonstrate that if B \in \mathrm{Hom}(V,W) is an arbitrary transformation of rank q, then

\|A-A_q\| \leq \|A-B\|.

We will decompose this into a two-part optimization problem by first optimizing over all approximants with a fixed image, and then optimizing over the image. Thus, let U \in \mathcal{L}_q(W) be a point in the Grassmannian of q-dimensional subspaces of W.

Problem 20.1. Show that the optimization problem

\text{minimize } \|A-B\| \text{ subject to } \mathrm{Im}(B)=U

is solved by B=P_UA, where P_U \in \mathrm{End}(W) is the orthogonal projection of W onto the subspace U.

The morphism P_UA \in \mathrm{Hom}(V,W) is called the compression of A into U, and by construction it has rank at most \dim U=q. It remains now to optimize the functional

S_A(U) = \|A-P_UA\|^2

over U \in \mathcal{L}_q(W). To do this we will express S_A(U) in terms of the singular data of A. Let L \subset V and R \subset W be ordered orthonormal bases adapted to the downstairs decompositions

V=V_1 \oplus \dots \oplus V_r \oplus \mathrm{Ker}(A)\quad\text{and}\quad W=W_1\oplus \dots \oplus W_r \oplus \mathrm{Im}(A)^\perp.

This means that L=\{v_1,\dots,v_r,v_{r+1},\dots,v_n\} and R=\{w_1,\dots,w_r,w_{r+1},\dots,w_m\} are orthonormal bases of V and W such that

Av_i = \sigma_i w_i

for 1 \leq i \leq r, which is summarized by saying that v_1,\dots,v_r are left singular vectors for A and w_1,\dots,w_r are right singular vectors for A. The remaining vectors v_{r+1},\dots,v_n in L form an orthonormal basis of the kernel of A in V, while the remaining vectors w_{r+1},\dots,w_m in R form an orthonormal basis for the orthogonal complement of the image of A in W. Now, to any subspace U \in \mathcal{L}(W) we associate the compression numbers c_1(U),\dots,c_r(U) defined as the norms

c_i(U) = \|P_Uw_i\|

of the vectors obtained by projecting each right singular vector w_1,\dots,w_r of A onto U. Note that c_i(U) is in fact independent of which unit vector w_i we sample from the line W_i, since any other unit sample from W_i will be a scalar multiple of w_i by a complex number of modulus one. Note that 0 \leq c_i(U) \leq 1 with nonnegativity due to the fact that these numbers are norms, and the upper bound due to the fact that they are norms of projections of unit vectors. In fact, there is one more constraint on the compression numbers of a q-dimensional subspace.

Problem 20.2. Show that for any U \in \mathcal{L}(W) we have

\sum_{i=1}^r c_i(U)^2 \leq \dim U.

Now we are ready to quantify compression error.

Problem 20.3. Show that

S_A(U)=\|(I-P_U)A\|^2 = \sum\limits_{i=1}^r \sigma_i^2(1-c_i(U)^2).

Let us write the error formula above as

S_A(U) = \sum\limits_{i=1}^r \sigma_i^2 - \sum\limits_{i=1}^r \sigma_i^2c_i(U)^2=\|A\|_F^2 -\sum\limits_{i=1}^r \sigma_i^2c_i(U)^2.

Here we are seeing that the squared error S_A(U) in approximating A by the compression P_UA is at most the squared Frobenius norm of A, which is the error we incur if we compress A to zero.

Our task now is to maximize the part of the sum which reduces this worst-case error. This is a very simple optimization problem: maximize the quadratic form

S(c_1,\dots,c_r) = \sum\limits_{i=1}^r \sigma_i^2 c_i^2

subject to the constraints

0 \leq c_i \leq 1 \quad\text{and}\quad \sum\limits_{i=1}^r c_i^2 \leq q.

Remembering that the coefficients of S satisfy

\sigma_1 \geq \dots \geq \sigma_r>0,

our optimal strategy is to put as much weight as possible on the q largest coefficients by taking

(c_1,\dots,c_r) = (\underbrace{1,\dots,1}_q,\underbrace{0,\dots,0}_{r-q}).

Problem 20.4. Finish the proof of the EYM theorem by showing that the compression numbers of U=W_1 \oplus \dots \oplus W_q realize the above maximizer, and that the corresponding compression is P_UA=\sigma_1E_1+\dots +\sigma_qE_q=A_q.
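Finally, a numerical sanity check of the whole EYM story (illustration only; the competitor B below is a random rank q morphism):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 4))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

q = 2
Aq = U[:, :q] @ np.diag(s[:q]) @ Vh[:q]       # the truncation A_q
B = rng.standard_normal((5, q)) @ rng.standard_normal((q, 4))  # random rank q

fro = lambda M: np.linalg.norm(M, 'fro')
print(np.isclose(fro(A - Aq)**2, (s[q:]**2).sum()))  # error^2 = discarded sigma^2
print(fro(A - Aq) <= fro(A - B))                     # True: A_q is optimal
```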