Math 202A: Lecture 16

This lecture is the most important lecture in Math 202A: we will obtain a remarkably transparent description of morphisms in \mathbf{FHil}, in total generality. It is perhaps surprising that this can be done; it is perhaps not surprising that it takes a couple of tries to get it right.

Take any two objects V,W and any morphism A \in \mathrm{Hom}(V,W) between them. Our job is to understand A without making any special assumptions on its structure. A natural strategy – perhaps the only strategy in such generality – is to find a canonical (in this case, meaning defined in the same way for all spaces) subspace V_0 of V on which we can understand the behavior of A completely. The hope is that we can bootstrap this local understanding of A on the special subspace V_0 to a global understanding of A on all of V.

This is yet another opportunity to point out that the quantum category is better than the classical one. Indeed, suppose we can find a subspace V_0 of V on which we can calculate the outputs of A exactly. If we think set theoretically, then in order to complete our understanding of A we would have to find a way to understand it on the set-theoretic complement

V_0^c = \{v \in V \colon v \not\in V_0\}.

But linear algebraically, in order to complete our understanding of A it is sufficient to bootstrap our understanding to the orthogonal complement

V_0^\perp=\{v \in V \colon \langle v,v_0 \rangle=0 \text{ for all }v_0 \in V_0\}.

For example in three dimensions, this is the difference between extending from the line of understanding to the whole space with that line deleted, and extending only to the plane orthogonal to the line of understanding. The latter is a dimension reduction, the former is not.

There is one obvious candidate for an “exactly solvable” subspace V_0 canonically associated to any A \in \mathrm{Hom}(V,W), namely V_0 = \mathrm{Ker}(A). We know exactly how to compute Av for every v \in V_0: the output is Av=0_W. Unfortunately, the kernel is not a good starting point for a bootstrap understanding of an arbitrary linear transformation. First of all, if A is injective then V_0 = \{0_V\}, and the statement A(0_V)=0_W is just part of what it means to be a linear transformation. Furthermore, even in the case where V_0=\mathrm{Ker}(A) is nontrivial we learn nothing, and attempting to bootstrap from V_0 to all of V just brings us back to the injective situation, with no drop in complexity. The image of V_0 under A is the zero space W_0=\{0_W\} in W. Decomposing

V = V_0 \oplus V_0^\perp \quad\text{and}\quad W=W_0 \oplus W_0^\perp,

we tautologically have that A maps V_0^\perp into W_0^\perp=W. Restricting A to a map A \in \mathrm{Hom}(V_0^\perp,W), we are back in the injective situation. The dimension of the codomain W did not go down, and neither did the rank of A. In terms of matrices, if we choose an orthonormal basis X \subset V_0 of V_0 = \mathrm{Ker}(A) and an orthonormal basis X^\perp \subset V_0^\perp, and take Y \subset W to be any orthonormal basis, then with respect to the bases X \sqcup X^\perp and Y we have

[A] =\begin{bmatrix} 0 & [A]\end{bmatrix},

where the nested [A], which is the matrix of the restriction of A to V_0^\perp, has the same row echelon form as the external [A] up to the removal of zero columns.

The basic lesson to be learned from this failure is that in order to implement our bootstrap strategy we need to identify an exactly solvable subspace V_1 canonically associated to every morphism A \in \mathrm{Hom}(V,W) by means of algebra and geometry, not just pure algebra. One such space is V_1=\mathrm{Opt}(A), the space

V_1 = \{v \in V \colon \|Av\|=\sigma_1\|v\|\}

whose nonzero vectors maximize the norm ratio \frac{\|Av\|}{\|v\|}, a quantity bounded above by the operator norm \sigma_1=\|A\|. At the beginning of this week’s lectures we proved the following nontrivial theorem using the Parallelogram Law.

Proposition 16.1. The set V_1 is a nonzero subspace of V.

The key idea of the Singular Value Decomposition is that V_1 is a space on which the behavior of A can be completely understood, and this understanding can be bootstrapped to all of V. That is really what I want you to appreciate at a level beyond formulas. An arbitrary linear transformation can be completely understood in terms of the vectors which it maximally stretches.

First, let us explain why A is “exactly solvable” on V_1, no matter what A is. There are really two distinct cases: \sigma_1=0 and \sigma_1>0. The first case means A \in \mathrm{Hom}(V,W) is the zero transformation, and nothing more needs to be said. In the second case, \sigma_1>0, let W_1 be the image of V_1 under A and consider the restriction of A to V_1, which is a morphism A_1 \in \mathrm{Hom}(V_1,W_1). Since W_1=A(V_1), the map A_1 \in \mathrm{Hom}(V_1,W_1) is surjective by definition, and furthermore \|A_1\|=\sigma_1. Let us write A_1 \in \mathrm{Hom}(V_1,W_1) in the form A_1=\sigma_1 U_1, where U_1=\frac{1}{\sigma_1}A_1. Since U_1 is a nonzero scalar multiple of A_1, it is also surjective. Moreover, since every vector of V_1 is maximally stretched by A, we have \|U_1v\|=\frac{1}{\sigma_1}\|Av\|=\|v\| for all v \in V_1, so U_1 is an isometry and hence injective. Thus, U_1 \in \mathrm{Hom}(V_1,W_1) is an isometric isomorphism: the image of any orthonormal basis X_1 \subset V_1 under U_1 is an orthonormal basis Y_1 \subset W_1, and the matrix of U_1 \in \mathrm{Hom}(V_1,W_1) with respect to these bases is the square identity matrix

[U_1] = \begin{bmatrix} 1 & {} & {} \\ {} & \ddots & {} \\ {} & {} & 1\end{bmatrix}.

Thus, the morphism A_1 \in \mathrm{Hom}(V_1,W_1) with respect to the bases X_1 \subset V_1 and Y_1 \subset W_1 is

[A_1] = \begin{bmatrix} 1 & {} & {} \\ {} & \ddots & {} \\ {} & {} & 1\end{bmatrix},

so considering the restriction of A to V_1=\mathrm{Opt}(A) to be “exactly solved” should be noncontroversial. If you want a Euclidean image to hold in your mind, one option is the following: you can imagine V_1 and W_1 as two copies of the same space, and visualize A_1=\sigma_1 U_1 as mapping V_1 onto W_1 by first performing a rotation and then a dilation.
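(If you want to see this numerically: here is a minimal sketch in Python with numpy, neither of which is part of the course materials, verifying on a random example that the top right singular vector computed by numpy lies in \mathrm{Opt}(A), i.e. realizes the operator norm bound with equality.)

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))            # an arbitrary real morphism A in Hom(V, W)
sigma1 = np.linalg.norm(A, 2)              # operator norm ||A||, the largest singular value
_, _, Vt = np.linalg.svd(A)
v = Vt[0]                                  # a unit vector which should lie in Opt(A)
print(np.isclose(np.linalg.norm(A @ v), sigma1))   # True: ||Av|| = sigma1 ||v|| with ||v|| = 1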

Now we want to bootstrap our local understanding of A \neq 0 on V_1 to a global understanding of A on V. It may happen that we get lucky, and no bootstrapping is required: V_1=V. This means exactly that the transformation A \in \mathrm{Hom}(V,W) is of the form A=\sigma U with \sigma>0 and U \in \mathrm{Hom}(V,W) an isometric isomorphism.

We now proceed with the case where V_1=\mathrm{Opt}(A) is a proper subspace of V (note that this implies A \in \mathrm{Hom}(V,W) is not the zero transformation). In this case we have to figure out how A acts on the complementary space V_1^\perp, which is not the zero subspace of V. Structurally, this means we need to figure out how A behaves with respect to the decompositions

V=V_1 \oplus V_1^\perp \quad\text{and}\quad W=W_1 \oplus W_1^\perp,

where by definition W_1=A(V_1) is the image of V_1=\mathrm{Opt}(A) under A. A good starting point is to wonder how V_0=\mathrm{Ker}(A) relates to the decomposition V=V_1 \oplus V_1^\perp, which is a point of intrinsic interest: what is the relationship between the vectors which A maximally stretches and those which it annihilates? It is easy to see that V_0 \cap V_1=\{0_V\}. Indeed, if v \in V_0 \cap V_1 then we have

Av=0_W \quad\text{and}\quad \|Av\|=\sigma_1\|v\|,

and since \sigma_1>0 the condition \sigma_1\|v\|=0 forces \|v\|=0.

Proposition 16.2. We have V_0 \leq V_1^\perp.

Proof: A more geometrically suggestive equivalent statement is to take the complement and write the claim as V_1 \leq V_0^\perp. This says that any vector in V which is maximally stretched by A is orthogonal to every vector annihilated by A, and indeed this can be established using elementary geometric reasoning. (Note to self: somehow I never noticed this striking fact before – this is a good exam problem). \square

The counterpart of Proposition 16.2 in the target space decomposition W=W_1 \oplus W_1^\perp is very straightforward: since W_1 = A(V_1) we have W_1 \leq \mathrm{Im}(A), which is equivalent to \mathrm{Im}(A)^\perp \leq W_1^\perp.

Now consider the extreme case of Proposition 16.2 where V_0=\mathrm{Ker}(A) exhausts V_1^\perp = \mathrm{Opt}(A)^\perp. Then, our decompositions

V= V_1 \oplus V_1^\perp \quad\text{and}\quad W=W_1 \oplus W_1^\perp

become

V=\mathrm{Opt}(A) \oplus \mathrm{Ker}(A) \quad\text{and}\quad W=\mathrm{Im}(A) \oplus \mathrm{Im}(A)^\perp,

and we have achieved our goal of understanding how A \in \mathrm{Hom}(V,W) acts on all of its domain V – it acts as the scaled isometric isomorphism \sigma_1U_1 on V_1=\mathrm{Opt}(A), and it acts as the zero operator on V_1^\perp = \mathrm{Ker}(A). This situation is important enough to warrant special terminology.

Definition 16.3. A morphism U \in \mathrm{Hom}(V,W) is said to be a partial isometry if it acts isometrically on a nonzero subspace V_1 of V and annihilates the complement V_1^\perp.

Equivalently, U \in \mathrm{Hom}(V,W) is a partial isometry if \|U\|=1 and

V=\mathrm{Opt}(U) \oplus \mathrm{Ker}(U).

What we have shown is that any nonzero morphism A \in \mathrm{Hom}(V,W) such that \mathrm{Opt}(A) and \mathrm{Ker}(A) are complementary subspaces of V is a positive multiple of a partial isometry.

The extremal case of Proposition 16.2 is just that – an extreme case. In general V_0=\mathrm{Ker}(A) may be a proper subspace of V_1^\perp = \mathrm{Opt}(A)^\perp and it remains to analyze this situation. This is where the following result (which we already proved using a perturbation argument) is essential.

Proposition 16.4. We have A(V_1^\perp) \leq W_1^\perp.

The force of Proposition 16.4 is that, in conjunction with the decompositions

V=V_1 \oplus V_1^\perp \quad\text{and}\quad W = W_1 \oplus W_1^\perp

it reduces our problem to a genuinely lower-complexity version of the same problem: we have

A = A_1 \oplus A_1^\perp,

where A_1 \in \mathrm{Hom}(V_1,W_1) is a morphism of the form A_1=\sigma_1U_1 with \sigma_1>0 and U_1 \in \mathrm{Hom}(V_1,W_1) an isometric isomorphism, and A_1^\perp \in \mathrm{Hom}(V_1^\perp,W_1^\perp) is a morphism which remains to be analyzed, but has lower complexity than our original morphism A \in \mathrm{Hom}(V,W) because

\mathrm{rank}(A) = \mathrm{rank}(A_1) + \mathrm{rank}(A_1^\perp) \implies \mathrm{rank}(A_1^\perp) = \mathrm{rank}(A) - \dim V_1.

We now repeat our analysis of A \in \mathrm{Hom}(V,W) for the lower-complexity situation A_1^\perp \in \mathrm{Hom}(V_1^\perp,W_1^\perp), where both the dimension of the domain and the rank of the morphism have strictly decreased – this is what failed to happen when we tried to bootstrap starting with V_0=\mathrm{Ker}(A).

Let us carry out this iteration. Let V_2 = \mathrm{Opt}(A_1^\perp), i.e.

V_2 = \{v \in V_1^\perp \colon \|A_1^\perp v\|=\sigma_2\|v\|\}

where \sigma_2 = \|A_1^\perp\|. Note that since A_1^\perp \in \mathrm{Hom}(V_1^\perp,W_1^\perp) is nothing but the restriction of A \in \mathrm{Hom}(V,W) to V_1^\perp = \mathrm{Opt}(A)^\perp, we have \sigma_2 \leq \sigma_1; in fact \sigma_2 < \sigma_1, since the norm of the restriction is attained on the unit sphere of V_1^\perp, which contains no maximally stretched vectors – as pointed out to me by Evan Young at least twice. If \sigma_2=0 we are done: A_1^\perp is the zero operator and we are back in the situation where A \in \mathrm{Hom}(V,W) is a positive multiple of a partial isometry U \in \mathrm{Hom}(V,W). Otherwise, \sigma_2 >0 and we define W_2 = A(V_2). Then, the restriction of A_1^\perp to V_2 is a morphism A_2 \in \mathrm{Hom}(V_2,W_2) of the form A_2=\sigma_2 U_2 with U_2 \in \mathrm{Hom}(V_2,W_2) an isometric isomorphism. As with our first pass, it remains to understand how A_1^\perp acts on the orthogonal complement of V_2 in V_1^\perp. Structurally, we need to understand the decompositions

V_1^\perp = V_2 \oplus V_2^\perp \quad\text{and}\quad W_1^\perp = W_2 \oplus W_2^\perp.

As in our first iteration, \mathrm{Ker}(A_1^\perp) is a subspace of V_2^\perp=\mathrm{Opt}(A_1^\perp)^\perp. If \mathrm{Ker}(A_1^\perp) exhausts V_2^\perp then we are done: we have

V_1^\perp = V_2 \oplus \mathrm{Ker}(A_1^\perp) \quad\text{and}\quad W_1^\perp = W_2 \oplus \mathrm{Im}(A_1^\perp)^\perp,

which lifts back up to the level of A \in \mathrm{Hom}(V,W) as

V= V_1 \oplus V_2 \oplus \mathrm{Ker}(A) \quad\text{and}\quad W=W_1 \oplus W_2 \oplus \mathrm{Im}(A)^\perp.

If not, we repeat the process on the restriction of A_1^\perp to V_2^\perp, which is a morphism A_2^\perp \in \mathrm{Hom}(V_2^\perp,W_2^\perp) of rank

\mathrm{rank}(A_2^\perp)=\mathrm{rank}(A)-\dim V_1 - \dim V_2.

Continuing the process until we exhaust \mathrm{rank}(A) \leq \min(\dim V,\dim W), we arrive at the Singular Value Decomposition.
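(For the computationally inclined, here is a rough transcription of this recursion as a sketch in Python with numpy; peel_svd is a hypothetical name, not anything from the lectures. It peels off one maximally stretched direction at a time and passes to orthogonal complements, so it produces the fine-grained decomposition of Theorem 16.6 below rather than the chunky one. On random inputs the recovered \sigma’s agree with numpy.linalg.svd up to floating point error.)

import numpy as np

def peel_svd(A, tol=1e-10):
    """Recursively peel off top stretch directions, mimicking the bootstrap argument.
    Returns a list of triples (sigma, v, w) with A v = sigma w (up to numerical error)."""
    triples = []
    Q = np.eye(A.shape[1])                 # columns: orthonormal basis of the current domain subspace
    R = np.eye(A.shape[0])                 # columns: orthonormal basis of the current codomain subspace
    while Q.shape[1] > 0 and R.shape[1] > 0:
        B = R.T @ A @ Q                    # matrix of the restricted morphism
        sigma = np.linalg.norm(B, 2)       # operator norm of the restriction
        if sigma < tol:                    # the restriction is (numerically) zero: rank exhausted
            break
        U, _, Vt = np.linalg.svd(B)
        v, w = Q @ Vt[0], R @ U[:, 0]      # top stretch direction and its image direction
        triples.append((sigma, v, w))
        Q = Q @ Vt[1:].T                   # pass to the orthogonal complement in the domain
        R = R @ U[:, 1:]                   # pass to the orthogonal complement in the codomain
    return triples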

Theorem 16.5. (SVD, chunky geometric form) For any morphism A \in \mathrm{Hom}(V,W), we have orthogonal decompositions

V = V_1 \oplus \dots \oplus V_k \oplus \mathrm{Ker}(A) \quad\text{and}\quad W=W_1 \oplus \dots \oplus W_k \oplus \mathrm{Im}(A)^\perp

and numbers

\sigma_1 > \dots > \sigma_k >0

such that, for each 1 \leq i \leq k, the spaces V_i,W_i are nonzero and the restriction of A to V_i is a morphism A_i \in \mathrm{Hom}(V_i,W_i) of the form A_i=\sigma_iU_i with U_i \in \mathrm{Hom}(V_i,W_i) an isometric isomorphism.

The paired spaces V_1,W_1,\dots,V_k,W_k in the SVD are called the left and right singular spaces of A. Although these spaces are nonzero, the number k of singular spaces may be zero. Indeed, \dim(V_1)+\dots+\dim(V_k)=\mathrm{rank}(A), and the case k=0 occurs precisely when A \in \mathrm{Hom}(V,W) is the zero transformation.

In general, the worst case scenario for the runtime of the recursive routine which establishes the SVD occurs when A \in \mathrm{Hom}(V,W) has a one-dimensional optimizer space V_1=\mathrm{Opt}(A), and the restriction of A to V_1^\perp again has a one-dimensional optimizer space, and so on until the rank r of A is exhausted after r iterations. In this case, A has r pairs of one-dimensional singular spaces V_1,W_1,\dots,V_r,W_r, and r singular values \sigma_1 > \dots > \sigma_r>0.

Even when the worst case scenario is not what occurs, we can write the Singular Value Decomposition in a form where all subspaces involved are one-dimensional, simply by choosing a basis within each V_i and writing it as a direct sum of lines. This leads to an alternative formulation of the SVD with possibly repeated singular values, which really means singular values with multiplicity.

Theorem 16.6. (SVD, fine-grained geometric version) For any morphism A \in \mathrm{Hom}(V,W) of rank r, we have orthogonal decompositions

V = V_1 \oplus \dots \oplus V_r \oplus \mathrm{Ker}(A) \quad\text{and}\quad W=W_1 \oplus \dots \oplus W_r \oplus \mathrm{Im}(A)^\perp

and numbers

\sigma_1 \geq \dots \geq \sigma_r >0

such that, for each 1 \leq i \leq r, the spaces V_i,W_i are one-dimensional and the restriction of A to V_i is a morphism A_i \in \mathrm{Hom}(V_i,W_i) of the form A_i=\sigma_iU_i with U_i \in \mathrm{Hom}(V_i,W_i) an isometric isomorphism of the lines V_i and W_i.

What may be confusing in the way the two versions of the SVD are stated is that the symbols V_i,W_i represent different things in each version, though these things are closely related. Nevertheless, I will leave our two versions as stated above instead of introducing different letters for each version.

Math 202A: Lecture 17

In this lecture we will discuss the adjoint and transpose. You probably know all about these matrix operations, and may feel that you are too enlightened to participate in this discussion. That feeling is called “ignorance”. Before enlightenment: take adjoints, transpose matrices. After enlightenment: take adjoints, transpose matrices.

In the beginning (of Math 202A), there was quantization,

\mathcal{F} \colon \mathbf{FSet} \longrightarrow \mathbf{FHil}.

\mathcal{F} sends a finite set X to the free Hilbert space \mathcal{F}(X) containing X as an orthonormal basis. It also sends a set function f \in \mathrm{Hom}(X,Y) to the linear transformation \mathcal{F}(f) \in \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)) whose action on this basis is

\mathcal{F}(f)x = f(x), \quad x \in X.

\mathcal{F} is not the only functor from sets to spaces. As the Good Book says, the adjoint

\mathcal{F}^* \colon \mathbf{FSet} \longrightarrow \mathbf{FHil}

was with quantization, and the adjoint was quantization. Indeed, \mathcal{F}^* matches \mathcal{F} on sets,

\mathcal{F}^*(X)=\mathcal{F}(X),

but acts differently on functions: instead of pushing a point of X forward onto a point of Y via f, \mathcal{F}^* pulls a point of Y back onto its preimage under f,

\mathcal{F}^*(f)y = \sum\limits_{x \in f^{-1}(y)} x.

Thus the adjoint \mathcal{F}^* of \mathcal{F} is a contravariant functor,

\mathcal{F}^*(f) \in \mathrm{Hom}(\mathcal{F}^*(Y),\mathcal{F}^*(X))=\mathrm{Hom}(\mathcal{F}(Y),\mathcal{F}(X)).

It is worth noting that while \mathcal{F}(f) \in \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)) is a linear transformation which contains the same information as the original set function f \in \mathrm{Hom}(X,Y), presented in the same way, \mathcal{F}^*(f) \in \mathrm{Hom}(\mathcal{F}(Y),\mathcal{F}(X)) has no direct counterpart in \mathbf{FSet}. In the classical computational category, you can identify a function f with the ordered list of its fibers, but the fibers of f are not points of its domain; they are subobjects of it. In the quantum computational category you can consider \mathcal{F}^*(f)y, which is literally a point of the domain of \mathcal{F}(f). It is precisely this fact which allowed us to establish the Singular Value Decomposition for quantized set functions with no additional technology.

One more remark about the relationship between \mathcal{F}(f) and \mathcal{F}^*(f). A function f \in \mathrm{Hom}(X,Y) can be viewed as a bipartite graph. The transformation \mathcal{F}(f) encodes the edges of the bipartite graph f, but not quite in the usual way graph theorists do. A graph theorist would encode adjacency in f as an endomorphism of

\mathcal{F}(X \sqcup Y) \simeq \mathcal{F}(X) \oplus \mathcal{F}(Y),

namely as the linear operator A whose matrix in the ordered orthonormal basis X \sqcup Y = \{x_1,\dots,x_n,y_1,\dots,y_m\} is

[A] = \begin{bmatrix} 0_{n \times n} & \mathcal{F}^*(f) \\ \mathcal{F}(f) & 0_{m \times m} \end{bmatrix}.

There is a very concise way to capture the relationship between the pushforward \mathcal{F}(f) and the pullback \mathcal{F}^*(f) using scalar products.

Problem 17.1. Let V=\mathcal{F}(X) and W=\mathcal{F}(Y); write A=\mathcal{F}(f) and A^*=\mathcal{F}^*(f). Show that

\langle w,Av \rangle =\langle A^*w,v\rangle=\overline{\langle v,A^*w\rangle}, \quad\text{for all }v \in V,\ w \in W.
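(A sketch of Problem 17.1 in the real case, in Python with numpy; the function f below is toy data, not from the lecture. In the bases X, Y the matrix of \mathcal{F}^*(f) is simply the transpose of the matrix of \mathcal{F}(f), which makes the identity transparent.)

import numpy as np

f = {0: 0, 1: 0, 2: 2, 3: 2}                # a set function f: X -> Y with |X| = 4, |Y| = 3
m, n = 3, 4
F  = np.array([[1.0 if f[x] == y else 0.0 for x in range(n)] for y in range(m)])  # matrix of F(f)
Fs = F.T                                    # matrix of F*(f): y pulls back to the sum of its fiber

v = np.arange(1.0, n + 1)                   # an arbitrary vector in F(X)
w = np.arange(1.0, m + 1)                   # an arbitrary vector in F(Y)
print(np.isclose(w @ (F @ v), (Fs @ w) @ v))    # True: <w, Av> = <A* w, v>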

Now we look at Problem 17.1 and ask ourselves the same question we asked for the Singular Value Decomposition: can we untether this from combinatorics? If so, what does the untethered relation mean geometrically? More precisely, given any two finite-dimensional Hilbert spaces V,W and any linear transformation A \in \mathrm{Hom}(V,W), we seek a linear transformation A^* \in \mathrm{Hom}(W,V) which satisfies

\langle w,Av \rangle =\langle A^*w,v\rangle=\overline{\langle v,A^*w\rangle}, \quad\text{for all }v \in V,\ w \in W.

Assuming such an A^* exists, we want a geometric interpretation of it.

Problem 17.2. Show that if A^* exists, it is unique.

The proof of the existence of A^* is very straightforward. Let X \subset V and Y \subset W be any two orthonormal bases; these are guaranteed to exist by Gram-Schmidt and we have V=\mathcal{F}(X) and W = \mathcal{F}(Y). However, it is not necessarily true that we can select X,Y such that our pre-selected A \in \mathrm{Hom}(V,W) maps every point of X to a point of Y; indeed, a necessary condition for this to be possible is that A be an isometry. Nevertheless, we can simply define A^* \in \mathrm{Hom}(W,V) by declaring its matrix elements to be what we want them to be. More precisely, set

\langle x,A^*y\rangle := \langle Ax,y \rangle = \overline{\langle y,Ax \rangle}, \quad x \in X,\ y \in Y.

Then, for any v \in V and w \in W we have

\langle v,A^*w\rangle = \sum\limits_{x \in X} \sum\limits_{y \in Y} \overline{\langle x,v \rangle}\langle y,w\rangle \langle x,A^*y\rangle

and this coincides with

\langle Av,w\rangle = \sum\limits_{x \in X} \sum\limits_{y \in Y} \overline{\langle x,v \rangle}\langle y,w\rangle \langle Ax,y\rangle

thanks to our definition of A^*. This is completely true, and defines the adjoint A^* \in \mathrm{Hom}(W,V) of a linear transformation A \in \mathrm{Hom}(V,W) in a manner which is logically unassailable and yet totally unsatisfying for all but the most soulless of algebraists.
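Concretely, the definition above says that relative to the orthonormal bases X and Y, the matrix of A^* \in \mathrm{Hom}(W,V) is the conjugate transpose of the matrix of A \in \mathrm{Hom}(V,W):

\langle x,A^*y\rangle = \overline{\langle y,Ax\rangle}, \quad x \in X,\ y \in Y.

In other words, the soulless construction is the familiar conjugate transpose in disguise; we will return to this point when comparing the adjoint with the transpose below.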

A natural geometric approach to reversing an arbitrary linear transformation A \in \mathrm{Hom}(V,W) can be developed using the Singular Value Decomposition, which gives us orthogonal decompositions

V = V_1 \oplus \dots \oplus V_r \oplus \mathrm{Ker}(A) \quad\text{and}\quad W=W_1 \oplus \dots \oplus W_r \oplus \mathrm{Im}(A)^\perp,

positive numbers

\sigma_1 \geq \dots \geq \sigma_r >0,

and isometric isomorphisms

U_1 \in \mathrm{Hom}(V_1,W_1),\dots,U_r \in \mathrm{Hom}(V_r,W_r)

such that

A|_{V_i} = \sigma_i U_i, \quad 1 \leq i \leq r.

By abuse of notation, we write

A= \sigma_1U_1 \oplus \dots \oplus \sigma_rU_r \oplus 0_{p\times q},

where p= \dim \mathrm{Ker}(A) is the nullity of A and q=\dim \mathrm{Im}(A)^\perp is the corank of A, so that 0_{p\times q} \in \mathrm{Hom}(\mathrm{Ker}(A),\mathrm{Im}(A)^\perp) is the zero transformation. Now let us define a corresponding transformation A^\# \in \mathrm{Hom}(W,V) by

A^\# = \sigma_1U_1^{-1} \oplus \dots \oplus \sigma_rU_r^{-1} \oplus 0_{q \times p},

which is shorthand for declaring that the restriction of A^\# to W_i is \sigma_i U_i^{-1} for 1 \leq i \leq r and the restriction of A^\# to \mathrm{Im}(A)^\perp is the zero transformation into \mathrm{Ker}(A).

Problem 17.3. Prove that A^\#=A^*. Deduce \mathrm{Ker}(A^*)=\mathrm{Im}(A)^\perp and \mathrm{Im}(A^*)=\mathrm{Ker}(A)^\perp.

One advantage to the geometric interpretation of A^* as “reversing” the singular value decomposition of A is that it immediately leads you to wonder about “inverting” the singular value decomposition of A. More precisely, the linear transformation in \mathrm{Hom}(W,V) defined by

A^+ = \sigma_1^{-1}U_1^{-1} \oplus \dots \oplus \sigma_r^{-1}U_r^{-1} \oplus 0_{q \times p}

is known as the Moore-Penrose pseudoinverse of A. Indeed, A^+ is defined by inverting the invertible part of A, in the following sense.

Problem 17.4. Show that the restriction of A to V_1 \oplus \dots \oplus V_r is an isomorphism onto W_1 \oplus \dots \oplus W_r with inverse A^+. Prove that AA^+A=A holds in \mathrm{Hom}(V,W) and that A^+AA^+=A^+ holds in \mathrm{Hom}(W,V).
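(A numerical sketch of the identities in Problem 17.4, in Python with numpy, which is not part of the lecture; numpy.linalg.pinv computes the Moore-Penrose pseudoinverse.)

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # generically a rank 3 morphism
A_plus = np.linalg.pinv(A)                                       # Moore-Penrose pseudoinverse A^+

print(np.allclose(A @ A_plus @ A, A))                 # A A^+ A = A      in Hom(V, W)
print(np.allclose(A_plus @ A @ A_plus, A_plus))       # A^+ A A^+ = A^+  in Hom(W, V)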

Now let us discuss the transpose of A \in \mathrm{Hom}(V,W). No doubt you are familiar with the transpose of a matrix, so I will make the following remark: the transpose of the matrix of A is the matrix of A^* with respect to orthonormal bases adapted to the singular value decomposition, i.e. with respect to orthonormal bases of V and W consisting of left and right singular vectors, respectively. This is a singular consequence of the fact that the singular values of A are real. For general orthonormal bases of V and W, the corresponding matrices of A and A^* are related by conjugate transpose, not transpose.

The intrinsic meaning of the matrix transpose is not to be found in the adjoint operation, but in a different operation coming from Riesz duality in \mathbf{FHil}. Recall that for every finite dimensional Hilbert space V we have an antilinear isomorphism with V^*=\mathrm{Hom}(V,\mathbb{C}), the space of linear functionals on V, given by the Riesz mapping

v \mapsto \varphi_v(\cdot) = \langle v,\cdot\rangle, \quad v \in V.

In particular, this allows us to promote V^* from a vector space to a Hilbert space by defining the scalar product

\langle \varphi_v,\varphi_{v^\prime}\rangle = \langle v^\prime,v\rangle, \quad v,v^\prime \in V.

This scalar product on V^*=\mathrm{Hom}(V,\mathbb{C}) is precisely what we used to prove that the Frobenius scalar product on \mathrm{Hom}(V,W) is basis-independent via an argument inductive in the dimension of W. If X \subset V is an orthonormal basis, we get a corresponding orthonormal basis X^* \subset V^* consisting of the linear functionals \varphi_x(\cdot)=\langle x,\cdot\rangle.

We now declare the transpose of A \in \mathrm{Hom}(V,W) to be the morphism A^t \in \mathrm{Hom}(W^*,V^*) defined by

A^t \psi := \psi \circ A, \quad \psi \in W^*.

Problem 17.5. Verify that A^t really is a linear transformation W^* \to V^*. Taking orthonormal bases X \subset V and Y \subset W, show that the matrix of A^t \in \mathrm{Hom}(W^*,V^*) relative to the dual bases Y^* \subset W^* and X^* \subset V^* is the transpose of the matrix of A \in \mathrm{Hom}(V,W) relative to X \subset V and Y \subset W.

Math 202A: Lecture 15

Let V,W be any two objects of \mathbf{FHil}, and let A be any morphism in \mathrm{Hom}(V,W). Let \sigma_1=\|A\| be the operator norm of A, and let

V_1=\mathrm{Opt}(A)=\{v \in V \colon \|Av\|=\sigma_1\|v\|\}

be the set of vectors in V which are maximally stretched by A. In Lecture 14, we showed that V_1 is a nonzero subspace of V. As a consequence,

W_1=A(V_1) = \{w \in W \colon w=Av \text{ for some }v \in V_1\}

is a subspace of W. Thus we have associated to A two geometrically defined subspaces V_1 \leq V and W_1 \leq W, and we can view A as a linear transformation between these subspaces, i.e. as a morphism A \in \mathrm{Hom}(V_1,W_1).

Proposition 15.1. For A \neq 0, there exists an isometric isomorphism U_1 \colon V_1 \to W_1 such that A = \sigma_1 U_1.

Proof: Take U_1=\frac{1}{\sigma_1}A, viewed as a morphism in \mathrm{Hom}(V_1,W_1). \square

Now let us consider the orthogonal decompositions

V = V_1 \oplus V_1^\perp \quad\text{and}\quad W=W_1 \oplus W_1^\perp.

It is a quite important fact that A respects this decomposition: not only does A map V_1 into W_1, it maps V_1^\perp into W_1^\perp. This fact will allow us to prove the Singular Value Decomposition geometrically, by induction on the rank of A.

Theorem 15.1. We have A(V_1^\perp) \leq W_1^\perp.

Proof: Let u \in V be any vector which is orthogonal to every vector v \in V_1. We have to show that

\langle Au,Av\rangle = 0 \quad\text{ for all } v \in V_1.

If u=0_V this is true simply because A is a linear transformation. If \|u\|>0 then we can assume without loss in generality that \|u\|=1, again because A is a linear transformation:

\langle A\left( \frac{u}{\|u\|} \right),Av \rangle = \frac{1}{\|u\|}\langle Au,Av\rangle.

Thus, our job is to show that if u \in V is a unit vector orthogonal to every vector in V_1, then Au is orthogonal to Av for all v\in V_1. So, let v \in V_1 be arbitrary, and consider the scalar product

\alpha := \langle Au,Av\rangle.

We will show that \alpha =0. First, let us show this under the following assumption.

Nonnegative assumption: \alpha \geq 0.

Let \varepsilon >0, and consider the corresponding perturbation v+\varepsilon u of the stretch optimizer v \in V_1 in the direction of the unit vector u \in V. As with every vector in V, this perturbation satisfies

\|A(v+\varepsilon u)\|^2 \leq \sigma_1^2 \|v+\varepsilon u\|^2.

Let us carefully examine each side of this inequality. The LHS is

\langle Av + \varepsilon Au,Av + \varepsilon Au\rangle = \langle Av,Av\rangle + \varepsilon \langle Av,Au\rangle +\varepsilon \langle Au,Av \rangle + \varepsilon^2 \langle Au,Au\rangle.

Simplifying using v \in V_1 and the definition of \alpha (under the nonnegative assumption, \langle Av,Au\rangle = \overline{\alpha}=\alpha), this becomes

\|A(v+\varepsilon u)\|^2 = \sigma_1^2\|v\|^2 + 2\alpha\varepsilon + \varepsilon^2\|Au\|^2.

As for the RHS, since u is a unit vector orthogonal to v \in V_1, this is

\sigma_1^2 \|v+\varepsilon u\|^2 = \sigma_1^2\|v\|^2 +\varepsilon^2\sigma_1^2.

Thus we obtain the inequality

2\alpha\varepsilon + \varepsilon^2\|Au\|^2 \leq \varepsilon^2\sigma_1^2,

which, since \|Au\|^2 \geq 0, implies

2\alpha\varepsilon \leq \varepsilon^2\sigma_1^2.

We have thus shown that 2\alpha \leq \varepsilon \sigma_1^2 for every \varepsilon >0, which forces \alpha \leq 0. Together with the nonnegative assumption \alpha \geq 0, this gives \alpha=0. \square

Problem 15.2. Complete the proof of Theorem 15.1 by removing the assumption \alpha \geq 0. (Hint: this is not hard, so don’t make it hard).

Problem 15.3. For A nonzero, show that the rank of the restriction A \in \mathrm{Hom}(V_1^\perp,W_1^\perp) is strictly less than the rank of A \in \mathrm{Hom}(V,W).

Now we can state the geometric form of the singular value decomposition: a stratification of the source and target spaces of any linear transformation between finite-dimensional Hilbert spaces which provides a remarkably detailed understanding of its structure.

Theorem 15.2 (SVD, geometric form) Let r be the rank of A \in \mathrm{Hom}(V,W). Then, there exist positive numbers \sigma_1 \geq \dots \geq \sigma_r > 0 and orthogonal decompositions

V = \mathrm{Ker}(A) \oplus V_1 \oplus \dots \oplus V_r \quad\text{and}\quad W=\mathrm{Im}(A)^\perp \oplus W_1 \oplus \dots \oplus W_r

such that for each 1 \leq i \leq r the restriction of A to V_i is of the form \sigma_i U_i with U_i \in \mathrm{Hom}(V_i,W_i) an isometric isomorphism.

Math 202A: Lecture 14

Last week, we considered the problem of canonical forms for the classical computational category \mathbf{FSet}, whose objects are finite sets with morphisms being functions. Concretely, we are given finite sets X,Y and a function f \in \mathrm{Hom}(X,Y). Our objective is to choose orderings x_1,\dots,x_n and y_1,\dots,y_m of X and Y such that the m \times n matrix [f] of f with respect to these orderings assumes a simple, transparent form.

As with everything having to do with \mathbf{FSet} this is a combinatorial question and it is not too difficult to analyze. The basic idea is that taking any ordering y_1,\dots,y_m allows us to look at f backwards, as the ordered list of its fibers,

f=(f^{-1}(y_1),\dots,f^{-1}(y_m)).

Each fiber is a (possibly empty) subset of X, distinct fibers are disjoint, and the union of all fibers is X. In combinatorial parlance, the function f becomes a weak m-part set composition of X. The rank r of f, which is by definition the cardinality of the image \mathrm{Im}(f) = \{f(x) \colon x \in X\}, is the number of nonempty fibers of f, and clearly r \leq \min(m,n). The type of f is an integer partition \lambda \vdash n with r parts, or equivalently a Young diagram with n cells and r rows; it is simply the list of the cardinalities of the nonempty fibers of f sorted in non-increasing order.

To obtain a canonical form for f, choose an ordering y_1,\dots,y_m of Y such that \mathrm{Im}(f) = \{y_1,\dots,y_r\} and

|f^{-1}(y_1)| \geq \dots \geq |f^{-1}(y_r)|.

Thus, the type of f is the Young diagram with r rows of lengths

\lambda_i = |f^{-1}(y_i)|, \quad 1 \leq i \leq r.

Then, let x_1,\dots,x_n be any ordering of X obtained by first listing the elements of f^{-1}(y_1) in any order, then listing the elements of f^{-1}(y_2), and so on. Then, the matrix of f with respect to these orderings is an m \times n matrix over \{0,1\} which can be described explicitly as follows: the first row of [f] begins with \lambda_1 entries equal to 1, and all remaining entries equal to 0; the second row of [f] begins with \lambda_1 entries equal to 0, followed by \lambda_2 entries equal to 1, and all remaining entries equal to 0; and so on until row r of [f], which has its final \lambda_r entries equal to 1 and all other entries equal to 0. Finally, if there are any rows in [f] remaining (i.e. f is not surjective), then these remaining m-r rows are all zero rows.
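(The construction just described is easily mechanized; here is a sketch in Python with numpy, where canonical_matrix is a hypothetical helper name and the inputs are the type \lambda and the number of rows m.)

import numpy as np

def canonical_matrix(lam, m):
    """Canonical m x n binary matrix of a set function of type lam (a partition of n = sum(lam))."""
    n = sum(lam)
    M = np.zeros((m, n), dtype=int)
    start = 0
    for i, part in enumerate(lam):         # row i carries a run of `part` ones
        M[i, start:start + part] = 1
        start += part
    return M

print(canonical_matrix([3, 2], 2))         # recovers the first example in Lecture 12 below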

The main point of last week’s material is that we can do much better than the above by quantizing. Instead of looking at f \in \mathrm{Hom}(X,Y), we invoke the functor

\mathcal{F} \colon \mathbf{FSet} \longrightarrow \mathbf{FHil}

and look at \mathcal{F}(f) \in \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)). For brevity let us write V=\mathcal{F}(X) and W = \mathcal{F}(Y). Thus, X \subset V and Y \subset W are orthonormal bases, and A = \mathcal{F}(f) \in \mathrm{Hom}(V,W) is a linear transformation with the special feature that

Ax \in Y, \quad x \in X.

We have now opened up the option to look at alternative orthonormal bases in V=\mathcal{F}(X) and W=\mathcal{F}(Y) such that the corresponding matrix of A = \mathcal{F}(f) may be even simpler than the matrix of A relative to X and Y, which is the binary matrix [f] of the function f\in \mathrm{Hom}(X,Y). The capstone problem last week shows that we can construct an ordered orthonormal basis e_1,\dots,e_n of V such that the matrix elements of A \in \mathrm{Hom}(V,W) relative to the ordered orthonormal bases e_1,\dots,e_n and y_1,\dots,y_m satisfy

\langle y_1,Ae_1 \rangle = \sigma_1,\dots,\langle y_r,Ae_r\rangle=\sigma_r,

where

\sigma_i=\sqrt{|f^{-1}(y_i)|}, \quad 1 \leq i \leq r,

and all other matrix elements of A are equal to 0. The difference here is that the binary matrix [f] of f always contains n nonzero entries, and the best we can do is arrange these 1's in a convenient pattern. However, in the quantum category we can obtain a matrix representation of A = \mathcal{F}(f) which only has r \leq \min(m,n) nonzero entries. This is potentially much sparser. Indeed, the fraction of nonzero entries of the binary matrix [f] is always

\frac{n}{mn}= \frac{1}{m},

whereas the fraction of nonzero entries in [A] is

\frac{r}{mn} \leq \frac{\min(m,n)}{mn}.

In the extreme case, for A=\mathcal{F}(f) the functorial image of a rank one (i.e. constant) function f \colon X \to Y, the sparsity of [A] is

\frac{1}{mn}.
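(To see the diagonal form concretely: the following sketch in Python with numpy, using toy data, computes the singular values of the quantization of a set function with fibers of sizes 3, 2, 1, and finds exactly the three nonzero entries \sqrt{3},\sqrt{2},1 promised above.)

import numpy as np

A = np.array([[1, 1, 1, 0, 0, 0],          # binary matrix of a set function with fibers
              [0, 0, 0, 1, 1, 0],          # of sizes 3, 2, 1 (cf. Lecture 12 below)
              [0, 0, 0, 0, 0, 1]], dtype=float)
sing = np.linalg.svd(A, compute_uv=False)
print(np.allclose(sing, np.sqrt([3, 2, 1])))   # True: singular values are sqrt of fiber sizes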

The above amounts to a combinatorial proof of a special case of an extremely useful canonical form for morphisms in the quantum computational category \mathbf{FHil}.

Theorem 14.1. (Singular Value Decomposition) For any finite-dimensional Hilbert spaces V,W and any linear transformation A \in \mathrm{Hom}(V,W) of rank r, there exist ordered orthonormal bases E \subset V and F \subset W such that

Ae_i = \sigma_i f_i, \quad 1 \leq i \leq r,

where \sigma_1 \geq \dots \geq \sigma_r >0. The remaining vectors e_{r+1},\dots,e_n are an orthonormal basis for the kernel of A.

Last week’s material is effectively a proof of Theorem 14.1 in the special case where A \in \mathrm{Hom}(V,W) happens to be the functorial image of a set function, meaning that there happen to be orthonormal bases X \subset V and Y \subset W such that Ax \in Y for every x \in X.

In order to prove Theorem 14.1 in full generality, we need to use the full structure of objects in \mathbf{FHil}, meaning algebra and geometry as opposed to combinatorics. The image of a set function f \in \mathrm{Hom}(X,Y) is related to the image of its quantization \mathcal{F}(f) \in \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)) by

\mathrm{Im}(\mathcal{F}(f)) = \mathrm{Span} \mathrm{Im}(f),

and in this sense the image of a linear transformation is a combinatorial subspace of its target space. However, for every A \in \mathrm{Hom}(V,W) there are two associated subspaces of V which are not of a combinatorial nature. The first is the kernel of A,

\mathrm{Ker}(A) = \{v \in V \colon Av=0_W\},

which is algebraic in the sense that it is defined in terms of the additive identity in the target space W of A. The second is the optimizer space of A,

\mathrm{Opt}(A) = \{v \in V \colon \|Av\| = \|A\| \|v\|\},

which is geometric in the sense that it is defined using the metric structure of both the source and target spaces. Here \|A\| is the operator norm of A, so vectors v \in \mathrm{Opt}(A) are those which are maximally stretched by applying A, in the sense that they force equality in the inequality

\|Av\| \leq \|A\|\|v\|

which holds for all vectors in V.

Theorem 14.2. For every A \in \mathrm{Hom}(V,W), the set \mathrm{Opt}(A) is a nonzero subspace of V.

Proof: Fix an arbitrary linear transformation A \in \mathrm{Hom}(V,W), and for brevity write

\sigma_1 = \|A\| \quad\text{and}\quad V_1=\mathrm{Opt}(A).

The zero vector in V belongs to V_1 because

\|A0_V\|=0=\sigma_1\|0_V\|.

The nonempty set V_1 is closed under scalar multiplication: if v \in V_1 then

\|A(\lambda v)\|=|\lambda|\|Av\|=|\lambda|\sigma_1\|v\| = \sigma_1\|\lambda v\|

for any \lambda \in \mathbb{C}.

Now we show that V_1 is closed under vector addition. Let u,v \in V_1, and suppose for the sake of contradiction that u+v\not\in V_1. We then have

\sigma_1^2\|u+v\|^2+\sigma_1^2\|u-v\|^2=2\sigma_1^2\|u\|^2+2\sigma_1^2\|v\|^2=2\|Au\|^2+2\|Av\|^2,

where the first equality is the Parallelogram Law and the second is the hypothesis u,v \in V_1. Now

2\|Au\|^2+2\|Av\|^2=\|A(u+v)\|^2+\|A(u-v)\|^2<\sigma_1^2\|u+v\|^2+\sigma_1^2\|u-v\|^2,

where the equality is the Parallelogram Law; the inequality is strict because the hypothesis u+v \not\in V_1 gives \|A(u+v)\|^2 < \sigma_1^2\|u+v\|^2 (note that u+v \neq 0_V, since 0_V \in V_1), while \|A(u-v)\|^2 \leq \sigma_1^2\|u-v\|^2 always holds. But then we have

\sigma_1^2\|u+v\|^2+\sigma_1^2\|u-v\|^2<\sigma_1^2\|u+v\|^2+\sigma_1^2\|u-v\|^2,

which is absurd.

We have now shown that V_1 is a subspace of V, and it remains to show that V_1\neq \{0_V\}. To do so, it is sufficient to show that V_1 contains a vector of norm one. The unit sphere

S_1 =\{u \in V \colon \|u\|=1\},

is a compact set (in the norm topology on V), and the function u \mapsto \|Au\| is a continuous function on S_1 whose supremum is \sigma_1. By the Extreme Value Theorem, there exists u \in S_1 such that \|Au\|=\sigma_1=\sigma_1\|u\|, whence u \in V_1. \square

The proof above uses a purely geometric argument to prove that V_1 is a subspace of V. In particular, this part of the argument does not require finite-dimensionality. However, to show V_1 nonzero we needed compactness of the unit sphere in V, which is a topological assertion.

Problem 14.1. Prove that the unit sphere S_1 in a finite-dimensional Hilbert space V really is a compact set.

Another feature of the proof of Theorem 14.2 is its use of the Parallelogram Law, which as we have discussed characterizes Banach spaces whose norm is induced by a scalar product. There are less clever ways to prove this result which do not rely on the Parallelogram Law, and this leads one to wonder whether Theorem 14.2 holds more generally, in the larger category \mathbf{FBan} of finite-dimensional Banach spaces.

Problem 14.2. Prove that Theorem 14.2 does not hold in \mathbf{FBan}.

Math 202A: Lecture 13

In this Lecture, building on the exposition in Lecture 12, we will take our first pass at the singular value decomposition of a linear transformation A \colon V \to W whose source and target are finite-dimensional Hilbert spaces.

In Lecture 12, we worked in the classical computational category \mathbf{FSet} whose objects are finite sets and whose morphisms are functions. Given any two finite sets X,Y and any function f\in \mathrm{Hom}(X,Y) we showed that the matrix of f can be brought into a certain canonical form by choosing appropriate orderings of X and Y. Namely, we can choose orderings so that the matrix [f] is block diagonal, with each block a single row consisting entirely of 1’s. The number of blocks is equal to the cardinality of the image \mathrm{Im}(f), i.e. the “rank” of the set function f. For each y \in \mathrm{Im}(f), the width of the corresponding block is the cardinality of the fiber f^{-1}(y).

Now let us quantize this situation: let V=\mathcal{F}(X) and W=\mathcal{F}(Y) be the free Hilbert spaces over X and Y, in which X and Y live on as orthonormal bases, and let A =\mathcal{F}(f) \in \mathrm{Hom}(V,W) be the linear transformation defined by

Ax = f(x), \quad x \in X.

This situation contains exactly the same information as the set-theoretic one, where the original function f can be visualized as a bipartite graph on two disjoint sets of vertices X and Y. Now, A = \mathcal{F}(f) is (a version of) the adjacency matrix of this graph. More precisely, the matrix elements of A relative to the bases X \subset V and Y \subset W are

\langle y,Ax \rangle = [x \text{ adjacent to }y], \quad x \in X,\ y \in Y.

The difference now is that we can form superpositions of vertices in V=\mathcal{F}(X) and W=\mathcal{F}(Y) to construct new orthonormal bases which allow us to achieve a very canonical form for A=\mathcal{F}(f) which is not attainable for f, namely a diagonal form. The following problems will guide you through this process, which is a special case of the Singular Value Decomposition.

Problem 13.1. Consider the vectors in V = \mathcal{F}(X) defined by

e_y = \sum\limits_{x \in f^{-1}(y)} x, \quad y \in Y,

and for each of these vectors compute Ae_y.

Problem 13.2. Construct an orthonormal basis X^\prime of V such that the off-diagonal matrix elements of A relative to the bases X^\prime \subset V and Y \subset W are zero, and the diagonal matrix elements are the square roots of the cardinalities of the fibers of the function f \in \mathrm{Hom}(X,Y).

Problem 13.3. Show that the operator norm of A=\mathcal{F}(f)\in \mathrm{Hom}(V,W) is the square root of the cardinality of the largest fiber of f \in \mathrm{Hom}(X,Y).
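(A quick numerical sanity check of Problem 13.3 on toy data, in Python with numpy; it verifies one instance, not the general statement.)

import numpy as np

A = np.array([[1, 1, 1, 0, 0],             # quantized function with fibers of sizes 3 and 2
              [0, 0, 0, 1, 1]], dtype=float)
print(np.isclose(np.linalg.norm(A, 2), np.sqrt(3)))   # operator norm = sqrt(largest fiber size)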

Math 202A: Lecture 12

In this lecture we begin our treatment of the Singular Value Decomposition, which is arguably the most applicable and applied result in linear algebra. Those of you familiar with this result likely think of it in the following form: for an arbitrary m \times n complex matrix A there exists a factorization of the form A=UBV^*, where U is an m \times m unitary matrix, B is an m \times n matrix whose diagonal elements are nonnegative numbers and whose off diagonal elements are zero, and V is an n \times n unitary matrix. Let’s check we got the dimensions right here: we write down the word (mm)(mn)(nn) and observe that matrix multiplication simplifies it to (mn)(nn) and then to mn, which is indeed the shape of A.

Of course, Math 202A is Linear Algebra Done Weird and our treatment of the SVD will look nothing like the above, though in the end we will reproduce this matrix factorization. The “Weird” in Linear Algebra Done Weird refers in part to the fact that quantum mechanics is often viewed as weird by physicists, but mathematically the transition from classical to quantum is no weirder than the transition from combinatorics to linear algebra. We will use this perspective to shape our understanding of the SVD. This development will be divided over three lectures, each of which will (hopefully) be quite easy to understand. I would like you to come away from this discussion with the sense that the Singular Value Decomposition is not just some arbitrary (albeit useful) matrix factorization which can be done, but rather a geometrically natural result which can yet again be understood as a major benefit of quantization.

In this lecture we remain in the classical computational category \mathbf{FSet}, whose objects are finite sets and whose morphisms are functions. Let X be a set of cardinality n, let Y be a set of cardinality m, and let f \in \mathrm{Hom}(X,Y) be an arbitrary function. The binary encoding of this function consists of the mn numbers

f_{yx} = [f(x)=y], \quad x \in X,\ y \in Y,

where

[P]=\begin{cases} 1, \text{ if }P\text{ true}\\ 0, \text{ if }P\text{ false} \end{cases}

is the Iverson bracket beloved by computer scientists. If we choose an ordering x_1,\dots,x_n of the domain X and an ordering y_1,\dots,y_m of the codomain Y, we get a corresponding arrangement of these bits as an m \times n matrix with entries

[f]_{ij} = f_{y_ix_j}, \quad 1 \leq i \leq m,\ 1 \leq j \leq n.
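(In code, the binary encoding is a single comprehension; a sketch in Python with numpy, with the chosen orderings realized as array indices.)

import numpy as np

n, m = 5, 2
f = [0, 0, 0, 1, 1]                        # f(x_j) = y_{f[j]}, indices standing in for orderings
F = np.array([[1 if f[j] == i else 0 for j in range(n)] for i in range(m)])
print(F)                                   # the m x n matrix with entries [f]_{ij} = f_{y_i x_j}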

If m=n the matrix [f] is square, if m>n it is tall and skinny (aka flaco), if m<n it is short and fat (aka gordo).

Now let us consider the question of how to choose orderings of X and Y so that the corresponding matrix [f] assumes a canonical form, a vaguely defined term (sorry Lani) which roughly means “reasonably simple.”

Consider first the case where m=n, so that possibly X and Y are in one-to-one correspondence.

Problem 12.1. Show that one can choose orderings of X and Y such that [f] is the n \times n identity matrix I_n if and only if f \in \mathrm{Hom}(X,Y) is a bijection. Now letting [f] be the matrix of f relative to an arbitrary pair of orderings, show that there exist n \times n permutation matrices P,Q such that I_n = P[f]Q if and only if f is a bijection.

Now consider the case where m>n, so that [f] is tall and skinny and possibly X embeds into Y.

Problem 12.2. Show that one can choose orderings of X and Y such that [f] is the m \times n matrix made by appending an (m-n) \times n matrix of zeros to the bottom of I_n if and only if f is injective. Now letting [f] be the matrix of f with respect to an arbitrary pair of orderings, show that there exists an m \times m permutation matrix P and an n \times n permutation matrix Q such that P[f]Q is in this canonical form if and only if f is injective.

Finally, consider the case where m<n, so that [f] is short and fat and possibly X covers Y.

Problem 12.3. Show that f is surjective if and only if there exist orderings of X and Y such that the m \times m principal submatrix of [f] is I_m. Now letting [f] be the matrix of f with respect to an arbitrary pair of orderings, show that there exists an m \times m permutation matrix P and an n \times n permutation matrix Q such that P[f]Q is in this canonical form if and only if f is surjective.

The matrix form in Problem 12.3 should perhaps not be called canonical because the “leftover” part of [f] outside its principal m \times m submatrix has no particular structure. Here is another option.

Visualize the surjective function f as a bipartite graph \Gamma(X,Y) with edges going between two disjoint sets of vertices X and Y, such that \{x,y\} is an edge if and only if f(x)=y. The connected components of \Gamma are star graphs \Gamma_y, y \in Y, in which the hub vertex y has degree equal to the cardinality |f^{-1}(y)| of the fiber of f over y. The surjectivity condition means that each of these star graphs has at least one edge or, equivalently, two vertices. The type of f is the vector \lambda(f) whose components are the positive numbers

|f^{-1}(y)|, \quad y \in Y,

listed in weakly decreasing order from left to right. Thus, \lambda(f)=(\lambda_1(f),\dots,\lambda_m(f)) is a partition of n with m parts, or equivalently a Young diagram with exactly n cells and m rows.

As an example of the above, take a set X of cardinality 5 and let x_1,x_2,x_3,x_4,x_5 be an ordering of its elements. Let Y be a set of cardinality 2 and let y_1,y_2 be an ordering of its elements. Let f \in \mathrm{Hom}(X,Y) be the surjective function

f^{-1}(y_1)=\{x_1,x_2,x_3\}, \quad f^{-1}(y_2) = \{x_4,x_5\}.

The type of this function is \lambda(f)=(3,2). The matrix of this function with respect to these orderings is

[f]=\begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}.

As a second example, take a set X of cardinality 6 and let x_1,x_2,x_3,x_4,x_5,x_6 be an ordering of its elements. Let Y be a set of cardinality 3 and let y_1,y_2,y_3 be an ordering of its elements. Let f \in \mathrm{Hom}(X,Y) be the surjective function

f^{-1}(y_1)=\{x_1,x_2,x_3\}, \quad f^{-1}(y_2) = \{x_4,x_5\}, \quad f^{-1}(y_3)=\{x_6\}.

The type of this function is \lambda(f)=(3,2,1). The matrix of this function with respect to these orderings is

[f]=\begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1\end{bmatrix}.

Problem 12.4. Prove that f \in \mathrm{Hom}(X,Y) is surjective if and only if there exist orderings of X and Y such that the corresponding matrix [f] is in reduced row-echelon form and has no zero rows. Now letting [f] be the matrix of f with respect to arbitrary orderings of X and Y, show that f is surjective if and only if there is an m \times m permutation matrix P and an n \times n permutation matrix Q such that P[f]Q assumes the above canonical form.

Problem 12.5. State and prove a generalization of Problem 12.4 for arbitrary f \in \mathrm{Hom}(X,Y), dropping all assumptions on the relative sizes of X and Y and the injectivity/surjectivity of f.

Math 202A: Lectures 10 and 11

In this pair of lectures we consider the familiar topic of subspaces and projections, continuing our program of studying a subject you already know from a (possibly) new point of view. Probably you have heard of the textbook Linear Algebra Done Right, which prompted a reactionary response in the form of an antithetical textbook called Linear Algebra Done Wrong. As to which is better, the debate rages on. Math 202A is Linear Algebra Done Weird. I believe that title is unclaimed; let me know if you have a better one to suggest.

As in previous lectures, we will use category-theoretic thinking to guide us towards analogies between the classical computational category \mathbf{FSet} of finite sets and the quantum computational category \mathbf{FHil} of finite-dimensional Hilbert spaces.

In this pair of lectures, we look at subobjects in these categories and compare and contrast the classical (set-theoretic) and quantum (linear-algebraic) aspects of subobjects. In both cases (and indeed much more generally), we can describe subobjects algebraically as equivalence classes of idempotent endomorphisms. In the classical computational category \mathbf{FSet}, there is no canonical way to choose a representative of the equivalence class of idempotents which correspond to a given subobject. However, in the quantum computational category \mathbf{FHil} the geometry of Hilbert space singles out a natural representative in every equivalence class of idempotents characterized by an orthogonality feature.

We start on the classical side, \mathbf{FSet}, where subobjects are just subsets. We have a functor \mathcal{P} \colon \mathbf{FSet} \rightarrow \mathbf{FLat} from finite sets to finite lattices which sends X to its power set,

\mathcal{P}(X) = \{Y \subseteq X\}.

The partial order in \mathcal{P}(X) is containment, meaning that for Y_1,Y_2 \in \mathcal{P}(X) we define

Y_1 \leq Y_2 \iff Y_1 \subseteq Y_2.

What makes \mathcal{P}(X) a lattice (as opposed to just a poset) is the fact that we have also the meet and join operations,

Y_1 \vee Y_2 = Y_1 \cup Y_2 \quad\text{ and }\quad Y_1 \wedge Y_2 = Y_1 \cap Y_2,

which give the smallest set containing both Y_1,Y_2 and the largest set contained in both Y_1,Y_2, respectively.

The subset lattice \mathcal{P}(X) has a couple of additional properties. First, it is graded, with rank function given by subset cardinality. This means that \mathcal{P}(X) decomposes into a disjoint union of antichains

\mathcal{P}_0(X),\mathcal{P}_1(X),\dots,\mathcal{P}_n(X)

in which n=|X| and

\mathcal{P}_k(X) = \{Y \subseteq X \colon |Y|=k\}

has cardinality

|\mathcal{P}_k(X)|= {n \choose k}.

Second, the subset lattice \mathcal{P}(X) has a natural anti-isomorphism (i.e. order-reversing bijection) given by

Y \mapsto Y^c = X \backslash Y.

Taking complements maps \mathcal{P}_k(X) bijectively onto \mathcal{P}_{n-k}(X), which is why

{n \choose k} = {n \choose n-k}.

Furthermore, for any Y \in \mathcal{P}(X) we have

|X|=|Y|+|Y^c|.

Our definition of the power set functor \mathcal{P} is not complete until we say where it sends a given set function f \in \mathrm{Hom}(X_1,X_2). The corresponding \mathcal{P}(f) \colon \mathcal{P}(X_1) \to \mathcal{P}(X_2) is defined by

[\mathcal{P}(f)](Y) = f(Y) = \{f(x) \colon x \in Y\}.

As an exercise, you should check that \mathcal{P}(f) really is a morphism in \mathbf{FLat}.

Now let us consider how we can detect subobjects of X algebraically, using functions on X.

Definition 10.1. A function f \in \mathrm{End}(X) is said to be a retraction onto Y \in \mathcal{P}(X) if f(x) \in Y for all x \in X and f(y)=y for all y \in Y.

Notice that Definition 10.1 only applies to nonempty subsets of X. Another remark is that retractions are usually considered as morphisms in the category \mathbf{Top} whose objects are topological spaces and whose morphisms are continuous functions. We can include \mathbf{Set} in \mathbf{Top} by giving every set the discrete topology.

Now let us discuss a question raised by Lani Southern: what does it mean to say that there is no “canonical” retraction of X onto Y? The precise meaning of this is that there is no way to assign to each pair (X,Y) consisting of a finite set X and a subset Y \subseteq X a specific retraction f_{XY} of X onto Y which is equivariant with respect to all automorphisms of X that preserve Y. In more detail, let \mathrm{Aut}(X,Y) denote the subgroup of \mathrm{Aut}(X) consisting of all automorphisms (permutations) \sigma of X satisfying \sigma(Y)=Y. Call a rule (X,Y) \mapsto f_{XY} which assigns to set/subset pairs a corresponding retraction canonical if

\sigma \circ f_{XY} = f_{XY} \circ \sigma \quad\text{for all } \sigma \in \mathrm{Aut}(X,Y).

Proposition 10.2. If |Y|\geq 2 and X \backslash Y is nonempty, no canonical retraction exists.

Proof: Suppose otherwise, and let f_{XY} be a canonical retraction of X onto Y. Pick a point x \in X \backslash Y. Then, for every \sigma \in \mathrm{Aut}(X,Y) we have

f_{XY}(x)=f_{XY}(\sigma x)= \sigma f_{XY}(x).

In particular, this holds for every permutation \sigma which has our chosen point x as a fixed point and permutes the points of Y arbitrarily, so the above forces f_{XY}(x) \in Y to be fixed by every permutation of Y, which in turn forces |Y|=1, a contradiction. \square

In order to get past the above, we would need to break the symmetry of X. For example, this could mean promoting the structureless finite set X to a metric space (X,\mathrm{d}). Then, for any nonempty Y \in \mathcal{P}(X), we can define a retraction of X onto Y by declaring f_{XY}(x) to be the point of Y closest to x. However, even here we need to impose a rule for breaking ties (two distinct minimizers). Once we do this, the retraction f_{XY} is canonical in the sense that it is equivariant with respect to all isometries of (X,\mathrm{d}) which preserve tie breakers.
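(Here is a sketch of such a metric retraction in plain Python; metric_retraction is a hypothetical name, and breaking ties toward the earlier point in a fixed ordering of Y is one arbitrary choice among many.)

def metric_retraction(X, Y, d):
    """The retraction f_XY sending x to the nearest point of Y, ties broken by the ordering of Y."""
    def f(x):
        return min(Y, key=lambda y: (d(x, y), Y.index(y)))   # compare distance, then position in Y
    return f

X = [0, 1, 2, 3, 4]
Y = [1, 3]
f = metric_retraction(X, Y, d=lambda a, b: abs(a - b))
print([f(x) for x in X])                   # [1, 1, 1, 3, 3]; the tie at x = 2 breaks toward 1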

Definition 10.2. A function f \in \mathrm{End}(X) is called an idempotent if f \circ f = f.

Idempotents allow us to discuss retractions without specifying what they retract onto.

Problem 10.1. Prove that an idempotent f \in \mathrm{End}(X) is a retraction onto its image \mathrm{Im}(f)=\{f(x) \colon x \in X\}. Conversely, show that if f \in \mathrm{End}(X) is a retraction of X onto a subset Y \in \mathcal{P}(X), then f^2=f.

Let \mathrm{Ide}(X) denote the set of idempotent functions in \mathrm{End}(X). Define an equivalence relation on \mathrm{Ide}(X) by

f \sim g \iff \mathrm{Im}(f) = \mathrm{Im}(g).

Then, associated to each Y \in \mathcal{P}(X) is the equivalence class of idempotent functions with image Y,

[f]_Y = \{f \in \mathrm{End}(X) \colon \mathrm{Im}(f)=Y,\ f^2=f\}.

Problem 10.2. Prove that Y \mapsto [f]_Y is a bijection \mathcal{P}(X)\backslash\{\emptyset\} \to \mathrm{Ide}(X)/\sim.

There is another way to encode subsets if we do not insist on self-maps.

Definition 10.3. The indicator function of Y \in \mathcal{P}(X) is the function \mathbf{1}_Y \in \mathrm{Hom}(X,\{0,1\}) defined by

\mathbf{1}_Y(x) = \begin{cases} 1, \text{ if } x \in Y \\ 0, \text{ if } x \not\in Y \end{cases}.

The function

\mathcal{P}(X) \longrightarrow \mathrm{Hom}(X,\{0,1\})

which sends each subset Y to its indicator function \mathbf{1}_Y is a bijection (binary encoding of subsets).

Now let us transport the above to the quantum computational category by means of our favorite functor

\mathcal{F} \colon \mathbf{FSet} \longrightarrow \mathbf{FHil}.

Recall that the quantization functor \mathcal{F} sends a finite set X to the free Hilbert space \mathcal{F}(X), inside of which X lives on as an orthonormal basis. Conversely, if V is an arbitrary Hilbert space then we can choose an orthonormal basis X of V and identify V with \mathcal{F}(X).

On the Hilbert space side, the subobject functor

\mathcal{L} \colon \mathbf{FHil} \longrightarrow \mathbf{FLat}

sends a Hilbert space V to

\mathcal{L}(V) =\{W \subseteq V \colon W \text{ a subspace}\},

with partial order again given by set containment. Thus, \mathcal{L}(V) only includes subsets of V which are themselves objects of \mathbf{FHil}. The meet and join operations in \mathcal{L}(V) are

W_1 \wedge W_2 = W_1 \cap W_2 \quad\text{and}\quad W_1 \vee W_2=\mathrm{Span}(W_1 \cup W_2).

Moreover, if A \in \mathrm{Hom}(V_1,V_2) is a linear transformation the corresponding order homomorphism \mathcal{L}(A) \in \mathrm{Hom}(\mathcal{L}(V_1),\mathcal{L}(V_2)) is defined by

[\mathcal{L}(A)](W) = A(W) = \{Aw \colon w \in W\}.

The subspace lattice \mathcal{L}(V) of the Hilbert space V is graded by dimension: it decomposes into a disjoint union of \dim V + 1 antichains

\mathcal{L}(V)_k = \{W \leq V \colon \dim W=k\}, \quad 0 \leq k \leq n,

where n=\dim V. Although \mathcal{L}(V)_k is an uncountably infinite set for 0 < k < n, it has much more structure than just being a very big set. In fact, these sets are important enough that they have a special name.

Definition 10.4. The set \mathcal{L}(V)_k is called the Grassmannian of k-planes in V.

Note to self: if we’re going to be using ostentatious names, perhaps \mathcal{P}_k(X) should be called the Pascalian of k-sets in X.

Now let V = \mathcal{F}(X) be the free Hilbert space over the finite set X. Then, every subset Y \in \mathcal{P}(X) quantizes to \mathcal{F}(Y) \in \mathcal{L}(V), giving 2^{|X|} special subspaces of V called coordinate subspaces relative to X. The coordinate subspaces of V = \mathcal{F}(X) form an induced sublattice of \mathcal{L}(V) isomorphic to \mathcal{P}(X). Complementary subsets of X quantize to complementary coordinate subspaces of V.

Problem 10.3. Prove that for any Y \in \mathcal{P}(X) we have \mathcal{F}(Y^c)=\mathcal{F}(Y)^\perp. Moreover, prove that we have the internal direct sum decomposition \mathcal{F}(X) = \mathcal{F}(Y) \oplus \mathcal{F}(Y^c).

Of course, not every subspace W \in \mathcal{L}(V) is a coordinate subspace of V=\mathcal{F}(X). For example, for two distinct points x_1,x_2 \in X, the line in V spanned by the vector x_1+x_2 does not contain any point of X, and hence cannot be of the form \mathcal{F}(Y).

Problem 10.4. Prove that for any W \in \mathcal{L}(V) we have the internal direct sum decomposition V = W \oplus W^\perp. (Hint: you can deduce this directly from Problem 10.3 by constructing an appropriate unitary operator on V).

Now let us again consider the question of encoding subobjects of V algebraically, this time as linear operators on V rather than arbitrary functions.

Definition 10.5. Given W \in \mathcal{L}(V), a linear retraction of V onto W is an operator P_W \in \mathrm{End}(V) such that \mathrm{Im}(P_W)=W and the restriction of P_W to W is the identity operator I_W.

Problem 10.5. Prove that a linear operator P \in \mathrm{End}(V) is a retraction onto its image if and only if it is idempotent, P^2=P.

As in the set-theoretic setting, let \mathrm{Ide}(V) denote the set of idempotent operators on V, and define an equivalence relation on \mathrm{Ide}(V) by declaring two idempotents equivalent if and only if they have the same image.

Problem 10.6. Prove that the map W \mapsto [P]_W which sends a subspace W \leq V to the equivalence class [P]_W of idempotent operators with image W is a bijection \mathcal{L}(V) \to \mathrm{Ide}(V)/\sim.

Now we come to a key difference between the classical computational category \mathbf{FSet} and the quantum computational category \mathbf{FHil}. In the classical setting, given a point x \in X and a subset Y \in \mathcal{P}(X), either x \in Y or x \not\in Y. In the quantum setting, given a vector v \in V and a subspace W \in \mathcal{L}(V), we can quantify how much v is or is not in W.

Definition 10.6. For v \in V and W \in \mathcal{L}(V), define

\mathrm{d}(v,W) = \inf\{\|v-w\| \colon w \in W\},

and call this quantity the distance from the vector v to the subspace W.

Note that the distance from v \in V to W \in \mathcal{L}(V) is indeed well-defined, since the set of distances \|v-w\| with w \in W is bounded below by zero. The question of whether or not the infimum is attained is answered in the following.

Theorem 10.7. For any v \in V and any W \in \mathcal{L}(V), there exists a unique w_0 \in W such that \|v-w_0\|=\mathrm{d}(v,W).

Proof: Suppose first that W \in \mathcal{L}(V) is a coordinate subspace of V = \mathcal{F}(X). That is, we have W=\mathcal{F}(Y) for a nonempty Y \in \mathcal{P}(X), and hence W^\perp = \mathcal{F}(Z) where Z=Y^c. For any vector v \in V, consider its expansion in the orthonormal basis X=Y \sqcup Z,

v = \sum\limits_{x \in X} \langle x,v \rangle x = \sum\limits_{y \in Y} \langle y,v \rangle y + \sum\limits_{z \in Z} \langle z,v \rangle z,

and define

w_0 = \sum\limits_{y \in Y} \langle y,v \rangle y.

Then, w_0 \in W. We claim that \|v-w_0\| = \mathrm{d}(v,W) and that w_0 is the unique point of W which achieves this minimum. To see this, we first compute

\|v-w_0\|^2 = \sum\limits_{z \in Z} |\langle z,v\rangle|^2.

On the other hand, for any w \in W we have

\|v-w\|^2 = \sum\limits_{y \in Y} |\langle y,v\rangle-\langle y,w\rangle|^2 + \sum\limits_{z \in Z} |\langle z,v \rangle|^2,

which is strictly larger than \|v-w_0\|^2 unless we have

\langle y,v \rangle = \langle y,w\rangle, \quad\text{for all }y \in Y,

in which case w=w_0.

To finish, observe that the argument above did not really use that W is a coordinate subspace: thanks to Problem 10.4, we may choose orthonormal bases of W and of W^\perp whose union is an orthonormal basis of V, and then run the same computation with respect to this adapted basis. \square
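Here is a numerical sketch of the theorem in coordinates (a sanity check, not a proof; the dimension, the subset Y, and the random sampling are all illustrative choices). We take V=\mathbb{C}^4 with the standard basis in the role of X, let W be the coordinate subspace spanned by the first two basis vectors, and check that the truncated expansion w_0 beats randomly sampled competitors from W:

    import numpy as np

    rng = np.random.default_rng(0)
    v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

    # keep the Y-coefficients of v, kill the Z-coefficients
    w0 = np.array([v[0], v[1], 0, 0])
    d = np.linalg.norm(v - w0)

    for _ in range(1000):
        c = rng.standard_normal(2) + 1j * rng.standard_normal(2)
        w = np.array([c[0], c[1], 0, 0])   # an arbitrary element of W
        assert np.linalg.norm(v - w) >= d - 1e-12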

In view of Theorem 10.7, the following definition is legitimate.

Definition 10.8. For any subspace W \in \mathcal{L}(V), we define the orthogonal projection P_W \in \mathrm{End}(V) to be the operator which sends each v \in V to the unique vector P_Wv \in W satisfying

\|v -P_Wv\|= \mathrm{d}(v,W), \quad v \in V.

That P_W really is a linear operator can be read off from the proof of Theorem 10.7, which gives the explicit formula P_Wv = \sum_{y \in Y_W} \langle y,v\rangle y for any orthonormal basis Y_W of W. It is clear from the definition that P_W is an idempotent with image W, hence it is a retraction of V onto W (Problem 10.5). As for the name “orthogonal projection,” this is justified by the following.

Problem 10.7. Prove that v-P_Wv \in W^\perp, and that \mathrm{Ker}(P_W)=W^\perp.

Hence, we see an important difference between the classical computational category \mathbf{FSet} and the quantum computational category \mathbf{FHil}: for a finite set X and a subset Y \in \mathcal{P}(X) there is no canonical representative of the class [f]_Y of idempotents with image Y, whereas in the quantized setting V = \mathcal{F}(X) we do have a canonical linear retraction onto any subspace W \in \mathcal{L}(V) characterized by orthogonality.

To finish, let us consider the case of a coordinate subspace W=\mathcal{F}(Y) of V=\mathcal{F}(X), i.e. Y \in \mathcal{P}(X). Let P_W \in \mathrm{End}(V) be the corresponding orthogonal projection operator. The matrix elements of P_W in the orthonormal basis X are described as follows: for two distinct points x_1,x_2 \in X we have \langle x_1,P_Wx_2 \rangle = 0, and for any x \in X we have \langle x,P_Wx \rangle = \mathbf{1}_Y(x). That is, the orthogonal projection P_W incorporates the binary encoding of the subset Y \subset X as its diagonal.
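In other words, in coordinates the matrix of P_W is the diagonal 0/1 matrix whose diagonal is the binary encoding of Y. A quick sanity check (the choice of X and Y here is illustrative):

    import numpy as np

    indicator = np.array([1, 1, 0, 1, 0])      # Y = {x_1, x_2, x_4} inside a 5-point X
    P = np.diag(indicator).astype(float)       # matrix of P_W in the basis X

    assert np.allclose(P @ P, P)               # idempotent
    assert np.allclose(P, P.conj().T)          # selfadjoint, anticipating the remark below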

Let us note a question that we have not yet answered: given an idempotent operator P \in \mathrm{End}(V), we currently have no algebraic criterion for determining whether or not P is an orthogonal projection. At present we would have to examine the geometry of P, meaning compute its image W = \mathrm{Im}(P) and check that \mathrm{Ker}(P) = W^\perp. In fact, there is an algebraic characterization of which idempotents are orthogonal projections: they are precisely the selfadjoint ones. I mention this because you probably already know it. If you don’t, that’s even better, because we will prove this characterization as a theorem after we establish the singular value decomposition, which we shall use to define the adjoint of a general linear transformation.

Math 202A: Lecture 9

Let V,W \in \mathrm{Ob}(\mathbf{FHil}). In this lecture we continue our proof of the fact that all Frobenius scalar products on \mathrm{Hom}(V,W) coincide: we fix orthonormal bases X \subset V and Y \subset W, define

\langle A,B \rangle_{XY} = \sum\limits_{x \in X} \sum\limits_{y \in Y} \overline{\langle y,Ax\rangle}\langle y,Bx \rangle, \quad A,B \in \mathrm{Hom}(V,W),

and aim to show that \langle A,B \rangle_{XY} is in fact independent of the chosen coordinate systems X and Y. We are pursuing a proof of this which is inductive in the dimension of W. In the base case, W= \mathbb{C}y, we showed in Lecture 8 that the above scalar product coincides with the Riesz scalar product on \mathrm{Hom}(V,\mathbb{C}y), which is canonically defined through vector/covector duality.

For the induction step, let V,W be fixed and also fix orthonormal bases X \subset V and Y \subset W. We know already that the corresponding Frobenius scalar product \langle \cdot,\cdot \rangle_{XY} has no dependence on Y. Fix an “anchor” vector y \in Y and let Y'=Y \backslash \{y\}. Then, a basic formula we can write down and keep in mind is

\langle A,B\rangle_{XY} = \sum\limits_{x \in X} \overline{\langle y,Ax\rangle} \langle y,Bx \rangle + \sum\limits_{x \in X} \sum\limits_{y^\prime \in Y^\prime}  \overline{\langle y^\prime,Ax\rangle} \langle y^\prime,Bx \rangle.

Further note that Y^\prime is an orthonormal basis for \mathbb{C}y^\perp, the orthogonal complement of \mathbb{C}y in W. Now, for any vector w \in W we have

w= \langle y,w\rangle y + \sum\limits_{y^\prime \in Y^\prime} \langle y^\prime,w\rangle y^\prime,

where the first term is a vector in \mathbb{C}y and the remaining terms sum to a vector in \mathbb{C}y^\perp. In symbols, we have the internal direct sum decomposition

W = \mathbb{C}y \oplus \mathbb{C}y^\perp.

More generally, if H is a Hilbert space and H_1,\dots,H_r are subspaces which are pairwise orthogonal and whose union spans H, one writes

H = \bigoplus_{i=1}^r H_i

to denote this situation. If further we have an orthonormal basis Q \subset H which decomposes as a disjoint union

Q= \bigsqcup_{i=1}^r Q_i

with Q_i a (necessarily orthonormal) basis of H_i, then we say that Q is adapted to the decomposition H=H_1 \oplus \dots \oplus H_r. In particular, the basis Y= \{y\} \sqcup Y^\prime is adapted to the orthogonal decomposition W = \mathbb{C}y \oplus \mathbb{C}y^\perp.

We want to establish that the Frobenius scalar product \langle \cdot,\cdot \rangle_{XY} on \mathrm{Hom}(V,W)=\mathrm{Hom}(V,\mathbb{C}y \oplus \mathbb{C}y^\perp) has no dependence on X. What the induction hypothesis actually allows us to do is to make this statement for two different spaces of linear transformations, namely \mathrm{Hom}(V,\mathbb{C}y) and \mathrm{Hom}(V,\mathbb{C}y^\perp). Namely, by induction the Frobenius scalar product \langle \cdot,\cdot\rangle_{X\{y\}} on \mathrm{Hom}(V,\mathbb{C}y) has no dependence on X, and likewise for the Frobenius scalar product \langle \cdot,\cdot\rangle_{XY^\prime} on \mathrm{Hom}(V,\mathbb{C}y^\perp). We want to glue these two statements together to draw the same conclusion about \langle \cdot,\cdot \rangle_{XY} on \mathrm{Hom}(V,\mathbb{C}y \oplus \mathbb{C}y^\perp).

Definition 9.1. Let H and H^\prime be Hilbert spaces. Their external direct sum H \oplus H^\prime has underlying set

H \times H^\prime = \{(h,h^\prime): h \in H,\ h^\prime \in H^\prime\},

is equipped with scalar multiplication and vector addition defined by

\alpha(h,h^\prime) = (\alpha h,\alpha h^\prime) \quad\text{and}\quad (h_1,h_1') + (h_2,h_2^\prime)=(h_1+h_2,h_1'+h_2'),

and carries the scalar product

\langle (h_1,h_1^\prime),(h_2,h_2^\prime)\rangle = \langle h_1,h_2\rangle + \langle h_1^\prime,h_2^\prime\rangle.

There are a few things here to be checked, and you should check them (for example, make sure you believe that the putative scalar product in this definition really is one).

A few basic observations are in order. First, we have an internal direct sum decomposition of the external direct sum, namely

H \oplus H^\prime = (H,0_{H^\prime}) \oplus (0_H,H^\prime),

where

(H,0_{H^\prime})=\{(h,0_{H^\prime}) \colon h \in H\}

is a subspace of H \oplus H^\prime canonically isomorphic to H, and

(0_H,H^\prime)=\{(0_H,h^\prime) \colon h^\prime \in H^\prime\}

is a subspace of H \oplus H^\prime canonically isomorphic to H^\prime. Second, let Q \subset H and Q^\prime \subset H^\prime be orthonormal bases.

Problem 9.1. Show that (Q,0_{H^\prime}) and (0_H,Q^\prime) are disjoint sets whose union constitutes an orthonormal basis of H \oplus H^\prime.
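In coordinates, the external direct sum is just concatenation of vectors, which makes Definition 9.1 easy to test numerically (a sketch; the vectors and dimensions are arbitrary illustrative choices, and np.vdot conjugates its first argument, matching our convention for scalar products):

    import numpy as np

    def direct_sum_inner(h1, h1p, h2, h2p):
        # <(h1,h1'), (h2,h2')> = <h1,h2> + <h1',h2'>
        return np.vdot(h1, h2) + np.vdot(h1p, h2p)

    h1, h1p = np.array([1j, 0]), np.array([2.0, 1.0, 0.0])
    h2, h2p = np.array([0, 1.0]), np.array([0.0, 1j, 1.0])

    # concatenation realizes H (+) H' as C^(2+3) with its usual scalar product
    stacked = np.vdot(np.concatenate([h1, h1p]), np.concatenate([h2, h2p]))
    assert np.isclose(direct_sum_inner(h1, h1p, h2, h2p), stacked)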

Now that we know how to glue two Hilbert spaces together additively, we can consider the Hilbert space

\mathrm{Hom}(V,\mathbb{C}y) \oplus \mathrm{Hom}(V,\mathbb{C}y^\perp),

whose scalar product

\langle \cdot,\cdot\rangle_{X\{y\}} + \langle \cdot,\cdot\rangle_{XY^\prime}

has no dependence on X because its summands have none.

Problem 9.2. Use the above to conclude that the Frobenius scalar product \langle \cdot,\cdot\rangle_{XY} on \mathrm{Hom}(V,\mathbb{C}y \oplus \mathbb{C}y^\perp) has no dependence on X.
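Although we are still in the middle of the proof, the claimed basis-independence can at least be checked numerically (a sanity check, not a proof; the dimensions and random bases are illustrative choices). We compute \langle A,B\rangle_{XY} in the standard bases and again in random orthonormal bases obtained from QR factorizations:

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 3, 4
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

    def frobenius(A, B, Xb, Yb):
        # columns of Xb form an orthonormal basis of the source space,
        # columns of Yb an orthonormal basis of the target space
        return sum(
            np.conj(y.conj() @ A @ x) * (y.conj() @ B @ x)
            for x in Xb.T for y in Yb.T
        )

    # random orthonormal bases, packaged as unitary matrices
    Q1, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    Q2, _ = np.linalg.qr(rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))

    assert np.allclose(frobenius(A, B, np.eye(n), np.eye(m)),
                       frobenius(A, B, Q1, Q2))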

Math 202A: Lecture 8

While arguing our case that the category \mathbf{FHil} of finite-dimensional Hilbert spaces is better than the category \mathbf{FSet} of finite sets, we found ourselves obliged to address the question: is \mathbf{FHil} enriched over itself?

We did not answer this question in Lecture 7, but we did show that \mathbf{FHil} is enriched over the category \mathbf{FBan} of finite-dimensional Banach spaces. More precisely, we proved that every linear transformation A \in \mathrm{Hom}(V,W) is a Lipschitz function with respect to the Hilbert norms on V and W, and we defined the operator norm on \mathrm{Hom}(V,W) to be the Lipschitz constant \|A\| of A \in \mathrm{Hom}(V,W).

The fact that \mathrm{Hom}(V,W) is a Banach space under operator norm is important and we will return to this often. However, we posed a problem in Lecture 7 whose solution rules out the possibility that the operator norm on \mathrm{Hom}(V,W) is induced by a scalar product. Thus, we cannot hope to promote \mathrm{Hom}(V,W) from a Banach space to a Hilbert space by means of polarization.

On the other hand, the main tool we used in Lecture 7, namely the existence of an explicit basis \{E_{yx} \colon x \in X,\ y \in Y\} of \mathrm{Hom}(V,W) associated to any given pair of orthonormal bases X \subset V and Y \subset W, can be utilized to define a scalar product \langle \cdot,\cdot\rangle_{XY} on \mathrm{Hom}(V,W) simply by declaring \{E_{yx} \colon x \in X,\ y \in Y\} to be orthonormal. This is a general fact: any vector space basis can be used to define a scalar product in which it is orthonormal. Translating this into an explicit formula in the present case of interest, we get

\langle A,B \rangle_{XY} = \sum\limits_{x \in X}\sum\limits_{y\in Y} \overline{\langle y,Ax\rangle}\langle y,Bx\rangle.

The corresponding norm is

\|A\|_{XY} = \sqrt{\sum\limits_{x \in X} \sum\limits_{y \in Y} |\langle y,Ax \rangle|^2}.
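In matrix terms, \|A\|_{XY} is the square root of the sum of the squared absolute values of all matrix elements of A, a quantity numpy knows as the Frobenius norm. A one-line check (the matrix is an arbitrary illustrative choice):

    import numpy as np

    A = np.array([[1 + 2j, 0], [3, -1j]])
    by_hand = np.sqrt(sum(abs(A[i, j]) ** 2 for i in range(2) for j in range(2)))
    assert np.isclose(by_hand, np.linalg.norm(A, 'fro'))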

So, we have constructed a scalar product on \mathrm{Hom}(V,W).

Definition 8.1. For any two orthonormal bases X \subset V and Y \subset W, the corresponding scalar product \langle \cdot,\cdot\rangle_{XY} is called the Frobenius scalar product on \mathrm{Hom}(V,W).

According to Definition 8.1, there are many Frobenius scalar products on \mathrm{Hom}(V,W) — one for every choice of Cartesian coordinate systems in the source and target spaces. Our objective is to show that all of these are in fact the same sesquilinear form on \mathrm{Hom}(V,W). This means that we may speak of the Frobenius scalar product on \mathrm{Hom}(V,W), and that this scalar product is intrinsically geometric.

It is in fact very easy to see that \langle \cdot,\cdot \rangle_{XY} has no dependence on Y.

Proposition 8.2. For any A,B \in \mathrm{Hom}(V,W) we have

\langle A,B \rangle_{XY} = \sum\limits_{x \in X} \langle Ax,Bx\rangle.

Proof: We calculate

\sum\limits_{x \in X} \langle Ax,Bx\rangle = \sum\limits_{x \in X} \left\langle \sum\limits_{y \in Y} \langle y,Ax\rangle y,\sum\limits_{y \in Y} \langle y,Bx\rangle y\right\rangle=\sum\limits_{x \in X}\sum\limits_{y\in Y} \overline{\langle y,Ax\rangle}\langle y,Bx\rangle,

and this is \langle A,B\rangle_{XY}. \square

Showing that the Frobenius scalar product \langle \cdot,\cdot \rangle_{XY} on \mathrm{Hom}(V,W) has no dependence on the Cartesian coordinate system X \subset V is more challenging. To do so, we will use an argument which is inductive in the dimension of W. The base case, where \dim W=1, is essentially the finite-dimensional Riesz representation theorem. The induction step motivates the introduction of orthogonal decompositions and direct sums of Hilbert spaces.

For the rest of this Lecture, let V and W be a pair of Hilbert spaces with fixed orthonormal bases X \subset V and Y \subset W, and denote by \langle \cdot,\cdot \rangle_{XY} the corresponding Frobenius scalar product on \mathrm{Hom}(V,W).

We now focus on the base case, where W is one-dimensional. This means that our specified orthonormal basis of W is a singleton set Y=\{y\} containing a unit vector y \in W. Thus, W=\mathbb{C}y = \{\alpha y \colon \alpha \in \mathbb{C}\} and \mathrm{Hom}(V,W) = \mathrm{Hom}(V,\mathbb{C}y). Because the target space W = \mathbb{C}y is one-dimensional, there are only |X|=\dim V elementary transformations in \mathrm{Hom}(V,\mathbb{C}y), namely

E_{yx}(\cdot) = y \langle x,\cdot \rangle, \quad x \in X.

Let us generalize the elementary transformations to a broader class of linear transformations in \mathrm{Hom}(V,\mathbb{C}y) defined as follows.

Definition 8.2. For every vector v \in V, the corresponding covector L_{yv} \in \mathrm{Hom}(V,\mathbb{C}y) is the linear transformation defined by

L_{yv}(\cdot) = y \langle v,\cdot \rangle.

It is useful to view the assignment v \mapsto L_{yv} as defining a function

\Phi_y \colon V \longrightarrow \mathrm{Hom}(V,\mathbb{C}y).

This function is called the Riesz mapping, and it injects V into \mathrm{Hom}(V,\mathbb{C}y). Indeed, if L_{yv_1}=L_{yv_2}, then \langle v_1-v_2,v\rangle=0 for every v \in V; taking v=v_1-v_2 gives \langle v_1-v_2,v_1-v_2\rangle=0, which forces v_1=v_2. The Riesz mapping is not linear but antilinear,

L_{y(\alpha_1v_1+\alpha_2v_2)} = \overline{\alpha}_1L_{yv_1} + \overline{\alpha}_2L_{yv_2}.

Like linearity, antilinearity implies that the image of V in \mathrm{Hom}(V,\mathbb{C}y) is a vector subspace – the space of covectors.

We claim that the space of covectors is all of \mathrm{Hom}(V,\mathbb{C}y), i.e. that the Riesz mapping is surjective. Indeed, let A \in \mathrm{Hom}(V,\mathbb{C}y) be an arbitrary transformation. Then, for each x \in X, we have Ax = \alpha_x y for some \alpha_x \in \mathbb{C}. Thus, for any \tilde{v} \in V, we have

A \tilde{v} = \sum\limits_{x \in X} \langle x,\tilde{v}\rangle \alpha_x y.

In order to show that A \in \mathrm{Hom}(V,\mathbb{C}y) is in fact a covector, consider the vector v \in V defined by

v = \sum\limits_{x \in X} \overline{\alpha_x} x.

Then, we have

L_{yv}(\tilde{v}) = y \langle v,\tilde{v}\rangle = y \sum\limits_{x \in X} \alpha_x \langle x,\tilde{v}\rangle = A\tilde{v}.

This completes the proof of the finite-dimensional Riesz representation theorem.

Theorem 8.3. The Riesz mapping v \mapsto L_{yv} is an antilinear bijection V \to \mathrm{Hom}(V,\mathbb{C}y).

Note that the injectivity of the Riesz mapping was proved without any reference to the finite-dimensionality of V. However, our surjectivity argument did make use of the finite-dimensionality of V. Let us also remark that, typically, Definition 8.2 and Theorem 8.3 are stated as theorems about the relationship between V and its dual space V^*=\mathrm{Hom}(V,\mathbb{C}). Note that \mathbb{C}=\mathbb{C}y where y =e^{i\theta} may be any complex number of modulus one: the orthonormal bases of \mathbb{C} are exactly the singleton sets containing a point on the unit circle. The standard definition of covectors is made assuming that we take y=1 as the basis of \mathbb{C}.

According to Theorem 8.3, the vector space \mathrm{Hom}(V,\mathbb{C}y) is precisely the space of covectors L_{yv}, v \in V. Thus, we may define a scalar product on \mathrm{Hom}(V,\mathbb{C}y) by

\langle L_{yv_1},L_{yv_2} \rangle := \langle v_1,v_2\rangle, \quad v_1,v_2 \in V.

Let us call this the Riesz scalar product on \mathrm{Hom}(V,\mathbb{C}y). It is defined solely in terms of the scalar product in V, and therefore is canonical and geometric, independent of a choice of basis in V. On the other hand, since L_{yx} = E_{yx}, the Riesz scalar product \langle \cdot,\cdot\rangle on \mathrm{Hom}(V,\mathbb{C}y) coincides with the Frobenius scalar product \langle \cdot,\cdot \rangle_{XY} on \mathrm{Hom}(V,\mathbb{C}y). This proves that the Frobenius scalar product is independent of coordinates in the case of a one-dimensional target space.
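The surjectivity argument above is entirely constructive, so it is easy to run in coordinates (a sketch; we take y=1 as the basis of \mathbb{C} and an arbitrary random functional on \mathbb{C}^3):

    import numpy as np

    rng = np.random.default_rng(2)
    alphas = rng.standard_normal(3) + 1j * rng.standard_normal(3)   # alpha_x = A(x), x in X
    A = lambda w: alphas @ w                                        # the functional A
    v = alphas.conj()                                               # v = sum_x conj(alpha_x) x

    w = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    assert np.isclose(v.conj() @ w, A(w))                           # L_{yv}(w) = <v,w> = A(w)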

Math 202A: Lecture 7

I will start this lecture with a brief review, both because we have some new students joining us and because I am generally a proponent of the spiral method of learning.

In the beginning there was the category \mathbf{FSet} whose objects are finite sets X and whose morphisms are functions f \colon X \to Y. Initially I called this the combinatorial category, but it could just as well be called the computational category because everything about it can be described using just two digits, 0 and 1. Indeed, for any finite set X there exists a unique n \in \mathbb{N} such that there exists an isomorphism r \colon \{1,\dots,n\} \to X. Any such isomorphism is called an ordering of X, and typically one writes r(1)=x_1,\dots,r(n)=x_n. Once an ordering is fixed, we can encode x_i as the column vector

[x_i]_j= \begin{cases} 1, \text{ if }j=i \\ 0, \text{ if } j \neq i \end{cases}.

If X and Y are two finite sets with orderings x_1,\dots,x_n and y_1,\dots,y_m, then a function f \in \mathrm{Hom}(X,Y) is encoded as the m \times n logical matrix

[f]_{ij} = \begin{cases} 1, \text{ if }f(x_j)=y_i \\ 0,\text{ if } f(x_j) \neq y_i \end{cases}.

Note that the columns of this one-hot encoding of f are nothing but the bit strings [f(x_j)]_i. For any two functions f,g such that g \circ f is defined, the composition can be computed as the matrix product: [g \circ f] = [g][f]. So our conclusion early on was that everything about \mathbf{FSet} can be expressed in terms of matrix algebra, and only binary matrices are needed.
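Here is a short sketch of the one-hot encoding and of the fact that composition becomes matrix multiplication (the sets and maps below are arbitrary illustrative choices):

    import numpy as np

    def one_hot(f, n, m):
        # the m x n logical matrix of f : {0,...,n-1} -> {0,...,m-1}
        M = np.zeros((m, n), dtype=int)
        for j in range(n):
            M[f(j), j] = 1
        return M

    f = lambda j: j % 2    # a function {0,1,2,3} -> {0,1}
    g = lambda j: 1 - j    # a function {0,1} -> {0,1}

    F, G = one_hot(f, 4, 2), one_hot(g, 2, 2)
    GF = one_hot(lambda j: g(f(j)), 4, 2)
    assert np.array_equal(G @ F, GF)   # [g o f] = [g][f]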

We then began to argue a case which we will continue to argue throughout the quarter. We claim that it is advantageous to move from the classical computational category \mathbf{FSet} to the quantum computational category \mathbf{FHil} whose objects are finite-dimensional Hilbert spaces and whose morphisms are linear transformations. Here all computations can again be reduced to matrix algebra, but the matrices used are no longer constrained to be built from two bits; rather, they can involve arbitrary complex numbers. The passage from classical computation to quantum computation is one way to motivate the study of linear algebra. A more traditional way is to start with the desire to solve systems of linear equations, which I assume you have already been exposed to.

More formally, we introduced the quantization functor

\mathcal{F} \colon \mathbf{FSet} \longrightarrow \mathbf{FHil}

which sends each finite set X to the Hilbert space \mathcal{F}(X) containing X as an orthonormal basis. This is called the free Hilbert space on X, and it may be constructed concretely as the space of complex-valued functions on X equipped with the \ell^2-scalar product. The functor \mathcal{F} also sends each function f \in \mathrm{Hom}(X,Y) to the linear transformation \mathcal{F}(f) \in \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)) defined by

\mathcal{F}(f)x = f(x), \quad x \in X.

The quantization functor is faithful but not full. Faithful means that \mathcal{F} is injective on morphisms: distinct functions f,g \in \mathrm{Hom}(X,Y) quantize to distinct linear transformations \mathcal{F}(f),\mathcal{F}(g). Not full means that there are many more morphisms in \mathrm{Hom}(\mathcal{F}(X),\mathcal{F}(Y)) than there are in \mathrm{Hom}(X,Y), and this is a good thing. For example, in order to show that X and Y are isomorphic sets we can try to show that \mathcal{F}(X) and \mathcal{F}(Y) are isomorphic Hilbert spaces, and there are many more ways in which this can happen. I even have a friend who claims that there are no bijections, only invertible linear transformations (he’s kidding but also kind of serious). The upshot is that we want to argue that \mathbf{FHil} is better than \mathbf{FSet}, and we will use the category-theoretic perspective to guide us in understanding what “better” should mean.

Definition 7.1. A category \mathbf{Cat} is said to be enriched over another category \mathbf{Dat} if for every X,Y \in \mathrm{Ob}(\mathbf{Cat}) we have \mathrm{Hom}(X,Y) \in \mathrm{Ob}(\mathbf{Dat}).

For example, the category \mathbf{FSet} is enriched over itself. The question we want to answer: is \mathbf{FHil} enriched over itself?

Let us start with some obvious statements and work towards answering this question. First of all, \mathbf{FHil} is enriched over \mathbf{Set}, which is the same thing as saying that \mathbf{FHil} is a locally small category. But we can easily do better than this.

Proposition 7.2. The category \mathbf{FHil} is enriched over the category \mathbf{Vec} whose objects are complex vector spaces and whose morphisms are linear maps.

Proving this means showing that for any finite-dimensional Hilbert spaces V,W the corresponding set \mathrm{Hom}(V,W) can be made into a complex vector space: equipping linear maps with addition and scalar multiplication defined pointwise, we get the desired vector space structure (this should be familiar, but you should still think about it).

Now let us see if we can do better than this and show that \mathbf{FHil} is enriched over \mathbf{FVec}, the full subcategory of \mathbf{Vec} whose objects are finite-dimensional complex vector spaces.

Definition 7.3. Given V,W \in \mathrm{Ob}(\mathbf{FHil}), let X \subset V and Y \subset W be orthonormal bases. We define the corresponding elementary operators \{E_{yx} \colon (x,y) \in X\times Y\} \subset \mathrm{Hom}(V,W) by

E_{yx}\tilde{x} = y \langle x,\tilde{x}\rangle, \quad \tilde{x} \in X.

To get a feel for the definition, let us compute the matrix elements of E_{yx} relative to the orthonormal bases X and Y: we have

\langle \tilde{y},E_{yx}\tilde{x}\rangle = \langle \tilde{y},y\langle x,\tilde{x}\rangle\rangle = \langle \tilde{y},y\rangle \langle x,\tilde{x}\rangle, \quad \tilde{x} \in X,\ \tilde{y} \in Y.

Thus, E_{yx} has exactly one matrix element equal to one, namely \langle y,E_{yx} x\rangle, and all other matrix elements are equal to zero.

Problem 7.1. Prove that \{E_{yx} \colon (x,y) \in X\times Y\} is a basis of \mathrm{Hom}(V,W), and show that the expansion of any A \in \mathrm{Hom}(V,W) is

A = \sum\limits_{x \in X} \sum\limits_{y \in Y} \langle y,Ax \rangle E_{yx}.

This expansion is one way to write down the information contained in the matrix of A without actually writing down the matrix. An immediate consequence of Problem 7.1 is that \mathbf{FHil} is enriched over \mathbf{FVec}. Another useful consequence of having an explicit basis for this space is the following.
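In the standard bases of \mathbb{C}^3 and \mathbb{C}^2, the elementary transformation E_{yx} is realized as the matrix with a single one in position (y,x), and the expansion of Problem 7.1 can be checked numerically (a sketch with an arbitrary illustrative matrix):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))

    X = np.eye(3)   # standard basis of the source
    Y = np.eye(2)   # standard basis of the target

    recovered = sum(
        (y.conj() @ A @ x) * np.outer(y, x.conj())   # <y,Ax> E_{yx}
        for x in X.T for y in Y.T
    )
    assert np.allclose(recovered, A)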

Theorem 7.4. Every A \in \mathrm{Hom}(V,W) is a continuous function, with respect to the Hilbert space norms on V and W.

Proof: It suffices to prove that the elementary transformations are continuous, since then every transformation is a sum of continuous functions and hence continuous. In fact, the elementary transformations are not just continuous, they are contractive mappings. To see this, we compute

\|E_{yx}v\| = \left\| \sum\limits_{\tilde{x} \in X} \langle \tilde{x},v\rangle E_{yx} \tilde{x} \right\| = \left\| \sum\limits_{\tilde{x} \in X} \langle \tilde{x},v\rangle y \langle x, \tilde{x} \rangle\right\|=|\langle x,v\rangle |\|y\| \leq \|v\|,

where the final inequality is Cauchy-Schwarz together with the fact that x \in X and y \in Y are unit vectors. \square

We still have not discovered a scalar product on the finite-dimensional vector space \mathrm{Hom}(V,W), so it is not yet a Hilbert space. Instead of trying to find a scalar product immediately, let us first shoot for an intermediate goal: finding a norm.

Definition 7.5. A Banach space is a complex vector space equipped with a norm.

This definition has the same irregularity as our definition of Hilbert space, namely that we do not require completeness. This is because we are really only interested in finite-dimensional Banach spaces, which are always complete.

Problem 7.2. Let V,W \in \mathrm{Ob}(\mathbf{FHil}) and let X\subset V and Y \subset W be orthonormal bases. Define a function

\|\cdot\|_{XY} \colon \mathrm{Hom}(V,W) \longrightarrow [0,\infty)

by

\|A\|_{XY} = \sum\limits_{x \in X}\sum\limits_{y \in Y} |\langle y,Ax \rangle|.

Show that \|\cdot\|_{XY} is a norm on \mathrm{Hom}(V,W), but that it is not induced by a scalar product.

An immediate consequence of Problem 7.2 is that \mathbf{FHil} is enriched over the category \mathbf{FBan} whose objects are finite-dimensional Banach spaces and whose morphisms are linear transformations. However, the norm \|\cdot\|_{XY} is not a very good norm in that it depends on choosing orthonormal bases X \subset V and Y \subset W. In this sense it is not tethered to the intrinsic geometry of a linear transformation, but rather to the particular numerology of a specific matrix representation.

A better norm on \mathrm{Hom}(V,W) can be obtained by observing that linear transformations are not just continuous functions, they are Lipschitz functions.

Theorem 7.6. Every A \in \mathrm{Hom}(V,W) is Lipschitz.

Proof: Let X \subset V and Y \subset W be orthonormal bases. Using the solution to Problem 7.1 as well as the triangle inequality and Cauchy-Schwarz inequality for the Hilbert space norm in W, we have

\|Av\| = \left\| \sum\limits_{x \in X} \sum\limits_{y \in Y}\langle y,Ax\rangle E_{yx}v \right\| = \left\| \sum\limits_{x \in X} \sum\limits_{y \in Y}\langle y,Ax\rangle y \langle x,v \rangle \right\| \leq \|A\|_{XY}\|v\|,

which shows that A is Lipschitz. \square

Definition 7.7. The operator norm of A \in \mathrm{Hom}(V,W) is defined to be its Lipschitz constant,

\|A\|:=\inf\{C \geq 0 \colon \|Av\| \leq C\|v\| \text{ for all }v \in V\}.
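Numerically, the Lipschitz constant can be estimated by sampling the ratios \|Av\|/\|v\| over random vectors (a sketch with illustrative choices; numpy's matrix 2-norm is the largest singular value, which in fact equals the operator norm, a standard fact we will be able to prove once we have the singular value decomposition):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

    samples = []
    for _ in range(5000):
        v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
        samples.append(np.linalg.norm(A @ v) / np.linalg.norm(v))

    # every ratio is a lower bound for ||A||, and the best samples approach it
    assert max(samples) <= np.linalg.norm(A, 2) + 1e-12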

Problem 7.3. Prove that the operator norm on \mathrm{Hom}(V,W) really is a norm, but that it is not induced by a scalar product.