Math 202A: Lecture 20

***Happy Thanksgiving – all problems here due 12/03, 23:59.***

In Lecture 19, we interpreted the SVD of a morphism A \in \mathrm{Hom}(V,W) in terms of the Hilbert space geometry of \mathrm{Hom}(V,W) equipped with the Frobenius scalar product. Let us start “downstairs,” with the orthogonal decompositions

V=\underbrace{V_1 \oplus \dots \oplus V_r}_{\mathrm{Ker}(A)^\perp} \oplus \mathrm{Ker}(A) \quad\text{and} \quad W=\underbrace{W_1 \oplus \dots \oplus W_r}_{\mathrm{Im}(A)} \oplus \mathrm{Im}(A)^\perp

The left singular lines V_1,\dots,V_r are orthogonal one-dimensional subspaces of V, and the right singular lines W_1,\dots,W_r are orthogonal one-dimensional subspaces in W. The restriction of A to V_i for 1 \leq i \leq r has the form \sigma_iU_i, with U_i \in \mathrm{Hom}(V_i,W_i) a line-to-line isometry and \sigma_1 \geq \dots \geq \sigma_r >0. Corresponding to this, we have an “upstairs” decomposition

A=\sigma_1E_1 + \dots + \sigma_rE_r,

happening in the Hilbert space \mathrm{Hom}(V,W), where E_i = U_iP_i and P_i \in \mathrm{End}(V) is the orthogonal projection of V onto the line V_i. In Lecture 19, we proved that \{E_1,\dots,E_r\} \subset \mathrm{Hom}(V,W) is an orthonormal set with respect to the Frobenius scalar product, hence it can be extended to an orthonormal basis \{E_1,\dots,E_r,\dots,E_{mn}\} of the Hilbert space \mathrm{Hom}(V,W) such that

[A]_{\{E_i\}}=\begin{bmatrix} \sigma_1 \\ \vdots \\ \sigma_r \\ 0 \\ \vdots \\ 0 \end{bmatrix}.

Thus, for any 1 \leq q \leq r, the vector A_q \in \mathrm{Hom}(V,W) defined by

A_q = \sigma_1E_1+\dots+\sigma_qE_q

has coordinates

[A_q]_{\{E_i\}}=\begin{bmatrix} \sigma_1 \\ \vdots \\ \sigma_q \\ 0 \\ \vdots \\ 0 \end{bmatrix},

and is simply the orthogonal projection of the vector A \in \mathrm{Hom}(V,W) onto the coordinate space \mathrm{Span}\{E_1,\dots,E_q\}. Remembering that A,A_q \in \mathrm{Hom}(V,W) are morphisms and not just inert vectors, we ask: what is the significance of A_q as a linear transformation from V into W?
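This "upstairs" picture is easy to verify numerically. Here is a minimal sketch using numpy, where a random matrix stands in for the matrix of A in orthonormal bases, and the outer products of singular vectors play the role of the E_i (the names `A`, `E`, `A_q` are just illustrative choices, not notation from any library):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, q = 5, 4, 2
A = rng.standard_normal((m, n))

# Numerically, A = sum_i sigma_i u_i v_i^T, so E_i is the outer product u_i v_i^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
E = [np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]

# The E_i form an orthonormal set for the Frobenius scalar product <X, Y> = tr(X^T Y).
gram = np.array([[np.sum(Ei * Ej) for Ej in E] for Ei in E])
assert np.allclose(gram, np.eye(len(s)))

# A_q = sigma_1 E_1 + ... + sigma_q E_q has coordinates (sigma_1, ..., sigma_q, 0, ..., 0)
# in the basis {E_i}, i.e. it is the Frobenius-orthogonal projection of A
# onto Span{E_1, ..., E_q}.
A_q = sum(s[i] * E[i] for i in range(q))
coords = np.array([np.sum(A_q * Ei) for Ei in E])
assert np.allclose(coords[:q], s[:q]) and np.allclose(coords[q:], 0.0)
```

The Gram matrix computation is exactly the orthonormality statement from Lecture 19, read off in coordinates.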

Theorem (Eckart-Young-Mirsky): Among all rank q morphisms in \mathrm{Hom}(V,W), the transformation A_q is an optimal approximation to A. More precisely, if B \in \mathrm{Hom}(V,W) is any morphism of rank q, then \|A-B\| \geq \|A-A_q\| in Frobenius norm.
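Before proving the theorem, one can sanity-check it numerically: compare the truncation A_q against many random morphisms of rank at most q. A sketch assuming numpy, with random thin factorizations generating the competitors:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, q = 6, 5, 2
A = rng.standard_normal((m, n))

# Best rank-q approximant A_q from the truncated SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_q = U[:, :q] @ np.diag(s[:q]) @ Vt[:q, :]
best = np.linalg.norm(A - A_q)   # np.linalg.norm of a matrix is Frobenius by default

# Compare against many random morphisms of rank (at most) q.
for _ in range(1000):
    B = rng.standard_normal((m, q)) @ rng.standard_normal((q, n))
    assert np.linalg.norm(A - B) >= best - 1e-12
```

Note that the optimal error itself is \sqrt{\sigma_{q+1}^2+\dots+\sigma_r^2}, the Frobenius norm of the discarded tail of singular values.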

At a recent conference, my colleague Jorge Garza-Vargas emphasized to me that this result is crucial in applications of linear algebra to data compression, and that I would be doing you, the student, a grave injustice if I skipped it. So we will now prove the EYM theorem via a geometric argument which will hopefully convey why this result matters in data compression.

First of all, the upstairs SVD of A_q \in \mathrm{Hom}(V,W) is by construction

A_q = \sigma_1E_1 + \dots + \sigma_qE_q,

so it is clear that the rank of A_q is indeed q. Our task is to demonstrate that if B \in \mathrm{Hom}(V,W) is an arbitrary transformation of rank q, then

\|A-A_q\| \leq \|A-B\|.

We will decompose this into a two-part optimization problem by first optimizing over all approximants with a fixed image, and then optimizing over the image. Thus, let U \in \mathcal{L}_q(W) be a point in the Grassmannian of q-dimensional subspaces of W.

Problem 20.1. Show that the optimization problem

\text{minimize } \|A-B\| \text{ subject to } \mathrm{Im}(B)=U

is solved by B=P_UA, where P_U \in \mathrm{End}(W) is the orthogonal projection of W onto the subspace U.
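The claim of Problem 20.1 can be probed numerically: fix a random q-dimensional subspace U of W, and check that the compression P_UA beats every other candidate whose image lies in U. A sketch assuming numpy, where the columns of an orthonormal frame `Q` span the subspace standing in for U:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, q = 6, 5, 2
A = rng.standard_normal((m, n))

# A random q-dimensional subspace U of W = R^m, via an orthonormal frame Q.
Q, _ = np.linalg.qr(rng.standard_normal((m, q)))   # columns span U
P_U = Q @ Q.T                                      # orthogonal projection onto U

best = np.linalg.norm(A - P_U @ A)                 # error of the compression

# Any other B with Im(B) contained in U does no better: write B = Q C.
for _ in range(1000):
    B = Q @ rng.standard_normal((q, n))
    assert np.linalg.norm(A - B) >= best - 1e-12
```

The comparison works column by column: each column of A is best approximated from U by its orthogonal projection, which is the content of the problem.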

The morphism P_UA \in \mathrm{Hom}(V,W) is called the compression of A into U, and by construction it has rank at most \dim U=q. It remains now to optimize the functional

S_A(U) = \|A-P_UA\|^2

over U \in \mathcal{L}_q(W). To do this we will express S_A(U) in terms of the singular data of A. Let L \subset V and R \subset W be ordered orthonormal bases adapted to the downstairs decompositions

V=V_1 \oplus \dots \oplus V_r \oplus \mathrm{Ker}(A)\quad\text{and}\quad W=W_1\oplus \dots \oplus W_r \oplus \mathrm{Im}(A)^\perp.

This means that L=\{v_1,\dots,v_r,v_{r+1},\dots,v_n\} and R=\{w_1,\dots,w_r,w_{r+1},\dots,w_m\} are orthonormal bases of V and W such that

Av_i = \sigma_i w_i

for 1 \leq i \leq r, which is summarized by saying that v_1,\dots,v_r are left singular vectors for A and w_1,\dots,w_r are right singular vectors for A. The remaining vectors v_{r+1},\dots,v_n in L form an orthonormal basis of the kernel of A in V, while the remaining vectors w_{r+1},\dots,w_m in R form an orthonormal basis for the orthogonal complement of the image of A in W. Now, to any subspace U \in \mathcal{L}(W) we associate the compression numbers c_1(U),\dots,c_r(U), defined as the norms

c_i(U) = \|P_Uw_i\|

of the vectors obtained by projecting each right singular vector w_1,\dots,w_r of A onto U. Note that c_i(U) is in fact independent of which unit vector w_i we sample from the line W_i, since any other unit sample from W_i will be a scalar multiple of w_i by a complex number of modulus one. Note that 0 \leq c_i(U) \leq 1 with nonnegativity due to the fact that these numbers are norms, and the upper bound due to the fact that they are norms of projections of unit vectors. In fact, there is one more constraint on the compression numbers of a q-dimensional subspace.

Problem 20.2. Show that for any U \in \mathcal{L}(W) we have

\sum_{i=1}^r c_i(U)^2 \leq \dim U.
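A numerical check of this bound, assuming numpy. Here the right singular vectors w_i are modeled by an arbitrary orthonormal basis of W = R^m; running the sum over the full basis produces the extreme case of the inequality:

```python
import numpy as np

rng = np.random.default_rng(3)
m, q = 7, 3

# An orthonormal basis of W = R^m standing in for w_1, ..., w_m.
W_basis = np.linalg.qr(rng.standard_normal((m, m)))[0]

# A random q-dimensional subspace U, with orthogonal projection P_U.
Q, _ = np.linalg.qr(rng.standard_normal((m, q)))
P_U = Q @ Q.T

# Compression numbers c_j(U) = ||P_U w_j||, one per basis vector.
c = np.linalg.norm(P_U @ W_basis, axis=0)
assert np.all(c >= 0) and np.all(c <= 1 + 1e-12)

# Over a full orthonormal basis the squares sum to exactly tr(P_U) = dim U ...
assert np.isclose((c ** 2).sum(), q)

# ... so any sub-collection, such as c_1, ..., c_r with r < m, obeys the bound.
r = 4
assert (c[: r] ** 2).sum() <= q + 1e-12
```
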

Now we are ready to quantify compression error.

Problem 20.3. Show that

S_A(U)=\|(I-P_U)A\|^2 = \sum\limits_{i=1}^r \sigma_i^2(1-c_i(U)^2).

Let us write the error formula above as

S_A(U) = \sum\limits_{i=1}^r \sigma_i^2 - \sum\limits_{i=1}^r \sigma_i^2c_i(U)^2=\|A\|_F^2 -\sum\limits_{i=1}^r \sigma_i^2c_i(U)^2.

Here we see that the squared error S_A(U) in approximating A by the compression P_UA is at most \|A\|_F^2, the squared Frobenius norm of A, which is the error we incur if we compress A to zero.
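The error formula itself is easy to verify numerically. A sketch assuming numpy, where the right singular vectors w_i are read off as the columns of the SVD factor `U_sv`:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, q = 6, 5, 2
A = rng.standard_normal((m, n))
U_sv, s, Vt = np.linalg.svd(A, full_matrices=False)

# A random q-dimensional subspace U of W, with orthogonal projection P_U.
Q, _ = np.linalg.qr(rng.standard_normal((m, q)))
P_U = Q @ Q.T

# Left-hand side: the squared compression error S_A(U) = ||(I - P_U) A||^2.
S = np.linalg.norm(A - P_U @ A) ** 2

# Right-hand side: ||A||_F^2 minus the sigma-weighted compression numbers.
c = np.linalg.norm(P_U @ U_sv, axis=0)      # c_i(U) = ||P_U w_i||
rhs = (s ** 2).sum() - (s ** 2 * c ** 2).sum()
assert np.isclose(S, rhs)
```
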

Our task now is to maximize the part of the sum which reduces this worst-case error. This is a very simple optimization problem: maximize the quadratic form

S(c_1,\dots,c_r) = \sum\limits_{i=1}^r \sigma_i^2 c_i^2

subject to the constraints

0 \leq c_i \leq 1 \quad\text{and}\quad \sum\limits_{i=1}^r c_i^2 \leq q.

Remembering that the coefficients of S satisfy

\sigma_1 \geq \dots \geq \sigma_r>0,

our optimal strategy is to put as much weight as possible on the q largest coefficients by taking

(c_1,\dots,c_r) = (\underbrace{1,\dots,1}_q,\underbrace{0,\dots,0}_{r-q}).
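Since the objective is linear in the squares c_i^2, this greedy choice is an optimum of the linear program, and it can be sanity-checked against random feasible points. A sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(5)
r, q = 6, 3
# sigma_1 >= ... >= sigma_r > 0.
sigma = np.sort(rng.uniform(0.5, 3.0, size=r))[::-1]

# Greedy maximizer: full weight on the q largest coefficients.
c_star = np.concatenate([np.ones(q), np.zeros(r - q)])
S_star = (sigma ** 2 * c_star ** 2).sum()

# Random feasible points: 0 <= c_i <= 1 and sum c_i^2 <= q.
for _ in range(2000):
    c = rng.uniform(0, 1, size=r)
    if (c ** 2).sum() > q:
        c *= np.sqrt(q / (c ** 2).sum())   # rescale into the feasible region
    assert (sigma ** 2 * c ** 2).sum() <= S_star + 1e-12
```
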

Problem 20.4. Finish the proof of the EYM theorem by showing that the compression numbers of U=W_1 \oplus \dots \oplus W_q realize the above maximizer, and that the corresponding compression is P_UA=\sigma_1E_1+\dots +\sigma_qE_q=A_q.
