Math 202A: Lecture 20

***Happy Thanksgiving – all problems here due 12/03, 23:59.***

In Lecture 19, we interpreted the SVD of a morphism A \in \mathrm{Hom}(V,W) in terms of the Hilbert space geometry of \mathrm{Hom}(V,W) equipped with the Frobenius scalar product. Let us start “downstairs,” with the orthogonal decompositions

V=\underbrace{V_1 \oplus \dots \oplus V_r}_{\mathrm{Ker}(A)^\perp} \oplus \mathrm{Ker}(A) \quad\text{and} \quad W=\underbrace{W_1 \oplus \dots \oplus W_r}_{\mathrm{Im}(A)} \oplus \mathrm{Im}(A)^\perp

The left singular lines V_1,\dots,V_r are orthogonal one-dimensional subspaces of V, and the right singular lines W_1,\dots,W_r are orthogonal one-dimensional subspaces in W. The restriction of A to V_i for 1 \leq i \leq r has the form \sigma_iU_i, with U_i \in \mathrm{Hom}(V_i,W_i) a line-to-line isometry and \sigma_1 \geq \dots \geq \sigma_r >0. Corresponding to this, we have an “upstairs” decomposition

A=\sigma_1E_1 + \dots + \sigma_rE_r,

happening in the Hilbert space \mathrm{Hom}(V,W), where E_i = U_iP_i and P_i \in \mathrm{End}(V) is the orthogonal projection of V onto the line V_i. In Lecture 19, we proved that \{E_1,\dots,E_r\} \subset \mathrm{Hom}(V,W) is an orthonormal set with respect to the Frobenius scalar product, hence it can be extended to an orthonormal basis \{E_1,\dots,E_r,\dots,E_{mn}\} of the Hilbert space \mathrm{Hom}(V,W) such that

[A]_{\{E_i\}}=\begin{bmatrix} \sigma_1 \\ \vdots \\ \sigma_r \\ 0 \\ \vdots \\ 0 \end{bmatrix}.

Thus, for any 1 \leq q \leq r, the vector A_q \in \mathrm{Hom}(V,W) defined by

A_q = \sigma_1E_1+\dots+\sigma_qE_q

has coordinates

[A_q]_{\{E_i\}}=\begin{bmatrix} \sigma_1 \\ \vdots \\ \sigma_q \\ 0 \\ \vdots \\ 0 \end{bmatrix},

and is simply the orthogonal projection of the vector A \in \mathrm{Hom}(V,W) onto the coordinate space \mathrm{Span}\{E_1,\dots,E_q\}. Remembering that A,A_q \in \mathrm{Hom}(V,W) are morphisms and not just inert vectors, we ask: what is the significance of A_q as a linear transformation from V into W?
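This "upstairs" picture is easy to verify numerically. Here is a minimal sketch using numpy, where a random matrix stands in for the matrix of A in orthonormal bases, and the outer products of singular vectors play the role of the E_i (the names `A`, `E`, `A_q` are just illustrative choices, not notation from any library):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, q = 5, 4, 2
A = rng.standard_normal((m, n))

# Numerically, A = sum_i sigma_i u_i v_i^T, so E_i is the outer product u_i v_i^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
E = [np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]

# The E_i form an orthonormal set for the Frobenius scalar product <X, Y> = tr(X^T Y).
gram = np.array([[np.sum(Ei * Ej) for Ej in E] for Ei in E])
assert np.allclose(gram, np.eye(len(s)))

# A_q = sigma_1 E_1 + ... + sigma_q E_q has coordinates (sigma_1, ..., sigma_q, 0, ..., 0)
# in the basis {E_i}, i.e. it is the Frobenius-orthogonal projection of A
# onto Span{E_1, ..., E_q}.
A_q = sum(s[i] * E[i] for i in range(q))
coords = np.array([np.sum(A_q * Ei) for Ei in E])
assert np.allclose(coords[:q], s[:q]) and np.allclose(coords[q:], 0.0)
```

The Gram matrix computation is exactly the orthonormality statement from Lecture 19, read off in coordinates.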

Theorem (Eckart-Young-Mirsky): Among all rank q morphisms in \mathrm{Hom}(V,W), the transformation A_q is an optimal approximation to A. More precisely, if B \in \mathrm{Hom}(V,W) is any morphism of rank q, then \|A-B\| \geq \|A-A_q\| in Frobenius norm.
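Before proving the theorem, one can sanity-check it numerically: compare the truncation A_q against many random morphisms of rank at most q. A sketch assuming numpy, with random thin factorizations generating the competitors:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, q = 6, 5, 2
A = rng.standard_normal((m, n))

# Best rank-q approximant A_q from the truncated SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_q = U[:, :q] @ np.diag(s[:q]) @ Vt[:q, :]
best = np.linalg.norm(A - A_q)   # np.linalg.norm of a matrix is Frobenius by default

# Compare against many random morphisms of rank (at most) q.
for _ in range(1000):
    B = rng.standard_normal((m, q)) @ rng.standard_normal((q, n))
    assert np.linalg.norm(A - B) >= best - 1e-12
```

Note that the optimal error itself is \sqrt{\sigma_{q+1}^2+\dots+\sigma_r^2}, the Frobenius norm of the discarded tail of singular values.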

At a recent conference, my colleague Jorge Garza-Vargas emphasized to me that this result is crucial in applications of linear algebra to data compression, and that I would be doing you, the student, a grave injustice if I skipped it. So we will now prove the EYM theorem via a geometric argument which will hopefully convey why this result matters in data compression.

First of all, the upstairs SVD of A_q \in \mathrm{Hom}(V,W) is by construction

A_q = \sigma_1E_1 + \dots + \sigma_qE_q,

so it is clear that the rank of A_q is indeed q. Our task is to demonstrate that if B \in \mathrm{Hom}(V,W) is an arbitrary transformation of rank q, then

\|A-A_q\| \leq \|A-B\|.

We will decompose this into a two-part optimization problem by first optimizing over all approximants with a fixed image, and then optimizing over the image. Thus, let U \in \mathcal{L}_q(W) be a point in the Grassmannian of q-dimensional subspaces of W.

Problem 20.1. Show that the optimization problem

\text{minimize } \|A-B\| \text{ subject to } \mathrm{Im}(B)=U

is solved by B=P_UA, where P_U \in \mathrm{End}(W) is the orthogonal projection of W onto the subspace U.
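The claim of Problem 20.1 can be probed numerically: fix a random q-dimensional subspace U of W, and check that the compression P_UA beats every other candidate whose image lies in U. A sketch assuming numpy, where the columns of an orthonormal frame `Q` span the subspace standing in for U:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, q = 6, 5, 2
A = rng.standard_normal((m, n))

# A random q-dimensional subspace U of W = R^m, via an orthonormal frame Q.
Q, _ = np.linalg.qr(rng.standard_normal((m, q)))   # columns span U
P_U = Q @ Q.T                                      # orthogonal projection onto U

best = np.linalg.norm(A - P_U @ A)                 # error of the compression

# Any other B with Im(B) contained in U does no better: write B = Q C.
for _ in range(1000):
    B = Q @ rng.standard_normal((q, n))
    assert np.linalg.norm(A - B) >= best - 1e-12
```

The comparison works column by column: each column of A is best approximated from U by its orthogonal projection, which is the content of the problem.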

The morphism P_UA \in \mathrm{Hom}(V,W) is called the compression of A into U, and by construction it has rank at most \dim U=q. It remains now to optimize the functional

S_A(U) = \|A-P_UA\|^2

over U \in \mathcal{L}_q(W). To do this we will express S_A(U) in terms of the singular data of A. Let L \subset V and R \subset W be ordered orthonormal bases adapted to the downstairs decompositions

V=V_1 \oplus \dots \oplus V_r \oplus \mathrm{Ker}(A)\quad\text{and}\quad W=W_1\oplus \dots \oplus W_r \oplus \mathrm{Im}(A)^\perp.

This means that L=\{v_1,\dots,v_r,v_{r+1},\dots,v_n\} and R=\{w_1,\dots,w_r,w_{r+1},\dots,w_m\} are orthonormal bases of V and W such that

Av_i = \sigma_i w_i

for 1 \leq i \leq r, which is summarized by saying that v_1,\dots,v_r are left singular vectors for A and w_1,\dots,w_r are right singular vectors for A. The remaining vectors v_{r+1},\dots,v_n in L form an orthonormal basis of the kernel of A in V, while the remaining vectors w_{r+1},\dots,w_m in R form an orthonormal basis for the orthogonal complement of the image of A in W. Now, to any subspace U \in \mathcal{L}(W) we associate the compression numbers c_1(U),\dots,c_r(U), defined as the norms

c_i(U) = \|P_Uw_i\|

of the vectors obtained by projecting each right singular vector w_1,\dots,w_r of A onto U. Note that c_i(U) is in fact independent of which unit vector w_i we sample from the line W_i, since any other unit sample from W_i will be a scalar multiple of w_i by a complex number of modulus one. Note that 0 \leq c_i(U) \leq 1 with nonnegativity due to the fact that these numbers are norms, and the upper bound due to the fact that they are norms of projections of unit vectors. In fact, there is one more constraint on the compression numbers of a q-dimensional subspace.

Problem 20.2. Show that for any U \in \mathcal{L}(W) we have

\sum_{i=1}^r c_i(U)^2 \leq \dim U.
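A numerical check of this bound, assuming numpy. Here the right singular vectors w_i are modeled by an arbitrary orthonormal basis of W = R^m; running the sum over the full basis produces the extreme case of the inequality:

```python
import numpy as np

rng = np.random.default_rng(3)
m, q = 7, 3

# An orthonormal basis of W = R^m standing in for w_1, ..., w_m.
W_basis = np.linalg.qr(rng.standard_normal((m, m)))[0]

# A random q-dimensional subspace U, with orthogonal projection P_U.
Q, _ = np.linalg.qr(rng.standard_normal((m, q)))
P_U = Q @ Q.T

# Compression numbers c_j(U) = ||P_U w_j||, one per basis vector.
c = np.linalg.norm(P_U @ W_basis, axis=0)
assert np.all(c >= 0) and np.all(c <= 1 + 1e-12)

# Over a full orthonormal basis the squares sum to exactly tr(P_U) = dim U ...
assert np.isclose((c ** 2).sum(), q)

# ... so any sub-collection, such as c_1, ..., c_r with r < m, obeys the bound.
r = 4
assert (c[: r] ** 2).sum() <= q + 1e-12
```
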

Now we are ready to quantify compression error.

Problem 20.3. Show that

S_A(U)=\|(I-P_U)A\|^2 = \sum\limits_{i=1}^r \sigma_i^2(1-c_i(U)^2).

Let us write the error formula above as

S_A(U) = \sum\limits_{i=1}^r \sigma_i^2 - \sum\limits_{i=1}^r \sigma_i^2c_i(U)^2=\|A\|_F^2 -\sum\limits_{i=1}^r \sigma_i^2c_i(U)^2.

Here we see that the squared error S_A(U) in approximating A by the compression P_UA is at most \|A\|_F^2, the squared Frobenius norm of A, which is the error we incur if we compress A to zero.
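The error formula itself is easy to verify numerically. A sketch assuming numpy, where the right singular vectors w_i are read off as the columns of the SVD factor `U_sv`:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, q = 6, 5, 2
A = rng.standard_normal((m, n))
U_sv, s, Vt = np.linalg.svd(A, full_matrices=False)

# A random q-dimensional subspace U of W, with orthogonal projection P_U.
Q, _ = np.linalg.qr(rng.standard_normal((m, q)))
P_U = Q @ Q.T

# Left-hand side: the squared compression error S_A(U) = ||(I - P_U) A||^2.
S = np.linalg.norm(A - P_U @ A) ** 2

# Right-hand side: ||A||_F^2 minus the sigma-weighted compression numbers.
c = np.linalg.norm(P_U @ U_sv, axis=0)      # c_i(U) = ||P_U w_i||
rhs = (s ** 2).sum() - (s ** 2 * c ** 2).sum()
assert np.isclose(S, rhs)
```
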

Our task now is to maximize the part of the sum which reduces this worst-case error. This is a very simple optimization problem: maximize the quadratic form

S(c_1,\dots,c_r) = \sum\limits_{i=1}^r \sigma_i^2 c_i^2

subject to the constraints

0 \leq c_i \leq 1 \quad\text{and}\quad \sum\limits_{i=1}^r c_i^2 \leq q.

Remembering that the coefficients of S satisfy

\sigma_1 \geq \dots \geq \sigma_r>0,

our optimal strategy is to put as much weight as possible on the q largest coefficients by taking

(c_1,\dots,c_r) = (\underbrace{1,\dots,1}_q,\underbrace{0,\dots,0}_{r-q}).
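Since the objective is linear in the squares c_i^2, this greedy choice is an optimum of the linear program, and it can be sanity-checked against random feasible points. A sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(5)
r, q = 6, 3
# sigma_1 >= ... >= sigma_r > 0.
sigma = np.sort(rng.uniform(0.5, 3.0, size=r))[::-1]

# Greedy maximizer: full weight on the q largest coefficients.
c_star = np.concatenate([np.ones(q), np.zeros(r - q)])
S_star = (sigma ** 2 * c_star ** 2).sum()

# Random feasible points: 0 <= c_i <= 1 and sum c_i^2 <= q.
for _ in range(2000):
    c = rng.uniform(0, 1, size=r)
    if (c ** 2).sum() > q:
        c *= np.sqrt(q / (c ** 2).sum())   # rescale into the feasible region
    assert (sigma ** 2 * c ** 2).sum() <= S_star + 1e-12
```
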

Problem 20.4. Finish the proof of the EYM theorem by showing that the compression numbers of U=W_1 \oplus \dots \oplus W_q realize the above maximizer, and that the corresponding compression is P_UA=\sigma_1E_1+\dots +\sigma_qE_q=A_q.
