Math 31AH: Lecture 7

In Lecture 5, we considered the question of how a vector $\mathbf{v}$ in a Euclidean space $(\mathbf{V},\langle \cdot,\cdot \rangle)$ can be represented as a linear combination of the vectors in an orthonormal basis $E = \{\mathbf{e}_1,\dots,\mathbf{e}_n\}$ of $\mathbf{V}.$ We worked out the answer to this question: the coordinates of $\mathbf{v}$ are given by taking the scalar product with each vector in the orthonormal basis:

$\mathbf{v} = \langle \mathbf{e}_1,\mathbf{v} \rangle \mathbf{e}_1 + \dots + \langle \mathbf{e}_n,\mathbf{v} \rangle \mathbf{e}_n.$
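As a quick sanity check of this expansion, here is a minimal sketch in Python for $\mathbf{V}=\mathbb{R}^2$ with the standard scalar product. The particular basis, the vector `v`, and the helper name `dot` are illustrative assumptions, not part of the lecture; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction as F

def dot(x, y):
    """Standard scalar product on R^n (an assumed choice of Euclidean structure)."""
    return sum(a * b for a, b in zip(x, y))

# An orthonormal basis of R^2 with rational entries, chosen for exact arithmetic.
e1 = (F(3, 5), F(4, 5))
e2 = (F(-4, 5), F(3, 5))
v = (F(2), F(1))

# The coordinates of v are the scalar products <e_i, v>.
coords = [dot(e1, v), dot(e2, v)]

# Rebuild v from its coordinates: v = <e1, v> e1 + <e2, v> e2.
rebuilt = tuple(coords[0] * a + coords[1] * b for a, b in zip(e1, e2))

print(coords, rebuilt == v)
```

Here the coordinates come out as $2$ and $-1,$ and the rebuilt vector agrees with `v` exactly.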

Equivalently, using our algebraic definition of the angle between two vectors in a Euclidean space, this can be written as

$\mathbf{v} = \|\mathbf{v}\| \cos \theta_1 \mathbf{e}_1 + \dots + \|\mathbf{v}\| \cos \theta_n \mathbf{e}_n,$

where $\theta_i$ is the angle between $\mathbf{e}_i$ and $\mathbf{v}.$ This led us to think of the vector $\langle \mathbf{e}_i,\mathbf{v} \rangle \mathbf{e}_i = \|\mathbf{v}\| \cos \theta_i\mathbf{e}_i$ as the “projection” of $\mathbf{v}$ onto the one-dimensional subspace $\mathrm{span} \{\mathbf{e}_i\}$ of $\mathbf{V}.$ In what sense is the vector $\langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i$ the “projection” of the vector $\mathbf{v}$ onto the “line” $\mathbf{E}_i = \mathrm{span} \{\mathbf{e}_i\}$? Our geometric intuition concerning projections suggests that this construction should have two properties: first, the vector $\langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i$ should be the element of $\mathbf{E}_i$ which is closest to $\mathbf{v};$ and second, the vector $\mathbf{v}-\langle\mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i$ should be orthogonal to $\mathbf{e}_i.$ (This would be a good time to draw yourself a diagram, or to consult the diagram in Lecture 5.) We want to prove that these two features, which characterize the geometric notion of projection, actually hold in the setting of an arbitrary Euclidean space. Let us consider this in the following slightly more general setup, where the line $\mathbf{E}_i$ is replaced by an arbitrary finite-dimensional subspace. Here’s a motivating and suggestive picture.

We first develop some general features of subspaces of Euclidean spaces, which amount to the statement that they always come in complementary pairs. More precisely, let $\mathbf{W}$ be a subspace of $\mathbf{V},$ and consider the subset $\mathbf{W}^\perp$ of $\mathbf{V}$ consisting of all those vectors in $\mathbf{V}$ which are perpendicular to every vector in $\mathbf{W},$

$\mathbf{W}^\perp := \{\mathbf{v} \in \mathbf{V} \colon \langle \mathbf{v},\mathbf{w} \rangle = 0 \ \forall \mathbf{w} \in \mathbf{W}\}.$

Proposition 1: $\mathbf{W}^\perp$ is a subspace of $\mathbf{V}.$

Proof: Since the zero vector is orthogonal to everything, we have $\mathbf{0} \in \mathbf{W}^\perp.$ It remains to demonstrate that $\mathbf{W}^\perp$ is closed under taking linear combinations. For any $\mathbf{v}_1,\mathbf{v}_2 \in \mathbf{W}^\perp,$ any $\mathbf{w} \in \mathbf{W},$ and any $a_1,a_2 \in \mathbb{R},$ we have

$\langle a_1\mathbf{v}_1+a_2\mathbf{v}_2,\mathbf{w} \rangle = a_1\langle \mathbf{v}_1,\mathbf{w} \rangle + a_2\langle \mathbf{v}_2,\mathbf{w} \rangle= a_1 \cdot 0 + a_2 \cdot 0 =0.$

—Q.E.D.

Proposition 2: We have $\mathbf{W} \cap \mathbf{W}^\perp = \{\mathbf{0}\}.$

Proof: Since both $\mathbf{W}$ and $\mathbf{W}^\perp$ contain the zero vector (because they’re subspaces), their intersection also contains the zero vector. Now let $\mathbf{w} \in \mathbf{W} \cap \mathbf{W}^\perp.$ Then $\mathbf{w}$ is orthogonal to itself, i.e. $\langle \mathbf{w},\mathbf{w} \rangle = 0.$ By the scalar product axioms, the only vector with this property is $\mathbf{w}=\mathbf{0}.$

— Q.E.D.

Propositions 1 and 2 make no assumption on the dimension of the Euclidean space $\mathbf{V}$ — it could be finite-dimensional, or it could be infinite-dimensional. The same is true of the subspace $\mathbf{W}.$ At this point, we restrict to the case that $\mathbf{V}$ is an $n$-dimensional vector space, and keep this restriction in place for the rest of the lecture.

Let $\mathbf{W}$ be an $m$-dimensional subspace of the $n$-dimensional space $\mathbf{V}.$ If $m=n,$ then $\mathbf{W}=\mathbf{V},$ as proved on Assignment 1. Suppose $m<n,$ and let $\{\mathbf{e}_1,\dots,\mathbf{e}_m\}$ be an orthonormal basis of $\mathbf{W}.$ Since $m<n,$ there is a vector $\mathbf{v}_1 \in \mathbf{V}$ which is not in $\mathbf{W}.$ In particular, the vector

$\mathbf{f}_1 = \mathbf{v}_1 - \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i$

is not the zero vector. This vector $\mathbf{f}_1$ is orthogonal to each of the vectors $\mathbf{e}_1,\dots,\mathbf{e}_m,$ and hence two things are true: first, $\mathbf{f}_1 \in \mathbf{W}^\perp;$ and second, $\{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\}$ is an orthogonal set of nonzero vectors. Thus, if $m=n-1,$ the set $\{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\}$ is an orthogonal basis of $\mathbf{V}.$ If $m<n-1,$ then there is a vector $\mathbf{v}_2 \in \mathbf{V}$ which is not in the span of $\{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\}$. We set

$\mathbf{f}_2 = \mathbf{v}_2-\frac{\langle \mathbf{f}_1,\mathbf{v}_2\rangle}{\langle \mathbf{f}_1,\mathbf{f}_1\rangle}\mathbf{f}_1 - \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i$

to obtain a nonzero vector $\mathbf{f}_2$ orthogonal to all vectors in the set $\{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\}$. In particular, $\mathbf{f}_2 \in \mathbf{W}^\perp.$ If $m=n-2,$ then $\{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1,\mathbf{f}_2\}$ is an orthogonal basis of $\mathbf{V}.$ If $m<n-2,$ we repeat the same process. After $n-m$ iterations of this process, we have generated an orthogonal basis

$\{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\}$

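such that $\{\mathbf{e}_1,\dots,\mathbf{e}_m\}$ is an orthonormal basis of $\mathbf{W}$ and $\{\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\}$ is an orthogonal basis of $\mathbf{W}^\perp$ (each $\mathbf{f}_j$ lies in $\mathbf{W}^\perp,$ and a vector of $\mathbf{W}^\perp$ has zero coefficient on every $\mathbf{e}_i$ when expanded in this basis), which can be normalized to get an orthonormal basis of $\mathbf{W}^\perp.$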
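The iterative construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: $\mathbf{V}=\mathbb{R}^3$ with the standard scalar product, a hypothetical helper `extend_orthogonally`, and hand-picked candidate vectors $\mathbf{v}_1,\mathbf{v}_2.$ Because the vectors $\mathbf{f}_j$ are orthogonal but not normalized, each coefficient is divided by $\langle \mathbf{u},\mathbf{u}\rangle.$

```python
from fractions import Fraction as F

def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def extend_orthogonally(basis, candidates):
    """Given mutually orthogonal nonzero vectors `basis`, return new vectors
    obtained by subtracting from each candidate its components along all
    vectors collected so far. Zero residuals are skipped."""
    vectors = list(basis)
    for v in candidates:
        r = list(v)
        for u in vectors:
            c = dot(u, v) / dot(u, u)  # coefficient of u in v
            r = [ri - c * ui for ri, ui in zip(r, u)]
        if any(r):  # a zero residual means v was already in the span
            vectors.append(tuple(r))
    return vectors[len(basis):]

# W = span{e1}, so m = 1 and n = 3; two iterations are needed.
e1 = (F(1, 3), F(2, 3), F(2, 3))
v1 = (F(1), F(0), F(0))
v2 = (F(0), F(1), F(0))
f1, f2 = extend_orthogonally([e1], [v1, v2])

print(f1, f2)
```

With these inputs, `f1` and `f2` are nonzero, orthogonal to `e1`, and orthogonal to each other, so they form an orthogonal basis of $\mathbf{W}^\perp$ in this example.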

We now come to orthogonal projections in general. Let $\mathbf{W}$ be a subspace of $\mathbf{V},$ and let $\mathbf{W}^\perp$ be its orthogonal complement. Invoking the above construction, let $\{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\}$ be an orthonormal basis of $\mathbf{V}$ such that $\{\mathbf{e}_1,\dots,\mathbf{e}_m\}$ is an orthonormal basis of $\mathbf{W}$ and $\{\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\}$ is an orthonormal basis of $\mathbf{W}^\perp.$ The function $P_\mathbf{W} \colon \mathbf{V} \to \mathbf{W}$ defined by

$P_\mathbf{W}\mathbf{v} = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v} \rangle \mathbf{e}_i$

is called the orthogonal projector of $\mathbf{V}$ on $\mathbf{W}.$ For any vector $\mathbf{v} \in \mathbf{V},$ the vector $P_\mathbf{W}\mathbf{v} \in \mathbf{W}$ is called the orthogonal projection of $\mathbf{v}$ onto $\mathbf{W}.$ Observe that $P_\mathbf{W}\mathbf{v}=\mathbf{0}$ if $\mathbf{v} \in \mathbf{W}^\perp.$
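The defining formula translates directly into code. The sketch below assumes $\mathbf{V}=\mathbb{R}^3$ with the standard scalar product and a specific orthonormal basis with rational entries; the function name `project` and the test vectors are illustrative choices, not notation from the lecture.

```python
from fractions import Fraction as F

def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project(onb, v):
    """Return sum_i <e_i, v> e_i, where `onb` is an orthonormal basis of W."""
    out = [F(0)] * len(v)
    for e in onb:
        c = dot(e, v)  # coordinate of v along e
        out = [o + c * a for o, a in zip(out, e)]
    return tuple(out)

# Orthonormal basis of R^3: e1, e2 span W, and f1 spans W-perp.
e1 = (F(1, 3), F(2, 3), F(2, 3))
e2 = (F(2, 3), F(1, 3), F(-2, 3))
f1 = (F(2, 3), F(-2, 3), F(1, 3))
W = [e1, e2]

v = (F(3), F(0), F(0))
Pv = project(W, v)   # orthogonal projection of v onto W
Pf = project(W, f1)  # a vector of W-perp is sent to zero

print(Pv, Pf)
```

In this example `Pv` equals $(5/3,\,4/3,\,-2/3)$ and `Pf` is the zero vector, matching the observation that $P_\mathbf{W}$ vanishes on $\mathbf{W}^\perp.$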

Proposition 1: The function $P_\mathbf{W}$ is a linear transformation.

Proof: First, let us check that $P_\mathbf{W}$ sends the zero vector of $\mathbf{V}$ to the zero vector of $\mathbf{W}.$ Note that, since $\mathbf{W}$ is a subspace of $\mathbf{V},$ they have the same zero vector, which we denote simply by $\mathbf{0}$ instead of using two different symbols $\mathbf{0}_\mathbf{V}$ and $\mathbf{0}_\mathbf{W}$ for this same vector. We have

$P_\mathbf{W} \mathbf{0} = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{0}\rangle \mathbf{e}_i= \sum_{i=1}^m 0 \mathbf{e}_i=\mathbf{0}.$

Now we check that $P_\mathbf{W}$ respects linear combinations. Let $\mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}$ be two vectors, and let $a_1,a_2 \in \mathbb{R}$ be two scalars. We then have

$P_\mathbf{W}(a_1\mathbf{v}_1+a_2\mathbf{v}_2) \\ = \sum_{i=1}^m \langle \mathbf{e}_i,a_1\mathbf{v}_1+a_2\mathbf{v}_2 \rangle \mathbf{e}_i \\=a_1\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i + a_2\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i \\ = a_1P_\mathbf{W}\mathbf{v}_1 + a_2P_\mathbf{W}\mathbf{v}_2.$

— Q.E.D.

Proposition 2: The linear transformation $P_\mathbf{W}$ satisfies $P_\mathbf{W}P_\mathbf{W}=P_\mathbf{W}.$

Proof: The claim is that $P_\mathbf{W}(P_\mathbf{W}\mathbf{v})=P_\mathbf{W}\mathbf{v}$ for all $\mathbf{v} \in \mathbf{V}.$ Let us check this. First, observe that for any vector $\mathbf{e}_j$ in the orthonormal basis $E=\{\mathbf{e}_1,\dots,\mathbf{e}_m\}$ of $\mathbf{W},$ we have

$P_\mathbf{W}\mathbf{e}_j = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{e}_j \rangle \mathbf{e}_i = \sum_{i=1}^m \delta_{ij} \mathbf{e}_i = \mathbf{e}_j.$

Note also that since $E$ is a basis of $\mathbf{W},$ the above calculation together with Proposition 1 tells us that $P_\mathbf{W}\mathbf{w}=\mathbf{w}$ for all $\mathbf{w} \in \mathbf{W},$ which is to be expected: the projection of a vector $\mathbf{w}$ already in $\mathbf{W}$ onto $\mathbf{W}$ should just be $\mathbf{w}.$ Now to finish the proof, we apply this calculation:

$P_\mathbf{W}(P_\mathbf{W}\mathbf{v}) = P_\mathbf{W}\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}\rangle P_\mathbf{W}\mathbf{e}_i = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i= P_\mathbf{W}\mathbf{v}.$

— Q.E.D.

Proposition 3: The linear transformation $P_\mathbf{W}$ has the property that $\langle \mathbf{v}_1, P_\mathbf{W}\mathbf{v}_2 \rangle = \langle P_\mathbf{W}\mathbf{v}_1, \mathbf{v}_2 \rangle$ for any $\mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}.$

Proof: For any two vectors $\mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V},$ we have

$\langle \mathbf{v}_1,P_\mathbf{W}\mathbf{v}_2 \rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_1 \rangle \mathbf{f}_j,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle P_\mathbf{W}\mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_2 \rangle P_\mathbf{W}\mathbf{f}_j\right\rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_1 \rangle \mathbf{f}_j,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i\right\rangle \\= \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i\right\rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i + \sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_2\rangle \mathbf{f}_j\right\rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle P_\mathbf{W}\mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_1\rangle P_\mathbf{W}\mathbf{f}_j,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i + \sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_2\rangle \mathbf{f}_j\right\rangle \\ = \langle P_\mathbf{W}\mathbf{v}_1,\mathbf{v}_2 \rangle.$

— Q.E.D

Proposition 4: For any $\mathbf{v} \in \mathbf{V}$ and $\mathbf{w} \in \mathbf{W},$ we have $\langle \mathbf{v}-P_\mathbf{W}\mathbf{v},\mathbf{w} \rangle = 0.$

Proof: Before reading the proof, draw yourself a diagram to make sure you can visualize what this proposition is saying. The proof itself follows easily from Proposition 3: we have

$\langle \mathbf{v}-P_\mathbf{W}\mathbf{v},\mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{w} \rangle - \langle P_\mathbf{W}\mathbf{v},\mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{w} \rangle - \langle \mathbf{v},P_\mathbf{W}\mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{w} \rangle - \langle \mathbf{v},\mathbf{w} \rangle =0.$

— Q.E.D.
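The three identities just proved (idempotence, symmetry with respect to the scalar product, and orthogonality of the residual) can be checked on a small example. The sketch below is a spot check on particular vectors, not a proof; it assumes $\mathbf{V}=\mathbb{R}^2,$ a line $\mathbf{W}$ spanned by one unit vector, and illustrative names `project`, `v1`, `v2`.

```python
from fractions import Fraction as F

def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project(onb, v):
    """Return sum_i <e_i, v> e_i for an orthonormal basis `onb` of W."""
    out = [F(0)] * len(v)
    for e in onb:
        c = dot(e, v)
        out = [o + c * a for o, a in zip(out, e)]
    return tuple(out)

W = [(F(3, 5), F(4, 5))]  # orthonormal basis of a line in R^2
v1 = (F(1), F(0))
v2 = (F(0), F(1))
Pv1 = project(W, v1)
Pv2 = project(W, v2)

# Projecting twice changes nothing.
idempotent = project(W, Pv1) == Pv1
# The projector can be moved across the scalar product.
symmetric = dot(v1, Pv2) == dot(Pv1, v2)
# The residual v1 - P v1 is orthogonal to W.
residual = tuple(a - b for a, b in zip(v1, Pv1))
orthogonal = dot(residual, W[0]) == 0

print(idempotent, symmetric, orthogonal)
```

All three checks hold for these inputs, in agreement with the propositions.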

Proposition 5: For any $\mathbf{v} \in \mathbf{V},$ we have

$\|\mathbf{v} - P_\mathbf{W}\mathbf{v}\| < \|\mathbf{v} - \mathbf{w}\| \quad \forall \mathbf{w} \in \mathbf{W}-\{P_\mathbf{W}\mathbf{v}\}.$

Proof: Let us write

$\|\mathbf{v}-\mathbf{w}\|^2 = \|\mathbf{v}-P_\mathbf{W}\mathbf{v} + P_\mathbf{W}\mathbf{v}-\mathbf{w}\|^2.$

Now observe that the vector $P_\mathbf{W}\mathbf{v} - \mathbf{w}$ lies in $\mathbf{W},$ since it is the difference of two vectors in this subspace. Consequently, $\mathbf{v}-P_\mathbf{W}\mathbf{v}$ and $P_\mathbf{W}\mathbf{v}-\mathbf{w}$ are orthogonal vectors, by Proposition 4. We may thus apply the Pythagorean theorem (Assignment 2) to obtain

$\|\mathbf{v}-\mathbf{w}\|^2 = \|\mathbf{v}-P_\mathbf{W}\mathbf{v}\|^2 + \varepsilon,$

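where

$\varepsilon = \|P_\mathbf{W}\mathbf{v}-\mathbf{w}\|^2 >0,$

the strict positivity holding because $\mathbf{w} \neq P_\mathbf{W}\mathbf{v}.$ Hence $\|\mathbf{v}-\mathbf{w}\|^2 > \|\mathbf{v}-P_\mathbf{W}\mathbf{v}\|^2,$ and taking square roots gives the claim.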

— Q.E.D.
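The Pythagorean decomposition used in this proof can be observed numerically. The following sketch assumes $\mathbf{V}=\mathbb{R}^3$ with the standard scalar product, a plane $\mathbf{W}$ with a hand-picked orthonormal basis, and a few arbitrary competitors $\mathbf{w} \in \mathbf{W};$ the names `sqdist` and `competitors` are hypothetical. Squared distances are compared so that the arithmetic stays exact.

```python
from fractions import Fraction as F

def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project(onb, v):
    """Return sum_i <e_i, v> e_i for an orthonormal basis `onb` of W."""
    out = [F(0)] * len(v)
    for e in onb:
        c = dot(e, v)
        out = [o + c * a for o, a in zip(out, e)]
    return tuple(out)

def sqdist(x, y):
    """Squared distance ||x - y||^2."""
    d = [a - b for a, b in zip(x, y)]
    return dot(d, d)

e1 = (F(1, 3), F(2, 3), F(2, 3))
e2 = (F(2, 3), F(1, 3), F(-2, 3))
W = [e1, e2]
v = (F(3), F(0), F(0))
Pv = project(W, v)
best = sqdist(v, Pv)  # squared distance from v to its projection

# A few vectors a*e1 + b*e2 in W other than Pv (note Pv = 1*e1 + 2*e2).
competitors = [tuple(a * x + b * y for x, y in zip(e1, e2))
               for a, b in [(0, 0), (1, 0), (1, 1), (-2, 5)]]

# For each competitor w: ||v - w||^2 = ||v - Pv||^2 + ||Pv - w||^2 > ||v - Pv||^2.
checks = [sqdist(v, w) == best + sqdist(Pv, w) and sqdist(v, w) > best
          for w in competitors]

print(best, checks)
```

For this choice of `v`, the squared distance to the projection is $4,$ and every listed competitor is strictly farther away.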

Proposition 5 says that $P_\mathbf{W}\mathbf{v}$ is the vector in $\mathbf{W}$ which is closest to $\mathbf{v},$ which matches our geometric intuition concerning projections. Equivalently, we can say that $P_\mathbf{W}\mathbf{v}$ is the vector in $\mathbf{W}$ which best approximates $\mathbf{v},$ and this perspective makes orthogonal projections very important in applications of linear algebra to statistics, data science, physics, engineering, and more. However, Proposition 5 also has purely mathematical importance. Namely, we have constructed the linear transformation $P_\mathbf{W}$ using an arbitrarily chosen orthonormal basis $E=\{\mathbf{e}_1,\dots,\mathbf{e}_m\}$ in $\mathbf{W}.$ If we had used a different orthonormal basis $F=\{\mathbf{f}_1,\dots,\mathbf{f}_m\}$ of $\mathbf{W},$ the same formula gives us a possibly different linear transformation

$Q_\mathbf{W} \colon \mathbf{V} \to \mathbf{W}$

defined by

$Q_\mathbf{W}\mathbf{v} = \sum_{i=1}^m \langle \mathbf{f}_i,\mathbf{v}\rangle \mathbf{f}_i.$

Propositions 1-5 above all apply to $Q_\mathbf{W}$ as well, and in fact this forces $Q_\mathbf{W}=P_\mathbf{W},$ so that it really is correct to speak of the orthogonal projection of $\mathbf{V}$ onto $\mathbf{W}.$ To see why these two transformations must be the same, let us suppose they are not. This means that there is a vector $\mathbf{v} \in \mathbf{V}$ such that $P_\mathbf{W}\mathbf{v} \neq Q_\mathbf{W}\mathbf{v}.$ Thus by Proposition 5 we have

$\|\mathbf{v} - P_\mathbf{W}\mathbf{v}\| < \|\mathbf{v} - Q_\mathbf{W}\mathbf{v}\|,$

while also by Proposition 5 we have

$\|\mathbf{v} - Q_\mathbf{W}\mathbf{v}\| < \|\mathbf{v} - P_\mathbf{W}\mathbf{v}\|,$

a contradiction. So, in the construction of the transformation $P_\mathbf{W},$ it does not matter which orthonormal basis of $\mathbf{W}$ we use.
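To see this basis independence in a concrete case, the sketch below applies the same formula with two different orthonormal bases of the same plane in $\mathbb{R}^3.$ The plane, both bases, and the vector `v` are assumptions made for illustration; the second basis is a rotation of the first within the plane.

```python
from fractions import Fraction as F

def dot(x, y):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project(onb, v):
    """Return sum_i <b_i, v> b_i for an orthonormal basis `onb` of W."""
    out = [F(0)] * len(v)
    for b in onb:
        c = dot(b, v)
        out = [o + c * a for o, a in zip(out, b)]
    return tuple(out)

# Two orthonormal bases of the same subspace W (the xy-plane in R^3).
E = [(F(1), F(0), F(0)), (F(0), F(1), F(0))]
Fb = [(F(3, 5), F(4, 5), F(0)), (F(-4, 5), F(3, 5), F(0))]

v = (F(1), F(2), F(3))
Pv = project(E, v)   # projection computed with the basis E
Qv = project(Fb, v)  # projection computed with the rotated basis

print(Pv, Qv, Pv == Qv)
```

Both computations return $(1,2,0),$ as the uniqueness argument predicts.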