Math 31AH: Lecture 7

In Lecture 5, we considered the question of how a vector \mathbf{v} in a Euclidean space (\mathbf{V},\langle \cdot,\cdot \rangle) can be represented as a linear combination of the vectors in an orthonormal basis E = \{\mathbf{e}_1,\dots,\mathbf{e}_n\} of \mathbf{V}. We worked out the answer to this question: the coordinates of \mathbf{v} are given by taking the scalar product with each vector in the orthonormal basis:

\mathbf{v} = \langle \mathbf{e}_1,\mathbf{v} \rangle \mathbf{e}_1 + \dots + \langle \mathbf{e}_n,\mathbf{v} \rangle \mathbf{e}_n.

Equivalently, using our algebraic definition of the angle between two vectors in a Euclidean space, this can be written as

\mathbf{v} = \|\mathbf{v}\| \cos \theta_1 \mathbf{e}_1 + \dots + \|\mathbf{v}\| \cos \theta_n \mathbf{e}_n,

where \theta_i is the angle between \mathbf{e}_i and \mathbf{v}. This led us to think of the vector \langle \mathbf{e}_i,\mathbf{v} \rangle \mathbf{e}_i = \|\mathbf{v}\| \cos \theta_i\mathbf{e}_i as the “projection” of \mathbf{v} onto the one-dimensional subspace \mathrm{span} \{\mathbf{e}_i\} of \mathbf{V}. In what sense is the vector \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i the “projection” of the vector \mathbf{v} onto the “line” \mathbf{E}_i = \mathrm{span} \{\mathbf{e}_i\}? Our geometric intuition concerning projections suggests that this construction should have two properties: first, the vector \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i should be the element of \mathbf{E}_i closest to \mathbf{v}; and second, the vector \mathbf{v}-\langle\mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i should be orthogonal to \mathbf{e}_i. (This would be a good time to draw yourself a diagram, or to consult the diagram in Lecture 5.) We want to prove that these two features, which characterize the geometric notion of projection, actually hold in an arbitrary Euclidean space. Let us consider this in the following slightly more general setup, where the line \mathbf{E}_i is replaced by an arbitrary finite-dimensional subspace. Here’s a motivating and suggestive picture.
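Before moving to the general setup, the coordinate formula above can be spot-checked numerically. The sketch below is an illustrative assumption, not part of the lecture: it works in \mathbb{R}^2 with the standard dot product, and uses a rotated orthonormal basis (any orthonormal basis would do).

```python
import math

# Spot-check: the coordinates of v in an orthonormal basis are the scalar
# products with the basis vectors, and these coordinates reconstruct v.
# The angle theta and the vector v are arbitrary illustrative choices.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

theta = 0.7  # arbitrary rotation angle
e1 = (math.cos(theta), math.sin(theta))
e2 = (-math.sin(theta), math.cos(theta))  # {e1, e2} is orthonormal

v = (3.0, -2.0)

# Coordinates of v: scalar product with each basis vector.
c1, c2 = dot(e1, v), dot(e2, v)

# Reconstruct v = c1*e1 + c2*e2, componentwise.
recon = tuple(c1 * a + c2 * b for a, b in zip(e1, e2))
print(recon)  # agrees with v up to rounding
```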

We first develop some general features of subspaces of Euclidean spaces, which amount to the statement that they always come in complementary pairs. More precisely, let us consider the subset \mathbf{W}^\perp of \mathbf{V} consisting of all those vectors in \mathbf{V} which are perpendicular to every vector in the subspace \mathbf{W},

\mathbf{W}^\perp := \{\mathbf{v} \in \mathbf{V} \colon \langle \mathbf{v},\mathbf{w} \rangle = 0 \ \forall \mathbf{w} \in \mathbf{W}\}.
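In coordinates, membership in \mathbf{W}^\perp can be tested against a spanning set of \mathbf{W}: by linearity of the scalar product in each argument, orthogonality to spanning vectors implies orthogonality to every vector of \mathbf{W}. A minimal sketch, assuming \mathbb{R}^3 with the standard dot product and an illustrative one-dimensional \mathbf{W}:

```python
# Test membership in W-perp against a spanning set of W; this suffices by
# bilinearity of the scalar product. W = span{(1, 1, 0)} is an arbitrary
# illustrative choice of subspace of R^3.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

W_span = [(1.0, 1.0, 0.0)]

def in_W_perp(v, tol=1e-12):
    return all(abs(dot(v, w)) <= tol for w in W_span)

print(in_W_perp((1.0, -1.0, 2.0)))  # True: orthogonal to (1, 1, 0)
print(in_W_perp((1.0, 0.0, 0.0)))   # False
```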

Proposition 1: \mathbf{W}^\perp is a subspace of \mathbf{V}.

Proof: Since the zero vector is orthogonal to everything, we have \mathbf{0} \in \mathbf{W}^\perp. It remains to demonstrate that \mathbf{W}^\perp is closed under taking linear combinations. For any \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{W}^\perp, any \mathbf{w} \in \mathbf{W}, and any a_1,a_2 \in \mathbb{R}, we have

\langle a_1\mathbf{v}_1+a_2\mathbf{v}_2,\mathbf{w} \rangle = a_1\langle \mathbf{v}_1,\mathbf{w} \rangle + a_2\langle \mathbf{v}_2,\mathbf{w} \rangle = a_1 \cdot 0 + a_2 \cdot 0 = 0.

— Q.E.D.


Proposition 2: We have \mathbf{W} \cap \mathbf{W}^\perp = \{\mathbf{0}\}.

Proof: Since both \mathbf{W} and \mathbf{W}^\perp contain the zero vector (because they’re subspaces), their intersection also contains the zero vector. Now let \mathbf{w} \in \mathbf{W} \cap \mathbf{W}^\perp. Then \mathbf{w} is orthogonal to itself, i.e. \langle \mathbf{w},\mathbf{w} \rangle = 0. By the scalar product axioms, the only vector with this property is \mathbf{w}=\mathbf{0}.

— Q.E.D.

Propositions 1 and 2 make no assumption on the dimension of the Euclidean space \mathbf{V} — it could be finite-dimensional, or it could be infinite-dimensional. The same is true of the subspace \mathbf{W}. At this point, we restrict to the case that \mathbf{V} is an n-dimensional vector space, and keep this restriction in place for the rest of the lecture.

Let \mathbf{W} be an m-dimensional subspace of the n-dimensional Euclidean space \mathbf{V}. If m=n, then \mathbf{W}=\mathbf{V}, as proved on Assignment 1. Suppose m<n and let \{\mathbf{e}_1,\dots,\mathbf{e}_m\} be an orthonormal basis of \mathbf{W}. Since m<n, there is a vector \mathbf{v}_1 \in \mathbf{V} which is not in \mathbf{W}. In particular, the vector

\mathbf{f}_1 = \mathbf{v}_1 - \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i

is not the zero vector. This vector \mathbf{f}_1 is orthogonal to each of the vectors \mathbf{e}_1,\dots,\mathbf{e}_m, and hence two things are true: first, \mathbf{f}_1 \in \mathbf{W}^\perp; and second, \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\} is an orthogonal set of nonzero vectors. Thus, if m=n-1, the set \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\} is an orthogonal basis of \mathbf{V}. If m <n-1, then there is a vector \mathbf{v}_2 \in \mathbf{V} which is not in the span of \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\}. We set

\mathbf{f}_2 = \mathbf{v}_2-\frac{\langle \mathbf{f}_1,\mathbf{v}_2\rangle}{\langle \mathbf{f}_1,\mathbf{f}_1\rangle}\mathbf{f}_1 - \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i

to obtain a nonzero vector \mathbf{f}_2 orthogonal to all vectors in the set \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\}. In particular, \mathbf{f}_2 \in \mathbf{W}^\perp. If m=n-2, then \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1,\mathbf{f}_2\} is an orthogonal basis of \mathbf{V}. If m<n-2, we repeat the same process. After n-m iterations of this process, we have generated an orthogonal basis

\{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\}
such that \{\mathbf{e}_1,\dots,\mathbf{e}_m\} is an orthonormal basis of \mathbf{W} and \{\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\} is an orthogonal basis of \mathbf{W}^\perp, which can be normalized to get an orthonormal basis of \mathbf{W}^\perp.
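The extension process just described is the classical Gram–Schmidt procedure. A sketch in Python, under illustrative assumptions (the ambient space is \mathbb{R}^3 with the standard dot product, \mathbf{W} is spanned by the first standard basis vector, and the candidate vectors are the standard basis):

```python
import math

# Extend an orthonormal basis of W to an orthonormal basis of V by
# repeatedly subtracting off components along the vectors found so far
# (Gram-Schmidt), normalizing each surviving vector. The new vectors form
# an orthonormal basis of W-perp.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def scale(c, u):
    return tuple(c * a for a in u)

def extend_to_orthonormal_basis(W_basis, candidates, tol=1e-12):
    basis = list(W_basis)
    new = []
    for v in candidates:
        f = v
        for b in basis:
            f = sub(f, scale(dot(b, f), b))  # remove component along b
        if math.sqrt(dot(f, f)) > tol:       # v was outside the current span
            f = scale(1.0 / math.sqrt(dot(f, f)), f)
            basis.append(f)
            new.append(f)
    return new

e1 = (1.0, 0.0, 0.0)  # orthonormal basis of W (illustrative choice)
std = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(extend_to_orthonormal_basis([e1], std))
# two returned vectors, spanning W-perp
```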

We now come to orthogonal projections in general. Let \mathbf{W} be a subspace of \mathbf{V}, and let \mathbf{W}^\perp be its orthogonal complement. Invoking the above construction, let \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\} be an orthonormal basis of \mathbf{V} such that \{\mathbf{e}_1,\dots,\mathbf{e}_m\} is an orthonormal basis of \mathbf{W} and \{\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\} is an orthonormal basis of \mathbf{W}^\perp. The function P_\mathbf{W} \colon \mathbf{V} \to \mathbf{W} defined by

P_\mathbf{W}\mathbf{v} = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v} \rangle \mathbf{e}_i

is called the orthogonal projector of \mathbf{V} onto \mathbf{W}. For any vector \mathbf{v} \in \mathbf{V}, the vector P_\mathbf{W}\mathbf{v} \in \mathbf{W} is called the orthogonal projection of \mathbf{v} onto \mathbf{W}. Observe that P_\mathbf{W}\mathbf{v}=\mathbf{0} if \mathbf{v} \in \mathbf{W}^\perp.
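In coordinates, the projector formula is easy to evaluate. A minimal sketch, under illustrative assumptions: \mathbf{V}=\mathbb{R}^3 with the standard dot product, and \mathbf{W} the xy-plane, whose standard basis is already orthonormal.

```python
# The projector P_W in coordinates: sum the components of v along an
# orthonormal basis of W. Here W is the xy-plane in R^3, an illustrative
# choice with an orthonormal basis already in hand.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

E = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # orthonormal basis of W

def project(v):
    coords = [dot(e, v) for e in E]
    return tuple(sum(c * e[k] for c, e in zip(coords, E))
                 for k in range(len(v)))

v = (2.0, -3.0, 5.0)
print(project(v))  # (2.0, -3.0, 0.0): the W-perp component is discarded
```

Applying `project` twice gives the same answer as applying it once, which previews the idempotence property proved below.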

Proposition 1: The function P_\mathbf{W} is a linear transformation.

Proof: First, let us check that P_\mathbf{W} sends the zero vector of \mathbf{V} to the zero vector of \mathbf{W}. Note that, since \mathbf{W} is a subspace of \mathbf{V}, they have the same zero vector; we denote it simply by \mathbf{0} instead of using two different symbols \mathbf{0}_\mathbf{V} and \mathbf{0}_\mathbf{W} for the same vector. We have

P_\mathbf{W} \mathbf{0} = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{0}\rangle \mathbf{e}_i= \sum_{i=1}^m 0 \mathbf{e}_i=\mathbf{0}.

Now we check that P_\mathbf{W} respects linear combinations. Let \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V} be two vectors, and let a_1,a_2 \in \mathbb{R} be two scalars. We then have

P_\mathbf{W}(a_1\mathbf{v}_1+a_2\mathbf{v}_2) \\ = \sum_{i=1}^m \langle \mathbf{e}_i,a_1\mathbf{v}_1+a_2\mathbf{v}_2 \rangle \mathbf{e}_i \\=a_1\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i +  a_2\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i \\ = a_1P_\mathbf{W}\mathbf{v}_1 + a_2P_\mathbf{W}\mathbf{v}_2.

— Q.E.D.

Proposition 2: The linear transformation P_\mathbf{W} satisfies P_\mathbf{W}P_\mathbf{W}=P_\mathbf{W}.

Proof: The claim is that P_\mathbf{W}(P_\mathbf{W}\mathbf{v})=P_\mathbf{W}\mathbf{v} for all \mathbf{v} \in \mathbf{V}. Let us check this. First, observe that for any vector \mathbf{e}_j in the orthonormal basis \{\mathbf{e}_1,\dots,\mathbf{e}_m\} of \mathbf{W}, we have

P_\mathbf{W}\mathbf{e}_j = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{e}_j \rangle \mathbf{e}_i = \sum_{i=1}^m \delta_{ij} \mathbf{e}_i = \mathbf{e}_j.

Note also that since E is a basis of \mathbf{W}, the above calculation together with Proposition 1 tells us that P_\mathbf{W}\mathbf{w}=\mathbf{w} for all \mathbf{w} \in \mathbf{W}, which is to be expected: the projection of a vector \mathbf{w} already in \mathbf{W} onto \mathbf{W} should just be \mathbf{w}. Now to finish the proof, we apply this calculation:

P_\mathbf{W}(P_\mathbf{W}\mathbf{v}) = P_\mathbf{W}\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}\rangle P_\mathbf{W}\mathbf{e}_i = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i= P_\mathbf{W}\mathbf{v}.

— Q.E.D.

Proposition 3: The linear transformation P_\mathbf{W} has the property that \langle \mathbf{v}_1, P_\mathbf{W}\mathbf{v}_2 \rangle = \langle P_\mathbf{W}\mathbf{v}_1, \mathbf{v}_2 \rangle for any \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}.

Proof: For any two vectors \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}, we have

\langle \mathbf{v}_1,P_\mathbf{W}\mathbf{v}_2 \rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_1 \rangle \mathbf{f}_j,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle P_\mathbf{W}\mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_2 \rangle P_\mathbf{W}\mathbf{f}_j\right\rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_1 \rangle \mathbf{f}_j,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i\right\rangle \\= \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i\right\rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i + \sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_2\rangle \mathbf{f}_j\right\rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle P_\mathbf{W}\mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_1\rangle P_\mathbf{W}\mathbf{f}_j,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i + \sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_2\rangle \mathbf{f}_j\right\rangle \\ = \langle P_\mathbf{W}\mathbf{v}_1,\mathbf{v}_2 \rangle.

— Q.E.D.

Proposition 4: For any \mathbf{v} \in \mathbf{V} and \mathbf{w} \in \mathbf{W}, we have \langle \mathbf{v}-P_\mathbf{W}\mathbf{v},\mathbf{w} \rangle = 0.

Proof: Before reading the proof, draw yourself a diagram to make sure you can visualize what this proposition is saying. The proof itself follows easily from Proposition 3: we have

\langle \mathbf{v}-P_\mathbf{W}\mathbf{v},\mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{w} \rangle - \langle P_\mathbf{W}\mathbf{v},\mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{w} \rangle - \langle \mathbf{v},P_\mathbf{W}\mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{w} \rangle - \langle \mathbf{v},\mathbf{w} \rangle =0.

— Q.E.D.

Proposition 5: For any \mathbf{v} \in \mathbf{V}, we have

\|\mathbf{v} - P_\mathbf{W}\mathbf{v}\| < \|\mathbf{v} - \mathbf{w}\| \quad \forall \mathbf{w} \in \mathbf{W}-\{P_\mathbf{W}\mathbf{v}\}.

Proof: Let us write

\|\mathbf{v}-\mathbf{w}\|^2 = \|\mathbf{v}-P_\mathbf{W}\mathbf{v} + P_\mathbf{W}\mathbf{v}-\mathbf{w}\|^2.

Now observe that the vector P_\mathbf{W}\mathbf{v} - \mathbf{w} lies in \mathbf{W}, since it is the difference of two vectors in this subspace. Consequently, \mathbf{v}-P_\mathbf{W}\mathbf{v} and P_\mathbf{W}\mathbf{v}-\mathbf{w} are orthogonal vectors, by Proposition 4. We may thus apply the Pythagorean theorem (Assignment 2) to obtain

\|\mathbf{v}-\mathbf{w}\|^2 = \|\mathbf{v}-P_\mathbf{W}\mathbf{v}\|^2 + \varepsilon,

where

\varepsilon = \|P_\mathbf{W}\mathbf{v}-\mathbf{w}\|^2 >0,

the inequality being strict because \mathbf{w} \neq P_\mathbf{W}\mathbf{v}.

— Q.E.D.
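Proposition 5 can be spot-checked numerically. The sketch below is an illustrative assumption rather than part of the lecture: \mathbf{W} is the xy-plane in \mathbb{R}^3, and the comparison is against randomly sampled elements of \mathbf{W}.

```python
import math
import random

# Spot-check of Proposition 5: the orthogonal projection P_W v is at least
# as close to v as any sampled element of W. Here W is the xy-plane in R^3
# and P_W v is computed by zeroing the z-component (valid for this W).

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

v = (2.0, -3.0, 5.0)
Pv = (2.0, -3.0, 0.0)  # orthogonal projection of v onto the xy-plane

random.seed(0)
for _ in range(1000):
    w = (random.uniform(-10, 10), random.uniform(-10, 10), 0.0)  # w in W
    assert dist(v, Pv) <= dist(v, w)
print("P_W v is closest among all sampled points of W")
```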

Proposition 5 says that P_\mathbf{W}\mathbf{v} is the vector in \mathbf{W} which is closest to \mathbf{v}, which matches our geometric intuition concerning projections. Equivalently, we can say that P_\mathbf{W}\mathbf{v} is the vector in \mathbf{W} which best approximates \mathbf{v}, and this perspective makes orthogonal projections very important in applications of linear algebra to statistics, data science, physics, engineering, and more. However, Proposition 5 also has purely mathematical importance. Namely, we constructed the linear transformation P_\mathbf{W} using an arbitrarily chosen orthonormal basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_m\} of \mathbf{W}. If we had used a different orthonormal basis F=\{\mathbf{f}_1,\dots,\mathbf{f}_m\}, the same formula would give us a possibly different linear transformation

Q_\mathbf{W} \colon \mathbf{V} \to \mathbf{W}

defined by

Q_\mathbf{W}\mathbf{v} = \sum_{i=1}^m \langle \mathbf{f}_i,\mathbf{v}\rangle \mathbf{f}_i.

Propositions 1-5 above all apply to Q_\mathbf{W} as well, and in fact this forces Q_\mathbf{W}=P_\mathbf{W}, so that it really is correct to speak of the orthogonal projection of \mathbf{V} onto \mathbf{W}. To see why these two transformations must be the same, let us suppose they are not. This means that there is a vector \mathbf{v} \in \mathbf{V} such that P_\mathbf{W}\mathbf{v} \neq Q_\mathbf{W}\mathbf{v}. Thus by Proposition 5 we have

\|\mathbf{v} - P_\mathbf{W}\mathbf{v}\| < \|\mathbf{v} - Q_\mathbf{W}\mathbf{v}\|,

while also by Proposition 5 we have

\|\mathbf{v} - Q_\mathbf{W}\mathbf{v}\| < \|\mathbf{v} - P_\mathbf{W}\mathbf{v}\|,

a contradiction. So, in the construction of the transformation P_\mathbf{W}, it does not matter which orthonormal basis of \mathbf{W} we use.
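This basis-independence can also be observed numerically. A sketch under illustrative assumptions: \mathbf{W} is the xy-plane in \mathbb{R}^3, and the second orthonormal basis is obtained by rotating the first within \mathbf{W} by an arbitrary angle.

```python
import math

# Two different orthonormal bases of the same subspace W (the xy-plane in
# R^3; the second basis is the first rotated within W by an arbitrary
# angle) yield the same orthogonal projection of every vector.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(basis, v):
    coords = [dot(e, v) for e in basis]
    return tuple(sum(c * e[k] for c, e in zip(coords, basis))
                 for k in range(len(v)))

E = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
t = 0.4  # arbitrary rotation angle within W
F = [(math.cos(t), math.sin(t), 0.0), (-math.sin(t), math.cos(t), 0.0)]

v = (2.0, -3.0, 5.0)
p, q = project(E, v), project(F, v)
print(p, q)  # agree up to rounding
```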
