Math 31AH: Lecture 8

In Lecture 4, we introduced the notion of a Euclidean space, which is a vector space \mathbf{V} together with a scalar product \langle \cdot,\cdot \rangle defined on \mathbf{V}. In a Euclidean space, we can use the scalar product to define notions of vector length, distance between two vectors, and angle between two vectors. In short, while a vector space alone is a purely algebraic object, we can do Euclidean geometry in a vector space with a scalar product. This realization is extremely useful since it gives us a way to think geometrically about vectors which may not be at all like vectors in \mathbb{R}^n. For example, they could be functions, as in Assignment 3.

For better or worse, it turns out that Euclidean geometry, as useful as it is in this generalized setup, is not sufficient to describe the world around us. Mathematically, this means that we must sometimes think about non-Euclidean geometry. At the level of linear algebra, this comes down to opening ourselves up to thinking about general bilinear forms, which extend the scalar product concept in that they might fail to satisfy the symmetry and positivity axioms. An important example is the geometry of special relativity. In this physical theory, the vector space \mathbf{R}^4=\{(x_1,x_2,x_3,x_4) \colon x_i \in \mathbf{R}\} is taken to model spacetime, with the first three coordinates of a vector corresponding to its position in space, and the last coordinate being its position in time. It turns out that the geometry of spacetime is governed by a “fake” scalar product, called the Lorentz form, which is defined by

\langle (x_1,x_2,x_3,x_4),(y_1,y_2,y_3,y_4) \rangle = x_1y_1+x_2y_2+x_3y_3-x_4y_4.

So, physicists are telling us that in order to understand the geometry of spacetime we have to think about a strange version of the usual dot product on \mathbb{R}^4 which is made by taking the usual dot product of the spatial coordinates, and then subtracting the product of the time coordinates — typical, they always do this kind of thing. The Lorentz form is definitely not a scalar product, since the scalar product of a vector with itself can be negative:

\langle (0,0,0,1),(0,0,0,1) \rangle = -1.

Still, mathematically, there’s no reason we can’t consider such fake scalar products as a legitimate generalization of the scalar product concept.

Definition 1: A function \langle \cdot,\cdot \rangle \colon \mathbf{V} \times \mathbf{V} \to \mathbb{R} is said to be a bilinear form if, for all vectors \mathbf{v}_1,\mathbf{v}_2,\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V} and all scalars s_1,s_2,t_1,t_2 \in \mathbb{R}, we have

\langle s_1\mathbf{v}_1+s_2\mathbf{v}_2,t_1\mathbf{w}_1+t_2\mathbf{w}_2 \rangle \\ = s_1t_1\langle\mathbf{v}_1,\mathbf{w}_1\rangle + s_1t_2 \langle\mathbf{v}_1,\mathbf{w}_2\rangle + s_2t_1\langle\mathbf{v}_2,\mathbf{w}_1\rangle + s_2t_2\langle\mathbf{v}_2,\mathbf{w}_2\rangle.

So, a bilinear form is just a “weak” scalar product on \mathbf{V} which might fail two out of three of the scalar product axioms.

In this lecture, we will see that the set of all bilinear forms that can be defined on an n-dimensional vector space \mathbf{V} can be viewed as the set of all tables of real numbers with n rows and n columns, or in other words n \times n matrices. In fact, it is not difficult to come to this realization — we just have to pick a basis A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\} of \mathbf{V} in order to describe a given bilinear form as a matrix. Things however get a bit tricky when we want to compare the two matrices which describe the same bilinear form relative to different bases.

Let’s start with something easier.

Definition 2: A function \langle \cdot \rangle \colon \mathbf{V} \to \mathbb{R} is said to be a linear form if, for all vectors \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V} and all scalars s_1,s_2 \in \mathbb{R}, we have

\langle s_1\mathbf{v}_1 + s_2\mathbf{v}_2 \rangle = s_1\langle \mathbf{v}_1\rangle + s_2\langle \mathbf{v}_2 \rangle.

Now suppose that \langle \cdot \rangle is a linear form on an n-dimensional vector space \mathbf{V}. Then, in order to be able to compute the number \langle \mathbf{v} \rangle for any vector \mathbf{v} \in \mathbf{V}, it is sufficient to know how to calculate the numbers

a_1=\langle \mathbf{a}_1 \rangle,\dots,a_n=\langle \mathbf{a}_n \rangle,

where A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\} is a basis of \mathbf{V}. Indeed, in order to compute \langle \mathbf{v} \rangle from this information, we simply write \mathbf{v} as a linear combination of basis vectors,

\mathbf{v} = x_1\mathbf{a}_1+ \dots + x_n\mathbf{a}_n,

and then compute

\langle \mathbf{v} \rangle = \langle x_1\mathbf{a}_1+ \dots + x_n\mathbf{a}_n\rangle \\ = x_1\langle \mathbf{a}_1\rangle + \dots + x_n\langle \mathbf{a}_n\rangle \\ = x_1a_1 + \dots + x_na_n.

Note that this has a very simple description in terms of the usual dot product in \mathbf{R}^n, namely

\langle \mathbf{v} \rangle =(x_1,\dots,x_n) \cdot (a_1,\dots,a_n).

Equivalently, the number \langle \mathbf{v} \rangle is computed as the product of a 1 \times n matrix and an n\times 1 matrix:

\langle \mathbf{v} \rangle = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.

We can write this more succinctly as the matrix equation

\langle \mathbf{v} \rangle = \mathbf{A}\mathbf{x},

where \mathbf{A} and \mathbf{x} are the only things they could be based on context. The 1 \times n matrix \mathbf{A} in this equation is referred to as the matrix of the linear form \langle \cdot \rangle relative to the basis A, and its entries are just the values of the form on each of the basis vectors. Not too complicated.
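If you like to see such formulas run on a computer, here is a minimal sketch in Python with NumPy (the code is not part of the course material, and the particular form and numbers are invented for illustration): it evaluates a linear form on \mathbb{R}^3 as the product of its 1 \times n matrix with a coordinate column.

```python
import numpy as np

# A made-up example: the linear form "sum of the coordinates" on R^3,
# described relative to the standard basis a_1, a_2, a_3.
A = np.array([[1.0, 1.0, 1.0]])        # 1 x n matrix [<a_1> ... <a_n>]
x = np.array([[2.0], [-1.0], [5.0]])   # n x 1 matrix of coordinates of v

value = A @ x                          # <v> = A x, a 1 x 1 matrix
print(value.item())                    # 6.0, which is indeed 2 - 1 + 5
```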

Essentially the same idea works for bilinear forms: in order to know how to compute the number \langle \mathbf{v},\mathbf{w} \rangle for any two vectors \mathbf{v},\mathbf{w} \in \mathbf{V}, it is sufficient to know how to compute the n^2 numbers

\begin{matrix} a_{11} = \langle \mathbf{a}_1,\mathbf{a}_1 \rangle & \dots & a_{1n}=\langle \mathbf{a}_1,\mathbf{a}_n \rangle \\ \vdots & {} & \vdots \\ a_{n1}=\langle \mathbf{a}_n,\mathbf{a}_1 \rangle & \dots & a_{nn} = \langle \mathbf{a}_n,\mathbf{a}_n \rangle\end{matrix}

relative to a basis A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\}. Indeed, if we have access to this table of numbers, then to compute \langle \mathbf{v},\mathbf{w}\rangle for given \mathbf{v},\mathbf{w} \in \mathbf{V}, we first write these vectors as linear combinations of basis vectors,

\mathbf{v} = x_1\mathbf{a}_1 + \dots + x_n\mathbf{a}_n \\ \mathbf{w}=y_1\mathbf{a}_1 + \dots + y_n\mathbf{a}_n,

and then calculate using bilinearity:

\langle \mathbf{v},\mathbf{w} \rangle = \left\langle \sum_{i=1}^n x_i\mathbf{a}_i,\sum_{j=1}^n y_j \mathbf{a}_j \right\rangle = \sum_{i,j=1}^n x_i \langle \mathbf{a}_i,\mathbf{a}_j \rangle y_j =\sum_{i,j=1}^n x_i a_{ij} y_j.

Once again, the result of this calculation can be expressed in terms of matrices, namely as the product of three matrices: a 1 \times n matrix, an n \times n matrix, and an n \times 1 matrix. Here’s how this looks:

\langle \mathbf{v},\mathbf{w} \rangle =\begin{bmatrix} x_1 & \dots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & {} & \vdots \\ a_{n1} & \dots & a_{nn} \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}.

This formula is often written

\langle \mathbf{v},\mathbf{w} \rangle = \mathbf{x}^T\mathbf{A}\mathbf{y},

where the symbols \mathbf{x},\mathbf{A},\mathbf{y} are the only things they could possibly be based on context. In particular, the n \times n matrix \mathbf{A}=[a_{ij}]_{i,j=1}^n is referred to as the matrix of the bilinear form \langle \cdot,\cdot \rangle relative to the basis A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\}.
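Here is the same idea in action, again only a rough Python/NumPy sketch rather than anything official: the Lorentz form from the start of the lecture has matrix \mathrm{diag}(1,1,1,-1) relative to the standard basis of \mathbb{R}^4, and evaluating it is exactly the triple product \mathbf{x}^T\mathbf{A}\mathbf{y}.

```python
import numpy as np

# The Lorentz form on R^4: its matrix relative to the standard basis is
# the diagonal matrix with entries 1, 1, 1, -1.
A = np.diag([1.0, 1.0, 1.0, -1.0])

x = np.array([1.0, 2.0, 3.0, 4.0])     # coordinates of v
y = np.array([4.0, 3.0, 2.0, 1.0])     # coordinates of w

triple_product = x @ A @ y             # x^T A y
by_hand = x[0]*y[0] + x[1]*y[1] + x[2]*y[2] - x[3]*y[3]
print(triple_product, by_hand)         # both equal 12.0
```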

Now we come to the issue of dependence on the choice of basis. This is easily worked out for linear forms, but is a little more complex for bilinear forms.

Let A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\} and B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} be two bases in the same vector space \mathbf{V}, and let \langle \cdot \rangle be a linear form on \mathbf{V}. Let

\mathbf{A}=\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \quad\text{ and }\quad \mathbf{B} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}

be the column matrices whose entries are the values of the form \langle \cdot \rangle on the two bases A,B (the transposes of the 1 \times n matrices considered above). We want to discern the relationship between these two matrices. We follow the same Marie Kondo-approved “out with the old, in with the new” strategy as in Lecture 4: we write the vectors of the “old” basis A as linear combinations of the vectors of the new basis B,

\mathbf{a}_1 = p_{11}\mathbf{b}_1 + \dots + p_{n1}\mathbf{b}_n \\ \vdots \\ \mathbf{a}_n = p_{1n}\mathbf{b}_1 + \dots + p_{nn}\mathbf{b}_n.

Now we evaluate the linear form \langle \cdot \rangle on both sides of each of these vector equations, to get the scalar equations

a_1 = p_{11}b_1 + \dots + p_{n1}b_n \\ \vdots \\ a_n = p_{1n}b_1 + \dots + p_{nn}b_n.

These scalar equations can be written as the single matrix equation

\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} p_{11} & \dots & p_{n1} \\ \vdots & {} & \vdots \\ p_{1n} & \dots & p_{nn} \end{bmatrix}\begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix},

or more briefly as

\mathbf{A} = \mathbf{P}^T\mathbf{B},

where \mathbf{P}=[p_{ij}]_{i,j=1}^n and \mathbf{P}^T denotes its transpose. And that’s it — that’s change of basis for linear forms.
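If you want to convince yourself that the transpose really does belong in this formula, here is a quick numerical check (a Python/NumPy sketch with invented data, not part of the notes) comparing \mathbf{A} = \mathbf{P}^T\mathbf{B} with the scalar equations above.

```python
import numpy as np
rng = np.random.default_rng(0)

n = 3
B_col = rng.standard_normal(n)       # column of values b_1, ..., b_n on the new basis
P = rng.standard_normal((n, n))      # p_{kj}: column j expresses a_j in terms of the b_k

A_col = P.T @ B_col                  # the claimed formula A = P^T B

# compare entry j with the scalar equation a_j = p_{1j} b_1 + ... + p_{nj} b_n
j = 1
print(np.isclose(A_col[j], sum(P[k, j] * B_col[k] for k in range(n))))   # True
```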

Although the end result is slightly more complicated, the strategy for working out the relationship between the matrices \mathbf{A} and \mathbf{B} representing the same bilinear form \langle \cdot,\cdot \rangle relative to two (possibly) different bases A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\} and B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} is the same: out with the old, in with the new. Just as in the case of a linear form, the first step is to write

\mathbf{a}_1 = p_{11}\mathbf{b}_1 + \dots + p_{n1}\mathbf{b}_n \\ \vdots \\ \mathbf{a}_n = p_{1n}\mathbf{b}_1 + \dots + p_{nn}\mathbf{b}_n.

Now we consider the numbers a_{ij} = \langle \mathbf{a}_i,\mathbf{a}_j \rangle. We have

a_{ij} = \left\langle \sum_{k=1}^n p_{ki}\mathbf{b}_k,\sum_{l=1}^n p_{lj}\mathbf{b}_l \right\rangle = \sum_{k,l=1}^n p_{ki} \langle \mathbf{b}_k,\mathbf{b}_l \rangle p_{lj}=\sum_{k,l=1}^n p_{ki} b_{kl} p_{lj}.

Although it may take a little bit of experimentation (try it out for n=2,3), the above is fairly easily seen to be equivalent to the matrix equation

\mathbf{A}=\mathbf{P}^T\mathbf{B}\mathbf{P},

where \mathbf{P}^T is the transpose of the n \times n matrix \mathbf{P}.
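Here is the analogous numerical check for the bilinear case, again only a Python/NumPy sketch with randomly generated data: it uses the fact, from Lecture 4, that a vector with coordinates \mathbf{x} relative to the basis A has coordinates \mathbf{P}\mathbf{x} relative to the basis B, and confirms that the matrices \mathbf{A}=\mathbf{P}^T\mathbf{B}\mathbf{P} and \mathbf{B} then compute the same number.

```python
import numpy as np
rng = np.random.default_rng(1)

n = 3
B_mat = rng.standard_normal((n, n))   # matrix b_{kl} of the form relative to the basis B
P = rng.standard_normal((n, n))       # p_{kj}: column j expresses a_j in terms of the b_k
A_mat = P.T @ B_mat @ P               # the claimed matrix of the same form relative to A

# if v has coordinates x relative to A, its coordinates relative to B are P x,
# so both matrices must produce the same number <v, w>
x, y = rng.standard_normal(n), rng.standard_normal(n)
print(np.isclose(x @ A_mat @ y, (P @ x) @ B_mat @ (P @ y)))   # True
```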

That’s it for this lecture, and next time we will do more interesting things with bilinear forms, aka generalized scalar products. Although the above change of basis formulas are presented in any standard course in linear algebra, my personal opinion is that they aren’t too important. If you find them easy to remember, excellent; more important is the ability to re-derive them whenever you want, since this means you understand why they are what they are. My hope is that you will understand the meaning of linear and bilinear forms conceptually, which doesn’t require calculating their matrices relative to a particular basis.

To drive the above point home, let us close this lecture by remarking that there’s no need to stop at bilinear forms. Why not keep going to trilinear forms? Indeed, for any k \in \mathbb{N}, one may define a k-linear form on a given vector space \mathbf{V} to be any real-valued function of k arguments on \mathbf{V},

\langle \underbrace{\cdot,\dots,\cdot}_{k \text{ arguments}} \rangle \colon \underbrace{\mathbf{V} \times \dots \times \mathbf{V}}_{k \text{ copies}} \to \mathbb{R},

which is a linear function of each argument. Conceptually, this isn’t any more complicated than a bilinear form. However, to represent such a function we need to use a k-dimensional array of numbers, which is often referred to as a k-dimensional tensor. In particular, a 1-dimensional tensor is a list, and a 2-dimensional tensor is a matrix. In general, change of basis formulas for k-dimensional tensors are quite messy and not very meaningful.
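To make the last remark concrete, here is how a trilinear form on \mathbb{R}^3 might be stored and evaluated on a computer. This is only a Python/NumPy sketch with a randomly generated 3-dimensional array; nothing here is part of the official notes.

```python
import numpy as np
rng = np.random.default_rng(2)

n = 3
T = rng.standard_normal((n, n, n))     # 3-dimensional tensor t_{ijk} = <a_i, a_j, a_k>
x, y, z = rng.standard_normal((3, n))  # coordinates of three input vectors

# trilinearity gives <u, v, w> = sum over i, j, k of x_i y_j z_k t_{ijk}
value = np.einsum('i,j,k,ijk->', x, y, z, T)
print(value)
```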

Lecture 9 video

Math 31AH: Lecture 7

In Lecture 5, we considered the question of how a vector \mathbf{v} in a Euclidean space (\mathbf{V},\langle \cdot,\cdot \rangle) can be represented as a linear combination of the vectors in an orthonormal basis E = \{\mathbf{e}_1,\dots,\mathbf{e}_n\} of \mathbf{V}. We worked out the answer to this question: the coordinates of \mathbf{v} are given by taking the scalar product with each vector in the orthonormal basis:

\mathbf{v} = \langle \mathbf{e}_1,\mathbf{v} \rangle \mathbf{e}_1 + \dots + \langle \mathbf{e}_n,\mathbf{v} \rangle \mathbf{e}_n.

Equivalently, using our algebraic definition of the angle between two vectors in a Euclidean space, this can be written as

\mathbf{v} = \|\mathbf{v}\| \cos \theta_1 \mathbf{e}_1 + \dots + \|\mathbf{v}\| \cos \theta_n \mathbf{e}_n,

where \theta_i is the angle between \mathbf{e}_i and \mathbf{v}. This led us to think of the vector \langle \mathbf{e}_i,\mathbf{v} \rangle \mathbf{e}_i = \|\mathbf{v}\| \cos \theta_i\mathbf{e}_i as the “projection” of \mathbf{v} onto the one-dimensional subspace \mathrm{span} \{\mathbf{e}_i\} of \mathbf{V}. In what sense is the vector \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i the “projection” of the vector \mathbf{v} onto the “line” \mathbf{E}_i = \mathrm{span} \{\mathbf{e}_i\}? Our geometric intuition concerning projections suggests that this construction should have two properties: first, the vector \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i should be the element of \mathbf{E}_i which is closest to \mathbf{v}; and second, the vector \mathbf{v}-\langle\mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i should be orthogonal to \mathbf{e}_i. (This would be a good time to draw yourself a diagram, or to consult the diagram in Lecture 5). We want to prove that these two features, which characterize the geometric notion of projection, actually hold in the setting of an arbitrary Euclidean space. Let us consider this in the following slightly more general setup, where the line \mathbf{E}_i is replaced by an arbitrary finite-dimensional subspace. Here’s a motivating and suggestive picture.

We first develop some general features of subspaces of Euclidean spaces, which amount to the statement that they always come in complementary pairs. More precisely, let us consider the subset \mathbf{W}^\perp of \mathbf{V} consisting of all those vectors in \mathbf{V} which are perpendicular to every vector in the subspace \mathbf{W},

\mathbf{W}^\perp := \{\mathbf{v} \in \mathbf{V} \colon \langle \mathbf{v},\mathbf{w} \rangle = 0 \ \forall \mathbf{w} \in \mathbf{W}\}.

Proposition 1: \mathbf{W}^\perp is a subspace of \mathbf{V}.

Proof: Since the zero vector is orthogonal to everything, we have \mathbf{0} \in \mathbf{W}^\perp. It remains to demonstrate that \mathbf{W}^\perp is closed under taking linear combinations. For any \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{W}^\perp, any \mathbf{w} \in \mathbf{W}, and any a_1,a_2 \in \mathbb{R}, we have

\langle a_1\mathbf{v}_1+a_2\mathbf{v}_2,\mathbf{w} \rangle = a_1\langle \mathbf{v}_1,\mathbf{w} \rangle + a_2\langle \mathbf{v}_2,\mathbf{w} \rangle = a_1 \cdot 0 + a_2 \cdot 0 =0.

— Q.E.D.

Proposition 2: We have \mathbf{W} \cap \mathbf{W}^\perp = \{\mathbf{0}\}.

Proof: Since both \mathbf{W} and \mathbf{W}^\perp contain the zero vector (because they’re subspaces), their intersection also contains the zero vector. Now let \mathbf{w} \in \mathbf{W} \cap \mathbf{W}^\perp. Then, \mathbf{w} is orthogonal to itself, i.e. \langle \mathbf{w},\mathbf{w} \rangle = 0. By the scalar product axioms, the only vector with this property is \mathbf{w}=\mathbf{0}.

— Q.E.D.

Propositions 1 and 2 make no assumption on the dimension of the Euclidean space \mathbf{V} — it could be finite-dimensional, or it could be infinite-dimensional. The same is true of the subspace \mathbf{W}. At this point, we restrict to the case that \mathbf{V} is an n-dimensional vector space, and keep this restriction in place for the rest of the lecture.

Let \mathbf{W} be an m-dimensional subspace of the n-dimensional Euclidean space \mathbf{V}. If m=n, then \mathbf{W}=\mathbf{V}, as proved on Assignment 1. Suppose m<n and let \{\mathbf{e}_1,\dots,\mathbf{e}_m\} be an orthonormal basis of \mathbf{W}. Since m<n, there is a vector \mathbf{v}_1 \in \mathbf{V} which is not in \mathbf{W}. In particular, the vector

\mathbf{f}_1 = \mathbf{v}_1 - \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i

is not the zero vector. This vector \mathbf{f}_1 is orthogonal to each of the vectors \mathbf{e}_1,\dots,\mathbf{e}_m, and hence two things are true: first, \mathbf{f}_1 \in \mathbf{W}^\perp; and second, \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\} is an orthogonal set of nonzero vectors. Thus, if m=n-1, the set \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\} is an orthogonal basis of \mathbf{V}. If m <n-1, then there is a vector \mathbf{v}_2 \in \mathbf{V} which is not in the span of \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\}. We set

\mathbf{f}_2 = \mathbf{v}_2-\frac{\langle \mathbf{f}_1,\mathbf{v}_2\rangle}{\|\mathbf{f}_1\|^2}\mathbf{f}_1 - \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i

to obtain a nonzero vector \mathbf{f}_2 orthogonal to all vectors in the set \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1\}. In particular, \mathbf{f}_2 \in \mathbf{W}^\perp. If m=n-2, then \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1,\mathbf{f}_2\} is an orthogonal basis of \mathbf{V}. If m<n-2, we repeat the same process. After n-m iterations of this process, we have generated an orthogonal basis

\{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\}

such that \{\mathbf{e}_1,\dots,\mathbf{e}_m\} is an orthonormal basis of \mathbf{W} and \{\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\} is an orthogonal basis of \mathbf{W}^\perp, which can be normalized to get an orthonormal basis of \mathbf{W}^\perp.

We now come to orthogonal projections in general. Let \mathbf{W} be a subspace of \mathbf{V}, and let \mathbf{W}^\perp be its orthogonal complement. Invoking the above construction, let \{\mathbf{e}_1,\dots,\mathbf{e}_m,\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\} be an orthonormal basis of \mathbf{V} such that \{\mathbf{e}_1,\dots,\mathbf{e}_m\} is an orthonormal basis of \mathbf{W} and \{\mathbf{f}_1,\dots,\mathbf{f}_{n-m}\} is an orthonormal basis of \mathbf{W}^\perp. The function P_\mathbf{W} \colon \mathbf{V} \to \mathbf{W} defined by

P_\mathbf{W}\mathbf{v} = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v} \rangle \mathbf{e}_i

is called the orthogonal projector of \mathbf{V} on \mathbf{W}. For any vector \mathbf{v} \in \mathbf{V}, the vector P_\mathbf{W}\mathbf{v} \in \mathbf{W} is called the orthogonal projection of \mathbf{v} onto \mathbf{W}. Observe that P_\mathbf{W}\mathbf{v}=\mathbf{0} if \mathbf{v} \in \mathbf{W}^\perp.
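Before proving anything about P_\mathbf{W}, it may help to see it computed in a small example. The following Python/NumPy sketch (not part of the notes; the plane \mathbf{W} and the vector \mathbf{v} are chosen arbitrarily) projects a vector of \mathbb{R}^3 onto a plane and previews two properties established below: applying P_\mathbf{W} twice is the same as applying it once, and \mathbf{v}-P_\mathbf{W}\mathbf{v} is orthogonal to \mathbf{W}.

```python
import numpy as np

# W is the plane in R^3 spanned by the orthonormal vectors e_1, e_2 below;
# the scalar product is the ordinary dot product.
e1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
e2 = np.array([0.0, 0.0, 1.0])

def P_W(v):
    # P_W v = <e_1, v> e_1 + <e_2, v> e_2
    return (e1 @ v) * e1 + (e2 @ v) * e2

v = np.array([3.0, -1.0, 2.0])
p = P_W(v)
print(p)                               # the orthogonal projection of v onto W
print(np.allclose(P_W(p), p))          # idempotence: P_W(P_W v) = P_W v
print(np.isclose((v - p) @ e1, 0.0) and np.isclose((v - p) @ e2, 0.0))  # v - P_W v is orthogonal to the spanning vectors of W
```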

Proposition 1: The function P_\mathbf{W} is a linear transformation.

Proof: First, let us check that P_\mathbf{W} sends the zero vector of \mathbf{V} to the zero vector of \mathbf{W}. Note that, since \mathbf{W} is a subspace of \mathbf{V}, they have the same zero vector, which we denote simply by \mathbf{0} instead of using two different symbols \mathbf{0}_\mathbf{V} and \mathbf{0}_\mathbf{W} for this same vector. We have

P_\mathbf{W} \mathbf{0} = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{0}\rangle \mathbf{e}_i= \sum_{i=1}^m 0 \mathbf{e}_i=\mathbf{0}.

Now we check that P_\mathbf{W} respects linear combinations. Let \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V} be two vectors, and let a_1,a_2 \in \mathbb{R} be two scalars. We then have

P_\mathbf{W}(a_1\mathbf{v}_1+a_2\mathbf{v}_2) \\ = \sum_{i=1}^m \langle \mathbf{e}_i,a_1\mathbf{v}_1+a_2\mathbf{v}_2 \rangle \mathbf{e}_i \\=a_1\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i +  a_2\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i \\ = a_1P_\mathbf{W}\mathbf{v}_1 + a_2P_\mathbf{W}\mathbf{v}_2.

— Q.E.D.

Proposition 2: The linear transformation P_\mathbf{W} satisfies P_\mathbf{W}P_\mathbf{W}=P_\mathbf{W}.

Proof: The claim is that P_\mathbf{W}(P_\mathbf{W}\mathbf{v})=P_\mathbf{W}\mathbf{v} for all \mathbf{v} \in \mathbf{V}. Let us check this. First, observe that for any vector \mathbf{e}_j in the orthonormal basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_m\} of \mathbf{W}, we have

P_\mathbf{W}\mathbf{e}_j = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{e}_j \rangle \mathbf{e}_i = \sum_{i=1}^m \delta_{ij} \mathbf{e}_i = \mathbf{e}_j.

Note also that since E is a basis of \mathbf{W}, the above calculation together with Proposition 1 tells us that P_\mathbf{W}\mathbf{w}=\mathbf{w} for all \mathbf{w} \in \mathbf{W}, which is to be expected: the projection of a vector \mathbf{w} already in \mathbf{W} onto \mathbf{W} should just be \mathbf{w}. Now to finish the proof, we apply this calculation:

P_\mathbf{W}(P_\mathbf{W}\mathbf{v}) = P_\mathbf{W}\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}\rangle P_\mathbf{W}\mathbf{e}_i = \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i= P_\mathbf{W}\mathbf{v}.

— Q.E.D.

Proposition 3: The linear transformation P_\mathbf{W} has the property that \langle \mathbf{v}_1, P_\mathbf{W}\mathbf{v}_2 \rangle = \langle P_\mathbf{W}\mathbf{v}_1, \mathbf{v}_2 \rangle for any \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}.

Proof: For any two vectors \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}, we have

\langle \mathbf{v}_1,P_\mathbf{W}\mathbf{v}_2 \rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_1 \rangle \mathbf{f}_j,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle P_\mathbf{W}\mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_2 \rangle P_\mathbf{W}\mathbf{f}_j\right\rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_1 \rangle \mathbf{f}_j,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i\right\rangle \\= \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i\right\rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle \mathbf{e}_i,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i + \sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_2\rangle \mathbf{f}_j\right\rangle \\ = \left\langle \sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_1\rangle P_\mathbf{W}\mathbf{e}_i+\sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_1\rangle P_\mathbf{W}\mathbf{f}_j,\sum_{i=1}^m \langle \mathbf{e}_i,\mathbf{v}_2\rangle \mathbf{e}_i + \sum_{j=1}^{n-m} \langle \mathbf{f}_j,\mathbf{v}_2\rangle \mathbf{f}_j\right\rangle \\ = \langle P_\mathbf{W}\mathbf{v}_1,\mathbf{v}_2 \rangle.

— Q.E.D.

Proposition 4: For any \mathbf{v} \in \mathbf{V} and \mathbf{w} \in \mathbf{W}, we have \langle \mathbf{v}-P_\mathbf{W}\mathbf{v},\mathbf{w} \rangle = 0.

Proof: Before reading the proof, draw yourself a diagram to make sure you can visualize what this proposition is saying. The proof itself follows easily from Proposition 3: we have

\langle \mathbf{v}-P_\mathbf{W}\mathbf{v},\mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{w} \rangle - \langle P_\mathbf{W}\mathbf{v},\mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{w} \rangle - \langle \mathbf{v},P_\mathbf{W}\mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{w} \rangle - \langle \mathbf{v},\mathbf{w} \rangle =0.

— Q.E.D.

Proposition 5: For any \mathbf{v} \in \mathbf{V}, we have

\|\mathbf{v} - P_\mathbf{W}\mathbf{v}\| < \|\mathbf{v} - \mathbf{w}\| \quad \forall \mathbf{w} \in \mathbf{W}-\{P_\mathbf{W}\mathbf{v}\}.

Proof: Let us write

\|\mathbf{v}-\mathbf{w}\|^2 = \|\mathbf{v}-P_\mathbf{W}\mathbf{v} + P_\mathbf{W}\mathbf{v}-\mathbf{w}\|^2.

Now observe that the vector P_\mathbf{W}\mathbf{v} - \mathbf{w} lies in \mathbf{W}, since it is the difference of two vectors in this subspace. Consequently, \mathbf{v}-P_\mathbf{W}\mathbf{v} and P_\mathbf{W}\mathbf{v}-\mathbf{w} are orthogonal vectors, by Proposition 4. We may thus apply the Pythagorean theorem (Assignment 2) to obtain

\|\mathbf{v}-\mathbf{w}\|^2 = \|\mathbf{v}-P_\mathbf{W}\mathbf{v}\|^2 + \varepsilon,

where

\varepsilon = \|P_\mathbf{W}\mathbf{v}-\mathbf{w}\|^2 >0.

— Q.E.D.

Proposition 5 says that P_\mathbf{W}\mathbf{v} is the vector in \mathbf{W} which is closest to \mathbf{v}, which matches our geometric intuition concerning projections. Equivalently, we can say that P_\mathbf{W}\mathbf{v} is the vector in \mathbf{W} which best approximates \mathbf{v}, and this perspective makes orthogonal projections very important in applications of linear algebra to statistics, data science, physics, engineering, and more. However, Proposition 5 also has purely mathematical importance. Namely, we have constructed the linear transformation P_\mathbf{W} using an arbitrarily chosen orthonormal basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_m\} in \mathbf{W}. If we had used a different orthonormal basis F=\{\mathbf{f}_1,\dots,\mathbf{f}_m\} of \mathbf{W}, the same formula gives us a possibly different linear transformation

Q_\mathbf{W} \colon \mathbf{V} \to \mathbf{W}

defined by

Q_\mathbf{W}\mathbf{v} = \sum_{i=1}^m \langle \mathbf{f}_i,\mathbf{v}\rangle \mathbf{f}_i.

Propositions 1-5 above all apply to Q_\mathbf{W} as well, and in fact this forces Q_\mathbf{W}=P_\mathbf{W}, so that it really is correct to speak of the orthogonal projection of \mathbf{V} onto \mathbf{W}. To see why these two transformations must be the same, let us suppose they are not. This means that there is a vector \mathbf{v} \in \mathbf{V} such that P_\mathbf{W}\mathbf{v} \neq Q_\mathbf{W}\mathbf{v}. Thus by Proposition 5 we have

\|\mathbf{v} - P_\mathbf{W}\mathbf{v}\| < \|\mathbf{v} - Q_\mathbf{W}\mathbf{v}\|,

while also by Proposition 5 we have

\|\mathbf{v} - Q_\mathbf{W}\mathbf{v}\| < \|\mathbf{v} - P_\mathbf{W}\mathbf{v}\|,

a contradiction. So, in the construction of the transformation P_\mathbf{W}, it does not matter which orthonormal basis of \mathbf{W} we use.

Math 31AH: Lecture 5

In this lecture we continue the study of Euclidean spaces. Let \mathbf{V} be a vector space, and let \langle \cdot,\cdot \rangle be a scalar product on \mathbf{V}, as defined in Lecture 4. The following definition generalizes the concept of perpendicularity to the setting of an arbitrary Euclidean space.

Definition 1: Vectors \mathbf{v},\mathbf{w} are said to be orthogonal if \langle \mathbf{v},\mathbf{w} \rangle =0. More generally, we say that S \subseteq \mathbf{V} is an orthogonal set if \langle \mathbf{v},\mathbf{w} \rangle =0 for all distinct \mathbf{v},\mathbf{w} \in S.

Observe that the zero vector \mathbf{0} \in \mathbf{V} is orthogonal to every vector \mathbf{v} \in \mathbf{V}, by the first scalar product axiom (bilinearity). Let us check that orthogonality of nonzero abstract vectors does indeed generalize perpendicularity of geometric vectors.

Proposition 1: Two nonzero vectors \mathbf{v},\mathbf{w} are orthogonal if and only if the angle between them is \frac{\pi}{2}.

Proof: By definition, the angle between nonzero vectors \mathbf{v} and \mathbf{w} is the unique number \theta \in [0,\pi] which solves the equation

\langle \mathbf{v},\mathbf{w} \rangle = \|\mathbf{v}\| \|\mathbf{w}\| \cos \theta.

If the angle between \mathbf{v} and \mathbf{w} is \frac{\pi}{2}, then

\langle \mathbf{v},\mathbf{w} \rangle = \|\mathbf{v}\| \|\mathbf{w}\| \cos \frac{\pi}{2} = \|\mathbf{v}\| \|\mathbf{w}\|0=0.

Conversely, if \langle \mathbf{v},\mathbf{w} \rangle =0, then

\|\mathbf{v}\| \|\mathbf{w}\| \cos \theta = 0.

Since \mathbf{v},\mathbf{w} are nonzero, we have \|\mathbf{v}\| > 0 and \|\mathbf{w}\| > 0, and we can divide through by \|\mathbf{v}\| \|\mathbf{w}\| > 0 to obtain

\cos \theta = 0.

The unique solution of this equation in the interval [0,\pi] is \theta = \frac{\pi}{2}. — Q.E.D.

In Lecture 4, we proved that any two nonzero vectors \mathbf{v},\mathbf{w} separated by an angle other than 0 or \pi are linearly independent. This is not true for three or more vectors: for example, if \mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3 \in \mathbb{R}^2 are the vectors (1,0),(1,1),(0,1), respectively, then

\theta(\mathbf{v}_1,\mathbf{v}_2) = \theta(\mathbf{v}_2,\mathbf{v}_3) = \frac{\pi}{4},\ \theta(\mathbf{v}_1,\mathbf{v}_3) = \frac{\pi}{2},

but \mathbf{v}_2 = \mathbf{v}_1+\mathbf{v}_3. So, separation by a positive angle is generally not enough to guarantee the linear independence of a given set of vectors. However, orthogonality is.

Proposition 2: If S = \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is an orthogonal set of nonzero vectors, then S is linearly independent.

Proof: Let x_1,\dots,x_n \in \mathbb{R} be scalars such that

x_1\mathbf{v}_1 + \dots + x_n \mathbf{v}_n = \mathbf{0}.

Let us take the scalar product with \mathbf{v}_1 on both sides of this equation, to get

\langle \mathbf{v}_1, x_1\mathbf{v}_1 + \dots + x_n \mathbf{v}_n \rangle = \langle \mathbf{v}_1,\mathbf{0} \rangle.

Using the scalar product axioms, we thus have

x_1 \langle \mathbf{v}_1,\mathbf{v}_1 \rangle + \dots + x_n \langle \mathbf{v}_1,\mathbf{v}_n \rangle = 0.

Now, since S = \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is an orthogonal set, all terms on the left hand side are zero except for the first term, which is x_1 \langle \mathbf{v}_1,\mathbf{v}_1 \rangle = x_1 \|\mathbf{v}_1\|^2. We thus have

x_1 \|\mathbf{v}_1\|^2 = 0.

Now, since \mathbf{v}_1 \neq \mathbf{0}, we have \|\mathbf{v}_1\| > 0, and thus we can divide through by \|\mathbf{v}_1\|^2 in the above equation to get

x_1 = 0.

Repeating the above argument with \mathbf{v}_2 in place of \mathbf{v}_1 yields x_2=0. In general, using the same argument for each i=1,\dots,n, we get x_i=0 for all i=1,\dots,n. Thus S is a linearly independent set. — Q.E.D.

One consequence of Proposition 2 is that, if \mathbf{V} is an n-dimensional vector space, and E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} is an orthogonal set of nonzero vectors in \mathbf{V}, then E is a basis of \mathbf{V}. In general, a basis of a vector space which is also an orthogonal set is called an orthogonal basis. In many ways, orthogonal bases are better than bases which are not orthogonal sets. One manifestation of this is the very useful fact that coordinates relative to an orthogonal basis are easily expressed as scalar products.

Proposition 3: Let E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} be an orthogonal basis in \mathbf{V}. For any \mathbf{v} \in \mathbf{V}, the unique representation of \mathbf{v} as a linear combination of vectors in E is

\mathbf{v} = \frac{\langle \mathbf{e}_1,\mathbf{v} \rangle}{\|\mathbf{e}_1\|^2}\mathbf{e}_1 + \frac{\langle \mathbf{e}_2,\mathbf{v} \rangle}{\|\mathbf{e}_2\|^2}\mathbf{e}_2 + \dots + \frac{\langle \mathbf{e}_n,\mathbf{v} \rangle}{\|\mathbf{e}_n\|^2}\mathbf{e}_n.

Equivalently, we have

\mathbf{v} = \frac{\|\mathbf{v}\| \cos \theta_1}{\|\mathbf{e}_1\|} \mathbf{e}_1 + \frac{\|\mathbf{v}\|\cos \theta_2}{\|\mathbf{e}_2\|} \mathbf{e}_2 + \dots + \frac{\|\mathbf{v}\| \cos \theta_n}{\|\mathbf{e}_n\|} \mathbf{e}_n,

where, for each 1 \leq j \leq n, \theta_j is the angle between \mathbf{v} and \mathbf{e}_j.

Proof: Let \mathbf{v} \in \mathbf{V} be any vector, and let

\mathbf{v} = x_1\mathbf{e}_1 + \dots + x_n\mathbf{e}_n

be its unique representation as a linear combination of vectors from E. Taking the inner product with the basis vector \mathbf{e}_j on both sides of this decomposition, we get

\langle \mathbf{e}_j,\mathbf{v} \rangle = \left\langle \mathbf{e}_j, \sum_{i=1}^n x_i \mathbf{e}_i \right\rangle.

Using the scalar product axioms, we can expand the right hand side as

\left\langle \mathbf{e}_j, \sum_{i=1}^n x_i \mathbf{e}_i \right\rangle = \sum_{i=1}^n x_i \langle \mathbf{e}_j,\mathbf{e}_i \rangle.

Since E is an orthogonal set, every term in this sum with i \neq j vanishes. We thus have

\langle \mathbf{e}_j,\mathbf{v} \rangle = x_j \langle \mathbf{e}_j,\mathbf{e}_j \rangle = x_j \|\mathbf{e}_j\|^2.

Now, since \{\mathbf{e}_1,\dots,\mathbf{e}_n\} is a linearly independent set, \mathbf{e}_j \neq \mathbf{0} and hence \|\mathbf{e}_j \| >0. Solving for the coordinate x_j, we thus have

x_j = \frac{\langle \mathbf{e}_j,\mathbf{v} \rangle}{\|\mathbf{e}_j\|^2}.

Since \langle \mathbf{e}_j,\mathbf{v}\rangle = \|\mathbf{e}_j\| \|\mathbf{v}\| \cos \theta_j, where \theta_j is the angle between \mathbf{v} and the basis vector \mathbf{e}_j, this may equivalently be written

x_j = \frac{\|\mathbf{e}_j\| \|\mathbf{v}\| \cos \theta_j}{\|\mathbf{e}_j\|^2} = \frac{\|\mathbf{v}\| \cos \theta_j}{\|\mathbf{e}_j\|},

which completes the proof. — Q.E.D.

The formulas in Proposition 3 become even simpler if E = \{\mathbf{e}_1,\dots,\mathbf{e}_n\} is an orthogonal basis in which every vector has length 1, i.e.

\|\mathbf{e}_j\| = 1, \quad j=1,\dots,n.

Such a basis is called an orthonormal basis. According to Proposition 3, if E is an orthonormal basis in \mathbf{V}, then for any \mathbf{v} \in \mathbf{V} we have

\mathbf{v} = \langle \mathbf{e}_1,\mathbf{v}\rangle\mathbf{e}_1 + \langle \mathbf{e}_2,\mathbf{v} \rangle\mathbf{e}_2 + \dots + \langle \mathbf{e}_n,\mathbf{v} \rangle\mathbf{e}_n,

or equivalently

\mathbf{v} = \|\mathbf{v}\| \cos \theta_1\mathbf{e}_1 + \|\mathbf{v}\|\cos \theta_2\mathbf{e}_2 + \dots + \|\mathbf{v}\| \cos \theta_n\mathbf{e}_n.

The first of these formulas is important in that it gives an algebraically efficient way to calculate coordinates relative to an orthonormal basis: to calculate the coordinates of a vector \mathbf{v}, just compute its scalar product with each of the basis vectors. The second formula is important because it provides geometric intuition: it says that the coordinates of \mathbf{v} relative to an orthonormal basis are the signed lengths of the orthogonal projections of \mathbf{v} onto the lines (i.e. one-dimensional subspaces) spanned by each of the basis vectors. Indeed, thinking of the case where \mathbf{v} and \mathbf{e}_j are geometric vectors, the quantity \|\mathbf{v}\| \cos \theta_j is the length of the orthogonal projection P_{\mathbf{e}_j}\mathbf{v} of the vector \mathbf{v} onto the line spanned by \mathbf{e}_j, as in the figure below.

Orthogonal projection of a vector onto a line.

An added benefit of orthonormal bases is that they reduce abstract scalar products to the familiar dot product of geometric vectors. More precisely, suppose that E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} is an orthonormal basis of \mathbf{V}. Let \mathbf{v},\mathbf{w} be vectors in \mathbf{V}, and let

\mathbf{v} = x_1\mathbf{e}_1 + \dots + x_n\mathbf{e}_n \\ \mathbf{w} = y_1\mathbf{e}_1 + \dots + y_n\mathbf{e}_n

be their representations relative to E. Then, we may evaluate the scalar product of \mathbf{v} and \mathbf{w} as

\langle \mathbf{v},\mathbf{w} \rangle = \left\langle \sum_{i=1}^n x_i\mathbf{e}_i,\sum_{j=1}^n y_j\mathbf{e}_j \right\rangle\\ = \sum_{i,j=1}^n x_iy_j \langle \mathbf{e}_i,\mathbf{e}_j \rangle\\ = \sum_{i,j=1}^n x_iy_j \delta_{ij} \\= \sum_{i=1}^n x_iy_i\\ = (x_1,\dots,x_n) \cdot (y_1,\dots,y_n).

In words, the scalar product \langle \mathbf{v},\mathbf{w} \rangle equals the dot product of the coordinate vectors of \mathbf{v} and \mathbf{w} relative to an orthonormal basis of \mathbf{V}.
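As a concrete illustration, here is a throwaway Python/NumPy sketch (the orthonormal basis and vectors are invented for the example) showing both facts at once in \mathbb{R}^2 with the usual dot product as the scalar product: coordinates are computed by scalar products with the basis vectors, and the scalar product of two vectors equals the dot product of their coordinate vectors.

```python
import numpy as np

# An orthonormal basis of R^2 (standard dot product as scalar product):
# the standard basis rotated by 45 degrees.
e1 = np.array([1.0, 1.0]) / np.sqrt(2)
e2 = np.array([-1.0, 1.0]) / np.sqrt(2)

v = np.array([3.0, 1.0])
w = np.array([-2.0, 5.0])

x = np.array([e1 @ v, e2 @ v])   # coordinates of v relative to {e_1, e_2}
y = np.array([e1 @ w, e2 @ w])   # coordinates of w relative to {e_1, e_2}

print(np.allclose(x[0] * e1 + x[1] * e2, v))   # the coordinate formula recovers v
print(np.isclose(v @ w, x @ y))                # <v, w> equals the dot product of coordinates
```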

This suggests the following definition.

Definition 2: Euclidean spaces (\mathbf{V}_1,\langle \cdot,\cdot \rangle_1) and (\mathbf{V}_2,\langle \cdot,\cdot \rangle_2) are said to be isomorphic if there exists an isomorphism T \colon \mathbf{V}_1 \to \mathbf{V}_2 which has the additional feature that

\langle T\mathbf{v},T\mathbf{w} \rangle_2 = \langle \mathbf{v},\mathbf{w} \rangle_1 \quad \text{for all } \mathbf{v},\mathbf{w} \in \mathbf{V}_1.

Our calculation above makes it seem likely that any two n-dimensional Euclidean spaces (\mathbf{V}_1,\langle \cdot,\cdot \rangle_1) and (\mathbf{V}_2,\langle \cdot,\cdot \rangle_2) are isomorphic, just as any two n-dimensional vector spaces \mathbf{V} and \mathbf{W} are. Indeed, we can prove this immediately if we can claim that both \mathbf{V}_1 and \mathbf{V}_2 contain orthonormal bases. In this case, let E_1 = \{\mathbf{e}_{11},\dots,\mathbf{e}_{1n}\} be an orthonormal basis in \mathbf{V}_1, let E_2 = \{\mathbf{e}_{21},\dots,\mathbf{e}_{2n}\} be an orthonormal basis in \mathbf{V}_2, and define T \colon \mathbf{V}_1 \to \mathbf{V}_2 to be the unique linear transformation that transforms \mathbf{e}_{1j} into \mathbf{e}_{2j} for each 1 \leq j \leq n. Then T is an isomorphism of vector spaces by the same argument as in Lecture 2, and it also satisfies \langle T\mathbf{v},T\mathbf{w}\rangle_2 = \langle \mathbf{v},\mathbf{w}\rangle_1 (make sure you understand why).

But, how can we be sure that every n-dimensional Euclidean space (\mathbf{V},\langle \cdot,\cdot \rangle) actually does contain an orthonormal basis? Certainly, we know that \mathbf{V} contains a basis B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\}, but this basis might not be orthonormal. Luckily, there is a fairly simple algorithm which takes as input a finite linearly independent set of vectors, and outputs a linearly independent orthogonal set of the same size, which we can then “normalize” by dividing each vector in the output set by its norm. This algorithm is called the Gram-Schmidt algorithm, and you are encouraged to familiarize yourself with it — it’s not too complicated, and is based entirely on material covered in this lecture. In this course, we only need to know that the Gram-Schmidt algorithm exists, so that we can claim any finite-dimensional Euclidean space has an orthonormal basis. We won’t bother analyzing the internal workings of the Gram-Schmidt algorithm, and will treat it as a black box to facilitate geometric thinking in abstract Euclidean spaces. More on this in Lecture 6.
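For the curious, here is one way the Gram-Schmidt algorithm might be sketched in code; this is a Python/NumPy illustration using the ordinary dot product as the scalar product, not the official treatment of the algorithm.

```python
import numpy as np

def gram_schmidt(vectors):
    """A sketch of the Gram-Schmidt algorithm for the ordinary dot product:
    linearly independent input vectors in, orthonormal output vectors out."""
    output = []
    for v in vectors:
        for e in output:
            v = v - (e @ v) * e        # remove the component along each earlier vector
        output.append(v / np.linalg.norm(v))
    return output

E = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 0.0, 1.0]),
                  np.array([0.0, 1.0, 1.0])])
gram = np.array([[a @ b for b in E] for a in E])
print(np.allclose(gram, np.eye(3)))    # True: the output is an orthonormal basis of R^3
```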

Lecture 5 video

Math 31AH: Lecture 4

Let \mathbf{V} be a vector space. In Lecture 2, we proved that \mathbf{V} is n-dimensional if and only if every basis in \mathbf{V} consists of n vectors. Suppose that B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} is a basis of \mathbf{V}. Then, every vector \mathbf{v} \in \mathbf{V} can be represented as a linear combination of vectors in B,

\mathbf{v}=x_1\mathbf{b}_1+\dots+x_n\mathbf{b}_n,

and this representation is unique. A natural question is then the following: if C=\{\mathbf{c}_1,\dots,\mathbf{c}_n\} is a second basis of \mathbf{V}, and

\mathbf{v}=y_1\mathbf{c}_1+\dots+y_n\mathbf{c}_n

is the representation of \mathbf{v} as a linear combination of vectors in C, what is the relationship between the numbers x_1,\dots,x_n and the numbers y_1,\dots,y_n? Since these two lists of numbers are the coordinates of the same vector \mathbf{v}, but with respect to (possibly) different bases, it is reasonable to expect that they should be related to one another in a structured way. We begin this lecture by working out this relationship precisely.

We follow a strategy which would be acceptable to Marie Kondo: out with the old, in with the new. Let us call B the “old” basis, and C the “new” basis. Let us do away with the old basis vectors by expressing them in terms of the new basis, writing

\mathbf{b}_1 = a_{11}\mathbf{c}_1 + a_{21}\mathbf{c}_2 + \dots + a_{n1}\mathbf{c}_n \\ \mathbf{b}_2 = a_{12}\mathbf{c}_1 + a_{22}\mathbf{c}_2 + \dots + a_{n2}\mathbf{c}_n \\ \vdots \\ \mathbf{b}_n = a_{1n}\mathbf{c}_1 + a_{2n}\mathbf{c}_2 + \dots + a_{nn}\mathbf{c}_n,

where, for each 1 \leq j \leq n,

(a_{1j},a_{2j},\dots,a_{nj})

is the coordinate vector of the old basis vector \mathbf{b}_j relative to the new basis C.

We now return to the first equation above, which expresses our chosen vector \mathbf{v} in terms of the old basis. Replacing the vectors of the old basis with their representations relative to the new basis, we have

\mathbf{v} = x_1\mathbf{b}_1+\dots+x_n\mathbf{b}_n = x_1 \sum_{i=1}^n a_{i1} \mathbf{c}_i + \dots + x_n\sum_{i=1}^n a_{in} \mathbf{c}_i,

which we can compress even more if we use Sigma notation twice:

\mathbf{v} = \sum_{j=1}^n x_j \sum_{i=1}^n a_{ij} \mathbf{c}_i = \sum_{i=1}^n \left( \sum_{j=1}^n a_{ij}x_j \right)\mathbf{c}_i.

Now, since the representation

\mathbf{v}=y_1\mathbf{c}_1+\dots+y_n\mathbf{c}_n

of \mathbf{v} relative to C is unique, we find that

y_1 = \sum_{j=1}^n a_{1j}x_j \\ y_2 = \sum_{j=1}^n a_{2j}x_j \\ \vdots \\ y_n = \sum_{j=1}^n a_{nj}x_j.

This list of n formulas answers our original question: it expresses the “new” coordinates y_1,\dots,y_n in terms of the “old” coordinates x_1,\dots,x_n. A good way to remember these formulas is to rewrite them using the familiar dot product of geometric vectors in \mathbb{R}^n. In terms of the dot product, the above formulas become

y_1 = (a_{11},\dots,a_{1n}) \cdot (x_1,\dots,x_n) \\ y_2 = (a_{21},\dots,a_{2n}) \cdot (x_1,\dots,x_n) \\ \vdots \\ y_n= (a_{n1},\dots,a_{nn}) \cdot (x_1,\dots,x_n).

Usually, this collection of n formulas is packaged as a single matrix equation:

\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} a_{11} & \dots & a_{1n} \\ a_{21} & \dots & a_{2n} \\ \vdots & {} & \vdots \\ a_{n1} & \dots & a_{nn}  \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.

In fact, this process of changing from the old coordinates \mathbf{x}=(x_1,\dots,x_n) of a vector \mathbf{v} relative to the old basis B to the new coordinates \mathbf{y}= (y_1,\dots,y_n) of this same vector relative to the new basis C explains why the product of an n \times n matrix and an n \times 1 matrix is defined in the way that it is: the definition is made so that we can write

\mathbf{y}=A\mathbf{x},

with A the matrix whose (i,j)-entry is a_{ij}.

Let us summarize the result of the above calculation. We have a vector \mathbf{v} belonging to a finite-dimensional vector space \mathbf{V}, and we have two bases B and C of \mathbf{V}. Let [\mathbf{v}]_B be the n \times 1 matrix whose entries are the coordinates of \mathbf{v} relative to the old basis B, and let [\mathbf{v}]_C denote the n \times 1 matrix whose entries are the coordinates of this same vector \mathbf{v} relative to the new basis C. We want to write down an equation which relates the matrices [\mathbf{v}]_B and [\mathbf{v}]_C. The equation is

[\mathbf{v}]_C = A_{B \to C} [\mathbf{v}]_B,

where

A_{B \to C} = \begin{bmatrix} [\mathbf{b}_1]_C & [\mathbf{b}_2]_C & \dots & [\mathbf{b}_n]_C \end{bmatrix}

is the n \times n “transition matrix” whose jth column is the n \times 1 matrix [\mathbf{b}_j]_C consisting of the coordinates of the old basis vector \mathbf{b}_j relative to the new basis C.

Let’s look at a two-dimensional example. In \mathbb{R}^2, the standard basis is E=\{\mathbf{e}_1,\mathbf{e}_2\}, where \mathbf{e}_1=(1,0) and \mathbf{e}_2=(0,1). Suppose now that we wish to get creative and write the vectors of \mathbb{R}^2 in terms of the alternative basis F=\{\mathbf{f}_1,\mathbf{f}_2\}, where \mathbf{f}_1=\mathbf{e}_1=(1,0) but \mathbf{f}_2 = (1,1). This corresponds to using coordinate axes which, instead of being a pair of perpendicular lines, are a pair of lines at a 45^\circ angle to one another — pretty wild. What are the coordinates of a given vector \mathbf{v}=(x_1,x_2) in \mathbb{R}^2 when we use these tilted axes? Let us answer this question using the above recipe. We need to express the vectors \mathbf{e}_1,\mathbf{e}_2 of the old basis E in terms of the vectors \mathbf{f}_1,\mathbf{f}_2 of the new basis F. This is easy: by inspection, we have

\mathbf{e}_1 = \mathbf{f}_1 \\ \mathbf{e}_2= -\mathbf{f}_1+\mathbf{f}_2.

This means that our transition matrix is the 2 \times 2 matrix

A_{E \to F} = \begin{bmatrix} [\mathbf{e}_1]_F & [\mathbf{e}_2]_F \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}.

We conclude that the coordinates of \mathbf{v}=(x_1,x_2) in the new basis F are given by

[\mathbf{v}]_F = A_{E \to F} [\mathbf{v}]_E = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =\begin{bmatrix} x_1-x_2 \\ x_2 \end{bmatrix}.
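If you want to check this computation on a machine, here is a minimal Python/NumPy sketch (the coordinates chosen are arbitrary).

```python
import numpy as np

A_EtoF = np.array([[1.0, -1.0],
                   [0.0,  1.0]])       # the transition matrix computed above

v_E = np.array([4.0, 7.0])             # coordinates of v relative to the standard basis E
v_F = A_EtoF @ v_E                     # coordinates of the same vector relative to F
print(v_F)                             # [-3.  7.]

# sanity check: rebuilding v from its F-coordinates gives back the original vector
f1, f2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(np.allclose(v_F[0] * f1 + v_F[1] * f2, v_E))   # True
```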

In the course of the above discussion, we have seen that the familiar dot product of geometric vectors is useful in the context of general vector spaces. This raises the question of whether the dot product itself can be generalized. The answer is yes, and the concept which generalizes the dot product by capturing its basic features is the following.

Definition 1: Let \mathbf{V} be a vector space. A scalar product on \mathbf{V} is a function

\langle \cdot, \cdot \rangle \colon \mathbf{V} \times \mathbf{V} \to \mathbb{R}

which satisfies:

  1. For any \mathbf{v}_1,\mathbf{v}_2,\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V} and x_1,x_2,y_1,y_2 \in \mathbb{R}, we have \langle x_1\mathbf{v}_1+x_2\mathbf{v}_2,y_1\mathbf{w}_1+y_2\mathbf{w}_2 \rangle = x_1y_1 \langle \mathbf{v}_1,\mathbf{w}_1 \rangle + x_1y_2\langle \mathbf{v}_1,\mathbf{w}_2 \rangle + x_2y_1\langle \mathbf{v}_2,\mathbf{w}_1 \rangle + x_2y_2 \langle \mathbf{v}_2,\mathbf{w}_2 \rangle.
  2. For any \mathbf{v},\mathbf{w} \in \mathbf{V}, we have \langle \mathbf{v},\mathbf{w} \rangle = \langle \mathbf{w},\mathbf{v} \rangle.
  3. For any \mathbf{v} \in \mathbf{V}, we have \langle \mathbf{v},\mathbf{v} \rangle \geq 0, with equality if and only if \mathbf{v}=\mathbf{0}.

Let us consider why the operation introduced in Definition 1 is called a “scalar product.” First, it’s called a “product” because it takes two vectors \mathbf{v},\mathbf{w} and produces from them the new entity \langle \mathbf{v},\mathbf{w} \rangle. Second, this new entity is not a vector, but a scalar — hence, \langle \mathbf{v},\mathbf{w} \rangle is the “scalar product” of \mathbf{v} and \mathbf{w}. What about the axioms? These are obtained by extracting the basic features of the dot product of geometric vectors: it is “bilinear,” which means that one has the usual FOIL identity

(x_1\mathbf{v}_1+x_2\mathbf{v}_2) \cdot (y_1\mathbf{w}_1+y_2\mathbf{w}_2)= x_1y_1 \mathbf{v}_1 \cdot \mathbf{w}_1 + x_1y_2\mathbf{v}_1 \cdot \mathbf{w}_2+ x_2y_1\mathbf{v}_2 \cdot \mathbf{w}_1 + x_2y_2 \mathbf{v}_2 \cdot \mathbf{w}_2

for expanding brackets; it is “symmetric,” in the sense that

\mathbf{v} \cdot \mathbf{w} = \mathbf{w} \cdot \mathbf{v},

and it is “positive definite,” meaning that

\mathbf{v} \cdot \mathbf{v} \geq 0,

with equality if and only if \mathbf{v} is the zero vector. Definition 1 takes these properties and lifts them to the setting of an abstract vector space to form the scalar product concept, of which the dot product becomes a special case.

Definition 2: A pair (\mathbf{V},\langle \cdot, \cdot \rangle) consisting of a vector space together with a scalar product is called a Euclidean space.

Why is a vector space equipped with a scalar product called a Euclidean space? In the familiar vector space \mathbb{R}^n, the basic notions of Euclidean geometry — length and angle — can be expressed algebraically, in terms of the dot product. More precisely, the length of a vector \mathbf{v} = (x_1,\dots,x_n) \in \mathbb{R}^n is given by

|\mathbf{v}| = \sqrt{x_1^2 + \dots + x_n^2} = \sqrt{\mathbf{v} \cdot \mathbf{v}},

where \sqrt{x} denotes the nonnegative square root of a nonnegative real number, and the angle \theta between two vectors \mathbf{v}=(x_1,\dots,x_n) and \mathbf{w}=(y_1,\dots,y_n) is related to the dot product via

\mathbf{v} \cdot \mathbf{w} = |\mathbf{v}| |\mathbf{w}| \cos \theta.

We can mimic these algebraic formulas to define the concepts of length and angle in an abstract Euclidean space (\mathbf{V},\langle \cdot, \cdot \rangle) — we define the length \|\mathbf{v}\| of a vector \mathbf{v} \in \mathbf{V} by the formula

\|\mathbf{v}\| = \sqrt{\langle \mathbf{v},\mathbf{v} \rangle},

and we define the angle between two nonzero vectors \mathbf{v},\mathbf{w} \in \mathbf{V} to be the number \theta \in [0,\pi] determined by the formula

\langle \mathbf{v},\mathbf{w} \rangle = \|\mathbf{v}\| \|\mathbf{w}\| \cos \theta.

Let us examine these definitions more carefully. First, the quantity \|\mathbf{v}\| which generalizes the length of a geometric vector is usually called the “norm” of \mathbf{v} in order to distinguish it from the original notion of length, which it generalizes. If the vector norm is a good generalization of geometric length, then it should have some of the main properties of the original concept; in particular, it should be nonnegative, and the only vector of length zero should be the zero vector. In order for these properties to hold in every possible Euclidean space, we must be able to deduce them solely from the axioms defining the scalar product.

Proposition 1: Let (\mathbf{V},\langle \cdot, \cdot \rangle) be a Euclidean space. For any vector \mathbf{v} \in \mathbf{V}, we have \|\mathbf{v}\| \geq 0, and equality holds if and only if \mathbf{v}=\mathbf{0}.

Proof: From the definition of vector norm and the third scalar product axiom, we have that

\|\mathbf{v}\| = \sqrt{\langle \mathbf{v},\mathbf{v} \rangle}

is the square root of a nonnegative number, and hence is itself nonnegative. Moreover, in order for \sqrt{x}=0 to hold for a nonnegative real number x, it must be the case that x=0, and from the third scalar product axiom we have \langle \mathbf{v},\mathbf{v} \rangle=0 if and only if \mathbf{v}=\mathbf{0}. — Q.E.D.

Now we consider the algebraic definition of the angle \theta between two vectors \mathbf{v},\mathbf{w} in a Euclidean space (\mathbf{V},\langle \cdot,\cdot \rangle). As you are aware, for any number \theta \in \mathbb{R} we have -1 \leq \cos \theta \leq 1. Thus, for our definition of angle to be valid, we need the following proposition — which is known as the Cauchy-Schwarz inequality — to follow from the scalar product axioms.

Proposition 2: Let (\mathbf{V},\langle \cdot, \cdot \rangle) be a Euclidean space. For any nonzero \mathbf{v},\mathbf{w} \in \mathbf{V}, we have

-1 \leq \frac{\langle \mathbf{v},\mathbf{w} \rangle}{\|\mathbf{v}\| \|\mathbf{w}\|} \leq 1.

Proof: We begin by noting that the claimed double inequality is equivalent to the single inequality

\frac{\langle \mathbf{v},\mathbf{w} \rangle^2}{\|\mathbf{v}\|^2 \|\mathbf{w}\|^2} \leq 1,

which is in turn equivalent to

\langle \mathbf{v},\mathbf{w} \rangle^2 \leq \|\mathbf{v}\|^2 \|\mathbf{w}\|^2.

We will prove that this third form of the claimed inequality is true.

Let \mathbf{v},\mathbf{w} be any two vectors in \mathbf{V}. If either of \mathbf{v} or \mathbf{w} is the zero vector, then by the third scalar product axiom (positive definiteness) both sides of the above inequality are zero, and we get the true expression 0 \leq 0. It remains to prove the inequality in the case that neither \mathbf{v} nor \mathbf{w} is the zero vector.

Consider the function f(x) of a variable x defined by

f(x) = \langle \mathbf{v}-x\mathbf{w},\mathbf{v}-x\mathbf{w} \rangle.

We can expand this using the first scalar product axiom (bilinearity), and we get

f(x) = \langle \mathbf{v},\mathbf{v} \rangle -x\langle \mathbf{v},\mathbf{w} \rangle - x \langle \mathbf{w},\mathbf{v} \rangle + x^2\langle \mathbf{w},\mathbf{w} \rangle.

Using the second scalar product axiom (symmetry), this simplifies to

f(x) = \langle \mathbf{v},\mathbf{v} \rangle -2x\langle \mathbf{v},\mathbf{w} \rangle + x^2\langle \mathbf{w},\mathbf{w} \rangle.

We see that the function f(x) is a polynomial of degree two, i.e. it has the form

f(x) = ax^2 + bx + c,

with

a=\langle \mathbf{w},\mathbf{w} \rangle,\ b = -2\langle \mathbf{v},\mathbf{w}\rangle,\ c = \langle \mathbf{v},\mathbf{v} \rangle.

Note that we can be sure a > 0, because \mathbf{w} \neq 0. Thus the graph of the function f(x) is an upward-opening parabola. Moreover, since

f(x) = \langle \mathbf{v}-x\mathbf{w},\mathbf{v}-x\mathbf{w} \rangle \geq 0,

this parabola either lies strictly above the horizontal axis, or is tangent to it. Equivalently, the quadratic equation

ax^2 + bx+ c=0

has either no real roots (parabola strictly above the horizontal axis), or two identical real roots (parabola tangent to the horizontal axis). We can differentiate between the two cases using the discriminant of this quadratic equation, i.e. the number

b^2-4ac

which is the quantity under the square root in the familiar quadratic formula

x= \frac{-b \pm \sqrt{b^2-4ac}}{2a}.

More precisely, if the discriminant is negative the corresponding quadratic equation has no real solutions, and if it is zero then the equation has a unique solution. In the case b^2-4ac < 0, we get

4\langle \mathbf{v},\mathbf{w} \rangle^2 < 4\langle \mathbf{w},\mathbf{w} \rangle \langle \mathbf{v},\mathbf{v} \rangle,

which gives us the inequality

\langle \mathbf{v},\mathbf{w} \rangle^2 < \|\mathbf{v}\|^2 \|\mathbf{w}\|^2,

which verifies the inequality we’re trying to prove in this case. In the case b^2-4ac=0, we get instead

\langle \mathbf{v},\mathbf{w} \rangle^2 = \|\mathbf{v}\|^2 \|\mathbf{w}\|^2.

So, in all cases the claimed inequality

\langle \mathbf{v},\mathbf{w} \rangle^2 \leq \|\mathbf{v}\|^2 \|\mathbf{w}\|^2

holds true. — Q.E.D.
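To see the inequality at work outside of \mathbb{R}^n with the dot product, here is a small Python/NumPy sketch (not part of the notes) using the scalar product \langle \mathbf{v},\mathbf{w} \rangle = 2v_1w_1+3v_2w_2 on \mathbb{R}^2, which one can check satisfies the three axioms: it verifies the Cauchy-Schwarz inequality for a pair of vectors and computes the resulting angle.

```python
import numpy as np

# A scalar product on R^2 other than the dot product: <v, w> = 2 v_1 w_1 + 3 v_2 w_2.
G = np.diag([2.0, 3.0])

def sp(v, w):
    return v @ G @ w

v = np.array([1.0, 2.0])
w = np.array([-3.0, 2.0])

norm_v, norm_w = np.sqrt(sp(v, v)), np.sqrt(sp(w, w))
print(sp(v, w) ** 2 <= (norm_v * norm_w) ** 2)       # the Cauchy-Schwarz inequality holds
theta = np.arccos(sp(v, w) / (norm_v * norm_w))      # the angle, a number in [0, pi]
print(theta)
```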

The upshot of the above discussion is that the concepts of length and angle are now well-defined in the setting of a general Euclidean space (\mathbf{V},\langle \cdot,\cdot \rangle). So, even though the vectors in such a space need not be geometric vectors, we can use geometric intuition and analogies when thinking about them. A simple example is the following natural proposition, which generalizes the fact that a pair of nonzero geometric vectors are linearly dependent if and only if they point in the same direction or opposite directions.

Proposition 3: Let (\mathbf{V},\langle \cdot,\cdot \rangle) be a Euclidean space, and let \mathbf{v},\mathbf{w} be nonzero vectors in \mathbf{V}. The set \{\mathbf{v},\mathbf{w}\} is linearly dependent if and only if the angle between \mathbf{v} and \mathbf{w} is 0 or \pi.

Proof: You will prove on Assignment 2 that equality holds in the Cauchy-Schwarz inequality if and only if the vectors involved are linearly dependent. Thus, Proposition 3 is equivalent to the statement that any two nonzero vectors \mathbf{v},\mathbf{w} \in \mathbf{V} satisfy the equation

\langle \mathbf{v},\mathbf{w} \rangle^2 = \|\mathbf{v}\|^2 \|\mathbf{w}\|^2

if and only if the angle between them is 0 or \pi. Let us prove this statement.

By definition of the angle between two vectors in a Euclidean space, the above equation is equivalent to

\|\mathbf{v}\|^2\|\mathbf{w}\|^2 \cos^2 \theta = \|\mathbf{v}\|^2\|\mathbf{w}\|^2,

and dividing both sides by the nonzero number \|\mathbf{v}\|^2\|\mathbf{w}\|^2 this becomes

\cos^2 \theta = 1,

which holds for \theta \in [0,\pi] if and only if \theta is 0 or \pi. — Q.E.D.

In Lecture 5, we will consider further ramifications of geometrical thinking in vector spaces.

Math 31AH: Lecture 2

Let \mathbf{V} be a vector space. Recall from Lecture 1 (together with Assignment 1, Problem 2) that to say \mathbf{V} is n-dimensional means that \mathbf{V} contains a linearly independent set of size n, but does not contain a linearly independent set of size n+1. Also recall from Lecture 1 that a basis of \mathbf{V} is a subset B of \mathbf{V} which is linearly independent and spans \mathbf{V}. The main goal of this lecture is to prove the following theorem.

Theorem 1: A vector space \mathbf{V} is n-dimensional if and only if it contains a basis B of size n.

Before going on to the proof of this theorem, let us pause to consider its ramifications. Importantly, Theorem 1 gives a method to calculate the dimension of a given vector space \mathbf{V}: all we have to do is find a basis B of \mathbf{V}, and then count the number of elements in that basis. As an example, let us compute the dimension of \mathbf{V}=\mathbb{R}^4. This vector space is typically used to model the world in which we live, which consists of four physical dimensions: three for space and one for time. Let us verify that our mathematical definition of vector space dimension matches our physical understanding of dimension.

Let \mathbf{x}=(x_1,x_2,x_3,x_4) be a vector in \mathbb{R}^4. Then, we have

\mathbf{x} = x_1\mathbf{e}_1+x_2\mathbf{e}_2+x_3\mathbf{e}_3+x_4\mathbf{e}_4,

where

\mathbf{e}_1 = (1,0,0,0) \\ \mathbf{e}_2=(0,1,0,0) \\ \mathbf{e}_3=(0,0,1,0) \\ \mathbf{e}_4=(0,0,0,1).

This shows that the set B=\{\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3,\mathbf{e}_4\} spans \mathbb{R}^4. Let us check that B is linearly independent. This amounts to performing the above manipulation in reverse. If x_1,x_2,x_3,x_4 \in \mathbb{R} are numbers such that

x_1\mathbf{e}_1+x_2\mathbf{e}_2+x_3\mathbf{e}_3+x_4\mathbf{e}_4=\mathbf{0},

then we have

(x_1,x_2,x_3,x_4)=(0,0,0,0),

which means that x_1=x_2=x_3=x_4=0. Thus B is a linearly independent set which spans \mathbb{R}^4, i.e. it is a basis of \mathbb{R}^4. Since B has size 4, we conclude from Theorem 1 that \mathrm{dim} \mathbb{R}^4=4.
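As a purely optional aside, one can also verify the linear independence of B numerically; the following Python/NumPy sketch (not part of the course) checks that the matrix whose rows are the vectors of B has full rank.

```python
import numpy as np

# The vectors of B as the rows of a 4 x 4 matrix.
B = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)

# Rank 4 means the four rows are linearly independent.
print(np.linalg.matrix_rank(B))   # 4
```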

Now let us prove Theorem 1. First, observe that we have already proved that if \mathbf{V} is n-dimensional then it contains a basis B of size n — this is Theorem 1 from Lecture 1. It remains to prove the converse, which we now state as a standalone result for emphasis and ease of reference.

Theorem 2: If \mathbf{V} contains a basis B of size n, then \mathbf{V} is n-dimensional.

The proof of Theorem 2 is quite subtle. In order to make it easier to understand, it is helpful to first prove the following lemma, in which the main difficulty is concentrated.

Lemma 1: If A= \{\mathbf{a}_1,\dots,\mathbf{a}_m\} is a linearly independent set in a vector space \mathbf{V} and B = \{\mathbf{b}_1,\dots,\mathbf{b}_n\} is a basis of \mathbf{V}, then m \leq n.

Proof: Suppose this were false, i.e. that there exist in the vector space \mathbf{V} a linearly independent set A= \{\mathbf{a}_1,\dots,\mathbf{a}_m\} and a basis B = \{\mathbf{b}_1,\dots,\mathbf{b}_n\} such that m>n. We will see that this leads to a contradiction. The strategy is the following: we wish to demonstrate that the assumption m>n implies that we can replace n of the vectors in A with the vectors in B in such a way that the resulting set

S=\{\mathbf{a}_{i_1},\dots,\mathbf{a}_{i_{m-n}},\mathbf{b}_1,\dots,\mathbf{b}_n\}

is linearly independent. If we can show this, we will have obtained the desired contradiction: S cannot possibly be linearly independent, because B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} is a basis and therefore spans \mathbf{V}. In particular, each of the remaining vectors \mathbf{a}_{i_1},\dots,\mathbf{a}_{i_{m-n}} (and there is at least one of them, since m>n) is a linear combination of the vectors in B, so that S is in fact linearly dependent.

We will pursue the following strategy to show that the existence of the set S as above follows from the hypothesis m>n. For each k \in \{0,1,2,\dots,n\}, consider the following proposition: there exists a linearly independent set S_k in \mathbf{V} of the form

S_k = A_{m-k} \cup B_k,

where A_{m-k} is a subset of A of size m-k and B_k = \{\mathbf{b}_j \colon j \leq k\}. Let us call this proposition \mathcal{P}_k. Now, if we can prove that \mathcal{P}_0 is true, and that

\mathcal{P}_k \text{ true } \implies \mathcal{P}_{k+1} \text{ true } \quad \forall k \in \{0,1,\dots,n-1\},

then we can conclude that \mathcal{P}_k is true for all k \in \{0,1,\dots,n\}. Indeed, we would then have

\mathcal{P}_0 \text{ true } \implies \mathcal{P}_1 \text{ true } \implies \dots \implies \mathcal{P}_{n-1} \text{ true } \implies \mathcal{P}_n \text{ true}.

This is one version of a proof technique known as mathematical induction. But the statement \mathcal{P}_n true is exactly what we want, since it gives us a linearly independent set S_n consisting of some number of vectors from A together with all vectors in B, which results in the contradiction explained above.

Let us now implement the above strategy. The first step is to prove that \mathcal{P}_0 is true. To see that it is, consider the set S_0 = A_m \cup B_0 where A_m = A and B_0=\{\}. This set is of the required form, and since A is linearly independent, so is S_0.

It remains to prove that if k \in \{0,1,\dots,n-1\} and \mathcal{P}_k is true, then \mathcal{P}_{k+1} is true. Given that \mathcal{P}_k is true, there exists a linearly independent set S_k in \mathbf{V} of the form S_k = A_{m-k} \cup B_k with

A_{m-k} = \{\mathbf{a}_{i_1},\dots,\mathbf{a}_{i_{m-k}} \}

an (m-k)-element subset of A and

B_k = \{\mathbf{b}_1,\dots,\mathbf{b}_k\}.

Consider the set

S_k \cup \{\mathbf{b}_{k+1}\} = \{\mathbf{a}_{i_1},\dots,\mathbf{a}_{i_{m-k}},\mathbf{b}_1,\dots,\mathbf{b}_k,\mathbf{b}_{k+1}\}.

If S_k \cup \{\mathbf{b}_{k+1}\} is linearly independent, then so is any subset of it, so in particular the subset

S_{k+1} =\{\mathbf{a}_{i_1},\dots,\mathbf{a}_{i_{m-k-1}},\mathbf{b}_1,\dots,\mathbf{b}_k,\mathbf{b}_{k+1}\}

is linearly independent and \mathcal{P}_{k+1} is true. Now suppose that S_k \cup \{\mathbf{b}_{k+1}\} is linearly dependent. Then, because S_k is linearly independent, \mathbf{b}_{k+1} must be a linear combination of the vectors in S_k, i.e.

\mathbf{b}_{k+1} = \sum_{j=1}^{m-k} t_j \mathbf{a}_{i_j} + \sum_{j=1}^k s_j \mathbf{b}_j

for some t_1,\dots,t_{m-k},s_1,\dots,s_k \in \mathbb{R}. Moreover, there exists a number c \in \{1,\dots,m-k\} such that t_c \neq 0, else \mathbf{b}_{k+1} would be a linear combination of \mathbf{b}_1,\dots,\mathbf{b}_k, which is impossible because B_{k+1}=\{\mathbf{b}_1,\dots,\mathbf{b}_{k+1}\} is a linearly independent set. We now claim that the set

S_{k+1} = \left(A_{m-k}\backslash \{\mathbf{a}_{i_c}\}\right) \cup B_{k+1}

is linearly independent. Indeed, if S_{k+1} were linearly dependent, then we would have

\sum_{j=1}^{c-1} p_j \mathbf{a}_{i_j} +\sum_{j=c+1}^{m-k} p_j \mathbf{a}_{i_j} + \sum_{j=1}^{k+1} q_j \mathbf{b}_j = \mathbf{0},

where not all the numbers p_1,\dots,p_{c-1},p_{c+1},\dots,p_{m-k},q_1,\dots,q_{k+1} are zero. This means that q_{k+1} \neq 0, since otherwise the above would say that a subset of S_k is linearly dependent, which is false because S_k is linearly independent. Now, if we substitute the above representation of \mathbf{b}_{k+1} as a linear combination of the elements of S_k, this becomes a vanishing linear combination of the vectors in S_k in which the coefficient of \mathbf{a}_{i_c} is q_{k+1}t_c \neq 0, which contradicts the linear independence of S_k. So, S_{k+1} must be linearly independent. — Q.E.D.
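The induction in the proof of Lemma 1 is really an exchange procedure which can be carried out mechanically. The following sketch is my own illustration, assuming the NumPy library; the sample vectors in \mathbb{R}^3 are arbitrary choices. With m \leq n, as below, the procedure terminates without incident; under the hypothesis m>n of the proof, it would end with all of B inserted and at least one vector of A left over, producing the impossible set S.

import numpy as np

def is_independent(vectors):
    # a finite list of vectors is linearly independent exactly when the matrix
    # having them as rows has rank equal to the number of vectors
    if len(vectors) == 0:
        return True  # the empty set is linearly independent by fiat
    return np.linalg.matrix_rank(np.array(vectors)) == len(vectors)

def exchange_step(A_part, B_part, b_next):
    # one induction step: given that S_k = A_part u B_part is independent,
    # return the A-part of an independent S_{k+1} containing B_part u {b_next}
    S = A_part + B_part
    if is_independent(S + [b_next]):
        return A_part[:-1]  # drop one a-vector; a subset stays independent
    # otherwise b_next is a linear combination of the vectors in S_k, and its
    # coefficients can be recovered by a (consistent) least-squares solve
    coeffs, *_ = np.linalg.lstsq(np.array(S).T, np.array(b_next), rcond=None)
    t = coeffs[:len(A_part)]       # coefficients of the a-vectors
    c = int(np.argmax(np.abs(t)))  # an index with t_c != 0, as in the proof
    return [a for j, a in enumerate(A_part) if j != c]

A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]                   # linearly independent, m = 2
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # a basis of R^3, n = 3

A_part = list(A)
for k in range(len(B)):
    A_part = exchange_step(A_part, B[:k], B[k])
    assert is_independent(A_part + B[:k + 1])
print("exchange ran to completion, as expected since m <= n")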

Let us note the following corollary of Lemma 1.

Corollary 1: If A=\{\mathbf{a}_1,\dots,\mathbf{a}_m\} and B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} are two bases of a vector space \mathbf{V}, then m=n.

Proof: Since A is linearly independent and B is a basis, we have m \leq n by Lemma 1. On the other hand, since B is linearly independent and A is a basis, we also have n \leq m by Lemma 1. Thus m=n. — Q.E.D.

We now have everything we need to prove Theorem 2.

Proof of Theorem 2: Let \mathbf{V} be a vector space which contains a basis B of size n. We will prove that \mathbf{V} is n-dimensional using the definition of vector space dimension (Definition 5 in Lecture 1) and Lemma 1. First, since B is a linearly independent set of size n, we can be sure that \mathbf{V} contains a linearly independent set of size n. It remains to show that \mathbf{V} does not contain a linearly independent set of size n+1. This follows from Lemma 1: since B is a basis of size n, every linearly independent set A in \mathbf{V} must have size less than or equal to n. — Q.E.D.

Another very important consequence of Theorem 1 is the following: it reveals that any two vector spaces of the same dimension can more or less be considered the same. Note that this is fundamentally new territory for us; so far, we have only considered one vector space at a time, but now we are going to compare two vector spaces.

To make the above precise, we need to consider functions between vector spaces. In fact, it is sufficient to limit ourselves to the consideration of functions which are compatible with the operations of vector addition and scalar multiplication. This leads to the definition of a linear transformation from a vector space \mathbf{V} to another vector space \mathbf{W}. If vector spaces are the foundation of linear algebra, then linear transformations are the structures we want to build on these foundations.

Definition 1: A function T \colon \mathbf{V} \to \mathbf{W} is said to be a linear transformation if it has the following properties:

  1. T(\mathbf{0}_\mathbf{V}) =\mathbf{0}_\mathbf{W}, where \mathbf{0}_\mathbf{V} is the zero vector in \mathbf{V} and \mathbf{0}_\mathbf{W} is the zero vector in \mathbf{W}.
  2. For any vectors \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V} and any scalars t_1,t_2 \in \mathbb{R}, we have T(t_1\mathbf{v}_1 + t_2\mathbf{v}_2)=t_1T(\mathbf{v}_1)+t_2T(\mathbf{v}_2).

So, a linear transformation is a special kind of function from one vector space to another. The name “linear transformation” comes from the fact that these special functions are generalizations of lines in \mathbb{R}^2 which pass through the origin (0,0). More precisely, every such line is the graph of a linear transformation T \colon \mathbb{R} \to \mathbb{R}. Indeed, a line through the origin in \mathbb{R}^2 of slope m is the graph of the linear transformation T_m \colon \mathbb{R} \to \mathbb{R} defined by

T_m(x) = mx.

The one exception to this is the vertical line in \mathbb{R}^2 passing through (0,0), which has slope m=\infty. This line is not the graph of any function from \mathbb{R} to \mathbb{R}, which should be clear if you remember the vertical line test.
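Here is a small numerical spot-check of the two properties in Definition 1, applied to T_m and to a map of \mathbb{R}^2 given by a matrix. This is my own illustration, assuming the NumPy library; the helper function and the sample maps are my own choices, and a random test of this kind is evidence rather than a proof.

import numpy as np

def is_linear_on_samples(T, dim, trials=100):
    # spot-check the two properties of Definition 1 on random inputs;
    # passing this test is evidence of linearity, not a proof
    rng = np.random.default_rng(0)
    if not np.allclose(T(np.zeros(dim)), 0.0):
        return False  # property 1 fails: T does not send the zero vector to zero
    for _ in range(trials):
        v1, v2 = rng.normal(size=(2, dim))
        t1, t2 = rng.normal(size=2)
        if not np.allclose(T(t1 * v1 + t2 * v2), t1 * T(v1) + t2 * T(v2)):
            return False  # property 2 fails
    return True

m = 3.0
print(is_linear_on_samples(lambda x: m * x, dim=1))        # True: T_m is linear
A = np.array([[0.0, -1.0], [1.0, 0.0]])                    # rotation of R^2 by 90 degrees
print(is_linear_on_samples(lambda x: A @ x, dim=2))        # True
print(is_linear_on_samples(lambda x: m * x + 1.0, dim=1))  # False: a shifted line is not linear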

To reiterate, a linear transformation from a vector space \mathbf{V} to \mathbf{W} is a special kind of function T \colon \mathbf{V} \to \mathbf{W}. For linear transformations, it is common to write T\mathbf{v} instead of T(\mathbf{v}), and this shorthand is often used to implicitly indicate that T is a linear transformation. In the second half of the course, we will discuss linear transformations in great detail. At present, however, we are only concerned with a special type of linear transformation called an “isomorphism.”

Definition 2: A linear transformation T \colon \mathbf{V} \to \mathbf{W} is said to be an isomorphism if there exists a linear transformation U \colon \mathbf{W} \to \mathbf{V} such that

U(T\mathbf{v})= \mathbf{v} \quad \forall \mathbf{v} \in \mathbf{V}

and

T(U\mathbf{w})=\mathbf{w} \quad \forall \mathbf{w} \in \mathbf{W}.
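Concretely, an invertible matrix and its inverse give a pair of linear transformations of \mathbb{R}^2 which undo one another in exactly this sense. The sketch below is my own illustration, assuming the NumPy library; the particular matrix is an arbitrary choice.

import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 1.0]])   # an invertible 2 x 2 matrix
M_inv = np.linalg.inv(M)

def T(v):
    return M @ v              # T : R^2 -> R^2

def U(w):
    return M_inv @ w          # U : R^2 -> R^2

v = np.array([3.0, -5.0])
print(np.allclose(U(T(v)), v))  # True: U undoes T
print(np.allclose(T(U(v)), v))  # True: T undoes U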

The word “isomorphism” comes from the Greek “iso,” which means “same,” and “morph” which means “shape.” It is not always the case that there exists an isomorphism T from \mathbf{V} to \mathbf{W}; when an isomorphism does exist, one says that \mathbf{V} and \mathbf{W} are isomorphic. Isomorphic vector spaces \mathbf{V} and \mathbf{W} have the “same shape” in the sense that there is both a linear transformation T \colon \mathbf{V} \to \mathbf{W} which transforms every vector \mathbf{v} \in \mathbf{V} into a vector \mathbf{w} \in \mathbf{W}, and an inverse transformation U \colon \mathbf{W} \to \mathbf{V} which “undoes” T by transforming \mathbf{w} back into \mathbf{v}. To understand this, it may be helpful to think of isomorphic vector spaces \mathbf{V} and \mathbf{W} as two different languages. Any two human languages, no matter how different they may seem, are “isomorphic” in the sense that they describe exactly the same thing, namely the totality of human experience. The isomorphism T \colon \mathbf{V} \to \mathbf{W} translates every word \mathbf{v} in the language \mathbf{V} to the corresponding word T\mathbf{v}=\mathbf{w} in \mathbf{W} which means the same thing. The inverse isomorphism U \colon \mathbf{W} \to \mathbf{V} translates back from language \mathbf{W} to language \mathbf{V}. On the other hand, if \mathbf{V} is a human language and \mathbf{W} is the language of an alien civilization, then \mathbf{V} and \mathbf{W} are not isomorphic, since the experience of membership in human society is fundamentally different from the experience of membership in a non-human society, and this difference is not merely a matter of language.

Theorem 3: Any two n-dimensional vector spaces \mathbf{V} and \mathbf{W} are isomorphic.

Proof: Since \mathbf{V} is n-dimensional, it contains a basis B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} by Theorem 1. Likewise, since \mathbf{W} is n-dimensional, it contains a basis C=\{\mathbf{c}_1,\dots,\mathbf{c}_n\}. Now, since B is a basis in \mathbf{V}, for every \mathbf{v} \in \mathbf{V} there exist unique scalars t_1,\dots,t_n \in \mathbb{R} such that

\mathbf{v}=t_1\mathbf{b}_1 + \dots + t_n\mathbf{b}_n.

Likewise, since C is a basis in \mathbf{W}, for every \mathbf{w} \in \mathbf{W} there exist unique scalars u_1,\dots,u_n \in \mathbb{R} such that

\mathbf{w}=u_1\mathbf{c}_1 + \dots + u_n\mathbf{c}_n.

We may thus define functions

T \colon \mathbf{V} \to \mathbf{W} \quad\text{ and }\quad U \colon \mathbf{W} \to \mathbf{V}

by

T(t_1\mathbf{b}_1+\dots+t_n\mathbf{b}_n) = t_1\mathbf{c}_1+\dots+t_n\mathbf{c}_n

and

U(u_1\mathbf{c}_1 + \dots + u_n\mathbf{c}_n) = u_1\mathbf{b}_1 + \dots + u_n\mathbf{b}_n.

We claim that T is a linear transformation from \mathbf{V} to \mathbf{W}. To verify this, we must demonstrate that T has the two properties stipulated by Definition 1. First, we have

T(\mathbf{0}_\mathbf{V})=T(0\mathbf{b}_1+\dots+0\mathbf{b}_n)=0\mathbf{c}_1+\dots+0\mathbf{c}_n=\mathbf{0}_\mathbf{W},

which verifies the first property. Next, let \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V} be any two vectors, and let a_1,a_2 \in \mathbb{R} be any two scalars. Let

\mathbf{v}_1 = t_{11}\mathbf{b}_1+\dots+t_{1n}\mathbf{b}_n \\ \mathbf{v}_2 = t_{21}\mathbf{b}_1+\dots+t_{2n}\mathbf{b}_n

be the unique representations of \mathbf{v}_1 and \mathbf{v}_2 as linear combinations of the vectors in the basis B. We then have

T(a_1\mathbf{v}_1+a_2\mathbf{v}_2) = T((a_1t_{11}+a_2t_{21})\mathbf{b}_1+\dots+(a_1t_{1n}+a_2t_{2n})\mathbf{b}_n)\\ =(a_1t_{11}+a_2t_{21})\mathbf{c}_1+\dots+(a_1t_{1n}+a_2t_{2n})\mathbf{c}_n \\ = a_1\left(t_{11}\mathbf{c}_1+\dots+t_{1n}\mathbf{c}_n \right) + a_2\left( t_{21}\mathbf{c}_1+\dots+t_{2n}\mathbf{c}_n\right) \\ =a_1T\mathbf{v}_1 + a_2T\mathbf{v}_2,

which verifies the second property. In the same way, one checks that U is a linear transformation from \mathbf{W} to \mathbf{V}.

To prove that the linear transformation T \colon \mathbf{V} \to \mathbf{W} is an isomorphism, it remains only to prove that the linear transformation U \colon \mathbf{W} \to \mathbf{V} undoes T. To see this, let

\mathbf{v}=t_1\mathbf{b}_1+\dots+t_n\mathbf{b}_n

be an arbitrary vector in \mathbf{V} expressed as a linear combination of the vectors in the basis B. We then have

U(T\mathbf{v}) = U(t_1\mathbf{c}_1+\dots+t_n\mathbf{c}_n) = t_1\mathbf{b}_1+\dots+t_n\mathbf{b}_n = \mathbf{v}.

Likewise, T(U\mathbf{w})=\mathbf{w} for every \mathbf{w} \in \mathbf{W}, by the same computation with the roles of B and C interchanged. This completes the proof that \mathbf{V} and \mathbf{W} are isomorphic vector spaces. — Q.E.D.

To continue with our linguistic analogy, Theorem 3 says that any two n-dimensional vector spaces are just different languages expressing the same set of concepts. From this perspective, it is desirable to choose a “standard language” into which everything should be translated. In linguistics, such a language is called a lingua franca, a term which originally referred to a common trade language spoken around the Mediterranean (these days the lingua franca is English, but this too may eventually change).

The lingua franca for n-dimensional vector spaces is \mathbb{R}^n, and it is unlikely that this will ever change. Let \mathbf{V} be an n-dimensional vector space, and let B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} be a basis in \mathbf{V}. Consider the basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} of \mathbb{R}^n in which

\mathbf{e}_1 = (1,0,0,\dots,0) \\ \mathbf{e}_2=(0,1,0,\dots,0) \\ \vdots \\ \mathbf{e}_n = (0,0,\dots,0,1).

To be perfectly clear, for each 1 \leq i \leq n the vector \mathbf{e}_i has the number 1 in position i, and a zero in every other position. The basis E is called the standard basis of \mathbb{R}^n, and you should immediately stop reading and check for yourself that it really is a basis. Assuming you have done so, we proceed to define an isomorphism

T_B \colon \mathbf{V} \to \mathbb{R}^n

as follows. Given a vector \mathbf{v} \in \mathbf{V}, let

\mathbf{v}=t_1\mathbf{b}_1 + \dots + t_n\mathbf{b}_n

be the unique representation of \mathbf{v} as a linear combination of vectors in B. Now set

T_B(\mathbf{v}) := t_1\mathbf{e}_1 + \dots + t_n\mathbf{e}_n,

where the symbol “:=” means “equal by definition.” The fact that this really is an isomorphism follows from the proof of Theorem 3 above. The isomorphism T_B is called the coordinate isomorphism relative to the basis B, and the geometric vector T_B(\mathbf{v}) \in \mathbb{R}^n is called the coordinate vector of \mathbf{v} relative to the basis B. Because of the special form of the standard basis of \mathbb{R}^n, we may write the coordinate vector of \mathbf{v} more concisely as

T_B(\mathbf{v}) = (t_1,\dots,t_n).

When working with a given n-dimensional vector space, it is often convenient to choose a basis B in \mathbf{V} and use the corresponding coordinate isomorphism T_B to work in the standard n-dimensional vector space \mathbb{R}^n rather than working in the original vector space \mathbf{V}. For example, someone could come to you with a strange 4-dimensional vector space \mathbf{V} and claim that this vector space is the best model for spacetime. If you find that you disagree, you can choose a convenient basis B=\{\mathbf{b}_1,\mathbf{b}_2,\mathbf{b}_3,\mathbf{b}_4\} and use the corresponding coordinate isomorphism T_B \colon \mathbf{V} \to \mathbb{R}^4 to transform their model into the standard model of spacetime.
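If \mathbf{V} happens to be \mathbb{R}^n equipped with some basis B other than the standard one, then computing the coordinate vector T_B(\mathbf{v}) amounts to solving a linear system whose coefficient matrix has the vectors of B as its columns. Here is a small sketch of this, my own illustration assuming the NumPy library; the basis and the vector \mathbf{v} are arbitrary choices.

import numpy as np

# a (non-standard) basis of R^3, written as the columns of a matrix
b1, b2, b3 = [1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]
B = np.column_stack([b1, b2, b3])

v = np.array([2.0, 3.0, 5.0])

# the coordinate vector T_B(v) = (t_1, t_2, t_3) solves t_1 b_1 + t_2 b_2 + t_3 b_3 = v
t = np.linalg.solve(B, v)
print(t)                        # the coordinates of v relative to the basis B
print(np.allclose(B @ t, v))    # True: the representation is exact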

Math 31AH: Lecture 1

Assuming familiarity with the geometric concept of a vector, we introduce the notion of a vector space: a set containing objects which can be added to one another and scaled by numbers in a manner which is formally consistent with the addition and scaling of geometric vectors. The vector space concept is the bedrock of linear algebra.

Definition 1: A vector space is a triple (\mathbf{V},a,s) consisting of a set \mathbf{V} together with two functions

a \colon \mathbf{V} \times \mathbf{V} \to \mathbf{V}

and

s \colon \mathbb{R} \times \mathbf{V} \to \mathbf{V},

where \times denotes the Cartesian product of sets and \mathbb{R} denotes the set of real numbers. The functions a and s are required to have the following properties:

  1. For all \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}, we have a(\mathbf{v}_1,\mathbf{v}_2) = a(\mathbf{v}_2,\mathbf{v}_1).
  2. For all \mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3 \in \mathbf{V}, we have a(\mathbf{v}_1,a(\mathbf{v}_2,\mathbf{v}_3)) = a(a(\mathbf{v}_1,\mathbf{v}_2),\mathbf{v}_3).
  3. There exists \mathbf{0} \in \mathbf{V} such that a(\mathbf{0},\mathbf{v}) = \mathbf{v} for all \mathbf{v} \in \mathbf{V}.
  4. For every \mathbf{v} \in \mathbf{V} there exists \mathbf{w} \in \mathbf{V} such that a(\mathbf{v},\mathbf{w})=\mathbf{0}.
  5. For every \mathbf{v} \in \mathbf{V}, we have s(1,\mathbf{v})=\mathbf{v}.
  6. For every t_1,t_2 \in \mathbb{R} and every \mathbf{v} \in \mathbf{V}, we have s(t_1,s(t_2,\mathbf{v})) = s(t_1t_2,\mathbf{v}).
  7. For every t_1,t_2 \in \mathbb{R} and every \mathbf{v} \in \mathbf{V}, we have s(t_1+t_2,\mathbf{v}) = a(s(t_1,\mathbf{v}),s(t_2,\mathbf{v})).
  8. For every t \in \mathbb{R} and every \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}, we have s(t,a(\mathbf{v}_1,\mathbf{v}_2))=a(s(t,\mathbf{v}_1),s(t,\mathbf{v}_2)).

This is the definition of a vector space. It is logically precise and totally unambiguous, or in other words mathematically rigorous. However, it is difficult to have any intuitive feeling for this construction. In order to make the definition of a vector space more relatable, we will rewrite it using words and symbols that are more familiar and evocative.

First, let us call the elements of the set \mathbf{V} “vectors.” When we use this word, we are reminded of geometric vectors, which are familiar objects. However, this is only an analogy — we make no assumption on the nature of the elements of \mathbf{V}. Indeed, we shall soon see that many rather different mathematical objects — including numbers, polynomials, functions, and of course geometric vectors — can all be viewed as vectors.

Second, let us call the function a in Definition 1 “vector addition,” and write \mathbf{v}_1+\mathbf{v}_2 in place of a(\mathbf{v}_1,\mathbf{v}_2). If we use this alternative notation, then the axioms governing the function a in Definition 1 become

  1. For all \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}, we have \mathbf{v}_1+\mathbf{v}_2 = \mathbf{v}_2+\mathbf{v}_1.
  2. For all \mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\in\mathbf{V}, we have \mathbf{v}_1+(\mathbf{v}_2+\mathbf{v}_3) = (\mathbf{v}_1+\mathbf{v}_2)+\mathbf{v}_3.
  3. There exists \mathbf{0} \in \mathbf{V} such that \mathbf{0}+\mathbf{v} = \mathbf{v} for all \mathbf{v} \in \mathbf{V}.
  4. For every \mathbf{v} \in \mathbf{V} there exists \mathbf{w} \in \mathbf{V} such that \mathbf{v}+\mathbf{w}=\mathbf{0}.

This makes things intuitively clear: the axioms above say that the operation of vector addition behaves the way addition does in other contexts we are familiar with, such as the addition of numbers or the addition of geometric vectors. Although conceptually helpful, this comes at a cost, and that cost is ambiguity: we are now using the symbol + in two different ways, since it can mean either addition of numbers in \mathbb{R} or vectors in \mathbf{V}, and these operations are not the same. However, writing \mathbf{v}_1+\mathbf{v}_2 is so much more natural than writing a(\mathbf{v}_1,\mathbf{v}_2) that we decide to do this going forward despite the ambiguity it introduces. This is called abuse of notation.

Third and last, we are going to do the same thing with the function s — give it a name, and write it in a more intuitive but more ambiguous way. The usual name given to s is “scalar multiplication,” which indicates that s is an abstraction of the familiar operation of scaling a geometric vector by a number. The usual notation for scalar multiplication is simply juxtaposition: we write t\mathbf{v} in place of s(t,\mathbf{v}). Adding on the axioms prescribing the behavior of the function s, we now have

  1. For all \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}, we have \mathbf{v}_1+\mathbf{v}_2 = \mathbf{v}_2+\mathbf{v}_1.
  2. For all \mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\in\mathbf{V}, we have \mathbf{v}_1+(\mathbf{v}_2+\mathbf{v}_3) =(\mathbf{v}_1+\mathbf{v}_2)+\mathbf{v}_3.
  3. There exists \mathbf{0} \in \mathbf{V} such that \mathbf{0}+\mathbf{v} = \mathbf{v} for all \mathbf{v} \in \mathbf{V}.
  4. For every \mathbf{v} \in \mathbf{V} there exists \mathbf{w} \in \mathbf{V} such that \mathbf{v}+\mathbf{w}=\mathbf{0}.
  5. For every \mathbf{v} \in \mathbf{V}, we have 1\mathbf{v}=\mathbf{v}.
  6. For every t_1,t_2 \in \mathbb{R} and every \mathbf{v} \in \mathbf{V}, we have t_1(t_2\mathbf{v})=(t_1t_2)\mathbf{v}.

Again, this makes things much more intuitive: for example, Axiom 5 now says that scaling any vector by the number 1 does nothing, and Axiom 6 says that scaling a vector \mathbf{v} by a number t_2 and then scaling t_2\mathbf{v} by t_1 produces the same result as scaling \mathbf{v} by the number t_1t_2. Written in this way, the axioms governing scalar multiplication become clear and natural, and are compatible with our experience of scaling geometric vectors. However, we are again abusing notation, since two different operations, multiplication of real numbers and scaling of vectors, are being denoted in exactly the same way. Finally, if we incorporate the axioms dictating the way in which vector addition and scalar multiplication are required to interact with one another, we arrive at the following reformulation of Definition 1.

Definition 2: A vector space is a set \mathbf{V} whose elements can be added together and scaled by real numbers in such a way that the following axioms hold:

  1. For all \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}, we have \mathbf{v}_1+\mathbf{v}_2 = \mathbf{v}_2+\mathbf{v}_1.
  2. For all \mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\in\mathbf{V}, we have \mathbf{v}_1+(\mathbf{v}_2+\mathbf{v}_3) = (\mathbf{v}_1+\mathbf{v}_2)+\mathbf{v}_3.
  3. There exists \mathbf{0} \in \mathbf{V} such that \mathbf{0}+\mathbf{v} = \mathbf{v} for all \mathbf{v} \in \mathbf{V}.
  4. For every \mathbf{v} \in \mathbf{V} there exists \mathbf{w} \in \mathbf{V} such that \mathbf{v}+\mathbf{w}=\mathbf{0}.
  5. For every \mathbf{v} \in \mathbf{V}, we have 1\mathbf{v}=\mathbf{v}.
  6. For every t_1,t_2 \in \mathbb{R} and every \mathbf{v} \in \mathbf{V}, we have t_1(t_2\mathbf{v})=(t_1t_2)\mathbf{v}.
  7. For every t_1,t_2 \in \mathbb{R} and every \mathbf{v} \in \mathbf{V}, we have (t_1+t_2)\mathbf{v} = t_1\mathbf{v}+t_2\mathbf{v}.
  8. For every t \in \mathbb{R} and every \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}, we have t(\mathbf{v}_1+\mathbf{v}_2)=t\mathbf{v}_1+t\mathbf{v}_2.

We also make the following definition, which formalizes the notion that a vector space \mathbf{V} may contain a smaller vector space \mathbf{W}.

Definition 3: A subset \mathbf{W} of a vector space \mathbf{V} is said to be a subspace of \mathbf{V} if it is itself a vector space when equipped with the operations of vector addition and scalar multiplication inherited from \mathbf{V}.

From now on, we are going to use Definition 2 as our definition of a vector space, since it is much more convenient and understandable to write things in this way. However, it is important to comprehend that Definition 2 is not completely precise, and to be aware that the pristine and unassailable definition of a vector space given by Definition 1 is what is actually happening under the hood.
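To see the functions a and s of Definition 1 “under the hood” in a concrete case, here is a small sketch, my own illustration using only the Python standard library, which implements \mathbb{R}^2 as a set of pairs with explicit addition and scaling functions and spot-checks a few of the axioms on random inputs.

import random

def a(v1, v2):
    # vector addition in R^2, written as an explicit function a : V x V -> V
    return (v1[0] + v2[0], v1[1] + v2[1])

def s(t, v):
    # scalar multiplication in R^2, written as a function s : R x V -> V
    return (t * v[0], t * v[1])

zero = (0.0, 0.0)

for _ in range(100):
    v1 = (random.uniform(-5, 5), random.uniform(-5, 5))
    v2 = (random.uniform(-5, 5), random.uniform(-5, 5))
    assert a(v1, v2) == a(v2, v1)          # Axiom 1
    assert a(zero, v1) == v1               # Axiom 3
    assert a(v1, s(-1.0, v1)) == zero      # Axiom 4, with w = (-1)v
    assert s(1.0, v1) == v1                # Axiom 5
print("the sampled axioms hold for these random choices")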

Exercise 1: Write down as many concrete examples of vector spaces as you can. You should be able to exhibit quite a few specific vector spaces which are significantly different from one another.

This brings us to an important question: why are we doing this? We understand familiar vector spaces like \mathbb{R}^2 and \mathbb{R}^3 pretty well, so why not just analyze these as standalone objects instead of viewing them as particular instances of the general notion of a vector space? There are many answers to this question, some quite philosophical. Here is a practical answer: if we are able to prove theorems about an abstract vector space, then these theorems will be universal: they will apply to all specific instances of vector spaces which we encounter in the wild.

We now begin to develop this program: we seek to identify properties that every object which satisfies the axioms laid out in Definition 2 must have. What should these properties be? In addressing this question, it is helpful to rely on the intuition gained from experience working with geometric vectors. For example, vectors in \mathbb{R}^2 are just pairs of real numbers, and we have concrete and specific formulas for vector addition and scalar multiplication in \mathbb{R}^2: if \mathbf{x}=(x_1,x_2) and \mathbf{y}=(y_1,y_2), then

\mathbf{x}+\mathbf{y}=(x_1,x_2)+(y_1,y_2) = (x_1+y_1,x_2+y_2),

and

t\mathbf{x} = t(x_1,x_2) = (tx_1,tx_2).

Thus for example we can see that, in \mathbb{R}^2, Axiom 3 in Definition 2 is fulfilled by the vector

\mathbf{0}=(0,0),

since

\mathbf{x}+\mathbf{0}=(x_1,x_2) + (0,0) = (x_1+0,x_2+0)=(x_1,x_2) = \mathbf{x}.

In fact, we can say more: \mathbf{0}=(0,0) is the only vector in \mathbb{R}^2 which has the property required by Axiom 3, because 0 is the only number such that x+0=x for every number x. We can similarly argue that, in \mathbb{R}^3, the only vector which fulfills Axiom 3 is \mathbf{0}=(0,0,0). So, we might suspect that in any vector space \mathbf{V}, the vector \mathbf{0} whose existence is required by Axiom 3 is actually unique. This claim is formulated as follows.

Proposition 1: There is a unique vector \mathbf{0} \in \mathbf{V} such that \mathbf{v}+\mathbf{0}=\mathbf{v} for all \mathbf{v} \in \mathbf{V}.

In order to prove that the claim made by Proposition 1 is true, we must deduce it using nothing more than the vector space axioms given in Definition 2. This is Problem 1 on the first homework assignment. The propositions below give more properties which hold true for every vector space \mathbf{V}. In every case, proving such a proposition means deducing its truth using no information apart from the axioms in Definition 2, and propositions which have already been proved using these axioms.

Proposition 2: For every \mathbf{v} \in \mathbf{V}, there exists a unique \mathbf{w} \in \mathbf{V} such that \mathbf{v}+\mathbf{w}=\mathbf{0}.

Proof: Let \mathbf{v} \in \mathbf{V} be any vector. By Axiom 4 in Definition 2, we know that there exists a vector \mathbf{w} \in \mathbf{V} such that \mathbf{v}+\mathbf{w}=\mathbf{0}. It remains to prove that this vector is unique. Suppose that \mathbf{w}' is another vector such that \mathbf{v}+\mathbf{w}'=\mathbf{0}. We then have

\mathbf{v}+\mathbf{w} = \mathbf{v}+\mathbf{w}'.

Adding the vector \mathbf{w} to both sides of this equation, we get

\mathbf{w}+\mathbf{v}+\mathbf{w} =\mathbf{w}+ \mathbf{v}+\mathbf{w}'.

Since \mathbf{w}+\mathbf{v} = \mathbf{v}+\mathbf{w} by Axiom 1, and since \mathbf{v}+\mathbf{w}=\mathbf{0} by hypothesis, the above equation implies

\mathbf{0}+\mathbf{w} =\mathbf{0}+\mathbf{w}'.

and by Axiom 3 this implies

\mathbf{w} =\mathbf{w}',

as required. — Q.E.D.

Now that we have proved that the vector \mathbf{w} which cancels out \mathbf{v} is unique, it is appropriate to denote it by -\mathbf{v}. Thus Axiom 4 becomes

\mathbf{v} + (-\mathbf{v}) = \mathbf{0},

which we agree to write more simply as

\mathbf{v} - \mathbf{v}=\mathbf{0}.

More generally, for any two vectors \mathbf{v},\mathbf{w} \in \mathbf{V}, we write \mathbf{v}-\mathbf{w} as shorthand for \mathbf{v}+(-\mathbf{w}).

Proposition 2: For any t \in \mathbb{R}, we have t\mathbf{0}=\mathbf{0}. That is, scaling the zero vector by any number produces the zero vector.

Proof: By Axiom 3, we have

\mathbf{0} = \mathbf{0}+\mathbf{0}.

Let t \in \mathbb{R} be arbitrary. Multiplying both sides of the above equation by t, we get

t\mathbf{0} = t(\mathbf{0}+\mathbf{0}).

Using Axiom 8 on the right hand side of this equation, we get

t\mathbf{0} = t\mathbf{0} +t\mathbf{0}.

Now, subtracting t\mathbf{0} from both sides of the above equation, we get

t\mathbf{0}-t\mathbf{0} = t\mathbf{0} +t\mathbf{0} - t\mathbf{0},

which simplifies to

\mathbf{0} = t\mathbf{0},

as required. — Q.E.D.

Proposition 3: For any \mathbf{v} \in \mathbf{V}, we have 0\mathbf{v} = \mathbf{0}. That is, scaling any vector by the number zero produces the zero vector.

Proof: Let \mathbf{v} \in \mathbf{V} be any vector. We have

(1+0)\mathbf{v}=1\mathbf{v} + 0\mathbf{v} = \mathbf{v}+0\mathbf{v},

where we used Axiom 7 to obtain the first equality and Axiom 5 to obtain the second equality. On the other hand, the left hand side of the above equation is

(1+0)\mathbf{v} = 1\mathbf{v} = \mathbf{v},

where the first equality is the fact that adding the number 1 and the number 0 produces the number 1, and the second equality is Axiom 5 again. So, we have that

\mathbf{v} = \mathbf{v}+0\mathbf{v}.

Since the vector \mathbf{v} was chosen arbitrarily, we have shown that the vector 0\mathbf{v} has the property that

\mathbf{v} + 0\mathbf{v} = \mathbf{v}

for any \mathbf{v} \in \mathbf{V}. We thus have

0\mathbf{v} = \mathbf{0},

by Proposition 1. — Q.E.D.

Proposition 4: If t \neq 0 and \mathbf{v} \neq \mathbf{0}, then t\mathbf{v} \neq \mathbf{0}. That is, scaling a nonzero vector by a nonzero number produces a nonzero vector.

Proof: Suppose there exists a nonzero number t and a nonzero vector \mathbf{v} such that

t\mathbf{v}=\mathbf{0}.

Since t \neq 0, the real number t^{-1} is well-defined. Multiplying both sides of the above equation by t^{-1}, we obtain

(t^{-1}t)\mathbf{v} = t^{-1}\mathbf{0}.

This gives

1\mathbf{v}=t^{-1}\mathbf{0}.

Using Axiom 5 on the left hand side and Proposition 2 on the right hand side, this becomes

\mathbf{v}=\mathbf{0}.

However, this is false, since \mathbf{v} \neq \mathbf{0}. Since the statement that t\mathbf{v}=\mathbf{0} leads to the false statement \mathbf{v}=\mathbf{0}, it must itself be false, and we conclude that t\mathbf{v} \neq \mathbf{0}. — Q.E.D.

Proposition 5: If \mathbf{v} \neq \mathbf{0} and t_1\mathbf{v}=t_2\mathbf{v}, then t_1=t_2. That is, if two scalar multiples of the same nonzero vector are the same, then the scaling factors are the same.

Proof: Subtracting t_2\mathbf{v} from both sides of the equation t_1\mathbf{v}=t_2\mathbf{v} yields

(t_1-t_2)\mathbf{v}=\mathbf{0},

where we used Axiom 7 on the left hand side. If t_1 \neq t_2, this contradicts Proposition 4, so it must be the case that t_1=t_2. —Q.E.D.

Proposition 6: Every vector space contains either one vector, or infinitely many vectors.

Proof: Let \mathbf{V} be a vector space. Then, by Axiom 3, \mathbf{V} contains at least one vector, namely \mathbf{0}. It is possible that this is the only vector in \mathbf{V}, i.e. we have \mathbf{V}=\{\mathbf{0}\}. However, if \mathbf{V} contains another vector \mathbf{v} \neq \mathbf{0}, then it also contains the vector t\mathbf{v} for all t \in \mathbb{R}\backslash \{0\}. By Proposition 4, each of these vectors is different from \mathbf{0}, and by Proposition 5 they are all different from one another. —Q.E.D.

Exercise 2: Try to prove more propositions about vector spaces suggested by your familiarity with \mathbb{R}^2 and \mathbb{R}^3. If you discover something interesting, consider posting about your findings on Piazza.

We now embark on an ambitious project: using nothing more than Definition 2 and the Propositions we have already deduced from it, we want to define a meaningful notion of dimension for vector spaces. The first step on this road is the following definition.

Definition 4: Let \mathbf{V} be a vector space, and let S =\{\mathbf{v}_1,\dots,\mathbf{v}_k\} be a finite subset of \mathbf{V}. We say that S is linearly dependent if there exist numbers t_1,\dots,t_k \in \mathbb{R}, not all equal to zero, such that

\sum_{i=1}^k t_i\mathbf{v}_i = \mathbf{0}.

If no such numbers exist, then S is said to be linearly independent.

It will be convenient to extend Definition 4 to the case where S \subseteq \mathbf{V} is a set of size zero. There is only one such set, namely the empty set S=\{\}. By fiat, we declare the empty set to be a linearly independent set.

The fundamental feature of a linearly dependent set S in a vector space \mathbf{V} is that at least one vector in S is a linear combination of the other vectors in S, meaning that it can be represented as a sum of scalar multiples of these other vectors. For example, suppose that S=\{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\} is a linearly dependent set. Then, by Definition 4, there exist numbers t_1,t_2,t_3 \in \mathbb{R}, not all equal to zero, such that

t_1\mathbf{v}_1 + t_2 \mathbf{v}_2 + t_3 \mathbf{v}_3 = \mathbf{0}.

We thus have

t_1\mathbf{v}_1 =  -t_2 \mathbf{v}_2 - t_3 \mathbf{v}_3.

If t_1 \neq 0, we can divide both sides of this equation by t_1, obtaining

\mathbf{v}_1 = -\frac{t_2}{t_1} \mathbf{v}_2 - \frac{t_3}{t_1} \mathbf{v}_3.

This expresses the vector \mathbf{v}_1 as a linear combination of \mathbf{v}_2 and \mathbf{v}_3. However, if t_1 = 0, we cannot divide through by t_1 as we did above. Instead, we use the fact that one of t_2,t_3 is nonzero. If t_2 \neq 0, then we have

\mathbf{v}_2 = -\frac{t_1}{t_2} \mathbf{v}_1 - \frac{t_3}{t_2} \mathbf{v}_3,

while if t_3 \neq 0, we have

\mathbf{v}_3 = -\frac{t_1}{t_3} \mathbf{v}_1 - \frac{t_2}{t_3} \mathbf{v}_2.

So, no matter what, the fact that \{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\} is a linearly dependent set implies that at least one vector in this set is a linear combination of the other two. Conversely, if it were the case that \{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\} was a linearly independent set, then no vector in \{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\} would be a linear combination of the other two.
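This back-and-forth between a vanishing linear combination and an expression of one vector in terms of the others can also be carried out numerically. The sketch below is my own illustration, assuming the NumPy library; the vectors are arbitrary choices, and the nontrivial coefficients are extracted from a singular value decomposition.

import numpy as np

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 - 2.0 * v2            # {v1, v2, v3} is linearly dependent by construction

# stack the vectors as columns; a nonzero t with M t = 0 exhibits the dependence
M = np.column_stack([v1, v2, v3])
_, _, Vt = np.linalg.svd(M)
t = Vt[-1]                    # right singular vector for the smallest singular value
print(np.allclose(M @ t, 0.0))  # True: t_1 v_1 + t_2 v_2 + t_3 v_3 = 0, with t nonzero

# since t[0] != 0 here, v_1 = -(t_2/t_1) v_2 - (t_3/t_1) v_3, exactly as in the text
print(np.allclose(v1, -(t[1] / t[0]) * v2 - (t[2] / t[0]) * v3))  # True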

Definition 5: Let \mathbf{V} be a vector space and let n be a nonnegative integer. We say that \mathbf{V} is n-dimensional if it contains a linearly independent set of size k for each integer 0 \leq k \leq n, and does not contain a linearly independent set of size k for any integer k>n. If no such number n exists, then \mathbf{V} is said to be infinite-dimensional.

Proposition 7: Suppose that \mathbf{V} is a vector space which is both m-dimensional and n-dimensional. Then m=n.

Proof: Let us suppose without loss of generality that m \leq n. Then either m<n or m=n. Suppose it were the case that m<n. Then, since \mathbf{V} is m-dimensional, any subset S of \mathbf{V} of size n is linearly dependent. But this is false, since the fact that \mathbf{V} is n-dimensional means that \mathbf{V} contains a linearly independent set of size n. Consequently, it must be the case that m=n. —Q.E.D.

In view of Proposition 7, the concept of vector space dimension introduced by Definition 5 is well-defined; if \mathbf{V} is n-dimensional for some nonnegative integer n, then n is the unique number with this property. We may therefore refer to n as the dimension of \mathbf{V}, and write \dim \mathbf{V}=n. If \mathbf{V} is a vector space which is not n-dimensional for any nonnegative integer n, then it is infinite-dimensional, as per Definition 5. In this case it is customary to write \dim \mathbf{V} =\infty.

One way to make the concept of vector space dimension more relatable is to think of it as the critical value at which a phase transition between possible linear independence and certain linear dependence occurs. That is, if one samples a set S of vectors of size less than or equal to \dim \mathbf{V} from \mathbf{V}, it is possible that S is linearly independent; however, if one samples a set S of more than \dim \mathbf{V} vectors from \mathbf{V}, then S is necessarily linearly dependent. To say that \dim \mathbf{V} = \infty is to say that this transition never occurs.
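This phase transition is easy to observe experimentally. In the sketch below, which is my own illustration assuming the NumPy library, random sets of k vectors in \mathbb{R}^3 are sampled for k=1,\dots,5; a random set of size at most 3 is (with probability one) linearly independent, while every set of size larger than 3 is linearly dependent.

import numpy as np

rng = np.random.default_rng(0)
dim = 3

for k in range(1, 6):
    # sample a random set of k vectors in R^3 and test independence via the rank
    S = rng.normal(size=(k, dim))
    independent = np.linalg.matrix_rank(S) == k
    print(f"k = {k}: linearly independent? {independent}")
# independent for k = 1, 2, 3; dependent for k = 4, 5: the transition occurs at dim R^3 = 3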

Let us use Definition 5 to calculate the dimension of \mathbf{V}=\{\mathbf{0}\}, a vector space containing only one vector. Observe that the only two subsets of \mathbf{V} are S_0=\{\} and S_1=\{\mathbf{0}\}, and these sets have sizes 0 and 1, respectively. Now, S_0 is linearly independent by definition (see the paragraph immediately following Definition 4), so \mathbf{V} contains a linearly independent set of size zero. Moreover, S_1 is linearly dependent since t\mathbf{0}=\mathbf{0} for any nonzero choice of t \in \mathbb{R} by Proposition 2, and thus any subset of \mathbf{V} of size bigger than zero is linearly dependent. We have thus shown that \dim \{\mathbf{0}\}=0.

As another example, let us calculate the dimension of \mathbb{R}^1=\mathbb{R}, the number line. A linearly independent subset of \mathbb{R} of size zero is given by the empty set S_0=\{\}. A linearly independent set of size one is given by S_1=\{1\}, or any other set containing a single non-zero real number. Consider now an arbitrary subset S_2=\{x,y\} of size two. Since x \neq y, at least one of x,y is not equal to zero. Suppose without loss of generality that y \neq 0. If x=0, then we have t_1x+t_2y=0 with t_1=1,t_2=0, so that S_2 is linearly dependent in this case. If x \neq 0, then we have t_1x+t_2y = 0 with t_1=1, t_2=-\frac{x}{y}. So, we have shown that any set of two real numbers is linearly dependent, and by one of the problems on Assignment 1 this implies that any set of more than two real numbers is linearly dependent. We thus conclude that \dim \mathbb{R}=1.

At this point, vector spaces may seem like hopelessly complicated objects which are impossible to analyze in general. However, it turns out that for many purposes understanding a given vector space \mathbf{V} can be reduced to understanding a well-chosen finite subset of \mathbf{V}. The first step in this direction is the following theorem.

Theorem 1: Let \mathbf{V} be an n-dimensional vector space, and let B = \{\mathbf{b}_1,\dots,\mathbf{b}_n\} be a linearly independent set of n vectors in \mathbf{V}. Then, every vector \mathbf{v} in \mathbf{V} can be uniquely represented as a linear combination of the vectors in B.

Proof: Let \mathbf{v} \in \mathbf{V} be any vector; we may assume that \mathbf{v} is not one of the \mathbf{b}_i, since in that case \mathbf{v}=1\mathbf{b}_i is already a linear combination of the vectors in B. Consider the set C=\{\mathbf{v},\mathbf{b}_1,\dots,\mathbf{b}_n\}. Since the dimension of \mathbf{V} is n and C has n+1 elements, the set C must be linearly dependent. Thus there exist numbers t_0,t_1,\dots,t_n \in \mathbb{R}, not all of which are equal to zero, such that

t_0\mathbf{v} + t_1\mathbf{b}_1 + \dots + t_n\mathbf{b}_n=\mathbf{0}.

We claim that t_0\neq 0. Indeed, if it were the case that t_0=0, then the above would read

t_1\mathbf{b}_1 + \dots + t_n\mathbf{b}_n=\mathbf{0},

where t_1,\dots,t_n are not all equal to zero. But this is impossible, since B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} is a linearly independent set, and thus it cannot be the case that t_0=0. Now, since t_0 \neq 0, we can write

\mathbf{v} = -\frac{t_1}{t_0}\mathbf{b}_1 - \dots - \frac{t_n}{t_0} \mathbf{b}_n,

which shows that \mathbf{v} is a linear combination of the vectors \mathbf{b}_1,\dots,\mathbf{b}_n. Since \mathbf{v} was arbitrary, we have shown that every vector in \mathbf{V} can be represented as a linear combination of vectors from B.

Now let us prove uniqueness. Let \mathbf{v} \in \mathbf{V} be a vector, and suppose that

\mathbf{v}=p_1\mathbf{b}_1 + \dots +p_n\mathbf{b}_n \\ \mathbf{v}=q_1\mathbf{b}_1 + \dots +q_n\mathbf{b}_n

are two representations of \mathbf{v} as a linear combination of the vectors in B. Subtracting the second of these equations from the first, we obtain the equation

\mathbf{0}=(p_1-q_1)\mathbf{b}_1 + \dots + (p_n-q_n)\mathbf{b}_n.

Since B is a linearly independent set, we have that p_i-q_i=0 for all 1 \leq i \leq n, which means that p_i=q_i for all 1 \leq i \leq n. We thus conclude that any two representations of any vector \mathbf{v} \in \mathbf{V} as a linear combination of the vectors \mathbf{b}_1,\dots,\mathbf{b}_n in fact coincide. —Q.E.D.

A subset B of a vector space \mathbf{V} which has the property that every \mathbf{v} \in \mathbf{V} can be written as a linear combination of vectors in B is said to span the vector space \mathbf{V}. If moreover B is a linearly independent set, then B is called a basis of \mathbf{V}, and in this case the above argument shows that every vector in \mathbf{V} can be written as a unique linear combination of the vectors in B. In Theorem 1, we have proven that, in an n-dimensional vector space \mathbf{V}, any linearly independent set of size n is a basis. We will continue to study the relationship between the dimension of a vector space \mathbf{V} and its bases in Lecture 2.
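When \mathbf{V}=\mathbb{R}^n, the unique representation guaranteed by Theorem 1 can be computed by solving a linear system whose coefficient matrix has the basis vectors as its columns; uniqueness corresponds to the invertibility of that matrix. The sketch below is my own illustration, assuming the NumPy library, with an arbitrary basis of \mathbb{R}^2.

import numpy as np

b1, b2 = [2.0, 1.0], [1.0, 1.0]   # a basis of R^2 (a linearly independent set of size 2)
B = np.column_stack([b1, b2])     # invertible, which is what makes the coefficients unique

v = np.array([7.0, 4.0])
t = np.linalg.solve(B, v)         # the unique t with t_1 b_1 + t_2 b_2 = v
print(t)                          # [3. 1.], i.e. v = 3 b_1 + 1 b_2
print(np.allclose(B @ t, v))      # True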

Lecture 1 video

Math 31AH: Lecture 0

Welcome to Math 31AH, the first quarter of a three-quarter honors integrated linear algebra/multivariable calculus sequence for well-prepared students. Math 31AH focuses on linear algebra, meaning the study of vectors and linear transformations.

Before saying anything else, I want to draw your attention to the following remark which accompanies the registrar’s listing for this course:

The honors calculus courses 31AH-BH-CH are unusually demanding and are intended for students with strong aptitude and deep interest in Mathematics.

– UCSD Registrar’s Office

If you’re enrolled in this class, I want you to be aware of this: Math 31AH is an unusually demanding course. If the prospect of an intellectual challenge appeals to you, you’ve come to the right place; otherwise, this is not the course for you, and you should instead consider Math 18 and the Math 20 course sequence. It is not the case that the Math 31 course sequence is “better” than the Math 18/20 course sequence — but it is different. The Math 31 sequence is more theoretical and rigorous, meaning that there is a strong emphasis on precise definitions and clear logical reasoning, as opposed to simply learning how to use formulas. This does not mean that you are not expected to master formulas in Math 31AH, rather it means that the process of translating theoretical knowledge into the ability to do concrete calculations is largely left to the student — there is a large “make your own examples” component.

Now let us discuss the basic parameters of the course. First of all, we are presently in a highly unusual situation, and will not be able to meet for standard in-person lectures. The course will instead be organized as follows.

This blog will serve as the textbook for the course: the entire course content will be made available here, in the form of lecture posts. These posts will be of a quality unmatched by any textbook, and will only be available to students enrolled in Math 31AH. There will be two posts per week, each of which will contain content corresponding to what would be covered in a rather ambitious in-person lecture. In addition to written text and links to additional material, each of the two weekly lecture posts will be accompanied by a video coda consisting of a discussion of the post together with illuminating examples, worked problems, and sometimes additional content. These videos will be posted to the media gallery section of the Math 31AH Canvas page. Originally I had planned to embed each such video in the corresponding lecture post, but this proved problematic (an example of an embedded video is included at the bottom of this post). Each video coda will be presented in a way which assumes familiarity with the lecture post containing it. The two weekly lecture posts will be made available prior to our designated Monday and Wednesday 15:00-15:50 lecture slots.

That leaves the Friday lecture slot, which will consist of two parts. The first part will occupy the 15:00-15:50 lecture slot, and will be conducted in the style of a flipped classroom in order to cement your understanding of material you have already absorbed. This may take the form of an interactive problem solving session, a recap of course material presented in the lecture posts, or a more free-form discussion of the course material intended to deepen and broaden your understanding. The flipped classroom session will segue into an additional 70 minutes corresponding to “office hours,” which will be driven primarily by student questions (many students find it worthwhile to attend office hours even if they don’t plan to ask questions). The combined flipped classroom and office hours endeavor makes for a two hour live Zoom session every Friday, 15:00-17:00. For those of you not already aware, Zoom is a freely available videoconferencing platform. These Friday Zoom sessions will be recorded and made available on Canvas.

In addition to the course content delivered by Professor Novak in the manner described above, the Math 31AH teaching assistant, Finley McGlade, will be running live discussion sections via Zoom every Thursday, from 11:00-11:50 for students in section A01, and from 12:00-12:50 for students in section A02. Mr McGlade will post further details concerning the discussion sections on Piazza. There will be no discussion sections on October 1.

Piazza is an online platform facilitating online discussion of all aspects of the course, from logistics to course content. If you are enrolled in Math 31AH, you should have already received an email containing Piazza signup instructions. Links to the lecture posts will be made available under the “Resources” section, where you will also find a syllabus link leading back to this post. Both Professor Novak and Mr McGlade will be active on Piazza. We also expect that Piazza will serve as a forum for students to discuss the course content with each other, and endeavor to answer each other’s questions. Please use Piazza as the default mechanism for asking questions about the course, and refrain from using email unless absolutely necessary.

Now that we have discussed how the course content will be disseminated to students, let us discuss the content to be created by students and submitted for evaluation. Students in Math 31AH will generate two types of content: solutions to weekly problem sets, and solutions to exams.

We will aim for a total of nine weekly problem sets in this course. Problem sets will be posted to Piazza on Sundays before 24:00, and the corresponding solutions will be due the following Sunday before 24:00. Your solutions will be both submitted by you and returned to you using Gradescope. This process will be managed by Mr McGlade, and any questions or concerns related to Gradescope should be directed to him. The problem sets are a very important part of the course, and accordingly make up 50% of the total grade. While you may submit handwritten solutions, it is recommended that you typeset your solutions using LaTeX, the professional standard for the preparation of scientific documents. Learning how to prepare scientific documents of professional quality is a valuable skill that will serve you well in your university career, and beyond. In order to help with this, the problem sets will be typeset in LaTeX, and the source files will be posted along with their PDF output. Problem set solutions which have been typeset using LaTeX will receive an automatic 5% bonus.

There will be no problem set due on November 13, because this is the date of the midterm exam. The midterm exam will count for 20% of your course grade. The details of how the midterm exam will be written and submitted are not yet available, but will be soon, and I will keep you updated.

The final exam for the course is set for December 18, with a scheduled time of 15:00-17:59. This date and time slot is set by the university registrar. The final exam will count for 30% of your course grade. The details of how the final exam will be written and submitted are not yet available, but will be soon, and I will keep you updated.

Our first live Zoom session will take place on October 2, 15:00-17:00. I expect you will have many questions, so this meeting will be purely logistical (no math). The schedule below indicates the expected mathematical content of each subsequent lecture. This schedule is subject to change, and may be updated during the quarter.

10/02, Lecture 0: Course logistics.
10/05, Lecture 1: Vector spaces, basis and dimension.
10/07, Lecture 2: Linear transformations, isomorphism, coordinates.
10/09, Lecture 3: Flipped classroom, office hour.
10/12, Lecture 4: Change of basis, Euclidean spaces, Cauchy-Schwarz inequality.
10/14, Lecture 5: Orthogonal bases; Gram-Schmidt algorithm.
10/16, Lecture 6: Flipped classroom, office hour.
10/19, Lecture 7: Orthogonal projection, approximation.
10/21, Lecture 8: Linear forms, bilinear forms.
10/23, Lecture 9: Flipped classroom, office hour.
10/26, Lecture 10: Change of basis, quadratic forms, polarization.
10/28, Lecture 11: Reduction of a quadratic form to a sum of squares.
10/30, Lecture 12: Flipped classroom, office hour.
11/02, Lecture 13: Law of inertia for quadratic forms.
11/04, Lecture 14: Linear transformations and matrices, rank and nullity.
11/06, Lecture 15: Flipped classroom, office hour.
11/09, Lecture 16: Adjoint, symmetric and orthogonal transformations.
11/13, Lecture 17: MIDTERM EXAM.
11/16, Lecture 18: Invariant subspaces, eigenvalues and eigenvectors.
11/18, Lecture 19: Diagonalization of symmetric transformations.
11/20, Lecture 20: Flipped classroom, office hour.
11/23, Lecture 20: Tensor product.
11/25, Lecture 21: Symmetric and antisymmetric tensors.
11/30, Lecture 22: Wedge product and oriented volume.
12/02, Lecture 23: Wedge product and determinants.
12/04, Lecture 24: Flipped classroom, office hour.
12/07, Lecture 25: Characteristic polynomial.
12/09, Lecture 26: Complex linear algebra.
12/11, Lecture 27: Flipped classroom, office hour.
Tentative lecture schedule, subject to change.
Lecture 0 Zoom session, 10/02.