Math 31AH: Lecture 2

Let \mathbf{V} be a vector space. Recall from Lecture 1 (together with Assignment 1, Problem 2) that to say \mathbf{V} is n-dimensional means that \mathbf{V} contains a linearly independent set of size n, but does not contain a linearly independent set of size n+1. Also recall from Lecture 1 that a basis of \mathbf{V} is a subset B of \mathbf{V} which is linearly independent and spans \mathbf{V}. The main goal of this lecture is to prove the following theorem.

Theorem 1: A vector space \mathbf{V} is n-dimensional if and only if it contains a basis B of size n.

Before going on to the proof of this theorem, let us pause to consider its ramifications. Importantly, Theorem 1 gives a method to calculate the dimension of a given vector space \mathbf{V}: all we have to do is find a basis B of \mathbf{V}, and then count the number of elements in that basis. As an example, let us compute the dimension of \mathbf{V}=\mathbb{R}^4. This vector space is typically used to model the world in which we live, which consists of four physical dimensions: three for space and one for time. Let us verify that our mathematical definition of vector space dimension matches our physical understanding of dimension.

Let \mathbf{x}=(x_1,x_2,x_3,x_4) be a vector in \mathbb{R}^4. Then, we have

\mathbf{x} = x_1\mathbf{e}_1+x_2\mathbf{e}_2+x_3\mathbf{e}_3+x_4\mathbf{e}_4,


\mathbf{e}_1 = (1,0,0,0) \\ \mathbf{e}_2=(0,1,0,0) \\ \mathbf{e}_3=(0,0,1,0) \\ \mathbf{e}_4=(0,0,0,1).

This shows that the set B=\{\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_3,\mathbf{e}_4\} spans \mathbb{R}^4. Let us check that B is linearly independent. This amounts to performing the above manipulation in reverse. If x_1,x_2,x_3,x_4 \in \mathbb{R} are numbers such that


then we have


which means that x_1=x_2=x_3=x_4=0. Thus B is a linearly independent set which spans \mathbf{R}^4, i.e. it is a basis of \mathbb{R}^4. Since B has size 4, we conclude from Theorem 1 that \mathrm{dim} \mathbf{R}^4=4.

Now let us prove Theorem 1. First, observe that we have already proved that if \mathbf{V} is n-dimensional then it contains a basis B of size n — this is Theorem 1 from Lecture 1. It remains to prove the converse, which we now state as a standalone result for emphasis and ease of reference.

Theorem 2: If \mathbf{V} contains a basis B of size n, then \mathbf{V} is n-dimensional.

The proof of Theorem 2 is quite subtle. In order to make it easier to understand, it is helpful to first prove the following lemma, in which the main difficulty is concentrated.

Lemma 1: If A= \{\mathbf{a}_1,\dots\mathbf{a}_m\} is a linearly independent set in a vector space \mathbf{V} and B = \{\mathbf{b}_1,\dots,\mathbf{b}_n\} is a basis of \mathbf{V}, then m \leq n.

Proof: Suppose this were false, i.e. that there exists in the vector space \mathbf{V} a linearly independent set A= \{\mathbf{a}_1,\dots\mathbf{a}_m\} and a basis B = \{\mathbf{b}_1,\dots,\mathbf{b}_n\} such that m>n. We will see that this leads to a contradiction. The strategy is the following: we wish to demonstrate that the assumption m>n implies that we can replace m-n of the vectors in A with the vectors in B in such a way that the resulting set


is linearly independent. If we can show this, we will have obtained the desired contradiction: S cannot possibly be independent, because B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} is a basis. In particular, each of the remaining vectors \mathbf{a}_{i_1},\dots,\mathbf{a}_{i_{m-n}} is a linear combination of the vectors in B.

We will pursue the following strategy to show that the existence of the set S as above follows from the hypothesis m>n. For each k \in \in \{0,1,2,\dots,n\}, consider the following proposition: there exists a linearly independent set S_k in \mathbf{V} of the form

S_k = A_{m-k} \cup B_k,

where A_{m-k} is a subset of A of size m-k and B_k = \{\mathbf{b}_j \colon j \leq k\}. Let us call this proposition \mathcal{P}_k. Now, if we can prove that \mathcal{P}_0 is true, and that

\mathcal{P}_k \text{ true } \implies \mathcal{P}_{k+1} \text{ true } \quad \forall k \in \{0,1,\dots,n-1\},

then we can conclude that \mathcal{P}_k is true for all k \in \{0,1,\dots,k\}. Indeed, we would then have

\mathcal{P}_0 \text{ true } \implies \mathcal{P}_1 \text{ true } \implies \dots \implies \mathcal{P}_{n-1} \text{ true } \implies \mathcal{P}_n \text{ true}.

This is one version of a proof technique known as mathematical induction. But the statement \mathcal{P}_n true is exactly what we want, since it gives us a linearly independent set S_n consisting of some number of vectors from A together with all vectors in B, which results in the contradiction explained above.

Let us now implement the above strategy. The first step is to prove that \mathcal{P}_0 is true. To see that it is, consider the set S_k = A_m \cup B_0 where A_m = A and B_0=\{\}. This set is of the required form, and since A is linearly independent so is S_k.

It remains to prove that if k \in \{1,\dots,n-1\} and \mathcal{P}_k is true, then \mathcal{P}_{k+1} is true. Given that \mathcal{P}_k is true, there exists a linearly independent set S_k in \mathbf{V} of the form S_k = A_{m-k} \cup B_k with

A_{m-k} = \{\mathbf{a}_{i_1},\dots,\mathbf{a}_{i_{m-k}} \}

an (m-k)-element subset of A and

B_k = \{\mathbf{b}_1,\dots,\mathbf{b}_k\}.

Consider the set

S_k \cup \{\mathbf{b}_{k+1}\} = \{\mathbf{a}_{i_1},\dots,\mathbf{a}_{i_{m-k}},\mathbf{b}_1,\dots,\mathbf{b}_k,\mathbf{b}_{k+1}\}.

If S_k \cup \{\mathbf{b}_{k+1}\} is linearly independent, then so is any subset of it, so in particular the subset

S_{k+1} =\{\mathbf{a}_1,\dots,\mathbf{a}_{i_{m-k-1}},\mathbf{b}_1,\dots,\mathbf{b}_k,\mathbf{b}_{k+1}\}

is linearly independent and \mathcal{P}_{k+1} is true. Now suppose that S_k \cup \{\mathbf{b}_{k+1}\} is linearly dependent. Then, because S_k is linearly independent, \mathbf{b}_{k+1} must be a linear combination of the vectors in S_k, i.e

\mathbf{b}_{k+1} = \sum_{j=1}^{m-k} t_j \mathbf{a}_{i_j} + \sum_{j=1}^k s_j \mathbf{b}_j

for some a_{i_1},\dots,a_{i_{m-j}},t_1,\dots,t_k \in \mathbb{R}. Moreover, there exists a number c \in \{1,\dots,m-k\} such that a_{i_c} \neq 0, else \mathbf{b}_{k+1} would be a linear combination of \mathbf{b}_1,\dots,\mathbf{b}_k, which is impossible because B_{k+1} is a linearly independent set. We now claim that the set

S_{k+1} = A_{m-k}\backslash \{\mathbf{a}_{i_c}\} \cup B_{k+1}

is linearly independent. Indeed, if S_{k+1} were linearly dependent, then we would have

\sum_{j=1}^{c-1} p_j \mathbf{a}_{i_j} +\sum_{j=c+1}^{m-k} p_j \mathbf{a}_{i_j} + \sum_{j=1}^{k+1} q_j \mathbf{b}_j = \mathbf{0},

where not all the numbers p_1,\dots,p_{c-1},p_{c+1},\dots,p_{m-k},q_1,\dots,q_{k+1} are zero. This means that the number q_{k+1} \neq 0, since otherwise the above would say that a subset of S_k is linearly dependent, which is false because S_k is linearly independent. Now, if we substitute the representation of \mathbf{b}_{k+1} as a linear combination of elements S_k given above, this becomes a vanishing linear combination of the vectors in S_k in which the coefficient of \mathbf{a}_{i_c} is t_cq_{k+1} \neq 0, which contradicts the linear independence of S_k. So, S_{k+1} must be linearly independent. — Q.E.D.

Let us note the following corollary of Lemma 1.

Corollary 1: If A=\{\mathbf{a}_1,\dots,\mathbf{a}_m\} and B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} are two bases of a vector space \mathbf{V}, then m=n.

Proof: Since A is linearly independent and B is a basis, we have m \leq n by Lemma 1. On the other hand, since B is linearly independent and A is a basis, we also have n \leq m by Lemma 1. Thus m=n. — Q.E.D.

We now have everything we need to prove Theorem 2.

Proof of Theorem 2: Let \mathbf{V} be a finite-dimensional vector space which contains a basis B of size n. We will prove that \mathbf{V} is n-dimensional using the definition of vector space dimension (Definition 4 in Lecture 1) and Lemma 1. First, since B is a linearly independent set of size n, we can be sure that \mathbf{V} contains a linearly independent set of size n. It remains to show that \mathbf{V} does not contain a linearly independent set of size n+1. This follows from Lemma 1: since B is a basis of size n, every linearly independent set A in \mathbf{V} must have size less than or equal to n. — Q.E.D.

Another very important consequence of Theorem 2 is the following: it reveals that any two vector spaces of the same dimension can more or less be considered the same. Note that this is fundamentally new territory for us; so far, we have only considered one vector space at a time, but now we are going to compare two vector spaces.

To make the above precise, we need to consider functions between vector spaces. In fact, it is sufficient to limit ourselves to the consideration of functions which are compatible with the operations of vector addition and scalar multiplication. This leads to the definition of a linear transformation from a vector space \mathbf{V} to another vector space \mathbf{W}. If vector spaces are the foundation of linear algebra, then linear transformations are the structures we want to build on these foundations.

Defintion 1: A function T \colon \mathbf{V} \to \mathbf{W} is said to be a linear transformation if it has the following properties:

  1. T(\mathbf{0}_\mathbf{V}) =\mathbf{0}_\mathbf{W}, where \mathbf{0}_\mathbf{V} is the zero vector in \mathbf{V} and \mathbf{0}_\mathbf{W} is the zero vector in \mathbf{W}.
  2. For any vectors \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V} and any scalars t_1,t_2 \in \mathbb{R}, we have T(t_1\mathbf{v}_1 + t_2\mathbf{v}_2)=t_1T(\mathbf{v}_1)+t_2T(\mathbf{v}_2).

So, a linear transformation is a special kind of function from one vector space to another. The name “linear transformation” comes from the fact that these special functions are generalizations of lines in \mathbb{R}^2 which pass through the origin (0,0). More precisely, every such line is the graph of a linear transformation T \colon \mathbb{R}^2 \to \mathbb{R}^2. Indeed, a line through the origin in \mathbb{R}^2 of slope m is the graph of the linear transformation T_m \colon \mathbf{R} \to \mathbb{R} defined by

T_m(x) = mx.

The one exception to this is the vertical line in \mathbb{R}^2 passing through (0,0), which has slope m=\infty. This line is not the graph of any function from \mathbb{R} to \mathbb{R}, which should be clear if you remember the vertical line test.

To reiterate, a linear transformation from a vector space \mathbf{V} to \mathbf{W} is a special kind of function T \colon \mathbf{V} \to \mathbf{W}. For linear transformations, it is common to write T\mathbf{v} instead of T(\mathbf{v}), and this shorthand is often used to implicitly indicate that T is a linear transformation. In the second half of the course, we will discuss linear transformations in great detail. At present, however, we are only concerned with a special type of linear transformation called an “isomorphism.”

Defintion 2: A linear transformation T \colon \mathbf{V} \to \mathbf{W} is said to be an isomorphism if there exists a linear transformation U \colon \mathbf{W} \to \mathbf{V} such that

U(T\mathbf{v})= \mathbf{v} \quad \forall \mathbf{v} \in \mathbf{V}


T(U\mathbf{w})=\mathbf{w} \quad \forall \mathbf{w} \in \mathbf{W}.

The word “isomorphism” comes from the Greek “iso,” which means “same,” and “morph” which means “shape.” It is not always the case that there exists an isomorphism T from \mathbf{V} to \mathbf{W}; when an isomorphism does exist, one says that \mathbf{V} and \mathbf{W} are isomorphic. Isomorphic vector spaces \mathbf{V} and \mathbf{W} have the “same shape” in the sense that there is both a linear transformation T \colon \mathbf{V} \to \mathbf{W} which transforms every vector \mathbf{v} \in \mathbf{V} into a vector \mathbf{w} \in \mathbf{W}, and an inverse transformation U \colon \mathbf{W} \to \mathbf{V} which “undoes” T by transforming \mathbf{w} back into \mathbf{v}. To understand this, it may be helpful to think of isomorphic vector spaces \mathbf{V} and \mathbf{W} as two different languages. Any two human languages, no matter how different they may seem, are “isomorphic” in the sense that they describe exactly the same thing, namely the totality of human experience. The isomorphism T \colon \mathbf{V} \to \mathbf{W} translates every word \mathbf{v} in the language \mathbf{V} to the corresponding word T\mathbf{v}=\mathbf{w} in \mathbf{W} which means the same thing. The inverse isomorphism U \colon \mathbf{W} \to \mathbf{V} translates back from language \mathbf{W} to language \mathbf{V}. On the other hand, if \mathbf{V} is a human language and \mathbf{W} is the language of an alien civilization, then \mathbf{V} and \mathbf{W} are not isomorphic, since the experience of membership in human society is fundamentally different from the experience of membership in a non-human society, and this difference is not merely a matter of language.

Theorem 2: Any two n-dimensional vector spaces \mathbf{V} and \mathbf{W} are isomorphic.

Proof: Since \mathbf{v} is n-dimensional, it contains a basis B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} by Theorem 1. Likewise, since \mathbf{W} is n-dimensional, it contains a basis C=\{\mathbf{c}_1,\dots,\mathbf{c}_n\}. Now, since B is a basis in \mathbf{V}, for every \mathbf{v} \in \mathbf{V} there exist unique scalars t_1,\dots,t_n \in \mathbb{R} such that

\mathbf{v}=t_1\mathbf{b}_1 + \dots + t_n\mathbf{b}_n.

Likewise, since since C is a basis in \mathbf{W}, for every \mathbf{w} \in \mathbf{V} there exist unique scalars u_1,\dots,u_n \in \mathbb{R} such that

\mathbf{w}=u_1\mathbf{c}_1 + \dots + u_n\mathbf{c}_n.

We may thus define functions

T \colon \mathbf{V} \to \mathbf{W} \quad\text{ and }\quad U \colon \mathbf{W} \to \mathbf{V}


T(t_1\mathbf{b}_1+\dots+t_n\mathbf{b}_n) = t_1\mathbf{c}_1+\dots+t_n\mathbf{c}_n


U(u_1\mathbf{c}_1 + \dots + u_n\mathbf{c}_n) = u_1\mathbf{b}_1 + \dots + u_n\mathbf{b}_n.

We claim that T is a linear transformation from \mathbf{V} to \mathbf{W}. To verify this, we must demonstrate that T has the two properties stipulated by Defintion 1. First, we have


which verifies the first property. Next, let \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V} be any two vectors, and let a_1,a_2 \in \mathbb{R} be any two scalars. Let

\mathbf{v}_1 = t_{11}\mathbf{b}_1+\dots+t_{1n}\mathbf{b}_n \\\mathbf{v}_1 = t_{21}\mathbf{b}_1+\dots+t_{2n}\mathbf{b}_n

be the unique representations of \mathbf{v}_1 and \mathbf{v}_2 as linear combinations of the vectors in the basis B. We then have

T(a_1\mathbf{v}_1+a_2\mathbf{v}_2) = T((a_1t_{11}+a_2t_{21})\mathbf{b}_1+\dots+(a_1t_{1n}+a_2t_{2n})\mathbf{b}_n)\\ =(a_1t_{11}+a_2t_{21})\mathbf{c}_1+\dots+(a_1t_{1n}+a_2t_{2n})\mathbf{c}_n \\ = a_1\left(t_{11}\mathbf{c}_1+\dots+t_{1n}\mathbf{c}_n \right) + a_2\left( t_{21}\mathbf{c}_1+\dots+t_{2n}\mathbf{c}_n\right) \\ =a_1T\mathbf{v}_1 + a_2T\mathbf{v}_2,

which verifies the second property. In the same way, one checks that U is a linear transformation from \mathbf{W} to \mathbf{V}.

To prove that the linear transformation T \colon \mathbf{V} \to \mathbf{W} is an isomorphism, it remains only to prove that the linear transformation U \colon \mathbf{W} \to \mathbf{V} undoes T. To see this, let


be an arbitrary vector in \mathbf{V} expressed as a linear combination of the vectors in the basis B. We then have

U(T\mathbf{v}) = U(t_1\mathbf{c}_1+\dots+t_n\mathbf{c}_n) = t_1\mathbf{b}_1+\dots+t_n\mathbf{b}_n = \mathbf{v}.

This completes the proof that \mathbf{V} and \mathbf{W} are isomorphic vector spaces. — Q.E.D.

To continue with our linguistic analogy, Theorem 2 says that any two n-dimensional vector spaces are just different languages expressing the same set of concepts. From this perspective, it is desirable to choose a “standard language” into which everything should be translated. In linguistics, such a language is called a lingua franca, a term which reflects the fact that this standard language was once the historical predecessor of modern French (these days the lingua franca is English, but this too may eventually change).

The lingua franca for n-dimensional vector spaces is \mathbb{R}^n, and it is unlikely that this will ever change. Let \mathbf{V} be an n-dimensional vector space, and let B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} be a basis in \mathbf{V}. Consider the basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} of \mathbb{R}^n in which

\mathbf{e}_1 = (1,0,0,\dots,0) \\ \mathbf{e}_2=(0,1,0,\dots,0) \\ \vdots \\ \mathbf{e}_n = (0,0,\dots,0,1).

To be perfectly clear, for each 1 \leq i \leq n the vector \mathbf{e}_i has the number 1 in position i, and a zero in every other position. The basis E is called the standard basis of \mathbb{R}^n, and you should immediately stop reading and check for yourself that it really is a basis. Assuming you have done so, we proceed to define an isomorphism

T_B \colon \mathbf{V} \to \mathbb{R}^n

as follows. Given a vector \mathbf{v} \in \mathbf{V}, let

\mathbf{v}=t_1\mathbf{b}_1 + \dots + t_n\mathbf{b}_n

be the unique representation of \mathbf{v} as a linear combination of vectors in B. Now set

T_B(\mathbf{v}) := t_1\mathbf{e}_1 + \dots + t_n\mathbf{e}_n,

where the symbol “:=” means “equal by definition.” The fact that this really is an isomorphism follows from the proof of Theorem 2 above. The isomorphism T_B is called the coordinate isomorphism relative to the basis B, and the geometric vector T_B(\mathbf{v}) \in \mathbb{R}^n is called the coordinate vector of \mathbf{v} relative to the basis B. Because of the special form of the standard basis of \mathbb{R}^n, we may write the coordinate vector of \mathbf{v} more concisely as

T_B(\mathbf{v}) = (t_1,\dots,t_n).

When working with a given n-dimensional vector space, it is often convenient to choose a basis B in \mathbf{V} and use the corresponding coordinate isomorphism T_B to work in the standard n-dimensional vector space \mathbb{R}^n rather than working in the original vector space \mathbf{V}. For example, someone could come to you with a strange 4-dimensional vector space \mathbf{V} and claim that this vector space is the best model for spacetime. If you find that you disagree, you can choose a convenient basis B=\{\mathbf{b}_1,\mathbf{b}_2,\mathbf{b}_3,\mathbf{b}_4\} and use the corresponding coordinate isomorphism T_B \colon \mathbf{V} \to \mathbb{R}^4 to transform their model into the standard model of spacetime.


Leave a Reply