Math 31AH: Lecture 4

Let \mathbf{V} be a vector space. In Lecture 2, we proved that \mathbf{V} is n-dimensional if and only if every basis in \mathbf{V} consists of n vectors. Suppose that B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} is a basis of \mathbf{V}. Then, every vector \mathbf{v} \in \mathbf{V} can be represented as a linear combination of vectors in B,

\mathbf{v} = x_1\mathbf{b}_1 + \dots + x_n\mathbf{b}_n,

and this representation is unique. A natural question is then the following: if C=\{\mathbf{c}_1,\dots,\mathbf{c}_n\} is a second basis of \mathbf{V}, and

\mathbf{v} = y_1\mathbf{c}_1 + \dots + y_n\mathbf{c}_n

is the representation of \mathbf{v} as a linear combination of vectors in C, what is the relationship between the numbers x_1,\dots,x_n and the numbers y_1,\dots,y_n? Since these two lists of numbers are the coordinates of the same vector \mathbf{v}, but with respect to (possibly) different bases, it is reasonable to expect that they should be related to one another in a structured way. We begin this lecture by working out this relationship precisely.

We follow a strategy which would be acceptable to Marie Kondo: out with the old, in with the new. Let us call B the “old” basis, and C the “new” basis. Let us do away with the old basis vectors by expressing them in terms of the new basis, writing

\mathbf{b}_1 = a_{11}\mathbf{c}_1 + a_{21}\mathbf{c}_2 + \dots + a_{n1}\mathbf{c}_n \\ \mathbf{b}_2 = a_{12}\mathbf{c}_1 + a_{22}\mathbf{c}_2 + \dots + a_{n2}\mathbf{c}_n \\ \vdots \\ \mathbf{b}_n = a_{1n}\mathbf{c}_1 + a_{2n}\mathbf{c}_2 + \dots + a_{nn}\mathbf{c}_n,

where, for each 1 \leq j \leq n,

(a_{1j},a_{2j},\dots,a_{nj})

is the coordinate vector of the old basis vector \mathbf{b}_j relative to the new basis C.

We now return to the first equation above, which expresses our chosen vector \mathbf{v} in terms of the old basis. Replacing the vectors of the old basis with their representations relative to the new basis, we have

\mathbf{v} = x_1\mathbf{b}_1+\dots+x_n\mathbf{b}_n = x_1 \sum_{i=1}^n a_{i1} \mathbf{c}_i + \dots + x_n\sum_{i=1}^n a_{in} \mathbf{c}_i,

which we can compress even more if we use Sigma notation twice:

\mathbf{v} = \sum_{j=1}^n x_j \sum_{i=1}^n a_{ij} \mathbf{c}_i = \sum_{i=1}^n \left( \sum_{j=1}^n a_{ij}x_j \right)\mathbf{c}_i.

Now, since the representation

\mathbf{v} = \sum_{i=1}^n y_i\mathbf{c}_i

of \mathbf{v} relative to C is unique, we find that

y_1 = \sum_{j=1}^n a_{1j}x_j \\ y_2 = \sum_{j=1}^n a_{2j}x_j \\ \vdots \\ y_n = \sum_{j=1}^n a_{nj}x_j.

This list of n formulas answers our original question: it expresses the “new” coordinates y_1,\dots,y_n in terms of the “old” coordinates x_1,\dots,x_n. A good way to remember these formulas is to rewrite them using the familiar dot product of geometric vectors in \mathbb{R}^n. In terms of the dot product, the above formulas become

y_1 = (a_{11},\dots,a_{1n}) \cdot (x_1,\dots,x_n) \\ y_2 = (a_{21},\dots,a_{2n}) \cdot (x_1,\dots,x_n) \\ \vdots \\ y_n= (a_{n1},\dots,a_{nn}) \cdot (x_1,\dots,x_n).

Usually, this collection of n formulas is packaged as a single matrix equation:

\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} a_{11} & \dots & a_{1n} \\ a_{21} & \dots & a_{2n} \\ \vdots & {} & \vdots \\ a_{n1} & \dots & a_{nn}  \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.

In fact, this process of changing from the old coordinates \mathbf{x}=(x_1,\dots,x_n) of a vector \mathbf{v} relative to the old basis B to the new coordinates \mathbf{y}= (y_1,\dots,y_n) of this same vector relative to the new basis C explains why the product of an n \times n matrix and an n \times 1 matrix is defined in the way that it is: the definition is made so that we can write

\mathbf{y} = A\mathbf{x},

with A the matrix whose (i,j)-entry is a_{ij}.
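The row-by-row description of the matrix product is easy to test by machine. Below is a minimal Python sketch, assuming matrices are stored as lists of rows; the helper names `dot` and `mat_vec` are our own and not part of any library. Each new coordinate y_i is computed as the dot product of row i of A with the list of old coordinates, exactly as in the formulas above.

```python
def dot(u, w):
    """Dot product of two equal-length lists of numbers."""
    if len(u) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(ui * wi for ui, wi in zip(u, w))


def mat_vec(A, x):
    """Multiply an n x n matrix A (a list of rows) by a coordinate list x.

    Entry i of the result is sum_j A[i][j] * x[j], i.e. the dot product of
    row i of A with x.
    """
    return [dot(row, x) for row in A]


# Example: old coordinates x = (3, 2) and a 2 x 2 matrix A.
A = [[1, -1],
     [0, 1]]
x = [3, 2]
print(mat_vec(A, x))  # [1, 2]
```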

Let us summarize the result of the above calculation. We have a vector \mathbf{v} belonging to a finite-dimensional vector space \mathbf{V}, and we have two bases B and C of \mathbf{V}. Let [\mathbf{v}]_B denote the n \times 1 matrix whose entries are the coordinates of \mathbf{v} relative to the old basis B, and let [\mathbf{v}]_C denote the n \times 1 matrix whose entries are the coordinates of this same vector \mathbf{v} relative to the new basis C. We want an equation relating the matrices [\mathbf{v}]_B and [\mathbf{v}]_C. The equation is

[\mathbf{v}]_C = A_{B \to C} [\mathbf{v}]_B,

where

A_{B \to C} = \begin{bmatrix} [\mathbf{b}_1]_C & [\mathbf{b}_2]_C & \dots & [\mathbf{b}_n]_C \end{bmatrix}

is the n \times n “transition matrix” whose jth column is the n \times 1 matrix [\mathbf{b}_j]_C consisting of the coordinates of the old basis vector \mathbf{b}_j relative to the new basis C.
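When \mathbf{V}=\mathbb{R}^n and both bases are given in standard coordinates, this recipe becomes a short computation. The Python sketch below is one way to carry it out under those assumptions. The hypothetical helper `coords` finds the coordinates of a vector relative to a basis by solving y_1\mathbf{c}_1+\dots+y_n\mathbf{c}_n=\mathbf{u} with Gauss-Jordan elimination (a tool not covered in this lecture, used here only to do the arithmetic), `transition_matrix` assembles A_{B \to C} column by column from the vectors [\mathbf{b}_j]_C, and exact `Fraction` arithmetic avoids rounding error. The bases B and C in the usage lines are arbitrary examples.

```python
from fractions import Fraction


def coords(basis, u):
    """Coordinates of u relative to `basis`, a list of n linearly independent
    vectors in R^n given in standard coordinates.

    Solves y_1 c_1 + ... + y_n c_n = u by Gauss-Jordan elimination, using exact
    Fraction arithmetic. Raises StopIteration if `basis` is not a basis.
    """
    n = len(basis)
    # Augmented matrix: column j holds basis[j], the last column holds u.
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(u[i])]
         for i in range(n)]
    for col in range(n):
        # Move a row with a nonzero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        p = M[col][col]
        M[col] = [entry / p for entry in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def transition_matrix(old, new):
    """A_{B->C}: column j is [b_j]_C, the coordinates of the jth old basis
    vector relative to the new basis."""
    columns = [coords(new, b) for b in old]
    n = len(old)
    return [[columns[j][i] for j in range(n)] for i in range(n)]


def mat_vec(A, x):
    """Matrix-vector product: entry i is the dot product of row i of A with x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


B = [(2, 1), (1, 1)]  # "old" basis of R^2 (arbitrary example)
C = [(1, 0), (1, 1)]  # "new" basis of R^2 (arbitrary example)
v = (5, 3)

A = transition_matrix(B, C)                      # equals [[1, 0], [1, 1]]
print(mat_vec(A, coords(B, v)) == coords(C, v))  # True
```

Here [\mathbf{v}]_B=(2,1) and [\mathbf{v}]_C=(2,3), and multiplying the first by A_{B \to C} produces the second, as the formula [\mathbf{v}]_C = A_{B \to C}[\mathbf{v}]_B predicts.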

Let’s look at a two-dimensional example. In \mathbb{R}^2, the standard basis is E=\{\mathbf{e}_1,\mathbf{e}_2\}, where \mathbf{e}_1=(1,0) and \mathbf{e}_2=(0,1). Suppose now that we wish to get creative and write the vectors of \mathbb{R}^2 in terms of the alternative basis F=\{\mathbf{f}_1,\mathbf{f}_2\}, where \mathbf{f}_1=\mathbf{e}_1=(1,0) but \mathbf{f}_2 = (1,1). This corresponds to using coordinate axes which, instead of being a pair of perpendicular lines, are a pair of lines at a 45^\circ angle to one another. Pretty wild. What are the coordinates of a given vector \mathbf{v}=(x_1,x_2) in \mathbb{R}^2 when we use these tilted axes? Let us answer this question using the above recipe. We need to express the vectors \mathbf{e}_1,\mathbf{e}_2 of the old basis E in terms of the vectors \mathbf{f}_1,\mathbf{f}_2 of the new basis F. This is easy: by inspection, we have

\mathbf{e}_1 = \mathbf{f}_1 \\ \mathbf{e}_2= -\mathbf{f}_1+\mathbf{f}_2.

This means that our transition matrix is the 2 \times 2 matrix

A_{E \to F} = \begin{bmatrix} [\mathbf{e}_1]_F & [\mathbf{e}_2]_F \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}.

We conclude that the coordinates of \mathbf{v}=(x_1,x_2) in the new basis F are given by

[\mathbf{v}]_F = A_{E \to F} [\mathbf{v}]_E = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =\begin{bmatrix} x_1-x_2 \\ x_2 \end{bmatrix}.
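A quick numerical check of this example, as a minimal Python sketch: the function names are our own, and \mathbf{v}=(3,5) is an arbitrary test vector. Converting to F-coordinates and then rebuilding (x_1-x_2)\mathbf{f}_1 + x_2\mathbf{f}_2 in standard coordinates should recover the original vector.

```python
# Transition matrix A_{E->F} from the example: columns are [e_1]_F and [e_2]_F.
A_E_TO_F = [[1, -1],
            [0, 1]]
F1, F2 = (1, 0), (1, 1)  # the tilted basis vectors f_1 and f_2


def coords_in_F(x1, x2):
    """Return [v]_F = A_{E->F} [v]_E for v = (x1, x2)."""
    (a, b), (c, d) = A_E_TO_F
    return (a * x1 + b * x2, c * x1 + d * x2)


def from_F(y1, y2):
    """Rebuild v = y1*f_1 + y2*f_2 in standard coordinates."""
    return (y1 * F1[0] + y2 * F2[0], y1 * F1[1] + y2 * F2[1])


v = (3, 5)
y = coords_in_F(*v)
print(y)           # (-2, 5), i.e. (x1 - x2, x2)
print(from_F(*y))  # (3, 5), the original vector
```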

In the course of the above discussion, we have seen that the familiar dot product of geometric vectors is useful in the context of general vector spaces. This raises the question of whether the dot product itself can be generalized. The answer is yes, and the concept which generalizes the dot product by capturing its basic features is the following.

Definition 1: Let \mathbf{V} be a vector space. A scalar product on \mathbf{V} is a function

\langle \cdot, \cdot \rangle \colon \mathbf{V} \times \mathbf{V} \to \mathbb{R}

which satisfies:

  1. For any \mathbf{v}_1,\mathbf{v}_2,\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V} and x_1,x_2,y_1,y_2 \in \mathbb{R}, we have \langle x_1\mathbf{v}_1+x_2\mathbf{v}_2,y_1\mathbf{w}_1+y_2\mathbf{w}_2 \rangle = x_1y_1 \langle \mathbf{v}_1,\mathbf{w}_1 \rangle + x_1y_2\langle \mathbf{v}_1,\mathbf{w}_2 \rangle + x_2y_1\langle \mathbf{v}_2,\mathbf{w}_1 \rangle + x_2y_2 \langle \mathbf{v}_2,\mathbf{w}_2 \rangle.
  2. For any \mathbf{v},\mathbf{w} \in \mathbf{V}, we have \langle \mathbf{v},\mathbf{w} \rangle = \langle \mathbf{w},\mathbf{v} \rangle.
  3. For any \mathbf{v} \in \mathbf{V}, we have \langle \mathbf{v},\mathbf{v} \rangle \geq 0, with equality if and only if \mathbf{v}=\mathbf{0}.
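The axioms can be spot-checked by computer for a candidate scalar product. As an illustration of our own choosing (not taken from the lecture), the Python sketch below tests the weighted product \langle \mathbf{v},\mathbf{w} \rangle = 2p_1q_1 + p_2q_2 + 3p_3q_3 on \mathbb{R}^3, where \mathbf{v}=(p_1,p_2,p_3) and \mathbf{w}=(q_1,q_2,q_3), against all three axioms using random integer vectors and scalars. The weights and helper names are hypothetical, and passing such a test is evidence, not a proof.

```python
import random

WEIGHTS = (2, 1, 3)  # hypothetical positive weights, chosen for illustration


def weighted_dot(v, w):
    """<v, w> = 2 p1 q1 + p2 q2 + 3 p3 q3 for v = (p1, p2, p3), w = (q1, q2, q3)."""
    return sum(c * p * q for c, p, q in zip(WEIGHTS, v, w))


def lin_comb(s, v, t, w):
    """Return the vector s*v + t*w, computed componentwise."""
    return tuple(s * p + t * q for p, q in zip(v, w))


def check_axioms(ip, trials=1000, seed=0):
    """Spot-check the three axioms of Definition 1 for `ip` on random integer
    vectors in R^3 and random integer scalars. Returns False at the first
    violation found; True means no violation was found (evidence, not proof)."""
    rng = random.Random(seed)

    def rand_vec():
        return tuple(rng.randint(-5, 5) for _ in range(3))

    def rand_num():
        return rng.randint(-5, 5)

    for _ in range(trials):
        v1, v2, w1, w2 = rand_vec(), rand_vec(), rand_vec(), rand_vec()
        x1, x2, y1, y2 = rand_num(), rand_num(), rand_num(), rand_num()
        # Axiom 1 (bilinearity): the FOIL expansion must hold exactly.
        lhs = ip(lin_comb(x1, v1, x2, v2), lin_comb(y1, w1, y2, w2))
        rhs = (x1 * y1 * ip(v1, w1) + x1 * y2 * ip(v1, w2)
               + x2 * y1 * ip(v2, w1) + x2 * y2 * ip(v2, w2))
        if lhs != rhs:
            return False
        # Axiom 2 (symmetry).
        if ip(v1, w1) != ip(w1, v1):
            return False
        # Axiom 3 (positive definiteness): <v, v> >= 0, and = 0 only for v = 0.
        q = ip(v1, v1)
        if q < 0 or (q == 0) != all(p == 0 for p in v1):
            return False
    return True


print(check_axioms(weighted_dot))  # True
```

Integer inputs keep every comparison exact, so a reported violation would be a genuine one rather than a rounding artifact.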

Let us consider why the operation introduced in Definition 1 is called a “scalar product.” First, it’s called a “product” because it takes two vectors \mathbf{v},\mathbf{w} and produces from them the new entity \langle \mathbf{v},\mathbf{w} \rangle. Second, this new entity is not a vector, but a scalar; hence, \langle \mathbf{v},\mathbf{w} \rangle is the “scalar product” of \mathbf{v} and \mathbf{w}. What about the axioms? These are obtained by extracting the basic features of the dot product of geometric vectors: it is “bilinear,” which means that one has the usual FOIL identity

(x_1\mathbf{v}_1+x_2\mathbf{v}_2) \cdot (y_1\mathbf{w}_1+y_2\mathbf{w}_2)= x_1y_1  \mathbf{v}_1 \cdot \mathbf{w}_1 + x_1y_2\mathbf{v}_1 \cdot \mathbf{w}_2+ x_2y_1\mathbf{v}_2 \cdot \mathbf{w}_1 + x_2y_2 \mathbf{v}_2 \cdot \mathbf{w}_2

for expanding brackets; it is “symmetric,” in the sense that

\mathbf{v} \cdot \mathbf{w} = \mathbf{w} \cdot \mathbf{v},

and it is “positive definite,” meaning that

\mathbf{v} \cdot \mathbf{v} \geq 0,

with equality if and only if \mathbf{v} is the zero vector. Definition 1 takes these properties and lifts them to the setting of an abstract vector space to form the scalar product concept, of which the dot product becomes a special case.

Definition 2: A pair (\mathbf{V},\langle \cdot, \cdot \rangle) consisting of a vector space together with a scalar product is called a Euclidean space.

Why is a vector space equipped with a scalar product called a Euclidean space? In the familiar vector space \mathbb{R}^n, the basic notions of Euclidean geometry — length and angle — can be expressed algebraically, in terms of the dot product. More precisely, the length of a vector \mathbf{v} = (x_1,\dots,x_n) \in \mathbb{R}^n is given by

|\mathbf{v}| = \sqrt{x_1^2 + \dots + x_n^2} = \sqrt{\mathbf{v} \cdot \mathbf{v}},

where \sqrt{x} denotes the nonnegative square root of a nonnegative real number, and the angle \theta between two vectors \mathbf{v}=(x_1,\dots,x_n) and \mathbf{w}=(y_1,\dots,y_n) is related to the dot product via

\mathbf{v} \cdot \mathbf{w} = |\mathbf{v}| |\mathbf{w}| \cos \theta.

We can mimic these algebraic formulas to define the concepts of length and angle in an abstract Euclidean space (\mathbf{V},\langle \cdot, \cdot \rangle) — we define the length \|\mathbf{v}\| of a vector \mathbf{v} \in \mathbf{V} by the formula

\|\mathbf{v}\| = \sqrt{\langle \mathbf{v},\mathbf{v} \rangle},

and we define the angle between two nonzero vectors \mathbf{v},\mathbf{w} \in \mathbf{V} to be the number \theta \in [0,\pi] determined by the formula

\langle \mathbf{v},\mathbf{w} \rangle = \|\mathbf{v}\| \|\mathbf{w}\| \cos \theta.
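These two definitions translate directly into code. In the minimal Python sketch below, the scalar product defaults to the dot product on \mathbb{R}^n, but any function satisfying Definition 1 could be passed as `ip`; the helper names are our own. Python's `math.acos` returns a value in [0,\pi], the interval on which cosine is one-to-one, and the clamping step guards against floating-point rounding pushing the ratio just outside [-1,1].

```python
import math


def dot(v, w):
    """Standard dot product on R^n."""
    return sum(p * q for p, q in zip(v, w))


def norm(v, ip=dot):
    """||v|| = sqrt(<v, v>)."""
    return math.sqrt(ip(v, v))


def angle(v, w, ip=dot):
    """The angle theta in [0, pi] with <v, w> = ||v|| ||w|| cos(theta),
    for nonzero vectors v and w."""
    ratio = ip(v, w) / (norm(v, ip) * norm(w, ip))
    # Clamp: rounding can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.acos(ratio)


print(norm((3, 4)))                         # 5.0
print(math.degrees(angle((1, 0), (1, 1))))  # approximately 45
```

The second call recovers the 45^\circ angle between the tilted axes \mathbf{f}_1=(1,0) and \mathbf{f}_2=(1,1) from the earlier example.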

Let us examine these definitions more carefully. First, the quantity \|\mathbf{v}\| is usually called the “norm” of \mathbf{v}, to distinguish it from the geometric notion of length that it generalizes. If the norm is a good generalization of geometric length, then it should share the main properties of the original concept; in particular, it should be nonnegative, and the only vector of norm zero should be the zero vector. In order for these properties to hold in every possible Euclidean space, we must be able to deduce them solely from the axioms defining the scalar product.

Proposition 1: Let (\mathbf{V},\langle \cdot, \cdot \rangle) be a Euclidean space. For any vector \mathbf{v} \in \mathbf{V}, we have \|\mathbf{v}\| \geq 0, and equality holds if and only if \mathbf{v}=\mathbf{0}.

Proof: From the definition of vector norm and the third scalar product axiom (positive definiteness), we have that

\|\mathbf{v}\| = \sqrt{\langle \mathbf{v},\mathbf{v} \rangle}

is the square root of a nonnegative number, and hence is itself nonnegative. Moreover, in order for \sqrt{x}=0 to hold for a nonnegative real number x, it must be the case that x=0, and by the third scalar product axiom we have \langle \mathbf{v},\mathbf{v} \rangle=0 if and only if \mathbf{v}=\mathbf{0}. — Q.E.D.

Now we consider the algebraic definition of the angle \theta between two vectors \mathbf{v},\mathbf{w} in a Euclidean space (\mathbf{V},\langle \cdot,\cdot \rangle). As you are aware, for any number \theta \in \mathbb{R} we have -1 \leq \cos \theta \leq 1. Thus, for our definition of angle to be valid, we need the following proposition — which is known as the Cauchy-Schwarz inequality — to follow from the scalar product axioms.

Proposition 2: Let (\mathbf{V},\langle \cdot, \cdot \rangle) be a Euclidean space. For any nonzero vectors \mathbf{v},\mathbf{w} \in \mathbf{V}, we have

-1 \leq \frac{\langle \mathbf{v},\mathbf{w} \rangle}{\|\mathbf{v}\| \|\mathbf{w}\|} \leq 1.

Proof: We begin by noting that the claimed double inequality is equivalent to the single inequality

\frac{\langle \mathbf{v},\mathbf{w} \rangle^2}{\|\mathbf{v}\|^2 \|\mathbf{w}\|^2} \leq 1,

which is in turn equivalent to

\langle \mathbf{v},\mathbf{w} \rangle^2 \leq \|\mathbf{v}\|^2 \|\mathbf{w}\|^2.

We will prove that this third form of the claimed inequality is true.

Let \mathbf{v},\mathbf{w} be any two vectors in \mathbf{V}. If either \mathbf{v} or \mathbf{w} is the zero vector, then both sides of the above inequality are zero: the left side by bilinearity (for example, \langle \mathbf{0},\mathbf{w} \rangle = \langle 0\mathbf{0},\mathbf{w} \rangle = 0\langle \mathbf{0},\mathbf{w} \rangle = 0), and the right side by Proposition 1. So we get the true statement 0 \leq 0. It remains to prove the inequality in the case that neither \mathbf{v} nor \mathbf{w} is the zero vector.

Consider the function f(x) of a variable x defined by

f(x) = \langle \mathbf{v}-x\mathbf{w},\mathbf{v}-x\mathbf{w} \rangle.

We can expand this using the first scalar product axiom (bilinearity), and we get

f(x) = \langle \mathbf{v},\mathbf{v} \rangle -x\langle \mathbf{v},\mathbf{w} \rangle - x \langle \mathbf{w},\mathbf{v} \rangle + x^2\langle \mathbf{w},\mathbf{w} \rangle.

Using the second scalar product axiom (symmetry), this simplifies to

f(x) = \langle \mathbf{v},\mathbf{v} \rangle -2x\langle \mathbf{v},\mathbf{w} \rangle + x^2\langle \mathbf{w},\mathbf{w} \rangle.

We see that the function f(x) is a polynomial of degree two, i.e. it has the form

f(x) = ax^2 + bx + c,

where

a=\langle \mathbf{w},\mathbf{w} \rangle,\ b = -2\langle \mathbf{v},\mathbf{w}\rangle,\ c = \langle \mathbf{v},\mathbf{v} \rangle.

Note that we can be sure a > 0, because \mathbf{w} \neq \mathbf{0} and the scalar product is positive definite. Thus the graph of the function f(x) is an upward-opening parabola. Moreover, since

f(x) = \langle \mathbf{v}-x\mathbf{w},\mathbf{v}-x\mathbf{w} \rangle \geq 0,

this parabola either lies strictly above the horizontal axis, or is tangent to it. Equivalently, the quadratic equation

ax^2 + bx+ c=0

has either no real roots (parabola strictly above the horizontal axis) or a single repeated real root (parabola tangent to the horizontal axis); in particular, it cannot have two distinct real roots. We can distinguish between these cases using the discriminant of this quadratic equation, i.e. the number

b^2-4ac,

which appears under the square root in the familiar quadratic formula

x= \frac{-b \pm \sqrt{b^2-4ac}}{2a}.

More precisely, if the discriminant is negative then the quadratic equation has no real solutions, and if it is zero then the equation has exactly one real solution. In the case b^2-4ac < 0, we get

4\langle \mathbf{v},\mathbf{w} \rangle^2 < 4\langle \mathbf{w},\mathbf{w} \rangle \langle \mathbf{v},\mathbf{v} \rangle,

which gives us the inequality

\langle \mathbf{v},\mathbf{w} \rangle^2 < \|\mathbf{v}\|^2 \|\mathbf{w}\|^2,

which verifies the inequality we’re trying to prove in this case. In the case b^2-4ac = 0, we get instead

\langle \mathbf{v},\mathbf{w} \rangle^2 = \|\mathbf{v}\|^2 \|\mathbf{w}\|^2.

So, in all cases the claimed inequality

\langle \mathbf{v},\mathbf{w} \rangle^2 \leq \|\mathbf{v}\|^2 \|\mathbf{w}\|^2

holds true. — Q.E.D.
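The discriminant argument can be checked on concrete vectors. The Python sketch below (helper names our own, standard dot product on \mathbb{R}^n assumed) computes the coefficients a,b,c of f(x) and the value of f at the vertex x=-b/(2a)=\langle \mathbf{v},\mathbf{w} \rangle/\langle \mathbf{w},\mathbf{w} \rangle of the parabola. Completing the square shows that this minimum value equals -(b^2-4ac)/(4a), so "f(x) \geq 0 for all x" is the same statement as b^2-4ac \leq 0 when a>0.

```python
from fractions import Fraction


def dot(v, w):
    """Standard dot product on R^n."""
    return sum(p * q for p, q in zip(v, w))


def discriminant_and_min(v, w, ip=dot):
    """For nonzero w, write f(x) = <v - x w, v - x w> = a x^2 + b x + c.

    Returns the discriminant b^2 - 4ac and the minimum value of f, attained at
    the vertex x = -b / (2a). Integer-valued scalar products are assumed so
    that Fraction keeps the arithmetic exact.
    """
    a, b, c = ip(w, w), -2 * ip(v, w), ip(v, v)
    if a <= 0:
        raise ValueError("w must be nonzero")
    x_min = Fraction(-b, 2 * a)  # vertex of the upward-opening parabola
    f_min = a * x_min**2 + b * x_min + c
    return b * b - 4 * a * c, f_min


print(discriminant_and_min((1, 2), (3, 4)))  # (-16, Fraction(4, 25))
print(discriminant_and_min((2, 4), (1, 2)))  # (0, Fraction(0, 1))
```

The first pair is not parallel, so the discriminant is negative and the parabola stays strictly above the axis; in the second pair \mathbf{v}=2\mathbf{w}, the discriminant is zero, and f vanishes at its vertex x=2.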

The upshot of the above discussion is that the concepts of length and angle are now well-defined in the setting of a general Euclidean space (\mathbf{V},\langle \cdot,\cdot \rangle). So, even though the vectors in such a space need not be geometric vectors, we can use geometric intuition and analogies when thinking about them. A simple example is the following natural proposition, which generalizes the fact that a pair of nonzero geometric vectors are linearly dependent if and only if they point in the same direction or opposite directions.

Proposition 3: Let (\mathbf{V},\langle \cdot,\cdot \rangle) be a Euclidean space, and let \mathbf{v},\mathbf{w} be nonzero vectors in \mathbf{V}. The set \{\mathbf{v},\mathbf{w}\} is linearly dependent if and only if the angle between \mathbf{v} and \mathbf{w} is 0 or \pi.

Proof: You will prove on Assignment 2 that equality holds in the Cauchy-Schwarz inequality if and only if the vectors involved are linearly dependent. Thus, Proposition 3 is equivalent to the statement that any two nonzero vectors \mathbf{v},\mathbf{w} \in \mathbf{V} satisfy the equation

\langle \mathbf{v},\mathbf{w} \rangle^2 = \|\mathbf{v}\|^2 \|\mathbf{w}\|^2

if and only if the angle between them is 0 or \pi. Let us prove this statement.

By definition of the angle between two vectors in a Euclidean space, the above equation is equivalent to

\|\mathbf{v}\|^2\|\mathbf{w}\|^2 \cos^2 \theta = \|\mathbf{v}\|^2\|\mathbf{w}\|^2,

and dividing both sides by the nonzero number \|\mathbf{v}\|^2\|\mathbf{w}\|^2 this becomes

\cos^2 \theta = 1,

which, for \theta in the range allowed by the definition of angle, holds if and only if \theta is 0 or \pi. — Q.E.D.
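Proposition 3 can also be illustrated numerically. The Python sketch below (helper names our own; integer entries assumed so that all comparisons are exact) tests, for a few pairs of nonzero vectors in \mathbb{R}^2, whether equality holds in the Cauchy-Schwarz inequality and whether \mathbf{w} is a scalar multiple of \mathbf{v}; for nonzero vectors the latter is the same as \{\mathbf{v},\mathbf{w}\} being linearly dependent.

```python
from fractions import Fraction


def dot(v, w):
    """Standard dot product on R^n."""
    return sum(p * q for p, q in zip(v, w))


def cs_equality(v, w):
    """True when <v, w>^2 == ||v||^2 ||w||^2, i.e. when cos^2(theta) == 1."""
    return dot(v, w) ** 2 == dot(v, v) * dot(w, w)


def is_multiple(v, w):
    """For nonzero v with integer entries, True when w = t v for some scalar t.
    For nonzero v and w this is the same as {v, w} being linearly dependent."""
    i = next(k for k, p in enumerate(v) if p != 0)  # a coordinate where v is nonzero
    t = Fraction(w[i], v[i])                         # the only possible multiplier
    return all(t * p == q for p, q in zip(v, w))


pairs = [((1, 2), (3, 6)), ((1, 2), (-2, -4)), ((1, 2), (2, 1))]
for v, w in pairs:
    print(v, w, cs_equality(v, w), is_multiple(v, w))
# (1, 2) (3, 6) True True
# (1, 2) (-2, -4) True True
# (1, 2) (2, 1) False False
```

In the first pair \langle \mathbf{v},\mathbf{w} \rangle > 0, so \cos \theta = 1 and \theta = 0; in the second it is negative, so \theta = \pi; the third pair is independent and the angle lies strictly between 0 and \pi.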

In Lecture 5, we will consider further ramifications of geometrical thinking in vector spaces.
