Let $V$ be a vector space. In Lecture 2, we proved that $V$ is $n$-dimensional if and only if every basis in $V$ consists of $n$ vectors. Suppose that $A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\}$ is a basis of $V$. Then, every vector $\mathbf{v} \in V$ can be represented as a linear combination of vectors in $A$,

$$\mathbf{v} = x_1\mathbf{a}_1 + \dots + x_n\mathbf{a}_n,$$

and this representation is unique. A natural question is then the following: if $B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\}$ is a second basis of $V$ and

$$\mathbf{v} = y_1\mathbf{b}_1 + \dots + y_n\mathbf{b}_n$$

is the representation of $\mathbf{v}$ as a linear combination of vectors in $B$, what is the relationship between the numbers $x_1,\dots,x_n$ and the numbers $y_1,\dots,y_n$? Since these two lists of numbers are the coordinates of the same vector but with respect to (possibly) different bases, it is reasonable to expect that they should be related to one another in a structured way. We begin this lecture by working out this relationship precisely.
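We follow a strategy which would be acceptable to Marie Kondo: out with the old, in with the new. Let us call $A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\}$ the “old” basis, and $B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\}$ the “new” basis. Let us do away with the old basis vectors by expressing them in terms of the new basis, writing

$$\begin{aligned} \mathbf{a}_1 &= p_{11}\mathbf{b}_1 + p_{21}\mathbf{b}_2 + \dots + p_{n1}\mathbf{b}_n \\ \mathbf{a}_2 &= p_{12}\mathbf{b}_1 + p_{22}\mathbf{b}_2 + \dots + p_{n2}\mathbf{b}_n \\ &\;\;\vdots \\ \mathbf{a}_n &= p_{1n}\mathbf{b}_1 + p_{2n}\mathbf{b}_2 + \dots + p_{nn}\mathbf{b}_n, \end{aligned}$$

where, for each $1 \leq j \leq n$,

$$(p_{1j}, p_{2j}, \dots, p_{nj})$$

is the coordinate vector of the old basis vector $\mathbf{a}_j$ relative to the new basis $B$.

We now return to the first equation above, $\mathbf{v} = x_1\mathbf{a}_1 + \dots + x_n\mathbf{a}_n$, which expresses our chosen vector $\mathbf{v}$ in terms of the old basis. Replacing the vectors $\mathbf{a}_1,\dots,\mathbf{a}_n$ of the old basis with their representations relative to the new basis, we have

$$\mathbf{v} = x_1\sum_{i=1}^n p_{i1}\mathbf{b}_i + x_2\sum_{i=1}^n p_{i2}\mathbf{b}_i + \dots + x_n\sum_{i=1}^n p_{in}\mathbf{b}_i,$$

which we can compress even more if we use Sigma notation twice:

$$\mathbf{v} = \sum_{j=1}^n x_j \sum_{i=1}^n p_{ij}\mathbf{b}_i = \sum_{i=1}^n \left( \sum_{j=1}^n p_{ij}x_j \right) \mathbf{b}_i.$$

Now, since the representation

$$\mathbf{v} = y_1\mathbf{b}_1 + \dots + y_n\mathbf{b}_n$$

of $\mathbf{v}$ relative to $B$ is unique, we find that

$$\begin{aligned} y_1 &= p_{11}x_1 + p_{12}x_2 + \dots + p_{1n}x_n \\ y_2 &= p_{21}x_1 + p_{22}x_2 + \dots + p_{2n}x_n \\ &\;\;\vdots \\ y_n &= p_{n1}x_1 + p_{n2}x_2 + \dots + p_{nn}x_n. \end{aligned}$$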
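This list of formulas answers our original question: it expresses the “new” coordinates $y_1,\dots,y_n$ in terms of the “old” coordinates $x_1,\dots,x_n$. A good way to remember these formulas is to rewrite them using the familiar dot product of geometric vectors in $\mathbb{R}^n$. In terms of the dot product, the above formulas become

$$y_i = (p_{i1},\dots,p_{in}) \cdot (x_1,\dots,x_n), \quad 1 \leq i \leq n.$$

Usually, this collection of formulas is packaged as a single matrix equation:

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} (p_{11},\dots,p_{1n}) \cdot (x_1,\dots,x_n) \\ \vdots \\ (p_{n1},\dots,p_{nn}) \cdot (x_1,\dots,x_n) \end{bmatrix}.$$

In fact, this process of changing from the old coordinates of a vector relative to the old basis to the new coordinates of this same vector relative to the new basis explains why the product of an $n \times n$ matrix and an $n \times 1$ matrix is defined in the way that it is: the definition is made so that we can write

$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = P \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix},$$

with

$$P = \begin{bmatrix} p_{11} & \dots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \dots & p_{nn} \end{bmatrix}$$

the matrix whose $(i,j)$-entry is $p_{ij}$, the $i$th coordinate of the old basis vector $\mathbf{a}_j$ relative to the new basis.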
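Let us summarize the result of the above calculation. We have a vector $\mathbf{v}$ belonging to a finite-dimensional vector space $V$, and we have two bases $A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\}$ and $B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\}$ of $V$. Let $[\mathbf{v}]_A$ be the $n \times 1$ matrix whose entries are the coordinates of $\mathbf{v}$ relative to the old basis $A$, and let $[\mathbf{v}]_B$ denote the $n \times 1$ matrix whose entries are the coordinates of this same vector relative to the new basis $B$. We want to write down an equation which relates the matrices $[\mathbf{v}]_A$ and $[\mathbf{v}]_B$. The equation is

$$[\mathbf{v}]_B = P_{A \to B}\,[\mathbf{v}]_A,$$

where $P_{A \to B}$ is the “transition matrix” whose $j$th column is the $n \times 1$ matrix $[\mathbf{a}_j]_B$ consisting of the coordinates of the old basis vector $\mathbf{a}_j$ relative to the new basis $B$.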
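The recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `new_coordinates` and the sample $2 \times 2$ matrix are hypothetical, and matrices are stored as plain lists of rows. Each new coordinate is computed as the dot product of one row of the transition matrix with the list of old coordinates.

```python
def new_coordinates(P, x):
    """Return the new coordinates y, where y[i] = sum over j of P[i][j] * x[j].

    P is the n x n transition matrix stored as a list of rows; column j of P
    holds the coordinates of the j-th old basis vector in the new basis.
    x is the list of old coordinates.
    """
    n = len(x)
    if len(P) != n or any(len(row) != n for row in P):
        raise ValueError("P must be an n x n matrix matching the length of x")
    # Each new coordinate is the dot product of a row of P with x.
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in P]


# Hypothetical transition matrix, chosen only for illustration.
P = [[1, -1],
     [0, 1]]
x_old = [3, 2]
y_new = new_coordinates(P, x_old)  # rows dotted with x: [1*3 - 1*2, 0*3 + 1*2]
```

Storing the matrix by rows makes the "row dot column" description of the product explicit; a numerical library would do the same computation with a single matrix-vector product.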
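Let’s look at a two-dimensional example. In $\mathbb{R}^2$ the standard basis is $E=\{\mathbf{e}_1,\mathbf{e}_2\}$, where $\mathbf{e}_1=(1,0)$ and $\mathbf{e}_2=(0,1)$. Suppose now that we wish to get creative and write the vectors of $\mathbb{R}^2$ in terms of the alternative basis $F=\{\mathbf{f}_1,\mathbf{f}_2\}$, where $\mathbf{f}_1=\mathbf{e}_1$ but $\mathbf{f}_2=(1,1)$. This corresponds to using coordinate axes which, instead of being a pair of perpendicular lines, are a pair of lines at a $45^\circ$ angle to one another, which is pretty wild. What are the coordinates of a given vector $\mathbf{v}=(x_1,x_2)$ in $\mathbb{R}^2$ when we use these tilted axes? Let us answer this question using the above recipe. We need to express the vectors of the old basis $E$ in terms of the vectors of the new basis $F$. This is easy: by inspection, we have

$$\mathbf{e}_1 = \mathbf{f}_1, \qquad \mathbf{e}_2 = -\mathbf{f}_1 + \mathbf{f}_2.$$

This means that our transition matrix is the matrix

$$\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}.$$

We conclude that the coordinates of $\mathbf{v}=(x_1,x_2)$ in the new basis $F$ are given by

$$\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 - x_2 \\ x_2 \end{bmatrix}.$$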
In the course of the above discussion, we have seen that the familiar dot product of geometric vectors is useful in the context of general vector spaces. This raises the question of whether the dot product itself can be generalized. The answer is yes, and the concept which generalizes the dot product by capturing its basic features is the following.
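Definition 1: Let $V$ be a vector space. A scalar product on $V$ is a function $\langle \cdot,\cdot \rangle \colon V \times V \to \mathbb{R}$ satisfying the following axioms.
- For any $\mathbf{v}_1,\mathbf{v}_2,\mathbf{w}_1,\mathbf{w}_2 \in V$ and $\alpha_1,\alpha_2,\beta_1,\beta_2 \in \mathbb{R}$ we have $\langle \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2, \beta_1\mathbf{w}_1 + \beta_2\mathbf{w}_2 \rangle = \alpha_1\beta_1\langle \mathbf{v}_1,\mathbf{w}_1 \rangle + \alpha_1\beta_2\langle \mathbf{v}_1,\mathbf{w}_2 \rangle + \alpha_2\beta_1\langle \mathbf{v}_2,\mathbf{w}_1 \rangle + \alpha_2\beta_2\langle \mathbf{v}_2,\mathbf{w}_2 \rangle.$
- For any $\mathbf{v},\mathbf{w} \in V$ we have $\langle \mathbf{v},\mathbf{w} \rangle = \langle \mathbf{w},\mathbf{v} \rangle.$
- For any $\mathbf{v} \in V$ we have $\langle \mathbf{v},\mathbf{v} \rangle \geq 0$, with equality if and only if $\mathbf{v}=\mathbf{0}.$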
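Let us consider why the operation introduced in Definition 1 is called a “scalar product.” First, it’s called a “product” because it takes two vectors $\mathbf{v},\mathbf{w}$ and produces from them the new entity $\langle \mathbf{v},\mathbf{w} \rangle$. Second, this new entity is not a vector, but a scalar; hence, $\langle \mathbf{v},\mathbf{w} \rangle$ is the “scalar product” of $\mathbf{v}$ and $\mathbf{w}$. What about the axioms? These are obtained by extracting the basic features of the dot product of geometric vectors: it is “bilinear,” which means that one has the usual FOIL identity

$$(\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2) \cdot (\beta_1\mathbf{w}_1 + \beta_2\mathbf{w}_2) = \alpha_1\beta_1\,\mathbf{v}_1 \cdot \mathbf{w}_1 + \alpha_1\beta_2\,\mathbf{v}_1 \cdot \mathbf{w}_2 + \alpha_2\beta_1\,\mathbf{v}_2 \cdot \mathbf{w}_1 + \alpha_2\beta_2\,\mathbf{v}_2 \cdot \mathbf{w}_2$$

for expanding brackets; it is “symmetric,” in the sense that

$$\mathbf{v} \cdot \mathbf{w} = \mathbf{w} \cdot \mathbf{v};$$

and it is “positive definite,” meaning that

$$\mathbf{v} \cdot \mathbf{v} \geq 0,$$

with equality if and only if $\mathbf{v}$ is the zero vector. Definition 1 takes these properties and lifts them to the setting of an abstract vector space to form the scalar product concept, of which the dot product becomes a special case.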
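As a quick sanity check, the three properties of the dot product named above (bilinear, symmetric, positive definite) can be spot-checked numerically. The sketch below is not a proof: it only tests the identities on a few sample vectors and scalars, all of which are arbitrary choices made for illustration, and the helper names `dot` and `combo` are hypothetical.

```python
def dot(v, w):
    """Dot product of two vectors given as equal-length lists of numbers."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(vi * wi for vi, wi in zip(v, w))


def combo(s1, v1, s2, v2):
    """Linear combination s1*v1 + s2*v2, computed entry by entry."""
    return [s1 * x + s2 * y for x, y in zip(v1, v2)]


# Arbitrary sample data (integers, so all arithmetic is exact).
v1, v2 = [1, 2, 3], [0, -1, 4]
w1, w2 = [2, 0, 1], [5, 1, -2]
a1, a2, b1, b2 = 2, -3, 4, 1

# Bilinearity: expand (a1*v1 + a2*v2) . (b1*w1 + b2*w2) by FOIL.
lhs = dot(combo(a1, v1, a2, v2), combo(b1, w1, b2, w2))
rhs = (a1 * b1 * dot(v1, w1) + a1 * b2 * dot(v1, w2)
       + a2 * b1 * dot(v2, w1) + a2 * b2 * dot(v2, w2))
bilinear_ok = (lhs == rhs)

# Symmetry: v . w equals w . v.
symmetric_ok = (dot(v1, w1) == dot(w1, v1))

# Positive definiteness: v . v > 0 for a nonzero v, and 0 . 0 = 0.
positive_ok = (dot(v1, v1) > 0) and (dot([0, 0, 0], [0, 0, 0]) == 0)
```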
Definition 2: A pair $(V,\langle \cdot,\cdot \rangle)$ consisting of a vector space $V$ together with a scalar product $\langle \cdot,\cdot \rangle$ on $V$ is called a Euclidean space.
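Why is a vector space equipped with a scalar product called a Euclidean space? In the familiar vector space $\mathbb{R}^n$, the basic notions of Euclidean geometry, length and angle, can be expressed algebraically in terms of the dot product. More precisely, the length of a vector $\mathbf{v}$ is given by

$$|\mathbf{v}| = \sqrt{\mathbf{v} \cdot \mathbf{v}},$$

where $\sqrt{x}$ denotes the nonnegative square root of a nonnegative real number $x$, and the angle $\theta$ between two vectors $\mathbf{v}$ and $\mathbf{w}$ is related to the dot product via

$$\mathbf{v} \cdot \mathbf{w} = |\mathbf{v}||\mathbf{w}|\cos\theta.$$

We can mimic these algebraic formulas to define the concepts of length and angle in an abstract Euclidean space $(V,\langle \cdot,\cdot \rangle)$: we define the length of a vector $\mathbf{v} \in V$ by the formula

$$\|\mathbf{v}\| = \sqrt{\langle \mathbf{v},\mathbf{v} \rangle},$$

and we define the angle between two nonzero vectors $\mathbf{v},\mathbf{w} \in V$ to be the number $\theta \in [0,\pi]$ determined by the formula

$$\langle \mathbf{v},\mathbf{w} \rangle = \|\mathbf{v}\|\|\mathbf{w}\|\cos\theta.$$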
Let us examine these definitions more carefully. First, the quantity $\|\mathbf{v}\|$ which generalizes the length of a geometric vector is usually called the “norm” of $\mathbf{v}$, in order to distinguish it from the original notion of length, which it generalizes. If the vector norm is a good generalization of geometric length, then it should have some of the main properties of the original concept; in particular, it should be nonnegative, and the only vector of length zero should be the zero vector. In order for these properties to hold in every possible Euclidean space, we must be able to deduce them solely from the axioms defining the scalar product.
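Proposition 1: Let $(V,\langle \cdot,\cdot \rangle)$ be a Euclidean space. For any vector $\mathbf{v} \in V$, we have $\|\mathbf{v}\| \geq 0$, and equality holds if and only if $\mathbf{v}=\mathbf{0}.$

Proof: From the definition of vector norm and the third scalar product axiom (positive definiteness), we have that

$$\|\mathbf{v}\| = \sqrt{\langle \mathbf{v},\mathbf{v} \rangle}$$

is the square root of a nonnegative number, and hence is itself nonnegative. Moreover, in order for $\sqrt{x}=0$ to hold for a nonnegative real number $x$ it must be the case that $x=0$, and from the same axiom we have $\langle \mathbf{v},\mathbf{v} \rangle = 0$ if and only if $\mathbf{v}=\mathbf{0}$. Q.E.D.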
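Now we consider the algebraic definition of the angle between two vectors $\mathbf{v},\mathbf{w}$ in a Euclidean space $(V,\langle \cdot,\cdot \rangle)$. As you are aware, for any number $\theta$ we have $-1 \leq \cos\theta \leq 1$. Thus, for our definition of angle to be valid, we need the following proposition, which is known as the Cauchy-Schwarz inequality, to follow from the scalar product axioms.

Proposition 2: Let $(V,\langle \cdot,\cdot \rangle)$ be a Euclidean space. For any $\mathbf{v},\mathbf{w} \in V$ we have

$$-\|\mathbf{v}\|\|\mathbf{w}\| \leq \langle \mathbf{v},\mathbf{w} \rangle \leq \|\mathbf{v}\|\|\mathbf{w}\|.$$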
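Proof: We begin by noting that the claimed double inequality is equivalent to the single inequality

$$|\langle \mathbf{v},\mathbf{w} \rangle| \leq \|\mathbf{v}\|\|\mathbf{w}\|,$$

which is in turn equivalent to

$$\langle \mathbf{v},\mathbf{w} \rangle^2 \leq \langle \mathbf{v},\mathbf{v} \rangle \langle \mathbf{w},\mathbf{w} \rangle.$$

We will prove that this third form of the claimed inequality is true.

Let $\mathbf{v},\mathbf{w}$ be any two vectors in $V$. If either of $\mathbf{v}$ or $\mathbf{w}$ is the zero vector, then the left side of the above inequality is zero by the first scalar product axiom (bilinearity) and the right side is zero by the third scalar product axiom (positive definiteness), so we get the true expression $0 \leq 0$. It remains to prove the inequality in the case that neither $\mathbf{v}$ nor $\mathbf{w}$ is the zero vector.

Consider the function $f(t)$ of a variable $t \in \mathbb{R}$ defined by

$$f(t) = \langle \mathbf{v}+t\mathbf{w}, \mathbf{v}+t\mathbf{w} \rangle.$$

We can expand this using the first scalar product axiom (bilinearity), and we get

$$f(t) = \langle \mathbf{v},\mathbf{v} \rangle + t\langle \mathbf{v},\mathbf{w} \rangle + t\langle \mathbf{w},\mathbf{v} \rangle + t^2\langle \mathbf{w},\mathbf{w} \rangle.$$

Using the second scalar product axiom (symmetry), this simplifies to

$$f(t) = \langle \mathbf{v},\mathbf{v} \rangle + 2t\langle \mathbf{v},\mathbf{w} \rangle + t^2\langle \mathbf{w},\mathbf{w} \rangle.$$

We see that the function $f(t)$ is a polynomial of degree two, i.e. it has the form

$$f(t) = at^2 + bt + c, \qquad a = \langle \mathbf{w},\mathbf{w} \rangle, \quad b = 2\langle \mathbf{v},\mathbf{w} \rangle, \quad c = \langle \mathbf{v},\mathbf{v} \rangle.$$

Note that we can be sure $a > 0$ because $\mathbf{w} \neq \mathbf{0}$. Thus the graph of the function $f(t)$ is an upward-opening parabola. Moreover, since

$$f(t) = \langle \mathbf{v}+t\mathbf{w}, \mathbf{v}+t\mathbf{w} \rangle \geq 0 \quad \text{for all } t \in \mathbb{R},$$

this parabola either lies strictly above the horizontal axis, or is tangent to it. Equivalently, the quadratic equation

$$at^2 + bt + c = 0$$

has either no real roots (parabola strictly above the horizontal axis), or two identical real roots (parabola tangent to the horizontal axis). We can differentiate between the two cases using the discriminant of this quadratic equation, i.e. the number

$$b^2 - 4ac,$$

which is the quantity under the square root in the familiar quadratic formula

$$t = \frac{-b \pm \sqrt{b^2-4ac}}{2a}.$$

More precisely, if the discriminant is negative the corresponding quadratic equation has no real solutions, and if it is zero then the equation has a unique solution. In the case $b^2 - 4ac < 0$ we get

$$4\langle \mathbf{v},\mathbf{w} \rangle^2 - 4\langle \mathbf{w},\mathbf{w} \rangle \langle \mathbf{v},\mathbf{v} \rangle < 0,$$

which gives us the inequality

$$\langle \mathbf{v},\mathbf{w} \rangle^2 < \langle \mathbf{v},\mathbf{v} \rangle \langle \mathbf{w},\mathbf{w} \rangle,$$

verifying the inequality we’re trying to prove in this case. In the $b^2 - 4ac = 0$ case, we get instead

$$\langle \mathbf{v},\mathbf{w} \rangle^2 = \langle \mathbf{v},\mathbf{v} \rangle \langle \mathbf{w},\mathbf{w} \rangle.$$

So, in all cases the claimed inequality

$$\langle \mathbf{v},\mathbf{w} \rangle^2 \leq \langle \mathbf{v},\mathbf{v} \rangle \langle \mathbf{w},\mathbf{w} \rangle$$

holds true. Q.E.D.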
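The discriminant argument in the proof can be checked numerically for the dot product on $\mathbb{R}^3$. The sketch below assumes the auxiliary quadratic is taken in the form $f(t) = \langle \mathbf{v}+t\mathbf{w}, \mathbf{v}+t\mathbf{w} \rangle = at^2+bt+c$ (the sign in front of $t$ does not affect the discriminant); the function names and sample vectors are hypothetical choices for illustration.

```python
def dot(v, w):
    """Dot product of two vectors given as equal-length lists of numbers."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(vi * wi for vi, wi in zip(v, w))


def discriminant(v, w):
    """Discriminant b^2 - 4ac of f(t) = <v + t w, v + t w> = a t^2 + b t + c."""
    a = dot(w, w)        # coefficient of t^2
    b = 2 * dot(v, w)    # coefficient of t
    c = dot(v, v)        # constant term
    return b * b - 4 * a * c


# Linearly independent sample vectors: the discriminant is strictly negative,
# which is the strict form of the Cauchy-Schwarz inequality.
d_independent = discriminant([1, 2, 3], [2, 0, 1])

# Linearly dependent sample vectors (w = -2 v): the discriminant is zero,
# which is the equality case.
d_dependent = discriminant([1, 2, 3], [-2, -4, -6])
```

Integer inputs keep the arithmetic exact, so the equality case really evaluates to zero instead of a small rounding error.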
The upshot of the above discussion is that the concepts of length and angle are now well-defined in the setting of a general Euclidean space $(V,\langle \cdot,\cdot \rangle)$. So, even though the vectors in such a space need not be geometric vectors, we can use geometric intuition and analogies when thinking about them. A simple example is the following natural proposition, which generalizes the fact that a pair of nonzero geometric vectors are linearly dependent if and only if they point in the same direction or opposite directions.
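To make the definitions of norm and angle concrete, here is a minimal Python sketch in which the scalar product is passed in as a function. The names `norm`, `angle`, and `weighted`, and the weighted scalar product $2v_1w_1 + 3v_2w_2$ on $\mathbb{R}^2$, are assumptions introduced only for illustration. The cosine is clamped to $[-1,1]$ because floating-point rounding can push it slightly outside the range that the Cauchy-Schwarz inequality guarantees in exact arithmetic.

```python
import math


def dot(v, w):
    """Standard dot product of two equal-length lists of numbers."""
    return sum(vi * wi for vi, wi in zip(v, w))


def weighted(v, w):
    """A hypothetical scalar product on R^2: 2*v1*w1 + 3*v2*w2."""
    return 2 * v[0] * w[0] + 3 * v[1] * w[1]


def norm(v, scalar_product=dot):
    """Norm of v: the nonnegative square root of <v, v>."""
    return math.sqrt(scalar_product(v, v))


def angle(v, w, scalar_product=dot):
    """Angle in [0, pi] between nonzero v and w, from <v, w> = |v||w|cos(theta)."""
    nv = norm(v, scalar_product)
    nw = norm(w, scalar_product)
    if nv == 0 or nw == 0:
        raise ValueError("the angle is undefined when either vector is zero")
    cosine = scalar_product(v, w) / (nv * nw)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.acos(cosine)


right_angle = angle([1, 0], [0, 1])        # perpendicular under the dot product
opposite = angle([1, 2], [-2, -4])         # w = -2 v points the opposite way
weighted_norm = norm([1, 1], weighted)     # sqrt(2 + 3) under the weighted product
```

Changing the scalar product changes lengths and angles: the vector $(1,1)$ has norm $\sqrt{2}$ under the dot product but $\sqrt{5}$ under the weighted product used here.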
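Proposition 3: Let $(V,\langle \cdot,\cdot \rangle)$ be a Euclidean space, and let $\mathbf{v},\mathbf{w}$ be nonzero vectors in $V$. The set $\{\mathbf{v},\mathbf{w}\}$ is linearly dependent if and only if the angle between $\mathbf{v}$ and $\mathbf{w}$ is $0$ or $\pi.$

Proof: You will prove on Assignment 2 that equality holds in the Cauchy-Schwarz inequality if and only if the vectors involved are linearly dependent. Thus, Proposition 3 is equivalent to the statement that any two nonzero vectors $\mathbf{v},\mathbf{w}$ satisfy the equation

$$\langle \mathbf{v},\mathbf{w} \rangle^2 = \langle \mathbf{v},\mathbf{v} \rangle \langle \mathbf{w},\mathbf{w} \rangle$$

if and only if the angle between them is $0$ or $\pi$. Let us prove this statement.

By definition of the angle $\theta$ between two vectors in a Euclidean space, the above equation is equivalent to

$$\|\mathbf{v}\|^2\|\mathbf{w}\|^2\cos^2\theta = \|\mathbf{v}\|^2\|\mathbf{w}\|^2,$$

and dividing both sides by the nonzero number $\|\mathbf{v}\|^2\|\mathbf{w}\|^2$ this becomes

$$\cos^2\theta = 1,$$

which holds for $\theta \in [0,\pi]$ if and only if $\theta$ is $0$ or $\pi$. Q.E.D.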
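As a small worked check of Proposition 3, consider $\mathbb{R}^2$ with the dot product and the linearly dependent pair $\mathbf{v}=(1,2)$, $\mathbf{w}=(-2,-4)=-2\mathbf{v}$. These vectors are chosen here purely for illustration.

```latex
\begin{aligned}
\langle \mathbf{v},\mathbf{w} \rangle &= (1)(-2) + (2)(-4) = -10, \\
\|\mathbf{v}\| &= \sqrt{1^2 + 2^2} = \sqrt{5}, \qquad
\|\mathbf{w}\| = \sqrt{(-2)^2 + (-4)^2} = \sqrt{20} = 2\sqrt{5}, \\
\cos\theta &= \frac{\langle \mathbf{v},\mathbf{w} \rangle}{\|\mathbf{v}\|\|\mathbf{w}\|}
            = \frac{-10}{\sqrt{5}\cdot 2\sqrt{5}} = -1
\quad\Longrightarrow\quad \theta = \pi .
\end{aligned}
```

The angle is $\pi$, as expected for a pair of vectors pointing in opposite directions, and $\langle \mathbf{v},\mathbf{w} \rangle^2 = 100 = 5 \cdot 20 = \langle \mathbf{v},\mathbf{v} \rangle \langle \mathbf{w},\mathbf{w} \rangle$, so equality holds in the Cauchy-Schwarz inequality.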
In Lecture 5, we will consider further ramifications of geometrical thinking in vector spaces.