Math 31AH: Lecture 8

In Lecture 4, we introduced the notion of a Euclidean space, which is a vector space \mathbf{V} together with a scalar product \langle \cdot,\cdot \rangle defined on \mathbf{V}. In a Euclidean space, we can use the scalar product to define notions of vector length, distance between two vectors, and angle between two vectors. In short, while a vector space alone is a purely algebraic object, we can do Euclidean geometry in a vector space with a scalar product. This realization is extremely useful since it gives us a way to think geometrically about vectors which may not be at all like vectors in \mathbb{R}^n. For example, they could be functions, as in Assignment 3.

For better or worse, it turns out that Euclidean geometry, as useful as it is in this generalized setup, is not sufficient to describe the world around us. Mathematically, this means that we must sometimes think about non-Euclidean geometry. At the level of linear algebra, this comes down to opening ourselves up to thinking about general bilinear forms, which extend the scalar product concept in that they might fail to satisfy the symmetry and positivity axioms. An important example is the geometry of special relativity. In this physical theory, the vector space \mathbb{R}^4=\{(x_1,x_2,x_3,x_4) \colon x_i \in \mathbb{R}\} is taken to model spacetime, with the first three coordinates of a vector corresponding to its position in space, and the last coordinate being its position in time. It turns out that the geometry of spacetime is governed by a “fake” scalar product, called the Lorentz form, which is defined by

\langle (x_1,x_2,x_3,x_4),(y_1,y_2,y_3,y_4) \rangle = x_1y_1+x_2y_2+x_3y_3-x_4y_4.

So, physicists are telling us that in order to understand the geometry of spacetime we have to think about a strange version of the usual dot product on \mathbb{R}^4, which is made by taking the usual dot product of the spatial coordinates and then subtracting the product of the time coordinates. Typical: they always do this kind of thing. The Lorentz form is definitely not a scalar product, since it violates positivity: the pairing of a vector with itself, which would have to be its squared length, can be negative:

\langle (0,0,0,1),(0,0,0,1) \rangle = -1.

Still, mathematically, there’s no reason we can’t consider such fake scalar products as a legitimate generalization of the scalar product concept.
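As a quick sanity check, the Lorentz form can be evaluated directly from its defining formula. The sketch below is only an illustration: the function name `lorentz` and the sample vectors are our own choices, not part of the lecture. It shows a vector which pairs to -1 with itself, and a nonzero vector which pairs to 0 with itself, so positivity fails even though symmetry still holds.

```python
def lorentz(x, y):
    """Lorentz form on R^4: dot product of the space parts minus the product of the time parts."""
    return x[0]*y[0] + x[1]*y[1] + x[2]*y[2] - x[3]*y[3]

e4 = (0, 0, 0, 1)      # points purely in the time direction
light = (1, 0, 0, 1)   # a nonzero vector which pairs to zero with itself
space = (1, 2, 2, 0)   # points purely in a space direction

print(lorentz(e4, e4))        # -1
print(lorentz(light, light))  # 0
print(lorentz(space, space))  # 9
print(lorentz(e4, space) == lorentz(space, e4))  # True: the form is symmetric
```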

Definition 1: A function \langle \cdot,\cdot \rangle \colon \mathbf{V} \times \mathbf{V} \to \mathbb{R} is said to be a bilinear form if, for all vectors \mathbf{v}_1,\mathbf{v}_2,\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V} and all scalars s_1,s_2,t_1,t_2 \in \mathbb{R}, we have

\langle s_1\mathbf{v}_1+s_2\mathbf{v}_2,t_1\mathbf{w}_1+t_2\mathbf{w}_2 \rangle \\ = s_1t_1\langle\mathbf{v}_1,\mathbf{w}_1\rangle + s_1t_2 \langle\mathbf{v}_1,\mathbf{w}_2\rangle + s_2t_1\langle\mathbf{v}_2,\mathbf{w}_1\rangle + s_2t_2\langle\mathbf{v}_2,\mathbf{w}_2\rangle.

So, a bilinear form is just a “weak” scalar product on \mathbf{V} which might fail two out of three of the scalar product axioms.
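To make this concrete, here is a small sketch with a made-up form on \mathbb{R}^2 (the formula in `form`, the helper `comb`, and all sample values are hypothetical choices for illustration). It checks the identity of Definition 1 on one set of inputs, and also shows that this particular form is neither symmetric nor positive.

```python
def form(x, y):
    """A hypothetical bilinear form on R^2 that is neither symmetric nor positive."""
    return x[0]*y[0] + 2*x[0]*y[1] - x[1]*y[1]

def comb(s1, v1, s2, v2):
    """The linear combination s1*v1 + s2*v2 of two vectors in R^2."""
    return (s1*v1[0] + s2*v2[0], s1*v1[1] + s2*v2[1])

v1, v2, w1, w2 = (1, 2), (3, -1), (0, 5), (2, 2)
s1, s2, t1, t2 = 2, -3, 4, 1

# Left and right sides of the identity in Definition 1.
lhs = form(comb(s1, v1, s2, v2), comb(t1, w1, t2, w2))
rhs = (s1*t1*form(v1, w1) + s1*t2*form(v1, w2)
       + s2*t1*form(v2, w1) + s2*t2*form(v2, w2))

print(lhs == rhs)                                  # True
print(form((1, 0), (0, 1)), form((0, 1), (1, 0)))  # 2 0, so symmetry fails
print(form((0, 1), (0, 1)))                        # -1, so positivity fails
```

A single numerical check like this is of course not a proof of bilinearity; it only confirms the identity for the chosen inputs.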

In this lecture, we will see that the set of all bilinear forms that can be defined on an n-dimensional vector space \mathbf{V} can be viewed as the set of all tables of real numbers with n rows and n columns, or in other words n \times n matrices. In fact, it is not difficult to come to this realization: we just have to pick a basis A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\} in \mathbf{V} in order to describe a given bilinear form as a matrix. Things get a bit tricky, however, when we want to compare the two matrices which describe the same bilinear form relative to different bases.

Let’s start with something easier.

Definition 2: A function \langle \cdot \rangle \colon \mathbf{V} \to \mathbb{R} is said to be a linear form if, for all vectors \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V} and all scalars s_1,s_2 \in \mathbb{R}, we have

\langle s_1\mathbf{v}_1 + s_2\mathbf{v}_2 \rangle = s_1\langle \mathbf{v}_1\rangle + s_2\langle \mathbf{v}_2 \rangle.

Now suppose that \langle \cdot \rangle is a linear form on an n-dimensional vector space \mathbf{V}. Then, in order to be able to compute the number \langle \mathbf{v} \rangle for any vector \mathbf{v} \in \mathbf{V}, it is sufficient to know how to calculate the numbers

a_1=\langle \mathbf{a}_1 \rangle,\dots,a_n=\langle \mathbf{a}_n \rangle,

where A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\} is a basis of \mathbf{V}. Indeed, in order to compute \langle \mathbf{v} \rangle from this information, we simply write \mathbf{v} as a linear combination of basis vectors,

\mathbf{v} = x_1\mathbf{a}_1+ \dots + x_n\mathbf{a}_n,

and then compute

\langle \mathbf{v} \rangle = \langle x_1\mathbf{a}_1+ \dots + x_n\mathbf{a}_n\rangle \\ = x_1\langle \mathbf{a}_1\rangle + \dots + x_n\langle \mathbf{a}_n\rangle \\ = x_1a_1 + \dots + x_na_n.

Note that this has a very simple description in terms of the usual dot product in \mathbb{R}^n, namely

\langle \mathbf{v} \rangle =(x_1,\dots,x_n) \cdot (a_1,\dots,a_n).

Equivalently, the number \langle \mathbf{v} \rangle is computed as the product of a 1 \times n matrix and an n\times 1 matrix:

\langle \mathbf{v} \rangle = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.

We can write this more succinctly as the matrix equation

\langle \mathbf{v} \rangle = \mathbf{A}\mathbf{x},

where \mathbf{A} and \mathbf{x} are the only things they could be based on context. The 1 \times n matrix \mathbf{A} in this equation is referred to as the matrix of the linear form \langle \cdot \rangle relative to the basis A, and its entries are just the values of the form on each of the basis vectors. Not too complicated.
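Here is a minimal sketch of this recipe in a space whose vectors are not lists of numbers to begin with. As an assumed example (not taken from the lecture), let \mathbf{V} be the polynomials of degree at most 2 with basis 1, t, t^2, and let the linear form be evaluation at t=2. Polynomials are stored as coefficient triples, and the name `evaluate_at_2` is our own.

```python
def evaluate_at_2(coeffs):
    """The linear form p -> p(2), for p = c0 + c1*t + c2*t^2 stored as (c0, c1, c2)."""
    c0, c1, c2 = coeffs
    return c0 + c1*2 + c2*2**2

basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # the polynomials 1, t, t^2
a = [evaluate_at_2(b) for b in basis]      # matrix of the form relative to this basis
x = (3, -1, 2)                             # coordinates of p = 3 - t + 2t^2

# The value of the form is the dot product of a with the coordinates x.
via_matrix = sum(ai*xi for ai, xi in zip(a, x))

print(a)                             # [1, 2, 4]
print(via_matrix, evaluate_at_2(x))  # 9 9
```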

Essentially the same idea works for bilinear forms: in order to know how to compute the number \langle \mathbf{v},\mathbf{w} \rangle for any two vectors \mathbf{v},\mathbf{w} \in \mathbf{V}, it is sufficient to know how to compute the n^2 numbers

\begin{matrix} a_{11} = \langle \mathbf{a}_1,\mathbf{a}_1 \rangle & \dots & a_{1n}=\langle \mathbf{a}_1,\mathbf{a}_n \rangle \\ \vdots & {} & \vdots \\ a_{n1}=\langle \mathbf{a}_n,\mathbf{a}_1 \rangle & \dots & a_{nn} = \langle \mathbf{a}_n,\mathbf{a}_n \rangle\end{matrix}

relative to a basis A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\}. Indeed, if we have access to this table of numbers, then to compute \langle \mathbf{v},\mathbf{w}\rangle for given \mathbf{v},\mathbf{w} \in \mathbf{V}, we first write these vectors as linear combinations of basis vectors,

\mathbf{v} = x_1\mathbf{a}_1 + \dots + x_n\mathbf{a}_n \\ \mathbf{w}=y_1\mathbf{a}_1 + \dots + y_n\mathbf{a}_n,

and then calculate using bilinearity:

\langle \mathbf{v},\mathbf{w} \rangle = \left\langle \sum_{i=1}^n x_i\mathbf{a}_i,\sum_{j=1}^n y_j \mathbf{a}_j \right\rangle = \sum_{i,j=1}^n x_i \langle \mathbf{a}_i,\mathbf{a}_j \rangle y_j =\sum_{i,j=1}^n x_i a_{ij} y_j.

Once again, the result of this calculation can be expressed in terms of matrices, namely as the product of three matrices: a 1 \times n matrix, an n \times n matrix, and an n \times 1 matrix. Here’s how this looks:

\langle \mathbf{v},\mathbf{w} \rangle =\begin{bmatrix} x_1 & \dots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & {} & \vdots \\ a_{n1} & \dots & a_{nn} \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}.

This formula is often written

\langle \mathbf{v},\mathbf{w} \rangle = \mathbf{x}^T\mathbf{A}\mathbf{y},

where the symbols \mathbf{x},\mathbf{A},\mathbf{y} are the only things they could possibly be based on context. In particular, the n \times n matrix \mathbf{A}=[a_{ij}]_{i,j=1}^n is referred to as the matrix of the bilinear form \langle \cdot,\cdot \rangle relative to the basis A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\}.
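The following sketch builds the matrix of a bilinear form relative to a basis and compares \mathbf{x}^T\mathbf{A}\mathbf{y} with the form evaluated directly. The form (a two-dimensional cousin of the Lorentz form), the basis, and the helper names `from_coords` and `via_matrix` are all assumptions made for the sake of the example.

```python
def form(v, w):
    """A hypothetical bilinear form on R^2: v1*w1 - v2*w2."""
    return v[0]*w[0] - v[1]*w[1]

basis = [(1, 1), (1, -1)]  # the basis vectors a_1, a_2

# Matrix of the form: the entry in row i, column j is <a_i, a_j>.
A = [[form(ai, aj) for aj in basis] for ai in basis]

def from_coords(c):
    """The vector c[0]*a_1 + c[1]*a_2."""
    return tuple(c[0]*basis[0][k] + c[1]*basis[1][k] for k in range(2))

def via_matrix(x, A, y):
    """The double sum of x_i * a_ij * y_j, which is the product x^T A y."""
    n = len(A)
    return sum(x[i]*A[i][j]*y[j] for i in range(n) for j in range(n))

x, y = (3, 1), (1, 2)  # coordinates of v and w relative to the basis

print(A)                                                          # [[0, 2], [2, 0]]
print(via_matrix(x, A, y), form(from_coords(x), from_coords(y)))  # 14 14
```

Note that the same form has the matrix with diagonal entries 1 and -1 relative to the standard basis, so the matrix really does depend on the basis chosen.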

Now we come to the issue of dependence on the choice of basis. This is easily worked out for linear forms, but is a little more complex for bilinear forms.

Let A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\} and B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} be two bases in the same vector space \mathbf{V}, and let \langle \cdot \rangle be a linear form on \mathbf{V}. Let

\mathbf{A}=\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \quad\text{ and }\quad \mathbf{B} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}

be the matrices which represent the form \langle \cdot \rangle relative to the two bases A,B, written here as columns (the transposes of the 1 \times n rows used above) for convenience. We want to discern the relationship between these two matrices. We follow the same Marie Kondo-approved “out with the old, in with the new” strategy as in Lecture 4: we write the vectors of the “old” basis A as linear combinations of the vectors of the “new” basis B,

\mathbf{a}_1 = p_{11}\mathbf{b}_1 + \dots + p_{n1}\mathbf{b}_n \\ \vdots \\ \mathbf{a}_n = p_{1n}\mathbf{b}_1 + \dots + p_{nn}\mathbf{b}_n.

Now we evaluate the linear form \langle \cdot \rangle on both sides of each of these vector equations, to get the scalar equations

a_1 = p_{11}b_1 + \dots + p_{n1}b_n \\ \vdots \\ a_n = p_{1n}b_1 + \dots + p_{nn}b_n.

These scalar equations can be written as the single matrix equation

\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} p_{11} & \dots & p_{n1} \\ \vdots & {} & \vdots \\ p_{1n} & \dots & p_{nn} \end{bmatrix}\begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix},

or more briefly as

\mathbf{A} = \mathbf{P}^T\mathbf{B},

where \mathbf{P}=[p_{ij}]_{i,j=1}^n and \mathbf{P}^T is its transpose: the coefficients of \mathbf{a}_i fill the ith column of \mathbf{P}, hence the ith row of \mathbf{P}^T. And that’s it: that’s change of basis for linear forms.
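The scalar equations a_i = p_{1i}b_1 + \dots + p_{ni}b_n are easy to test numerically. In the sketch below, the linear form on \mathbb{R}^2, the basis B, and the coefficients p_{ij} are all made-up values; the basis A is then built from B exactly as in the vector equations above. Note that the sum runs over the first index of p, that is, down the ith column of the table [p_{ij}].

```python
def form(v):
    """A hypothetical linear form on R^2."""
    return 3*v[0] + 5*v[1]

def combine(coeffs, vectors):
    """The linear combination coeffs[0]*vectors[0] + coeffs[1]*vectors[1] in R^2."""
    return tuple(sum(c*vec[i] for c, vec in zip(coeffs, vectors)) for i in range(2))

n = 2
B = [(1, 0), (1, 1)]  # the "new" basis b_1, b_2
P = [[2, 1],
     [0, 3]]          # P[k][i] plays the role of p_{k+1, i+1}

# a_i = p_{1i} b_1 + ... + p_{ni} b_n: the coefficients of a_i sit in column i of P.
A = [combine([P[k][i] for k in range(n)], B) for i in range(n)]

a = [form(v) for v in A]  # values of the form on the old basis
b = [form(v) for v in B]  # values of the form on the new basis
predicted = [sum(P[k][i]*b[k] for k in range(n)) for i in range(n)]

print(A)             # [(2, 0), (4, 3)]
print(a, predicted)  # [6, 27] [6, 27]
```

The table P was chosen to be non-symmetric on purpose, so that summing along rows instead of columns would give a different (wrong) answer.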

Although the end result is slightly more complicated, the strategy for working out the relationship between the matrices \mathbf{A} and \mathbf{B} representing the same bilinear form \langle \cdot,\cdot \rangle relative to two (possibly) different bases A=\{\mathbf{a}_1,\dots,\mathbf{a}_n\} and B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\} is the same: out with the old, in with the new. Just as in the case of a linear form, the first step is to write

\mathbf{a}_1 = p_{11}\mathbf{b}_1 + \dots + p_{n1}\mathbf{b}_n \\ \vdots \\ \mathbf{a}_n = p_{1n}\mathbf{b}_1 + \dots + p_{nn}\mathbf{b}_n.

Now we consider the numbers a_{ij} = \langle \mathbf{a}_i,\mathbf{a}_j \rangle. We have

a_{ij} = \left\langle \sum_{k=1}^n p_{ki}\mathbf{b}_k,\sum_{l=1}^n p_{lj}\mathbf{b}_l \right\rangle = \sum_{k,l=1}^n p_{ki} \langle \mathbf{b}_k,\mathbf{b}_l \rangle p_{lj}=\sum_{k,l=1}^n p_{ki} b_{kl} p_{lj}.

Although it may take a little bit of experimentation (try it out for n=2,3), the above is fairly easily seen to be equivalent to the matrix equation

\mathbf{A} = \mathbf{P}^T\mathbf{B}\mathbf{P},

where \mathbf{P}^T is the transpose of the n \times n matrix \mathbf{P}.
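For readers who would rather let a computer do the n=2 experiment, here is a sketch. The bilinear form, the basis B, and the coefficients p_{ij} are hypothetical values chosen for illustration. The code computes both matrices straight from the definition and compares the entries a_{ij} with the double sum of p_{ki} b_{kl} p_{lj} over k and l, which is the (i,j) entry of the triple product \mathbf{P}^T\mathbf{B}\mathbf{P}.

```python
def form(v, w):
    """A hypothetical non-symmetric bilinear form on R^2."""
    return v[0]*w[0] + 2*v[0]*w[1] - v[1]*w[1]

def gram(basis):
    """Matrix of the form relative to a basis: the (i, j) entry is <basis_i, basis_j>."""
    return [[form(u, v) for v in basis] for u in basis]

n = 2
B_basis = [(1, 0), (1, 1)]  # the "new" basis b_1, b_2
P = [[2, 1],
     [0, 3]]                # P[k][i] plays the role of p_{k+1, i+1}

# Old basis vectors: a_i = p_{1i} b_1 + ... + p_{ni} b_n.
A_basis = [tuple(sum(P[k][i]*B_basis[k][c] for k in range(n)) for c in range(2))
           for i in range(n)]

A = gram(A_basis)
B = gram(B_basis)

# The (i, j) entry of P^T B P, written out as a double sum over k and l.
predicted = [[sum(P[k][i]*B[k][l]*P[l][j] for k in range(n) for l in range(n))
              for j in range(n)] for i in range(n)]

print(B)          # [[1, 3], [1, 2]]
print(A)          # [[4, 20], [8, 31]]
print(predicted)  # [[4, 20], [8, 31]]
```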

That’s it for this lecture, and next time we will do more interesting things with bilinear forms, aka generalized scalar products. Although the above change of basis formulas are presented in any standard course in linear algebra, my personal opinion is that they aren’t too important. If you find them easy to remember, excellent; more important is the ability to re-derive them whenever you want, since this means you understand why they are what they are. My hope is that you will understand the meaning of linear and bilinear forms conceptually, which doesn’t require calculating their matrices relative to a particular basis.

To drive the above point home, let us close this lecture by remarking that there’s no need to stop at bilinear forms. Why not keep going to trilinear forms? Indeed, for any k \in \mathbb{N}, one may define a k-linear form on a given vector space \mathbf{V} to be any real-valued function of k arguments on \mathbf{V},

\langle \underbrace{\cdot,\dots,\cdot}_{k \text{ arguments}} \rangle \colon \underbrace{\mathbf{V} \times \dots \times \mathbf{V}}_{k \text{ copies}} \to \mathbb{R},

which is a linear function of each argument. Conceptually, this isn’t any more complicated than a bilinear form. However, to represent such a function we need to use a k-dimensional array of numbers, which is often referred to as a k-dimensional tensor. In particular, a 1-dimensional tensor is a list, and a 2-dimensional tensor is a matrix. In general, change of basis formulas for k-dimensional tensors are quite messy and not very meaningful.
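As an illustration of the case k=3, the sketch below takes the 3 \times 3 determinant, viewed as a function of its three rows, as an assumed example of a trilinear form on \mathbb{R}^3 (this example and the names `det3`, `T`, and `evaluate` are ours, not from the lecture). Its values on triples of standard basis vectors fill a 3 \times 3 \times 3 array, and the form is recovered from that array by a triple sum, in direct analogy with the double sum for bilinear forms.

```python
from itertools import product

def det3(u, v, w):
    """Determinant of the 3x3 matrix with rows u, v, w: linear in each row separately."""
    return (u[0]*(v[1]*w[2] - v[2]*w[1])
            - u[1]*(v[0]*w[2] - v[2]*w[0])
            + u[2]*(v[0]*w[1] - v[1]*w[0]))

E = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # standard basis of R^3

# The 3-dimensional array of values of the form on triples of basis vectors.
T = [[[det3(E[i], E[j], E[k]) for k in range(3)] for j in range(3)] for i in range(3)]

def evaluate(T, x, y, z):
    """Triple sum of x_i * y_j * z_k * T[i][j][k]."""
    return sum(x[i]*y[j]*z[k]*T[i][j][k] for i, j, k in product(range(3), repeat=3))

x, y, z = (1, 2, 3), (0, 1, 4), (5, 6, 0)

print(T[0][1][2], T[1][0][2], T[0][0][1])  # 1 -1 0
print(evaluate(T, x, y, z), det3(x, y, z))  # 1 1
```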

