Math 31AH: Lecture 21

Two basic issues in linear algebra which we have not yet resolved are:

  1. Can we multiply vectors?
  2. Can we certify linear independence?

The answers to these questions turn out to be closely related to one another. In this lecture, we discuss the first item.

Let \mathbf{V} be a vector space. We have seen one sort of multiplication of vectors, namely the scalar product \langle \mathbf{v},\mathbf{w} \rangle. On one hand, the scalar product is a proper multiplication rule in the sense that it satisfies the FOIL identity, which is referred to as bilinearity in polite company. On the other hand, the scalar product does not correspond to our usual notion of multiplication in the sense that the product of two vectors is a number, not a vector. This is strange in that one instinctively feels that the “product” of two objects should be another object of the same type. It is natural to ask whether we can define a bilinear “vector product” which has the feature that the product of two vectors in \mathbf{V} is a vector in \mathbf{V}. In other words, we are asking whether it is possible to give some universal recipe for multiplication of vectors which would turn every vector space into an algebra.

So far, we have only seen certain specific vector spaces \mathbf{V} where a bilinear multiplication of vectors naturally presents itself. Here is a list of these spaces.

  1. \mathbf{V} = \mathbb{R}. In this case, vectors \mathbf{v} \in \mathbf{V} are real numbers, and the vector product \mathbf{v}\mathbf{w} is the product of real numbers.
  2. \mathbf{V}=\mathbb{R}^2. Technically, we have not seen this example yet, but here it is. Let \mathbf{v}=(x_1,x_2) and \mathbf{w}=(y_1,y_2) be vectors in \mathbf{V}. We then define their product to be \mathbf{v}\mathbf{w}=(x_1y_1-x_2y_2,x_1y_2+x_2y_1). Next week, we will see that this example of vector multiplication gives the complex number system.
  3. \mathbf{V}=\mathbb{R}^\infty. In this example, the vector space \mathbf{V} consists of infinite sequences \mathbf{v}=(x_0,x_1,x_2,\dots) which are identically zero after finitely many terms. This means that \mathbf{V} is isomorphic to the vector space of polynomials in a single variable. Let \mathbf{v}=(x_0,x_1,x_2,\dots) and \mathbf{w}=(y_0,y_1,y_2,\dots) be vectors in \mathbf{V}. We define their product to be \mathbf{v}\mathbf{w} = (x_0y_0,\,x_0y_1+x_1y_0,\,x_0y_2+x_1y_1+x_2y_0,\dots), which is just the recipe for multiplying polynomials and collecting together terms of the same degree (see the sketch following this list).
  4. \mathbf{V}=\mathbb{R}^{n \times n}. In this example, the vector space \mathbf{V} consists of matrices with n rows and n columns. This means that \mathbf{V} is isomorphic to the vector space of linear operators on an n-dimensional vector space. A vector product in \mathbf{V} is then defined by matrix multiplication.
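
Both of these rules are easy to experiment with on a computer. Here is a minimal Python sketch of examples 2 and 3; the function names are my own, chosen for this illustration, and example 4 is just ordinary matrix multiplication.

```python
def complex_style_product(v, w):
    """Example 2: the bilinear product on R^2 described above."""
    x1, x2 = v
    y1, y2 = w
    return (x1 * y1 - x2 * y2, x1 * y2 + x2 * y1)


def polynomial_product(v, w):
    """Example 3: multiply finitely supported sequences as polynomial coefficients."""
    product = [0.0] * (len(v) + len(w) - 1)
    for i, x in enumerate(v):
        for j, y in enumerate(w):
            product[i + j] += x * y
    return product


print(complex_style_product((0, 1), (0, 1)))  # (-1, 0): "i squared equals -1"
print(polynomial_product([1, 2], [3, 4]))     # [3.0, 10.0, 8.0]: (1+2x)(3+4x) = 3+10x+8x^2
```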

The above examples are quite different from one another, and they do not appear to be given by any universal recipe for defining a product of vectors. It turns out that in order to answer the question of how to define a universal vector product, it is better not to answer it at all. This is the idea behind the tensor product, which we now introduce.

To every pair of vectors \mathbf{v},\mathbf{w} \in \mathbf{V}, we associate a new vector denoted \mathbf{v} \otimes \mathbf{w}, which is called the tensor product of \mathbf{v} and \mathbf{w}. However, the vector \mathbf{v} \otimes \mathbf{w} does not reside in \mathbf{V}; rather, it is a vector in a new vector space called the tensor square of \mathbf{V} and denoted \mathbf{V} \otimes \mathbf{V}. What is happening here is that we view the symbol \otimes as a rule for multiplying two vectors, but we do not specify what this rule is — instead, we view \mathbf{v} \otimes \mathbf{w} as an “unevaluated” product of two vectors. We then store this unevaluated product in a new vector space \mathbf{V} \otimes \mathbf{V}, which contains all unevaluated products of vectors from \mathbf{V}. More precisely, the vectors in \mathbf{V} \otimes \mathbf{V} are all unevaluated expressions of the form

\tau = \mathbf{v}_1 \otimes \mathbf{w}_1 + \dots + \mathbf{v}_k \otimes \mathbf{w}_k,

where k \in \mathbb{N} is a natural number and \mathbf{v}_1,\mathbf{w}_1,\dots,\mathbf{v}_k,\mathbf{w}_k \in \mathbf{V} are vectors. These unevaluated expressions are called tensors, and often denoted by Greek letters. So tensor products are ambiguous, in the sense that we do not specify what the result of the multiplication \mathbf{v} \otimes \mathbf{w} actually is. The only thing we specify about this rule is that it is bilinear:

(a_1\mathbf{v}_1 + a_2\mathbf{v}_2) \otimes (b_1\mathbf{w}_1 + b_2\mathbf{w}_2) \\ = a_1b_1\mathbf{v}_1 \otimes \mathbf{w}_1 + a_1b_2 \mathbf{v}_1 \otimes \mathbf{w}_2 + a_2b_1 \mathbf{v}_2\otimes \mathbf{w}_1  + a_2b_2\mathbf{v}_2\otimes \mathbf{w}_2,

where the equality means that the LHS and the RHS are different expressions for the same vector in the vector space \mathbf{V} \otimes \mathbf{V}.
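
To make the bilinearity requirement concrete, here is a minimal numerical sketch. It assumes \mathbf{V}=\mathbb{R}^n and models the unevaluated product \mathbf{v} \otimes \mathbf{w} by the outer product matrix \mathbf{v}\mathbf{w}^T; this is only one possible model, not part of the definition, but it does satisfy the identity above.

```python
import numpy as np

# Assumption for this sketch: V = R^3, and v (x) w is modeled by the outer
# product matrix v w^T.  Bilinearity then becomes a checkable matrix identity.
def tensor(v, w):
    return np.outer(v, w)

rng = np.random.default_rng(0)
v1, v2, w1, w2 = (rng.standard_normal(3) for _ in range(4))
a1, a2, b1, b2 = 2.0, -1.0, 0.5, 3.0

lhs = tensor(a1 * v1 + a2 * v2, b1 * w1 + b2 * w2)
rhs = (a1 * b1 * tensor(v1, w1) + a1 * b2 * tensor(v1, w2)
       + a2 * b1 * tensor(v2, w1) + a2 * b2 * tensor(v2, w2))
print(np.allclose(lhs, rhs))  # True: the FOIL/bilinearity identity holds in this model
```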

A tensor in \mathbf{V} \otimes \mathbf{V} which can be represented as the product of two vectors from \mathbf{V} is called a simple tensor. Note that a tensor may be simple without obviously being so, in the event that it can be “factored” as in high school algebra. For example, we have

\mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{v}_2 \otimes \mathbf{w}_1 = (\mathbf{v}_1+\mathbf{v}_2) \otimes \mathbf{w}_1.
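
In the outer-product model of the previous sketch (again only an illustration, assuming \mathbf{V}=\mathbb{R}^n), this factoring identity is a literal matrix identity:

```python
import numpy as np

# Same modeling assumption as above: v (x) w is represented by np.outer(v, w).
v1, v2, w1 = np.array([1.0, 2.0]), np.array([0.0, 3.0]), np.array([4.0, 5.0])
lhs = np.outer(v1, w1) + np.outer(v2, w1)
rhs = np.outer(v1 + v2, w1)
print(np.allclose(lhs, rhs))  # True: v1 (x) w1 + v2 (x) w1 = (v1 + v2) (x) w1
```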

We haven’t yet said how to scale tensors by numbers. The rule for scalar multiplication of tensors is determined by bilinearity: it is defined by

a(\mathbf{v} \otimes \mathbf{w}) = (a\mathbf{v}) \otimes \mathbf{w} = \mathbf{v} \otimes (a\mathbf{w}),

and

a \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i = \sum_{i=1}^k (a\mathbf{v}_i) \otimes \mathbf{w}_i.

We can summarize all of the above by saying that two tensors \tau,\sigma \in \mathbf{V} \otimes \mathbf{V} are equal if and only if it is possible to rewrite \tau as \sigma using bilinearity.

Tensor products take a while to get used to. It’s important to remember that the only specified property of the tensor product is bilinearity; apart from this, it’s entirely ambiguous. So, anything we can say about tensor products must ultimately be a consequence of bilinearity. Here is an example.

Proposition 1: For any \mathbf{v} \in \mathbf{V}, we have

\mathbf{v} \otimes \mathbf{0}_\mathbf{V} = \mathbf{0}_\mathbf{V} \otimes \mathbf{v} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V}.

Proof: We are going to use the fact that scaling any vector \mathbf{v} \in \mathbf{V} by the number 0 \in \mathbb{R} produces the zero vector \mathbf{0}_\mathbf{V} \in \mathbf{V}. This was proved in Lecture 1, when we discussed the definition of a vector space. We have

\mathbf{v} \otimes \mathbf{0}_\mathbf{V} = \mathbf{v} \otimes (0\mathbf{0}_\mathbf{V}) = (0\mathbf{v}) \otimes \mathbf{0}_\mathbf{V} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V}.

Notice that bilinearity was used here to move the scalar zero from the second factor in the tensor product to the first factor in the tensor product. The proof that \mathbf{0}_\mathbf{V} \otimes \mathbf{v} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V} is essentially the same (try it!).

— Q.E.D.

Using Proposition 1, we can explicitly identify the “zero tensor,” i.e. the zero vector \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}} in the vector space \mathbf{V} \otimes \mathbf{V}.

Proposition 2: We have \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}}=\mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}}.

Proof: Let

\tau = \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i

be any tensor. We want to prove that \tau+\mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V} = \tau.

In the case k=1, we have \tau = \mathbf{v}_1 \otimes \mathbf{w}_1. Using bilinearity, we have

\mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{w}_1 = (\mathbf{v}_1+\mathbf{0}_\mathbf{V}) \otimes \mathbf{w}_1 = \mathbf{v}_1 \otimes \mathbf{w}_1,

where we used Proposition 1 and bilinearity.

The case k>1 now follows from the case k=1,

\tau + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \sum_{i=1}^{k-1} \mathbf{v}_i \otimes \mathbf{w}_i + \left(\mathbf{v}_k \otimes \mathbf{w}_k + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}}\right) = \sum_{i=1}^{k-1} \mathbf{v}_i \otimes \mathbf{w}_i + \mathbf{v}_k \otimes \mathbf{w}_k = \tau.

— Q.E.D.

Suppose now that \mathbf{V} is a Euclidean space, i.e. it comes with a scalar product \langle \cdot,\cdot \rangle. Then, there is an associated scalar product on the vector space \mathbf{V} \otimes \mathbf{V}, which by abuse of notation we also write as \langle \cdot,\cdot \rangle. This natural scalar product on \mathbf{V} \otimes \mathbf{V} is uniquely determined by the requirement that

\langle \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \rangle = \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle, \quad \forall \mathbf{v}_1,\mathbf{v}_2,\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V}.

Exercise 1: Verify that the scalar product on \mathbf{V} \otimes \mathbf{V} just defined really does satisfy the scalar product axioms.
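
Here is a minimal numerical sketch of this scalar product, assuming \mathbf{V}=\mathbb{R}^n with the standard dot product and the outer-product model of simple tensors used in the earlier sketches; in that model, the scalar product on \mathbf{V} \otimes \mathbf{V} becomes the entrywise (Frobenius) inner product of matrices.

```python
import numpy as np

# Assumption: V = R^4 with the dot product, and v (x) w is modeled by np.outer(v, w).
def tensor(v, w):
    return np.outer(v, w)

def tensor_dot(sigma, tau):
    return float(np.sum(sigma * tau))  # entrywise (Frobenius) inner product

rng = np.random.default_rng(1)
v1, w1, v2, w2 = (rng.standard_normal(4) for _ in range(4))

lhs = tensor_dot(tensor(v1, w1), tensor(v2, w2))
rhs = np.dot(v1, v2) * np.dot(w1, w2)
print(np.isclose(lhs, rhs))  # True: <v1 (x) w1, v2 (x) w2> = <v1,v2><w1,w2>
```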

Proposition 3: If S is an orthogonal set of vectors in \mathbf{V}, then

S \otimes S = \{\mathbf{v} \otimes \mathbf{w} \colon \mathbf{v},\mathbf{w} \in S\}

is an orthogonal set of tensors in \mathbf{V} \otimes \mathbf{V}.

Proof: We must show that if \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \in S \otimes S are different tensors, then their scalar product is zero. We have

\langle \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \rangle = \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle.

The assumption that these tensors are different is equivalent to saying that one of the following conditions holds:

\mathbf{v}_1 \neq \mathbf{v}_2 \text{ or } \mathbf{w}_1 \neq \mathbf{w}_2.

Since S is an orthogonal set, the first possibility implies \langle \mathbf{v}_1,\mathbf{v}_2 \rangle =0, and the second implies \langle \mathbf{w}_1,\mathbf{w}_2 \rangle = 0. In either case, the product \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle is equal to zero.

— Q.E.D.

Theorem 1: If E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} is an orthonormal basis in \mathbf{V}, then E \otimes E = \{\mathbf{e}_i \otimes \mathbf{e}_j \colon 1 \leq i,j \leq n\} is an orthonormal basis in \mathbf{V} \otimes \mathbf{V}.

Proof: Let us first show that E \otimes E spans \mathbf{V} \otimes \mathbf{V}. For an arbitrary tensor \tau \in \mathbf{V} \otimes \mathbf{V}, we have

\tau = \sum\limits_{k=1}^l a_k \mathbf{v}_k \otimes \mathbf{w}_k = \sum\limits_{k=1}^l a_k \left(\sum\limits_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}_k\rangle\mathbf{e}_i \right) \otimes \left(\sum\limits_{j=1}^n \langle \mathbf{e}_j,\mathbf{w}_k\rangle\mathbf{e}_j \right) \\= \sum\limits_{i,j=1}^n \left(\sum\limits_{k=1}^l a_k \langle \mathbf{e}_i,\mathbf{v}_k\rangle\langle \mathbf{e}_j,\mathbf{w}_k\rangle\right)\mathbf{e}_i \otimes \mathbf{e}_j,

which shows that an arbitrary tensor is a linear combination of the tensors \mathbf{e}_i \otimes \mathbf{e}_j.

Since E is an orthogonal set in \mathbf{V}, by Proposition 3 we have that E \otimes E is an orthogonal set in \mathbf{V} \otimes \mathbf{V}, and therefore it is linearly independent.

It remains only to show that all tensors in E \otimes E have unit length. This is established by direct computation:

\|\mathbf{e}_i \otimes \mathbf{e}_j \|^2 = \langle \mathbf{e}_i\otimes \mathbf{e}_j,\mathbf{e}_i \otimes \mathbf{e}_j \rangle = \langle \mathbf{e}_i,\mathbf{e}_i \rangle\langle \mathbf{e}_j,\mathbf{e}_j \rangle= 1.

— Q.E.D.

Corollary 1: If \dim \mathbf{V} = n, then \dim \mathbf{V} \otimes \mathbf{V} = n^2.
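
A quick numerical check of Theorem 1 and Corollary 1, under the same assumptions as the earlier sketches (\mathbf{V}=\mathbb{R}^n with the dot product, outer-product model of simple tensors): starting from a random orthonormal basis E, the n^2 tensors \mathbf{e}_i \otimes \mathbf{e}_j come out orthonormal.

```python
import numpy as np

# Assumption: V = R^3 with the dot product; e_i (x) e_j is modeled by an outer product.
n = 3
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # columns of Q form an orthonormal basis E

basis_tensors = [np.outer(Q[:, i], Q[:, j]) for i in range(n) for j in range(n)]
gram = np.array([[np.sum(s * t) for t in basis_tensors] for s in basis_tensors])

print(np.allclose(gram, np.eye(n * n)))  # True: E (x) E is an orthonormal set
print(len(basis_tensors))                # 9 = n^2, consistent with dim(V (x) V) = n^2
```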

It is important to note that the tensor product is noncommutative: it is typically not the case that \mathbf{v} \otimes \mathbf{w} = \mathbf{w} \otimes \mathbf{v}. However, we can decompose a simple tensor into two pieces, as

\mathbf{v} \otimes \mathbf{w} = \frac{\mathbf{v} \otimes \mathbf{w} + \mathbf{w} \otimes \mathbf{v}}{2} + \frac{\mathbf{v} \otimes \mathbf{w} - \mathbf{w} \otimes \mathbf{v}}{2}.

The first of these fractions is called the “symmetric part” of \mathbf{v} \otimes \mathbf{w}, and is denoted

\mathbf{v} \vee \mathbf{w} := \frac{\mathbf{v} \otimes \mathbf{w} + \mathbf{w} \otimes \mathbf{v}}{2}.

The reason for this notation is that we can think of \vee as a symmetric version of the tensor product: a bilinear multiplication of vectors that, by construction, is commutative:

\mathbf{v} \vee \mathbf{w} = \mathbf{w} \vee \mathbf{v}.

Note that if \mathbf{v}=\mathbf{w}, the symmetric tensor product produces the same tensor as the tensor product itself:

\mathbf{v} \vee \mathbf{v} = \mathbf{v} \otimes \mathbf{v}.

The second fraction above is called the “antisymmetric part” of \mathbf{v} \otimes \mathbf{w}, and denoted

\mathbf{v} \wedge \mathbf{w} := \frac{\mathbf{v} \otimes \mathbf{w} - \mathbf{w} \otimes \mathbf{v}}{2}.

This is an antisymmetric version of the tensor product in that, by construction, it satisfies

\mathbf{v} \wedge \mathbf{w} = -\mathbf{w} \wedge \mathbf{v}.

Note that the antisymmetric tensor product of any vector with itself produces the zero tensor:

\mathbf{v} \wedge \mathbf{v} = \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}}.
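
In the outer-product model used in the earlier sketches (again only an illustration, assuming \mathbf{V}=\mathbb{R}^n), the symmetric and antisymmetric parts of \mathbf{v} \otimes \mathbf{w} are simply the symmetric and antisymmetric parts of the matrix \mathbf{v}\mathbf{w}^T, and the identities above become matrix identities:

```python
import numpy as np

# Assumption: V = R^3, with v (x) w modeled by np.outer(v, w).
def vee(v, w):
    return (np.outer(v, w) + np.outer(w, v)) / 2   # symmetric part

def wedge(v, w):
    return (np.outer(v, w) - np.outer(w, v)) / 2   # antisymmetric part

v = np.array([1.0, 2.0, 3.0])
w = np.array([0.0, -1.0, 4.0])

print(np.allclose(vee(v, w) + wedge(v, w), np.outer(v, w)))  # the decomposition of v (x) w
print(np.allclose(vee(v, w), vee(w, v)))                     # v vee w = w vee v
print(np.allclose(wedge(v, w), -wedge(w, v)))                # v wedge w = -(w wedge v)
print(np.allclose(wedge(v, v), np.zeros((3, 3))))            # v wedge v = 0
```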

Although it may seem like the symmetric tensor product is more natural (commutative products are nice), it turns out that the antisymmetric tensor product — or wedge product as it’s often called — is more important. Here is a first indication of this. Suppose that \mathbf{V} is a 2-dimensional Euclidean space with orthonormal basis \{\mathbf{e}_1,\mathbf{e}_2\}. Let

\mathbf{v}_1 = a_{11}\mathbf{e}_1 + a_{12}\mathbf{e}_2 \quad\text{ and }\quad \mathbf{v}_2 = a_{21}\mathbf{e}_1 + a_{22}\mathbf{e}_2

be two vectors in \mathbf{V}. Let’s compute their wedge product: using FOIL, we find

\mathbf{v}_1 \wedge \mathbf{v}_2 \\ = (a_{11}\mathbf{e}_1 + a_{12}\mathbf{e}_2) \wedge (a_{21}\mathbf{e}_1 + a_{22}\mathbf{e}_2) \\ = (a_{11}\mathbf{e}_1) \wedge (a_{21}\mathbf{e}_1) + (a_{11}\mathbf{e}_1) \wedge (a_{22}\mathbf{e}_2) + (a_{12}\mathbf{e}_2)\wedge (a_{21}\mathbf{e}_1) + (a_{12}\mathbf{e}_2) \wedge (a_{22}\mathbf{e}_2) \\ = a_{11}a_{21} \mathbf{e}_1 \wedge \mathbf{e}_1 + a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 + a_{12}a_{21} \mathbf{e}_2 \wedge \mathbf{e}_1 + a_{12}a_{22}\mathbf{e}_2 \wedge \mathbf{e}_2 \\ = a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 + a_{12}a_{21} \mathbf{e}_2 \wedge \mathbf{e}_1  \\ = a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 - a_{12}a_{21} \mathbf{e}_1 \wedge \mathbf{e}_2 \\ = (a_{11}a_{22} - a_{12}a_{21}) \mathbf{e}_1 \wedge \mathbf{e}_2.

You probably recognize the lone scalar a_{11}a_{22} - a_{12}a_{21} remaining at the end of this computation as a determinant:

\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22}-a_{21}a_{12}.

Even if you don’t, no need to worry: you are not expected to know what a determinant is at this point. Indeed, in Lecture 22 we are going to use the wedge product to define determinants.
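
Here is a small sketch of the 2-dimensional computation above, assuming \mathbf{V}=\mathbb{R}^2 with \mathbf{e}_1,\mathbf{e}_2 the standard basis and the outer-product model from the earlier sketches; the coefficient of \mathbf{e}_1 \wedge \mathbf{e}_2 agrees with the determinant computed by NumPy.

```python
import numpy as np

# Assumption: V = R^2, standard basis, v (x) w modeled by np.outer(v, w).
a = np.array([[1.0, 2.0],    # row 1: v1 = a11 e1 + a12 e2
              [3.0, 4.0]])   # row 2: v2 = a21 e1 + a22 e2
v1, v2 = a[0], a[1]

def wedge(v, w):
    return (np.outer(v, w) - np.outer(w, v)) / 2

e1, e2 = np.eye(2)
coefficient = a[0, 0] * a[1, 1] - a[0, 1] * a[1, 0]

print(np.allclose(wedge(v1, v2), coefficient * wedge(e1, e2)))  # True
print(np.isclose(coefficient, np.linalg.det(a)))                # True: the coefficient is det(a)
```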
