Math 31AH: Lecture 22

Let us now use the symmetric and antisymmetric tensor products to define two subspaces of the tensor square \mathbf{V} \otimes \mathbf{V} which store “unevaluated” symmetric and antisymmetric tensor products of vectors from \mathbf{V}. The symmetric square of \mathbf{V} is the subspace \mathbf{V} \vee \mathbf{V} of \mathbf{V} \otimes \mathbf{V} spanned by all symmetric tensor products

\mathbf{v}_1 \vee \mathbf{v}_2, \quad \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}.

Elements of \mathbf{V} \vee \mathbf{V} are called symmetric tensors. Similarly, the antisymmetric square of \mathbf{V} is the subspace \mathbf{V} \wedge \mathbf{V} of \mathbf{V} \otimes \mathbf{V} spanned by all antisymmetric tensor products,

\mathbf{v}_1 \wedge \mathbf{v}_2, \quad \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}.

Elements of \mathbf{V} \wedge \mathbf{V} are called antisymmetric tensors.

All of what we have said above can be generalized in a natural way to products of more than two vectors. More precisely, for any natural number d \in \mathbb{N}, we can define the dth tensor power of the vector space \mathbf{V} to be the new vector space \mathbf{V}^{\otimes d} spanned by all “unevaluated” products

\mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d

of d vectors \mathbf{v}_1,\dots,\mathbf{v}_d. The only specified feature of such multiple unevaluated products is that they are “multilinear,” which really just means that they behave like ordinary products (sans commutativity). For example, in the case d=3, this just means that we have the following three identities in the vector space \mathbf{V}^{\otimes 3}: for any scalars a_1,a_2 \in \mathbb{R}

(a_1\mathbf{u}_1 + a_2\mathbf{u}_2) \otimes \mathbf{v} \otimes \mathbf{w} = a_1\mathbf{u}_1 \otimes \mathbf{v} \otimes \mathbf{w} + a_2\mathbf{u}_2 \otimes \mathbf{v} \otimes \mathbf{w}

for all \mathbf{u}_1,\mathbf{u}_2,\mathbf{v},\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \otimes (a_1\mathbf{v}_1 + a_2\mathbf{v}_2) \otimes \mathbf{w} = a_1 \mathbf{u} \otimes \mathbf{v}_1 \otimes \mathbf{w} + a_2\mathbf{u} \otimes \mathbf{v}_2 \otimes \mathbf{w}

for all \mathbf{u},\mathbf{v}_1,\mathbf{v}_2,\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \otimes \mathbf{v} \otimes (a_1\mathbf{w}_1 + a_2\mathbf{w}_2) = a_1 \mathbf{u} \otimes \mathbf{v} \otimes \mathbf{w}_1 + a_2 \mathbf{u} \otimes \mathbf{v} \otimes \mathbf{w}_2

for all \mathbf{u},\mathbf{v},\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V}. If \mathbf{V} comes with a scalar product \langle \cdot,\cdot \rangle, we can use this to define a scalar product on \mathbf{V}^{\otimes d} in a very simple way by declaring

\langle \mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d,\mathbf{w}_1 \otimes \dots \otimes \mathbf{w}_d \rangle = \langle \mathbf{v}_1,\mathbf{w}_1 \rangle \dots \langle \mathbf{v}_d,\mathbf{w}_d\rangle.

Even better, we can use the scalar product so defined to construct an orthonormal basis of \mathbf{V}^{\otimes d} from a given orthonormal basis \mathbf{E}=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} of \mathbf{V}: such a basis is simply given by all tensor products with d factors in which each factor is a vector from \mathbf{E}. More precisely, these are the tensors

\mathbf{e}_{i(1)} \otimes \mathbf{e}_{i(2)} \otimes \dots \otimes \mathbf{e}_{i(d)}, \quad i \in \mathrm{Fun}(d,n),

where \mathrm{Fun}(d,n) is a fun notation for the set of all functions

i \colon \{1,\dots,d\} \to \{1,\dots,n\}.

In particular, since the cardinality of \mathrm{Fun}(d,n) is n^d (make one of n choices d times), the dimension of the vector space \mathbf{V}^{\otimes d} is n^d.

Example 1: If \mathbf{V} is a 2-dimensional vector space with orthonormal basis \{\mathbf{e}_1,\mathbf{e}_2\}, then an orthonormal basis of \mathbf{V}^{\otimes 3} is given by the tensors

\mathbf{e}_1 \otimes \mathbf{e}_1 \otimes \mathbf{e}_1, \\ \mathbf{e}_1 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2, \mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1,\mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_1, \\ \mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_2, \mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2, \mathbf{e}_2 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1, \\ \mathbf{e}_2 \otimes \mathbf{e}_2 \otimes \mathbf{e}_2.
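
As a sanity check on the dimension count, here is a small NumPy sketch of my own (not part of the notes), modelling the tensor product by the Kronecker product: for n = 2 and d = 3 the construction produces exactly the 2^3 = 8 pairwise orthonormal tensors listed in Example 1.

import itertools
import numpy as np

n, d = 2, 3
E = np.eye(n)                        # rows model the orthonormal basis e_1, e_2

basis = []
for i in itertools.product(range(n), repeat=d):   # all functions i: {1,..,d} -> {1,..,n}
    t = np.array([1.0])
    for k in i:
        t = np.kron(t, E[k])         # e_{i(1)} (x) e_{i(2)} (x) e_{i(3)} in the Kronecker model
    basis.append(t)

B = np.array(basis)
print(B.shape)                                 # (8, 8): there are n^d = 8 basis tensors
print(np.allclose(B @ B.T, np.eye(n ** d)))    # True: they are pairwise orthogonal and of unit length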

We now define the d-fold symmetric and antisymmetric tensor products. These products rely on the concept of permutations.

Reading Assignment: Familiarize yourself with permutations. What is important for our purposes is that you understand how to multiply permutations, and that you understand what the sign of a permutation is. Feel free to ask questions as needed.

Definition 1: For any d \in \mathbb{N}, and any vectors \mathbf{v}_1,\dots,\mathbf{v}_d \in \mathbf{V}, we define the symmetric tensor product of these vectors by

\mathbf{v}_1 \vee \dots \vee \mathbf{v}_d = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \mathbf{v}_{\pi(1)} \otimes \dots \otimes \mathbf{v}_{\pi(d)},

and denote by \mathbf{V}^{\vee d} the subspace of \mathbf{V}^{\otimes d} spanned by all symmetric tensor products of d vectors from \mathbf{V}. Likewise, we define the antisymmetric tensor product of \mathbf{v}_1,\dots,\mathbf{v}_d by

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi)\mathbf{v}_{\pi(1)} \otimes \dots \otimes \mathbf{v}_{\pi(d)},

and denote by \mathbf{V}^{\wedge d} the subspace of \mathbf{V}^{\otimes d} spanned by all antisymmetric tensor products of d vectors from \mathbf{V}.

Note that, in the case d=2, this definition coincides with the definitions

\mathbf{v}_1 \vee \mathbf{v}_2 = \frac{1}{2}\left( \mathbf{v}_1\otimes \mathbf{v}_2 + \mathbf{v}_2 \otimes \mathbf{v}_1\right)

and

\mathbf{v}_1 \wedge \mathbf{v}_2 = \frac{1}{2}\left(\mathbf{v}_1\otimes \mathbf{v}_2 - \mathbf{v}_2 \otimes \mathbf{v}_1\right)

from Lecture 21.
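
To make Definition 1 concrete, here is a NumPy sketch of my own (again modelling \otimes by the Kronecker product): it builds the d-fold symmetric and antisymmetric products as permutation sums and checks that, for d = 2, they agree with the formulas above.

import itertools
import math
import numpy as np

def tensor(vs):
    # model v_1 (x) ... (x) v_d by an iterated Kronecker product
    t = np.array([1.0])
    for v in vs:
        t = np.kron(t, v)
    return t

def sign(p):
    # sign of a permutation p of {0, ..., d-1}, computed from its inversion count
    return (-1) ** sum(p[a] > p[b] for a in range(len(p)) for b in range(a + 1, len(p)))

def sym(vs):
    d = len(vs)
    return sum(tensor([vs[i] for i in p])
               for p in itertools.permutations(range(d))) / math.factorial(d)

def wedge(vs):
    d = len(vs)
    return sum(sign(p) * tensor([vs[i] for i in p])
               for p in itertools.permutations(range(d))) / math.factorial(d)

v1, v2 = np.array([1.0, 2.0]), np.array([3.0, 5.0])
print(np.allclose(sym([v1, v2]),   (np.kron(v1, v2) + np.kron(v2, v1)) / 2))  # True
print(np.allclose(wedge([v1, v2]), (np.kron(v1, v2) - np.kron(v2, v1)) / 2))  # True
print(np.allclose(wedge([v1, v1]), 0))  # True: a repeated factor kills the wedge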

Since the symmetric and antisymmetric tensor products are defined in terms of the tensor product, they inherit multilinearity. For example, in the case d=3, this means that we have the following three identities in the vector space \mathbf{V}^{\vee 3}: for any scalars a_1,a_2 \in \mathbb{R}

(a_1\mathbf{u}_1 + a_2\mathbf{u}_2) \vee \mathbf{v} \vee \mathbf{w} = a_1\mathbf{u}_1 \vee \mathbf{v} \vee \mathbf{w} + a_2\mathbf{u}_2 \vee \mathbf{v} \vee \mathbf{w}

for all \mathbf{u}_1,\mathbf{u}_2,\mathbf{v},\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \vee (a_1\mathbf{v}_1 + a_2\mathbf{v}_2) \vee \mathbf{w} = a_1 \mathbf{u} \vee \mathbf{v}_1 \vee \mathbf{w} + a_2\mathbf{u} \vee \mathbf{v}_2 \vee \mathbf{w}

for all \mathbf{u},\mathbf{v}_1,\mathbf{v}_2,\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \vee \mathbf{v} \vee (a_1\mathbf{w}_1 + a_2\mathbf{w}_2) = a_1 \mathbf{u} \vee \mathbf{v} \vee \mathbf{w}_1 + a_2 \mathbf{u} \vee \mathbf{v} \vee \mathbf{w}_2

for all \mathbf{u},\mathbf{v},\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V}. The analogous statements hold in \mathbf{V}^{\wedge 3}.

The symmetric tensor product is constructed in such a way that

\mathbf{v}_{\pi(1)} \vee \dots \vee \mathbf{v}_{\pi(d)} = \mathbf{v}_1 \vee \dots \vee \mathbf{v}_d

for any permutation \pi \in \mathrm{S}(d), whereas the antisymmetric tensor product is constructed in such a way that

\mathbf{v}_{\pi(1)} \wedge \dots \wedge \mathbf{v}_{\pi(d)} = \mathrm{sgn}(\pi)\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d

for any permutation \pi \in \mathrm{S}(d). In particular, if any two of the vectors \mathbf{v}_1,\dots,\mathbf{v}_d are equal, then

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{0}.

Indeed, suppose that \mathbf{v}_1=\mathbf{v}_2. On one hand, by the above antisymmetry we have

\mathbf{v}_2 \wedge \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = - \mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d,

but on the other hand we also have

\mathbf{v}_2 \wedge \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d

because \mathbf{v}_1=\mathbf{v}_2. This means that

\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d = - \mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d

if \mathbf{v}_1=\mathbf{v}_2, which forces

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{0}.

The vector space \mathbf{V}^{\vee d} is called the dth symmetric power of \mathbf{V}, and its elements are called symmetric tensors of degree d. The vector space \mathbf{V}^{\wedge d} is called the dth antisymmetric power of \mathbf{V}, and its elements are called antisymmetric tensors of degree d. These vector spaces have a physical interpretation. In quantum mechanics, an n-dimensional vector space \mathbf{V} is viewed as the state space of a particle that can be in any one of n quantum states. The space \mathbf{V}^{\vee d} is then the state space of d bosons, each of which may occupy one of n quantum states, while \mathbf{V}^{\wedge d} is the state space of d fermions, each of which may be in any of n quantum states. The vanishing of wedge products with two equal factors corresponds physically to the characteristic feature of fermions, i.e. the Pauli exclusion principle. You don’t have to know any of this — I included this perspective in order to provide some indication that the construction of these vector spaces is not just abstract nonsense.

Theorem 1: For any d \in \mathbb{N} and any \mathbf{v}_1,\dots,\mathbf{v}_d,\mathbf{w}_1,\dots,\mathbf{w}_d \in \mathbf{V}, we have

\langle \mathbf{v}_1 \vee \dots \vee \mathbf{v}_d,\mathbf{w}_1 \vee \dots \vee \mathbf{w}_d \rangle = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \langle \mathbf{v}_1,\mathbf{w}_{\pi(1)}\rangle \dots \langle \mathbf{v}_d,\mathbf{w}_{\pi(d)}\rangle,

and

\langle \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d,\mathbf{w}_1 \wedge \dots \wedge \mathbf{w}_d \rangle = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi)\langle \mathbf{v}_1,\mathbf{w}_{\pi(1)}\rangle \dots \langle \mathbf{v}_d,\mathbf{w}_{\pi(d)}\rangle.

Since we won’t use this theorem much, we will skip the proof. However, the proof is not too difficult, and is an exercise in permutations: simply plug in the definitions of the symmetric and antisymmetric tensor products in terms of the original tensor products, expand the scalar product, and simplify.
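
Since we are skipping the proof, here is a quick numerical test of the antisymmetric formula (a sketch of my own; in the Kronecker model of \otimes the scalar product on \mathbf{V}^{\otimes d} is just the ordinary dot product):

import itertools
import math
import numpy as np

def tensor(vs):
    t = np.array([1.0])
    for v in vs:
        t = np.kron(t, v)       # model (x) by the Kronecker product
    return t

def sign(p):
    return (-1) ** sum(p[a] > p[b] for a in range(len(p)) for b in range(a + 1, len(p)))

def wedge(vs):
    d = len(vs)
    return sum(sign(p) * tensor([vs[i] for i in p])
               for p in itertools.permutations(range(d))) / math.factorial(d)

rng = np.random.default_rng(0)
d, n = 3, 4
V = rng.standard_normal((d, n))     # v_1, ..., v_d as rows
W = rng.standard_normal((d, n))     # w_1, ..., w_d as rows

lhs = wedge(list(V)) @ wedge(list(W))       # <v_1 ^ ... ^ v_d, w_1 ^ ... ^ w_d>
rhs = sum(sign(p) * np.prod([V[i] @ W[p[i]] for i in range(d)])
          for p in itertools.permutations(range(d))) / math.factorial(d)
print(np.isclose(lhs, rhs))                 # True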

Perhaps counterintuitively, the antisymmetric tensor product is more important than the symmetric tensor product in linear algebra. The next theorem explains why.

Theorem 2: For any d \in \mathbb{N} and any \mathbf{v}_1,\dots,\mathbf{v}_d \in \mathbf{V}, the set \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is linearly dependent if and only if

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{0}.

Proof: Suppose first that \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly dependent set of vectors in \mathbf{V}. If d=1, this means that \mathbf{v}_1=\mathbf{0}, whence

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_1 = \mathbf{0}.

If d \geq 2, then without loss of generality, the vector \mathbf{v}_1 is a linear combination of the vectors \mathbf{v}_2,\dots,\mathbf{v}_d,

\mathbf{v}_1 = a_2\mathbf{v}_2 + \dots + a_d\mathbf{v}_d.

We then have that

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \left(\sum\limits_{i=2}^d a_i\mathbf{v}_i \right) \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d = \sum\limits_{i=2}^d a_i\mathbf{v}_i \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d,

by multilinearity of the wedge product. Now observe that the ith term in the sum is a scalar multiple of the wedge product

\mathbf{v}_i \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_i \wedge \dots \wedge \mathbf{v}_d,

which contains the vector \mathbf{v}_i twice, and hence each term in the sum is the zero tensor.

Conversely, suppose \mathbf{v}_1,\dots,\mathbf{v}_d \in \mathbf{V} are vectors such that

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d =\mathbf{0}.

We must prove that \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly dependent set in \mathbf{V}. We will prove the (equivalent) contrapositive statement: if \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly independent set in \mathbf{V}, then

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d \neq \mathbf{0}.

We prove this by induction on d. In the case d=1, linear independence of \{\mathbf{v}_1\} means precisely that \mathbf{v}_1 \neq \mathbf{0}, so

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_1 \neq \mathbf{0}.

For the inductive step, we proceed as follows. Since \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly independent set, it is a basis of the subspace

\mathbf{W} = \mathrm{Span}\{\mathbf{v}_1,\dots,\mathbf{v}_d\}.

Let \langle \cdot,\cdot \rangle denote the scalar product on \mathbf{W} defined by declaring this basis to be orthonormal. We now define a linear transformation

L \colon \mathbf{W}^{\wedge d} \to \mathbf{W}^{\wedge d-1}

by

L\mathbf{w}_1 \wedge \dots \wedge \mathbf{w}_d = \langle \mathbf{v}_1,\mathbf{w}_1\rangle \mathbf{w}_2 \wedge \dots \wedge \mathbf{w}_d.

We then have that

L\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \langle \mathbf{v}_1,\mathbf{v}_1\rangle \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d.

Now, since \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly independent set, so is the subset \{\mathbf{v}_2,\dots,\mathbf{v}_d\}. Thus, by the induction hypothesis,

\mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d \neq \mathbf{0}.

It then follows that

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d \neq \mathbf{0},

since otherwise the linear transformation L would map the zero vector in \mathbf{W}^{\wedge d} to a nonzero vector in \mathbf{W}^{\wedge d-1}, which is impossible.

— Q.E.D.

Corollary 1: We have \|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d\| \geq 0 with equality if and only if \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is linearly dependent.

Since, by Theorem 1,

\|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d\|^2 = \frac{1}{d!}\sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi) \langle \mathbf{v}_1,\mathbf{v}_{\pi(1)}\rangle \dots \langle \mathbf{v}_d,\mathbf{v}_{\pi(d)}\rangle,

you can think of this as a massive generalization of the Cauchy-Schwarz inequality, which is the case d=2.
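
The corollary is also easy to probe numerically. The sketch below (my own, and only a spot check, not a proof) evaluates the permutation sum above for a random independent triple and for a visibly dependent one:

import itertools
import math
import numpy as np

def wedge_norm_sq(vs):
    # the right-hand side of the displayed formula: (1/d!) sum_pi sgn(pi) prod <v_i, v_{pi(i)}>
    d = len(vs)
    def sign(p):
        return (-1) ** sum(p[a] > p[b] for a in range(d) for b in range(a + 1, d))
    return sum(sign(p) * np.prod([vs[i] @ vs[p[i]] for i in range(d)])
               for p in itertools.permutations(range(d))) / math.factorial(d)

rng = np.random.default_rng(1)
v1, v2, v3 = rng.standard_normal((3, 5))

print(wedge_norm_sq([v1, v2, v3]) > 0)                       # True: an independent triple
print(np.isclose(wedge_norm_sq([v1, v2, v1 - 2 * v2]), 0))   # True: a dependent set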

Lecture 22 coda

Math 31AH: Lecture 21

Two basic issues in linear algebra which we have not yet resolved are:

  1. Can we multiply vectors?
  2. Can we certify linear independence?

The answers to these questions turn out to be closely related to one another. In this lecture, we discuss the first item.

Let \mathbf{V} be a vector space. We have seen one sort of multiplication of vectors, namely the scalar product \langle \mathbf{v},\mathbf{w} \rangle. On one hand, the scalar product is a proper multiplication rule in the sense that it satisfies the FOIL identity, which is referred to as bilinearity in polite company. On the other hand, the scalar product does not correspond to our usual notion of multiplication in the sense that the product of two vectors is a number, not a vector. This is strange in that one instinctively feels that the “product” of two objects should be another object of the same type. It is natural to ask whether we can define a bilinear “vector product” which has the feature that the product of two vectors in \mathbf{V} is a vector in \mathbf{V}. In other words, we are asking whether it is possible to give some universal recipe for multiplication of vectors which would turn every vector space into an algebra.

So far, we have only seen certain specific vector spaces \mathbf{V} where a bilinear multiplication of vectors naturally presents itself. Here is a list of these spaces.

  1. \mathbf{V} = \mathbb{R}. In this case, vectors \mathbf{v} \in \mathbf{V} are real numbers, and the vector product \mathbf{v}\mathbf{w} is the product of real numbers.
  2. \mathbf{V}=\mathbb{R}^2. Technically, we have not seen this example yet, but here it is. Let \mathbf{v}=(x_1,x_2) and \mathbf{w}=(y_1,y_2) be vectors in \mathbf{V}. We then define their product to be \mathbf{v}\mathbf{w}=(x_1y_1-x_2y_2,x_1y_2+x_2y_1). Next week, we will see that this example of vector multiplication gives the complex number system.
  3. \mathbf{V}=\mathbb{R}^\infty. In this example, the vector space \mathbf{V} consists of infinite sequences \mathbf{v}=(x_0,x_1,x_2,\dots) which are identically zero after finitely many terms. This means that \mathbf{V} is isomorphic to the vector space of polynomials in a single variable. Let \mathbf{v}=(x_0,x_1,x_2,\dots) and \mathbf{w}=(y_0,y_1,y_2,\dots) be vectors in \mathbf{V}. We define their product to be \mathbf{v}\mathbf{w} = (x_0y_0,x_0y_1+x_1y_0,x_0y_2+x_1y_1+x_2y_0,\dots), which is just the recipe for multiplying polynomials and collecting together terms of the same degree (see the sketch after this list).
  4. \mathbf{V}=\mathbb{R}^{n \times n}. In this example, the vector space \mathbf{V} consists of matrices with n rows and n columns. This means that \mathbf{V} is isomorphic to the vector space of linear operators on an n-dimensional vector space. A vector product in \mathbf{V} is then defined by matrix multiplication.
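
For instance, the product in example 3 is just polynomial multiplication of coefficient sequences, i.e. convolution; here is a minimal NumPy illustration (my own sketch, with the sequences truncated to finitely many terms):

import numpy as np

# coefficient sequences, padded with zeros: v = 1 + 2x, w = 3 + x^2
v = np.array([1.0, 2.0, 0.0])
w = np.array([3.0, 0.0, 1.0])

vw = np.convolve(v, w)   # (vw)_k = sum over i + j = k of x_i y_j
print(vw)                # [3. 6. 1. 2. 0.], i.e. 3 + 6x + x^2 + 2x^3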

The above examples are quite different from one another, and they do not appear to be given by any universal recipe for defining a product of vectors. It turns out that in order to answer the question of how to define a universal vector product, it is better not to answer it at all. This is the idea behind the tensor product, which we now introduce.

To every pair of vectors \mathbf{v},\mathbf{w} \in \mathbf{V}, we associate a new vector denoted \mathbf{v} \otimes \mathbf{w}, which is called the tensor product of \mathbf{v} and \mathbf{w}. However, the vector \mathbf{v} \otimes \mathbf{w} does not reside in \mathbf{V}; rather, it is a vector in a new vector space called the tensor square of \mathbf{V} and denoted \mathbf{V} \otimes \mathbf{V}. What is happening here is that we view the symbol \otimes as a rule for multiplying two vectors, but we do not specify what this rule is — instead, we view \mathbf{v} \otimes \mathbf{w} as an “unevaluated” product of two vectors. We then store this unevaluated product in a new vector space \mathbf{V} \otimes \mathbf{V}, which contains all unevaluated products of vectors from \mathbf{V}. More precisely, the vectors in \mathbf{V} \otimes \mathbf{V} are all unevaluated expressions of the form

\tau = \mathbf{v}_1 \otimes \mathbf{w}_1 + \dots + \mathbf{v}_k \otimes \mathbf{w}_k,

where k \in \mathbb{N} is a natural number and \mathbf{v}_1,\mathbf{w}_1,\dots,\mathbf{v}_k,\mathbf{w}_k \in \mathbf{V} are vectors. These unevaluated expressions are called tensors, and often denoted by Greek letters. So tensor products are ambiguous, in the sense that we do not specify what the result of the multiplication \mathbf{v} \otimes \mathbf{w} actually is. The only thing we specify about this rule is that it is bilinear:

(a_1\mathbf{v}_1 + a_2\mathbf{v}_2) \otimes (b_1\mathbf{w}_1 + b_2\mathbf{w}_2) \\ = a_1b_1\mathbf{v}_1 \otimes \mathbf{w}_1 + a_1b_2 \mathbf{v}_1 \otimes \mathbf{w}_2 + a_2b_1 \mathbf{v}_2\otimes \mathbf{w}_1  + a_2b_2\mathbf{v}_2\otimes \mathbf{w}_2,

where the equality means that the LHS and the RHS are different expressions for the same vector in the vector space \mathbf{V} \otimes \mathbf{V}.

A tensor in \mathbf{V} \otimes \mathbf{V} which can be represented as the product of two vectors from \mathbf{V} is called a simple tensor. Note that a tensor may be simple without obviously being so, in the event that it can be “factored” as in high school algebra. For example, we have

\mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{v}_2 \otimes \mathbf{w}_1 = (\mathbf{v}_1+\mathbf{v}_2) \otimes \mathbf{w}_1.
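
One concrete way to picture these rules, if it helps (a model of my own choosing, not something the lectures rely on): represent a simple tensor \mathbf{v} \otimes \mathbf{w} of vectors in \mathbb{R}^n by the n \times n outer-product matrix. Bilinearity and the factoring identity above then become matrix identities:

import numpy as np

rng = np.random.default_rng(0)
v1, v2, w1 = rng.standard_normal((3, 4))

lhs = np.outer(v1, w1) + np.outer(v2, w1)   # v_1 (x) w_1 + v_2 (x) w_1 in the model
rhs = np.outer(v1 + v2, w1)                 # (v_1 + v_2) (x) w_1
print(np.allclose(lhs, rhs))                # True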

We haven’t yet said how to scale tensors by numbers. The rule for scalar multiplication of tensors is determined by bilinearity: it is defined by

a(\mathbf{v} \otimes \mathbf{w}) = (a\mathbf{v}) \otimes \mathbf{w} = \mathbf{v} \otimes (a\mathbf{w}),

and

a \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i = \sum_{i=1}^k a\mathbf{v}_i \otimes \mathbf{w}_i.

We can summarize all of the above by saying that two tensors \tau,\sigma \in \mathbf{V} \otimes \mathbf{V} are equal if and only if it is possible to rewrite \tau as \sigma using bilinearity.

Tensor products take a while to get used to. It’s important to remember that the only specified property of the tensor product is bilinearity; apart from this, it’s entirely ambiguous. So, anything we can say about tensor products must ultimately be a consequence of bilinearity. Here is an example.

Proposition 1: For any \mathbf{v} \in \mathbf{V}, we have

\mathbf{v} \otimes \mathbf{0}_\mathbf{V} = \mathbf{0}_\mathbf{V} \otimes \mathbf{v} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V}.

Proof: We are going to use the fact that scaling any vector \mathbf{v} \in \mathbf{V} by the number 0 \in \mathbb{R} produces the zero vector \mathbf{0}_\mathbf{V} \in \mathbf{V}. This was proved in Lecture 1, when we discussed the definition of a vector space. We have

\mathbf{v} \otimes \mathbf{0}_\mathbf{V} = \mathbf{v} \otimes (0\mathbf{0}_\mathbf{V}) = (0\mathbf{v}) \otimes \mathbf{0}_\mathbf{V} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V}.

Notice that bilinearity was used here to move the scalar zero from the second factor in the tensor product to the first factor in the tensor product. The proof that \mathbf{0}_\mathbf{V} \otimes \mathbf{v} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V} is essentially the same (try it!).

— Q.E.D.

Using Proposition 1, we can explicitly identify the “zero tensor,” i.e. the zero vector \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}} in the vector space \mathbf{V} \otimes \mathbf{V}.

Proposition 2: We have \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}}=\mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}}.

Proof: Let

\tau = \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i

be any tensor. We want to prove that \tau+\mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V} = \tau.

In the case k=1, we have \tau = \mathbf{v}_1 \otimes \mathbf{w}_1. Using bilinearity, we have

\mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{w}_1 = (\mathbf{v}_1+\mathbf{0}_\mathbf{V}) \otimes \mathbf{w}_1 = \mathbf{v}_1 \otimes \mathbf{w}_1,

where we used Proposition 1 and bilinearity.

The case k>1 now follows from the case k=1,

\tau + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \sum_{i=1}^{k-1} \mathbf{v}_i \otimes \mathbf{w}_i + \left(\mathbf{v}_k \otimes \mathbf{w}_k + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}}\right) = \sum_{i=1}^{k-1} \mathbf{v}_i \otimes \mathbf{w}_i + \mathbf{v}_k \otimes \mathbf{w}_k = \tau.

— Q.E.D.

Suppose now that \mathbf{V} is a Euclidean space, i.e. it comes with a scalar product \langle \cdot,\cdot \rangle. Then, there is an associated scalar product on the vector space \mathbf{V} \otimes \mathbf{V}, which by abuse of notation we also write as \langle \cdot,\cdot \rangle. This natural scalar product on \mathbf{V} \otimes \mathbf{V} is uniquely determined by the requirement that

\langle \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \rangle = \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle, \quad \forall \mathbf{v}_1,\mathbf{v}_2,\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V}.

Exercise 1: Verify that the scalar product on \mathbf{V} \otimes \mathbf{V} just defined really does satisfy the scalar product axioms.
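
In the outer-product model of \mathbb{R}^n \otimes \mathbb{R}^n used above, this requirement pins down exactly the entrywise (Frobenius) scalar product of matrices; a quick numerical check (again only a sketch):

import numpy as np

rng = np.random.default_rng(1)
v1, w1, v2, w2 = rng.standard_normal((4, 5))

model   = np.sum(np.outer(v1, w1) * np.outer(v2, w2))   # <v_1 (x) w_1, v_2 (x) w_2> in the model
product = (v1 @ v2) * (w1 @ w2)                         # <v_1, v_2> <w_1, w_2>
print(np.isclose(model, product))                       # True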

Proposition 3: If S is an orthogonal set of vectors in \mathbf{V}, then

S \otimes S = \{\mathbf{v} \otimes \mathbf{w} \colon \mathbf{v},\mathbf{w} \in S\}

is an orthogonal set of tensors in \mathbf{V} \otimes \mathbf{V}.

Proof: We must show that if \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \in S \otimes S are different tensors, then their scalar product is zero. We have

\langle \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \rangle = \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle.

The assumption that these tensors are different is equivalent to saying that one of the following conditions holds:

\mathbf{v}_1 \neq \mathbf{v}_2 \text{ or } \mathbf{w}_1 \neq \mathbf{w}_2.

Since S is an orthogonal set, the first possibility implies \langle \mathbf{v}_1,\mathbf{v}_2 \rangle =0, and the second implies \langle \mathbf{w}_1,\mathbf{w}_2 \rangle = 0. In either case, the product \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle is equal to zero.

— Q.E.D.

Theorem 1: If E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} is an orthonormal basis in \mathbf{V}, then E \otimes E = \{\mathbf{e}_i \otimes \mathbf{e}_j \colon 1 \leq i,j \leq n\} is an orthonormal basis in \mathbf{V} \otimes \mathbf{V}.

Proof: Let us first show that E \otimes E spans \mathbf{V} \otimes \mathbf{V}. For an arbitrary tensor \tau \in \mathbf{V} \otimes \mathbf{V}, we have

\tau = \sum\limits_{k=1}^l a_k \mathbf{v}_k \otimes \mathbf{w}_k = \sum\limits_{k=1}^l a_k \left(\sum\limits_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}_k\rangle\mathbf{e}_i \right) \otimes \left(\sum\limits_{j=1}^n \langle \mathbf{e}_j,\mathbf{w}_k\rangle\mathbf{e}_j \right) \\= \sum\limits_{i,j=1}^n \left(\sum\limits_{k=1}^l a_k \langle \mathbf{e}_i,\mathbf{v}_k\rangle\langle \mathbf{e}_j,\mathbf{w}_k\rangle\right)\mathbf{e}_i \otimes \mathbf{e}_j,

which shows that an arbitrary tensor is a linear combination of the tensors \mathbf{e}_i \otimes \mathbf{e}_j.

Since E is an orthogonal set in \mathbf{V}, by Proposition 3 we have that E \otimes E is an orthogonal set in \mathbf{V} \otimes \mathbf{V}, and therefore it is linearly independent.

It remains only to show that all tensors in E \otimes E have unit length. This is established by direct computation:

\|\mathbf{e}_i \otimes \mathbf{e}_j \|^2 = \langle \mathbf{e}_i\otimes \mathbf{e}_j,\mathbf{e}_i \otimes \mathbf{e}_j \rangle = \langle \mathbf{e}_i,\mathbf{e}_i \rangle\langle \mathbf{e}_j,\mathbf{e}_j \rangle= 1.

— Q.E.D.

Corollary 1: If \dim \mathbf{V} = n, then \dim \mathbf{V} \otimes \mathbf{V} = n^2.

It is important to note that the tensor product is noncommutative: it is typically not the case that \mathbf{v} \otimes \mathbf{w} = \mathbf{w} \otimes \mathbf{v}. However, we can decompose a simple tensor into two pieces, as

\mathbf{v} \otimes \mathbf{w} = \frac{\mathbf{v} \otimes \mathbf{w} + \mathbf{w} \otimes \mathbf{v}}{2} + \frac{\mathbf{v} \otimes \mathbf{w} - \mathbf{w} \otimes \mathbf{v}}{2}.

The first of these fractions is called the “symmetric part” of \mathbf{v} \otimes \mathbf{w}, and is denoted

\mathbf{v} \vee \mathbf{w} := \frac{\mathbf{v} \otimes \mathbf{w} + \mathbf{w} \otimes \mathbf{v}}{2}.

The reason for this notation is that we can think of \vee as a symmetric version of the tensor product: a bilinear multiplication of vectors that, by construction, is commutative:

\mathbf{v} \vee \mathbf{w} = \mathbf{w} \vee \mathbf{v}.

Note that if \mathbf{v}=\mathbf{w}, the symmetric tensor product produces the same tensor as the tensor product itself:

\mathbf{v} \vee \mathbf{v} = \mathbf{v} \otimes \mathbf{v}.

The second fraction above is called the “antisymmetric part” of \mathbf{v} \otimes \mathbf{w}, and denoted

\mathbf{v} \wedge \mathbf{w} := \frac{\mathbf{v} \otimes \mathbf{w} - \mathbf{w} \otimes \mathbf{v}}{2}.

This is an antisymmetric version of the tensor product in that, by construction, it satisfies

\mathbf{v} \wedge \mathbf{w} = -\mathbf{w} \wedge \mathbf{v}.

Note that the antisymmetric tensor product of any vector with itself produces the zero tensor:

\mathbf{v} \wedge \mathbf{v} = \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}}.

Although it may seem like the symmetric tensor product is more natural (commutative products are nice), it turns out that the antisymmetric tensor product — or wedge product as it’s often called — is more important. Here is a first indication of this. Suppose that \mathbf{V} is a 2-dimensional Euclidean space with orthonormal basis \{\mathbf{e}_1,\mathbf{e}_2\}. Let

\mathbf{v}_1 = a_{11}\mathbf{e}_1 + a_{12}\mathbf{e}_2 \quad\text{ and }\quad \mathbf{v}_2 = a_{21}\mathbf{e}_1 + a_{22}\mathbf{e}_2

be two vectors in \mathbf{V}. Let’s compute their wedge product: using FOIL, we find

\mathbf{v}_1 \wedge \mathbf{v}_2 \\ = (a_{11}\mathbf{e}_1 + a_{12}\mathbf{e}_2) \wedge (a_{21}\mathbf{e}_1 + a_{22}\mathbf{e}_2) \\ = (a_{11}\mathbf{e}_1) \wedge (a_{21}\mathbf{e}_1) + (a_{11}\mathbf{e}_1) \wedge (a_{22}\mathbf{e}_2) + (a_{12}\mathbf{e}_2)\wedge (a_{21}\mathbf{e}_1) + (a_{12}\mathbf{e}_2) \wedge (a_{22}\mathbf{e}_2) \\ = a_{11}a_{21} \mathbf{e}_1 \wedge \mathbf{e}_1 + a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 + a_{12}a_{21} \mathbf{e}_2 \wedge \mathbf{e}_1 + a_{12}a_{22}\mathbf{e}_2 \wedge \mathbf{e}_2 \\ = a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 + a_{12}a_{21} \mathbf{e}_2 \wedge \mathbf{e}_1  \\ = a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 - a_{12}a_{21} \mathbf{e}_1 \wedge \mathbf{e}_2 \\ = (a_{11}a_{22} - a_{12}a_{21})\, \mathbf{e}_1 \wedge \mathbf{e}_2.

Probably, you recognize the lone scalar a_{11}a_{22} - a_{12}a_{21} remaining at the end of this computation as a determinant:

\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22}-a_{21}a_{12}.

Even if you don’t, no need to worry: you are not expected to know what a determinant is at this point. Indeed, in Lecture 22 we are going to use the wedge product to define determinants.
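
The computation above is easy to reproduce numerically (a sketch of my own, modelling \otimes by the Kronecker product): the coefficient of \mathbf{e}_1 \wedge \mathbf{e}_2 in \mathbf{v}_1 \wedge \mathbf{v}_2 agrees with what numpy.linalg.det computes.

import numpy as np

a = np.array([[2.0, 3.0],
              [5.0, 7.0]])        # rows are v_1, v_2 expressed in the basis e_1, e_2
v1, v2 = a
e1, e2 = np.eye(2)

wedge = (np.kron(v1, v2) - np.kron(v2, v1)) / 2   # v_1 ^ v_2 in the Kronecker model
e12   = (np.kron(e1, e2) - np.kron(e2, e1)) / 2   # e_1 ^ e_2

coeff = (wedge @ e12) / (e12 @ e12)   # v_1 ^ v_2 is a scalar multiple of e_1 ^ e_2
print(coeff, np.linalg.det(a))        # both equal a11*a22 - a12*a21 = -1 (up to rounding)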

Lecture 21 coda

Math 31AH: Lecture 19

Let \mathbf{V} be an n-dimensional vector space equipped with a scalar product \langle \cdot,\cdot \rangle. Recall from Lecture 16 that an operator A \in \mathrm{End}\mathbf{V} is said to be selfadjoint (or symmetric) if

\langle \mathbf{v},A\mathbf{w} \rangle = \langle A\mathbf{v},\mathbf{w}\rangle \quad \forall\ \mathbf{v},\mathbf{w} \in \mathbf{V}.

Also recall from Lecture 18 that A \in \mathrm{End}\mathbf{V} is said to be semisimple if there exists a basis of \mathbf{V} consisting of eigenvectors of A. The goal of this lecture is to prove the following cornerstone result in linear algebra.

Theorem 1 (Spectral Theorem for selfadjoint operators): If A \in \mathrm{End}\mathbf{V} is selfadjoint, then it is semisimple.

The proof of this important theorem occupies the remainder of this lecture. It is a constructive argument that builds an eigenbasis for A one vector at a time. A nice feature of the construction is that the eigenbasis it outputs is an orthonormal basis of \mathbf{V}.

Let us begin with an important observation on a special subspecies of selfadjoint operators.

Definition 1: A selfadjoint operator B \in \mathrm{End}\mathbf{V} is said to be nonnegative if the associated quadratic form is nonnegative, i.e. if the function defined by

Q_B(\mathbf{v}) := \langle \mathbf{v},B \mathbf{v} \rangle, \quad \mathbf{v} \in \mathbf{V},

satisfies Q_B(\mathbf{v}) \geq 0 for all \mathbf{v} \in \mathbf{V}.

Any nonnegative selfadjoint operator B has the property that membership in its kernel is certified by vanishing of Q_B.

Lemma 1: If B \in \mathrm{End}\mathbf{V} is a nonnegative selfadjoint operator, then \mathbf{v} \in \mathrm{Ker} B if and only if Q_B(\mathbf{v})=0.

Proof: One direction of this equivalence is obvious: if \mathbf{v} \in \mathrm{Ker}B, then

Q_B(\mathbf{v}) = \langle \mathbf{v},B\mathbf{v}\rangle = \langle \mathbf{v},\mathbf{0} \rangle = 0.

The proof of the converse statement is similar to the proof of the Cauchy-Schwarz inequality. More precisely, suppose that Q_B(\mathbf{v})=0, and let t \in \mathbb{R} be any number and let \mathbf{w} \in \mathbf{V} be an arbitrary vector. We have

Q_B(\mathbf{v}+t\mathbf{w}) = \langle \mathbf{v}+t\mathbf{w},B\mathbf{v}+tB\mathbf{w}\rangle \\= \langle \mathbf{v},B\mathbf{v} \rangle + \langle \mathbf{v},tB\mathbf{w} \rangle + \langle t\mathbf{w},B\mathbf{v} \rangle + \langle t\mathbf{w},tB\mathbf{w} \rangle.

Using the definition of Q_B together with the fact that B is selfadjoint, this simplifies to

Q_B(\mathbf{v}+t\mathbf{w}) = Q_B(\mathbf{v}) + 2t\langle B\mathbf{v},\mathbf{w} \rangle + t^2Q_B(\mathbf{w}),

and since Q_B(\mathbf{v})=0 this further simplifies to

Q_B(\mathbf{v}+t\mathbf{w}) = 2t\langle B\mathbf{v},\mathbf{w} \rangle + t^2Q_B(\mathbf{w}).

Now, as a function of t \in \mathbb{R} the righthand side of this equation is a parabola, and since Q_B(\mathbf{w}) \geq 0 this parabola is upward-opening. Moreover, since the lefthand side satisfies Q_B(\mathbf{v}+t\mathbf{w}) \geq 0, the parabola cannot dip below the horizontal axis; since it already takes the value 0 at t=0, its vertex must be located at t=0, and this forces

\langle B\mathbf{v},\mathbf{w} \rangle = 0.

But the vector \mathbf{w} was chosen arbitrarily, so the above equation holds for any \mathbf{w} \in \mathbf{V}, in particular \mathbf{w}=B\mathbf{v}. We thus have

\langle B\mathbf{v},B\mathbf{v}\rangle = \|B\mathbf{v}\|^2=0,

which means that B\mathbf{v}=\mathbf{0}, i.e. \mathbf{v} \in \mathrm{Ker}B.

— Q.E.D.

Now, let A \in \mathrm{End}\mathbf{V} be any selfadjoint operator. We are going to use the Lemma just established to prove that A admits an eigenvector \mathbf{e}; the argument even gives a description of the corresponding eigenvalue \lambda.

Consider the unit sphere in the Euclidean space \mathbf{V}, i.e. the set

S(\mathbf{V}) = \{ \mathbf{v} \in \mathbf{V} \colon \|\mathbf{v}\|=1\}

of all vectors of length 1. The quadratic form Q_A(\mathbf{v}) = \langle \mathbf{v},A\mathbf{v}\rangle is a continuous function, and the sphere S(\mathbf{V}) is closed and bounded, hence compact; so by the Extreme Value Theorem the minimum value of Q_A on the sphere,

\lambda = \min\limits_{\mathbf{v} \in S(\mathbf{V})} Q_A(\mathbf{v}),

does indeed exist, and is moreover achieved at some vector \mathbf{e} \in S(\mathbf{V}), i.e.

Q_A(\mathbf{e})=\lambda.

Theorem 2: The minimum \lambda of Q_A on the unit sphere is an eigenvalue of A, and the minimizer \mathbf{e} lies in the eigenspace \mathbf{V}_\lambda.

Proof: By definition of \lambda as the minimum value of Q_A, we have that

\langle \mathbf{v},A\mathbf{v} \rangle \geq \lambda \quad \forall \mathbf{v} \in S(\mathbf{V}).

Since \langle \mathbf{v},\mathbf{v} \rangle =1 for any \mathbf{v} \in S(\mathbf{V}), the above inequality can be rewritten as

\langle \mathbf{v},A\mathbf{v} \rangle \geq \lambda\langle \mathbf{v},\mathbf{v} \rangle \quad \forall \mathbf{v} \in S(\mathbf{V}).

But actually, this implies that

\langle \mathbf{v},A\mathbf{v} \rangle \geq \lambda\langle \mathbf{v},\mathbf{v} \rangle \quad \forall \mathbf{v} \in \mathbf{V},

since every vector in \mathbf{V} is a nonnegative scalar multiple of a vector of unit length (make sure you understand this). We thus have that

\langle \mathbf{v},(A-\lambda I)\mathbf{v} \rangle \geq 0 \quad \forall \mathbf{v} \in \mathbf{V}.

This says that the selfadjoint operator B:= A-\lambda I is nonnegative. Moreover, we have that

Q_B(\mathbf{e}) = \langle \mathbf{e},(A-\lambda I)\mathbf{e} \rangle = Q_A(\mathbf{e})-\lambda \langle \mathbf{e},\mathbf{e}\rangle = \lambda - \lambda = 0.

Thus, by Lemma 1, we have that \mathbf{e} \in \mathrm{Ker}(A-\lambda I), meaning that

(A-\lambda I)\mathbf{e} = \mathbf{0},

or equivalently

A\mathbf{e} = \lambda \mathbf{e}.

— Q.E.D.
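
Theorem 2 is a variational characterization of the smallest eigenvalue, and it is easy to watch it at work numerically. A sketch of my own, using a random symmetric matrix as the selfadjoint operator: sampling many unit vectors, the smallest value of Q_A found is close to, and never below, the smallest eigenvalue reported by numpy.linalg.eigh.

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                              # a selfadjoint (symmetric) operator on R^4

V = rng.standard_normal((200000, 4))           # random directions ...
V /= np.linalg.norm(V, axis=1, keepdims=True)  # ... normalized to lie on the unit sphere S(V)
Q = np.einsum('ni,ij,nj->n', V, A, V)          # Q_A(v) = <v, Av> for each sample

lam = np.linalg.eigh(A)[0]                     # eigenvalues in increasing order
print(Q.min(), lam[0])                         # Q.min() is close to, and at least, lam[0]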

Theorem 2 has established that an arbitrary selfadjoint operator A has an eigenvector. However, this seems to be a long way from Theorem 1, which makes the much stronger assertion that A has n linearly independent eigenvectors. In fact, the distance from Theorem 2 to Theorem 1 is not so long as it may seem. To see why, we need to introduce one more very important concept.

Definition 2: Let T\in \mathrm{End}\mathbf{V} be a linear operator, and let \mathbf{W} be a subspace of \mathbf{V}. We say that \mathbf{W} is invariant under T if

T\mathbf{w} \in \mathbf{W} \quad \forall\ \mathbf{w} \in \mathbf{W}.

The meaning of this definition is that if \mathbf{W} is invariant under T, then T may be considered as a linear operator on the smaller space \mathbf{W}, i.e. as an element of the algebra \mathrm{End}\mathbf{W}.

Let us adorn the eigenvalue/eigenvector pair produced by Theorem 2 with a subscript, writing this pair as (\mathbf{e}_1,\lambda_1). Consider the orthogonal complement of the line spanned by \mathbf{e}_1, i.e. the subspace of \mathbf{V} given by

\mathbf{V}_2 = \{ \mathbf{v} \in \mathbf{V} \colon \langle \mathbf{v},\mathbf{e}_1 \rangle = 0\}.

Proposition 1: The subspace \mathbf{V}_2 is invariant under A.

Proof: We have to prove that if \mathbf{v} is orthogonal to the eigenvector \mathbf{e}_1 of A, then so is A\mathbf{v}. This follows easily from the fact that A is selfadjoint:

\langle A\mathbf{v},\mathbf{e}_1 \rangle = \langle \mathbf{v},A\mathbf{e}_1 \rangle = \langle \mathbf{v},\lambda_1\mathbf{e}_1 \rangle = \lambda_1 \langle \mathbf{v},\mathbf{e}_1 \rangle=0.

— Q.E.D.

The effect of Proposition 1 is that we may consider A as a selfadjoint operator defined on the (n-1)-dimensional subspace \mathbf{V}_2. But this means that we can simply apply Theorem 2 again, with \mathbf{V}_2 replacing \mathbf{V}. We will then get a new eigenvector/eigenvalue pair (\mathbf{e}_2,\lambda_2), where

\lambda_2 = \min\limits_{\mathbf{v} \in S(\mathbf{V}_2)} Q_A(\mathbf{v})

is the minimum value of Q_A on the unit sphere in the Euclidean space \mathbf{V}_2, and \mathbf{e}_2 \in S(\mathbf{V}_2) is a vector at which the minimum is achieved,

Q_A(\mathbf{e}_2) = \lambda_2.

By construction, \mathbf{e}_2 is a unit vector orthogonal to \mathbf{e}_1, so that in particular \{\mathbf{e}_1,\mathbf{e}_2\} is a linearly independent set in \mathbf{V}. Moreover, we have that \lambda_1 \leq \lambda_2, since S(\mathbf{V}_2) is a subset of S(\mathbf{V}).
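
The inductive procedure described here (minimize, pass to the orthogonal complement, repeat) can be phrased as a small algorithm. Below is a rough sketch of my own, where, for the minimization step guaranteed by Theorem 2, I simply call numpy.linalg.eigh on the restriction of A to the current subspace:

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                      # a selfadjoint operator on R^5

eigvals, eigvecs = [], []
basis = np.eye(5)                      # orthonormal basis of the current invariant subspace
for _ in range(5):
    A_res = basis.T @ A @ basis        # the restriction of A to that subspace
    lam, U = np.linalg.eigh(A_res)     # its smallest eigenvalue is the minimum of Q_A there (Theorem 2)
    e = basis @ U[:, 0]                # the corresponding unit minimizer, an eigenvector of A
    eigvals.append(lam[0])
    eigvecs.append(e)
    basis = basis @ U[:, 1:]           # pass to the orthogonal complement of e (Proposition 1)

E = np.array(eigvecs).T
print(np.allclose(A @ E, E @ np.diag(eigvals)))   # True: an eigenvector for each eigenvalue
print(np.allclose(E.T @ E, np.eye(5)))            # True: the eigenbasis is orthonormal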

Lecture 19 coda

Math 31AH: Lecture 18

Let \mathbf{V} be a vector space, and let us consider the algebra \mathrm{End}\mathbf{V} as a kind of ecosystem consisting of various life forms of varying complexity. We now move on to the portion of the course which is concerned with the taxonomy of linear operators — their classification and division into various particular classes.

The simplest organisms in the ecosystem \mathrm{End}\mathbf{V} are operators which act by scaling every vector \mathbf{v} \in \mathbf{V} by a fixed number \lambda \in \mathbb{R}; these are the single-celled organisms of the operator ecosystem.

Definition 1: An operator A \in \mathrm{End}\mathbf{V} is said to be simple if there exists a scalar \lambda \in \mathbb{R} such that

A\mathbf{v}=\lambda \mathbf{v} \quad \forall\ \mathbf{v} \in \mathbf{V}.

Simple operators really are very simple, in the sense that they are no more complicated than numbers. Indeed, Definition 1 is equivalent to saying that A=\lambda I, where I \in \mathrm{End}\mathbf{V} is the identity operator, which plays the role of the number 1 in the algebra \mathrm{End}\mathbf{V}, meaning that it is the multiplicative identity in this algebra. Simple operators are extremely easy to manipulate algebraically: if A=\lambda I, then we have

A^k = \underbrace{(\lambda I)(\lambda I) \dots (\lambda I)}_{k \text{ factors }} =\lambda^kI,

for any nonnegative integer k, and more generally if p(x) is any polynomial in a single variable then we have

p(A) = p(\lambda)I.

Exercise 1: Prove the above formula.

The formula A^k=\lambda^kI even works in the case that k is a negative integer, provided that \lambda \neq 0; equivalently, the simple operator A=\lambda I is invertible if and only if \lambda \neq 0, its inverse being A^{-1} = \lambda^{-1}I. If A =\lambda I and B = \mu I are simple operators, then they commute,

AB = (\lambda I)(\mu I)=(\lambda\mu)I = (\mu I)(\lambda I) = BA,

just like ordinary numbers, and more generally

p(A,B) = p(\lambda,\mu)I

for any polynomial p(x,y) in two variables.

Exercise 2: Prove the above formula.

Another way to appreciate how truly simple simple operators are is to look at their matrices. In order to do this, we have to restrict to the case that the vector space \mathbf{V} is finite-dimensional. If \mathbf{V} is n-dimensional, and E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} is any basis of \mathbf{V}, then the matrix of A=\lambda I relative to E is simply

[A]_E = \begin{bmatrix} \lambda & {} & {} \\ {} & \ddots & {} \\ {} & {} & \lambda \end{bmatrix},

where the off-diagonal matrix elements are all equal to zero. For this reason, simple operators are often called diagonal operators.

Most operators in \mathrm{End}\mathbf{V} are not simple operators — they are complicated multicellular organisms. So, to understand them we have to dissect them and look at their organs one at a time. Mathematically, this means that, given an operator A \in \mathrm{End}\mathbf{V}, we look for special vectors in \mathbf{V} on which A acts as if it was simple.

Definition 2: A nonzero vector \mathbf{e} \in \mathbf{V} is said to be an eigenvector of an operator A \in \mathrm{End}\mathbf{V} if

A\mathbf{e} = \lambda \mathbf{e}

for some \lambda \in \mathbb{R}. The scalar \lambda is said to be an eigenvalue of A.

The best case scenario is that we can find a basis of \mathbf{V} entirely made up of eigenvectors of A.

Definition 3: An operator A \in \mathrm{End} \mathbf{V} is said to be semisimple if there exists a basis E of \mathbf{V} consisting of eigenvectors of A. Such a basis is called an eigenbasis for A.

As the name suggests, semisimple operators are pretty simple, but not quite as simple as simple operators. In particular, every simple operator is semisimple, because if A is simple then every nonzero vector in \mathbf{V} is an eigenvector of A, and hence any basis in \mathbf{V} is an eigenbasis for A. The converse, however, is not true.

Let \mathbf{V} be an n-dimensional vector space, and let A \in \mathrm{End} \mathbf{V} be a semisimple operator. By definition, this means that there exists a basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} in \mathbf{V} consisting of eigenvectors of A. This in turn means that there exist numbers \lambda_1,\dots,\lambda_n \in \mathbb{R} such that

A\mathbf{e}_i = \lambda_i \mathbf{e}_i \quad \forall\ 1 \leq i \leq n.

If \lambda_1=\dots=\lambda_n, then A is simple, but if these numbers are not all the same then it is not. However, even if all these numbers are different, the matrix of A relative to E will still be a diagonal matrix, i.e. it will have the form

[A]_E = \begin{bmatrix} \lambda_1 & {} & {} \\ {} & \ddots & {} \\ {} & {} & \lambda_n \end{bmatrix}.

For this reason, semisimple operators are often called diagonalizable operators. Note the shift in terminology from “diagonal,” for simple, to “diagonalizable,” for semisimple. The former term suggests an immutable characteristic, independent of basis, whereas the latter indicates that some action must be taken, in that a special basis must be found to reveal the diagonal form. More precisely, the matrix of a semisimple operator A is not diagonal with respect to an arbitrary basis; the definition only says that the matrix of A is diagonal relative to some basis.
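
In matrix language, semisimplicity is easy to see with NumPy (a sketch on a small 2x2 example of my own): if the columns of P form an eigenbasis, then P^{-1}AP is the diagonal matrix of eigenvalues, i.e. the matrix of the operator relative to that eigenbasis.

import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])            # a semisimple operator: eigenvalues 3 and 2

lam, P = np.linalg.eig(A)             # columns of P form an eigenbasis
D = np.linalg.inv(P) @ A @ P          # the matrix of the operator relative to that eigenbasis
print(np.allclose(D, np.diag(lam)))   # True: diagonal, with the eigenvalues on the diagonal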

Most linear operators are not semisimple — indeed, there are plenty of operators that have no eigenvectors at all. Consider the operator

R_\theta \colon \mathbb{R}^2 \to \mathbb{R}^2

which rotates a vector \mathbf{v} \in \mathbb{R}^2 counterclockwise through the angle \theta \in [0,2\pi). The matrix of this operator relative to the standard basis

\mathbf{e}_1 = (1,0),\ \mathbf{e}_2 = (0,1)

of \mathbb{R}^2 is

\begin{bmatrix} \cos \theta & - \sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}.

If \theta = 0, then R_\theta = I, so that R_\theta is a simple operator: \mathbf{e}_1,\mathbf{e}_2 are eigenvectors, with eigenvalues \lambda_1=\lambda_2=1. If \theta = \pi, then R_\theta=-I and again R_\theta is simple, with the same eigenvectors and eigenvalues \lambda_1=\lambda_2=-1. However, for any other value of \theta, for example \theta = \frac{\pi}{2} (rotation through a right angle), it is geometrically clear that R_\theta \mathbf{v} is never a scalar multiple of \mathbf{v}, so that R_\theta has no eigenvectors at all. In particular, it is not semisimple.
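
A quick numerical confirmation of the last point (a sketch; numpy.linalg.eigvals works over the complex numbers): for \theta = \pi/2 the rotation matrix has eigenvalues approximately \pm i, so it has no real eigenvalues and hence no eigenvectors in \mathbb{R}^2.

import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.linalg.eigvals(R))   # approximately [1j, -1j]: no real eigenvalues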

Let us now formulate necessary and sufficient conditions for an operator to be semisimple. In this endeavor it is psychologically helpful to reorganize the eigenvector/eigenvalue definition by thinking of eigenvalues as the primary objects, and eigenvectors as secondary objects associated to them.

Definition 4: The spectrum of an operator A \in \mathrm{End}\mathbf{V} is the set \sigma(A) \subseteq \mathbb{R} defined by

\sigma(A) = \{ \lambda \in \mathbb{R} \colon \lambda \text{ is an eigenvalue of } A\}.

For each \lambda \in \sigma(A), the set \mathbf{V}_\lambda \subseteq \mathbf{V} defined by

\mathbf{V}_\lambda = \{\mathbf{v} \in \mathbf{V} \colon A\mathbf{v} = \lambda \mathbf{v}\}

is called the \lambda-eigenspace of A. The dimension of \mathbf{V}_\lambda is called the geometric multiplicity of \lambda.

In these terms, saying that A \in \mathrm{End}\mathbf{V} is a simple operator means that the spectrum of A consists of a single number,

\sigma(A) = \{\lambda\},

and that the corresponding eigenspace exhausts \mathbf{V},

\mathbf{V}_\lambda = \mathbf{V}.

At the other extreme, the rotation operator R_{\pi/2} considered above has empty spectrum,

\sigma(R_{\pi/2}) = \{\},

and thus does not have any eigenspaces.

Proposition 1: For any A \in \mathrm{End}\mathbf{V}, for each \lambda \in \sigma(A) the eigenspace \mathbf{V}_\lambda is a subspace of \mathbf{V}.

Proof: First, observe that \mathbf{0} \in \mathbf{V}_\lambda, because

A\mathbf{0} = \mathbf{0} = \lambda \mathbf{0}.

Second, \mathbf{V}_\lambda is closed under scalar multiplication: if \mathbf{v} \in \mathbf{V}_\lambda, then

A(t\mathbf{v}) = tA\mathbf{v} = t\lambda\mathbf{v}=\lambda(t\mathbf{v}).

Third, \mathbf{V}_\lambda is closed under vector addition: if \mathbf{v},\mathbf{w} \in \mathbf{V}_\lambda, then

A(\mathbf{v}+\mathbf{w}) = A\mathbf{v}+A\mathbf{w}=\lambda\mathbf{v}+\lambda\mathbf{w}=\lambda(\mathbf{v}+\mathbf{w}).

— Q.E.D.

So, the eigenspaces of an operator A \in \mathrm{End}\mathbf{V} constitute a collection of subspaces \mathbf{V}_\lambda of \mathbf{V} indexed by the numbers \lambda \in \sigma(A). A key feature of these subspaces is that they are independent of one another.

Theorem 1: Suppose that \lambda_1,\dots,\lambda_k are distinct eigenvalues of an operator A \in \mathrm{End}\mathbf{V}. Let \mathbf{e}_1,\dots,\mathbf{e}_k be nonzero vectors such that \mathbf{e}_i \in \mathbf{V}_{\lambda_i} for each 1 \leq  i\leq k. Then \{\mathbf{e}_1,\dots,\mathbf{e}_k\} is a linearly independent set.

Proof: We prove this by induction on k. The base case is k=1, and in this case the assertion is simply that the set \{\mathbf{e}_1\} consisting of a single eigenvector of A is linearly independent. This is true, since eigenvectors are nonzero by definition.

For the induction step, suppose that \{\mathbf{e}_1,\dots,\mathbf{e}_k\} is a linearly dependent set. Then, there exist numbers t_1,\dots,t_k \in \mathbb{R}, not all equal to zero, such that

\sum_{i=1}^k t_i\mathbf{e}_i = \mathbf{0}.

Relabeling if necessary, we may suppose that t_1 \neq 0. (Some coefficient with index less than k must be nonzero: if t_k were the only nonzero coefficient, then t_k\mathbf{e}_k = \mathbf{0} would force \mathbf{e}_k=\mathbf{0}.) Applying the operator A to both sides of the above vector equation, we get

\sum_{i=1}^k t_i\lambda_i\mathbf{e}_i = \mathbf{0}.

On the other hand, we can multiply the original vector equation by any scalar and it remains true; in particular, we have

\sum_{i=1}^k t_i\lambda_k\mathbf{e}_i = \mathbf{0}.

Now, subtracting this third equation from the second equation, we obtain

\sum_{i=1}^{k-1} t_i(\lambda_i-\lambda_k)\mathbf{e}_i = \mathbf{0}.

By the induction hypothesis, \{\mathbf{e}_1,\dots,\mathbf{e}_{k-1}\} is a linearly independent set, and hence all the coefficients in this vector equation are zero. In particular, we have

t_1(\lambda_1-\lambda_k) = 0.

But this is impossible, since t_1 \neq 0 and \lambda_1 \neq \lambda_k. Hence, the set \{\mathbf{e}_1,\dots,\mathbf{e}_k\} cannot be linearly dependent — it must be linearly independent.

— Q.E.D.
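
Theorem 1 can be spot-checked numerically (a sketch of my own, using a random symmetric matrix so that a full set of real eigenvectors exists; its eigenvalues are distinct with probability one): one eigenvector per eigenvalue gives a linearly independent set, i.e. a matrix of full rank.

import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                       # eigenvalues are distinct with probability one

lam, U = np.linalg.eigh(A)              # one (unit) eigenvector per eigenvalue, as columns of U
print(len(set(np.round(lam, 8))))       # 4 distinct eigenvalues
print(np.linalg.matrix_rank(U))         # 4: the eigenvectors are linearly independent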

Restricting to the case that \mathbf{V} is finite-dimensional, \dim \mathbf{V}=n, Theorem 1 has the following crucial consequences.

Corollary 1: A \in \mathrm{End}\mathbf{V} is semisimple if and only if

\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V}.

Proof: Suppose first that A is semisimple. By definition, there is a basis of \mathbf{V} consisting of eigenvectors of A, and hence the span of the eigenspaces of A is all of \mathbf{V},

\mathrm{Span} \bigcup\limits_{\lambda \in \sigma(A)} \mathbf{V}_\lambda = \mathbf{V}.

Thus

\dim \mathrm{Span} \bigcup\limits_{\lambda \in \sigma(A)} \mathbf{V}_\lambda = \dim \mathbf{V}.

By Theorem 1, we have

\dim \mathrm{Span} \bigcup\limits_{\lambda \in \sigma(A)} \mathbf{V}_\lambda =\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda,

and hence

\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V}.

Conversely, suppose that the sum of the dimensions of the eigenspaces of A is equal to the dimension of \mathbf{V}. For each \lambda \in \sigma(A), let E_\lambda be a basis of the eigenspace \mathbf{V}_\lambda. Then, by Theorem 1, the set

E = \bigcup\limits_{\lambda \in \sigma(A)} E_\lambda

is a linearly independent set, and hence a basis of the subspace \mathrm{Span}(E) of \mathbf{V}. Thus

\dim \mathrm{Span}(E) = \sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda.

Since by hypothesis we have

\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V},

this implies that

\dim \mathrm{Span}(E) =\dim \mathbf{V},

which in turn implies that

\mathrm{Span}(E) =\mathbf{V}.

Thus E is a basis of \mathbf{V} consisting of eigenvectors of A, whence A is semisimple.

— Q.E.D.

Corollary 2: If |\sigma(A)| = \dim \mathbf{V}, then A is semisimple.

Proof: To say that |\sigma(A)|=\dim \mathbf{V} is equivalent to saying that the spectrum of A consists of n=\dim \mathbf{V} distinct numbers,

\sigma(A) = \{\lambda_1,\dots,\lambda_n\}.

Sampling a collection of nonzero vectors from each corresponding eigenspace,

\mathbf{e}_i \in \mathbf{V}_{\lambda_i}, \quad 1 \leq i \leq n,

we get a set E= \{\mathbf{e}_1,\dots,\mathbf{e}_n\} of eigenvectors of A. By Theorem 1, E is a linearly independent set, hence it is a basis of \mathbf{V}.

— Q.E.D.

Lecture 18 video

Math 31AH: Lecture 16

In Lecture 14, we considered a special type of linear operators known as orthogonal transformations, which were defined as follows. Let \mathbf{V} be an n-dimensional Euclidean space. An operator U \in \mathrm{End}\mathbf{V} is orthogonal if the image \{U\mathbf{e}_1,\dots,U\mathbf{e}_n\} of any orthonormal basis \{\mathbf{e}_1,\dots,\mathbf{e}_n\} of \mathbf{V} is again an orthonormal basis of \mathbf{V}. We found that orthogonal operators can alternatively be characterized as those linear operators which preserve the scalar product, meaning that

\langle U\mathbf{v},U\mathbf{w} \rangle = \langle \mathbf{v},\mathbf{w} \rangle \quad\forall \mathbf{v},\mathbf{w} \in \mathbf{V}.

Yet another way to characterize orthogonal operators is to say that they are invertible, and

\langle \mathbf{v},U\mathbf{w} \rangle = \langle U^{-1}\mathbf{v},\mathbf{w}\rangle \quad\forall \mathbf{v},\mathbf{w} \in \mathbf{V}.

This last characterization makes contact with a more general operation on operators.

Theorem 1: For any operator A \in \mathrm{End}\mathbf{V}, there is a unique operator B \in \mathrm{End}\mathbf{V} such that

\langle \mathbf{v},A\mathbf{w} \rangle = \langle B\mathbf{v},\mathbf{w}\rangle \quad\forall \mathbf{v},\mathbf{w} \in \mathbf{V}.

Proof: We first prove that an operator with the desired property exists. Let E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} be an orthonormal basis in \mathbf{V}. Let B \in \mathrm{End}\mathbf{V} be the operator defined by

B\mathbf{e}_j = \sum_{i=1}^n \langle A\mathbf{e}_i,\mathbf{e}_j \rangle \mathbf{e}_i, \quad 1 \leq j \leq n.

That is, we have defined the operator B in such a way that its matrix elements relative to the basis E satisfy

\langle \mathbf{e}_i,B\mathbf{e}_j \rangle = \langle A\mathbf{e}_i,\mathbf{e}_j \rangle = \langle \mathbf{e}_j,A\mathbf{e}_i\rangle,

which is equivalent to saying that the (i,j)-element of the matrix [B]_E is equal to the (j,i)-element of the matrix [A]_E, a relationship which is usually expressed as saying that [B]_E is the transpose of [A]_E. Now, for any vectors \mathbf{v},\mathbf{w} \in \mathbf{V}, we have

\langle \mathbf{v},A\mathbf{w} \rangle \\ = \left \langle \sum_{i=1}^n \langle \mathbf{e}_i,\mathbf{v} \rangle \mathbf{e}_i,A\sum_{j=1}^n \langle \mathbf{e}_j,\mathbf{w} \rangle \mathbf{e}_j \right\rangle \\ = \left \langle \sum_{i=1}^n \langle \mathbf{e}_i,\mathbf{v} \rangle \mathbf{e}_i,\sum_{j=1}^n \langle \mathbf{e}_j,\mathbf{w} \rangle A\mathbf{e}_j \right\rangle \\ = \sum_{i,j=1}^n \langle \mathbf{e}_i,\mathbf{v} \rangle \langle \mathbf{e}_j,\mathbf{w}\rangle \langle \mathbf{e}_i,A\mathbf{e}_j \rangle \\ = \sum_{i,j=1}^n \langle \mathbf{e}_i,\mathbf{v} \rangle \langle \mathbf{e}_j,\mathbf{w}\rangle \langle B\mathbf{e}_i,\mathbf{e}_j \rangle \\ = \left \langle \sum_{i=1}^n \langle \mathbf{e}_i,\mathbf{v} \rangle B\mathbf{e}_i,\sum_{j=1}^n \langle \mathbf{e}_j,\mathbf{w} \rangle \mathbf{e}_j \right\rangle \\ = \left \langle B\sum_{i=1}^n \langle \mathbf{e}_i,\mathbf{v} \rangle \mathbf{e}_i,\sum_{j=1}^n \langle \mathbf{e}_j,\mathbf{w} \rangle \mathbf{e}_j \right\rangle \\ = \langle B\mathbf{v},\mathbf{w}\rangle.

Now we prove uniqueness. Suppose that B,C \in \mathrm{End}\mathbf{V} are two operators such that

\langle \mathbf{v},A\mathbf{w} \rangle = \langle B\mathbf{v},\mathbf{w} \rangle = \langle C\mathbf{v},\mathbf{w} \rangle \quad \forall \mathbf{v},\mathbf{w} \in \mathbf{V}.

Then, we have that

\left\langle (B-C)\mathbf{v},\mathbf{w} \right\rangle = 0 \quad\forall\mathbf{v},\mathbf{w} \in \mathbf{V}.

In particular, we have

\left\langle (B-C)\mathbf{e}_j,\mathbf{e}_i \right\rangle = 0 \quad 1 \leq i,j \leq n,

or in other words that [B-C]_E is the zero matrix. Since the map which takes an operator to its matrix relative to E is an isomorphism, this means that B-C is the zero operator, or equivalently that B=C.

— Q.E.D.

Definition 1: For each A \in \mathrm{End}\mathbf{V}, we denote by A^* the unique operator such that \langle A^*\mathbf{v},\mathbf{w} \rangle = \langle \mathbf{v},A\mathbf{w} \rangle for all \mathbf{v},\mathbf{w} \in \mathbf{V}. We call A^* the adjoint of A.

According to Definition 1, yet another way to express that U \in \mathrm{End}\mathbf{V} is an orthogonal operator is to say that the adjoint of U is the inverse of U, i.e. U^*=U^{-1}. Operators on Euclidean space which are related to their adjoint in some predictable way play a very important role in linear algebra.
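
Relative to an orthonormal basis, taking the adjoint is just transposing the matrix (as the proof of Theorem 1 shows), so the statement U^* = U^{-1} becomes the familiar matrix condition Q^{\mathrm{T}}Q = I. A small check with a rotation matrix (my own sketch):

import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # an orthogonal operator on R^2

rng = np.random.default_rng(0)
v, w = rng.standard_normal((2, 2))

print(np.allclose(Q.T @ Q, np.eye(2)))        # True: Q* = Q^T is the inverse of Q
print(np.isclose((Q @ v) @ (Q @ w), v @ w))   # True: Q preserves the scalar product
print(np.isclose(v @ (Q @ w), (Q.T @ v) @ w)) # True: <v, Qw> = <Q*v, w>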

Definition 2: An operator X \in \mathrm{End}\mathbf{V} is said to be selfadjoint if X^*=X.

Owing to the fact that a selfadjoint operator X satisfies

\langle\mathbf{v},X\mathbf{w} \rangle = \langle X\mathbf{v},\mathbf{w} \rangle \quad \forall \mathbf{v},\mathbf{w} \in \mathbf{V},

selfadjoint operators are also often called symmetric operators. The matrix of a selfadjoint operator relative to any orthonormal basis is equal to its own transpose. We are going to study selfadjoint operators extensively in the coming lectures.

Definition 3: An operator A \in \mathrm{End}\mathbf{V} is said to be normal if it commutes with its adjoint, i.e. A^*A=AA^*.

Proposition 1: Orthogonal operators are normal operators, and selfadjoint operators are normal operators.

Proof: This is very straightforward, but worth going through at least once. If U is an orthogonal operator, then U^*U= U^{-1}U=I, and also UU^* = UU^{-1}=I. If X is a selfadjoint operator, then X^*X=XX=X^2, and also XX^*=XX=X^2.

— Q.E.D.

Lecture 16 coda

Math 31AH: Lecture 14

In Lecture 13, we discussed matrix representations of linear transformations between finite-dimensional vector spaces. In this lecture, we consider linear transformations between finite-dimensional Euclidean spaces, and discuss the relationship between the scalar product and the matrix representation of linear transformations. Note that any vector space \mathbf{V} can be promoted to a Euclidean space (\mathbf{V},\langle \cdot,\cdot \rangle) by choosing a basis E in \mathbf{V} and defining \langle \cdot,\cdot \rangle to be the unique scalar product on \mathbf{V} such that E is orthonormal.

Let \mathbf{V} and \mathbf{W} be Euclidean spaces; by abuse of notation, we will denote the scalar product in each of these spaces by the same symbol \langle \cdot,\cdot \rangle. Let E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} be an orthonormal basis in \mathbf{V}, and let F=\{\mathbf{f}_1,\dots,\mathbf{f}_m\} be an orthonormal basis in \mathbf{W}. Let A \in \mathrm{Hom}(\mathbf{V},\mathbf{W}) be a linear transformation.

Definition 1: The matrix elements of A relative to the bases E and F are the scalar products

\langle \mathbf{f}_i, A\mathbf{e}_j \rangle, \quad 1 \leq i \leq m,\ 1 \leq j \leq n.

The reason the number \langle \mathbf{f}_i,A\mathbf{e}_j \rangle is called a “matrix element” of A is that this number is exactly the (i,j)-element of the matrix [A]_{E,F} defined in Lecture 13. Indeed, if

A\mathbf{e}_j = \sum_{k=1}^m a_{kj} \mathbf{f}_k,

then

\langle \mathbf{f}_i,A\mathbf{e}_j \rangle = \left\langle \mathbf{f}_i,\sum_{k=1}^m a_{kj} \mathbf{f}_k\right\rangle = \sum_{k=1}^m a_{kj} \langle \mathbf{f}_i,\mathbf{f}_k \rangle = a_{ij},

where the last equality follows from the orthonormality of F. Note, however, that it is not actually necessary to assume that \mathbf{V} and \mathbf{W} are finite-dimensional in order for the matrix elements of A to be well-defined. We will nevertheless always make this assumption, and thus in more visual form, we have that

[A]_{E,F} = \begin{bmatrix} {} & \vdots & {} \\ \dots & \langle \mathbf{f}_i,A\mathbf{e}_j \rangle & \dots \\ {} & \vdots & {} \end{bmatrix}_{1 \leq i \leq m, 1 \leq j \leq n}.
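Concretely, identifying \mathbf{V} with \mathbb{R}^n and \mathbf{W} with \mathbb{R}^m, each with the standard dot product, the formula above says that the (i,j)-entry of [A]_{E,F} is the dot product of \mathbf{f}_i with A\mathbf{e}_j. The following sketch (illustrative only; the orthonormal bases here are random ones obtained from QR factorizations) checks this entrywise against the compact coordinate formula F^{\top} A E:

import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 4

# random orthonormal bases: columns of E are a basis of R^n, columns of F a basis of R^m
E, _ = np.linalg.qr(rng.standard_normal((n, n)))
F, _ = np.linalg.qr(rng.standard_normal((m, m)))

A = rng.standard_normal((m, n))   # a linear transformation R^n -> R^m in the standard bases

# matrix elements <f_i, A e_j>
M = np.array([[F[:, i] @ (A @ E[:, j]) for j in range(n)] for i in range(m)])

# the same matrix, written compactly as F^T A E
print(np.allclose(M, F.T @ A @ E))   # True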

The connection between matrices and scalar products is often very useful for performing computations which would be much more annoying without the use of scalar products. A good example is change of basis for linear operators. The setup here is that \mathbf{V}=\mathbf{W}, so that m=n and E,F are two (possibly) different orthonormal bases of the same Euclidean space. Given an operator A \in \mathrm{End}\mathbf{V}, we would like to understand the relationship between the two n \times n matrices

[A]_E \quad\text{ and }\quad [A]_F

which represent the operator A relative to the bases E and F, respectively. In order to do this, let us consider the linear operator U \in \mathrm{End}\mathbf{V} uniquely defined by the n equations

U\mathbf{e}_i = \mathbf{f}_i, \quad 1 \leq i \leq n.

Why do these n equations uniquely determine U? Because, for any \mathbf{v} \in \mathbf{V}, we have

U\mathbf{v} = U\sum_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{e}_i = \sum_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}\rangle U\mathbf{e}_i = \sum_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{f}_i.
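The formula U\mathbf{v} = \sum_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}\rangle \mathbf{f}_i can be implemented directly. In the sketch below (illustrative only; \mathbf{V} is identified with \mathbb{R}^n and E, F are random orthonormal bases stored as matrix columns), we check that this formula agrees with the matrix that sends each \mathbf{e}_i to \mathbf{f}_i:

import numpy as np

rng = np.random.default_rng(2)
n = 4

# two random orthonormal bases of R^n, stored as columns
E, _ = np.linalg.qr(rng.standard_normal((n, n)))
F, _ = np.linalg.qr(rng.standard_normal((n, n)))

def apply_U(v):
    # U v = sum_i <e_i, v> f_i
    return sum((E[:, i] @ v) * F[:, i] for i in range(n))

# In coordinates, U is the matrix F E^T: it sends e_i to f_i.
U = F @ E.T
v = rng.standard_normal(n)
print(np.allclose(apply_U(v), U @ v))   # True
print(np.allclose(U @ E, F))            # U e_i = f_i for each i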

Let us observe that the operator U we have defined is an automorphism of \mathbf{V}, i.e. it has an inverse. Indeed, it is clear that the linear operator U^{-1} uniquely determined by the n equations

U^{-1}\mathbf{f}_i=\mathbf{e}_i, \quad 1 \leq i \leq n

is the inverse of U. Operators which transform orthonormal bases into orthonormal bases have a special name.

Definition 2: An operator U \in \mathrm{End}\mathbf{V} is said to be an orthogonal operator if it preserves orthonormal bases: for any orthonormal basis \{\mathbf{e}_1,\dots,\mathbf{e}_n\} in \mathbf{V}, the set \{U\mathbf{e}_1,\dots,U\mathbf{e}_n\} is again an orthonormal basis in \mathbf{V}.

Note that every orthogonal operator is invertible, since we can always define U^{-1} just as we did above. In particular, the operators U,U^{-1} we defined above by U\mathbf{e}_i=\mathbf{f}_i, U^{-1}\mathbf{f}_i=\mathbf{e}_i are orthogonal operators.

Proposition 1: An operator U \in \mathrm{End}\mathbf{V} is orthogonal if and only if

\langle U\mathbf{v},U\mathbf{w} \rangle = \langle \mathbf{v},\mathbf{w} \rangle, \quad \forall \mathbf{v},\mathbf{w} \in \mathbf{V}.

Proof: Observe that, by linearity of U and bilinearity of \langle \cdot,\cdot \rangle, it is sufficient to prove the claim in the case that \mathbf{v}=\mathbf{e}_i and \mathbf{w}=\mathbf{e}_j for some 1 \leq i,j \leq n, where \{\mathbf{e}_1,\dots,\mathbf{e}_n\} is an orthonormal basis of \mathbf{V}.

Suppose that U is an orthogonal operator. Let \mathbf{f}_i=U\mathbf{e}_i, 1 \leq i \leq n. Then \{\mathbf{f}_1,\dots,\mathbf{f}_n\} is an orthonormal basis of \mathbf{V}, and consequently we have

\langle U\mathbf{e}_i,U\mathbf{e}_j \rangle = \langle \mathbf{f}_i,\mathbf{f}_j \rangle = \delta_{ij} = \langle \mathbf{e}_i,\mathbf{e}_j \rangle.

Conversely, suppose that

\langle U\mathbf{e}_i,U\mathbf{e}_j \rangle = \langle \mathbf{e}_i,\mathbf{e}_j \rangle.

We then have that \langle \mathbf{f}_i,\mathbf{f}_j \rangle = \delta_{ij}, so that \{\mathbf{f}_1,\dots,\mathbf{f}_n\} is an orthonormal basis of \mathbf{V}, and thus U is an orthogonal operator.

— Q.E.D.

Proposition 2: An operator U \in \mathrm{End}\mathbf{V} is orthogonal if and only if it is invertible and

\langle \mathbf{v},U\mathbf{w} \rangle = \langle U^{-1}\mathbf{v},\mathbf{w} \rangle, \quad \forall \mathbf{v},\mathbf{w} \in \mathbf{V}.

Proof: Suppose first that U is orthogonal. Then, U is invertible and U^{-1} is also orthogonal, and hence for any \mathbf{v},\mathbf{w} \in \mathbf{V}, we have

\langle \mathbf{v},U\mathbf{w} \rangle = \langle U^{-1}\mathbf{v},U^{-1}U\mathbf{w} \rangle = \langle U^{-1}\mathbf{v},\mathbf{w} \rangle.

Conversely, suppose that U is invertible and

\langle \mathbf{v},U\mathbf{w} \rangle = \langle U^{-1}\mathbf{v},\mathbf{w} \rangle, \quad \forall \mathbf{v},\mathbf{w} \in \mathbf{V}.

Then, for any \mathbf{v},\mathbf{w} \in \mathbf{V}, we have

\langle U\mathbf{v},U\mathbf{w} \rangle = \langle U^{-1}U\mathbf{v},\mathbf{w} \rangle = \langle \mathbf{v},\mathbf{w} \rangle,

whence U is orthogonal by Proposition 1.

— Q.E.D.
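A quick numerical illustration of Proposition 2 (again a sketch only, with \mathbf{V} identified with \mathbb{R}^2 and an arbitrarily chosen rotation playing the role of U):

import numpy as np

theta = 1.2
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # an orthogonal operator on R^2
U_inv = np.linalg.inv(U)

rng = np.random.default_rng(3)
v, w = rng.standard_normal(2), rng.standard_normal(2)

# <v, U w> = <U^{-1} v, w>
print(np.allclose(v @ (U @ w), (U_inv @ v) @ w))   # True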

Now let us return to the problem that we were working on prior to our digression into the generalities of orthogonal operators, namely that of computing the relationship between the matrices [A]_E,[A]_F. We have

\langle \mathbf{f}_i,A\mathbf{f}_j \rangle = \langle U\mathbf{e}_i,AU\mathbf{e}_j \rangle = \langle \mathbf{e}_i,U^{-1}AU\mathbf{e}_j \rangle, \quad \forall 1 \leq i, j \leq n,

where we used Proposition 2 to obtain the second equality. Thus, we have the matrix equation

[A]_F = [U^{-1}AU]_E = [U^{-1}]_E [A]_E [U]_E = [U]_E^{-1} [A]_E [U]_E,

where on the right hand side we are using the fact that

[\cdot]_E \colon \mathrm{End}\mathbf{V} \to \mathbb{R}^{n\times n}

is an algebra isomorphism, as in Lecture 13, which means that the matrix representing a product of operators is the product of the matrices representing each operator individually. This relationship is usually phrased as the statement that the matrix [A]_F representing the operator A in the “new” basis F is obtained from the matrix [A]_E representing A in the “old” basis E by “conjugating” it by the matrix [U]_E, where U is the orthogonal operator that transforms the old basis into the new basis.
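The change-of-basis formula can be checked numerically. In the sketch below (illustrative only), \mathbf{V} is identified with \mathbb{R}^n, E is the standard basis, F is a random orthonormal basis, and U is the operator with U\mathbf{e}_i=\mathbf{f}_i; all matrices are written relative to E:

import numpy as np

rng = np.random.default_rng(4)
n = 4

A = rng.standard_normal((n, n))                     # [A]_E, with E the standard basis
F, _ = np.linalg.qr(rng.standard_normal((n, n)))    # columns f_1,...,f_n: a new orthonormal basis

U = F                                               # [U]_E, since U e_i = f_i and E is standard

# the matrix elements <f_i, A f_j>, computed directly...
A_F = np.array([[F[:, i] @ (A @ F[:, j]) for j in range(n)] for i in range(n)])

# ...agree with [U]_E^{-1} [A]_E [U]_E
print(np.allclose(A_F, np.linalg.inv(U) @ A @ U))   # True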

Lecture 14 coda

Math 31AH: Lecture 13

Last time, we began our discussion of linear transformations, and in particular observed that the set \mathrm{Hom}(\mathbf{V},\mathbf{W}) of linear transformations from a vector space \mathbf{V} to a vector space \mathbf{W} is itself a vector space in a natural way, because there are natural ways to add and scale linear transformations which are compliant with the vector space axioms. In the case that \mathbf{V}=\mathbf{W}, linear transformations are usually referred to as “linear operators,” and the vector space \mathrm{Hom}(\mathbf{V},\mathbf{V}) of linear operators on \mathbf{V} is typically denoted \mathrm{End}\mathbf{V}. This notation stems from the fact that a fancy name for a linear operator is endomorphism. This term derives from the Greek endon, meaning within, and is broadly used in mathematics to emphasize that one is considering a function whose range is contained within its domain. Linear operators are special in that, in addition to being able to scale and add them to one another, we can also multiply them in a natural way. Indeed, given two linear operators A,B \in \mathrm{End}\mathbf{V}, we may define their product to be their composition, i.e. AB:=A \circ B. Spelled out, this means that AB \in \mathrm{End}\mathbf{V} is the linear operator defined by

AB(\mathbf{v}):=A(B(\mathbf{v})) \quad \forall \mathbf{v} \in \mathbf{V}.

So \mathrm{End}\mathbf{V} is a special type of vector space whose vectors (which are operators) can be scaled, added, and multiplied. Such vector spaces warrant their own name.

Definition 1: An algebra is a vector space \mathbf{V} together with a multiplication rule

\mathbf{V} \times \mathbf{V} \to \mathbf{V}

which is bilinear and associative.

Previously, the only algebra we had encountered was \mathbb{R}, and now we find that there are in fact many more algebras, namely all vector spaces \mathrm{End}\mathbf{V} for \mathbf{V} an arbitrary vector space. So, linear operators are in some sense a generalization of numbers.

However, there are some notable differences between numerical multiplication and the multiplication of operators. One of the main differences is that multiplication of linear operators is noncommutative: it is not necessarily the case that AB=BA.

Exercise 1: Find an example of linear operators A,B such that AB\neq BA.

Another key difference between the arithmetic of numbers and the arithmetic of operators is that division is only sometimes possible: it is not the case that all non-zero operators have a multiplicative inverse, which is defined as follows.

Definition 2: An operator A \in \mathrm{End}\mathbf{V} is said to be invertible if there exists an operator B \in \mathrm{End}\mathbf{V} such that AB=BA=I, where I \in \mathrm{End}\mathbf{V} is the identity operator defined by I\mathbf{v}=\mathbf{v} for all \mathbf{v} \in \mathbf{V}.

You should take a moment to compare this definition of invertible linear operator with the definition of a vector space isomorphism from Lecture 2. You will then see that A being invertible is equivalent to A \colon \mathbf{V} \to \mathbf{V} being an isomorphism of \mathbf{V} with itself. An isomorphism from a vector space to itself is called an automorphism, where the prefix “auto” is from the Greek word for “self.” The set of all invertible linear operators in \mathrm{End}\mathbf{V} is therefore often denoted \mathrm{Aut}\mathbf{V}.

Proposition 1: If A \in \mathrm{Aut}\mathbf{V}, then there is precisely one operator B \in \mathrm{End}\mathbf{V} such that AB=BA=I.

Proof: Suppose that B,C \in \mathrm{End}\mathbf{V} are such that

AB=BA=I=AC=CA.

Then we have

AB = AC \implies BAB = BAC \implies IB = IC \implies B=C.

— Q.E.D.

Thus, if A is an invertible operator, then it has a unique inverse, so it is reasonable to call this “the inverse” of A, and denote it A^{-1}. You should check for yourself that A^{-1} is invertible, and that its inverse is A, i.e. that (A^{-1})^{-1}=A.

Exercise 2: Find an example of a nonzero linear operator which is not invertible.

Proposition 2: The set \mathrm{Aut}\mathbf{V} of invertible operators is closed under multiplication: if A,B \in \mathrm{Aut}\mathbf{V}, then AB \in \mathrm{Aut}\mathbf{V}.

Proof: We have

(AB)(B^{-1}A^{-1})=A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1}=I,

and similarly

(B^{-1}A^{-1})(AB)=B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B=I,

which shows that AB is invertible, and that (AB)^{-1}=B^{-1}A^{-1}.

— Q.E.D.
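One can also confirm the reverse-order formula (AB)^{-1}=B^{-1}A^{-1} numerically (a throwaway sketch with random matrices, which are almost surely invertible):

import numpy as np

rng = np.random.default_rng(5)
n = 3
A = rng.standard_normal((n, n))   # almost surely invertible
B = rng.standard_normal((n, n))

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))   # True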

Proposition 2 shows that the set \mathrm{Aut}\mathbf{V} is an example of a type of algebraic structure called a group, which roughly means a set together with a notion of multiplication in which every element has an inverse. We won’t give the precise definition of a group, since the above is the only example of a group we will see in this course. The subject of group theory is its own branch of algebra, and it has many connections to linear algebra.

All of the above may seem quite abstract, and perhaps it is. However, in the case of finite-dimensional vector spaces, linear transformations can be described very concretely as tables of numbers, i.e. as matrices. Consider the vector space \mathrm{Hom}(\mathbf{V},\mathbf{W}) of linear transformations from an n-dimensional vector space \mathbf{V} to an m-dimensional vector space \mathbf{W}. Let A \in \mathrm{Hom}(\mathbf{V},\mathbf{W}) be a linear transformation, let E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} be a basis of \mathbf{V}, and let F=\{\mathbf{f}_1,\dots,\mathbf{f}_m\} be a basis of \mathbf{W}. The transformation A is then uniquely determined by the finitely many vectors

A\mathbf{e}_1,\dots,A\mathbf{e}_n.

Indeed, any vector \mathbf{v} \in \mathbf{V} may be uniquely represented as a linear combination of vectors in E,

\mathbf{v}=x_1\mathbf{e}_1 + \dots + x_n \mathbf{e}_n,

and we then have

A\mathbf{v}= A(x_1\mathbf{e}_1 + \dots + x_n \mathbf{e}_n)= x_1A\mathbf{e}_1 + \dots + x_n A\mathbf{e}_n.

Now, we may represent each of the vectors A\mathbf{e}_j as a linear combination of the vectors in F,

A\mathbf{e}_j = \sum_{i=1}^m a_{ij} \mathbf{f}_i, \quad 1 \leq j \leq n,

and we then have

A\mathbf{v} = \sum_{j=1}^n x_jA\mathbf{e}_j = \sum_{j=1}^n x_j\sum_{i=1}^m a_{ij}\mathbf{f}_i=\sum_{i=1}^m \left( \sum_{j=1}^n a_{ij}x_j \right)\mathbf{f}_i.

Thus, if

A\mathbf{v}= \sum_{i=1}^m y_i \mathbf{f}_i,

is the unique representation of the vector A\mathbf{v} \in \mathbf{W} relative to the basis F of \mathbf{W}, then our computation shows that y_i = \sum_{j=1}^n a_{ij}x_j for each 1 \leq i \leq m. In other words, we have the matrix equation

\begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} {} & \vdots & {} \\ \dots & a_{ij} & \dots \\ {} & \vdots & {} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.

Schematically, this matrix equation can be expressed as follows: for any \mathbf{v} \in \mathbf{V}, we have that

[A\mathbf{v}]_F = [A]_{E,F} [\mathbf{v}]_E,

where [\mathbf{v}]_E denotes the n \times 1 matrix whose entries are the coordinates of the vector \mathbf{v} \in \mathbf{V} relative to the basis E of \mathbf{V}, [\mathbf{w}]_F denotes the m \times 1 matrix whose entries are the coordinates of the vector \mathbf{w} \in \mathbf{W} relative to the basis F of \mathbf{W}, and [A]_{E,F} is the m \times n matrix

[A]_{E,F} = \begin{bmatrix} [A\mathbf{e}_1]_F & \dots & [A\mathbf{e}_n]_F \end{bmatrix}

whose jth column is the m \times 1 matrix [A\mathbf{e}_j]_F. What this means at the conceptual level is the following: choosing a basis E in \mathbf{V} results in a vector space isomorphism \mathbf{V} \to \mathbb{R}^n defined by

\mathbf{v} \mapsto [\mathbf{v}]_E,

choosing a basis F in \mathbf{W} results in a vector space isomorphism \mathbf{W} \to \mathbb{R}^m defined by

\mathbf{w} \mapsto [\mathbf{w}]_F,

and these two choices together result in a vector space isomorphism \mathrm{Hom}(\mathbf{V},\mathbf{W}) \to \mathbb{R}^{m \times n} defined by

A \mapsto [A]_{E,F}.
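All of this bookkeeping can be checked in coordinates. In the sketch below (illustrative only), \mathbf{V}=\mathbb{R}^2 and \mathbf{W}=\mathbb{R}^3 with hand-picked bases E and F which are not orthonormal; the matrix [A]_{E,F} is assembled column by column from the coordinate vectors [A\mathbf{e}_j]_F, and the identity [A\mathbf{v}]_F=[A]_{E,F}[\mathbf{v}]_E is verified for a random \mathbf{v}:

import numpy as np

rng = np.random.default_rng(6)

# a linear transformation R^2 -> R^3, written in the standard bases
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])

# bases E of R^2 and F of R^3 (stored as columns), not necessarily orthonormal
E = np.array([[1.0, 1.0],
              [0.0, 2.0]])
F = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

def coords(basis, x):
    # coordinate vector of x relative to the given basis (basis vectors as columns)
    return np.linalg.solve(basis, x)

# [A]_{E,F}: the jth column is [A e_j]_F
A_EF = np.column_stack([coords(F, A @ E[:, j]) for j in range(E.shape[1])])

v = rng.standard_normal(2)
print(np.allclose(coords(F, A @ v), A_EF @ coords(E, v)))   # True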

Let us consider how the above works in the special case that \mathbf{V}=\mathbf{W} and E=F. We are then dealing with linear operators A \in \mathrm{End}\mathbf{V}, and the matrix representing such an operator is the square n \times n matrix

[A]_E = \begin{bmatrix} [A\mathbf{e}_1]_E & \dots & [A\mathbf{e}_n]_E \end{bmatrix}.

For every \mathbf{v} \in \mathbf{V}, we have the matrix equation

[A\mathbf{v}]_E = [A]_E [\mathbf{v}]_E.

In this case, there is an extra consideration. Suppose we have two linear operators A,B \in \mathrm{End}\mathbf{V}. Then, we also have their product AB \in \mathrm{End}\mathbf{V}, and a natural issue is to determine the relationship between the matrices [A]_E,[B]_E, and [AB]_E. Let us now work this out.

Start with a vector \mathbf{v} \in \mathbf{V}, and let

\mathbf{v}=x_1\mathbf{e}_1+ \dots + x_n\mathbf{e}_n

be its representation relative to the basis E of \mathbf{V}. Let

B\mathbf{e}_j = \sum_{i=1}^n b_{ij}\mathbf{e}_i, \quad 1 \leq j \leq n,

be the representations of the vectors B\mathbf{e}_1,\dots,B\mathbf{e}_n relative to the basis E, and likewise let a_{ij} denote the matrix elements of A relative to E, so that A\mathbf{e}_j = \sum_{i=1}^n a_{ij}\mathbf{e}_i for 1 \leq j \leq n.

We then have

AB(\mathbf{v}) \\ = \sum_{j=1}^n x_j AB\mathbf{e}_j \\ = \sum_{j=1}^n x_j A\sum_{k=1}^n b_{kj}\mathbf{e}_k \\ = \sum_{j=1}^n x_j \sum_{k=1}^n b_{kj}A\mathbf{e}_k \\ = \sum_{j=1}^n x_j \sum_{k=1}^n b_{kj}\sum_{i=1}^n a_{ik} \mathbf{e}_i \\ = \sum_{i=1}^n  \left( \sum_{j=1}^n \left(\sum_{k=1}^n a_{ik}b_{kj} \right)x_j\right)\mathbf{e}_i.

This shows that the matrix of the product transformation AB relative to the basis E is given by the product of the matrices representing A and B in this basis, i.e.

[AB]_E = [A]_E [B]_E.

So, in the case of linear operators, the isomorphism \mathrm{End}\mathbf{V} \to \mathbb{R}^{n \times n} given by

A \mapsto [A]_E

is not just a vector space isomorphism, but a vector space isomorphism compatible with multiplication — an algebra isomorphism.
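As a final sketch (identifying \mathbf{V} with \mathbb{R}^3, but deliberately using a non-standard basis E so that the statement is not a tautology), one can verify numerically that the matrix of the composition AB relative to E is the product of the matrices of A and B relative to E:

import numpy as np

rng = np.random.default_rng(7)
n = 3

# operators on R^3, written in the standard basis
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# a non-standard basis E of R^3, stored as columns (almost surely invertible)
E = rng.standard_normal((n, n))

def matrix_in_basis(T, basis):
    # [T]_E: the jth column is [T e_j]_E, obtained by solving basis * x = T e_j
    return np.linalg.solve(basis, T @ basis)

print(np.allclose(matrix_in_basis(A @ B, E),
                  matrix_in_basis(A, E) @ matrix_in_basis(B, E)))   # True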