Math 31AH: Lecture 27

Let \mathbf{V} be a two-dimensional Euclidean space with orthonormal basis E=\{\mathbf{e}_1,\mathbf{e}_2\}. Let A \in \mathrm{End}\mathbf{V} be the operator defined by

A\mathbf{e}_1 = \mathbf{e}_1+\mathbf{e}_2,\ A\mathbf{e}_2 = -\mathbf{e}_1+\mathbf{e}_2,

so that

[A]_E = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}.

Geometrically, the operator A acts on vectors in \mathbf{V} by rotating them counterclockwise through an angle of 45^\circ and then scaling them by \sqrt{2}. It is geometrically clear that A has no eigenvalues: rotating any vector by 45^\circ results in a new vector which is not a scalar multiple of the original vector. Right? Maybe not, now that we know about complex numbers.

By definition, a nonzero vector \mathbf{v} \in \mathbf{V} is an eigenvector of A if and only if we have A\mathbf{v}= \lambda \mathbf{v} for some scalar \lambda. This is the same thing as saying that the vector \mathbf{v} belongs to the kernel of the operator A-\lambda I, where I is the identity operator. This is in turn equivalent to saying that the kernel of A-\lambda I contains the nonzero vector \mathbf{v}, which means that A-\lambda I is not invertible, which in turn means that \det(A-\lambda I)=0. This chain of reasoning is valid in general, i.e. we have the following general proposition.

Proposition 1: The eigenvalues of an operator A \in \mathrm{End}\mathbf{V} are exactly the scalars \lambda such that \det(A-\lambda I)=0.

So, according to Proposition 1, to find the eigenvalues of an operator A, we can try to solve the characteristic equation

\det(A-\lambda I) = 0.

The “unknown” in this equation is \lambda.

Let us write down the characteristic equation of the rotation operator A defined above. We have

(A-\lambda I)^{\wedge 2} \mathbf{e}_1 \wedge \mathbf{e}_2 \\ = (A-\lambda I) \mathbf{e}_1 \wedge (A-\lambda I)\mathbf{e}_2 \\ = (A\mathbf{e}_1-\lambda\mathbf{e}_1) \wedge (A\mathbf{e}_2 - \lambda \mathbf{e}_2) \\  =(\mathbf{e}_1+\mathbf{e}_2-\lambda\mathbf{e}_1)\wedge (-\mathbf{e_1}+\mathbf{e}_2-\lambda\mathbf{e}_2) \\ = ((1-\lambda)\mathbf{e}_1 + \mathbf{e}_2) \wedge (-\mathbf{e}_1+(1-\lambda)\mathbf{e}_2) \\ = (1-\lambda)^2\mathbf{e}_1 \wedge \mathbf{e}_2 -\mathbf{e}_2 \wedge \mathbf{e}_1 \\ = \left( (\lambda-1)^2+1 \right)\mathbf{e}_1 \wedge \mathbf{e}_2,

so that \det(A-\lambda I) = (\lambda-1)^2+1 and the characteristic equation of A is

(\lambda-1)^2+1=0.

There is no number \lambda \in \mathbb{R} which solves the characteristic equation, since for any such number the LHS is the sum of a nonnegative number and a positive number, which is a nonzero quantity. However, if we widen our scope of the number concept, this equation does have solutions, corresponding to the fact that

i^2+1=0 \text{ and } (-i)^2+1=0,

where i \in \mathbb{C} is the imaginary unit, as introduced in Lecture 26. That is, while the characteristic equation has no real solutions, it has the two distinct complex solutions

\lambda_1 = 1+i \text{ and } \lambda_2 = 1-i.
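
As a quick numerical sanity check (a NumPy sketch, not part of the lecture), we can ask a computer for the eigenvalues of the matrix [A]_E and see that it reports complex numbers close to 1+i and 1-i:

import numpy as np

# Matrix of the rotation-and-scaling operator A in the basis E.
A = np.array([[1.0, -1.0],
              [1.0,  1.0]])

# NumPy returns complex eigenvalues when no real ones exist.
eigenvalues = np.linalg.eigvals(A)
print(eigenvalues)  # expected: approximately [1.+1.j, 1.-1.j]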

In fact, this is always the case: the main advantage of complex vector spaces is that operators on such spaces always have eigenvalues.

Theorem 1: If A \in \mathrm{End}V is a linear operator on an n-dimensional complex vector space, then A has n (not necessarily distinct) complex eigenvalues \lambda_1,\dots,\lambda_n.

Proof: The eigenvalues of A are the solutions of the characteristic equation, \det(A-\lambda I)=0. Now, since

(A-\lambda I)\mathbf{e}_1 \wedge \dots \wedge (A-\lambda I)\mathbf{e}_n = \det(A-\lambda I)\, \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,

where \{\mathbf{e}_1,\dots,\mathbf{e}_n\} is any basis of \mathbf{V}, the determinant \det(A-\lambda I) is a polynomial function of \lambda whose highest degree term is (-1)^n \lambda^n. But the fundamental theorem of algebra says that every polynomial of degree n has n (not necessarily distinct) roots in \mathbb{C}.

— Q.E.D.
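
To see the shape of the characteristic polynomial concretely, here is a small SymPy sketch (an illustration only, with a 3 \times 3 matrix chosen arbitrarily by me) which expands \det(A-\lambda I), exhibits a degree-3 polynomial with leading term -\lambda^3, and finds its three complex roots:

import sympy as sp

lam = sp.symbols('lambda')
# An arbitrary 3x3 matrix; any choice of entries gives a degree-3 polynomial in lambda.
A = sp.Matrix([[2, 1, 0],
               [0, 3, 1],
               [1, 0, 1]])
p = sp.expand((A - lam * sp.eye(3)).det())
print(p)                          # -lambda**3 + 6*lambda**2 - 11*lambda + 7
print(sp.degree(p, lam))          # 3, with leading coefficient (-1)**3 = -1
print(sp.Poly(p, lam).nroots())   # three roots in C, as the fundamental theorem of algebra guarantees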

The above is saying that if we consider the rotation operator A of our example as an operator on a complex vector space, then it does have eigenvalues, even though it did not when considered as an operator on a real vector space. Now comes the question of what the eigenvectors corresponding to these eigenvalues are. In order for the solutions of the characteristic equation to actually correspond to eigenvalues of the operator A, there must be nonzero vectors \mathbf{f}_1,\mathbf{f}_2 \in \mathbf{V} such that

A\mathbf{f}_1 = (1+i)\mathbf{f}_1 \text{ and } A\mathbf{f}_2=(1-i)\mathbf{f}_2.

Let us see if we can actually calculate \mathbf{f}_1 and \mathbf{f}_2. We have that

[A-\lambda_1I]_E = \begin{bmatrix} 1-(1+i) & -1 \\ 1 & 1-(1+i) \end{bmatrix} = \begin{bmatrix} -i & -1 \\ 1 & -i \end{bmatrix}.

Thus, \mathbf{f}_1=x\mathbf{e}_1+y\mathbf{e}_2 satisfies A\mathbf{f}_1=\lambda_1\mathbf{f}_1 if and only if x,y \in \mathbb{C} are complex numbers, not both zero, such that

\begin{bmatrix} -i & -1 \\ 1 & -i \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},

or equivalently

ix+y = 0\\ x-iy=0.

By inspection, x=i and y=1 are solutions to the above equations, whence

\mathbf{f}_1 = i\mathbf{e}_1 + \mathbf{e}_2

is an eigenvector of A. Similarly, \mathbf{f}_2=x\mathbf{e}_1+y\mathbf{e}_2 satisfies A\mathbf{f}_2=\lambda_2\mathbf{f}_2 if and only if x,y \in \mathbb{C} are complex numbers, not both zero, such that

\begin{bmatrix} i & -1 \\ 1 & i \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},

or equivalently

ix-y=0 \\ x+iy=0.

By inspection, x=i and y=-1 are solutions of these equations, whence

\mathbf{f}_2=i\mathbf{e}_1-\mathbf{e}_2

is an eigenvector of A. Now, what is the whole point of calculating eigenvalues and eigenvectors? Well, if F=\{\mathbf{f}_1,\mathbf{f}_2\} is a basis of \mathbf{V}, then we will have found that the matrix of the operator A in this basis is diagonal,

[A]_F = \begin{bmatrix} 1+i &0 \\ 0 & 1-i \end{bmatrix},

which is a more convenient matrix representation of A than that given by [A]_E, since we can use it to easily do computations with A. So, we wish to show that F = \{\mathbf{f}_1,\mathbf{f}_2\} is a linearly independent set in \mathbf{V}. This follows immediately if \mathbf{f}_1,\mathbf{f}_2 are orthogonal, so let’s see if we get lucky:

\langle \mathbf{f}_1,\mathbf{f}_2 \rangle = \langle i\mathbf{e}_1+\mathbf{e}_2,i\mathbf{e}_1 - \mathbf{e}_2 \rangle = i^2\langle \mathbf{e}_1,\mathbf{e}_1 \rangle - \langle \mathbf{e}_2,\mathbf{e}_2\rangle = -1 -1 = -2 \neq 0.

We didn’t get lucky, and worse than that this scalar product calculation suggests that something unpleasant happens when we start computing scalar products with complex numbers. Indeed, if we modify the above calculation by computing the scalar product of \mathbf{f}_1 with itself, we find that

\langle \mathbf{f}_1,\mathbf{f}_1 \rangle = \langle i\mathbf{e}_1+\mathbf{e}_2,i\mathbf{e}_1 + \mathbf{e}_2 \rangle = i^2\langle \mathbf{e}_1,\mathbf{e}_1 \rangle + \langle \mathbf{e}_2,\mathbf{e}_2\rangle = -1 +1 = 0.

This is disturbing, since it says that the nonzero vector \mathbf{f}_1 is orthogonal to itself, or equivalently that it has zero length, \|\mathbf{f}_1 \|=0. The origin of this problem is that, unlike squares of real numbers, squares of complex numbers can be negative. To accommodate this, we have to modify the scalar product for complex vector spaces: we insist that a complex scalar product \langle \cdot,\cdot \rangle is an antilinear function of its first argument:

\langle z_1 \mathbf{v}_1 + z_2 \mathbf{v}_2, \mathbf{w} \rangle = \overline{z}_1\langle \mathbf{v}_1, \mathbf{w} \rangle + \overline{z}_2\langle \mathbf{v}_2, \mathbf{w} \rangle,

where if z=x+yi then \bar{z} = x-yi is the complex conjugate of z. Complex vector spaces which come with a complex scalar product are the complex version of Euclidean spaces, and they have a special name.

Definition 1: A Hilbert space is a pair (\mathbf{V},\langle \cdot,\cdot \rangle) consisting of a complex vector space \mathbf{V} together with a complex scalar product.

Continuing with our running example, let us re-compute the inner product of the eigenvectors \mathbf{f}_1,\mathbf{f}_2 of the rotation operator that we found above. Interpreting \langle \cdot,\cdot \rangle as a complex inner product, we now find that

\langle \mathbf{f}_1,\mathbf{f}_2 \rangle = \langle i\mathbf{e}_1+\mathbf{e}_2, i\mathbf{e}_1-\mathbf{e}_2 \rangle = \langle i\mathbf{e}_1,i\mathbf{e}_1 \rangle + \langle \mathbf{e}_2,-\mathbf{e}_2 \rangle = \bar{i}i \langle \mathbf{e}_1,\mathbf{e}_1 \rangle - \langle \mathbf{e}_2,\mathbf{e}_2\rangle = -i \cdot i-1 = 1-1 =0,

so that \mathbf{f}_1,\mathbf{f}_2 actually are orthogonal with respect to the complex scalar product on \mathbf{V} in which the basis E=\{\mathbf{e}_1,\mathbf{e}_2\} is orthonormal. Thus F=\{\mathbf{f}_1,\mathbf{f}_2\} is a basis of the complex vector space \mathbf{V} consisting of eigenvectors of the operator A, meaning that while A has no eigenvalues or eigenvectors when considered as an operator on a Euclidean space, it is in fact semisimple when considered as an operator on a Hilbert space.
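
Here is a brief NumPy verification of these claims (a sketch, not part of the lecture). Note that np.vdot conjugates its first argument, which matches the convention that a complex scalar product is antilinear in the first slot:

import numpy as np

A = np.array([[1, -1],
              [1,  1]], dtype=complex)

f1 = np.array([1j,  1], dtype=complex)   # f1 = i e1 + e2
f2 = np.array([1j, -1], dtype=complex)   # f2 = i e1 - e2

print(np.allclose(A @ f1, (1 + 1j) * f1))  # True: A f1 = (1+i) f1
print(np.allclose(A @ f2, (1 - 1j) * f2))  # True: A f2 = (1-i) f2

# np.vdot conjugates its first argument, so it computes the complex scalar product.
print(np.vdot(f1, f2))   # 0j, so f1 and f2 are orthogonal
print(np.vdot(f1, f1))   # (2+0j), a positive squared length, as it should be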

Even though they may initially seem more complicated, Hilbert spaces are actually easier to work with than Euclidean spaces — linear algebra runs more smoothly over the complex numbers than over the real numbers. For example, it is possible to give a succinct necessary and sufficient criterion for an operator A on a (finite-dimensional) Hilbert space \mathbf{V} to be semisimple. As in the case of a Euclidean space, define the adjoint of A to be the unique operator A^* \in \mathrm{End}\mathbf{V} such that

\langle A^*\mathbf{v},\mathbf{w} \rangle = \langle \mathbf{v},A\mathbf{w}\rangle \quad \forall\ \mathbf{v},\mathbf{w} \in \mathbf{V}.

Theorem 2: A is a semisimple operator if and only if it commutes with its adjoint, meaning that A^*A=AA^*.

I regret that we will not have time to prove this Theorem.
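
Although we will not prove Theorem 2, we can at least check that it is consistent with our running example: in an orthonormal basis the adjoint is represented by the conjugate transpose of [A], and the matrix of the rotation operator does commute with its conjugate transpose. A minimal NumPy sketch:

import numpy as np

A = np.array([[1, -1],
              [1,  1]], dtype=complex)

A_star = A.conj().T   # matrix of the adjoint in an orthonormal basis

print(np.allclose(A_star @ A, A @ A_star))  # True: A commutes with its adjoint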

Math 31AH: Lecture 26

For a long time, we have been skirting around the issue of whether or not it is possible to multiply vectors in a general vector space \mathbf{V}. We gave two answers to this question which are not really answers at all: we discussed ways to multiply vectors to get products which do not lie in the same vector space as their factors.

First, we showed that it is possible to multiply vectors in \mathbf{V} in such a way that the product of any two vectors \mathbf{v},\mathbf{w} is a number \langle \mathbf{v},\mathbf{w} \rangle. This sort of multiplication is what we termed a “bilinear form” on \mathbf{V}. The best bilinear forms are those which satisfy the scalar product axioms, because these allow us to talk about lengths of vectors and angles between vectors in \mathbf{V}. However, the bilinear form concept doesn’t answer the original question about multiplying vectors, because the product of \mathbf{v} and \mathbf{w} belongs to the vector space \mathbb{R}, which is probably not \mathbf{V}.

Second, we found that it is possible to multiply vectors in \mathbf{V} in such a way that the product of any two vectors \mathbf{v}_1,\mathbf{v}_2 is a tensor, namely the tensor \mathbf{v}_1 \otimes \mathbf{v}_2. This is useful because it ultimately led us to a related product, the wedge product \mathbf{v} \wedge \mathbf{w}, which allowed us to efficiently characterize linear independence and to introduce a notion of volume in \mathbf{V}. However, it again doesn’t answer the original question about multiplying vectors, because the product of \mathbf{v}_1 and \mathbf{v}_2 belongs to the vector space \mathbf{V} \otimes \mathbf{V}, which is definitely not \mathbf{V}.

Today, we will finally investigate the question of how to multiply two vectors to get a vector in the same space. We now have the tools to discuss this quite precisely.

Definition 1: Given a vector space \mathbf{V}, a multiplication in \mathbf{V} is a linear transformation

M \colon \mathbf{V} \otimes \mathbf{V} \to \mathbf{V}.

It is reasonable to refer to an arbitrary linear transformation M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) as a multiplication because every such M possesses the fundamental property of multiplication that we refer to as bilinearity: it satisfies the FOIL identity

M\left((x_1\mathbf{v}_1 + y_1\mathbf{w}_1) \otimes (x_2\mathbf{v}_2 + y_2\mathbf{w}_2)\right)\\ =x_1x_2M(\mathbf{v}_1 \otimes \mathbf{v}_2) + x_1y_2M(\mathbf{v}_1\otimes \mathbf{w}_2) + x_2y_1M(\mathbf{w}_1 \otimes \mathbf{v}_2) + y_1y_2M(\mathbf{w}_1 \otimes \mathbf{w}_2).

Indeed, this is true precisely because \mathbf{V} \otimes \mathbf{V} was constructed as the vector space of all “unevaluated” products of vectors multiplied according to an unspecified bilinear multiplication \otimes, and the linear transformation M performs the missing evaluation.

We now see that there are many ways to multiply vectors — too many. Indeed, suppose \mathbf{V} is an n-dimensional vector space, and let E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} be a basis in \mathbf{V}. Then, a basis for \mathbf{V} \otimes \mathbf{V} is given by E \otimes E = \{\mathbf{e}_i \otimes \mathbf{e}_j \colon 1 \leq i,j \leq n\}, and hence every multiplication M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) uniquely corresponds to an n \times n^2 table of numbers, namely the matrix [M]_{E \otimes E,E}. But not all of these make for interesting multiplication rules. For example, we could choose M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) to be the zero transformation, which sends every tensor in \mathbf{V} \otimes \mathbf{V} to the zero vector \mathbf{0}_\mathbf{V} in \mathbf{V}. This is a rule for multiplying vectors in \mathbf{V}, but it is accurately described as “trivial.” We would like to find nontrivial multiplication rules which mimic our experience multiplying real numbers.
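
To make the correspondence concrete, here is a small Python sketch (the notation c[i, j, k] is my own, not from the lecture) in which a multiplication is stored as an n \times n \times n array of structure constants, meaning M(\mathbf{e}_i \otimes \mathbf{e}_j) = \sum_k c[i,j,k] \mathbf{e}_k; bilinearity then determines the product of arbitrary coordinate vectors:

import numpy as np

def multiply(c, v, w):
    # Extend the rule M(e_i (x) e_j) = sum_k c[i, j, k] e_k bilinearly
    # to coordinate vectors v and w.
    return np.einsum('i,j,ijk->k', v, w, c)

n = 2
# The zero multiplication: every product is the zero vector ("trivial").
c_zero = np.zeros((n, n, n))
print(multiply(c_zero, np.array([1.0, 2.0]), np.array([3.0, 4.0])))  # [0. 0.]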

Definition 2: A normed division algebra is a pair (\mathbf{V},M) consisting of a Euclidean space \mathbf{V} together with a multiplication M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) which has the following properties:

  1. There is a vector \mathbf{1} \in \mathbf{V} such that M(\mathbf{1} \otimes \mathbf{v})= M(\mathbf{v} \otimes \mathbf{1}) = \mathbf{v} for all \mathbf{v} \in \mathbf{V}.
  2. For every \mathbf{v} \in \mathbf{V}\backslash \{\mathbf{0}\}, there is a corresponding \mathbf{w} \in \mathbf{V} such that M(\mathbf{v} \otimes \mathbf{w})=M(\mathbf{w} \otimes \mathbf{v}) = \mathbf{1}.
  3. For every \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}, we have \|M(\mathbf{v}_1 \otimes \mathbf{v}_2)\| = \|\mathbf{v}_1\| \|\mathbf{v}_2\|.

The axioms above are very natural, and reflect familiar properties of the real number system. The first stipulates that a normed division algebra should contain a multiplicative unit \mathbf{1} analogous to the real number 1, in the sense that multiplication by it does nothing. The second says that any nonzero element in our algebra should have a multiplicative inverse: multiplying an element by its inverse produces the unit element \mathbf{1}. The third says that our algebra has a norm analogous to the absolute value of a real number, in that the norm of a product of two vectors is the product of their norms.

Example 1: Let \mathbf{V} be a one-dimensional Euclidean space with orthonormal basis E=\{\mathbf{1}\}. Let M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) be the linear transformation uniquely determined by M(\mathbf{1} \otimes \mathbf{1})=\mathbf{1}. Then (\mathbf{V},M) is a normed division algebra (very easy exercise: check that the axioms are satisfied).

Further examining Example 1, we see that the multiplication of arbitrary vectors \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V} is given by

M(\mathbf{v}_1 \otimes \mathbf{v}_2) = M\left( (\langle \mathbf{1},\mathbf{v}_1\rangle \mathbf{1}) \otimes (\langle \mathbf{1},\mathbf{v}_2\rangle \mathbf{1})\right) = \langle \mathbf{1},\mathbf{v}_1\rangle\langle \mathbf{1},\mathbf{v}_2\rangle M(\mathbf{1} \otimes \mathbf{1}) = \langle \mathbf{1},\mathbf{v}_1\rangle\langle \mathbf{1},\mathbf{v}_2\rangle\mathbf{1}.

So, to multiply two vectors in \mathbf{V}, we simply multiply their coordinates relative to the basis E=\{\mathbf{1}\} using multiplication of real numbers. Thus \mathbf{V} is essentially the same as \mathbb{R}, with the unit vector \mathbf{1} playing the role of the number 1. More precisely, the linear transformation T \colon \mathbf{V} \to \mathbb{R} uniquely determined by T(\mathbf{1})=1 is a vector space isomorphism which respects multiplication, i.e. an algebra isomorphism. In fact, thinking a bit more about this example, we find that every one-dimensional normed division algebra is isomorphic to \mathbb{R}.

Now we construct something new: a two-dimensional normed division algebra. Let \mathbf{V} be a 2-dimensional Euclidean space with orthonormal basis E=\{\mathbf{1},\mathbf{i}\}. Let M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) be the linear transformation defined by

M(\mathbf{1} \otimes \mathbf{1}) = \mathbf{1},\     M(\mathbf{1} \otimes \mathbf{i}) = \mathbf{i},\ M(\mathbf{i} \otimes \mathbf{1}) = \mathbf{i},\ M(\mathbf{i} \otimes \mathbf{i}) = -\mathbf{1}.

Thus for any two vectors \mathbf{v}_1 = x_1\mathbf{1} + y_1\mathbf{i} and \mathbf{v}_2 = x_2\mathbf{1} + y_2\mathbf{i} we have

M(\mathbf{v}_1 \otimes \mathbf{v}_2) \\ = M\left((x_1\mathbf{1} + y_1\mathbf{i}) \otimes (x_2\mathbf{1} + y_2\mathbf{i}) \right) \\ =x_1x_2 M(\mathbf{1} \otimes \mathbf{1}) + x_1y_2 M(\mathbf{1} \otimes \mathbf{i}) + x_2y_1 M(\mathbf{i} \otimes \mathbf{1}) + y_1y_2 M(\mathbf{i} \otimes \mathbf{i}) \\ = x_1x_2\mathbf{1} + x_1y_2 \mathbf{i} + x_2y_1 \mathbf{i} - y_1y_2\mathbf{1} \\ = (x_1x_2- y_1y_2)\mathbf{1} + (x_1y_2+x_2y_1)\mathbf{i}.

One nice aspect of M that is clear from the above computation is that M(\mathbf{v}_1 \otimes \mathbf{v}_2) = M(\mathbf{v}_2 \otimes \mathbf{v}_1), meaning that M defines a commutative multiplication (this is an extra property not required by the normed division algebra axioms).
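
As a sanity check (a sketch, not part of the construction), the coordinate formula just derived can be compared against Python's built-in complex arithmetic on a few random inputs, along with the norm axiom:

import numpy as np

def mult(v1, v2):
    # (x1*1 + y1*i) times (x2*1 + y2*i), in coordinates relative to E = {1, i}
    x1, y1 = v1
    x2, y2 = v2
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)

rng = np.random.default_rng(0)
for _ in range(5):
    x1, y1, x2, y2 = rng.standard_normal(4)
    product = mult((x1, y1), (x2, y2))
    builtin = complex(x1, y1) * complex(x2, y2)
    assert np.allclose(product, (builtin.real, builtin.imag))
    # the norm axiom |v1 v2| = |v1| |v2| also holds:
    assert np.isclose(abs(builtin), abs(complex(x1, y1)) * abs(complex(x2, y2)))
print("coordinate formula agrees with built-in complex multiplication")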

Theorem 1: The algebra (\mathbf{V},M) constructed above is a normed division algebra.

Proof: We have to check the axioms. First, for any vector \mathbf{v}=x\mathbf{1}+y\mathbf{i}, we directly compute that

M(\mathbf{1} \otimes \mathbf{v}) = \mathbf{v},

so that \mathbf{1} is a multiplicative identity. Second, we have to show that \mathbf{v} has a multiplicative inverse, provided \mathbf{v} \neq \mathbf{0}. Let \mathbf{v}^*= x\mathbf{1}-y\mathbf{i}. We then have

M(\mathbf{v} \otimes \mathbf{v}^*) = (x^2+y^2)\mathbf{1} = \|\mathbf{v}\|^2\mathbf{1}.

Now \|\mathbf{v}\| \neq 0 since \mathbf{v} \neq \mathbf{0}, and hence we have that

M(\mathbf{v} \otimes \frac{1}{\|\mathbf{v}\|^2}\mathbf{v}^*) = \mathbf{1},

which shows that \mathbf{v} has the multiplicative inverse \frac{1}{\|\mathbf{v}\|^2}\mathbf{v}^*. Third and finally, we have

\|M(\mathbf{v}_1 \otimes \mathbf{v}_2)\|^2 \\ = \|(x_1x_2- y_1y_2)\mathbf{1} + (x_1y_2+x_2y_1)\mathbf{i}\|^2\\ = (x_1x_2- y_1y_2)^2 + (x_1y_2+x_2y_1)^2 \\ = x_1^2x_2^2 - 2x_1x_2y_1y_2 + y_1^2y_2^2 + x_1^2y_2^2 + 2x_1x_2y_1y_2 + x_2^2y_1^2 \\ = (x_1^2+y_1^2)(x_2^2+y_2^2) \\= \|\mathbf{v}_1\|^2\|\mathbf{v}_2\|^2,

whence

\|M(\mathbf{v}_1 \otimes \mathbf{v}_2)\|= \|\mathbf{v}_1\|\|\mathbf{v}_2\|.

— Q.E.D.

You have probably recognized by now that the above construction has produced the algebra of complex numbers (it is fine if you were not previously familiar with this term). Indeed, taking our Euclidean space \mathbf{V} to be \mathbf{V}=\mathbb{R}^2 with orthonormal basis \mathbf{1}=(1,0) and \mathbf{i}=(0,1) gives a simple visualization of this algebra as a rule for multiplying vectors in the Euclidean plane. The complex number system contains and enlarges the real number system, in the sense that \mathbb{R}^2 contains the 1-dimensional subspace

\mathrm{Span}\{\mathbf{1}\} = \{(x,0) \colon x \in \mathbb{R}\},

which is isomorphic to \mathbb{R}. In this context one usually uses the symbol \mathbb{C} instead of \mathbb{R}^2 to indicate that we are considering \mathbb{R}^2 to be not just a vector space, but a normed division algebra with the multiplication described above.

It makes a lot of sense to recalibrate your understanding of the word “number” so that it means “element of \mathbb{C}.” Indeed, complex numbers behave just like ordinary real numbers in all the ways that matter: you can add, subtract, multiply, and divide complex numbers in just the way you do real numbers. In order to psychologically prime ourselves for thinking of complex numbers as numbers rather than vectors, we follow the usual notational tradition of un-bolding them. So we just write z \in \mathbb{C} to indicate that z is a complex number, and we write z=x+yi where x,y are ordinary real numbers and i is the “imaginary” unit. Technically, all these symbols mean exactly what they meant above, they’ve just been un-bolded. So, the product of two complex numbers z_1=x_1+y_1i and z_2=x_2+y_2i is

z_1z_2 = (x_1x_2-y_1y_2) + (x_1y_2+x_2y_1)i.

It’s also customary to denote the norm of a complex number using just single lines, and to call it the “absolute value” of z:

|z| = |x+yi| = \sqrt{x^2+y^2}.

Once we enlarge our understanding of numbers from real to complex, it becomes natural to modify our concept of vector space accordingly. Namely, a complex vector space \mathbf{V} is a set together with two operations, vector addition and scalar multiplication, which satisfy exactly the same axioms as Definition 1 in Lecture 1, except with \mathbb{C} replacing \mathbb{R}. We will discuss further consequences of the passage from real vector spaces to complex vector spaces in the next lecture.

Before finishing this lecture, let us briefly consider a natural question which, historically, was one of the main motivating questions in the development of algebra: what other normed division algebras might exist? This question was first considered in detail by the Irish mathematician William Rowan Hamilton in the 1800s. In modern terms, Hamilton’s goal was the following: given a 3-dimensional Euclidean space \mathbf{V}, he wanted to find a multiplication rule M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) which would turn \mathbf{V} into a normed division algebra. The three-dimensional case is of clear interest due to the three physical dimensions of our world; Hamilton was looking for what he called “spatial numbers.” Unfortunately, he wasn’t able to find what he was looking for, because it doesn’t exist. After a long period of trying without results, in 1843 he suddenly realized that his desired construction could be performed in four dimensions, which led him to a new normed division algebra which he called the quaternions.

To construct the quaternions, let \mathbf{V} be a 4-dimensional Euclidean space with orthonormal basis E=\{\mathbf{1},\mathbf{i},\mathbf{j},\mathbf{k}\}, and let M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) be the multiplication defined by the table

\begin{array}{c|cccc} & \mathbf{1} & \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \hline \mathbf{1} & \mathbf{1} & \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \mathbf{i} & \mathbf{i} & -\mathbf{1} & \mathbf{k} & -\mathbf{j} \\ \mathbf{j} & \mathbf{j} & -\mathbf{k} & -\mathbf{1} & \mathbf{i} \\ \mathbf{k} & \mathbf{k} & \mathbf{j} & -\mathbf{i} & -\mathbf{1} \end{array}

Hamilton’s multiplication table.

In this table, the first row and column contain the basis vectors, and each internal cell contains the result of applying M to the tensor product of the corresponding row and column basis vectors. This turns out to give a normed division algebra; however, as you can see from the above table, this algebra is noncommutative. It is denoted \mathbb{H}, in Hamilton’s honor (and also because the symbol \mathbb{Q} is already taken).
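
Here is a short Python sketch (an illustration, not part of the lecture) which encodes Hamilton's table as an explicit multiplication rule and checks both the noncommutativity (ij = k while ji = -k) and the norm axiom on random quaternions:

import numpy as np

def qmult(p, q):
    # Multiply quaternions p = (a, b, c, d) ~ a1 + bi + cj + dk using Hamilton's table.
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,   # coefficient of 1
        a1*b2 + b1*a2 + c1*d2 - d1*c2,   # coefficient of i
        a1*c2 - b1*d2 + c1*a2 + d1*b2,   # coefficient of j
        a1*d2 + b1*c2 - c1*b2 + d1*a2,   # coefficient of k
    ])

i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
print(qmult(i, j))   # [0, 0, 0, 1]  = k
print(qmult(j, i))   # [0, 0, 0, -1] = -k, so the multiplication is noncommutative

rng = np.random.default_rng(1)
p, q = rng.standard_normal(4), rng.standard_normal(4)
# norm axiom: the norm of a product is the product of the norms
print(np.isclose(np.linalg.norm(qmult(p, q)),
                 np.linalg.norm(p) * np.linalg.norm(q)))   # True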

It turns out that, in addition to \mathbb{R},\mathbb{C},\mathbb{H}, there is only one more normed division algebra. This algebra is called the octonions, because it consists of a multiplication rule for eight-dimensional vectors; it is traditionally denoted \mathbb{O}. It was proved by Adolf Hurwitz that these four constitute the complete list of normed division algebras.

Every time we move up the list of normed division algebras, we lose something. In passing from \mathbb{R} to \mathbb{C}, we lose the fact that the real numbers are ordered: for any two distinct real numbers, it makes sense to say which is smaller and which is larger, but this doesn’t make sense for complex numbers. When we move from the complex numbers to the quaternions, we lose commutativity. When we move from quaternions to the octonions, things get even worse and we lose associativity. This means the following. You may notice that in our definition of algebras, we have only talked about multiplying two vectors. Of course, once we can multiply two, we’d like to multiply three, and four, etc. A multiplication M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) is said to be associative if

M\left(M(\mathbf{v}_1 \otimes \mathbf{v}_2) \otimes \mathbf{v}_3\right) = M\left(\mathbf{v}_1 \otimes M(\mathbf{v}_2 \otimes \mathbf{v}_3)\right).

For associative algebras, unambiguously defining the product of any finite number of vectors is not a problem. However, for octonions, this is not the case.

Math 31AH: Lecture 24

Let us begin with a brief recap of Lecture 23, which introduced a number of new concepts which may seem complex at first, but are in fact rather simple. Let \mathbf{V} be an n-dimensional vector space; the dimension n could be arbitrarily large, so visualizing geometry in \mathbf{V} can be arbitrarily hard. However, no matter what n is, we have a naturally associated 1-dimensional vector space \mathbf{V}^{\wedge n}. We can visualize any 1-dimensional vector space easily, by picturing it as a line. Algebraically, we associate to a given set \{\mathbf{v}_1,\dots,\mathbf{v}_n\} of n vectors in \mathbf{V} a point on the line \mathbf{V}^{\wedge n}, namely the point \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n. The set \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is linearly independent if and only if the point \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n is not the zero point on the line \mathbf{V}^{\wedge n}. This is pure algebra. If \mathbf{V} is a Euclidean space, meaning that we can measure lengths and angles in \mathbf{V} using a scalar product, then the line \mathbf{V}^{\wedge n} is a Euclidean line, i.e. a line on which we have a notion of distance defined. The volume of the parallelepiped \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) \subset \mathbf{V} is then the distance from the point \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n to zero on the line \mathbf{V}^{\wedge n}. There are only two points at distance one from the origin on the line \mathbf{V}^{\wedge n}, which we call \omega and -\omega. These are analogous to the numbers 1 and -1 on the number line \mathbb{R}. Choosing one of these, say \omega, is what it means to choose an orientation on \mathbf{V}. The oriented volume of the parallelepiped \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) is then the unique number a \in \mathbb{R} such that \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n = a\omega. For example, if \mathbf{V}=\mathbb{R}, choosing \omega =1 means the number line is oriented left to right, and choosing \omega=-1 means the number line is oriented right to left. In the former case, the oriented length of the line segment \mathcal{P}(2) is 2, while in the latter case it is -2. That’s it, more or less.

In this lecture, we use the above machinery to define a function

\det \colon \mathrm{End} \mathbf{V} \to \mathbb{R}

which tells us when a given linear operator A \in \mathrm{End}\mathbf{V} is invertible. This function is called the determinant. All the complexity (i.e. the hard stuff) is concentrated in the definitions from the past two lectures, and if you understand these, then the definition of the determinant is exceedingly simple.

Let A \in \mathrm{End}\mathbf{V} be a given linear operator. We associate to A the linear operator A^{\wedge n} \in \mathrm{End}\mathbf{V}^{\wedge n} defined by

A^{\wedge n}\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n :=A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n.

Definition 1: The determinant of A is the unique eigenvalue of A^{\wedge n}.

Let us unpack this definition. The key point is that the vector space \mathbf{V}^{\wedge n} is 1-dimensional. This means that every operator in \mathrm{End}\mathbf{V}^{\wedge n} is simply multiplication by a number (see Problem 1 on Assignment 6). We are defining \det(A) to be the number by which the operator A^{\wedge n} \in \mathrm{End}\mathbf{V}^{\wedge n} scales its argument. That is, \det(A) is the number such that

A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n = \det(A)\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n

for all \mathbf{v}_1,\dots,\mathbf{v}_n \in \mathbf{V}.
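
In two dimensions a wedge \mathbf{v}_1 \wedge \mathbf{v}_2 is determined by its single coordinate relative to \mathbf{e}_1 \wedge \mathbf{e}_2, so the defining property of \det(A) can be checked numerically; the following NumPy sketch uses that identification (it is an aside, not a general implementation):

import numpy as np

def wedge_coord(v1, v2):
    # coordinate of v1 ^ v2 relative to the basis tensor e1 ^ e2 of the second wedge power
    return v1[0] * v2[1] - v2[0] * v1[1]

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2))
v1, v2 = rng.standard_normal(2), rng.standard_normal(2)

lhs = wedge_coord(A @ v1, A @ v2)                 # coordinate of Av1 ^ Av2
rhs = np.linalg.det(A) * wedge_coord(v1, v2)      # det(A) times coordinate of v1 ^ v2
print(np.isclose(lhs, rhs))                       # True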

Theorem 1: An operator A \in \mathrm{End}\mathbf{V} is invertible if and only if \det(A) \neq 0.

Proof: Suppose first that A is invertible. Let \{\mathbf{v}_1,\dots,\mathbf{v}_n\} be a linearly independent set in the n-dimensional vector space \mathbf{V}. Since A is invertible, \{A\mathbf{v}_1,\dots,A\mathbf{v}_n\} is also a linearly independent set in \mathbf{V}, and the linear independence of this set is equivalent to the statement that

A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n

is not the zero tensor. Since the above nonzero tensor is equal to

\det(A)\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n,

we have \det(A) \neq 0.

Conversely, suppose that \det(A) \neq 0. Let \{\mathbf{v}_1,\dots,\mathbf{v}_n\} be a linearly independent set in \mathbf{V}. The statement that A is invertible is equivalent to the statement that \{A\mathbf{v}_1,\dots,A\mathbf{v}_n\} is linearly independent, which is in turn equivalent to the statement that

A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n

is not the zero tensor. But

A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n = \det(A) \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n

is the nonzero tensor \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n scaled by the nonzero number \det(A), hence it is nonzero.

— Q.E.D.

If you already know something about determinants of matrices, you are probably wondering how Definition 1 relates to this prior knowledge; let us explain this now. Let E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} be an orthonormal basis in \mathbf{V}, and let

\omega = \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n

be the corresponding unit volume tensor in \mathbf{V}^{\wedge n}. We then have that

\det(A) = \langle \omega,A^{\wedge n}\omega\rangle;

this follows from the fact that \det(A) is by definition the eigenvalue of the operator A^{\wedge n}, together with the fact that \{\omega\} is an orthonormal basis of \mathbf{V}^{\wedge n}. Now, we can explicitly evaluate the above inner product in terms of the matrix elements of A relative to the basis E of \mathbf{V}. Here is the computation:

\det(A) = \langle \omega,A^{\wedge n}\omega\rangle = \langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,A\mathbf{e}_1 \wedge \dots \wedge A\mathbf{e}_n\rangle = \sum\limits_{\pi \in \mathrm{S}(n)} \mathrm{sgn}(\pi) \langle \mathbf{e}_1,A\mathbf{e}_{\pi(1)} \rangle \dots \langle \mathbf{e}_n,A\mathbf{e}_{\pi(n)} \rangle.

We recall that \mathrm{S}(n) is the set of permutations \pi of the numbers 1,\dots,n, i.e. bijections \pi \colon \{1,\dots,n\} \rightarrow \{1,\dots,n\}. For example, in the case n=2, this computation reads

\det(A) = \langle \mathbf{e}_1 \wedge \mathbf{e}_2, A\mathbf{e}_1 \wedge A\mathbf{e}_2 \rangle = \langle \mathbf{e}_1,A\mathbf{e}_1\rangle \langle \mathbf{e}_2,A\mathbf{e}_2\rangle - \langle \mathbf{e}_1,A\mathbf{e}_2\rangle \langle \mathbf{e}_2,A\mathbf{e}_1\rangle.

This is the formula for the determinant of the 2 \times 2 matrix

[A]_E = \begin{bmatrix} \langle \mathbf{e}_1,A\mathbf{e}_1\rangle & \langle \mathbf{e}_1,A\mathbf{e}_2\rangle \\ \langle \mathbf{e}_2,A\mathbf{e}_1\rangle & \langle \mathbf{e}_2,A\mathbf{e}_2\rangle \end{bmatrix}

which you likely learned in high school algebra (if you didn’t learn it then, you just learned it now).
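
The permutation sum above can be implemented directly. The following sketch (using itertools, purely as an illustration) compares it with NumPy's built-in determinant for a random matrix whose (i,j) entry plays the role of \langle \mathbf{e}_i, A\mathbf{e}_j\rangle:

import itertools
import numpy as np

def sign(perm):
    # Sign of a permutation given as a tuple, computed by counting inversions.
    inversions = sum(1 for a in range(len(perm))
                       for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def leibniz_det(M):
    n = M.shape[0]
    return sum(sign(p) * np.prod([M[i, p[i]] for i in range(n)])
               for p in itertools.permutations(range(n)))

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))    # M[i, j] plays the role of <e_i, A e_j>
print(np.isclose(leibniz_det(M), np.linalg.det(M)))  # True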

To reiterate, for any operator A \in \mathrm{End}V, the determinant \det(A) is the constant such that

A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n = \det(A)\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n

for all \mathbf{v}_1,\dots,\mathbf{v}_n. Taking the scalar product with a unit volume tensor \omega on either side, this becomes

\langle \omega, A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n \rangle= \det(A)\langle \omega,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle.

The scalar product on the LHS is the oriented volume of the parallelepiped \mathcal{P}(A\mathbf{v}_1,\dots,A\mathbf{v}_n), while the scalar product on the right is the oriented volume of the parallelepiped \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n). Assuming that the latter volume is nonzero — i.e. that \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is a linearly independent set in \mathbf{V} — we thus have

\det(A) = \frac{\langle \omega, A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n \rangle}{\langle \omega,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle},

which says that \det(A) is the scalar factor by which the oriented volume of \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) transforms when each of the vectors \mathbf{v}_1,\dots,\mathbf{v}_n transforms by A. As an application of this geometric interpretation of the determinant, let us calculate the area of an ellipse whose semimajor and semiminor axes have lengths a and b, respectively. The basic observation is that such an ellipse is the image of the unit disc in the Euclidean plane \mathbf{V}= \mathbb{R}^2 under the linear transformation A \in \mathrm{End}\mathbb{R}^2 defined by

A\mathbf{e}_1=a\mathbf{e}_1,\ A\mathbf{e}_2=b\mathbf{e}_2,

where \mathbf{e}_1=(1,0) and \mathbf{e}_2=(0,1) is the standard basis. The determinant of the operator A is

\det(A) = \frac{\langle \mathbf{e}_1 \wedge \mathbf{e}_2, A\mathbf{e}_1 \wedge  A\mathbf{e}_2\rangle}{\langle \mathbf{e}_1 \wedge \mathbf{e}_2, \mathbf{e}_1 \wedge  \mathbf{e}_2\rangle} = ab.

Let \varepsilon > 0 be an extremely small positive number, and tile the unit disc with translated copies of the tiny square \mathcal{P}(\varepsilon \mathbf{e}_1,\varepsilon \mathbf{e}_2), so that the tiling approximates the area of the disc up to an exceedingly small error. The oriented area of this tiny square is

\langle \mathbf{e}_1 \wedge \mathbf{e}_2, \varepsilon\mathbf{e}_1 \wedge \varepsilon \mathbf{e}_2 \rangle = \varepsilon^2 \langle \mathbf{e}_1 \wedge \mathbf{e}_2,\mathbf{e}_1 \wedge \mathbf{e}_2 \rangle = \varepsilon^2.

The image of the tiny square under the transformation A is the tiny parallelogram \mathcal{P}(A\varepsilon \mathbf{e}_1,A\varepsilon \mathbf{e}_2), and the oriented area of this tiny parallelogram is

\det(A) \langle \mathbf{e}_1 \wedge \mathbf{e}_2, \varepsilon\mathbf{e}_1 \wedge \varepsilon \mathbf{e}_2 \rangle = \varepsilon^2 ab.

We conclude that the area of the parallelogram tiling which approximates the ellipse is the area of the square tiling which approximates the circle scaled by ab, and hence that the area of the ellipse equals the area of the disc scaled by ab. Since the area of the unit disc is \pi, the area of the ellipse in question is \pi a b.
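
A crude numerical illustration of this scaling argument (not part of the lecture): estimate the area of the ellipse with semi-axes a and b by sampling points uniformly from a bounding box, and compare with \pi a b.

import numpy as np

a, b = 3.0, 2.0
rng = np.random.default_rng(4)
N = 200_000

# Sample uniformly from the bounding box [-a, a] x [-b, b].
x = rng.uniform(-a, a, N)
y = rng.uniform(-b, b, N)
inside = (x / a) ** 2 + (y / b) ** 2 <= 1.0

estimate = inside.mean() * (2 * a) * (2 * b)   # fraction inside times box area
print(estimate, np.pi * a * b)                 # both approximately 18.85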

Math 31AH: Lecture 23

Let \mathbf{V} be an n-dimensional Euclidean space. In Lecture 22, we constructed the Euclidean spaces \mathbf{V}^{\otimes d} of degree d tensors, as well as the subspaces \mathbf{V}^{\vee d} and \mathbf{V}^{\wedge d} of \mathbf{V}^{\otimes d} consisting of symmetric and antisymmetric tensors, respectively. Let us recall the formulas for the scalar product in these vector spaces induced by the scalar product \langle \cdot,\cdot \rangle in \mathbf{V}:

\langle \mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d,\mathbf{w}_1 \otimes \dots \otimes \mathbf{w}_d \rangle = \prod\limits_{i=1}^d \langle \mathbf{v}_i,\mathbf{w}_i \rangle \\ \langle \mathbf{v}_1 \vee \dots \vee \mathbf{v}_d,\mathbf{w}_1 \vee \dots \vee \mathbf{w}_d \rangle = \frac{1}{d!}\sum\limits_{\pi \in \mathrm{S}(d)}\prod\limits_{i=1}^d \langle \mathbf{v}_i,\mathbf{w}_{\pi(i)} \rangle \\ \langle \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d,\mathbf{w}_1 \wedge \dots \wedge \mathbf{w}_d \rangle = \frac{1}{d!}\sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi)\prod\limits_{i=1}^d \langle \mathbf{v}_i,\mathbf{w}_{\pi(i)} \rangle.

To be precise, the first of these formulas is a definition, while the second and third formulas are consequences of this definition and the fact that \mathbf{V}^{\vee d} and \mathbf{V}^{\wedge d} are subspaces of \mathbf{V}^{\otimes d}.

In this lecture we focus on the Euclidean space \mathbf{V}^{\wedge d}, which turns out to be the most important of the three. In fact, we will now be dealing with the antisymmetric tensor powers \mathbf{V}^{\wedge d} of \mathbf{V} as standalone objects, rather than viewing them as subspaces of the tensor powers \mathbf{V}^{\otimes d}, and as such it will be more convenient to renormalize the scalar product to

\langle \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d,\mathbf{w}_1 \wedge \dots \wedge \mathbf{w}_d \rangle = \sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi)\prod\limits_{i=1}^d \langle \mathbf{v}_i,\mathbf{w}_{\pi(i)} \rangle,

dropping the factor of \frac{1}{d!} which was previously hanging around outside the sum. We are free to do this, since rescaling a scalar product by any positive constant yields a scalar product.
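
The renormalized formula is just the Leibniz expansion of the determinant of the d \times d matrix of pairwise scalar products \langle \mathbf{v}_i,\mathbf{w}_j \rangle, so it is easy to compute. Here is a short NumPy sketch (an aside, with \mathbb{R}^3 and the standard dot product standing in for \mathbf{V}):

import numpy as np

def wedge_inner(V, W):
    # <v1 ^ ... ^ vd, w1 ^ ... ^ wd> for the rows of V and W, using the renormalized
    # scalar product: the determinant of the matrix of pairwise dot products.
    return np.linalg.det(V @ W.T)

rng = np.random.default_rng(5)
V = rng.standard_normal((2, 3))   # two vectors v1, v2 in R^3
W = rng.standard_normal((2, 3))   # two vectors w1, w2 in R^3
print(wedge_inner(V, W))

# if v1, v2 are linearly dependent, the wedge has zero length:
V_dep = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print(np.isclose(wedge_inner(V_dep, V_dep), 0.0))   # True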

We have already seen that \{ \mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly dependent set in \mathbf{V} if and only if

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{0},

where

\mathbf{0}=\mathbf{0}_{\mathbf{V}^{\otimes d}} = \underbrace{\mathbf{0}_\mathbf{V} \otimes \dots \otimes \mathbf{0}_\mathbf{V}}_{d \text{ factors}}

denotes the zero tensor of degree d. Equivalently, \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly dependent set in \mathbf{V} if and only if

\|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d\| = \sqrt{\langle \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d \rangle}=0.

Today, we will expand on this by showing that length in \mathbf{V}^{\wedge n} provides a good definition of volume in \mathbf{V}.

Definition 1: For any vectors \mathbf{v}_1,\dots,\mathbf{v}_n, the corresponding parallelepiped is the subset of \mathbf{V} defined by

\mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) := \{ t_1\mathbf{v}_1 + \dots + t_n\mathbf{v}_n \colon 0 \leq t_i \leq 1\}.

We define the volume of this parallelepiped by

\mathrm{Vol} \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) := \| \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\|.

According to Definition 1, we should interpret the length of the tensor \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n to be the volume of the parallelepiped spanned by its factors. This makes sense, in that the length of \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n is zero if and only if \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is a linearly dependent set in \mathbf{V}, which means that \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) lies in a subspace of \mathbf{V} whose dimension is strictly less than n. This corresponds to our intuition that the length of a point is zero, the area of a line segment is zero, and the volume of a parallelogram is zero. Indeed, Theorem 2 from Lecture 22 applied to Definition 1 yields the following statement.

Theorem 1: The set \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is a basis in \mathbf{V} if and only if \mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n)>0.

As an example, let us consider the case n=1, i.e. \mathbf{V} is a 1-dimensional vector space. Let \mathbf{v} \in \mathbf{V}. We then have that \mathcal{P}(\mathbf{v}) consists of all vectors of the form \mathbf{w}=t\mathbf{v} with 0 \leq t \leq 1, which can be visualized as the line segment joining \mathbf{0}_\mathbf{V} to \mathbf{v} in \mathbf{V}. The volume of \mathcal{P}(\mathbf{v}) is then \|\mathbf{v}\|, which is the length of this line segment. So, according to our definition, volume and length are the same thing in one dimension, which seems reasonable.

As another example, let us consider the case n=2, i.e. \mathbf{V} is a 2-dimensional Euclidean space. Let \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}. Then, \mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) can be visualized as the set of all vectors in \mathbf{V} which lie on or inside the parallelogram with vertices

\mathbf{0}_\mathbf{V} = 0\mathbf{v}_1 + 0\mathbf{v}_2 \\ \mathbf{v}_1 = 1\mathbf{v}_1 + 0\mathbf{v}_2 \\ \mathbf{v}_2 = 0\mathbf{v}_1 + 1\mathbf{v}_2 \\ \mathbf{v}_1 + \mathbf{v}_2 = 1\mathbf{v}_1 + 1\mathbf{v}_2.

According to our definition, the volume of \mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) is the number

\mathrm{Vol} \mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) = \|\mathbf{v}_1 \wedge \mathbf{v}_2\|.

Let us evaluate this number more explicitly. Let E = \{\mathbf{e}_1,\mathbf{e}_2\} be an orthonormal basis of \mathbf{V}. We then have

\mathbf{v}_1 \wedge \mathbf{v}_2  = (\langle \mathbf{e}_1,\mathbf{v}_1\rangle \mathbf{e}_1 + \langle \mathbf{e}_2,\mathbf{v}_1\rangle \mathbf{e}_2)\wedge (\langle \mathbf{e}_1,\mathbf{v}_2\rangle \mathbf{e}_1 + \langle \mathbf{e}_2,\mathbf{v}_2\rangle \mathbf{e}_2) \\= \left( \langle \mathbf{e}_1,\mathbf{v}_1\rangle\langle \mathbf{e}_2,\mathbf{v}_2\rangle - \langle \mathbf{e}_1,\mathbf{v}_2\rangle\langle\mathbf{e}_2,\mathbf{v}_1\rangle\right)\mathbf{e}_1 \wedge \mathbf{e}_2,

so that

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) = |\langle \mathbf{e}_1,\mathbf{v}_1\rangle\langle \mathbf{e}_2,\mathbf{v}_2\rangle - \langle \mathbf{e}_1,\mathbf{v}_2\rangle\langle\mathbf{e}_2,\mathbf{v}_1\rangle| \|\mathbf{e}_1 \wedge \mathbf{e}_2\|.

Since

\|\mathbf{e}_1 \wedge \mathbf{e}_2\|^2 = \langle \mathbf{e}_1 \wedge \mathbf{e}_2,\mathbf{e}_1\wedge \mathbf{e}_2 \rangle = \langle \mathbf{e}_1,\mathbf{e}_1\rangle \langle \mathbf{e}_2,\mathbf{e}_2\rangle-\langle \mathbf{e}_1,\mathbf{e}_2\rangle \langle \mathbf{e}_2,\mathbf{e}_1\rangle = \|\mathbf{e}_1\|^2\|\mathbf{e}_2\|^2=1,

we conclude that

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) = |\langle \mathbf{e}_1,\mathbf{v}_1\rangle\langle \mathbf{e}_2,\mathbf{v}_2\rangle - \langle \mathbf{e}_1,\mathbf{v}_2\rangle\langle\mathbf{e}_2,\mathbf{v}_1\rangle|.

There are a couple of things we can observe about this number. First, we can also write it as

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) =| \langle \mathbf{e}_1 \wedge \mathbf{e}_2,\mathbf{v}_1 \wedge \mathbf{v}_2\rangle|;

as we will see momentarily, this is because \{\mathbf{e}_1\wedge \mathbf{e}_2\} is an orthonormal basis of \mathbf{V}^{\wedge 2}. Second, if you are familiar with the definition of the determinant of a 2 \times 2 matrix,

\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22}\end{bmatrix} = a_{11}a_{22}-a_{12}a_{21},

it is apparent that \mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) is equal to the absolute value of the determinant of the matrix

\begin{bmatrix} \langle \mathbf{e}_1,\mathbf{v}_1\rangle & \langle \mathbf{e}_1,\mathbf{v}_2\rangle\\ \langle \mathbf{e}_2,\mathbf{v}_1\rangle & \langle \mathbf{e}_2,\mathbf{v}_2\rangle\end{bmatrix}

whose columns are the coordinates of the vectors \mathbf{v}_1,\mathbf{v}_2 relative to the basis E=\{\mathbf{e}_1,\mathbf{e}_2\}. This coincidence will also be explained shortly. First, however, we want to ask a natural question: what if we had done the above computation using a different orthonormal basis F=\{\mathbf{f}_1,\mathbf{f}_2\} of \mathbf{V}? The same calculations would then have led us to the formula

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) = |\langle \mathbf{f}_1,\mathbf{v}_1\rangle\langle \mathbf{f}_2,\mathbf{v}_2\rangle - \langle \mathbf{f}_1,\mathbf{v}_2\rangle\langle\mathbf{f}_2,\mathbf{v}_1\rangle|,

which is apparently different from our first formula. But volume should be a geometric entity, and its computation should not depend on the choice of coordinates, i.e. on a choice of basis. We can see our way through this by considering the orthogonal transformation U \in \mathrm{Aut}\mathbf{V} uniquely determined by

U\mathbf{e}_1 = \mathbf{f}_1,\ U\mathbf{e}_2 = \mathbf{f}_2.

We then see that

|\langle \mathbf{f}_1,\mathbf{v}_1\rangle\langle \mathbf{f}_2,\mathbf{v}_2\rangle - \langle \mathbf{f}_1,\mathbf{v}_2\rangle\langle\mathbf{f}_2,\mathbf{v}_1\rangle| = |\langle U\mathbf{e}_1,\mathbf{v}_1\rangle\langle U\mathbf{e}_2,\mathbf{v}_2\rangle - \langle U\mathbf{e}_1,\mathbf{v}_2\rangle\langle U\mathbf{e}_2,\mathbf{v}_1\rangle|\\=|\langle \mathbf{e}_1,U^{-1}\mathbf{v}_1\rangle\langle \mathbf{e}_2,U^{-1}\mathbf{v}_2\rangle - \langle \mathbf{e}_1,U^{-1}\mathbf{v}_2\rangle\langle \mathbf{e}_2,U^{-1}\mathbf{v}_1\rangle|,

which is equivalent to the assertion that

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) = \mathrm{Vol}\mathcal{P}(U^{-1}\mathbf{v}_1,U^{-1}\mathbf{v}_2).

This makes perfect geometric sense: since orthogonal transformations preserve lengths and angles, the parallelogram \mathcal{P}(U^{-1}\mathbf{v}_1,U^{-1}\mathbf{v}_2) is just a rotated and/or flipped copy of \mathcal{P}(\mathbf{v}_1,\mathbf{v}_2), which naturally has the same area.
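
A quick numerical check of this invariance (a sketch in \mathbb{R}^2 with the standard scalar product): compute the area formula for a random pair of vectors, and again after applying U^{-1} for a rotation U, and observe that the two answers agree.

import numpy as np

def area(v1, v2):
    # |<e1, v1><e2, v2> - <e1, v2><e2, v1>| in the standard basis of R^2
    return abs(v1[0] * v2[1] - v2[0] * v1[1])

theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # an orthogonal (rotation) matrix

rng = np.random.default_rng(6)
v1, v2 = rng.standard_normal(2), rng.standard_normal(2)

# U.T is the inverse of the orthogonal matrix U
print(np.isclose(area(v1, v2), area(U.T @ v1, U.T @ v2)))   # True: same area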

The key step in understanding the linear algebra underlying our definition of n-dimensional volume is finding an orthonormal basis of \mathbf{V}^{\wedge d}, the space of antisymmetric tensors of degree d. We have solved this problem for \mathbf{V}^{\otimes d}, the space of all tensors of degree d; back in Lecture 22, we showed that any orthonormal basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} in \mathbf{V} induces a corresponding orthonormal basis

E^{\otimes d} = \{\mathbf{e}_{i(1)} \otimes \dots \otimes \mathbf{e}_{i(d)} \colon i \in \mathrm{Fun}(d,n)\}

in \mathbf{V}^{\otimes d}, so that \dim \mathbf{V}^{\otimes d} = n^d. To obtain the analogous result for \mathbf{V}^{\wedge d}, we need to introduce the subset of \mathrm{Fun}(d,n) consisting of increasing functions:

\mathrm{Inc}(d,n):=\{i \in \mathrm{Fun}(d,n) \colon i(1) < i(2) < \dots < i(d)\}.

Theorem 2: For any 1 \leq d \leq n, the set

E^{\wedge d} = \{\mathbf{e}_{i(1)} \wedge \dots \wedge \mathbf{e}_{i(d)} \colon i \in \mathrm{Inc}(d,n)\}

is an orthonormal basis in \mathbf{V}^{\wedge d}. For all d>n, the only antisymmetric tensor of degree d is the zero tensor, i.e. \mathbf{V}^{\wedge d}=\{\mathbf{0}_{\mathbf{V}^{\otimes d}}\}.

Proof: First we prove that E^{\wedge d} spans \mathbf{V}^{\wedge d}. To do this, it is sufficient to show that any simple tensor

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d

is a linear combination of the tensors in E^{\wedge d}. But this is clear from the multilinearity of the wedge product: we have

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d \\ = \left( \sum_{i=1}^n \langle \mathbf{e}_i, \mathbf{v}_1\rangle \mathbf{e}_i\right) \wedge \dots \wedge \left( \sum_{i=1}^n \langle \mathbf{e}_i, \mathbf{v}_d\rangle \mathbf{e}_i\right) \\ = \sum\limits_{i \in \mathrm{Fun}(d,n)} \langle \mathbf{e}_{i(1)},\mathbf{v}_1\rangle \dots \langle \mathbf{e}_{i(d)},\mathbf{v}_d\rangle\, \mathbf{e}_{i(1)} \wedge \dots \wedge \mathbf{e}_{i(d)}.

Now, since a wedge product which contains two copies of the same vector is zero, the only nonzero terms in the above sum are those corresponding to injective functions, i.e. those i \in \mathrm{Fun}(d,n) such that the numbers i(1),\dots,i(d) are pairwise distinct. If d > n, there are no such functions, and consequently \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_{d} = \mathbf{0}. If d \leq n, then for any injection i, the factors of the nonzero tensor \mathbf{e}_{i(1)} \wedge \dots \wedge \mathbf{e}_{i(d)} can be sorted so that the distinct indices are in increasing order; doing this will scale the original tensor by a factor of \pm 1.

Now we prove that E^{\wedge d} is an orthonormal set in \mathbf{V}^{\wedge d} (in particular, this will imply linear independence). Let i,j \in \mathrm{Inc}(d,n) be distinct increasing functions. We then have that

\langle \mathbf{e}_{i(1)} \wedge \dots \wedge \mathbf{e}_{i(d)},\mathbf{e}_{j(1)} \wedge \dots \wedge \mathbf{e}_{j(d)} \rangle = \sum\limits_{\sigma \in \mathrm{S}(d)} \mathrm{sgn}(\sigma) \langle \mathbf{e}_{i(1)},\mathbf{e}_{j\sigma(1)}\rangle \dots \langle \mathbf{e}_{i(d)},\mathbf{e}_{j\sigma(d)}\rangle.

Since there is no permutation \sigma \in \mathrm{S}(d) which can transform the list i(1) < \dots < i(d) of distinct numbers into the different list j(1) < \dots < j(d) of distinct numbers, every term in the sum is zero. This proves that E^{\wedge d} is an orthogonal set in \mathbf{V}^{\wedge d}. To prove that each element of E^{\wedge d} is of unit length, repeat the above argument with i=j. Then, the only nonzero term in the sum is that corresponding to the identity permutation:

\langle \mathbf{e}_{i(1)},\mathbf{e}_{i(1)} \rangle \dots \langle \mathbf{e}_{i(d)},\mathbf{e}_{i(d)} \rangle = 1.

— Q.E.D.

Corollary 1: The dimension of \mathbf{V}^{\wedge d} is {n \choose d}.

Proof: Any increasing function i(1) < \dots < i(d) corresponds to a unique cardinality d subset \{i(1),\dots,i(d)\} of \{1,\dots,n\}. By definition, the binomial coefficient {n \choose d} is the number of such subsets.

— Q.E.D.
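
In code, the increasing functions indexing E^{\wedge d} are just the d-element subsets of \{1,\dots,n\} listed in increasing order, so the dimension count can be illustrated with itertools (a small aside):

import itertools
import math

n, d = 5, 3
# increasing functions i(1) < ... < i(d) correspond to d-element subsets of {1, ..., n}
increasing = list(itertools.combinations(range(1, n + 1), d))
print(len(increasing), math.comb(n, d))   # 10 10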

Although it may not seem so, the most important consequence of Corollary 1 is that

\dim \mathbf{V}^{\wedge n} = {n \choose n} =1.

Indeed, according to Theorem 2, the set E^{\wedge n} =\{\omega\} consisting of the single tensor

\omega = \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n

is a basis in \mathbf{V}^{\wedge n}. This means that every antisymmetric tensor of degree n is a scalar multiple of \omega, so that in particular for any vectors \mathbf{v}_1,\dots,\mathbf{v}_n \in \mathbf{V} we have

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n = a \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n

for some scalar a \in \mathbb{R}, the value of which depends on the vectors \mathbf{v}_1,\dots,\mathbf{v}_n. Indeed, since \{\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n\} is an orthonormal basis in \mathbf{V}^{\wedge n}, we have

a = \langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle.

We thus have the following alternative description of volume.

Proposition 1: For any orthonormal basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} in \mathbf{V}, we have

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) = |\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle|.

Proof: We will give two proofs, one computational and one conceptual.

Computational:

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) \\=\|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\| \\ = \sqrt{\langle \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle} \\ = \sqrt{\langle\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n\rangle} \\ = \sqrt{\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle^2\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n\rangle} \\ = \sqrt{\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle^2} \\ =|\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle|.

Conceptual: by Cauchy-Schwarz,

|\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle| \leq \|\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n\| \|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\|.

We know that equality holds in the Cauchy-Schwarz inequality precisely when linear dependence holds, which is the case here because \dim \mathbf{V}^{\wedge n}=1. Thus

|\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle| =\|\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n\| \|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\| = \|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\|.

— Q.E.D.

Any tensor \omega \in \mathbf{V}^{\wedge n} of the form

\omega=\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,

where E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} is an orthonormal basis in \mathbf{V}, is called a unit volume tensor for \mathbf{V}. Indeed, for such a tensor we have \|\omega\|=1, corresponding exactly to the fact that \mathcal{P}(\mathbf{e}_1,\dots,\mathbf{e}_n) is an n-dimensional unit box. In fact, because \mathbf{V}^{\wedge n} is 1-dimensional, there are only two unit volume tensors: if \omega \in \mathbf{V}^{\wedge n} satisfies \|\omega\|=1, then the only other tensor in \mathbf{V}^{\wedge n} with this property is -\omega.

Definition 2: A triple (\mathbf{V},\langle \cdot,\cdot \rangle,\omega) consisting of a finite-dimensional vector space \mathbf{V}, a scalar product \langle \cdot,\cdot \rangle on \mathbf{V}, and a unit volume tensor \omega \in \mathbf{V}^{\wedge n}, is called an oriented Euclidean space.

For example, consider the case of the Euclidean plane, i.e. \mathbf{V}=\mathbb{R}^2 with the standard scalar product. Giving the Euclidean plane an orientation means choosing one of the two tensors

(1,0) \wedge (0,1) \quad\text{ or }\quad (0,1) \wedge (1,0).

Choosing the first tensor gives the Euclidean plane a counterclockwise orientation, while choosing the second gives the plane a clockwise orientation.

Definition 3: Given a unit volume tensor \omega \in \mathbf{V}^{\wedge n}, the corresponding determinant is the function

\det \colon \underbrace{\mathbf{V} \times \dots \times \mathbf{V}}_{n \text{ copies}} \to \mathbb{R}

defined by

\det(\mathbf{v}_1,\dots,\mathbf{v}_n) = \langle \omega,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle.

Thus \det(\mathbf{v}_1,\dots,\mathbf{v}_n) is the oriented volume of the parallelepiped \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n), meaning that it coincides with \mathrm{Vol} \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) up to a factor of \pm 1. So, “oriented volume function” might be a more natural name than “determinant.” However, the term determinant is also appropriate, given the following result (which is equivalent to Theorem 1 above).

Theorem 3: The set \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is a basis in \mathbf{V} if and only if \det(\mathbf{v}_1,\dots,\mathbf{v}_n) \neq 0.

It is important to realize that the Euclidean space \mathbf{V} actually supports two distinct determinant functions, one for each of the two unit volume tensors in the one-dimensional space \mathbf{V}^{\wedge n}. For example, if n=2 and we have two orthonormal bases E=\{\mathbf{e}_1,\mathbf{e}_2\} and F=\{\mathbf{f}_1,\mathbf{f}_2\} then we have two corresponding determinant functions

\det_E(\mathbf{v}_1,\mathbf{v}_2) = \langle \mathbf{e}_1 \wedge \mathbf{e}_2, \mathbf{v}_1 \wedge \mathbf{v}_2\rangle \quad\text{ and }\quad \det_F(\mathbf{v}_1,\mathbf{v}_2) = \langle \mathbf{f}_1 \wedge \mathbf{f}_2, \mathbf{v}_1 \wedge \mathbf{v}_2\rangle

which may or may not coincide. For example, if

\mathbf{f}_1=\mathbf{e}_2,\ \mathbf{f}_2=\mathbf{e}_1,

then we have

\det_F(\mathbf{v}_1,\mathbf{v}_2) = \langle \mathbf{f}_1 \wedge \mathbf{f}_2,\mathbf{v}_1 \wedge \mathbf{v}_2 \rangle = \langle \mathbf{e}_2 \wedge \mathbf{e}_1,\mathbf{v}_1\wedge \mathbf{v}_2\rangle=-\det_E(\mathbf{v}_1,\mathbf{v}_2).
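
Numerically (a sketch in \mathbb{R}^2, using the renormalized scalar product on \mathbf{V}^{\wedge 2}): swapping the two basis vectors flips the sign of the corresponding determinant function, exactly as computed above.

import numpy as np

def det_in_basis(b1, b2, v1, v2):
    # <b1 ^ b2, v1 ^ v2> in R^2, computed as a 2x2 determinant of scalar products
    return np.linalg.det(np.array([[b1 @ v1, b1 @ v2],
                                   [b2 @ v1, b2 @ v2]]))

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
rng = np.random.default_rng(7)
v1, v2 = rng.standard_normal(2), rng.standard_normal(2)

det_E = det_in_basis(e1, e2, v1, v2)
det_F = det_in_basis(e2, e1, v1, v2)   # F is E with the two vectors swapped
print(np.isclose(det_F, -det_E))       # True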

In Lecture 24, we shall discuss determinants of operators on Euclidean space, which are closely related to — but not quite the same as — oriented volumes of parallelepipeds in Euclidean space.

Math 31AH: Lecture 22

Let us now use the symmetric and antisymmetric tensor products to define two subspaces of the tensor square \mathbf{V} \otimes \mathbf{V} which store “unevaluated” symmetric and antisymmetric tensor products of vectors from \mathbf{V}. The symmetric square of \mathbf{V} is the subspace \mathbf{V} \vee \mathbf{V} of \mathbf{V} \otimes \mathbf{V} spanned by all symmetric tensor products

\mathbf{v}_1 \vee \mathbf{v}_2, \quad \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}.

Elements of \mathbf{V} \vee \mathbf{V} are called symmetric tensors. Similarly, the antisymmetric square of \mathbf{V} is the subspace \mathbf{V} \wedge \mathbf{V} of \mathbf{V} \otimes \mathbf{V} spanned by all antisymmetric tensor products,

\mathbf{v}_1 \wedge \mathbf{v}_2, \quad \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}.

Elements of \mathbf{V} \wedge \mathbf{V} are called antisymmetric tensors.

All of what we have said above can be generalized in a natural way to products of more than two vectors. More precisely, for any natural number d \in \mathbb{N}, we can define the dth tensor power of the vector space \mathbf{V} to be the new vector space \mathbf{V}^{\otimes d} spanned by all “unevaluated” products

\mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d

of d vectors \mathbf{v}_1,\dots,\mathbf{v}_d. The only feature of such multiple unevaluated products is that they are “multilinear,” which really just means that they behave like ordinary products (sans commutativity). For example, in the case d=3, this just means that we have the following three identities in the vector space \mathbf{V}^{\otimes 3}: for any scalars a_1,a_2 \in \mathbb{R}

(a_1\mathbf{u}_1 + a_2\mathbf{u}_2) \otimes \mathbf{v} \otimes \mathbf{w} = a_1\mathbf{u}_1 \otimes \mathbf{v} \otimes \mathbf{w} + a_2\mathbf{u}_2 \otimes \mathbf{v} \otimes \mathbf{w}

for all \mathbf{u}_1,\mathbf{u}_2,\mathbf{v},\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \otimes (a_1\mathbf{v}_1 + a_2\mathbf{v}_2) \otimes \mathbf{w} = a_1 \mathbf{u} \otimes \mathbf{v}_1 \otimes \mathbf{w} + a_2\mathbf{u} \otimes \mathbf{v}_2 \otimes \mathbf{w}

for all \mathbf{u},\mathbf{v}_1,\mathbf{v}_2,\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \otimes \mathbf{v} \otimes (a_1\mathbf{w}_1 + a_2\mathbf{w}_2) = a_1 \mathbf{u} \otimes \mathbf{v} \otimes \mathbf{w}_1 + a_2 \mathbf{u} \otimes \mathbf{v} \otimes \mathbf{w}_2

for all \mathbf{u},\mathbf{v},\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V}. If \mathbf{V} comes with a scalar product \langle \cdot,\cdot \rangle, we can use this to define a scalar product on \mathbf{V}^{\otimes d} in a very simple way by declaring

\langle \mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d,\mathbf{w}_1 \otimes \dots \otimes \mathbf{w}_d \rangle = \langle \mathbf{v}_1,\mathbf{w}_1 \rangle \dots \langle \mathbf{v}_d,\mathbf{w}_d\rangle.

Even better, we can use the scalar product so defined to construct an orthonormal basis of \mathbf{V}^{\otimes d} from a given orthonormal basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} of \mathbf{V}: such a basis is simply given by all tensor products with d factors in which each factor is one of the basis vectors \mathbf{e}_1,\dots,\mathbf{e}_n. More precisely, these are the tensors

\mathbf{e}_{i(1)} \otimes \mathbf{e}_{i(2)} \otimes \dots \otimes \mathbf{e}_{i(d)}, \quad i \in \mathrm{Fun}(d,n),

where \mathrm{Fun}(d,n) is a fun notation for the set of all functions

i \colon \{1,\dots,d\} \to \{1,\dots,n\}.

In particular, since the cardinality of \mathrm{Fun}(d,n) is n^d (make one of n choices d times), the dimension of the vector space \mathbf{V}^{\otimes d} is n^d.

Example 1: If \mathbf{V} is a 2-dimensional vector space with orthonormal basis \{\mathbf{e}_1,\mathbf{e}_2\}, then an orthonormal basis of \mathbf{V}^{\otimes 3} is given by the tensors

\mathbf{e}_1 \otimes \mathbf{e}_1 \otimes \mathbf{e}_1, \\ \mathbf{e}_1 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2, \mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1,\mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_1, \\ \mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_2, \mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2, \mathbf{e}_2 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1, \\ \mathbf{e}_2 \otimes \mathbf{e}_2 \otimes \mathbf{e}_2.
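
This enumeration is easy to carry out by machine. Below is a minimal NumPy sketch for n=2, d=3, modeling a simple tensor as the d-dimensional array of all coordinate products (an assumption made only for illustration; with this model the flattened dot of two such arrays is the product of the factorwise dot products, matching the scalar product defined above, and the helper name tensor is ours):

    from functools import reduce
    from itertools import product
    import numpy as np

    n, d = 2, 3
    E = np.eye(n)                      # rows: the orthonormal basis e_1, ..., e_n of R^n

    def tensor(vectors):
        # model v_1 (x) ... (x) v_d as the d-dimensional array of coordinate products
        return reduce(np.multiply.outer, vectors)

    basis = [tensor([E[i] for i in idx]) for idx in product(range(n), repeat=d)]
    print(len(basis))                  # 8 = n**d, one tensor per function in Fun(d, n)

    gram = np.array([[np.dot(s.ravel(), t.ravel()) for t in basis] for s in basis])
    print(np.allclose(gram, np.eye(n ** d)))   # True: the listed tensors are orthonormal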

We now define the d-fold symmetric and antisymmetric tensor products. These products rely on the concept of permutations.

Reading Assignment: Familiarize yourself with permutations. What is important for our purposes is that you understand how to multiply permutations, and that you understand what the sign of a permutation is. Feel free to ask questions as needed.

Definition 1: For any d \in \mathbb{N}, and any vectors \mathbf{v}_1,\dots,\mathbf{v}_d \in \mathbf{V}, we define the symmetric tensor product of these vectors by

\mathbf{v}_1 \vee \dots \vee \mathbf{v}_d = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \mathbf{v}_{\pi(1)} \otimes \dots \otimes \mathbf{v}_{\pi(d)},

and denote by \mathbf{V}^{\vee d} the subspace of \mathbf{V}^{\otimes d} spanned by all symmetric tensor products of d vectors from \mathbf{V}. Likewise, we define the antisymmetric tensor product of \mathbf{v}_1,\dots,\mathbf{v}_d by

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi)\mathbf{v}_{\pi(1)} \otimes \dots \otimes \mathbf{v}_{\pi(d)},

and denote by \mathbf{V}^{\wedge d} the subspace of \mathbf{V}^{\otimes d} spanned by all antisymmetric tensor products of d vectors from \mathbf{V}.

Note that, in the case d=2, this definition coincides with the definitions

\mathbf{v}_1 \vee \mathbf{v}_2 = \frac{1}{2}\left( \mathbf{v}_1\otimes \mathbf{v}_2 + \mathbf{v}_2 \otimes \mathbf{v}_1\right)

and

\mathbf{v}_1 \wedge \mathbf{v}_2 = \frac{1}{2}\left(\mathbf{v}_1\otimes \mathbf{v}_2 - \mathbf{v}_2 \otimes \mathbf{v}_1\right)

from Lecture 21.
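
Definition 1 can also be turned directly into code. The following Python sketch (helper names sym, wedge, sgn, tensor are ours, and elementary tensors are again modeled as coordinate arrays) sums over permutations exactly as in the definition, and checks that for d=2 it reproduces the Lecture 21 formulas and that a wedge with a repeated factor vanishes:

    import math
    import numpy as np
    from functools import reduce
    from itertools import permutations

    def tensor(vectors):
        # v_1 (x) ... (x) v_d as the array of coordinate products
        return reduce(np.multiply.outer, vectors)

    def sgn(perm):
        # sign of a permutation of (0, ..., d-1), computed by counting inversions
        inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j])
        return -1 if inv % 2 else 1

    def sym(vectors):
        d = len(vectors)
        return sum(tensor([vectors[p] for p in perm])
                   for perm in permutations(range(d))) / math.factorial(d)

    def wedge(vectors):
        d = len(vectors)
        return sum(sgn(perm) * tensor([vectors[p] for p in perm])
                   for perm in permutations(range(d))) / math.factorial(d)

    v, w = np.array([1.0, 2.0, 0.5]), np.array([-1.0, 3.0, 2.0])
    print(np.allclose(sym([v, w]),   (tensor([v, w]) + tensor([w, v])) / 2))   # True: matches Lecture 21
    print(np.allclose(wedge([v, w]), (tensor([v, w]) - tensor([w, v])) / 2))   # True: matches Lecture 21
    print(np.allclose(wedge([v, v, w]), 0))                                    # True: repeated factor kills the wedge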

Since the symmetric and antisymmetric tensor products are defined in terms of the tensor product, they inherit multilinearity. For example, in the case d=3, this means that we have the following three identities in the vector space \mathbf{V}^{\vee 3}: for any scalars a_1,a_2 \in \mathbb{R}

(a_1\mathbf{u}_1 + a_2\mathbf{u}_2) \vee \mathbf{v} \vee \mathbf{w} = a_1\mathbf{u}_1 \vee \mathbf{v} \vee \mathbf{w} + a_2\mathbf{u}_2 \vee \mathbf{v} \vee \mathbf{w}

for all \mathbf{u}_1,\mathbf{u}_2,\mathbf{v},\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \vee (a_1\mathbf{v}_1 + a_2\mathbf{v}_2) \vee \mathbf{w} = a_1 \mathbf{u} \vee \mathbf{v}_1 \vee \mathbf{w} + a_2\mathbf{u} \vee \mathbf{v}_2 \vee \mathbf{w}

for all \mathbf{u},\mathbf{v}_1,\mathbf{v}_2,\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \vee \mathbf{v} \vee (a_1\mathbf{w}_1 + a_2\mathbf{w}_2) = a_1 \mathbf{u} \vee \mathbf{v} \vee \mathbf{w}_1 + a_2 \mathbf{u} \vee \mathbf{v} \vee \mathbf{w}_2

for all \mathbf{u},\mathbf{v},\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V}. The analogous statements hold in \mathbf{V}^{\wedge 3}.

The symmetric tensor product is constructed in such a way that

\mathbf{v}_{\pi(1)} \vee \dots \vee \mathbf{v}_{\pi(d)} = \mathbf{v}_1 \vee \dots \vee \mathbf{v}_d

for any permutation \pi \in \mathrm{S}(d), whereas the antisymmetric tensor product is constructed in such a way that

\mathbf{v}_{\pi(1)} \wedge \dots \wedge \mathbf{v}_{\pi(d)} = \mathrm{sgn}(\pi)\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d

for any permutation \pi \in \mathrm{S}(d). In particular, if any two of the vectors \mathbf{v}_1,\dots,\mathbf{v}_d are equal, then

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{0}.

Indeed, suppose that \mathbf{v}_1=\mathbf{v}_2. On one hand, by the above antisymmetry we have

\mathbf{v}_2 \wedge \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = - \mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d,

but on the other hand we also have

\mathbf{v}_2 \wedge \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d

because \mathbf{v}_1=\mathbf{v}_2. This means that

\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d = - \mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d

if \mathbf{v}_1=\mathbf{v}_2, which forces

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{0}.

The vector space \mathbf{V}^{\vee d} is called the dth symmetric power of \mathbf{V}, and its elements are called symmetric tensors of degree d. The vector space \mathbf{V}^{\wedge d} is called the dth antisymmetric power of \mathbf{V}, and its elements are called antisymmetric tensors of degree d. These vector spaces have a physical interpretation. In quantum mechanics, an n-dimensional vector space \mathbf{V} is viewed as the state space of a particle that can be in any one of n quantum states. The space \mathbf{V}^{\vee d} is then the state space of d bosons, each of which may occupy one of n quantum states, while \mathbf{V}^{\wedge d} is the state space of d fermions, each of which may be in any of n quantum states. The vanishing of wedge products with two equal factors corresponds physically to the characteristic feature of fermions, i.e. the Pauli exclusion principle. You don’t have to know any of this — I included this perspective in order to provide some indication that the construction of these vector spaces is not just abstract nonsense.

Theorem 1: For any d \in \mathbb{N} and any \mathbf{v}_1,\dots,\mathbf{v}_d,\mathbf{w}_1,\dots,\mathbf{w}_d \in \mathbf{V}, we have

\langle \mathbf{v}_1 \vee \dots \vee \mathbf{v}_d,\mathbf{w}_1 \vee \dots \vee \mathbf{w}_d \rangle = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \langle \mathbf{v}_1,\mathbf{w}_{\pi(1)}\rangle \dots \langle \mathbf{v}_d,\mathbf{w}_{\pi(d)}\rangle,

and

\langle \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d,\mathbf{w}_1 \wedge \dots \wedge \mathbf{w}_d \rangle = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi)\langle \mathbf{v}_1,\mathbf{w}_{\pi(1)}\rangle \dots \langle \mathbf{v}_d,\mathbf{w}_{\pi(d)}\rangle.

Since we won’t use this theorem much, we will skip the proof. However, the proof is not too difficult, and is an exercise in permutations: simply plug in the definitions of the symmetric and antisymmetric tensor products in terms of the original tensor products, expand the scalar product, and simplify.
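
For the skeptical reader, here is a self-contained numerical check of the antisymmetric formula in the case d=2, with simple tensors modeled as outer-product matrices (a sketch under that modeling assumption; the helper wedge2 is ours):

    import numpy as np

    v1, v2 = np.array([1.0, 2.0, 0.0]), np.array([0.5, -1.0, 3.0])
    w1, w2 = np.array([2.0, 0.0, 1.0]), np.array([1.0, 1.0, -1.0])

    def wedge2(a, b):
        # a ^ b = (a (x) b - b (x) a) / 2, with a (x) b modeled as the outer-product matrix
        return (np.outer(a, b) - np.outer(b, a)) / 2

    lhs = np.dot(wedge2(v1, v2).ravel(), wedge2(w1, w2).ravel())   # <v1 ^ v2, w1 ^ w2>
    rhs = (np.dot(v1, w1) * np.dot(v2, w2) - np.dot(v1, w2) * np.dot(v2, w1)) / 2
    print(np.isclose(lhs, rhs))   # True: the d = 2 case of Theorem 1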

Perhaps counterintuitively, the antisymmetric tensor product is more important than the symmetric tensor product in linear algebra. The next theorem explains why.

Theorem 2: For any d \in \mathbb{N} and any \mathbf{v}_1,\dots,\mathbf{v}_d \in \mathbf{V}, the set \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is linearly dependent if and only if

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{0}.

Proof: Suppose first that \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly dependent set of vectors in \mathbf{V}. If d=1, this means that \mathbf{v}_1=\mathbf{0}, whence

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_1 = \mathbf{0}.

If d \geq 2, then without loss of generality, the vector \mathbf{v}_1 is a linear combination of the vectors \mathbf{v}_2,\dots,\mathbf{v}_d,

\mathbf{v}_1 = a_2\mathbf{v}_2 + \dots + a_d\mathbf{v}_d.

We then have that

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \left(\sum\limits_{i=2}^d a_i\mathbf{v}_i \right) \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d = \sum\limits_{i=2}^d a_i\mathbf{v}_i \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d,

by multilinearity of the wedge product. Now observe that the ith term in the sum is a scalar multiple of the wedge product

\mathbf{v}_i \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_i \wedge \dots \wedge \mathbf{v}_d,

which contains the vector \mathbf{v}_i twice, and hence each term in the sum is the zero tensor.

Conversely, suppose \mathbf{v}_1,\dots,\mathbf{v}_d \in \mathbf{V} are vectors such that

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d =\mathbf{0}.

We must prove that \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly dependent set in \mathbf{V}. We will prove the (equivalent) contrapositive statement: if \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly independent set in \mathbf{V}, then

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d \neq \mathbf{0}.

We prove this by induction on d. In the case d=1, we have that \{\mathbf{v}_1\} is linearly independent, so

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_1 \neq \mathbf{0}.

For the inductive step, we proceed as follows. Since \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly independent set, it is a basis of the subspace

\mathbf{W} = \mathrm{Span}\{\mathbf{v}_1,\dots,\mathbf{v}_d\}.

Let \langle \cdot,\cdot \rangle denote the scalar product on \mathbf{W} defined by declaring this basis to be orthonormal. We now define a linear transformation

L \colon \mathbf{W}^{\wedge d} \to \mathbf{W}^{\wedge d-1}

by

L\mathbf{w}_1 \wedge \dots \wedge \mathbf{w}_d = \langle \mathbf{v}_1,\mathbf{w}_1\rangle \mathbf{w}_2 \wedge \dots \wedge \mathbf{w}_d.

We then have that

L\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \langle \mathbf{v}_1,\mathbf{v}_1\rangle \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d.

Now, since \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly independent set, so is the subset \{\mathbf{v}_2,\dots,\mathbf{v}_d\}. Thus, by the induction hypothesis,

\mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d \neq \mathbf{0}.

It then follows that

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d \neq \mathbf{0},

since otherwise the linear transformation L would map the zero vector in \mathbf{W}^{\wedge d} to a nonzero vector in \mathbf{W}^{\wedge d-1}, which is impossible.

— Q.E.D.

Corollary 1: We have \|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d\| \geq 0 with equality if and only if \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is linearly dependent.

Since

\|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d\|^2 = \frac{1}{d!}\sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi) \langle \mathbf{v}_1,\mathbf{v}_{\pi(1)}\rangle \dots \langle \mathbf{v}_d,\mathbf{v}_{\pi(d)}\rangle,

you can think of this as a massive generalization of the Cauchy-Schwarz inequality, which is the case d=2.
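
For concreteness, here is the d=2 case written out (a short worked computation using the formula from Theorem 1):

\|\mathbf{v}_1 \wedge \mathbf{v}_2\|^2 = \frac{1}{2}\left( \langle \mathbf{v}_1,\mathbf{v}_1\rangle\langle \mathbf{v}_2,\mathbf{v}_2\rangle - \langle \mathbf{v}_1,\mathbf{v}_2\rangle\langle \mathbf{v}_2,\mathbf{v}_1\rangle \right) = \frac{1}{2}\left( \|\mathbf{v}_1\|^2\|\mathbf{v}_2\|^2 - \langle \mathbf{v}_1,\mathbf{v}_2\rangle^2 \right),

so Corollary 1 with d=2 says exactly that \langle \mathbf{v}_1,\mathbf{v}_2\rangle^2 \leq \|\mathbf{v}_1\|^2\|\mathbf{v}_2\|^2, with equality if and only if \{\mathbf{v}_1,\mathbf{v}_2\} is linearly dependent, which is precisely the Cauchy-Schwarz inequality.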

Lecture 22 coda

Math 31AH: Lecture 21

Two basic issues in linear algebra which we have not yet resolved are:

  1. Can we multiply vectors?
  2. Can we certify linear independence?

The answers to these questions turn out to be closely related to one another. In this lecture, we discuss the first item.

Let \mathbf{V} be a vector space. We have seen one sort of multiplication of vectors, namely the scalar product \langle \mathbf{v},\mathbf{w} \rangle. On one hand, the scalar product is a proper multiplication rule in the sense that it satisfies the FOIL identity, which is referred to as bilinearity in polite company. On the other hand, the scalar product does not correspond to our usual notion of multiplication in the sense that the product of two vectors is a number, not a vector. This is strange in that one instinctively feels that the “product” of two objects should be another object of the same type. It is natural to ask whether we can define a bilinear “vector product” which has the feature that the product of two vectors in \mathbf{V} is a vector in \mathbf{V}. In other words, we are asking whether it is possible to give some universal recipe for multiplication of vectors which would turn every vector space into an algebra.

So far, we have only seen certain specific vector spaces \mathbf{V} where a bilinear multiplication of vectors naturally presents itself. Here is a list of these spaces.

  1. \mathbf{V} = \mathbb{R}. In this case, vectors \mathbf{v} \in \mathbf{V} are real numbers, and the vector product \mathbf{v}\mathbf{w} is the product of real numbers.
  2. \mathbf{V}=\mathbb{R}^2. Technically, we have not seen this example yet, but here it is. Let \mathbf{v}=(x_1,x_2) and \mathbf{w}=(y_1,y_2) be vectors in \mathbf{V}. We then define their product to be \mathbf{v}\mathbf{w}=(x_1y_1-x_2y_2,x_1y_2+x_2y_1). Next week, we will see that this example of vector multiplication gives the complex number system (see the sketch just after this list).
  3. \mathbf{V}=\mathbb{R}^\infty. In this example, the vector space \mathbf{V} consists of infinite sequences \mathbf{v}=(x_0,x_1,x_2,\dots) which are identically zero after finitely many terms. This means that \mathbf{V} is isomorphic to the vector space of polynomials in a single variable. Let \mathbf{v}=(x_0,x_1,x_2,\dots) and \mathbf{w}=(y_0,y_1,y_2,\dots) be vectors in \mathbf{V}. We define their product to be \mathbf{v}\mathbf{w} = (x_0y_0,x_0y_1+x_1y_0,x_0y_2+x_1y_1+x_2y_0,\dots), which is just the recipe for multiplying polynomials and collecting together terms of the same degree.
  4. \mathbf{V}=\mathbb{R}^{n \times n}. In this example, the vector space \mathbf{V} consists of matrices with n rows and n columns. This means that \mathbf{V} is isomorphic to the vector space of linear operators on an n-dimensional vector space. A vector product in \mathbf{V} is then defined by matrix multiplication.
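
Here is the sketch promised in item 2: the product rule on \mathbb{R}^2 given there agrees with multiplication of complex numbers (a minimal check in Python; the function name vec_mult is ours):

    def vec_mult(v, w):
        # the rule from item 2: (x1, x2)(y1, y2) = (x1*y1 - x2*y2, x1*y2 + x2*y1)
        x1, x2 = v
        y1, y2 = w
        return (x1 * y1 - x2 * y2, x1 * y2 + x2 * y1)

    v, w = (1.0, 2.0), (3.0, -1.0)
    print(vec_mult(v, w))                    # (5.0, 5.0)
    z = complex(*v) * complex(*w)
    print((z.real, z.imag))                  # (5.0, 5.0): the same rule as complex multiplication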

The above examples are quite different from one another, and they do not appear to be given by any universal recipe for defining a product of vectors. It turns out that in order to answer the question of how to define a universal vector product, it is better not to answer it at all. This is the idea behind the tensor product, which we now introduce.

To every pair of vectors \mathbf{v},\mathbf{w} \in \mathbf{V}, we associate a new vector denoted \mathbf{v} \otimes \mathbf{w}, which is called the tensor product of \mathbf{v} and \mathbf{w}. However, the vector \mathbf{v} \otimes \mathbf{w} does not reside in \mathbf{V}; rather, it is a vector in a new vector space called the tensor square of \mathbf{V} and denoted \mathbf{V} \otimes \mathbf{V}. What is happening here is that we view the symbol \otimes as a rule for multiplying two vectors, but we do not specify what this rule is — instead, we view \mathbf{v} \otimes \mathbf{w} as an “unevaluated” product of two vectors. We then store this unevaluated product in a new vector space \mathbf{V} \otimes \mathbf{V}, which contains all unevaluated products of vectors from \mathbf{V}. More precisely, the vectors in \mathbf{V} \otimes \mathbf{V} are all unevaluated expressions of the form

\tau = \mathbf{v}_1 \otimes \mathbf{w}_1 + \dots + \mathbf{v}_k \otimes \mathbf{w}_k,

where k \in \mathbb{N} is a natural number and \mathbf{v}_1,\mathbf{w}_1,\dots,\mathbf{v}_k,\mathbf{w}_k \in \mathbf{V} are vectors. These unevaluated expressions are called tensors, and often denoted by Greek letters. So tensor products are ambiguous, in the sense that we do not specify what the result of the multiplication \mathbf{v} \otimes \mathbf{w} actually is. The only thing we specify about this rule is that it is bilinear:

(a_1\mathbf{v}_1 + a_2\mathbf{v}_2) \otimes (b_1\mathbf{w}_1 + b_2\mathbf{w}_2) \\ = a_1b_1\mathbf{v}_1 \otimes \mathbf{w}_1 + a_1b_2 \mathbf{v}_1 \otimes \mathbf{w}_2 + a_2b_1 \mathbf{v}_2\otimes \mathbf{w}_1  + a_2b_2\mathbf{v}_2\otimes \mathbf{w}_2,

where the equality means that the LHS and the RHS are different expressions for the same vector in the vector space \mathbf{V} \otimes \mathbf{V}.

A tensor in \mathbf{V} \otimes \mathbf{V} which can be represented as the product of two vectors from \mathbf{V} is called a simple tensor. Note that a tensor may be simple without obviously being so, in the event that it can be “factored” as in high school algebra. For example, we have

\mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{v}_2 \otimes \mathbf{w}_1 = (\mathbf{v}_1+\mathbf{v}_2) \otimes \mathbf{w}_1.
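
For readers who like to compute: one concrete model of \mathbf{V} \otimes \mathbf{V} for \mathbf{V}=\mathbb{R}^n takes \mathbf{v} \otimes \mathbf{w} to be the outer-product matrix. This is only a model, not the abstract definition above, but it is bilinear, and the factoring identity can be checked directly (a short sketch; the helper t is ours):

    import numpy as np

    v1, v2, w1 = np.array([1.0, 2.0]), np.array([0.0, -1.0]), np.array([3.0, 4.0])

    def t(v, w):
        # model the unevaluated product v (x) w as the outer-product matrix
        return np.outer(v, w)

    a1, a2 = 2.0, -3.0
    # bilinearity in the first slot
    print(np.allclose(t(a1 * v1 + a2 * v2, w1), a1 * t(v1, w1) + a2 * t(v2, w1)))   # True
    # the factoring identity: v1 (x) w1 + v2 (x) w1 = (v1 + v2) (x) w1
    print(np.allclose(t(v1, w1) + t(v2, w1), t(v1 + v2, w1)))                        # True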

We haven’t yet said how to scale tensors by numbers. The rule for scalar multiplication of tensors is determined by bilinearity: it is defined by

a \mathbf{v} \otimes \mathbf{w} = (a\mathbf{v}) \otimes \mathbf{w} = \mathbf{v} \otimes (a\mathbf{w}),

and

a \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i = \sum_{i=1}^k a\mathbf{v}_i \otimes \mathbf{w}_i.

We can summarize all of the above by saying that two tensors \tau,\sigma \in \mathbf{V} \otimes \mathbf{V} are equal if and only if it is possible to rewrite \tau as \sigma using bilinearity.

Tensor products take a while to get used to. It’s important to remember that the only specified property of the tensor product is bilinearity; apart from this, it’s entirely ambiguous. So, anything we can say about tensor products must ultimately be a consequence of bilinearity. Here is an example.

Proposition 1: For any \mathbf{v} \in \mathbf{V}, we have

\mathbf{v} \otimes \mathbf{0}_\mathbf{V} = \mathbf{0}_\mathbf{V} \otimes \mathbf{v} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V}.

Proof: We are going to use the fact that scaling any vector \mathbf{v} \in \mathbf{V} by the number 0 \in \mathbb{R} produces the zero vector \mathbf{0}_\mathbf{V} \in \mathbf{V}. This was proved in Lecture 1, when we discussed the definition of a vector space. We have

\mathbf{v} \otimes \mathbf{0}_\mathbf{V} = \mathbf{v} \otimes (0\mathbf{0}_\mathbf{V}) = (0\mathbf{v}) \otimes \mathbf{0}_\mathbf{V} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V}.

Notice that bilinearity was used here to move the scalar zero from the second factor in the tensor product to the first factor in the tensor product. The proof that \mathbf{0}_\mathbf{V} \otimes \mathbf{v} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V} is essentially the same (try it!).

— Q.E.D.

Using Proposition 1, we can explicitly identify the “zero tensor,” i.e. the zero vector \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}} in the vector space \mathbf{V} \otimes \mathbf{V}.

Proposition 2: We have \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}}=\mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}}.

Proof: Let

\tau = \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i

be any tensor. We want to prove that \tau+\mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V} = \tau.

In the case k=1, we have \tau = \mathbf{v}_1 \otimes \mathbf{w}_1. Using bilinearity, we have

\mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{w}_1 = (\mathbf{v}_1+\mathbf{0}_\mathbf{V}) \otimes \mathbf{w}_1 = \mathbf{v}_1 \otimes \mathbf{w}_1,

where we used Proposition 1 and bilinearity.

The case k>1 now follows from the case k=1,

\tau + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \sum_{i=1}^{k-1} \mathbf{v}_i \otimes \mathbf{w}_i + \left(\mathbf{v}_k \otimes \mathbf{w}_k + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}}\right) = \sum_{i=1}^{k-1} \mathbf{v}_i \otimes \mathbf{w}_i + \mathbf{v}_k \otimes \mathbf{w}_k = \tau.

— Q.E.D.

Suppose now that \mathbf{V} is a Euclidean space, i.e. it comes with a scalar product \langle \cdot,\cdot \rangle. Then, there is an associated scalar product on the vector space \mathbf{V} \otimes \mathbf{V}, which by abuse of notation we also write as \langle \cdot,\cdot \rangle. This natural scalar product on \mathbf{V} \otimes \mathbf{V} is uniquely determined by the requirement that

\langle \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \rangle = \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle, \quad \forall \mathbf{v}_1,\mathbf{v}_2,\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V}.

Exercise 1: Verify that the scalar product on \mathbf{V} \otimes \mathbf{V} just defined really does satisfy the scalar product axioms.
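
As a quick numerical sanity check (not a solution to the exercise), in the outer-product model of \mathbf{V} \otimes \mathbf{V} for \mathbf{V}=\mathbb{R}^n the scalar product above becomes the entrywise dot product of matrices, and the defining identity holds:

    import numpy as np

    v1, w1 = np.array([1.0, 2.0, -1.0]), np.array([0.0, 1.0, 3.0])
    v2, w2 = np.array([2.0, 0.5, 1.0]),  np.array([1.0, -1.0, 2.0])

    lhs = np.dot(np.outer(v1, w1).ravel(), np.outer(v2, w2).ravel())   # <v1 (x) w1, v2 (x) w2>
    rhs = np.dot(v1, v2) * np.dot(w1, w2)                              # <v1, v2><w1, w2>
    print(np.isclose(lhs, rhs))   # True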

Proposition 3: If S is an orthogonal set of vectors in \mathbf{V}, then

S \otimes S = \{\mathbf{v} \otimes \mathbf{w} \colon \mathbf{v},\mathbf{w} \in S\}

is an orthogonal set of tensors in \mathbf{V} \otimes \mathbf{V}.

Proof: We must show that if \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \in S \otimes S are different tensors, then their scalar product is zero. We have

\langle \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \rangle = \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle.

The assumption that these tensors are different is equivalent to saying that one of the following conditions holds:

\mathbf{v}_1 \neq \mathbf{v}_2 \text{ or } \mathbf{w}_1 \neq \mathbf{w}_2.

Since S is an orthogonal set, the first possibility implies \langle \mathbf{v}_1,\mathbf{v}_2 \rangle =0, and the second implies \langle \mathbf{w}_1,\mathbf{w}_2 \rangle = 0. In either case, the product \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle is equal to zero.

— Q.E.D.

Theorem 1: If E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} is an orthonormal basis in \mathbf{V}, then E \otimes E = \{\mathbf{e}_i \otimes \mathbf{e}_j \colon 1 \leq i,j \leq n\} is an orthonormal basis in \mathbf{V} \otimes \mathbf{V}.

Proof: Let us first show that E \otimes E spans \mathbf{V} \otimes \mathbf{V}. We have

\tau = \sum\limits_{k=1}^l a_k \mathbf{v}_k \otimes \mathbf{w}_k = \sum\limits_{k=1}^l a_k \left(\sum\limits_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}_k\rangle\mathbf{e}_i \right) \otimes \left(\sum\limits_{j=1}^n \langle \mathbf{e}_j,\mathbf{w}_k\rangle\mathbf{e}_j \right) \\= \sum\limits_{i,j=1}^n \left(\sum\limits_{k=1}^l a_k \langle \mathbf{e}_i,\mathbf{v}_k\rangle\langle \mathbf{e}_j,\mathbf{w}_k\rangle\right)\mathbf{e}_i \otimes \mathbf{e}_j,

which shows that an arbitrary tensor is a linear combination of the tensors \mathbf{e}_i \otimes \mathbf{e}_j.

Since E is an orthogonal set in \mathbf{V}, by Proposition 3 we have that E \otimes E is an orthogonal set in \mathbf{V} \otimes \mathbf{V}, and therefore it is linearly independent.

It remains only to show that all tensors in E \otimes E have unit length. This is established by direct computation:

\|\mathbf{e}_i \otimes \mathbf{e}_j \|^2 = \langle \mathbf{e}_i\otimes \mathbf{e}_j,\mathbf{e}_i \otimes \mathbf{e}_j \rangle = \langle \mathbf{e}_i,\mathbf{e}_i \rangle\langle \mathbf{e}_j,\mathbf{e}_j \rangle= 1.

— Q.E.D.

Corollary 1: If \dim \mathbf{V} = n, then \dim \mathbf{V} \otimes \mathbf{V} = n^2.

It is important to note that the tensor product is noncommutative: it is typically not the case that \mathbf{v} \otimes \mathbf{w} = \mathbf{w} \otimes \mathbf{v}. However, we can decompose a simple tensor into two pieces, as

\mathbf{v} \otimes \mathbf{w} = \frac{\mathbf{v} \otimes \mathbf{w} + \mathbf{w} \otimes \mathbf{v}}{2} + \frac{\mathbf{v} \otimes \mathbf{w} - \mathbf{w} \otimes \mathbf{v}}{2}.

The first of these fractions is called the “symmetric part” of \mathbf{v} \otimes \mathbf{w}, and is denoted

\mathbf{v} \vee \mathbf{w} := \frac{\mathbf{v} \otimes \mathbf{w} + \mathbf{w} \otimes \mathbf{v}}{2}.

The reason for this notation is that we can think of \vee as a symmetric version of the tensor product: a bilinear multiplication of vectors that, by construction, is commutative:

\mathbf{v} \vee \mathbf{w} = \mathbf{w} \vee \mathbf{v}.

Note that if \mathbf{v}=\mathbf{w}, the symmetric tensor product produces the same tensor as the tensor product itself:

\mathbf{v} \vee \mathbf{v} = \mathbf{v} \otimes \mathbf{v}.

The second fraction above is called the “antisymmetric part” of \mathbf{v} \otimes \mathbf{w}, and denoted

\mathbf{v} \wedge \mathbf{w} := \frac{\mathbf{v} \otimes \mathbf{w} - \mathbf{w} \otimes \mathbf{v}}{2}.

This is an antisymmetric version of the tensor product in that, by construction, it satisfies

\mathbf{v} \wedge \mathbf{w} = -\mathbf{w} \wedge \mathbf{v}.

Note that the antisymmetric tensor product of any vector with itself produces the zero tensor:

\mathbf{v} \wedge \mathbf{v} = \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}}.
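
In the outer-product model of \mathbf{V} \otimes \mathbf{V}, the symmetric and antisymmetric parts of \mathbf{v} \otimes \mathbf{w} are just the symmetric and antisymmetric parts of the corresponding matrix, so the identities above can be checked in a few lines (a sketch under that modeling assumption):

    import numpy as np

    v, w = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    M = np.outer(v, w)                   # model of v (x) w; note np.outer(w, v) = M.T
    sym_part  = (M + M.T) / 2            # model of v \/ w
    anti_part = (M - M.T) / 2            # model of v ^ w

    print(np.allclose(M, sym_part + anti_part))   # True: the decomposition above
    print(np.allclose(anti_part, -anti_part.T))   # True: v ^ w = -(w ^ v) in this model
    N = np.outer(v, v)
    print(np.allclose((N - N.T) / 2, 0))          # True: v ^ v is the zero tensor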

Although it may seem like the symmetric tensor product is more natural (commutative products are nice), it turns out that the antisymmetric tensor product — or wedge product as it’s often called — is more important. Here is a first indication of this. Suppose that \mathbf{V} is a 2-dimensional Euclidean space with orthonormal basis \{\mathbf{e}_1,\mathbf{e}_2\}. Let

\mathbf{v}_1 = a_{11}\mathbf{e}_1 + a_{12}\mathbf{e}_2 \quad\text{ and }\quad \mathbf{v}_2 = a_{21}\mathbf{e}_1 + a_{22}\mathbf{e}_2

be two vectors in \mathbf{V}. Let’s compute their wedge product: using FOIL, we find

\mathbf{v}_1 \wedge \mathbf{v}_2 \\ = (a_{11}\mathbf{e}_1 + a_{12}\mathbf{e}_2) \wedge (a_{21}\mathbf{e}_1 + a_{22}\mathbf{e}_2) \\ = (a_{11}\mathbf{e}_1) \wedge (a_{21}\mathbf{e}_1) + (a_{11}\mathbf{e}_1) \wedge (a_{22}\mathbf{e}_2) + (a_{12}\mathbf{e}_2)\wedge (a_{21}\mathbf{e}_1) + (a_{12}\mathbf{e}_2) \wedge (a_{22}\mathbf{e}_2) \\ = a_{11}a_{21} \mathbf{e}_1 \wedge \mathbf{e}_1 + a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 + a_{12}a_{21} \mathbf{e}_2 \wedge \mathbf{e}_1 + a_{12}a_{22}\mathbf{e}_2 \wedge \mathbf{e}_2 \\ = a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 + a_{12}a_{21} \mathbf{e}_2 \wedge \mathbf{e}_1  \\ = a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 - a_{12}a_{21} \mathbf{e}_1 \wedge \mathbf{e}_2 \\ = (a_{11}a_{22} - a_{12}a_{21}) \mathbf{e}_1 \wedge \mathbf{e}_2.

Probably, you recognize the lone scalar (a_{11}a_{22} - a_{12}a_{21}) remaining at the end of this computation as a determinant:

\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22}-a_{21}a_{12}.

Even if you don’t, no need to worry: you are not expected to know what a determinant is at this point. Indeed, in Lecture 22 we are going to use the wedge product to define determinants.
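
The computation above is easy to replay numerically in the outer-product model: the wedge of \mathbf{v}_1 and \mathbf{v}_2 is the determinant times the model of \mathbf{e}_1 \wedge \mathbf{e}_2 (a small sketch, with the coefficients a_{ij} chosen arbitrarily):

    import numpy as np

    a11, a12, a21, a22 = 2.0, -1.0, 0.5, 3.0
    e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    v1, v2 = a11 * e1 + a12 * e2, a21 * e1 + a22 * e2

    def wedge(x, y):
        # x ^ y = (x (x) y - y (x) x) / 2 in the outer-product model
        return (np.outer(x, y) - np.outer(y, x)) / 2

    det = a11 * a22 - a12 * a21
    print(np.allclose(wedge(v1, v2), det * wedge(e1, e2)))                     # True
    print(np.isclose(det, np.linalg.det(np.array([[a11, a12], [a21, a22]]))))  # True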

Lecture 21 coda

Math 31AH: Lecture 19

Let \mathbf{V} be an n-dimensional vector space equipped with a scalar product \langle \cdot,\cdot \rangle. Recall from Lecture 16 that an operator A \in \mathrm{End}\mathbf{V} is said to be selfadjoint (or symmetric) if

\langle \mathbf{v},A\mathbf{w} \rangle = \langle A\mathbf{v},\mathbf{w}\rangle \quad \forall\ \mathbf{v},\mathbf{w} \in \mathbf{V}.

Also recall from Lecture 18 that A \in \mathrm{End}\mathbf{V} is said to be semisimple if there exists a basis of \mathbf{V} consisting of eigenvectors of A. The goal of this lecture is to prove the following cornerstone result in linear algebra.

Theorem 1 (Spectral Theorem for selfadjoint operators): If A \in \mathrm{End}\mathbf{V} is selfadjoint, then it is semisimple.

The proof of this important theorem occupies the remainder of this lecture. It is a constructive argument that builds an eigenbasis for A one vector at a time. A nice feature of the construction is that the eigenbasis it outputs is an orthonormal basis of \mathbf{V}.

Let us begin with an important observation on a special subspecies of selfadjoint operators.

Definition 1: A selfadjoint operator B \in \mathrm{End}\mathbf{V} is said to be nonnegative if the associated quadratic form is nonnegative, i.e. if the function defined by

Q_B(\mathbf{v}) := \langle \mathbf{v},B \mathbf{v} \rangle, \quad \mathbf{v} \in \mathbf{V},

satisfies Q_B(\mathbf{v}) \geq 0 for all \mathbf{v} \in \mathbf{V}.

Any nonnegative selfadjoint operator B has the property that membership in its kernel is certified by vanishing of Q_B.

Lemma 1: If B \in \mathrm{End}\mathbf{V} is a nonnegative selfadjoint operator, then \mathbf{v} \in \mathrm{Ker} B if and only if Q_B(\mathbf{v})=0.

Proof: One direction of this equivalence is obvious: if \mathbf{v} \in \mathrm{Ker}B, then

Q_B(\mathbf{v}) = \langle \mathbf{v},B\mathbf{v}\rangle = \langle \mathbf{v},\mathbf{0} \rangle = 0.

The proof of the converse statement is similar to the proof of the Cauchy-Schwarz inequality. More precisely, suppose that Q_B(\mathbf{v})=0, and let t \in \mathbb{R} be any number and let \mathbf{w} \in \mathbf{V} be an arbitrary vector. We have

Q_B(\mathbf{v}+t\mathbf{w}) = \langle \mathbf{v}+t\mathbf{w},B\mathbf{v}+tB\mathbf{w}\rangle \\= \langle \mathbf{v},B\mathbf{v} \rangle + \langle \mathbf{v},tB\mathbf{w} \rangle + \langle t\mathbf{w},B\mathbf{v} \rangle + \langle t\mathbf{w},tB\mathbf{w} \rangle.

Using the definition of Q_B together with the fact that B is selfadjoint, this simplifies to

Q_B(\mathbf{v}+t\mathbf{w}) = Q_B(\mathbf{v}) + 2t\langle B\mathbf{v},\mathbf{w} \rangle + t^2Q_B(\mathbf{w}),

and since Q_B(\mathbf{v})=0 this further simplifies to

Q_B(\mathbf{v}+t\mathbf{w}) = 2t\langle B\mathbf{v},\mathbf{w} \rangle + t^2Q_B(\mathbf{w}).

Now, as a function of t \in \mathbb{R} the righthand side of this equation is a parabola, and since Q_B(\mathbf{w}) \geq 0 this parabola is upward-opening. Moreover, since the lefthand side satisfies Q_B(\mathbf{v}+t\mathbf{w}) \geq 0, the lowest point of this parabola cannot lie below the horizontal axis, and this forces

\langle B\mathbf{v},\mathbf{w} \rangle = 0.

But the vector \mathbf{w} was chosen arbitrarily, so the above equation holds for any \mathbf{w} \in \mathbf{V}, in particular \mathbf{w}=B\mathbf{v}. We thus have

\langle B\mathbf{v},B\mathbf{v}\rangle = \|B\mathbf{v}\|^2=0,

which means that B\mathbf{v}=\mathbf{0}, i.e. \mathbf{v} \in \mathrm{Ker}B.

— Q.E.D.

Now, let A \in \mathrm{End}\mathbf{V} be any selfadjoint operator. We are going to use the Lemma just established to prove that A admits an eigenvector \mathbf{e}; the argument even gives a description of the corresponding eigenvalue \lambda.

Consider the unit sphere in the Euclidean space \mathbf{V}, i.e. the set

S(\mathbf{V}) = \{ \mathbf{v} \in \mathbf{V} \colon \|\mathbf{v}\|=1\}

of all vectors of length 1. The quadratic form Q_A(\mathbf{v}) = \langle \mathbf{v},A\mathbf{v}\rangle is a continuous function and the unit sphere is a closed and bounded set, and hence by the Extreme Value Theorem the minimum value of Q_A on the sphere,

\lambda = \min\limits_{\mathbf{v} \in S(\mathbf{V})} Q_A(\mathbf{v}),

does indeed exist, and is moreover achieved at some vector \mathbf{e} \in S(\mathbf{V}), i.e.

Q_A(\mathbf{e})=\lambda.

Theorem 2: The minimum \lambda of Q_A on the unit sphere is an eigenvalue of A, and the minimizer \mathbf{e} lies in the eigenspace \mathbf{V}_\lambda.

Proof: By definition of \lambda as the minimum value of Q_A, we have that

\langle \mathbf{v},A\mathbf{v} \rangle \geq \lambda \quad \forall \mathbf{v} \in S(\mathbf{V}).

Since \langle \mathbf{v},\mathbf{v} \rangle =1 for any \mathbf{v} \in S(\mathbf{V}), the above inequality can be rewritten as

\langle \mathbf{v},A\mathbf{v} \rangle \geq \lambda\langle \mathbf{v},\mathbf{v} \rangle \quad \forall \mathbf{v} \in S(\mathbf{V}).

But actually, this implies that

\langle \mathbf{v},A\mathbf{v} \rangle \geq \lambda\langle \mathbf{v},\mathbf{v} \rangle \quad \forall \mathbf{v} \in \mathbf{V},

since every vector in \mathbf{V} is a nonnegative scalar multiple of a vector of unit length (make sure you understand this). We thus have that

\langle \mathbf{v},(A-\lambda I)\mathbf{v} \rangle \geq 0 \quad \forall\ \mathbf{v} \in \mathbf{V}.

This says that the selfadjoint operator B:= A-\lambda I is nonnegative. Moreover, we have that

Q_B(\mathbf{e}) = \langle \mathbf{e},(A-\lambda I)\mathbf{e} \rangle = Q_A(\mathbf{e})-\lambda \langle \mathbf{e},\mathbf{e}\rangle = \lambda - \lambda = 0.

Thus, by Lemma 1, we have that \mathbf{e} \in \mathrm{Ker}(A-\lambda I), meaning that

(A-\lambda I)\mathbf{e} = \mathbf{0},

or equivalently

A\mathbf{e} = \lambda \mathbf{e}.

— Q.E.D.
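
The variational description of \lambda in Theorem 2 is easy to test numerically: for a symmetric matrix, the minimum of the quadratic form over the unit sphere should match the smallest eigenvalue reported by a numerical eigensolver. A rough Python sketch (random sampling only approaches the true minimum from above, so the agreement is approximate; the specific matrix is random):

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4))
    A = (M + M.T) / 2                        # a selfadjoint (symmetric) operator on R^4

    # sample many unit vectors and evaluate the quadratic form Q_A(v) = <v, Av>
    V = rng.standard_normal((100000, 4))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    q_min = np.min(np.einsum('ij,jk,ik->i', V, A, V))

    lam_min = np.linalg.eigvalsh(A)[0]       # smallest eigenvalue of A (eigvalsh sorts ascending)
    print(q_min, lam_min)                    # q_min is close to, and never below, lam_min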

Theorem 2 has established that an arbitrary selfadjoint operator A has an eigenvector. However, this seems to be a long way from Theorem 1, which makes the much stronger assertion that A has n linearly independent eigenvectors. In fact, the distance from Theorem 2 to Theorem 1 is not so long as it may seem. To see why, we need to introduce one more very important concept.

Definition 2: Let T\in \mathrm{End}\mathbf{V} be a linear operator, and let \mathbf{W} be a subspace of \mathbf{V}. We say that \mathbf{W} is invariant under T if

T\mathbf{w} \in \mathbf{W} \quad \forall\ \mathbf{w} \in \mathbf{W}.

The meaning of this definition is that if \mathbf{W} is invariant under T, then T may be considered as a linear operator on the smaller space \mathbf{W}, i.e. as an element of the algebra \mathrm{End}\mathbf{W}.

Let us adorn the eigenvalue/eigenvector pair produced by Theorem 2 with a subscript, writing this pair as (\mathbf{e}_1,\lambda_1). Consider the orthogonal complement of the line spanned by \mathbf{e}_1, i.e. the subspace of \mathbf{V} given by

\mathbf{V}_2 = \{ \mathbf{v} \in \mathbf{V} \colon \langle \mathbf{v},\mathbf{e}_1 \rangle = 0\}.

Proposition 1: The subspace \mathbf{V}_2 is invariant under A.

Proof: We have to prove that if \mathbf{v} is orthogonal to the eigenvector \mathbf{e}_1 of A, then so is A\mathbf{v}. This follows easily from the fact that A is selfadjoint:

\langle A\mathbf{v},\mathbf{e}_1 \rangle = \langle \mathbf{v},A\mathbf{e}_1 \rangle = \langle \mathbf{v},\lambda_1\mathbf{e}_1 \rangle = \lambda_1 \langle \mathbf{v},\mathbf{e}_1 \rangle=0.

— Q.E.D.

The effect of Proposition 1 is that we may consider A as a selfadjoint operator defined on the (n-1)-dimensional subspace \mathbf{V}_2. But this means that we can simply apply Theorem 2 again, with \mathbf{V}_2 replacing \mathbf{V}. We will then get a new eigenvector/eigenvalue pair (\mathbf{e}_2,\lambda_2), where

\lambda_2 = \min\limits_{\mathbf{v} \in S(\mathbf{V}_2)} Q_A(\mathbf{v})

is the minimum value of Q_A on the unit sphere in the Euclidean space \mathbf{V}_2, and \mathbf{e}_2 \in S(\mathbf{V}_2) is a vector at which the minimum is achieved,

Q_A(\mathbf{e}_2) = \lambda_2.

By construction, \mathbf{e}_2 is a unit vector orthogonal to \mathbf{e}_1, so that in particular \{\mathbf{e}_1,\mathbf{e}_2\} is a linearly independent set in \mathbf{V}. Moreover, we have that \lambda_1 \leq \lambda_2, since S(\mathbf{V}_2) is a subset of S(\mathbf{V}).

Lecture 19 coda

Math 31AH: Lecture 18

Let \mathbf{V} be a vector space, and let us consider the algebra \mathrm{End}\mathbf{V} as a kind of ecosystem consisting of various life forms of varying complexity. We now move on to the portion of the course which is concerned with the taxonomy of linear operators — their classification and division into various particular classes.

The simplest organisms in the ecosystem \mathrm{End}\mathbf{V} are operators which act by scaling every vector \mathbf{v} \in \mathbf{V} by a fixed number \lambda \in \mathbb{R}; these are the single-celled organisms of the operator ecosystem.

Definition 1: An operator A \in \mathrm{End}\mathbf{V} is said to be simple if there exists a scalar \lambda \in \mathbb{R} such that

A\mathbf{v}=\lambda \mathbf{v} \quad \forall\ \mathbf{v} \in \mathbf{V}.

Simple operators really are very simple, in the sense that they are no more complicated than numbers. Indeed, Definition 1 is equivalent to saying that A=\lambda I, where I \in \mathrm{End}\mathbf{V} is the identity operator, which plays the role of the number 1 in the algebra \mathrm{End}\mathbf{V}, meaning that it is the multiplicative identity in this algebra. Simple operators are extremely easy to manipulate algebraically: if A=\lambda I, then we have

A^k = \underbrace{(\lambda I)(\lambda I) \dots (\lambda I)}_{k \text{ factors }} =\lambda^kI,

for any nonnegative integer k, and more generally if p(x) is any polynomial in a single variable then we have

p(A) = p(\lambda)I.

Exercise 1: Prove the above formula.
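
Not a proof (that is Exercise 1), but here is a quick numerical sanity check of the formula p(A)=p(\lambda)I, evaluating p(A) by Horner's scheme (the helper poly_of_matrix and the particular \lambda and p are ours):

    import numpy as np

    lam, n = 2.5, 3
    A = lam * np.eye(n)                      # the simple operator A = lambda * I
    coeffs = [1.0, -3.0, 2.0]                # p(x) = x^2 - 3x + 2, highest degree first

    def poly_of_matrix(coeffs, M):
        # Horner's scheme: p(M) = (...((c0*M + c1*I)*M + c2*I)...)
        result = np.zeros_like(M)
        for c in coeffs:
            result = result @ M + c * np.eye(M.shape[0])
        return result

    print(np.allclose(poly_of_matrix(coeffs, A), np.polyval(coeffs, lam) * np.eye(n)))   # True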

The formula A^k=\lambda^kI even works in the case that k is a negative integer, provided that \lambda \neq 0; equivalently, the simple operator A=\lambda I is invertible if and only if \lambda \neq 0, its inverse being A^{-1} = \lambda^{-1}I. If A =\lambda I and B = \mu I are simple operators, then they commute,

AB = (\lambda I)(\mu I)=(\lambda\mu)I = (\mu I)(\lambda I) = BA,

just like ordinary numbers, and more generally

p(A,B) = p(\lambda,\mu)I

for any polynomial p(x,y) in two variables.

Exercise 2: Prove the above formula.

Another way to appreciate how truly simple simple operators are is to look at their matrices. In order to do this, we have to restrict to the case that the vector space \mathbf{V} is finite-dimensional. If \mathbf{V} is n-dimensional, and E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} is any basis of \mathbf{V}, then the matrix of A=\lambda I relative to E is simply

[A]_E = \begin{bmatrix} \lambda & {} & {} \\ {} & \ddots & {} \\ {} & {} & \lambda \end{bmatrix},

where the off-diagonal matrix elements are all equal to zero. For this reason, simple operators are often called diagonal operators.

Most operators in \mathrm{End}\mathbf{V} are not simple operators — they are complicated multicellular organisms. So, to understand them we have to dissect them and look at their organs one at a time. Mathematically, this means that, given an operator A \in \mathrm{End}\mathbf{V}, we look for special vectors in \mathbf{V} on which A acts as if it was simple.

Definition 2: A nonzero vector \mathbf{e} \in \mathbf{V} is said to be an eigenvector of an operator A \in \mathrm{End} \mathbf{V} if

A\mathbf{e} = \lambda \mathbf{e}

for some \lambda \in \mathbb{R}. The scalar \lambda is said to be an eigenvalue of A.

The best case scenario is that we can find a basis of \mathbf{V} entirely made up of eigenvectors of A.

Definition 3: An operator A \in \mathrm{End} \mathbf{V} is said to be semisimple if there exists a basis E of \mathbf{V} consisting of eigenvectors of A. Such a basis is called an eigenbasis for A.

As the name suggests, semisimple operators are pretty simple, but not quite as simple as simple operators. In particular, every simple operator is semisimple, because if A is simple then every nonzero vector in \mathbf{V} is an eigenvector of A, and hence any basis in \mathbf{V} is an eigenbasis for A. The converse, however, is not true.

Let \mathbf{V} be an n-dimensional vector space, and let A \in \mathrm{End} \mathbf{V} be a semisimple operator. By definition, this means that there exists a basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} in \mathbf{V} consisting of eigenvectors of A. This in turn means that there exist numbers \lambda_1,\dots,\lambda_n \in \mathbb{R} such that

A\mathbf{e}_i = \lambda_i \mathbf{e}_i \quad \forall\ 1 \leq i \leq n.

If \lambda_1=\dots=\lambda_n, then A is simple, but if these numbers are not all the same then it is not. However, even if all these numbers are different, the matrix of A relative to E will still be a diagonal matrix, i.e. it will have the form

[A]_E = \begin{bmatrix} \lambda_1 & {} & {} \\ {} & \ddots & {} \\ {} & {} & \lambda_n \end{bmatrix}.

For this reason, semisimple operators are often called diagonalizable operators. Note the shift in terminology from “diagonal,” for simple, to “diagonalizable,” for semisimple. The former term suggests an immutable characteristic, independent of basis, whereas the latter indicates that some action must be taken, in that a special basis must be found to reveal the diagonal form. More precisely, the matrix of a semisimple operator A is not diagonal with respect to an arbitrary basis; the definition only says that the matrix of A is diagonal relative to some basis.

Most linear operators are not semisimple — indeed, there are plenty of operators that have no eigenvectors at all. Consider the operator

R_\theta \colon \mathbb{R}^2 \to \mathbb{R}^2

which rotates a vector \mathbf{v} \in \mathbb{R}^2 counterclockwise through the angle \theta \in [0,2\pi). The matrix of this operator relative to the standard basis

\mathbf{e}_1 = (1,0),\ \mathbf{e}_2 = (0,1)

of \mathbb{R}^2 is

\begin{bmatrix} \cos \theta & - \sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}.

If \theta = 0, then R_\theta = I, so that R_\theta is a simple operator: \mathbf{e}_1,\mathbf{e}_2 are eigenvectors, with eigenvalues \lambda_1=\lambda_2=1. If \theta = \pi, then R_\theta=-I and again R_\theta is simple, with the same eigenvectors and eigenvalues \lambda_1=\lambda_2=-1. However, for any other value of \theta, for example \theta = \frac{\pi}{2}, rotation through a right angle, it is geometrically clear that R_\theta \mathbf{v} is never a scalar multiple of \mathbf{v}, so that R_\theta has no eigenvectors at all. In particular, it is not semisimple.
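
A numerical eigensolver tells the same story: for \theta = \frac{\pi}{2} the rotation matrix has no real eigenvalues (its eigenvalues are approximately \pm i, foreshadowing the role complex numbers play later in the course). A minimal check:

    import numpy as np

    theta = np.pi / 2
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    eigenvalues, _ = np.linalg.eig(R)
    print(eigenvalues)                                 # approximately [0.+1.j, 0.-1.j]
    print(np.all(np.abs(eigenvalues.imag) > 1e-12))    # True: no real eigenvalues, hence no eigenvectors in R^2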

Let us now formulate necessary and sufficient conditions for an operator to be semisimple. In this endeavor it is psychologically helpful to reorganize the eigenvector/eigenvalue definition by thinking of eigenvalues as the primary objects, and eigenvectors as secondary objects associated to them.

Definition 4: The spectrum of an operator A \in \mathrm{End}\mathbf{V} is the set \sigma(A) \subseteq \mathbb{R} defined by

\sigma(A) = \{ \lambda \in \mathbb{R} \colon \lambda \text{ is an eigenvalue of } A\}.

For each \lambda \in \sigma(A), the set \mathbf{V}_\lambda \subseteq \mathbf{V} defined by

\mathbf{V}_\lambda = \{\mathbf{v} \in \mathbf{V} \colon A\mathbf{v} = \lambda \mathbf{v}\}

is called the \lambda-eigenspace of A. The dimension of \mathbf{V}_\lambda is called the geometric multiplicity of \lambda.

In these terms, saying that A \in \mathrm{End}\mathbf{V} is a simple operator means that the spectrum of A consists of a single number,

\sigma(A) = \{\lambda\},

and that the corresponding eigenspace exhausts \mathbf{V},

\mathbf{V}_\lambda = \mathbf{V}.

At the other extreme, the rotation operator R_{\pi/2} considered above has empty spectrum,

\sigma(R_{\pi/2}) = \{\},

and thus does not have any eigenspaces.

Proposition 1: For any A \in \mathrm{End}\mathbf{V}, for each \lambda \in \sigma(A) the eigenspace \mathbf{V}_\lambda is a subspace of \mathbf{V}.

Proof: First, observe that \mathbf{0} \in \mathbf{V}_\lambda, because

A\mathbf{0} = \mathbf{0} = \lambda \mathbf{0}.

Second, \mathbf{V}_\lambda is closed under scalar multiplication: if \mathbf{v} \in \mathbf{V}_\lambda, then

A(t\mathbf{v}) = tA\mathbf{v} = t\lambda\mathbf{v}=\lambda(t\mathbf{v}).

Third, \mathbf{V}_\lambda is closed under vector addition: if \mathbf{v},\mathbf{w} \in \mathbf{V}_\lambda, then

A(\mathbf{v}+\mathbf{w}) = A\mathbf{v}+A\mathbf{w}=\lambda\mathbf{v}+\lambda\mathbf{w}=\lambda(\mathbf{v}+\mathbf{w}).

— Q.E.D.

So, the eigenspaces of an operator A \in \mathrm{End}\mathbf{V} constitute a collection of subspaces \mathbf{V}_\lambda of \mathbf{V} indexed by the numbers \lambda \in \sigma(A). A key feature of these subspaces is that they are independent of one another.

Theorem 1: Suppose that \lambda_1,\dots,\lambda_k are distinct eigenvalues of an operator A \in \mathrm{End}\mathbf{V}. Let \mathbf{e}_1,\dots,\mathbf{e}_k be nonzero vectors such that \mathbf{e}_i \in \mathbf{V}_{\lambda_i} for each 1 \leq  i\leq k. Then \{\mathbf{e}_1,\dots,\mathbf{e}_k\} is a linearly independent set.

Proof: We prove this by induction on k. The base case is k=1, and in this case the assertion is simply that the set \{\mathbf{e}_1\} consisting of a single eigenvector of A is linearly independent. This is true, since eigenvectors are nonzero by definition.

For the induction step, suppose that \{\mathbf{e}_1,\dots,\mathbf{e}_k\} is a linearly dependent set. Then, there exist numbers t_1,\dots,t_k \in \mathbb{R}, not all equal to zero, such that

\sum_{i=1}^k t_i\mathbf{e}_i = \mathbf{0}.

Let us suppose, without loss of generality, that t_1 \neq 0. Applying the operator A to both sides of the above vector equation, we get

\sum_{i=1}^k t_i\lambda_i\mathbf{e}_i = \mathbf{0}.

On the other hand, we can multiply the original vector equation by any scalar and it remains true; in particular, we have

\sum_{i=1}^k t_i\lambda_k\mathbf{e}_i = \mathbf{0}.

Now, subtracting this third equation from the second equation, we obtain

\sum_{i=1}^{k-1} t_i(\lambda_i-\lambda_k)\mathbf{e}_i = \mathbf{0}.

By the induction hypothesis, \{\mathbf{e}_1,\dots,\mathbf{e}_{k-1}\} is a linearly independent set, and hence all the coefficients in this vector equation are zero. In particular, we have

t_1(\lambda_1-\lambda_k) = 0.

But this is impossible, since t_1 \neq 0 and \lambda_1 \neq \lambda_k. Hence, the set \{\mathbf{e}_1,\dots,\mathbf{e}_k\} cannot be linearly dependent — it must be linearly independent.

— Q.E.D.

Restricting to the case that \mathbf{V} is finite-dimensional, \dim \mathbf{V}=n, Theorem 1 has the following crucial consequences.

Corollary 1: A \in \mathrm{End}\mathbf{V} is semisimple if and only if

\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V}.

Proof: Suppose first that A is semisimple. By definition, this means that the span of the eigenspaces of A is all of \mathbf{V},

\mathrm{Span} \bigcup\limits_{\lambda \in \sigma(A)} \mathbf{V}_\lambda = \mathbf{V}.

Thus

\dim \mathrm{Span} \bigcup\limits_{\lambda \in \sigma(A)} \mathbf{V}_\lambda = \dim \mathbf{V}.

By Theorem 1, we have

\dim \mathrm{Span} \bigcup\limits_{\lambda \in \sigma(A)} \mathbf{V}_\lambda =\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda,

and hence

\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V}.

Conversely, suppose that the sum of the dimensions of the eigenspaces of A is equal to the dimension of \mathbf{V}. For each \lambda \in \sigma(A), let E_\lambda be a basis of the eigenspace \mathbf{V}_\lambda. Then, by Theorem 1, the set

E = \bigcup\limits_{\lambda \in \sigma(A)} E_\lambda

is a linearly independent set, and hence a basis of the subspace \mathrm{Span}(E) of \mathbf{V}. Thus

\dim \mathrm{Span}(E) = \sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda.

Since by hypothesis we have

\sum\limits_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V},

this implies that

\dim \mathrm{Span}(E) =\dim \mathbf{V},

which in turn implies that

\mathrm{Span}(E) =\mathbf{V}.

Thus E is a basis of \mathbf{V} consisting of eigenvectors of A, whence A is semisimple.

— Q.E.D.

Corollary 3: If |\sigma(A)| = \dim \mathbf{V}, then A is semisimple.

Proof: To say that |\sigma(A)|=\dim \mathbf{V} is equivalent to saying that the spectrum of A consists of n=\dim \mathbf{V} distinct numbers,

\sigma(A) = \{\lambda_1,\dots,\lambda_n\}.

Sampling a nonzero vector from each of the corresponding eigenspaces,

\mathbf{e}_i \in \mathbf{V}_{\lambda_i}, \quad 1 \leq i \leq n,

we get a set E= \{\mathbf{e}_1,\dots,\mathbf{e}_n\} of eigenvectors of A. By Theorem 1, E is a linearly independent set, hence it is a basis of \mathbf{V}.

— Q.E.D.
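
As a numerical illustration of this corollary (a sketch with a hand-picked matrix of our own choosing): a matrix with n distinct eigenvalues admits an eigenbasis, and relative to that eigenbasis it becomes diagonal, exactly as in the discussion of semisimple operators above.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])               # eigenvalues 2 and 3: distinct, so A is semisimple

    eigenvalues, P = np.linalg.eig(A)        # columns of P are eigenvectors of A
    print(eigenvalues)                       # 2.0 and 3.0 (in some order)
    D = np.linalg.inv(P) @ A @ P             # the matrix of A relative to the eigenbasis
    print(np.allclose(D, np.diag(eigenvalues)))   # True: diagonal in the eigenbasis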

Lecture 18 video