Math 262A: Lecture 3

In this lecture we continue our long march towards Stirling’s formula. Let me begin by saying that, while we seem to be spending an absurd amount of time on this, Stirling’s approximation is being used as a case study in something much more general, Laplace’s method, and we will understand the general thing almost automatically once we understand the specific thing.

We return to the integral representation

N! = \frac{N^{N+1}}{e^N}\int_{-1}^\infty e^{-N(w-\log(1+w))}\mathrm{d}w.

In Lecture 2, we performed some elementary estimates which showed that, for all sufficiently small \varepsilon >0, we have

\int_{-1}^\infty e^{-N(w-\log(1+w))}\mathrm{d}w = \int_{-\varepsilon}^{+\varepsilon} e^{-N(w-\log(1+w))}\mathrm{d}w + O(e^{-c(\varepsilon)N}),

where c(\varepsilon) is a positive number depending on \varepsilon but not on N. This quantifies the intuition that the bulk contribution to the integral comes from a small neighborhood of the minimum of the action.

We now come to the problem of estimating this bulk contribution

\int_{-\varepsilon}^{+\varepsilon} e^{-N(w-\log(1+w))}\mathrm{d}w.

The action

S(w) = w-\log(1+w)

is holomorphic on the open unit disc \mathbb{D}_1 = \{w \in \mathbb{C} \colon |w|<1\}, and its Maclaurin series is

S(w) = \sum\limits_{d=2}^{\infty} (-1)^d\frac{w^d}{d},

this series converging absolutely and uniformly on compact subsets of \mathbb{D}_1 to the correct value S(w). In Lecture 2, we observed that if we throw away all terms in the series expansion of S(w) beyond the quadratic term, our \int_{-\varepsilon}^{+\varepsilon}-integral becomes the truncated Gaussian integral

\int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}\mathrm{d}w,

which can then be replaced by a total Gaussian integral at the cost of introducing an exponentially small error,

\int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}\mathrm{d}w =\int_{-\infty}^{+\infty} e^{-\frac{N}{2}w^2}\mathrm{d}w + O(e^{-c(\varepsilon)N}),

this being beneficial because the total Gaussian integral can be exactly evaluated,

\int_{-\infty}^{+\infty} e^{-\frac{N}{2}w^2}\mathrm{d}w = \sqrt{\frac{2\pi}{N}}.
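As a numerical aside (a sketch using only the Python standard library; the helper name and the choice \varepsilon = 0.3 are mine), one can watch the gap between the truncated and total Gaussian integrals shrink rapidly as N grows; the closed form \sqrt{2\pi/N}\,\mathrm{erf}(\varepsilon\sqrt{N/2}) of the truncated integral is used only for comparison.

```python
import math

def truncated_gaussian(N, eps, steps=100_000):
    """Midpoint-rule approximation of the integral of exp(-N w^2 / 2) over [-eps, eps]."""
    h = 2 * eps / steps
    return h * sum(math.exp(-N * (-eps + (k + 0.5) * h) ** 2 / 2) for k in range(steps))

eps = 0.3
for N in (10, 50, 100, 200):
    full = math.sqrt(2 * math.pi / N)
    trunc = truncated_gaussian(N, eps)
    # closed form of the truncated integral: sqrt(2 pi / N) * erf(eps * sqrt(N / 2))
    closed = full * math.erf(eps * math.sqrt(N / 2))
    print(f"N={N:4d}  truncated={trunc:.12f}  closed form={closed:.12f}  total={full:.12f}")
```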

Thus, the remaining question is to quantify the error incurred by replacing

\int_{-\varepsilon}^{+\varepsilon} e^{-N(w-\log(1+w))}\mathrm{d}w \quad\text{with}\quad \int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}\mathrm{d}w.

We now want to quantify the error incurred by throwing away all terms of the action S beyond its dominant (quadratic) term. For small w, we naturally expect this error to be a small number which decays as N increases, but this decay turns out to be much slower than exponential and represents the main source of error in the approximation scheme we’re building.

To get started, we write

\int_{-\varepsilon}^{+\varepsilon} e^{-N(w-\log(1+w))}\mathrm{d}w = \int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}e^{-N\sum_{d=3}^\infty (-1)^d \frac{w^d}{d}}\mathrm{d}w.

Next, we Maclaurin-expand the non-Gaussian part of the integrand, writing

e^{-N\sum_{d=3}^\infty (-1)^d \frac{w^d}{d}} = \sum\limits_{d=0}^\infty m_d(N) \frac{w^d}{d!},

where m_d(N) is the dth derivative of e^{-N\sum_{d=3}^\infty (-1)^d \frac{w^d}{d}} at w=0. Assuming \varepsilon < 1, this series converges uniformly on \overline{\mathbb{D}}_\varepsilon = \{w \in \mathbb{C} \colon |w| \leq \varepsilon\}; indeed, it is the Maclaurin series of the function e^{-N(S(w)-\frac{w^2}{2})}, which is holomorphic on \mathbb{D}_1 because S(w)-\frac{w^2}{2} is: we aren’t changing the location of singularities by subtracting an entire function from S(w) and then exponentiating. We can thus interchange the order of summation and integration, thereby obtaining

\int_{-\varepsilon}^{+\varepsilon} e^{-N(w-\log(1+w))}\mathrm{d}w = \sum\limits_{d=0}^\infty \frac{m_d(N)}{d!} \int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}w^d \mathrm{d}w.

We have thus reduced the initial problem to two problems.

Problem 1: Compute the coefficients m_d(N).

Problem 2: Compute the integrals \int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2}w^d \mathrm{d}w.

We will solve Problem 1 today, and Problem 2 on Thursday (in case you’d like to think ahead).

Problem 1 is pure algebraic combinatorics, being a question about composition in the algebra of formal power series \mathbb{C}[[z]]. This question can be phrased in greater generality, as follows. Let

E(z) = \sum\limits_{d=0}^\infty e_d \frac{z^d}{d!} \quad\text{ and }\quad C(z) = \sum\limits_{d=1}^\infty c_d \frac{z^d}{d!}

be elements of \mathbb{C}[[z]] such that the constant term of C(z) is zero. Notice that we are writing these series as if their coefficients were the derivatives of analytic functions E(z) and C(z) at z=0, but we are not actually imposing any convergence conditions on them. Series of this form are often referred to as exponential generating functions. The composition

E(C(z)) = \sum\limits_{d=0}^\infty e_d \frac{C(z)^d}{d!} = e_0 + e_1\frac{1}{1!}\left(c_1\frac{z^1}{1!} + c_2\frac{z^2}{2!} + \dots \right)^1 + e_2\frac{1}{2!}\left( c_1\frac{z^1}{1!} + c_2\frac{z^2}{2!} + \dots\right)^2 + \dots

is well-defined. Let M(z) := E(C(z)), and expand

M(z) = \sum\limits_{d=0}^\infty m_d \frac{z^d}{d!}.

The problem is to understand the coefficients of M(z) in terms of the coefficients of E(z) and C(z). This problem was posed and solved by Hurwitz in the nineteenth century in connection with what is now a classical problem in enumerative geometry, namely the question of counting branched covers of the Riemann sphere \mathbf{P}^1(\mathbb{C}) with specified ramification data (I hope to touch on this problem more extensively later in the course).

Exercise 1: Compute the first few coefficients of M(z) by hand.

Exercise 2: Does it make sense to do this computation using the chain rule? Why or why not? Do you get correct formulas by this method?

Let’s consider first an easier problem, which will serve as a stepping stone. Consider two series

A(z) = \sum\limits_{d=0}^\infty a_d \frac{z^d}{d!} \quad\text{ and }\quad B(z) = \sum\limits_{d=0}^\infty b_d \frac{z^d}{d!},

and let

P(z) = A(z)B(z) = \sum\limits_{d=0}^\infty p_d \frac{z^d}{d!}

be their product. It is then immediate that

p_d = \sum\limits_{k=0}^d {d \choose k} a_k b_{d-k}.

This algebraic expression also has combinatorial meaning, which becomes clear if we sacrifice its elegance and rewrite it in the garish form

p_d = \sum\limits_{\substack{ (S_A,S_B) \\ S_A,S_B \subseteq [d] \\ S_A \cap S_B = \{\} \\ S_A \cup S_B = [d]}} a_{|S_A|}b_{|S_B|}.

The enumerative meaning of this expression is the following. Suppose that a_d is the number of combinatorial structures of Type A which can be built on [d]= \{1,\dots,d\}, and similarly b_d is the number of combinatorial structures of Type B which can be built on [d]. Then, p_d is the number of ways to build a hybrid combinatorial structure of Type AB on [d] by choosing disjoint subsets S_A,S_B \subseteq \{1,\dots,d\} whose union is [d] and building a Type A structure on S_A and a Type B structure on S_B. This can be iterated: the product of k exponential generating functions A_1(z),A_2(z),\dots,A_k(z) is an exponential generating function P(z) whose coefficients p_d count the number of ways to select a list (S_{1},\dots,S_{k}) of disjoint subsets of [d] whose union is [d], and then build a Type A_j structure on S_j for each 1 \leq j \leq k.

Exercise 3: Read this (easy), this (harder), or this (hardest).

We can now state the main theorem on composition of exponential generating functions, M(z)=E(C(z)). A partition of [d] is a set \pi of disjoint nonempty subsets of [d] whose union is [d]. The elements B of \pi are called its blocks, and we denote by b(\pi) the number of blocks in \pi. Let \mathrm{P}(d) denote the set of all partitions of [d].

Theorem (The Compositional Formula): With notation as above, we have

m_d = \sum\limits_{\pi \in \mathrm{P}(d)} e_{b(\pi)} \prod\limits_{B \in \pi} c_{|B|}.

Proof: Write

M(z) = \sum\limits_{k=0}^\infty e_k \frac{1}{k!} C(z)^k,

use the combinatorial expansion of C(z)^k, and reorganize the sum.

— Q.E.D.
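For small d the Compositional Formula can be checked directly. Below is a minimal sketch (standard library only; the test coefficients and all helper names are mine) that composes truncated series with exact rational arithmetic and compares the result against the partition sum.

```python
from fractions import Fraction
from math import factorial, prod

D = 6
e = [Fraction(k * k + 1) for k in range(D + 1)]                     # arbitrary e_0, ..., e_D
c = [Fraction(0)] + [Fraction(2 * k - 1) for k in range(1, D + 1)]  # arbitrary c_d with c_0 = 0

def mul(p, q):
    """Product of two truncated ordinary power series (lists of coefficients)."""
    r = [Fraction(0)] * (D + 1)
    for i in range(D + 1):
        for j in range(D + 1 - i):
            r[i + j] += p[i] * q[j]
    return r

# Ordinary coefficients of M(z) = E(C(z)) = sum_k e_k C(z)^k / k!; since C has no
# constant term, powers beyond C^D cannot contribute below degree D + 1.
C = [c[d] / factorial(d) for d in range(D + 1)]
Ck = [Fraction(1)] + [Fraction(0)] * D
M = [Fraction(0)] * (D + 1)
for k in range(D + 1):
    for d in range(D + 1):
        M[d] += e[k] / factorial(k) * Ck[d]
    Ck = mul(Ck, C)
m_series = [M[d] * factorial(d) for d in range(D + 1)]   # EGF coefficients m_d

def partitions(s):
    """Yield all set partitions of the list s, as lists of blocks."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [p[i] + [first]] + p[i + 1:]
        yield p + [[first]]

for d in range(D + 1):
    m_partition_sum = sum(e[len(p)] * prod(c[len(B)] for B in p)
                          for p in partitions(list(range(d))))
    assert m_partition_sum == m_series[d]
print("Compositional Formula checked for d = 0, ...,", D)
```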

Corollary (The Exponential Formula): If E(z) = e^z, then

m_d = \sum\limits_{\pi \in \mathrm{P}(d)} \prod\limits_{B \in \pi} c_{|B|}.

The combinatorial meaning of the Exponential Formula is the following: if C(z) is the exponential generating function for a class of “connected” combinatorial structures, then M(z)=e^{C(z)} is the exponential generating function for all structures in the class, connected or not, since every structure decomposes uniquely into connected components, one built on each block of a partition of its underlying set. The Exponential Formula gets used all over the place — probabilists call it the moment-cumulant formula, while physicists often call it the linked cluster expansion and invoke it using colorful phrases like “connected vacuum bubbles exponentiate.” A basic example is the following: let m_d be the number of simple graphs with vertex set [d], let c_d be the number of connected simple graphs with vertex set [d], and form the generating functions

M(z) = 1 + \sum\limits_{d=1}^\infty m_d\frac{z^d}{d!} \quad\text{ and }\quad C(z) = \sum\limits_{d=1}^\infty c_d \frac{z^d}{d!}.

Then M(z) = e^{C(z)}, and the sequences (m_d)_{d=1}^\infty and (c_d)_{d=1}^\infty are related as in the Exponential Formula.
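Here is a small brute-force confirmation of the graph example (a sketch using only the standard library; connected graphs are counted naively with union-find for d \leq 5, and all helper names are mine).

```python
from itertools import combinations
from math import comb, prod

def connected_count(d):
    """Count connected simple graphs on the vertex set {0, ..., d-1} by brute force."""
    vertices = list(range(d))
    edges = list(combinations(vertices, 2))
    count = 0
    for mask in range(1 << len(edges)):
        parent = list(range(d))                 # union-find over the chosen edge set
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for pos, (u, v) in enumerate(edges):
            if mask >> pos & 1:
                parent[find(u)] = find(v)
        if len({find(v) for v in vertices}) == 1:
            count += 1
    return count

def partitions(s):
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [p[i] + [first]] + p[i + 1:]
        yield p + [[first]]

c = {d: connected_count(d) for d in range(1, 6)}    # c_1 = 1, c_2 = 1, c_3 = 4, c_4 = 38, c_5 = 728
for d in range(1, 6):
    m_d = 2 ** comb(d, 2)                           # all simple graphs on [d]
    partition_sum = sum(prod(c[len(B)] for B in p) for p in partitions(list(range(d))))
    assert m_d == partition_sum
print("Exponential Formula checked for the graph example; connected counts:", c)
```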

Now we return to Problem 1 above, which is to compute the coefficients m_d(N) in the expansion

e^{-N\sum_{d=3}^\infty (-1)^d \frac{w^d}{d}} = \sum\limits_{d=0}^\infty m_d(N) \frac{w^d}{d!}.

We do this using the Exponential Formula: we have

e^{-N\sum_{d=3}^\infty (-1)^d \frac{w^d}{d}} = e^{C(w)},

where

C(w) = \sum\limits_{d=3}^\infty (-N)(-1)^d(d-1)! \frac{w^d}{d!}.

The Exponential Formula now tells us that the coefficients m_d(N) we seek are given by

m_d(N) = \sum\limits_{\pi \in \mathrm{P}(d)} \prod\limits_{B \in \pi} (-N)(-1)^{|B|} \delta_{|B| \geq 3}(|B|-1)!.

Thus, we have solved Problem 1: we see that

m_d(N) = (-1)^d \sum\limits_{k=1}^{d} (-N)^k t_k(d),

with t_k(d) the number of permutations of 1,\dots,d consisting of k cycles, each of length at least 3.
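This can be checked by brute force for small d. The sketch below (standard library only; helper names are mine, and the sum is taken from k = 0 so that the single empty permutation accounts for m_0(N) = 1) compares the Maclaurin coefficients of the exponential with the permutation count.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

D = 7  # check m_0(N), ..., m_D(N)

def series_coeffs(N):
    """Ordinary Maclaurin coefficients of exp(-N * sum_{j=3}^{D} (-1)^j w^j / j), mod w^(D+1)."""
    A = [Fraction(0)] * (D + 1)
    for j in range(3, D + 1):
        A[j] = Fraction(-N) * (-1) ** j / j
    result = [Fraction(0)] * (D + 1)
    power = [Fraction(1)] + [Fraction(0)] * D         # A(w)^0
    for k in range(D // 3 + 1):                        # A^k starts at degree 3k
        for deg in range(D + 1):
            result[deg] += power[deg] / factorial(k)
        power = [sum(power[i] * A[deg - i] for i in range(deg + 1)) for deg in range(D + 1)]
    return result

def t(k, d):
    """Number of permutations of {0,...,d-1} with exactly k cycles, all of length >= 3."""
    count = 0
    for p in permutations(range(d)):
        seen, cycles, ok = [False] * d, 0, True
        for start in range(d):
            if not seen[start]:
                length, x = 0, start
                while not seen[x]:
                    seen[x] = True
                    x, length = p[x], length + 1
                cycles += 1
                ok = ok and length >= 3
        if ok and cycles == k:
            count += 1
    return count

for N in (1, 2, 5):
    coeffs = series_coeffs(N)
    for d in range(D + 1):
        m_via_series = coeffs[d] * factorial(d)
        m_via_formula = (-1) ** d * sum((-N) ** k * t(k, d) for k in range(d + 1))
        assert m_via_series == m_via_formula, (N, d)
print("m_d(N) checked against the permutation count for d <=", D)
```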

Exercise 4: Do the same computation instead using the Compositional Formula with E(z) = e^{-Nz}.

Math 262A: Lecture 2

We continue with the estimation of N! for large N via Euler’s integral,

N! = \int_0^\infty t^N e^{-t} \mathrm{d}t.

As at the end of Lecture 1, we make the substitution t=Nx, thereby obtaining

N! = \int_0^\infty (Nx)^N e^{-Nx} N\mathrm{d}x = N^{N+1} \int_0^\infty e^{-N(x-\log x)} \mathrm{d}x.

Now, one way to characterize an algebraic combinatorialist is to say that such a person loathes \log x, this being some horrible transcendental thing, but loves \log(1+x), this being an exponential generating function for cyclic permutations:

\log(1+x) = \sum\limits_{d=1}^\infty (-1)^{d-1} (d-1)! \frac{x^d}{d!} = \sum\limits_{d=1}^\infty (-1)^{d-1} \frac{x^d}{d}.

Accordingly, we make the further change of variables x=1+w, which gets us to the formula

N!= \frac{N^{N+1}}{e^N} \int_{-1}^\infty e^{-N(w-\log (1+w))} \mathrm{d}w.

We now focus on the problem of determining the N \to \infty asymptotics of

I_N = \int_{-1}^\infty e^{-NS(w)} \mathrm{d}w,

where the “action” S(w) = w-\log(1+w) is a smooth function on (-1,\infty), the Maclaurin series of which is

S(w) = \frac{w^2}{2} - \frac{w^3}{3} + \frac{w^4}{4} - \dots.

Exercise 1: What is the radius of convergence of this series?

Having solved Exercise 1 (which is singularly easy), you know that the series expansion of S(w) is only valid in a neighborhood of w=0, so ostensibly is not of any help in the estimation of I_N, since the integrand involves S(w) with arbitrarily large inputs w. On the other hand, we’re trying to work out an approximation, not perform an exact evaluation, so perhaps we can get away with something local. Indeed, the shape of S indicates that this might well be the case. Since

S'(w) = 1- \frac{1}{1+w},

our action S has a unique stationary point at w=0, and the value S(0)=0 is the global minimum of S on its domain of definition, (-1,\infty).

Plot of S(w) near w=0.

This means that e^{-NS(0)}=1 is the global maximum of e^{-NS(w)}, the integrand of I_N: we have

0 < e^{-NS(w)} < 1 \quad \forall w \in (-1,\infty)-\{0\},

so that, in particular, for each fixed w \neq 0 the integrand of I_N is exponentially smaller than 1 as N grows. This means that the integrand of I_N is sharply peaked at the unique minimizer w=0 of the action S(w), and the larger N is, the more exaggerated this effect becomes, as in the plots below.

Plot of e^{-10S(w)} near w=0.
Plot of e^{-1000S(w)} near w=0.

Accordingly, we expect that as N \to \infty the integral I_N “localizes” at the stationary point of the action, meaning that

I_N \approx e^{-NS(0)}

might not be such a bad approximation. Let us investigate this more closely: we fix \varepsilon >0, and seek to estimate the integral

\int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w.

Being combinatorialists, we will count the complement, meaning that we will try to estimate the above integral by instead estimating

\int_{-1}^{-\varepsilon} e^{-NS(w)} \mathrm{d}w \quad\text{ and }\quad \int_{+\varepsilon}^{+\infty} e^{-NS(w)} \mathrm{d}w.

To estimate the left integral, observe that left of zero we have

S(w) > \frac{w^2}{2},

and hence

\int_{-1}^{-\varepsilon} e^{-NS(w)} \mathrm{d}w < \int_{-1}^{-\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w < \int_{-\infty}^{-\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w,

where the rightmost integral is convergent (say, by comparison with \int_{-\infty}^{-\varepsilon} e^{\frac{N\varepsilon}{2}w} \mathrm{d}w, using w^2 \geq -\varepsilon w for w \leq -\varepsilon). To control the rightmost integral, we can observe that

e^{-\frac{N}{2}w^2} = e^{-\frac{N}{4}w^2}e^{-\frac{N}{4}w^2} \leq e^{-\frac{N}{4}\varepsilon^2}e^{-\frac{N}{4}w^2}

for w \in (-\infty,-\varepsilon), so that we have the bound

\int_{-1}^{-\varepsilon} e^{-NS(w)} \mathrm{d}w < C_N e^{-\frac{N}{4}\varepsilon^2}

where

C_N = \int_{-\infty}^{-\varepsilon} e^{-\frac{N}{4}w^2} \mathrm{d}w < \infty

satisfies

\lim\limits_{N \to \infty} C_N=0

by the Dominated Convergence Theorem. In particular, we can conclude that

\int_{-1}^{-\varepsilon} e^{-NS(w)} \mathrm{d}w = O(e^{-\frac{\varepsilon^2}{4}N}).

Now we estimate the \int_{+\varepsilon}^{+\infty}-integral using a similar strategy. Since S(w) is strictly increasing for w>0, we have

e^{-NS(w)} = e^{-\frac{N}{2}S(w)}e^{-\frac{N}{2}S(w)} \leq e^{-\frac{N}{2}S(\varepsilon)}e^{-\frac{N}{2}S(w)}

for w \in (\varepsilon,\infty). Consequently, we have the bound

\int_{+\varepsilon}^{+\infty} e^{-NS(w)} \mathrm{d}w < C_N(\varepsilon)e^{-\frac{N}{2}S(\varepsilon)},

where

C_N(\varepsilon) = \int_{+\varepsilon}^{+\infty} e^{-\frac{N}{2}S(w)} \mathrm{d}w <\infty

is a convergent integral, and moreover

\lim\limits_{N \to \infty} C_N(\varepsilon) = 0

by the Dominated Convergence Theorem. So we similarly have that

\int_{+\varepsilon}^\infty e^{-NS(w)} \mathrm{d}w = O(e^{-\frac{S(\varepsilon)}{2} N}).

We thus conclude that

I_N = \int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w + \text{exponentially decaying error in }N.

Just to remind ourselves where we are vis-a-vis our principal goal of approximating N!, we have arrived at the estimate

N! = \frac{N^{N+1}}{e^N}\left( \int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w + \text{exponentially decaying error in }N\right).
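Numerically, this localization is already visible for moderate N. Here is a sketch assuming SciPy is available; the parameter \varepsilon = 1/2 is my choice, and the last column is the total contribution of the two tails, which decays exponentially in N.

```python
import numpy as np
from scipy.integrate import quad

S = lambda w: w - np.log1p(w)          # the action S(w) = w - log(1 + w)
eps = 0.5
for N in (25, 50, 100, 200):
    full, _ = quad(lambda w: np.exp(-N * S(w)), -1, np.inf)
    bulk, _ = quad(lambda w: np.exp(-N * S(w)), -eps, eps)
    print(f"N={N:4d}  full={full:.12f}  bulk={bulk:.12f}  tails={full - bulk:.3e}")
```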

It remains to estimate the \int_{-\varepsilon}^{+\varepsilon} integral. This is ostensibly more difficult than what we did above, since it apparently requires some actual insight into the behavior of S(w). On the other hand, we are now reduced to considering the behavior of the action on a small interval where we can represent it as a series: we have that

\int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w = \int_{-\varepsilon}^{+\varepsilon} e^{-N\sum_{d=2}^\infty (-1)^d \frac{w^d}{d}} \mathrm{d}w

for all \varepsilon sufficiently small. Now, since

S(w) = \frac{w^2}{2} + O(w^3) \quad\text{ as }w \rightarrow 0,

we expect a further approximation of the form

\int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w \approx \int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w.

Of course, such an approximation is not exactly true, and we have to understand how accurate it actually is in order to proceed rigorously. For now, however, let us see what would happen if we had the exact identity

\int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w ``="\int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w,

where the quotes are meant to indicate that we are considering a hypothetical truth which we know to be false, but might still be conceptually useful.

Consider the integral

\int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w,

which we would like to approximate. An ideal outcome would be that we could exactly evaluate it. Unfortunately, even though the integrand here doesn’t look so bad, trying to compute the indefinite integral

\int e^{-\frac{w^2}{2}} \mathrm{d}w

is a fool’s errand, being in some sense analogous to trying to solve the quintic. On the other hand, by the same sort of rough estimates we have been using so far in this lecture, we have

\int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w =\int_{-\infty}^{+\infty} e^{-\frac{N}{2}w^2} \mathrm{d}w + \text{exponentially decaying error in }N.

The integral

\int_\mathbb{R} e^{-x^2} \mathrm{d}x,

which is known as the Gaussian integral, can in fact be exactly evaluated — not using the Fundamental Theorem of Calculus, but by exploiting symmetry. The answer is

\int_\mathbb{R} e^{-x^2} \mathrm{d}x = \sqrt{\pi},

a remarkable formula which is the basis of various amusing proclamations, including Lord Kelvin‘s assertion that “a mathematician is one to whom this is as obvious as that twice two makes four is to you.” I leave you to either try the derivation of this formula on your own, or look it up somewhere.

Changing variables in the Gaussian integral, we get the evaluation we actually need:

\int_{-\infty}^{+\infty} e^{-\frac{N}{2}w^2} \mathrm{d}w = \sqrt{\frac{2\pi}{N}}.

Following through on our thought experiment, this would imply the approximation

N! = \sqrt{2\pi}\frac{N^{N+\frac{1}{2}}}{e^N}\left(1 + \text{exponentially decaying error in }N\right).

The meat of this formula is correct, but the error term is the result of our wishful thinking, and it is false — we will correct it in Lecture 3.
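Even though the error term above is wishful, the leading factor is already excellent numerically. Here is a quick check with the standard library; the slowly decaying discrepancy visible in the ratio is exactly what Lecture 3 sets out to quantify.

```python
import math

for N in (5, 10, 50, 100):
    stirling = math.sqrt(2 * math.pi) * N ** (N + 0.5) * math.exp(-N)
    print(f"N={N:3d}  N!/Stirling = {math.factorial(N) / stirling:.8f}")
```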

Math 262A: Lecture 1

The goal of this topics course is to introduce you to asymptotic algebraic combinatorics. Each of these words is familiar individually, but when combined in this order the resulting phrase is ambiguous; indeed, looking at the list of lectures from a recent conference dedicated to asymptotic algebraic combinatorics, it is clear that it means different things to different people.

In this course, we will try to get a feeling for what asymptotic algebraic combinatorics is about by going on a quest: we are going to start at Stirling’s approximation, which might be considered the historically first result in asymptotic algebraic combinatorics, and from there forge a path to what physicists simply call “Large N,” which has something to do with quantum field theory. There will be various (hopefully edifying) digressions along the way, sometimes leading to unsolved problems — for example, I hope to discuss the pattern-restricted factorial, for which an analogue of Stirling’s formula is only sometimes known, and the Bhargava factorial, for which no analogue of Stirling’s formula is known at all. Let’s start the quest.

We want to estimate N!. Let’s start by writing down the stupidest thing we can think of — that way, things can only get better.

Proposition 1: We have

1 \leq N! \leq N^N.

Proof: By definition,

N! = \prod\limits_{x=1}^N x.

On the interval [1,N], the linear function l(x) = x has a minimum value of 1 at x=1, and a maximum value of N at x=N.

— Q.E.D.

To improve on the shameful bounds in Proposition 1, we can use a multiplicative version of Gauss’s famous “replica trick” for summing consecutive numbers.

Proposition 2: We have

N^{N/2} \leq N! \leq \left(\frac{N+1}{2} \right)^N.

Proof: Take two copies of N!, multiply them together

N! \cdot N! = \prod\limits_{x=1}^Nx \cdot \prod\limits_{x=1}^N x,

write the second product backwards,

N! \cdot N! = \prod\limits_{x=1}^Nx \cdot \prod\limits_{x=1}^N (N-x+1),

and interlace the factors of the two products to get

(N!)^2 = \prod\limits_{x=1}^N x(N-x+1).

The minimum of the quadratic function

q(x) = x(N-x+1) = -(x-\frac{1}{2}(N+1))^2 +\frac{1}{4}(N+1)^2

on the interval [1,N] occurs at x=1, with

q(1) = N,

and the maximum occurs at x=\frac{1}{2}(N+1), with

q(\frac{1}{2}(N+1))=\frac{1}{4}(N+1)^2.

— Q.E.D.
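A quick numerical check of the bounds in Proposition 2 (standard library only):

```python
import math

for N in (1, 2, 5, 10, 20):
    lower, upper = N ** (N / 2), ((N + 1) / 2) ** N
    assert lower <= math.factorial(N) <= upper
    print(f"N={N:2d}  {lower:.6g} <= {math.factorial(N)} <= {upper:.6g}")
```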

Problem 1: Suppose we use k replicas of N!, meaning that we write (N!)^k = \prod\limits_{x=1}^N \prod\limits_{j=1}^k\pi_j(x), where \pi_1,\dots,\pi_k \in \mathrm{S}(N) may be any permutations. Anything to be gained from this?

The fact that we now have a non-trivial lower bound on N! has some nice consequences. In particular, let us consider the prime factorization of N!,

N! = \prod\limits_p p^{e_p(N!)},

where the product is over distinct primes p and e_p(N!) is the multiplicity of p in N!, which is given by Legendre’s formula:

e_p(N!) = \sum\limits_{k=1}^\infty \lfloor \frac{N}{p^k} \rfloor.

Since floors are lower than chairs and tables, we get the estimate

e_p(N!) < \sum\limits_{k=1}^\infty \frac{N}{p^k}=\frac{N}{p-1},

so that the “isotypic component” of N! corresponding to the prime p is

p^{e_p(N!)} < p^{\frac{N}{p-1}}.

Now, since p \leq 2^{p-1}, we have

p^{e_p(N!)} < 2^N,

which gives us the estimate

N! < 2^{N\pi(N)}

with \pi(N) the number of primes less than or equal to N. Comparing this with the lower bound in Proposition 2, which gives N^{N/2} \leq N! < 2^{N\pi(N)}, we get

\pi(N) > \frac{1}{2} \log_2 N,

which isn’t great from the perspective of actually counting primes, but it does tell you that there are infinitely many.
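Both the bound N! < 2^{N\pi(N)} and the resulting lower bound on \pi(N) are easy to confirm for small N (a sketch with a naive prime count, standard library only; the helper name is mine):

```python
import math

def prime_pi(N):
    """Naive count of primes p <= N."""
    return sum(all(p % q for q in range(2, math.isqrt(p) + 1)) for p in range(2, N + 1))

for N in (2, 10, 100, 1000):
    pi_N = prime_pi(N)
    assert math.factorial(N) < 2 ** (N * pi_N)
    assert pi_N > 0.5 * math.log2(N)
    print(f"N={N:5d}  pi(N)={pi_N:4d}  (1/2)log_2(N)={0.5 * math.log2(N):.3f}")
```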

Another approach to estimating factorials is to use calculus: instead of squaring N!, we take its logarithm and interpret the resulting sum as a Riemann sum.

Proposition 3: We have

e\left(\frac{N}{e}\right)^N \leq N! \leq eN\left(\frac{N}{e}\right)^N.

Proof: Write

\sum\limits_{x=1}^{N-1} \log(x) \leq \int\limits_1^N \log(x) \mathrm{d}x \leq \sum\limits_{x=1}^{N} \log(x),

where the lower and upper bounds are, respectively, lower and upper Riemann sums for the integral between them. This integral can be evaluated:

\int\limits_1^N \log(x) \mathrm{d}x = N \log N - N +1.

The upper bound on the integral thus yields

e^{N \log N - N +1} \leq N!,

which is the claimed lower bound on N!. The lower bound on the integral yields

(N-1)! \leq e^{N \log N - N +1},

and multiplying through by N we get the claimed upper bound on N!.

— Q.E.D.

Problem 2: Which is better — Proposition 2 or Proposition 3?

Instead of comparing N! to an integral, we can represent it as an integral.

Proposition 4: We have

N! = \int_0^\infty t^N e^{-t} \mathrm{d}t.

Proof: This can be proved by induction on N. For N=0, we have

\int_0^\infty e^{-t} \mathrm{d}t = -e^{-t}|_{t=0}^{t=\infty}=1 = 0!.

Now, for any N \in \mathbb{N}, integrating by parts we have

\int t^N e^{-t} \mathrm{d}t = -t^Ne^{-t} + N\int t^{N-1} e^{-t}\mathrm{d}t.

Since f(t)=t^Ne^{-t} satisfies f(0)=0 and \lim_{t \to \infty} f(t) =0, this gives us

\int_0^\infty t^N e^{-t} \mathrm{d}t = N\int_0^\infty t^{N-1} e^{-t} \mathrm{d}t = N(N-1)! = N!.

— Q.E.D.
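Proposition 4 can also be spot-checked numerically for small N (a sketch assuming SciPy is available for the improper integral):

```python
import math
import numpy as np
from scipy.integrate import quad

for N in (0, 1, 5, 10):
    integral, _ = quad(lambda t: t ** N * np.exp(-t), 0, np.inf)
    print(f"N={N:2d}  integral = {integral:.6f}   N! = {math.factorial(N)}")
```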

The integral representation of N! given by Proposition 4 was found by Euler, who wanted to define z! for z a complex variable. Indeed, the integral

z! := \int_0^\infty t^z e^{-t} \mathrm{d}t

converges to define a holomorphic function on the half plane H=\{ z\in \mathbb{C} \colon \Re z >0\}, so that we have an analytic factorial function on this domain. In some sense, this is the only way to define z! (precise statements are the Bohr-Mollerup Theorem and Wielandt’s Theorem).

This is not the direction we’re headed. For us, the main significance of Euler’s integral representation is that it can be re-written as follows. Making the change of variables x=\frac{1}{N}t, we get

N! = N^{N+1} \mathcal{Z}_N(S),

where

Z_N(S) = \int_\mathbb{R} e^{-NS(x)} \mathrm{d}x,

where S(x) = x - \log x for x>0 and S(x)=\infty for x \leq 0. What we want to do is to consider the generalization of the factorial function obtained by considering all integrals of the form Z_N(S) with

S \colon \mathbb{R} \to \mathbb{R}.

a given function called the “action.” This term comes from physics, and is related to the fact that Z_N(S) is the partition function a zero-dimensional quantum field theory with action S. We will see that, for S drawn from a broad class of functions, the N \to \infty behavior of Z_N(S) can be understood in a uniform way using a robust technique known as the Laplace method. Eventually, we will see that all such integrals admit asymptotic expansions whose coefficients can be understood in terms of certain graphs known as Feynman diagrams. Next time, we’ll work out Stirling’s formula from the integral representation of N!, and start developing Laplace’s method in general.
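As a preview of the Laplace method in this generality, here is a numerical sketch (assuming SciPy is available) for the action S(x) = x - \log x: the minimizer is x_0 = 1 with S(1) = 1 and S''(1) = 1, and \mathcal{Z}_N(S) is compared against the prediction \sqrt{2\pi/N}\,e^{-N}. To keep the numbers O(1), the code integrates e^{-N(S(x)-S(1))}, which equals e^N \mathcal{Z}_N(S).

```python
import numpy as np
from scipy.integrate import quad

S = lambda x: x - np.log(x)   # minimized at x_0 = 1, where S(1) = 1 and S''(1) = 1
for N in (5, 20, 100):
    scaled, _ = quad(lambda x: np.exp(-N * (S(x) - 1.0)), 0, np.inf)   # e^N * Z_N(S)
    prediction = np.sqrt(2 * np.pi / N)
    print(f"N={N:3d}  e^N Z_N(S) = {scaled:.8f}   sqrt(2 pi / N) = {prediction:.8f}   ratio = {scaled / prediction:.6f}")
```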

Math 31AH: Lecture 27

Let \mathbf{V} be a two-dimensional Euclidean space with orthonormal basis E=\{\mathbf{e}_1,\mathbf{e}_2\}. Let A \in \mathrm{End}\mathbf{V} be the operator defined by

A\mathbf{e}_1 = \mathbf{e}_1+\mathbf{e}_2,\ A\mathbf{e}_2 = -\mathbf{e}_1+\mathbf{e}_2,

so that

[A]_E = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}.

Geometrically, the operator A acts on vectors in \mathbf{V} by rotating them counterclockwise through an angle of 45^\circ, and then scaling them by \sqrt{2}. It is geometrically clear that A has no eigenvalues: rotating any vector by 45^\circ results in a new vector which is not a scalar multiple of the original vector. Right? Maybe not, now that we know about complex numbers.

By definition, a nonzero vector \mathbf{v} \in \mathbf{V} is an eigenvector of A if and only if we have A\mathbf{v}= \lambda \mathbf{v} for some scalar \lambda. This is the same thing as saying that the vector \mathbf{v} belongs to the kernel of the operator A-\lambda I, where I is the identity operator. This is in turn equivalent to saying that the kernel of A-\lambda I contains the nonzero vector \mathbf{v}, which means that A-\lambda I is not invertible, which in turn means that \det(A-\lambda I)=0. This chain of reasoning is true in general, i.e. we have the following general proposition.

Proposition 1: The eigenvalues of an operator A \in \mathrm{End}\mathbf{V} are exactly the scalars \lambda such that \det(A-\lambda I)=0.

So, according to Proposition 1, to find the eigenvalues of an operator A, we can try to solve the characteristic equation

\det(A-\lambda I) = 0.

The “unknown” in this equation is \lambda.

Let us write down the characteristic equation of the rotation operator A defined above. We have

(A-\lambda I)^{\wedge 2} \mathbf{e}_1 \wedge \mathbf{e}_2 \\ = (A-\lambda I) \mathbf{e}_1 \wedge (A-\lambda I)\mathbf{e}_2 \\ = (A\mathbf{e}_1-\lambda\mathbf{e}_1) \wedge (A\mathbf{e}_2 - \lambda \mathbf{e}_2) \\  =(\mathbf{e}_1+\mathbf{e}_2-\lambda\mathbf{e}_1)\wedge (-\mathbf{e_1}+\mathbf{e}_2-\lambda\mathbf{e}_2) \\ = ((1-\lambda)\mathbf{e}_1 + \mathbf{e}_2) \wedge (-\mathbf{e}_1+(1-\lambda)\mathbf{e}_2) \\ = (1-\lambda)^2\mathbf{e}_1 \wedge \mathbf{e}_2 -\mathbf{e}_2 \wedge \mathbf{e}_1 \\ = \left( (\lambda-1)^2+1 \right)\mathbf{e}_1 \wedge \mathbf{e}_2,

so that \det(A-\lambda I) = (\lambda-1)^2+1 and the characteristic equation of A is

(\lambda-1)^2+1=0.

There is no number \lambda \in \mathbb{R} which solves the characteristic equation, since for any such number the LHS is the sum of a nonnegative number and a positive number, which is a nonzero quantity. However, if we widen our scope of the number concept, this equation does have solutions, corresponding to the fact that

i^2+1=0 \text{ and } (-i)^2+1=0,

where i \in \mathbb{C} is the imaginary unit, as introduced in Lecture 26. That is, while the characteristic equation has no real solutions, it has the two distinct complex solutions

\lambda_1 = 1+i \text{ and } \lambda_2 = 1-i.
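As a quick numerical cross-check (a sketch assuming NumPy is available):

```python
import numpy as np

A = np.array([[1.0, -1.0], [1.0, 1.0]])
print(np.linalg.eigvals(A))   # [1.+1.j, 1.-1.j] (possibly in the other order)
```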

In fact, this is always the case: the main advantage of complex vector spaces is that operators on such spaces always have eigenvalues.

Theorem 1: If A \in \mathrm{End}V is a linear operator on an n-dimensional complex vector space, then A has n (not necessarily distinct) complex eigenvalues \lambda_1,\dots,\lambda_n.

Proof: The eigenvalues of A are the solutions of the characteristic equation, \det(A-\lambda I)=0. Now, since

(A-\lambda I)\mathbf{e}_1 \wedge \dots \wedge (A-\lambda I)\mathbf{e}_n = \det(A-\lambda I)\, \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,

where \{\mathbf{e}_1,\dots,\mathbf{e}_n\} is any basis of \mathbf{V}, the determinant \det(A-\lambda I) is a polynomial function of \lambda, the highest degree term of which is (-1)^n \lambda^n. But the fundamental theorem of algebra says that every polynomial of degree n has n (not necessarily distinct) roots in \mathbb{C}.

— Q.E.D.

The above is saying that if we consider the rotation operator A of our example as an operator on a complex vector space, then it does have eigenvalues, even though it did not when considered as an operator on a real vector space. Now comes the question of what the eigenvectors corresponding to these eigenvalues are. In order for the solutions of the characteristic equation to actually correspond to eigenvalues of the operator A, there must be nonzero vectors \mathbf{f}_1,\mathbf{f}_2 \in \mathbf{V} such that

A\mathbf{f}_1 = (1+i)\mathbf{f}_1 \text{ and } A\mathbf{f}_2=(1-i)\mathbf{f}_2.

Let us see if we can actually calculate \mathbf{f}_1 and \mathbf{f}_2. We have that

[A-\lambda_1I]_E = \begin{bmatrix} 1-(1+i) & -1 \\ 1 & 1-(1+i) \end{bmatrix} = \begin{bmatrix} -i & -1 \\ 1 & -i \end{bmatrix}.

Thus, \mathbf{f}_1=x\mathbf{e}_1+y\mathbf{e}_2 satisfies A\mathbf{f}_1=\lambda_1\mathbf{f}_1 if and only if x,y \in \mathbb{C} are complex numbers, not both zero, such that

\begin{bmatrix} -i & -1 \\ 1 & -i \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},

or equivalently

ix+y = 0\\ x-iy=0.

By inspection, x=i and y=1 are solutions to the above equations, whence

\mathbf{f}_1 = i\mathbf{e}_1 + \mathbf{e}_2

is an eigenvector of A. Similarly, \mathbf{f}_2=x\mathbf{e}_1+y\mathbf{e}_2 satisfies A\mathbf{f}_2=\lambda_2\mathbf{f}_2 if and only if x,y \in \mathbb{C} are complex numbers, not both zero, such that

\begin{bmatrix} i & -1 \\ 1 & i \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},

or equivalently

ix-y=0 \\ x+iy=0.

By inspection, x=i and y=-1 are solutions of these equations, whence

\mathbf{f}_2=i\mathbf{e}_1-\mathbf{e}_2

is an eigenvector of A. Now, what is the whole point of calculating eigenvalues and eigenvectors? Well, if F=\{\mathbf{f}_1,\mathbf{f}_2\} is a basis of \mathbf{V}, then we will have found that the matrix of the operator A in this basis is diagonal,

[A]_F = \begin{bmatrix} 1+i &0 \\ 0 & 1-i \end{bmatrix},

which is a more convenient matrix representation of A than that given by [A]_E, since we can use it to easily do computations with A. So, we wish to show that F = \{\mathbf{f}_1,\mathbf{f}_2\} is a linearly independent set in \mathbf{V}. This follows immediately if \mathbf{f}_1,\mathbf{f}_2 are orthogonal, so let’s see if we get lucky:

\langle \mathbf{f}_1,\mathbf{f}_2 \rangle = \langle i\mathbf{e}_1+\mathbf{e}_2,i\mathbf{e}_1 - \mathbf{e}_2 \rangle = i^2\langle \mathbf{e}_1,\mathbf{e}_1 \rangle - \langle \mathbf{e}_2,\mathbf{e}_2\rangle = -1 -1 = -2 \neq 0.

We didn’t get lucky, and worse than that this scalar product calculation suggests that something unpleasant happens when we start computing scalar products with complex numbers. Indeed, if we modify the above calculation by computing the scalar product of \mathbf{f}_1 with itself, we find that

\langle \mathbf{f}_1,\mathbf{f}_1 \rangle = \langle i\mathbf{e}_1+\mathbf{e}_2,i\mathbf{e}_1 + \mathbf{e}_2 \rangle = i^2\langle \mathbf{e}_1,\mathbf{e}_1 \rangle + \langle \mathbf{e}_2,\mathbf{e}_2\rangle = -1 +1 = 0.

This is disturbing, since it says that the nonzero vector \mathbf{f}_1 is orthogonal to itself, or equivalently that it has zero length, \|\mathbf{f}_1 \|=0. The origin of this problem is that, unlike squares of real numbers, squares of complex numbers can be negative. To accommodate this, we modify the scalar product for complex vector spaces: we insist that a complex scalar product \langle \cdot,\cdot \rangle is an antilinear function of its first argument:

\langle z_1 \mathbf{v}_1 + z_2 \mathbf{v}_2, \mathbf{w} \rangle = \overline{z}_1\langle \mathbf{v}_1, \mathbf{w} \rangle + \overline{z}_2\langle \mathbf{v}_2, \mathbf{w} \rangle,

where if z=x+yi then \bar{z} = x-yi is the complex conjugate of z. Complex vector spaces which come with a complex scalar product are the complex version of Euclidean spaces, and they have a special name.

Definition 1: A Hilbert space is a pair (\mathbf{V},\langle \cdot,\cdot \rangle) consisting of a complex vector space \mathbf{V} together with a complex scalar product.

Continuing with our running example, let us re-compute the inner product of the eigenvectors \mathbf{f}_1,\mathbf{f}_2 of the rotation operator that we found above. Interpreting \langle \cdot,\cdot \rangle as a complex inner product, we now find that

\langle \mathbf{f}_1,\mathbf{f}_2 \rangle = \langle i\mathbf{e}_1+\mathbf{e}_2, i\mathbf{e}_1-\mathbf{e}_2 \rangle = \langle i\mathbf{e}_1,i\mathbf{e}_1 \rangle + \langle \mathbf{e}_2,-\mathbf{e}_2 \rangle = \bar{i}i \langle \mathbf{e}_1,\mathbf{e}_1 \rangle - \langle \mathbf{e}_2,\mathbf{e}_2 \rangle = -ii-1 = 1-1 =0,

so that \mathbf{f}_1,\mathbf{f}_2 actually are orthogonal with respect to the complex scalar product on \mathbf{V} in which the basis E=\{\mathbf{e}_1,\mathbf{e}_2\} is orthonormal. Thus F=\{\mathbf{f}_1,\mathbf{f}_2\} is a basis of the complex vector space \mathbf{V} consisting of eigenvectors of the operator A, meaning that while A has no eigenvalues or eigenvectors when considered as an operator on a Euclidean space, it is in fact semisimple when considered as an operator on a Hilbert space.
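In coordinates, all of this is easy to confirm numerically (a sketch assuming NumPy; note that np.vdot conjugates its first argument, which matches our convention of antilinearity in the first slot):

```python
import numpy as np

A = np.array([[1.0, -1.0], [1.0, 1.0]])
f1 = np.array([1j, 1.0])     # coordinates of f_1 = i e_1 + e_2 relative to E
f2 = np.array([1j, -1.0])    # coordinates of f_2 = i e_1 - e_2 relative to E

# complex scalar product, antilinear in the first argument
print(np.vdot(f1, f2))       # 0j      : f_1 and f_2 are orthogonal
print(np.vdot(f1, f1))       # (2+0j)  : f_1 has squared length 2, not 0

# the basis F = {f_1, f_2} diagonalizes A
P = np.column_stack([f1, f2])
print(np.round(np.linalg.inv(P) @ A @ P, 10))   # diag(1+i, 1-i)
```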

Even though they may initially seem more complicated, Hilbert spaces are actually easier to work with than Euclidean spaces — linear algebra runs more smoothly over the complex numbers than over the real numbers. For example, it is possible to give a succinct necessary and sufficient criterion for an operator A on a (finite-dimensional) Hilbert space \mathbf{V} to be semisimple. As in the case of a Euclidean space, define the adjoint of A to be the unique operator A^* \in \mathrm{End}\mathbf{V} such that

\langle A^*\mathbf{v},\mathbf{w} \rangle = \langle \mathbf{v},A\mathbf{w}\rangle \quad \forall\ \mathbf{v},\mathbf{w} \in \mathbf{V}.

Theorem 2: A is a semisimple operator if and only if it commutes with its adjoint, meaning that A^*A=AA^*.

I regret that we will not have time to prove this Theorem.

Math 31AH: Lecture 26

For a long time, we have been skirting around the issue of whether or not it is possible to multiply vectors in a general vector space \mathbf{V}. We gave two answers to this question which are not really answers at all: we discussed ways to multiply vectors to get products which do not lie in the same vector space as their factors.

First, we showed that it is possible to multiply vectors in \mathbf{V} in such a way that the product of any two vectors \mathbf{v},\mathbf{w} is a number \langle \mathbf{v},\mathbf{w} \rangle. This sort of multiplication is what we termed a “bilinear form” on \mathbf{V}. The best bilinear forms are those which satisfy the scalar product axioms, because these allow us to talk about lengths of vectors and angles between vectors in \mathbf{V}. However, the bilinear form concept doesn’t answer the original question about multiplying vectors, because the product of \mathbf{v} and \mathbf{w} belongs to the vector space \mathbb{R}, which is probably not \mathbf{V}.

Second, we found that it is possible to multiply vectors in \mathbf{V} in such a way that the product of any two vectors \mathbf{v}_1,\mathbf{v}_2 is a tensor, namely the tensor \mathbf{v}_1 \otimes \mathbf{v}_2. This is useful because it ultimately led us to a related product, the wedge product \mathbf{v} \wedge \mathbf{w}, which allowed us to efficiently characterize linear independence and to introduce a notion of volume in \mathbf{V}. However, it again doesn’t answer the original question about multiplying vectors, because the product of \mathbf{v}_1 and \mathbf{v}_2 belongs to the vector space \mathbf{V} \otimes \mathbf{V}, which is definitely not \mathbf{V}.

Today, we will finally investigate the question of how to multiply two vectors to get a vector in the same space. We now have the tools to discuss this quite precisely.

Definition 1: Given a vector space \mathbf{V}, a multiplication in \mathbf{V} is a linear transformation

M \colon \mathbf{V} \otimes \mathbf{V} \to \mathbf{V}.

It is reasonable to refer to an arbitrary linear transformation M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) as a multiplication because every such M possesses the fundamental property of multiplication that we refer to as bilinearity: it satisfies the FOIL identity

M\left((x_1\mathbf{v}_1 + y_1\mathbf{w}_1) \otimes (x_2\mathbf{v}_2 + y_2\mathbf{w}_2)\right)\\ =x_1x_2M(\mathbf{v}_1 \otimes \mathbf{v}_2) + x_1y_2M(\mathbf{v}_1\otimes \mathbf{w}_2) + y_1x_2M(\mathbf{w}_1 \otimes \mathbf{v}_2) + y_1y_2M(\mathbf{w}_1 \otimes \mathbf{w}_2).

Indeed, this is true precisely because \mathbf{V} \otimes \mathbf{V} was constructed as the vector space of all “unevaluated” products of vectors multiplied according to an unspecified bilinear multiplication \otimes, and the linear transformation M performs the missing evaluation.

We now see that there are many ways to multiply vectors — too many. Indeed, suppose \mathbf{V} is an n-dimensional vector space, and let E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} be a basis in \mathbf{V}. Then, a basis for \mathbf{V} \otimes \mathbf{V} is given by E \otimes E = \{\mathbf{e}_i \otimes \mathbf{e}_j \colon 1 \leq i,j \leq n\}, and hence every multiplication M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) uniquely corresponds to an n \times n^2 table of numbers, namely the matrix [M]_{E \otimes E,E}. But not all of these make for interesting multiplication rules. For example, we could choose M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) to be the zero transformation, which sends every tensor in \mathbf{V} \otimes \mathbf{V} to the zero vector \mathbf{0}_\mathbf{V} in \mathbf{V}. This is a rule for multiplying vectors in \mathbf{V}, but it is accurately described as “trivial.” We would like to find nontrivial multiplication rules which mimic our experience multiplying real numbers.

Definition 2: A normed division algebra is a pair (\mathbf{V},M) consisting of a Euclidean space \mathbf{V} together with a multiplication M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) which has the following properties:

  1. There is a vector \mathbf{1} \in \mathbf{V} such that M(\mathbf{1} \otimes \mathbf{v})= M(\mathbf{v} \otimes \mathbf{1}) = \mathbf{v} for all \mathbf{v} \in \mathbf{V}.
  2. For every \mathbf{v} \in \mathbf{V}\backslash \{\mathbf{0}\}, there is a corresponding \mathbf{w} \in \mathbf{V} such that M(\mathbf{v} \otimes \mathbf{w})=M(\mathbf{w} \otimes \mathbf{v}) = \mathbf{1}.
  3. For every \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}, we have \|M(\mathbf{v}_1 \otimes \mathbf{v}_2)\| = \|\mathbf{v}_1\| \|\mathbf{v}_2\|.

The axioms above are very natural, and reflect familiar properties of the real number system. The first stipulates that a normed division algebra should contain a multiplicative unit \mathbf{1} analogous to the real number 1 in the sense that multiplication by it does nothing. The second says that any nonzero element in our algebra should have a multiplicative inverse: multiplying an element by its inverse produces the unit element \mathbf{1}. The third says that our algebra has a norm analogous to the absolute value of a real number, in that the norm of a product of two vectors is the product of their norms.

Example 1: Let \mathbf{V} be a one-dimensional Euclidean space with orthonormal basis E=\{\mathbf{1}\}. Let M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) be the linear transformation uniquely determined by M(\mathbf{1} \otimes \mathbf{1})=\mathbf{1}. Then (\mathbf{V},M) is a normed division algebra (very easy exercise: check that the axioms are satisfied).

Further examining Example 1, we see that the multiplication of arbitrary vectors \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V} is given by

M(\mathbf{v}_1 \otimes \mathbf{v}_2) = M\left( (\langle \mathbf{1},\mathbf{v}_1\rangle \mathbf{1}) \otimes (\langle \mathbf{1},\mathbf{v}_2\rangle \mathbf{1})\right) = \langle \mathbf{1},\mathbf{v}_1\rangle\langle \mathbf{1},\mathbf{v}_2\rangle M(\mathbf{1} \otimes \mathbf{1}) = \langle \mathbf{1},\mathbf{v}_1\rangle\langle \mathbf{1},\mathbf{v}_2\rangle\mathbf{1}.

So, to multiply two vectors in \mathbf{V}, we simply multiply their coordinates relative to the basis E=\{\mathbf{1}\} using multiplication of real numbers. Thus \mathbf{V} is essentially the same as \mathbb{R}, with the unit vector \mathbf{1} playing the role of the number 1. More precisely, the linear transformation T \colon \mathbf{V} \to \mathbb{R} uniquely determined by T(\mathbf{1})=1 is a vector space isomorphism which respects multiplication, i.e. an algebra isomorphism. In fact, thinking a bit more about this example, we find that every one-dimensional normed division algebra is isomorphic to \mathbb{R}.

Now we construct something new: a two-dimensional normed division algebra. Let \mathbf{V} be a 2-dimensional Euclidean space with orthonormal basis E=\{\mathbf{1},\mathbf{i}\}. Let M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) be the linear transformation defined by

M(\mathbf{1} \otimes \mathbf{1}) = \mathbf{1},\     M(\mathbf{1} \otimes \mathbf{i}) = \mathbf{i},\ M(\mathbf{i} \otimes \mathbf{1}) = \mathbf{i},\ M(\mathbf{i} \otimes \mathbf{i}) = -\mathbf{1}.

Thus for any two vectors \mathbf{v}_1 = x_1\mathbf{1} + y_1\mathbf{i} and \mathbf{v}_2 = x_2\mathbf{1} + y_2\mathbf{i} we have

M(\mathbf{v}_1 \otimes \mathbf{v}_2) \\ = M\left((x_1\mathbf{1} + y_1\mathbf{i}) \otimes (x_2\mathbf{1} + y_2\mathbf{i}) \right) \\ =x_1x_2 M(\mathbf{1} \otimes \mathbf{1}) + x_1y_2 M(\mathbf{1} \otimes \mathbf{i}) + x_2y_1 M(\mathbf{i} \otimes \mathbf{1}) + y_1y_2 M(\mathbf{i} \otimes \mathbf{i}) \\ = x_1x_2\mathbf{1} + x_1y_2 \mathbf{i} + x_2y_1 \mathbf{i} - y_1y_2\mathbf{1} \\ = (x_1x_2- y_1y_2)\mathbf{1} + (x_1y_2+x_2y_1)\mathbf{i}.

One nice aspect of M that is clear from the above computation is that M(\mathbf{v}_1 \otimes \mathbf{v}_2) = M(\mathbf{v}_2 \otimes \mathbf{v}_1), meaning that M defines a commutative multiplication (this is an extra property not required by the normed division algebra axioms).

Theorem 1: The algebra (\mathbf{V},M) constructed above is a normed division algebra.

Proof: We have to check the axioms. First, for any vector \mathbf{v}=x\mathbf{1}+y\mathbf{i}, we directly compute that

M(\mathbf{1} \otimes \mathbf{v}) = \mathbf{v},

so that \mathbf{1} is a multiplicative identity. Second, we have to show that \mathbf{v} has a multiplicative inverse, provided \mathbf{v} \neq \mathbf{0}. Let \mathbf{v}^*= x\mathbf{1}-y\mathbf{i}. We then have

M(\mathbf{v} \otimes \mathbf{v}^*) = (x^2+y^2)\mathbf{1} = \|\mathbf{v}\|^2\mathbf{1}.

Now \|\mathbf{v}\| \neq 0 since \mathbf{v} \neq \mathbf{0}, and hence we have that

M(\mathbf{v} \otimes \frac{1}{\|\mathbf{v}\|^2}\mathbf{v}^*) = \mathbf{1},

which shows that \mathbf{v} has the multiplicative inverse \frac{1}{\|\mathbf{v}\|^2}\mathbf{v}^*. Third and finally, we have

\|M(\mathbf{v}_1 \otimes \mathbf{v}_2)\|^2 \\ = \|(x_1x_2- y_1y_2)\mathbf{1} + (x_1y_2+x_2y_1)\mathbf{i}\|^2\\ = (x_1x_2- y_1y_2)^2 + (x_1y_2+x_2y_1)^2 \\ = x_1^2x_2^2 - 2x_1x_2y_1y_2 + y_1^2y_2^2 + x_1^2y_2^2 + 2x_1x_2y_1y_2 + x_2^2y_1^2 \\ = (x_1^2+y_1^2)(x_2^2+y_2^2) \\= \|\mathbf{v}_1\|^2\|\mathbf{v}_2\|^2,

whence

\|M(\mathbf{v}_1 \otimes \mathbf{v}_2)\|= \|\mathbf{v}_1\|\|\mathbf{v}_2\|.

— Q.E.D.

You have probably recognized by now that the above construction has produced the algebra of complex numbers (it is fine if you were not previously familiar with this term). Indeed, taking our Euclidean space \mathbf{V} to be \mathbf{V}=\mathbb{R}^2 with orthonormal basis \mathbf{1}=(1,0) and \mathbf{i}=(0,1) gives a simple visualization of this algebra as a rule for multiplying vectors in the Euclidean plane. The complex number system contains and enlarges the real number system, in the sense that \mathbb{R}^2 contains the 1-dimensional subspace

\mathrm{Span}\{\mathbf{1}\} = \{(x,0) \colon x \in \mathbb{R}\},

which is isomorphic to \mathbb{R}. In this context one usually uses the symbol \mathbb{C} instead of \mathbb{R}^2 to indicate that we are considering \mathbb{R}^2 to be not just a vector space, but a normed division algebra with the multiplication described above.

It makes a lot of sense to recalibrate your understanding of the word “number” so that it means “element of \mathbb{C}.” Indeed, complex numbers behave just like ordinary real numbers in all the ways that matter: you can add, subtract, multiply, and divide complex numbers in just the way you do real numbers. In order to psychologically prime ourselves for thinking of complex numbers as numbers rather than vectors, we follow the usual notational tradition of un-bolding them. So we just write z \in \mathbb{C} to indicate that z is a complex number, and we write z=x+yi where x,y are ordinary real numbers and i is the “imaginary” unit. Technically, all these symbols mean exactly what they meant above, they’ve just been un-bolded. So, the product of two complex numbers z_1=x_1+y_1i and z_2=x_2+y_2i is

z_1z_2 = (x_1x_2-y_1y_2) + (x_1y_2+x_2y_1)i.

It’s also customary to denote the norm of a complex number using just single lines, and to call it the “absolute value” of z:

|z| = |x+yi| = \sqrt{x^2+y^2}.
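Python’s built-in complex type implements exactly this multiplication and this absolute value, so both the product formula and the multiplicativity of the norm can be spot-checked directly (the particular numbers below are arbitrary):

```python
z1, z2 = 3 + 4j, 1 - 2j
product = complex(z1.real * z2.real - z1.imag * z2.imag,
                  z1.real * z2.imag + z2.real * z1.imag)
print(product, z1 * z2)                    # both are (11-2j)
print(abs(z1 * z2), abs(z1) * abs(z2))     # both are approximately 11.18034
```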

Once we enlarge our understanding of numbers from real to complex, it becomes natural to modify our concept of vector space accordingly. Namely, a complex vector space \mathbf{V} is a set together with two operations, vector addition and scalar multiplication, which satisfy exactly the same axioms as Definition 1 in Lecture 1, except with \mathbb{C} replacing \mathbb{R}. We will discuss further consequences of the passage from real vector spaces to complex vector spaces in the next lecture.

Before finishing this lecture, let us briefly consider a natural question which, historically, was one of the main motivating questions in the development of algebra: what other normed division algebras might exist? This question was first considered in detail by the Irish mathematician William Rowan Hamilton in the 1800s. In modern terms, Hamilton’s goal was the following: given a 3-dimensional Euclidean space \mathbf{V}, he wanted to find a multiplication rule M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) which would turn \mathbf{V} into a normed division algebra. The three-dimensional case is of clear interest due to the three physical dimensions of our world; Hamilton was looking for what he called “spatial numbers.” Unfortunately, he wasn’t able to find what he was looking for, because it doesn’t exist. After a long period of trying without results, in 1843 he suddenly realized that his desired construction could be performed in four dimensions, which led him to a new normed division algebra which he called the quaternions.

To construct the quaternions, let \mathbf{V} be a 4-dimensional Euclidean space with orthonormal basis E=\{\mathbf{1},\mathbf{i},\mathbf{j},\mathbf{k}\}, and let M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) be the multiplication defined by the table

      1     i     j     k
1     1     i     j     k
i     i    -1     k    -j
j     j    -k    -1     i
k     k     j    -i    -1
Hamilton’s multiplication table.

In this table, the first row and column contain the basis vectors, and the internal cells contain the result of applying M to the tensor product of the corresponding basis vectors (row tensor column). This turns out to give a normed division algebra; however, as you can see from the above table, this algebra is noncommutative. It is denoted \mathbb{H}, in Hamilton’s honor (and also because the symbol \mathbb{Q} is already taken).
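Here is a small sketch (standard library only; the encoding of a quaternion as a four-tuple and the helper names are mine) that implements Hamilton’s table, exhibits the noncommutativity ij = k \neq ji = -k, and checks the norm property on random quaternions:

```python
import math
import random

# q = (a, b, c, d) stands for a*1 + b*i + c*j + d*k
def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qnorm(q):
    return math.sqrt(sum(x * x for x in q))

i, j = (0, 1, 0, 0), (0, 0, 1, 0)
print(qmul(i, j), qmul(j, i))        # (0, 0, 0, 1) and (0, 0, 0, -1): ij = k while ji = -k

random.seed(0)
for _ in range(5):
    q1 = tuple(random.uniform(-1, 1) for _ in range(4))
    q2 = tuple(random.uniform(-1, 1) for _ in range(4))
    assert math.isclose(qnorm(qmul(q1, q2)), qnorm(q1) * qnorm(q2))
print("norm multiplicativity checked on random quaternions")
```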

It turns out that, in addition to \mathbb{R},\mathbb{C},\mathbb{H}, there is only one more normed division algebra. This algebra is called the octonions, because it consists of a multiplication rule for eight dimensional vectors; it is traditionally denoted \mathbb{O}. It was proved by Adolf Hurwitz that these four constitute the complete list of normed division algebras.

Every time we move up the list of normed division algebras, we lose something. In passing from \mathbb{R} to \mathbb{C}, we lose the fact that the real numbers are ordered: for any two distinct real numbers, it makes sense to say which is smaller and which is larger, but this doesn’t make sense for complex numbers. When we move from the complex numbers to the quaternions, we lose commutativity. When we move from quaternions to the octonions, things get even worse and we lose associativity. This means the following. You may notice that in our definition of algebras, we have only talked about multiplying two vectors. Of course, once we can multiply two, we’d like to multiply three, and four, etc. A multiplication M \in \mathrm{Hom}(\mathbf{V} \otimes \mathbf{V},\mathbf{V}) is said to be associative if

M(M(\mathbf{v}_1 \otimes \mathbf{v}_2) \otimes \mathbf{v}_3) = M(\mathbf{v}_1 \otimes M(\mathbf{v}_2 \otimes \mathbf{v}_3)).

For associative algebras, unambiguously defining the product of any finite number of vectors is not a problem. However, for octonions, this is not the case.

Math 31AH: Lecture 24

Let us begin with a brief recap of Lecture 23, which introduced a number of new concepts which may seem complex at first, but are in fact rather simple. Let \mathbf{V} be an n-dimensional vector space; the dimension n could be arbitrarily large, so visualizing geometry in \mathbf{V} can be arbitrarily hard. However, no matter what n is, we have a naturally associated 1-dimensional vector space \mathbf{V}^{\wedge n}. We can visualize any 1-dimensional vector space easily, by picturing it as a line. Algebraically, we associate to a given set \{\mathbf{v}_1,\dots,\mathbf{v}_n\} of n vectors in \mathbf{V} a point on the line \mathbf{V}^{\wedge n}, namely the point \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n. The set \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is linearly independent if and only if the point \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n is not the zero point on the line \mathbf{V}^{\wedge n}. This is pure algebra. If \mathbf{V} is a Euclidean space, meaning that we can measure lengths and angles in \mathbf{V} using a scalar product, then the line \mathbf{V}^{\wedge n} is a Euclidean line, i.e. a line on which we have a notion of distance defined. The volume of the parallelepiped \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) \subset \mathbf{V} is then the distance from the point \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n to zero on the line \mathbf{V}^{\wedge n}. There are only two points at distance one from the origin on the line \mathbf{V}^{\wedge n}, which we call \omega and -\omega. These are analogous to the numbers 1 and -1 on the number line \mathbb{R}. Choosing one of these, say \omega, is what it means to choose an orientation on \mathbf{V}. The oriented volume of the parallelepiped \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) is then the unique number a \in \mathbb{R} such that \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n = a\omega. For example, if \mathbf{V}=\mathbb{R}, choosing \omega =1 means the number line is oriented left to right, and choosing \omega=-1 means the number line is oriented right to left. In the former case, the oriented length of the line segment \mathcal{P}(2) is 2, while in the latter case it is -2. That’s it, more or less.

In this lecture, we use the above machinery to define a function

\det \colon \mathrm{End} \mathbf{V} \to \mathbb{R}

which tells us when a given linear operator A \in \mathrm{End}\mathbf{V} is invertible. This function is called the determinant. All the complexity (i.e. the hard stuff) is concentrated in the definitions from the past two lectures, and if you understand these, then the definition of the determinant is exceedingly simple.

Let A \in \mathrm{End}\mathbf{V} be a given linear operator. We associate to A the linear operator A^{\wedge n} \in \mathrm{End}\mathbf{V}^{\wedge n} defined by

A^{\wedge n}\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n :=A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n.

Definition 1: The determinant of A is the unique eigenvalue of A^{\wedge n}.

Let us unpack this definition. The key point is that the vector space \mathbf{V}^{\wedge n} is 1-dimensional. This means that every operator in \mathrm{End}\mathbf{V}^{\wedge n} is simply multiplication by a number (see Problem 1 on Assignment 6). We are defining \mathrm{det}(A) to be the number by which the operator A^{\wedge n} \in \mathrm{End}\mathbf{V}^{\wedge n} scales its argument. That is, \det(A) is the number such that

A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n = \det(A)\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n

for all \mathbf{v}_1,\dots,\mathbf{v}_n \in \mathbf{V}.

Theorem 1: An operator A \in \mathrm{End}\mathbf{V} is invertible if and only if \det(A) \neq 0.

Proof: Suppose first that A is invertible. Let \{\mathbf{v}_1,\dots,\mathbf{v}_n\} be a linearly independent set in the n-dimensional vector space \mathbf{V}. Since A is invertible, \{A\mathbf{v}_1,\dots,A\mathbf{v}_n\} is also a linearly independent set in \mathbf{V}, and the linear independence of this set is equivalent to the statement that

A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n

is not the zero tensor. Since the above nonzero tensor is equal to

\det(A)\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n,

we have \det(A) \neq 0.

Conversely, suppose that \det(A) \neq 0. Let \{\mathbf{v}_1,\dots,\mathbf{v}_n\} be a linearly independent set in \mathbf{V}. The statement that A is invertible is equivalent to the statement that \{A\mathbf{v}_1,\dots,A\mathbf{v}_n\} is linearly independent, which is in turn equivalent to the statement that

A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n

is not the zero tensor. But

A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n = \det(A) \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n

is the nonzero tensor \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n scaled by the nonzero number \det(A), hence it is nonzero.

— Q.E.D.

If you already know something about determinants of matrices, you are probably wondering how Definition 1 relates to this prior knowledge; let us explain this now. Let E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} be an orthonormal basis in \mathbf{V}, and let

\omega = \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n

be the corresponding unit volume tensor in \mathbf{V}^{\wedge n}. We then have that

\det(A) = \langle \omega,A^{\wedge n}\omega\rangle;

this follows from the fact that \det(A) is by definition the eigenvalue of the operator A^{\wedge n}, together with the fact that \{\omega\} is an orthonormal basis of \mathbf{V}^{\wedge n}. Now, we can explicitly evaluate the above inner product in terms of the matrix elements of A relative to the basis E of \mathbf{V}. Here is the computation:

\det(A) = \langle \omega,A^{\wedge n}\omega\rangle = \langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,A\mathbf{e}_1 \wedge \dots \wedge A\mathbf{e}_n\rangle = \sum\limits_{\pi \in \mathrm{S}(n)} \mathrm{sgn}(\pi) \langle \mathbf{e}_1,A\mathbf{e}_{\pi(1)} \rangle \dots \langle \mathbf{e}_n,A\mathbf{e}_{\pi(n)} \rangle.

We recall that \mathrm{S}(n) is the set of permutations \pi of the numbers 1,\dots,n, i.e. bijections \pi \colon \{1,\dots,n\} \rightarrow \{1,\dots,n\}. For example, in the case n=2, this computation reads

\det(A) = \langle \mathbf{e}_1 \wedge \mathbf{e}_2, A\mathbf{e}_1 \wedge A\mathbf{e}_2 \rangle = \langle \mathbf{e}_1,A\mathbf{e}_1\rangle \langle \mathbf{e}_2,A\mathbf{e}_2\rangle - \langle \mathbf{e}_1,A\mathbf{e}_2\rangle \langle \mathbf{e}_2,A\mathbf{e}_1\rangle.

This is the formula for the determinant of the 2 \times 2 matrix

[A]_E = \begin{bmatrix} \langle \mathbf{e}_1,A\mathbf{e}_1\rangle & \langle \mathbf{e}_1,A\mathbf{e}_2\rangle \\ \langle \mathbf{e}_2,A\mathbf{e}_1\rangle & \langle \mathbf{e}_2,A\mathbf{e}_2\rangle \end{bmatrix}

which you likely learned in high school algebra (if you didn’t learn it then, you just learned it now).
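For readers who like to see such formulas run, here is a sketch (assuming NumPy is available; the helper name is mine) that evaluates the permutation sum directly and compares it with numpy.linalg.det on a random matrix:

```python
from itertools import permutations
import numpy as np

def det_by_permutations(A):
    """Evaluate det(A) via the permutation sum; sgn(pi) = (-1)^(n - number of cycles of pi)."""
    n = A.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        seen, cycles = [False] * n, 0
        for s in range(n):
            if not seen[s]:
                cycles += 1
                x = s
                while not seen[x]:
                    seen[x] = True
                    x = perm[x]
        # the factor A[i, perm[i]] is the matrix element <e_i, A e_{perm(i)}>
        total += (-1) ** (n - cycles) * np.prod([A[i, perm[i]] for i in range(n)])
    return total

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
print(det_by_permutations(A), np.linalg.det(A))   # agree up to rounding
```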

To reiterate, for any operator A \in \mathrm{End}V, the determinant \det(A) is the constant such that

A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n = \det(A)\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n

for all \mathbf{v}_1,\dots,\mathbf{v}_n. Taking the scalar product with a unit volume tensor \omega on either side, this becomes

\langle \omega, A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n \rangle= \det(A)\langle \omega,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle.

The scalar product on the LHS is the oriented volume of the parallelepiped \mathcal{P}(A\mathbf{v}_1,\dots,A\mathbf{v}_n), while the scalar product on the right is the oriented volume of the parallelepiped \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n). Assuming that the latter volume is nonzero — i.e. that \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is a linearly independent set in \mathbf{V} — we thus have

\det(A) = \frac{\langle \omega, A\mathbf{v}_1 \wedge \dots \wedge A\mathbf{v}_n \rangle}{\langle \omega,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle},

which says that \det(A) is the scalar factor by which the oriented volume of \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) transforms when each of the vectors \mathbf{v}_1,\dots,\mathbf{v}_n transforms by A. As an application of this geometric interpretation of the determinant, let us calculate the area of an ellipse whose semimajor and semiminor axes have lengths a and b, respectively. The basic observation is that such an ellipse is the image of the unit disc in the Euclidean plane \mathbf{V}= \mathbb{R}^2 under the linear transformation A \in \mathrm{End}\mathbb{R}^2 defined by

A\mathbf{e}_1=a\mathbf{e}_1,\ A\mathbf{e}_2=b\mathbf{e}_2,

where \mathbf{e}_1=(1,0) and \mathbf{e}_2=(0,1) are the standard basis vectors. The determinant of the operator A is

\det(A) = \frac{\langle \mathbf{e}_1 \wedge \mathbf{e}_2, A\mathbf{e}_1 \wedge  A\mathbf{e}_2\rangle}{\langle \mathbf{e}_1 \wedge \mathbf{e}_2, \mathbf{e}_1 \wedge  \mathbf{e}_2\rangle} = ab.

Let \varepsilon > 0 be an extremely small positive number, and tile the unit disc with translated copies of the tiny square \mathcal{P}(\varepsilon \mathbf{e}_1,\varepsilon \mathbf{e}_2), so that the tiling approximates the area of the disc up to an exceedingly small error. The oriented area of this tiny square is

\langle \mathbf{e}_1 \wedge \mathbf{e}_2, \varepsilon\mathbf{e}_1 \wedge \varepsilon \mathbf{e}_2 \rangle = \varepsilon^2 \langle \mathbf{e}_1 \wedge \mathbf{e}_2,\mathbf{e}_1 \wedge \mathbf{e}_2 \rangle = \varepsilon^2.

The image of the tiny square under the transformation A is the tiny parallelogram \mathcal{P}(A\varepsilon \mathbf{e}_1,A\varepsilon \mathbf{e}_2), and the oriented area of this tiny parallelogram is

\det(A) \langle \mathbf{e}_1 \wedge \mathbf{e}_2, \varepsilon\mathbf{e}_1 \wedge \varepsilon \mathbf{e}_2 \rangle = \varepsilon^2 ab.

We conclude that the area of the parallelogram tiling which approximates the ellipse is the area of the square tiling which approximates the circle scaled by ab, and hence that the area of the ellipse equals the area of the disc scaled by ab. Since the area of the unit disc is \pi, the area of the ellipse in question is \pi a b.
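Here is a small Python sketch of the tiling argument (my own illustration; the step size and the axis lengths are arbitrary choices): we count the tiny squares whose centers lie in the unit disc, so their total area approximates \pi, and scaling by \det(A)=ab approximates the area of the ellipse.

```python
import math

a, b = 3.0, 2.0   # semi-axis lengths of the ellipse
eps = 0.01        # side length of the tiny squares

# Count the tiny squares (indexed by their centers) whose centers lie in the unit disc.
k = int(1 / eps) + 1
count = sum(1 for i in range(-k, k + 1) for j in range(-k, k + 1)
            if (i * eps) ** 2 + (j * eps) ** 2 <= 1.0)

disc_area = count * eps ** 2          # approximately pi
ellipse_area = disc_area * a * b      # each square's image under A has area eps^2 * ab
print(ellipse_area, math.pi * a * b)  # the two values are close
```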

Math 31AH: Lecture 23

Let \mathbf{V} be an n-dimensional Euclidean space. In Lecture 22, we constructed the Euclidean spaces \mathbf{V}^{\otimes d} of degree d tensors, as well as the subspaces \mathbf{V}^{\vee d} and \mathbf{V}^{\wedge d} of \mathbf{V}^{\otimes d} consisting of symmetric and antisymmetric tensors, respectively. Let us recall the formulas for the scalar product in these vector spaces induced by the scalar product \langle \cdot,\cdot \rangle in \mathbf{V}:

\langle \mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d,\mathbf{w}_1 \otimes \dots \otimes \mathbf{w}_d \rangle = \prod\limits_{i=1}^d \langle \mathbf{v}_i,\mathbf{w}_i \rangle \\ \langle \mathbf{v}_1 \vee \dots \vee \mathbf{v}_d,\mathbf{w}_1 \vee \dots \vee \mathbf{w}_d \rangle = \frac{1}{d!}\sum\limits_{\pi \in \mathrm{S}(d)}\prod\limits_{i=1}^d \langle \mathbf{v}_i,\mathbf{w}_{\pi(i)} \rangle \\ \langle \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d,\mathbf{w}_1 \wedge \dots \wedge \mathbf{w}_d \rangle = \frac{1}{d!}\sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi)\prod\limits_{i=1}^d \langle \mathbf{v}_i,\mathbf{w}_{\pi(i)} \rangle.

To be precise, the first of these formulas is a definition, while the second and third formulas are consequences of this definition and the fact that \mathbf{V}^{\vee d} and \mathbf{V}^{\wedge d} are subspaces of \mathbf{V}^{\otimes d}.

In this lecture we focus on the Euclidean space \mathbf{V}^{\wedge d}, which turns out to be the most important of the three. In fact, we will now be dealing with the antisymmetric tensor powers \mathbf{V}^{\wedge d} of \mathbf{V} as standalone objects, rather than viewing them as subspaces of the tensor powers \mathbf{V}^{\otimes d}, and as such it will be more convenient to renormalize the scalar product to

\langle \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d,\mathbf{w}_1 \wedge \dots \wedge \mathbf{w}_d \rangle = \sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi)\prod\limits_{i=1}^d \langle \mathbf{v}_i,\mathbf{w}_{\pi(i)} \rangle,

dropping the factor of \frac{1}{d!} which was previously hanging around outside the sum. We are free to do this, since rescaling a scalar product by any positive constant yields a scalar product.
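For concreteness, here is a short Python sketch of the renormalized scalar product (the helper names perm_sign and wedge_inner are invented for this illustration). It evaluates the signed permutation sum directly, and also checks that the result agrees with the determinant of the matrix of pairwise scalar products \langle \mathbf{v}_i,\mathbf{w}_j\rangle, which is what the sum works out to by the usual expansion of a determinant.

```python
import itertools

import numpy as np

def perm_sign(pi):
    """Sign of a permutation of 0,...,d-1, computed by counting inversions."""
    return (-1) ** sum(1 for i in range(len(pi))
                       for j in range(i + 1, len(pi)) if pi[i] > pi[j])

def wedge_inner(vs, ws):
    """<v_1 ^ ... ^ v_d, w_1 ^ ... ^ w_d> with the renormalized scalar product."""
    d = len(vs)
    return sum(perm_sign(pi) * np.prod([vs[i] @ ws[pi[i]] for i in range(d)])
               for pi in itertools.permutations(range(d)))

vs = [np.array([1.0, 2.0, 0.0]), np.array([0.0, 1.0, 3.0])]
ws = [np.array([2.0, 0.0, 1.0]), np.array([1.0, 1.0, 1.0])]

gram = np.array([[v @ w for w in ws] for v in vs])  # matrix of scalar products <v_i, w_j>
print(wedge_inner(vs, ws), np.linalg.det(gram))     # the two numbers agree
```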

We have already seen that \{ \mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly dependent set in \mathbf{V} if and only if

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{0},

where

\mathbf{0}=\mathbf{0}_{\mathbf{V}^{\otimes d}} = \underbrace{\mathbf{0}_\mathbf{V} \otimes \dots \otimes \mathbf{0}_\mathbf{V}}_{d \text{ factors}}

denotes the zero tensor of degree d. Equivalently, \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly dependent set in \mathbf{V} if and only if

\|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d\| = \sqrt{\langle \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d \rangle}=0.

Today, we will expand on this by showing that length in \mathbf{V}^{\wedge n} provides a good definition of volume in \mathbf{V}.

Definition 1: For any vectors \mathbf{v}_1,\dots,\mathbf{v}_n, the corresponding parallelepiped is the subset of \mathbf{V} defined by

\mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) := \{ t_1\mathbf{v}_1 + \dots + t_n\mathbf{v}_n \colon 0 \leq t_i \leq 1\}.

We define the volume of this parallelepiped by

\mathrm{Vol} \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) := \| \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\|.

According to Definition 1, we should interpret the length of the tensor \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n to be the volume of the parallelepiped spanned by its factors. This makes sense, in that the length of \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n is zero if and only if \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is a linearly dependent set in \mathbf{V}, which means that \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) lies in a subspace of \mathbf{V} whose dimension is strictly less than n. This corresponds to our intuition that the length of a point is zero, the area of a line segment is zero, and the volume of a parallelogram is zero. Indeed, Theorem 2 from Lecture 22 applied to Definition 1 yields the following statement.

Theorem 1: The set \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is a basis in \mathbf{V} if and only if \mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n)>0.

As an example, let us consider the case n=1, i.e. \mathbf{V} is a 1-dimensional vector space. Let \mathbf{v} \in \mathbf{V}. We then have that \mathcal{P}(\mathbf{v}) consists of all vectors of the form \mathbf{w}=t\mathbf{v} with 0 \leq t \leq 1, which can be visualized as the line segment joining \mathbf{0}_\mathbf{V} to \mathbf{v} in \mathbf{V}. The volume of \mathcal{P}(\mathbf{v}) is then \|\mathbf{v}\|, which is the length of this line segment. So, according to our definition, volume and length are the same thing in one dimension, which seems reasonable.

As another example, let us consider the case n=2, i.e. \mathbf{V} is a 2-dimensional Euclidean space. Let \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}. Then, \mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) can be visualized as the set of all vectors in \mathbf{V} which lie on or inside the parallelogram with vertices

\mathbf{0}_\mathbf{V} = 0\mathbf{v}_1 + 0\mathbf{v}_2 \\ \mathbf{v}_1 = 1\mathbf{v}_1 + 0\mathbf{v}_2 \\ \mathbf{v}_2 = 0\mathbf{v}_1 + 1\mathbf{v}_2 \\ \mathbf{v}_1 + \mathbf{v}_2 = 1\mathbf{v}_1 + 1\mathbf{v}_2.

According to our definition, the volume of \mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) is the number

\mathrm{Vol} \mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) = \|\mathbf{v}_1 \wedge \mathbf{v}_2\|.

Let us evaluate this number more explicitly. Let E = \{\mathbf{e}_1,\mathbf{e}_2\} be an orthonormal basis of \mathbf{V}. We then have

\mathbf{v}_1 \wedge \mathbf{v}_2  = (\langle \mathbf{e}_1,\mathbf{v}_1\rangle \mathbf{e}_1 + \langle \mathbf{e}_2,\mathbf{v}_1\rangle \mathbf{e}_2)\wedge (\langle \mathbf{e}_1,\mathbf{v}_2\rangle \mathbf{e}_1 + \langle \mathbf{e}_2,\mathbf{v}_2\rangle \mathbf{e}_2) \\= \left( \langle \mathbf{e}_1,\mathbf{v}_1\rangle\langle \mathbf{e}_2,\mathbf{v}_2\rangle - \langle \mathbf{e}_1,\mathbf{v}_2\rangle\langle\mathbf{e}_2,\mathbf{v}_1\rangle\right)\mathbf{e}_1 \wedge \mathbf{e}_2,

so that

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) = |\langle \mathbf{e}_1,\mathbf{v}_1\rangle\langle \mathbf{e}_2,\mathbf{v}_2\rangle - \langle \mathbf{e}_1,\mathbf{v}_2\rangle\langle\mathbf{e}_2,\mathbf{v}_1\rangle| \|\mathbf{e}_1 \wedge \mathbf{e}_2\|.

Since

\|\mathbf{e}_1 \wedge \mathbf{e}_2\|^2 = \langle \mathbf{e}_1 \wedge \mathbf{e}_2,\mathbf{e}_1\wedge \mathbf{e}_2 \rangle = \langle \mathbf{e}_1,\mathbf{e}_1\rangle \langle \mathbf{e}_2,\mathbf{e}_2\rangle-\langle \mathbf{e}_1,\mathbf{e}_2\rangle \langle \mathbf{e}_2,\mathbf{e}_1\rangle = \|\mathbf{e}_1\|^2\|\mathbf{e}_2\|^2=1,

we conclude that

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) = |\langle \mathbf{e}_1,\mathbf{v}_1\rangle\langle \mathbf{e}_2,\mathbf{v}_2\rangle - \langle \mathbf{e}_1,\mathbf{v}_2\rangle\langle\mathbf{e}_2,\mathbf{v}_1\rangle|.

There are a couple of things we can observe about this number. First, we can also write it as

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) =|\langle \mathbf{e}_1 \wedge \mathbf{e}_2,\mathbf{v}_1 \wedge \mathbf{v}_2\rangle|;

as we will see momentarily, this is because \{\mathbf{e}_1\wedge \mathbf{e}_2\} is an orthonormal basis of \mathbf{V}^{\wedge 2}. Second, if you are familiar with the definition of the determinant of a 2 \times 2 matrix,

\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22}\end{bmatrix} = a_{11}a_{22}-a_{12}a_{21},

it is apparent that \mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) is equal to the absolute value of the determinant of the matrix

\begin{bmatrix} \langle \mathbf{e}_1,\mathbf{v}_1\rangle & \langle \mathbf{e}_1,\mathbf{v}_2\rangle\\ \langle \mathbf{e}_2,\mathbf{v}_1\rangle & \langle \mathbf{e}_2,\mathbf{v}_2\rangle\end{bmatrix}

whose columns are the coordinates of the vectors \mathbf{v}_1,\mathbf{v}_2 relative to the basis E=\{\mathbf{e}_1,\mathbf{e}_2\}. This coincidence will also be explained shortly. First, however, we want to ask a natural question: what if we had done the above computation using a different orthonormal basis F=\{\mathbf{f}_1,\mathbf{f}_2\} of \mathbf{V}? The same calculations would then have led us to the formula

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) = |\langle \mathbf{f}_1,\mathbf{v}_1\rangle\langle \mathbf{f}_2,\mathbf{v}_2\rangle - \langle \mathbf{f}_1,\mathbf{v}_2\rangle\langle\mathbf{f}_2,\mathbf{v}_1\rangle|,

which is apparently different from our first formula. But volume should be a geometric entity, and its computation should not depend on the choice of coordinates, i.e. on a choice of basis. We can see our way through this by considering the orthogonal transformation U \in \mathrm{Aut}\mathbf{V} uniquely determined by

U\mathbf{e}_1 = \mathbf{f}_1,\ U\mathbf{e}_2 = \mathbf{f}_2.

We then see that

|\langle \mathbf{f}_1,\mathbf{v}_1\rangle\langle \mathbf{f}_2,\mathbf{v}_2\rangle - \langle \mathbf{f}_1,\mathbf{v}_2\rangle\langle\mathbf{f}_2,\mathbf{v}_1\rangle| = |\langle U\mathbf{e}_1,\mathbf{v}_1\rangle\langle U\mathbf{e}_2,\mathbf{v}_2\rangle - \langle U\mathbf{e}_1,\mathbf{v}_2\rangle\langle U\mathbf{e}_2,\mathbf{v}_1\rangle|\\=|\langle \mathbf{e}_1,U^{-1}\mathbf{v}_1\rangle\langle \mathbf{e}_2,U^{-1}\mathbf{v}_2\rangle - \langle \mathbf{e}_1,U^{-1}\mathbf{v}_2\rangle\langle \mathbf{e}_2,U^{-1}\mathbf{v}_1\rangle|,

which is equivalent to the assertion that

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\mathbf{v}_2) = \mathrm{Vol}\mathcal{P}(U^{-1}\mathbf{v}_1,U^{-1}\mathbf{v}_2).

This makes perfect geometric sense: since orthogonal transformations preserve lengths and angles, the parallelogram \mathcal{P}(U^{-1}\mathbf{v}_1,U^{-1}\mathbf{v}_2) is just a rotated and/or flipped copy of \mathcal{P}(\mathbf{v}_1,\mathbf{v}_2), which naturally has the same area.
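As a quick sanity check of this basis-independence (again my own illustration, not part of the lecture), the following Python snippet rotates a pair of vectors by an orthogonal matrix and confirms that the two-dimensional volume formula gives the same answer before and after; here U^{-1}=U^{T} since U is orthogonal.

```python
import numpy as np

def area(v1, v2):
    """Vol P(v1, v2) = |<e1,v1><e2,v2> - <e1,v2><e2,v1>| in the standard basis of R^2."""
    return abs(v1[0] * v2[1] - v2[0] * v1[1])

theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # an orthogonal transformation of R^2

v1 = np.array([2.0, 1.0])
v2 = np.array([0.5, 3.0])

# P(U^{-1}v1, U^{-1}v2) is a rotated copy of P(v1, v2); U^{-1} = U^T since U is orthogonal.
print(area(v1, v2), area(U.T @ v1, U.T @ v2))      # the two areas agree
```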

The key step in understanding the linear algebra underlying our definition of n-dimensional volume is finding an orthonormal basis of \mathbf{V}^{\wedge d}, the space of antisymmetric tensors of degree d. We have solved this problem for \mathbf{V}^{\otimes d}, the space of all tensors of degree d; back in Lecture 22, we showed that any orthonormal basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} in \mathbf{V} induces a corresponding orthonormal basis

E^{\otimes d} = \{\mathbf{e}_{i(1)} \otimes \dots \otimes \mathbf{e}_{i(d)} \colon i \in \mathrm{Fun}(d,n)\}

in \mathbf{V}^{\otimes d}, so that \dim \mathbf{V}^{\otimes d} = n^d. To obtain the analogous result for \mathbf{V}^{\wedge d}, we need to introduce the subset of \mathrm{Fun}(d,n) consisting of increasing functions:

\mathrm{Inc}(d,n):=\{i \in \mathrm{Fun}(d,n) \colon i(1) < i(2) < \dots < i(d)\}.

Theorem 2: For any 1 \leq d \leq n, the set

E^{\wedge d} = \{\mathbf{e}_{i(1)} \wedge \dots \wedge \mathbf{e}_{i(d)} \colon i \in \mathrm{Inc}(d,n)\}

is an orthonormal basis in \mathbf{V}^{\wedge d}. For all d>n, the only antisymmetric tensor of degree d is the zero tensor, i.e. \mathbf{V}^{\wedge d}=\{\mathbf{0}_{\mathbf{V}^{\otimes d}}\}.

Proof: First we prove that E^{\wedge d} spans \mathbf{V}^{\wedge d}. To do this, it is sufficient to show that any simple tensor

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d

is a linear combination of the tensors in E^{\wedge d}. But this is clear from the multilinearity of the wedge product: we have

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d \\ = \left( \sum_{i=1}^n \langle \mathbf{e}_i, \mathbf{v}_1\rangle \mathbf{e}_i\right) \wedge \dots \wedge \left( \sum_{i=1}^n \langle \mathbf{e}_i, \mathbf{v}_d\rangle \mathbf{e}_i\right) \\ = \sum\limits_{i \in \mathrm{Fun}(d,n)} \langle \mathbf{e}_{i(1)},\mathbf{v}_1\rangle \dots \langle \mathbf{e}_{i(d)},\mathbf{v}_d\rangle \mathbf{e}_{i(1)} \wedge \dots \wedge \mathbf{e}_{i(d)}.

Now, since a wedge product which contains two copies of the same vector is zero, the only nonzero terms in the above sum are those corresponding to injective functions, i.e. those i \in \mathrm{Fun}(d,n) such that the numbers i(1),\dots,i(d) are pairwise distinct. If d > n, there are no such functions, and consequently \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_{d} = \mathbf{0}. If d \leq n, then for any injection i, the factors of the nonzero tensor \mathbf{e}_{i(1)} \wedge \dots \wedge \mathbf{e}_{i(d)} can be sorted so that the distinct indices are in increasing order; doing this scales the original tensor by a factor of \pm 1, and hence exhibits it as a scalar multiple of an element of E^{\wedge d}.

Now we prove that E^{\wedge d} is an orthonormal set in \mathbf{V}^{\wedge d} (in particular, this will imply linear independence). Let i,j \in \mathrm{Inc}(d,n) be distinct increasing functions. We then have that

\langle \mathbf{e}_{i(1)} \wedge \dots \wedge \mathbf{e}_{i(d)},\mathbf{e}_{j(1)} \wedge \dots \wedge \mathbf{e}_{j(d)} \rangle = \sum\limits_{\sigma \in \mathrm{S}(d)} \mathrm{sgn}(\sigma) \langle \mathbf{e}_{i(1)},\mathbf{e}_{j(\sigma(1))}\rangle \dots \langle \mathbf{e}_{i(d)},\mathbf{e}_{j(\sigma(d))}\rangle.

Since there is no permutation \sigma \in \mathrm{S}(d) which can transform the list i(1) < \dots < i(d) of distinct numbers into the different list j(1) < \dots < j(d) of distinct numbers, every term in the sum is zero. This proves that E^{\wedge d} is an orthogonal set in \mathbf{V}^{\wedge d}. To prove that each element of E^{\wedge d} is of unit length, repeat the above argument with i=j. Then, the only nonzero term in the sum is that corresponding to the identity permutation:

\langle \mathbf{e}_{i(1)},\mathbf{e}_{i(1)} \rangle \dots \langle \mathbf{e}_{i(d)},\mathbf{e}_{i(d)} \rangle = 1.

— Q.E.D.

Corollary 1: The dimension of \mathbf{V}^{\wedge d} is {n \choose d}.

Proof: Any increasing function i \in \mathrm{Inc}(d,n), i.e. i(1) < \dots < i(d), corresponds to a unique cardinality d subset \{i(1),\dots,i(d)\} of \{1,\dots,n\}, and conversely. By definition, the binomial coefficient {n \choose d} is the number of such subsets.

— Q.E.D.
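If it helps to see the counting concretely, here is a tiny Python check (my own) that the increasing functions in \mathrm{Inc}(d,n) are in bijection with the d-element subsets of \{1,\dots,n\}.

```python
import itertools
from math import comb

n, d = 5, 3

# Elements of Inc(d, n): strictly increasing sequences i(1) < ... < i(d) drawn from {1,...,n}.
inc = list(itertools.combinations(range(1, n + 1), d))

print(len(inc), comb(n, d))  # both equal 10, the dimension of V^{wedge d} for n=5, d=3
```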

Although it may not seem so, the most important consequence of Corollary 1 is that

\dim \mathbf{V}^{\wedge n} = {n \choose n} =1.

Indeed, according to Theorem 2, the set E^{\wedge n} =\{\omega\} consisting of the single tensor

\omega = \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n

is a basis in \mathbf{V}^{\wedge n}. This means that every antisymmetric tensor of degree n is a scalar multiple of \omega, so that in particular for any vectors \mathbf{v}_1,\dots,\mathbf{v}_n \in \mathbf{V} we have

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n = a \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n

for some scalar a \in \mathbb{R}, the value of which depends on the vectors \mathbf{v}_1,\dots,\mathbf{v}_n. Indeed, since \{\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n\} is an orthonormal basis in \mathbf{V}^{\wedge n}, we have

a = \langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle.

We thus have the following alternative description of volume.

Proposition 1: For any orthonormal basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} in \mathbf{V}, we have

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) = |\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle|.

Proof: We will give two proofs, one computational and one conceptual.

Computational:

\mathrm{Vol}\mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) \\=\|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\| \\ = \sqrt{\langle \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle} \\ = \sqrt{\langle\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n\rangle} \\ = \sqrt{\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle^2\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n\rangle} \\ = \sqrt{\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\rangle^2} \\ =|\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle|.

Conceptual: by Cauchy-Schwarz,

|\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle| \leq \|\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n\| \|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\|.

We know that equality holds in the Cauchy-Schwarz inequality precisely when linear dependence holds, which is the case here because \dim \mathbf{V}^{\wedge n}=1. Thus

|\langle \mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle| =\|\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n\| \|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\| = \|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n\|.

— Q.E.D.
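Here is a numerical check of Proposition 1 in the standard basis of \mathbb{R}^n (a sketch of mine; it uses the fact, also checked in code earlier, that the signed permutation sum defining the scalar product of simple wedges equals the determinant of the matrix of pairwise scalar products).

```python
import numpy as np

def wedge_inner(vs, ws):
    """<v_1 ^ ... ^ v_n, w_1 ^ ... ^ w_n>, evaluated as the determinant of the matrix
    of scalar products <v_i, w_j> (which is what the signed permutation sum equals)."""
    return np.linalg.det(np.array([[v @ w for w in ws] for v in vs]))

rng = np.random.default_rng(1)
n = 4
vs = [rng.standard_normal(n) for _ in range(n)]
es = [np.eye(n)[i] for i in range(n)]   # the standard orthonormal basis of R^n

vol = np.sqrt(wedge_inner(vs, vs))      # Vol P(v_1,...,v_n) = ||v_1 ^ ... ^ v_n||
oriented = wedge_inner(es, vs)          # <e_1 ^ ... ^ e_n, v_1 ^ ... ^ v_n>
print(vol, abs(oriented))               # the two values agree
```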

Any tensor \omega \in \mathbf{V}^{\wedge n} of the form

\omega=\mathbf{e}_1 \wedge \dots \wedge \mathbf{e}_n,

where E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} is an orthonormal basis in \mathbf{V}, is called a unit volume tensor for \mathbf{V}. Indeed, for such a tensor we have \|\omega\|=1, corresponding exactly to the fact that \mathcal{P}(\mathbf{e}_1,\dots,\mathbf{e}_n) is an n-dimensional unit box. In fact, because \mathbf{V}^{\wedge n} is 1-dimensional, there are only two unit volume tensors: if \omega \in \mathbf{V}^{\wedge n} satisfies \|\omega\|=1, then the only other tensor in \mathbf{V}^{\wedge n} with this property is -\omega.

Definition 2: A triple (\mathbf{V},\langle \cdot,\cdot \rangle,\omega) consisting of a finite-dimensional vector space \mathbf{V}, a scalar product \langle \cdot,\cdot \rangle on \mathbf{V}, and a unit volume tensor \omega \in \mathbf{V}^{\wedge n}, is called an oriented Euclidean space.

For example, consider the case of the Euclidean plane, i.e. \mathbf{V}=\mathbb{R}^2 with the standard scalar product. Giving the Euclidean plane an orientation means choosing one of the two tensors

(1,0) \wedge (0,1) \quad\text{ or }\quad (0,1) \wedge (1,0).

Choosing the first tensor gives the Euclidean plane a counterclockwise orientation, while choosing the second gives the plane a clockwise orientation.

Definition 3: Given a unit volume tensor \omega \in \mathbf{V}^{\wedge n}, the corresponding determinant is the function

\det \colon \underbrace{\mathbf{V} \times \dots \times \mathbf{V}}_{n \text{ copies}} \to \mathbb{R}

defined by

\det(\mathbf{v}_1,\dots,\mathbf{v}_n) = \langle \omega,\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_n \rangle.

Thus \det(\mathbf{v}_1,\dots,\mathbf{v}_n) is the oriented volume of the parallelepiped \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n), meaning that it coincides with \mathrm{Vol} \mathcal{P}(\mathbf{v}_1,\dots,\mathbf{v}_n) up to a factor of \pm 1. So, “oriented volume function” might be a more natural name than “determinant.” However, the term determinant is also appropriate, given the following result (which is equivalent to Theorem 1 above).

Theorem 3: The set \{\mathbf{v}_1,\dots,\mathbf{v}_n\} is a basis in \mathbf{V} if and only if \det(\mathbf{v}_1,\dots,\mathbf{v}_n) \neq 0.

It is important to realize that the Euclidean space \mathbf{V} actually supports two distinct determinant functions, one for each of the two unit volume tensors in the one-dimensional space \mathbf{V}^{\wedge n}. For example, if n=2 and we have two orthonormal bases E=\{\mathbf{e}_1,\mathbf{e}_2\} and F=\{\mathbf{f}_1,\mathbf{f}_2\} then we have two corresponding determinant functions

\det_E(\mathbf{v}_1,\mathbf{v}_2) = \langle \mathbf{e}_1 \wedge \mathbf{e}_2, \mathbf{v}_1 \wedge \mathbf{v}_2\rangle \quad\text{ and }\quad \det_F(\mathbf{v}_1,\mathbf{v}_2) = \langle \mathbf{f}_1 \wedge \mathbf{f}_2, \mathbf{v}_1 \wedge \mathbf{v}_2\rangle

which may or may not coincide. For example, if

\mathbf{f}_1=\mathbf{e}_2,\ \mathbf{f}_2=\mathbf{e}_1,

then we have

\det_F(\mathbf{v}_1,\mathbf{v}_2) = \langle \mathbf{f}_1 \wedge \mathbf{f}_2,\mathbf{v}_1 \wedge \mathbf{v}_2 \rangle = \langle \mathbf{e}_2 \wedge \mathbf{e}_1,\mathbf{v}_1\wedge \mathbf{v}_2\rangle=-\det_E(\mathbf{v}_1,\mathbf{v}_2).
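A quick numerical illustration of this sign flip (mine, with an ad hoc helper det2): swapping the two basis vectors negates the determinant while leaving its absolute value, the volume, unchanged.

```python
import numpy as np

def det2(f1, f2, v1, v2):
    """det_F(v1, v2) = <f1 ^ f2, v1 ^ v2> = <f1,v1><f2,v2> - <f1,v2><f2,v1>."""
    return (f1 @ v1) * (f2 @ v2) - (f1 @ v2) * (f2 @ v1)

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
v1, v2 = np.array([2.0, 1.0]), np.array([0.5, 3.0])

# Swapping the basis vectors reverses the orientation and hence the sign of the determinant.
print(det2(e1, e2, v1, v2), det2(e2, e1, v1, v2))
```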

In Lecture 24, we shall discuss determinants of operators on Euclidean space, which are closely related to — but not quite the same as — oriented volumes of parallelepipeds in Euclidean space.

Math 31AH: Lecture 22

Let us now use the symmetric and antisymmetric tensor products to define two subspaces of the tensor square \mathbf{V} \otimes \mathbf{V} which store “unevaluated” symmetric and antisymmetric tensor products of vectors from \mathbf{V}. The symmetric square of \mathbf{V} is the subspace \mathbf{V} \vee \mathbf{V} of \mathbf{V} \otimes \mathbf{V} spanned by all symmetric tensor products

\mathbf{v}_1 \vee \mathbf{v}_2, \quad \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}.

Elements of \mathbf{V} \vee \mathbf{V} are called symmetric tensors. Similarly, the antisymmetric square of \mathbf{V} is the subspace \mathbf{V} \wedge \mathbf{V} of \mathbf{V} \otimes \mathbf{V} spanned by all antisymmetric tensor products,

\mathbf{v}_1 \wedge \mathbf{v}_2, \quad \mathbf{v}_1,\mathbf{v}_2 \in \mathbf{V}.

Elements of \mathbf{V} \wedge \mathbf{V} are called antisymmetric tensors.

All of what we have said above can be generalized in a natural way to products of more than two vectors. More precisely, for any natural number d \in \mathbb{N}, we can define the dth tensor power of the vector space \mathbf{V} to be the new vector space \mathbf{V}^{\otimes d} spanned by all “unevaluated” products

\mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d

of d vectors \mathbf{v}_1,\dots,\mathbf{v}_d. The only feature of such multiple unevaluated products is that they are “multilinear,” which really just means that they behave like ordinary products (sans commutativity). For example, in the case d=3, this just means that we have the following three identities in the vector space \mathbf{V}^{\otimes 3}: for any scalars a_1,a_2 \in \mathbb{R}

(a_1\mathbf{u}_1 + a_2\mathbf{u}_2) \otimes \mathbf{v} \otimes \mathbf{w} = a_1\mathbf{u}_1 \otimes \mathbf{v} \otimes \mathbf{w} + a_2\mathbf{u}_2 \otimes \mathbf{v} \otimes \mathbf{w}

for all \mathbf{u}_1,\mathbf{u}_2,\mathbf{v},\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \otimes (a_1\mathbf{v}_1 + a_2\mathbf{v}_2) \otimes \mathbf{w} = a_1 \mathbf{u} \otimes \mathbf{v}_1 \otimes \mathbf{w} + a_2\mathbf{u} \otimes \mathbf{v}_2 \otimes \mathbf{w}

for all \mathbf{u},\mathbf{v}_1,\mathbf{v}_2,\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \otimes \mathbf{v} \otimes (a_1\mathbf{w}_1 + a_2\mathbf{w}_2) = a_1 \mathbf{u} \otimes \mathbf{v} \otimes \mathbf{w}_1 + a_2 \mathbf{u} \otimes \mathbf{v} \otimes \mathbf{w}_2

for all \mathbf{u},\mathbf{v},\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V}. If \mathbf{V} comes with a scalar product \langle \cdot,\cdot \rangle, we can use this to define a scalar product on \mathbf{V}^{\otimes d} in a very simple way by declaring

\langle \mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d,\mathbf{w}_1 \otimes \dots \otimes \mathbf{w}_d \rangle = \langle \mathbf{v}_1,\mathbf{w}_1 \rangle \dots \langle \mathbf{v}_d, \mathbf{w}_d\rangle.

Even better, we can use the scalar product so defined to construct an orthonormal basis of \mathbf{V}^{\otimes d} from a given orthonormal basis E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} of \mathbf{V}: such a basis is simply given by all tensor products with d factors in which each factor is one of the basis vectors \mathbf{e}_1,\dots,\mathbf{e}_n. More precisely, these are the tensors

\mathbf{e}_{i(1)} \otimes \mathbf{e}_{i(2)} \otimes \dots \otimes \mathbf{e}_{i(d)}, \quad i \in \mathrm{Fun}(d,n),

where \mathrm{Fun}(d,n) is a fun notation for the set of all functions

i \colon \{1,\dots,d\} \to \{1,\dots,n\}.

In particular, since the cardinality of \mathrm{Fun}(d,n) is n^d (make n choices d times), the dimension of the vector space \mathbf{V}^{\otimes d} is n^d.

Example 1: If \mathbf{V} is a 2-dimensional vector space with orthonormal basis \{\mathbf{e}_1,\mathbf{e}_2\}, then an orthonormal basis of \mathbf{V}^{\otimes 3} is given by the tensors

\mathbf{e}_1 \otimes \mathbf{e}_1 \otimes \mathbf{e}_1, \\ \mathbf{e}_1 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2, \mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1,\mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_1, \\ \mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_2, \mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2, \mathbf{e}_2 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1, \\ \mathbf{e}_2 \otimes \mathbf{e}_2 \otimes \mathbf{e}_2.
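In Python, the index set \mathrm{Fun}(d,n) is conveniently produced by itertools.product, and the basis tensors themselves can be realized as iterated outer products of basis vectors; here is a sketch of mine reproducing Example 1.

```python
import itertools

import numpy as np

n, d = 2, 3
es = [np.eye(n)[i] for i in range(n)]   # orthonormal basis e_1, ..., e_n of R^n

def simple_tensor(factors):
    """Realize v_1 (x) ... (x) v_d as a d-dimensional array via iterated outer products."""
    t = factors[0]
    for v in factors[1:]:
        t = np.tensordot(t, v, axes=0)
    return t

# One basis tensor e_{i(1)} (x) ... (x) e_{i(d)} for each function i in Fun(d, n).
basis = [simple_tensor([es[i] for i in idx])
         for idx in itertools.product(range(n), repeat=d)]

print(len(basis))   # n^d = 8, matching the count of basis tensors listed in Example 1
```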

We now define the d-fold symmetric and antisymmetric tensor products. These products rely on the concept of permutations.

Reading Assignment: Familiarize yourself with permutations. What is important for our purposes is that you understand how to multiply permutations, and that you understand what the sign of a permutation is. Feel free to ask questions as needed.

Definition 1: For any d \in \mathbb{N}, and any vectors \mathbf{v}_1,\dots,\mathbf{v}_d \in \mathbf{V}, we define the symmetric tensor product of these vectors by

\mathbf{v}_1 \vee \dots \vee \mathbf{v}_d = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \mathbf{v}_{\pi(1)} \otimes \dots \otimes \mathbf{v}_{\pi(d)},

and denote by \mathbf{V}^{\vee d} the subspace of \mathbf{V}^{\otimes d} spanned by all symmetric tensor products of d vectors from \mathbf{V}. Likewise, we define the antisymmetric tensor product of \mathbf{v}_1,\dots,\mathbf{v}_d by

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi)\mathbf{v}_{\pi(1)} \otimes \dots \otimes \mathbf{v}_{\pi(d)},

and denote by \mathbf{V}^{\wedge d} the subspace of \mathbf{V}^{\otimes d} spanned by all antisymmetric tensor products of d vectors from \mathbf{V}.

Note that, in the case d=2, this definition coincides with the definitions

\mathbf{v}_1 \vee \mathbf{v}_2 = \frac{1}{2}\left( \mathbf{v}_1\otimes \mathbf{v}_2 + \mathbf{v}_2 \otimes \mathbf{v}_1\right)

and

\mathbf{v}_1 \wedge \mathbf{v}_2 = \frac{1}{2}\left(\mathbf{v}_1\otimes \mathbf{v}_2 - \mathbf{v}_2 \otimes \mathbf{v}_1\right)

from Lecture 21.
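Definition 1 can be implemented quite literally: average the permuted tensor products, with or without signs. Here is a Python sketch (the helper names are my own) realizing \mathbf{v}_1 \vee \dots \vee \mathbf{v}_d and \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d as d-dimensional arrays, together with a check of the symmetry and antisymmetry properties discussed just below.

```python
import itertools
from math import factorial

import numpy as np

def perm_sign(pi):
    """Sign of a permutation of 0,...,d-1, computed by counting inversions."""
    return (-1) ** sum(1 for i in range(len(pi))
                       for j in range(i + 1, len(pi)) if pi[i] > pi[j])

def simple_tensor(factors):
    """Realize v_1 (x) ... (x) v_d as a d-dimensional array of coordinates."""
    t = factors[0]
    for v in factors[1:]:
        t = np.tensordot(t, v, axes=0)
    return t

def sym_product(vs):
    """v_1 v ... v v_d = (1/d!) * sum over permutations of the permuted tensor products."""
    d = len(vs)
    return sum(simple_tensor([vs[i] for i in pi])
               for pi in itertools.permutations(range(d))) / factorial(d)

def wedge_product(vs):
    """v_1 ^ ... ^ v_d = (1/d!) * signed sum over permutations of the permuted tensor products."""
    d = len(vs)
    return sum(perm_sign(pi) * simple_tensor([vs[i] for i in pi])
               for pi in itertools.permutations(range(d))) / factorial(d)

v1, v2, v3 = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 1.0]), np.array([1.0, 1.0, 0.0])

print(wedge_product([v1, v2, v3]).shape)                                        # shape (3, 3, 3)
print(np.allclose(sym_product([v1, v2, v3]), sym_product([v2, v1, v3])))        # symmetric
print(np.allclose(wedge_product([v1, v2, v3]), -wedge_product([v2, v1, v3])))   # antisymmetric
```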

Since the symmetric and antisymmetric tensor products are defined in terms of the tensor product, they inherit multilinearity. For example, in the case d=3, this means that we have the following three identities in the vector space \mathbf{V}^{\vee 3}: for any scalars a_1,a_2 \in \mathbb{R}

(a_1\mathbf{u}_1 + a_2\mathbf{u}_2) \vee \mathbf{v} \vee \mathbf{w} = a_1\mathbf{u}_1 \vee \mathbf{v} \vee \mathbf{w} + a_2\mathbf{u}_2 \vee \mathbf{v} \vee \mathbf{w}

for all \mathbf{u}_1,\mathbf{u}_2,\mathbf{v},\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \vee (a_1\mathbf{v}_1 + a_2\mathbf{v}_2) \vee \mathbf{w} = a_1 \mathbf{u} \vee \mathbf{v}_1 \vee \mathbf{w} + a_2\mathbf{u} \vee \mathbf{v}_2 \vee \mathbf{w}

for all \mathbf{u},\mathbf{v}_1,\mathbf{v}_2,\mathbf{w} \in \mathbf{V}, and

\mathbf{u} \vee \mathbf{v} \vee (a_1\mathbf{w}_1 + a_2\mathbf{w}_2) = a_1 \mathbf{u} \vee \mathbf{v} \vee \mathbf{w}_1 + a_2 \mathbf{u} \vee \mathbf{v} \vee \mathbf{w}_2

for all \mathbf{u},\mathbf{v},\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V}. The analogous statements hold in \mathbf{V}^{\wedge 3}.

The symmetric tensor product is constructed in such a way that

\mathbf{v}_{\pi(1)} \vee \dots \vee \mathbf{v}_{\pi(d)} = \mathbf{v}_1 \vee \dots \vee \mathbf{v}_d

for any permutation \pi \in \mathrm{S}(d), whereas the antisymmetric tensor product is constructed in such a way that

\mathbf{v}_{\pi(1)} \wedge \dots \wedge \mathbf{v}_{\pi(d)} = \mathrm{sgn}(\pi)\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d

for any permutation \pi \in \mathrm{S}(d). In particular, if any two of the vectors \mathbf{v}_1,\dots,\mathbf{v}_d are equal, then

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{0}.

Indeed, suppose that \mathbf{v}_1=\mathbf{v}_2. On one hand, by the above antisymmetry we have

\mathbf{v}_2 \wedge \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = - \mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d,

but on the other hand we also have

\mathbf{v}_2 \wedge \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d

because \mathbf{v}_1=\mathbf{v}_2. This means that

\mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d = - \mathbf{v}_1 \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d

if \mathbf{v}_1=\mathbf{v}_2, which forces

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{0}.

The vector space \mathbf{V}^{\vee d} is called the dth symmetric power of \mathbf{V}, and its elements are called symmetric tensors of degree d. The vector space \mathbf{V}^{\wedge d} is called the dth antisymmetric power of \mathbf{V}, and its elements are called antisymmetric tensors of degree d. These vector spaces have a physical interpretation. In quantum mechanics, an n-dimensional vector space \mathbf{V} is viewed as the state space of a particle that can be in any one of n quantum states. The space \mathbf{V}^{\vee d} is then the state space of d bosons, each of which may occupy one of n quantum states, while \mathbf{V}^{\wedge d} is the state space of d fermions, each of which may be in any of n quantum states. The vanishing of wedge products with two equal factors corresponds physically to the characteristic feature of fermions, i.e. the Pauli exclusion principle. You don’t have to know any of this — I included this perspective in order to provide some indication that the construction of these vector spaces is not just abstract nonsense.

Theorem 1: For any d \in \mathbb{N} and any \mathbf{v}_1,\dots,\mathbf{v}_d,\mathbf{w}_1,\dots,\mathbf{w}_d \in \mathbf{V}, we have

\langle \mathbf{v}_1 \vee \dots \vee \mathbf{v}_d,\mathbf{w}_1 \vee \dots \vee \mathbf{w}_d \rangle = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \langle \mathbf{v}_1,\mathbf{w}_{\pi(1)}\rangle \dots \langle \mathbf{v}_d,\mathbf{w}_{\pi(d)}\rangle,

and

\langle \mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d,\mathbf{w}_1 \wedge \dots \wedge \mathbf{w}_d \rangle = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi)\langle \mathbf{v}_1,\mathbf{w}_{\pi(1)}\rangle \dots \langle \mathbf{v}_d,\mathbf{w}_{\pi(d)}\rangle.

Since we won’t use this theorem much, we will skip the proof. However, the proof is not too difficult, and is an exercise in permutations: simply plug in the definitions of the symmetric and antisymmetric tensor products in terms of the original tensor products, expand the scalar product, and simplify.

Perhaps counterintuitively, the antisymmetric tensor product is more important than the symmetric tensor product in linear algebra. The next theorem explains why.

Theorem 2: For any d \in \mathbb{N} and any \mathbf{v}_1,\dots,\mathbf{v}_d, the set \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is linearly dependent if and only if

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{0}.

Proof: Suppose first that \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly dependent set of vectors in \mathbf{V}. If d=1, this means that \mathbf{v}_1=\mathbf{0}, whence

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_1 = \mathbf{0}.

If d \geq 2, then without loss in generality, the vector \mathbf{v}_1 is a linear combination of the vectors \mathbf{v}_2,\dots,\mathbf{v}_d,

\mathbf{v}_1 = a_2\mathbf{v}_2 + \dots + a_d\mathbf{v}_d.

We then have that

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \left(\sum\limits_{i=2}^d a_i\mathbf{v}_i \right) \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d = \sum\limits_{i=2}^d a_i\mathbf{v}_i \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d,

by multilinearity of the wedge product. Now observe that the ith term in the sum is a scalar multiple of the wedge product

\mathbf{v}_i \wedge \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_i \wedge \dots \wedge \mathbf{v}_d,

which contains the vector \mathbf{v}_i twice, and hence each term in the sum is the zero tensor.

Conversely, suppose \mathbf{v}_1,\dots,\mathbf{v}_d \in \mathbf{V} are vectors such that

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d =\mathbf{0}.

We must prove that \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly dependent set in \mathbf{V}. We will prove the (equivalent) contrapositive statement: if \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly independent set in \mathbf{V}, then

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d \neq \mathbf{0}.

We prove this by induction on d. In the case d=1, we have that \{\mathbf{v}_1\} is linearly independent, so

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_1 \neq \mathbf{0}.

For the inductive step, we proceed as follows. Since \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly independent set, it is a basis of the subspace

\mathbf{W} = \mathrm{Span}\{\mathbf{v}_1,\dots,\mathbf{v}_d\}.

Let \langle \cdot,\cdot \rangle denote the scalar product on \mathbf{W} defined by declaring this basis to be orthonormal. We now define a linear transformation

L \colon \mathbf{W}^{\wedge d} \to \mathbf{W}^{\wedge d-1}

by

L\mathbf{w}_1 \wedge \dots \wedge \mathbf{w}_d = \langle \mathbf{v}_1,\mathbf{w}_1\rangle \mathbf{w}_2 \wedge \dots \wedge \mathbf{w}_d.

We then have that

L\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d = \langle \mathbf{v}_1,\mathbf{v}_1\rangle \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d = \mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d.

Now, since \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is a linearly independent set, so is the subset \{\mathbf{v}_2,\dots,\mathbf{v}_d\}. Thus, by the induction hypothesis,

\mathbf{v}_2 \wedge \dots \wedge \mathbf{v}_d \neq \mathbf{0}.

It then follows that

\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d \neq \mathbf{0},

since otherwise the linear transformation L would map the zero vector in \mathbf{W}^{\wedge d} to a nonzero vector in \mathbf{W}^{\wedge d-1}, which is impossible.

— Q.E.D.

Corollary 1: We have \|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d\| \geq 0 with equality if and only if \{\mathbf{v}_1,\dots,\mathbf{v}_d\} is linearly dependent.

Since

\|\mathbf{v}_1 \wedge \dots \wedge \mathbf{v}_d\|^2 = \frac{1}{d!}\sum\limits_{\pi \in \mathrm{S}(d)} \mathrm{sgn}(\pi) \langle \mathbf{v}_1,\mathbf{v}_{\pi(1)}\rangle \dots \langle \mathbf{v}_d,\mathbf{v}_{\pi(d)}\rangle

you can think of this as a massive generalization of the Cauchy-Schwarz inequality, which is the case d=2.
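For a numerical illustration of the Corollary (mine): up to the normalization of the scalar product, the squared norm above is the determinant of the Gram matrix of the vectors \mathbf{v}_1,\dots,\mathbf{v}_d, which is nonnegative and vanishes exactly for linearly dependent families.

```python
import numpy as np

def gram_det(vs):
    """Determinant of the Gram matrix <v_i, v_j>; up to normalization, this is the
    signed permutation sum giving the squared norm of v_1 ^ ... ^ v_d."""
    return np.linalg.det(np.array([[v @ w for w in vs] for v in vs]))

v1 = np.array([1.0, 2.0, 0.0])
v2 = np.array([0.0, 1.0, 1.0])

print(gram_det([v1, v2]))               # positive: {v1, v2} is linearly independent
print(gram_det([v1, v2, 2 * v1 - v2]))  # (numerically) zero: a linearly dependent triple
```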

Lecture 22 coda

Math 31AH: Lecture 21

Two basic issues in linear algebra which we have not yet resolved are:

  1. Can we multiply vectors?
  2. Can we certify linear independence?

The answers to these questions turn out to be closely related to one another. In this lecture, we discuss the first item.

Let \mathbf{V} be a vector space. We have seen one sort of multiplication of vectors, namely the scalar product \langle \mathbf{v},\mathbf{w} \rangle. On one hand, the scalar product is a proper multiplication rule in the sense that it satisfies the FOIL identity, which is referred to as bilinearity in polite company. On the other hand, the scalar product does not correspond to our usual notion of multiplication in the sense that the product of two vectors is a number, not a vector. This is strange in that one instinctively feels that the “product” of two objects should be another object of the same type. It is natural to ask whether we can define a bilinear “vector product” which has the feature that the product of two vectors in \mathbf{V} is a vector in \mathbf{V}. In other words, we are asking whether it is possible to give some universal recipe for multiplication of vectors which would turn every vector space into an algebra.

So far, we have only seen certain specific vector spaces \mathbf{V} where a bilinear multiplication of vectors naturally presents itself. Here is a list of these spaces.

  1. \mathbf{V} = \mathbb{R}. In this case, vectors \mathbf{v} \in \mathbf{V} are real numbers, and the vector product \mathbf{v}\mathbf{w} is the product of real numbers.
  2. \mathbf{V}=\mathbb{R}^2. Technically, we have not seen this example yet, but here it is. Let \mathbf{v}=(x_1,x_2) and \mathbf{w}=(y_1,y_2) be vectors in \mathbf{V}. We then define their product to be \mathbf{v}\mathbf{w}=(x_1y_1-x_2y_2,x_1y_2+x_2y_1). Next week, we will see that this example of vector multiplication gives the complex number system.
  3. \mathbf{V}=\mathbb{R}^\infty. In this example, the vector space \mathbf{V} consists of infinite sequences \mathbf{v}=(x_0,x_1,x_2,\dots) which are identically zero after finitely many terms. This means that \mathbf{V} is isomorphic to the vector space of polynomials in a single variable. Let \mathbf{v}=(x_0,x_1,x_2,\dots) and \mathbf{w}=(y_0,y_1,y_2,\dots) be vectors in \mathbf{V}. We define their product to be \mathbf{v}\mathbf{w} = (x_0y_0,x_0y_1+x_1y_0,x_0y_2+x_1y_1+x_2y_0,\dots), which is just the recipe for multiplying polynomials and collecting together terms of the same degree.
  4. \mathbf{V}=\mathbb{R}^{n \times n}. In this example, the vector space \mathbf{V} consists of matrices with n rows and n columns. This means that \mathbf{V} is isomorphic to the vector space of linear operators on an n-dimensional vector space. A vector product in \mathbf{V} is then defined by matrix multiplication.

The above examples are quite different from one another, and they do not appear to be given by any universal recipe for defining a product of vectors. It turns out that in order to answer the question of how to define a universal vector product, it is better not to answer it at all. This is the idea behind the tensor product, which we now introduce.

To every pair of vectors \mathbf{v},\mathbf{w} \in \mathbf{V}, we associate a new vector denoted \mathbf{v} \otimes \mathbf{w}, which is called the tensor product of \mathbf{v} and \mathbf{w}. However, the vector \mathbf{v} \otimes \mathbf{w} does not reside in \mathbf{V}; rather, it is a vector in a new vector space called the tensor square of \mathbf{V} and denoted \mathbf{V} \otimes \mathbf{V}. What is happening here is that we view the symbol \otimes as a rule for multiplying two vectors, but we do not specify what this rule is — instead, we view \mathbf{v} \otimes \mathbf{w} as an “unevaluated” product of two vectors. We then store this unevaluated product in a new vector space \mathbf{V} \otimes \mathbf{V}, which contains all unevaluated products of vectors from \mathbf{V}. More precisely, the vectors in \mathbf{V} \otimes \mathbf{V} are all unevaluated expressions of the form

\tau = \mathbf{v}_1 \otimes \mathbf{w}_1 + \dots + \mathbf{v}_k \otimes \mathbf{w}_k,

where k \in \mathbb{N} is a natural number and \mathbf{v}_1,\mathbf{w}_1,\dots,\mathbf{v}_k,\mathbf{w}_k \in \mathbf{V} are vectors. These unevaluated expressions are called tensors, and often denoted by Greek letters. So tensor products are ambiguous, in the sense that we do not specify what the result of the multiplication \mathbf{v} \otimes \mathbf{w} actually is. The only thing we specify about this rule is that it is bilinear:

(a_1\mathbf{v}_1 + a_2\mathbf{v}_2) \otimes (b_1\mathbf{w}_1 + b_2\mathbf{w}_2) \\ = a_1b_1\mathbf{v}_1 \otimes \mathbf{w}_1 + a_1b_2 \mathbf{v}_1 \otimes \mathbf{w}_2 + a_2b_1 \mathbf{v}_2\otimes \mathbf{w}_1  + a_2b_2\mathbf{v}_2\otimes \mathbf{w}_2,

where the equality means that the LHS and the RHS are different expressions for the same vector in the vector space \mathbf{V} \otimes \mathbf{V}.

A tensor in \mathbf{V} \otimes \mathbf{V} which can be represented as the product of two vectors from \mathbf{V} is called a simple tensor. Note that a tensor may be simple without obviously being so, in the event that it can be “factored” as in high school algebra. For example, we have

\mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{v}_2 \otimes \mathbf{w}_1 = (\mathbf{v}_1+\mathbf{v}_2) \otimes \mathbf{w}_1.

We haven’t yet said how to scale tensors by numbers. The rule for scalar multiplication of tensors is determined by bilinearity: it is defined by

a \mathbf{v} \otimes \mathbf{w} = (a\mathbf{v}) \otimes \mathbf{w} = \mathbf{v} \otimes (a\mathbf{w}),

and

a \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i = \sum_{i=1}^k a\mathbf{v}_i \otimes \mathbf{w}_i.

We can summarize all of the above by saying that two tensors \tau,\sigma \in \mathbf{V} \otimes \mathbf{V} are equal if and only if it is possible to rewrite \tau as \sigma using bilinearity.

Tensor products take a while to get used to. It’s important to remember that the only specified property of the tensor product is bilinearity; apart from this, it’s entirely ambiguous. So, anything we can say about tensor products must ultimately be a consequence of bilinearity. Here is an example.

Proposition 1: For any \mathbf{v} \in \mathbf{V}, we have

\mathbf{v} \otimes \mathbf{0}_\mathbf{V} = \mathbf{0}_\mathbf{V} \otimes \mathbf{v} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V}.

Proof: We are going to use the fact that scaling any vector \mathbf{v} \in \mathbf{V} by the number 0 \in \mathbb{R} produces the zero vector \mathbf{0}_\mathbf{V} \in \mathbf{V}. This was proved in Lecture 1, when we discussed the definition of a vector space. We have

\mathbf{v} \otimes \mathbf{0}_\mathbf{V} = \mathbf{v} \otimes (0\mathbf{0}_\mathbf{V}) = (0\mathbf{v}) \otimes \mathbf{0}_\mathbf{V} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V}.

Notice that bilinearity was used here to move the scalar zero from the second factor in the tensor product to the first factor in the tensor product. The proof that \mathbf{0}_\mathbf{V} \otimes \mathbf{v} = \mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V} is essentially the same (try it!).

— Q.E.D.

Using Proposition 1, we can explicitly identify the “zero tensor,” i.e. the zero vector \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}} in the vector space \mathbf{V} \otimes \mathbf{V}.

Proposition 2: We have \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}}=\mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}}.

Proof: Let

\tau = \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i

be any tensor. We want to prove that \tau+\mathbf{0}_\mathbf{V} \otimes \mathbf{0}_\mathbf{V} = \tau.

In the case k=1, we have \tau = \mathbf{v}_1 \otimes \mathbf{w}_1. Using bilinearity, we have

\mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \mathbf{v}_1 \otimes \mathbf{w}_1 + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{w}_1 = (\mathbf{v}_1+\mathbf{0}_\mathbf{V}) \otimes \mathbf{w}_1 = \mathbf{v}_1 \otimes \mathbf{w}_1,

where we used Proposition 1 and bilinearity.

The case k>1 now follows from the case k=1,

\tau + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \sum_{i=1}^k \mathbf{v}_i \otimes \mathbf{w}_i + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}} = \sum_{i=1}^{k-1} \mathbf{v}_i \otimes \mathbf{w}_i + \left(\mathbf{v}_k \otimes \mathbf{w}_k + \mathbf{0}_{\mathbf{V}} \otimes \mathbf{0}_{\mathbf{V}}\right) = \sum_{i=1}^{k-1} \mathbf{v}_i \otimes \mathbf{w}_i + \mathbf{v}_k \otimes \mathbf{w}_k = \tau.

— Q.E.D.

Suppose now that \mathbf{V} is a Euclidean space, i.e. it comes with a scalar product \langle \cdot,\cdot \rangle. Then, there is an associated scalar product on the vector space \mathbf{V} \otimes \mathbf{V}, which by abuse of notation we also write as \langle \cdot,\cdot \rangle. This natural scalar product on \mathbf{V} \otimes \mathbf{V} is uniquely determined by the requirement that

\langle \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \rangle = \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle, \quad \forall \mathbf{v}_1,\mathbf{v}_2,\mathbf{w}_1,\mathbf{w}_2 \in \mathbf{V}.

Exercise 1: Verify that the scalar product on \mathbf{V} \otimes \mathbf{V} just defined really does satisfy the scalar product axioms.

Proposition 3: If S is an orthogonal set of vectors in \mathbf{V}, then

S \otimes S = \{\mathbf{v} \otimes \mathbf{w} \colon \mathbf{v},\mathbf{w} \in S\}

is an orthogonal set of tensors in \mathbf{V} \otimes \mathbf{V}.

Proof: We must show that if \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \in S \otimes S are different tensors, then their scalar product is zero. We have

\langle \mathbf{v}_1 \otimes \mathbf{w}_1,\mathbf{v}_2 \otimes \mathbf{w}_2 \rangle = \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle.

The assumption that these tensors are different is equivalent to saying that one of the following conditions holds:

\mathbf{v}_1 \neq \mathbf{v}_2 \text{ or } \mathbf{w}_1 \neq \mathbf{w}_2.

Since S is an orthogonal set, the first possibility implies \langle \mathbf{v}_1,\mathbf{v}_2 \rangle =0, and the second implies \langle \mathbf{w}_1,\mathbf{w}_2 \rangle = 0. In either case, the product \langle \mathbf{v}_1,\mathbf{v}_2\rangle \langle \mathbf{w}_1,\mathbf{w}_2\rangle is equal to zero.

— Q.E.D.

Theorem 1: If E=\{\mathbf{e}_1,\dots,\mathbf{e}_n\} is an orthonormal basis in \mathbf{V}, then E \otimes E = \{\mathbf{e}_i \otimes \mathbf{e}_j \colon 1 \leq i,j \leq n\} is an orthonormal basis in \mathbf{V} \otimes \mathbf{V}.

Proof: Let us first show that E \otimes E spans \mathbf{V} \otimes \mathbf{V}. For an arbitrary tensor \tau \in \mathbf{V} \otimes \mathbf{V}, we have

\tau = \sum\limits_{k=1}^l a_k \mathbf{v}_k \otimes \mathbf{w}_k = \sum\limits_{k=1}^l a_k \left(\sum\limits_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}_k\rangle\mathbf{e}_i \right) \otimes \left(\sum\limits_{j=1}^n \langle \mathbf{e}_j,\mathbf{w}_k\rangle\mathbf{e}_j \right) \\= \sum\limits_{i,j=1}^n \left(\sum\limits_{k=1}^l a_k \langle \mathbf{e}_i,\mathbf{v}_k\rangle\langle \mathbf{e}_j,\mathbf{w}_k\rangle\right)\mathbf{e}_i \otimes \mathbf{e}_j,

which shows that an arbitrary tensor is a linear combination of the tensors \mathbf{e}_i \otimes \mathbf{e}_j.

Since E is an orthogonal set in \mathbf{V}, by Proposition 3 we have that E \otimes E is an orthogonal set in \mathbf{V} \otimes \mathbf{V}, and therefore it is linearly independent.

It remains only to show that all tensors in E \otimes E have unit length. This is established by direct computation:

\|\mathbf{e}_i \otimes \mathbf{e}_j \|^2 = \langle \mathbf{e}_i\otimes \mathbf{e}_j,\mathbf{e}_i \otimes \mathbf{e}_j \rangle = \langle \mathbf{e}_i,\mathbf{e}_i \rangle\langle \mathbf{e}_j,\mathbf{e}_j \rangle= 1.

— Q.E.D.

Corollary 1: If \dim \mathbf{V} = n, then \dim \mathbf{V} \otimes \mathbf{V} = n^2.

It is important to note that the tensor product is noncommutative: it is typically not the case that \mathbf{v} \otimes \mathbf{w} = \mathbf{w} \otimes \mathbf{v}. However, we can decompose a simple tensor into two pieces, as

\mathbf{v} \otimes \mathbf{w} = \frac{\mathbf{v} \otimes \mathbf{w} + \mathbf{w} \otimes \mathbf{v}}{2} + \frac{\mathbf{v} \otimes \mathbf{w} - \mathbf{w} \otimes \mathbf{v}}{2}.

The first of these fractions is called the “symmetric part” of \mathbf{v} \otimes \mathbf{w}, and is denoted

\mathbf{v} \vee \mathbf{w} := \frac{\mathbf{v} \otimes \mathbf{w} + \mathbf{w} \otimes \mathbf{v}}{2}.

The reason for this notation is that we can think of \vee as a symmetric version of the tensor product: a bilinear multiplication of vectors that, by construction, is commutative:

\mathbf{v} \vee \mathbf{w} = \mathbf{w} \vee \mathbf{v}.

Note that if \mathbf{v}=\mathbf{w}, the symmetric tensor product produces the same tensor as the tensor product itself:

\mathbf{v} \vee \mathbf{v} = \mathbf{v} \otimes \mathbf{v}.

The second fraction above is called the “antisymmetric part” of \mathbf{v} \otimes \mathbf{w}, and denoted

\mathbf{v} \wedge \mathbf{w} := \frac{\mathbf{v} \otimes \mathbf{w} - \mathbf{w} \otimes \mathbf{v}}{2}.

This is an antisymmetric version of the tensor product in that, by construction, it satisfies

\mathbf{v} \wedge \mathbf{w} = -\mathbf{w} \wedge \mathbf{v}.

Note that the antisymmetric tensor product of any vector with itself produces the zero tensor:

\mathbf{v} \wedge \mathbf{v} = \mathbf{0}_{\mathbf{V} \otimes \mathbf{V}}.
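In coordinates relative to an orthonormal basis, \mathbf{v} \otimes \mathbf{w} is the outer product matrix, and the symmetric and antisymmetric parts defined above are its symmetric and antisymmetric matrix parts; here is a small Python illustration of mine.

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, -1.0])

T = np.outer(v, w)       # coordinates of the simple tensor v (x) w
sym = (T + T.T) / 2      # coordinates of v vee w, the symmetric part
anti = (T - T.T) / 2     # coordinates of v wedge w, the antisymmetric part

anti_wv = (np.outer(w, v) - np.outer(w, v).T) / 2   # coordinates of w wedge v

print(np.allclose(sym + anti, T))       # the two parts recover v (x) w
print(np.allclose(anti, -anti_wv))      # v wedge w = -(w wedge v)
print(np.allclose((np.outer(v, v) - np.outer(v, v).T) / 2, 0))  # v wedge v = 0
```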

Although it may seem like the symmetric tensor product is more natural (commutative products are nice), it turns out that the antisymmetric tensor product — or wedge product as it’s often called — is more important. Here is a first indication of this. Suppose that \mathbf{V} is a 2-dimensional Euclidean space with orthonormal basis \{\mathbf{e}_1,\mathbf{e}_2\}. Let

\mathbf{v}_1 = a_{11}\mathbf{e}_1 + a_{12}\mathbf{e}_2 \quad\text{ and }\quad \mathbf{v}_2 = a_{21}\mathbf{e}_1 + a_{22}\mathbf{e}_2

be two vectors in \mathbf{V}. Let’s compute their wedge product: using FOIL, we find

\mathbf{v}_1 \wedge \mathbf{v}_2 \\ = (a_{11}\mathbf{e}_1 + a_{12}\mathbf{e}_2) \wedge (a_{21}\mathbf{e}_1 + a_{22}\mathbf{e}_2) \\ = (a_{11}\mathbf{e}_1) \wedge (a_{21}\mathbf{e}_1) + (a_{11}\mathbf{e}_1) \wedge (a_{22}\mathbf{e}_2) + (a_{12}\mathbf{e}_2)\wedge (a_{21}\mathbf{e}_1) + (a_{12}\mathbf{e}_2) \wedge (a_{22}\mathbf{e}_2) \\ = a_{11}a_{21} \mathbf{e}_1 \wedge \mathbf{e}_1 + a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 + a_{12}a_{21} \mathbf{e}_2 \wedge \mathbf{e}_1 + a_{12}a_{22}\mathbf{e}_2 \wedge \mathbf{e}_2 \\ = a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 + a_{12}a_{21} \mathbf{e}_2 \wedge \mathbf{e}_1  \\ = a_{11}a_{22}\mathbf{e}_1 \wedge \mathbf{e}_2 - a_{12}a_{21} \mathbf{e}_1 \wedge \mathbf{e}_2 \\ = (a_{11}a_{22} - a_{12}a_{21}) \mathbf{e}_1 \wedge \mathbf{e}_2.

Probably, you recognize the lone scalar (a_{11}a_{22} - a_{12}a_{21}) remaining at the end of this computation as a determinant:

\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22}-a_{21}a_{12}.

Even if you don’t, no need to worry: you are not expected to know what a determinant is at this point. Indeed, in Lecture 22 we are going to use the wedge product to define determinants.

Lecture 21 coda