Math 202C: Lecture 1

1. The Spectral Theorem

Throughout this lecture, \mathbf{V} is a finite-dimensional complex vector space equipped with a scalar product \langle \cdot,\cdot \rangle, which is by convention a linear function of its second argument. The scalar product on \mathbf{V} determines various symmetry classes in \mathrm{End}\mathbf{V}, the algebra of linear operators A \colon \mathbf{V} \to \mathbf{V}, and the study of these symmetry classes is one of the main components of Math 202A. In particular, recall that an operator A \in \mathrm{End}\mathbf{V} is said to be selfadjoint (or Hermitian) if

\langle \mathbf{v},A\mathbf{w} \rangle = \langle A\mathbf{v},\mathbf{w}\rangle, \quad \mathbf{v},\mathbf{w} \in \mathbf{V}.

Hermitian operators are the subject of the Spectral Theorem, which may be formulated as follows.

Theorem 1: If A \in \mathrm{End} \mathbf{V} is selfadjoint, then there exists an orthonormal basis \mathbf{e}_1,\dots,\mathbf{e}_N of \mathbf{V} such that A\mathbf{e}_i = \lambda_i\mathbf{e}_i, where \lambda_1 \geq \dots \geq \lambda_N are real numbers.

Theorem 1 is the eigenvalue/eigenvector form of the Spectral Theorem. We will give a proof of this result which iteratively constructs the eigenvalues/eigenvectors of A as the maxima/maximizers of the (necessarily real-valued) quadratic form Q_A(\mathbf{v}) = \langle \mathbf{v},A\mathbf{v}\rangle on a nested sequence of compact sets in \mathbf{V}. Instead of invoking the Fundamental Theorem of Algebra to claim the existence of a root of the characteristic polynomial, this argument calls the Extreme Value Theorem as a subroutine.

Lemma 1: If B \colon \mathbf{V} \to \mathbf{V} is a selfadjoint operator such that the affiliated quadratic form Q_B(\mathbf{v})=\langle \mathbf{v},B\mathbf{v}\rangle is nonnegative, then any zero of Q_B \colon \mathbf{V} \to \mathbb{R} is in the kernel of B.

Proof: In fact, the argument does not use the finite-dimensionality of \mathbf{V}. Let \mathbf{z} \in \mathbf{V} be such that Q_B(\mathbf{z})=0. Consider a real perturbation of this zero in an arbitrary direction, i.e. consider Q_B(\mathbf{z}+t\mathbf{h}), where t \in \mathbb{R} is arbitrary and \mathbf{h} \in \mathbf{V} is a fixed unit vector. We then have

Q_B(\mathbf{z}+t\mathbf{h}) = \langle \mathbf{z}+t\mathbf{h},B(\mathbf{z}+t\mathbf{h})\rangle=Q_B(\mathbf{z})+t\langle \mathbf{z},B\mathbf{h} \rangle+t\langle\mathbf{h},B\mathbf{z}\rangle+t^2Q_B(\mathbf{h}).

The first term is zero by hypothesis. The middle terms are the real number t times

\langle \mathbf{z},B\mathbf{h} \rangle+\langle\mathbf{h},B\mathbf{z}\rangle = \langle \mathbf{z},B\mathbf{h} \rangle+\langle B\mathbf{h},\mathbf{z}\rangle = \langle \mathbf{z},B\mathbf{h} \rangle+\overline{\langle \mathbf{z},B\mathbf{h} \rangle} = 2 \Re \langle \mathbf{z},B\mathbf{h} \rangle.

Thus by hypothesis the function

f(t)=2t \Re \langle \mathbf{z},B\mathbf{h} \rangle +t^2Q_B(\mathbf{h})

is a nonnegative function of the real variable t. This function has the asymptotics

f(t) \sim 2t \Re \langle \mathbf{z},B\mathbf{h} \rangle

as t \to 0, so it must be the case that \Re \langle \mathbf{z},B\mathbf{h} \rangle=\Re \langle B\mathbf{z},\mathbf{h} \rangle=0, otherwise f(t) would change sign at t=0, which it does not, being a nonnegative function. Repeating the above with a pure imaginary perturbation it\mathbf{h}, we get that \Im \langle \mathbf{z},B\mathbf{h} \rangle =\Im \langle B\mathbf{z},\mathbf{h} \rangle=0 as well. Thus B\mathbf{z} is orthogonal to every unit vector \mathbf{h}, whence it must be the zero vector, B\mathbf{z}=\mathbf{0}_\mathbf{V}.

— Q.E.D.
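
Lemma 1 is easy to test numerically. The following NumPy sketch (the dimension, seed, and choice of \mathbf{z} are illustrative assumptions, not part of the lecture) builds a selfadjoint B with nonnegative quadratic form and a prescribed zero, and checks that the zero lies in the kernel of B:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

# Unit vector that B will annihilate.
z = np.zeros(N, dtype=complex)
z[0] = 1.0

# B = C*C is selfadjoint with Q_B(v) = ||Cv||^2 >= 0; forcing Cz = 0
# makes z a zero of the quadratic form.
C = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
C -= np.outer(C @ z, z.conj())  # now C z = 0 exactly
B = C.conj().T @ C

# np.vdot conjugates its first argument, matching the convention that
# the scalar product is linear in its second slot.
Q_B = lambda v: np.vdot(v, B @ v).real

assert abs(Q_B(z)) < 1e-12          # z is a zero of Q_B
assert np.linalg.norm(B @ z) < 1e-12  # and z lies in the kernel of B
```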

Proof of Theorem 1: Since \mathbf{V} is finite-dimensional, its unit sphere

S_1 = \{\mathbf{v} \in \mathbf{V} \colon \|\mathbf{v}\|=1\}

is compact, and the function Q_A \colon \mathbf{V} \to \mathbb{R} is continuous. Thus, by the Extreme Value Theorem, there exists a unit vector \mathbf{e}_1 \in \mathbf{V} at which Q_A attains its maximum value on S_1, i.e. we have

Q_A(\mathbf{v}) \leq \lambda_1

for all unit vectors \mathbf{v}. But this is equivalent to

\langle \mathbf{v},A\mathbf{v}\rangle \leq \lambda_1 \langle \mathbf{v},\mathbf{v}\rangle

for all unit vectors \mathbf{v}, and in this form it is clear that the inequality actually holds for arbitrary vectors \mathbf{v} \in \mathbf{V}, i.e. we have

\langle \mathbf{v},B_1\mathbf{v} \rangle \geq 0

for all \mathbf{v} \in \mathbf{V}, where

B_1=\lambda_1I -A.

Note that B_1 \in \mathrm{End}\mathbf{V} is selfadjoint (indeed, selfadjoint operators in \mathrm{End}\mathbf{V} form a real vector space), and the above says that the associated quadratic form Q_{B_1} is nonnegative. Since Q_{B_1}(\mathbf{e}_1)=0 by construction, we may invoke Lemma 1 to conclude that \mathbf{e}_1 is in the kernel of B_1=\lambda_1I-A, i.e. \mathbf{e}_1 is an eigenvector of A with eigenvalue \lambda_1.

Consider now the orthogonal complement

\mathbf{e}_1^\perp = \{\mathbf{v} \in \mathbf{V} \colon \langle \mathbf{e}_1,\mathbf{v} \rangle = 0\},

i.e. the (N-1)-dimensional hyperplane in \mathbf{V} perpendicular to the line through \mathbf{e}_1. Let S_2 be the unit sphere in \mathbf{e}_1^\perp, let \lambda_2\leq \lambda_1 be the maximum of the quadratic form Q_A on S_2 \subset S_1, and let \mathbf{e}_2 \in S_2 be a point at which the maximum is achieved. By exactly the same argument as above, we get that \mathbf{e}_2 is an eigenvector of A with eigenvalue \lambda_2. Indeed, we can iterate the argument N times to obtain an orthonormal set \{\mathbf{e}_1,\dots,\mathbf{e}_N\} of eigenvectors with eigenvalues \lambda_1 \geq \dots \geq \lambda_N.

— Q.E.D.
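
The conclusion of Theorem 1 can also be checked numerically. The sketch below uses numpy.linalg.eigh, which returns real eigenvalues in ascending order together with an orthonormal eigenbasis; flipping the output matches the descending convention of Theorem 1 (the random operator, seed, and dimension are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6

# A random selfadjoint operator on C^N.
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (X + X.conj().T) / 2

# eigh: ascending real eigenvalues, orthonormal eigenvectors as columns.
# Flip both to get lambda_1 >= ... >= lambda_N and e_1, ..., e_N.
evals, evecs = np.linalg.eigh(A)
lam, E = evals[::-1], evecs[:, ::-1]

assert np.all(np.diff(lam) <= 1e-12)           # descending real eigenvalues
assert np.allclose(E.conj().T @ E, np.eye(N))  # orthonormal basis
assert np.allclose(A @ E, E * lam)             # A e_i = lambda_i e_i
```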

2. The subspace lattice

Let \mathcal{L}(\mathbf{V}) denote the set of all subspaces of \mathbf{V}, including the trivial subspace \{\mathbf{0}_\mathbf{V}\} and \mathbf{V} itself. This set carries a natural partial order: given \mathbf{W}_1,\mathbf{W}_2 \in \mathcal{L}(\mathbf{V}), we declare \mathbf{W}_1 \leq \mathbf{W}_2 if and only if \mathbf{W}_1 is a subspace of \mathbf{W}_2. In fact, \mathcal{L}(\mathbf{V}) is a graded poset in which the rank of a subspace \mathbf{W} \in \mathcal{L}(\mathbf{V}) is its dimension \dim \mathbf{W}. The graded poset \mathcal{L}(\mathbf{V}) is moreover a lattice: the largest subspace contained in each of two given subspaces \mathbf{W}_1 and \mathbf{W}_2 is their intersection, and the smallest subspace containing both of them is their span. Finally, the lattice \mathcal{L}(\mathbf{V}) is self-dual: the map \mathbf{W} \mapsto \mathbf{W}^\perp sending a subspace to its orthogonal complement is an order-reversing involution.

In formulating the above, we have just encountered our first example of a functor: we can view \mathcal{L} as a covariant functor from the category of vector spaces to the category of posets. Indeed, if \mathbf{V}_1,\mathbf{V}_2 are vector spaces and T \in \mathrm{Hom}(\mathbf{V}_1,\mathbf{V}_2) is a linear transformation, we get an order-preserving function \mathcal{L}(T) \in \mathrm{Hom}(\mathcal{L}(\mathbf{V}_1),\mathcal{L}(\mathbf{V}_2)) defined by

\mathcal{L}(T)(\mathbf{W}) = T(\mathbf{W}) = \{T\mathbf{w} \colon \mathbf{w} \in \mathbf{W}\},

the image of the subspace \mathbf{W} \leq \mathbf{V}_1 under T.
In this course we will be considering quite a few more functors, most of which will be endofunctors from the category of vector spaces to itself. The most basic of these are tensor powers, symmetric powers, and exterior powers, which are the basic operations of multilinear algebra and are closely related to the physical concept of Fock space, which is intended to capture the notion of symmetry classes of elementary particles in quantum mechanics — bosons and fermions and such.

3. Flags and Grassmannians

Very loosely speaking, vector spaces are continuous versions of sets. A finite set E=\{\mathbf{e}_1,\dots,\mathbf{e}_N\} might represent the possible states of some physical system, while the free vector space \mathbf{V}=\mathbb{C}[E] over this set allows for superpositions of states, i.e. linear combinations of them. Associated to E is its lattice of subsets \mathcal{L}(E), where meet and join are intersection and union. This is called a Boolean lattice, and it is graded by cardinality rather than dimension. Sometimes \mathcal{L}(E) is referred to as a hypercube, because its elements can be thought of as length-N bitstrings, and these may in turn be thought of as the vertices of a polytope in \mathbb{R}^N which is a cube when N=3.

In a general poset, a chain is a totally ordered subset, while an antichain is a subset which does not contain any comparable pairs of elements. A chain or antichain is said to be saturated if adding any new element would cause it to no longer be a chain or antichain. It is easy to spot saturated chains in the Boolean lattice \mathcal{L}(E) — they are just sequences E_0,E_1,\dots,E_N of subsets of E such that |E_k|=k, and there are N! many of these. It is equally easy to spot N+1 saturated antichains, namely the collections A_k \subset \mathcal{L}(E) consisting of all subsets of E of cardinality k. The cardinality of A_k itself is {N \choose k}, and it is a famous result in extremal combinatorics, Sperner's theorem, that the size of the largest antichain in the Boolean lattice \mathcal{L}(E) is the largest number in the Nth row of Pascal's triangle.
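
These counts are small enough to verify by brute force. The following Python sketch (with the illustrative choice N = 5) enumerates the saturated chains in \mathcal{L}(E) and checks Sperner's bound against row N of Pascal's triangle:

```python
import math
from itertools import permutations

N = 5

# Each saturated chain E_0 < E_1 < ... < E_N adds one element at a time,
# so saturated chains correspond to orderings of E: there are N! of them.
chains = set()
for perm in permutations(range(N)):
    chains.add(tuple(frozenset(perm[:k]) for k in range(N + 1)))
assert len(chains) == math.factorial(N)

# |A_k| = C(N, k); by Sperner's theorem the largest antichain has the
# size of the biggest entry in row N of Pascal's triangle.
row = [math.comb(N, k) for k in range(N + 1)]
assert max(row) == math.comb(N, N // 2)
```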

The lattice of subspaces of a finite-dimensional vector space is in many ways analogous to the lattice of subsets of a finite set. Indeed, if one could define a field with one element in a satisfactory way, then the two notions would coincide for vector spaces over \mathbb{F}_1. This is problematic, but one can analyze the subspace lattice of a vector space defined over \mathbb{F}_q without difficulty, and count chains, antichains, etc. For example, the cardinality of the antichain consisting of all k-dimensional subspaces is the Gaussian binomial coefficient {N \choose k}_q, which degenerates to the ordinary binomial coefficient in the q \to 1 limit, almost as if \mathbb{F}_1 really did exist…
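
The Gaussian binomial coefficient can be computed via the q-analogue of Pascal's rule, {N \choose k}_q = {N-1 \choose k-1}_q + q^k {N-1 \choose k}_q, which makes the q \to 1 degeneration transparent. Here is a minimal Python sketch (the function name and the representation of polynomials as coefficient lists are my own choices):

```python
from math import comb

def gaussian_binomial(N, k):
    """Coefficient list (lowest degree first) of [N choose k]_q, built
    with the q-Pascal recurrence [n,k]_q = [n-1,k-1]_q + q^k [n-1,k]_q."""
    table = [[1]] + [[0] for _ in range(k)]   # row n = 0
    for n in range(1, N + 1):
        new = [[1]]                           # [n, 0]_q = 1
        for j in range(1, k + 1):
            a = table[j - 1]                  # [n-1, j-1]_q
            b = [0] * j + table[j]            # q^j * [n-1, j]_q
            deg = max(len(a), len(b))
            new.append([(a[i] if i < len(a) else 0) +
                        (b[i] if i < len(b) else 0) for i in range(deg)])
        table = new
    res = table[k]
    while len(res) > 1 and res[-1] == 0:      # strip trailing zeros
        res.pop()
    return res

# [4 choose 2]_q = 1 + q + 2q^2 + q^3 + q^4: at q = 2 it counts the 35
# two-dimensional subspaces of F_2^4, and at q = 1 it degenerates to
# the ordinary binomial coefficient C(4, 2) = 6.
coeffs = gaussian_binomial(4, 2)
assert coeffs == [1, 1, 2, 1, 1]
assert sum(c * 2**i for i, c in enumerate(coeffs)) == 35
assert sum(coeffs) == comb(4, 2)
```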

Anyway, we are working over \mathbb{C}, so everything is infinite. Chains in \mathcal{L}(\mathbf{V}) are typically referred to as flags, though this term is more often than not reserved for chains that contain the trivial subspace and \mathbf{V} itself. Saturated chains are called complete flags: they are sequences

\mathbf{V}_0 \leq \mathbf{V}_1 \leq \dots \leq \mathbf{V}_N

such that \dim\mathbf{V}_k =k. The analogue of the antichain A_k in the Boolean lattice consisting of subsets of cardinality k is the collection of k-dimensional subspaces of \mathbf{V}; these antichains are called Grassmannians and denoted \mathrm{Gr}_k(\mathbf{V}). We are going to discuss Grassmannians of finite-dimensional complex vector spaces in some detail, and I hope that we will have time to discuss a certain special infinite-dimensional Grassmannian known as the Sato Grassmannian. For now, we concentrate on the following fundamental relationship between eigenvalues of selfadjoint operators and Grassmannians.

4. Eigenvalues as extremizers

Let A \in \mathrm{End}\mathbf{V} be a selfadjoint operator, and define a function

M_A \colon \mathcal{L}(\mathbf{V}) \to \mathbb{R}

by

M_A(\mathbf{W}) = \max\limits_{\substack{\mathbf{w} \in \mathbf{W} \\ \|\mathbf{w}\|=1}} \langle \mathbf{w},A\mathbf{w}\rangle.

Thus, M_A(\mathbf{W}) assigns to each subspace \mathbf{W} \leq \mathbf{V} the maximum of the quadratic form associated to A on the unit sphere in \mathbf{W}. In particular, we know from our proof of the Spectral Theorem above that the largest eigenvalue of A is given by

\lambda_1 = M_A(\mathbf{V}).

The following result, known as the Courant–Fischer minimax theorem, generalizes this by giving a variational characterization of each individual eigenvalue \lambda_k of A.

Theorem 2: If A \in \mathrm{End}\mathbf{V} is a selfadjoint operator with eigenvalues \lambda_1 \geq \dots \geq \lambda_N, then for each 1 \leq k \leq N we have

\lambda_k = \min \{M_A(\mathbf{W}) \colon \mathbf{W} \in \mathrm{Gr}_{N-k+1}(\mathbf{V})\}.

That is, the kth largest eigenvalue of A is the minimum value of M_A over the Grassmannian of (N-k+1)-dimensional subspaces of \mathbf{V}.

Proof: Let \{\mathbf{e}_1,\dots,\mathbf{e}_N\} be an orthonormal basis of \mathbf{V} such that A\mathbf{e}_i=\lambda_i\mathbf{e}_i, \quad 1 \leq i \leq N. For each 1 \leq k \leq N, consider the space

\mathbf{S}_k = \mathrm{Span} \{\mathbf{e}_1,\dots,\mathbf{e}_k\}

spanned by the eigenvectors corresponding to the k largest eigenvalues of A, and also set \mathbf{S}_0 = \{\mathbf{0}_\mathbf{V}\}, so that \mathbf{S}_0^\perp = \mathbf{V}. In the proof of Theorem 1, we showed that

\lambda_k = M_A(\mathbf{S}_{k-1}^\perp),

so that indeed the kth largest eigenvalue \lambda_k is given by evaluating the function M_A at a point of the Grassmannian \mathrm{Gr}_{N-k+1}(\mathbf{V}), namely \mathbf{S}_{k-1}^\perp. It remains to show that the evaluation of M_A on every other subspace of dimension N-k+1 is at least \lambda_k.

To this end, we establish the following squeeze inequality for the values of the quadratic form Q_A on the subspace spanned by the first k eigenvectors: for all \mathbf{v} \in \mathbf{S}_k we have

\lambda_k \langle \mathbf{v},\mathbf{v}\rangle \leq \langle \mathbf{v},A\mathbf{v}\rangle \leq \lambda_1\langle \mathbf{v},\mathbf{v} \rangle.

Indeed, let \mathbf{v}=x_1\mathbf{e}_1+\dots+x_k\mathbf{e}_k be any vector in \mathbf{S}_k, with complex coefficients x_1,\dots,x_k. Using orthonormality of the eigenvectors we calculate

\langle \mathbf{v},A\mathbf{v} \rangle = \lambda_1|x_1|^2+\dots+\lambda_k|x_k|^2 \leq \lambda_1(|x_1|^2+\dots+|x_k|^2) = \lambda_1\langle \mathbf{v},\mathbf{v}\rangle,

and the lower bound is established in exactly the same way.

Now let \mathbf{W} \in \mathrm{Gr}_{N-k+1}(\mathbf{V}) be arbitrary. Since

\dim \mathbf{S}_k + \dim \mathbf{W} = k + N-k+1 = N+1 > \dim \mathbf{V},

the intersection of the subspaces \mathbf{S}_k and \mathbf{W} is at least one-dimensional (see Problem set 1). Let \mathbf{e} be a unit vector in \mathbf{S}_k \cap \mathbf{W}. Since \mathbf{e} \in \mathbf{S}_k, we have Q_A(\mathbf{e}) \geq \lambda_k by the above inequality. But also \mathbf{e} \in \mathbf{W}, which shows that M_A(\mathbf{W}) \geq \lambda_k, as required.

— Q.E.D.
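
Theorem 2 also lends itself to numerical experiment: M_A(\mathbf{W}) is the top eigenvalue of the compression W^*AW, where the columns of the matrix W form an orthonormal basis of \mathbf{W}. The sketch below (random operator and random subspaces; all parameters are illustrative choices) checks that the minimum is attained at \mathbf{S}_{k-1}^\perp and that random subspaces of the right dimension never do better:

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 6, 3

# A random selfadjoint operator and its spectral data, with eigenvalues
# in the descending convention lambda_1 >= ... >= lambda_N.
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (X + X.conj().T) / 2
evals, evecs = np.linalg.eigh(A)
lam, E = evals[::-1], evecs[:, ::-1]

def M_A(W):
    """Max of Q_A over unit vectors in the span of the orthonormal
    columns of W: the top eigenvalue of the compression W* A W."""
    return np.linalg.eigvalsh(W.conj().T @ A @ W)[-1]

# The minimum lambda_k is attained at S_{k-1}^perp = span(e_k, ..., e_N):
assert np.isclose(M_A(E[:, k - 1:]), lam[k - 1])

# Any other (N-k+1)-dimensional subspace gives a value >= lambda_k:
for _ in range(20):
    Z = rng.standard_normal((N, N - k + 1)) \
        + 1j * rng.standard_normal((N, N - k + 1))
    W = np.linalg.qr(Z)[0]   # orthonormal basis of a random subspace
    assert M_A(W) >= lam[k - 1] - 1e-10
```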

5. Adjoint orbits

Any operator U \in \mathrm{End}\mathbf{V} which preserves the scalar product,

\langle U\mathbf{v},U\mathbf{w} \rangle = \langle \mathbf{v},\mathbf{w} \rangle, \quad \mathbf{v},\mathbf{w} \in \mathbf{V},

is called a unitary operator. It is clear that the set \mathrm{U}(\mathbf{V}) of unitary operators on \mathbf{V} is a group, indeed a subgroup of the general linear group \mathrm{GL}(\mathbf{V}). Note that the former depends on the scalar product but the latter does not; one may equivalently say that invertible operators map bases to bases, while unitary operators map orthonormal bases to orthonormal bases.

Let \lambda_1 \geq \dots \geq \lambda_N be given real numbers, and let \mathcal{O}_\lambda \subset \mathrm{End}\mathbf{V} be the set of selfadjoint operators on \mathbf{V} with this given spectrum.

Theorem 3: Choose any orthonormal basis \mathbf{e}_1,\dots,\mathbf{e}_N \in \mathbf{V}, and let \Lambda \in \mathrm{End}\mathbf{V} be the selfadjoint operator defined by

\Lambda \mathbf{e}_i = \lambda_i \mathbf{e}_i, \quad 1 \leq i \leq N.

Then the isospectral set \mathcal{O}_\lambda \subset \mathrm{End}\mathbf{V} is given by

\mathcal{O}_\lambda = \{U\Lambda U^{-1} \colon U \in \mathrm{U}(\mathbf{V})\}.

Proof: First we check that every operator of the form U\Lambda U^{-1} with U \in \mathrm{U}(\mathbf{V}) is selfadjoint with spectrum \lambda. The selfadjointness of U\Lambda U^{-1} follows immediately from the selfadjointness of \Lambda together with the fact that U^*=U^{-1}. To see that U\Lambda U^{-1} has spectrum \lambda, let \mathbf{f}_1,\dots,\mathbf{f}_N be the orthonormal basis of \mathbf{V} defined by

\mathbf{f}_i = U\mathbf{e}_i, \quad 1 \leq i\leq N.

Then \mathbf{f}_i is an eigenvector of U\Lambda U^{-1} with eigenvalue \lambda_i, since U\Lambda U^{-1}\mathbf{f}_i = U\Lambda\mathbf{e}_i = \lambda_i U\mathbf{e}_i = \lambda_i\mathbf{f}_i.

Now let A be a given selfadjoint operator with eigenvalues \lambda_1,\dots,\lambda_N. By the Spectral Theorem, there exists an orthonormal basis \mathbf{f}_1,\dots,\mathbf{f}_N such that A\mathbf{f}_i=\lambda_i\mathbf{f}_i. If U is the unitary operator sending \mathbf{e}_i to \mathbf{f}_i, then

U^{-1}AU\mathbf{e}_i = \lambda_i \mathbf{e}_i, \quad 1 \leq i \leq N,

which is equivalent to U^{-1}AU=\Lambda, i.e. A=U\Lambda U^{-1}.

— Q.E.D.
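
Theorem 3 suggests a simple recipe for sampling points of \mathcal{O}_\lambda: conjugate \Lambda by a random unitary, obtained here as the Q factor in the QR decomposition of a complex Ginibre matrix (a standard sampling trick; the spectrum and seed below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5
lam = np.array([3.0, 2.0, 2.0, 0.5, -1.0])  # chosen spectrum, repeats allowed
Lam = np.diag(lam)

# The Q factor of a complex Ginibre matrix is unitary.
Z = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
U = np.linalg.qr(Z)[0]
assert np.allclose(U.conj().T @ U, np.eye(N))

# U Lam U^{-1} = U Lam U* is a point of the adjoint orbit O_lambda:
A = U @ Lam @ U.conj().T
assert np.allclose(A, A.conj().T)                        # selfadjoint
assert np.allclose(np.linalg.eigvalsh(A), np.sort(lam))  # spectrum is lambda
```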

In due time, we shall see that both Grassmannians \mathrm{Gr}_k(\mathbf{V}) and adjoint orbits \mathcal{O}_\lambda are complex projective varieties.
