Math 202C: Lecture 5

Lectures 5 and 6 of Math 202C give an application of the representation theory of the symmetric group \mathrm{S}(d) as discussed in Math 202B together with the additional material on the YJM elements in the corresponding group algebra \mathcal{A}(d). This application is probabilistic in nature, and concerns the following question: how many times do you have to shuffle a deck of cards in order to randomize it?

In order to analyze this question rigorously, we need to model it mathematically. The first step is to say exactly what we mean by “shuffle.” We consider a deck of d cards labeled 1,\dots,d laid face up on a table, arranged in a single row. Initially, this arrangement is 1,\dots,d, from left to right. At each tick of the clock, we choose a random card, and swap its position with the rightmost card. This gives us a new arrangement of cards which, reading the labels from left to right, corresponds to a permutation in the symmetric group \mathrm{S}(d). This is called the “top-to-random” shuffle. Note that it is possible the randomly chosen card is in fact the rightmost (top) card, and in this case no shuffling occurs.

The top-to-random shuffle is a stochastic process evolving in discrete time. It may equivalently be formulated as a random walk on the the Cayley graph \Gamma(d)=\mathrm{Cay}(\mathrm{S}(d),R) of the symmetric group generated by the set

Y_d=\{(1\ d), (2\ d),\dots, (d-1\ d)\}.

The elements of R are transpositions; they are called “star transpositions” because, if we picture R as the edge set of a graph on \{1,\dots,d\}, we get the star graph with hub vertex d. The simple random walk on \Gamma(d) is the following stochastic process: a particle is initially located at the identity permutation \iota \in \mathrm{S}(d), and at each tick of the clock it jumps to a neighboring vertex, with equal probability of jumping in any direction. This can be encoded algebraically as follows. Let \mathcal{A}(d) be the group algebra of \mathrm{S}(d), equipped with the scalar product in which the permutation basis is orthonormal. Identify the generating set with the sum of its elements,

Y_d (1\ d) +\dots +(d-1\ d),

which you will recognize from our previous lectures as the top YJM element in \mathcal{A}(d). The probability that a particle on \Gamma(d) evolving by simple random walk is located at a given permutation \pi \in \mathrm{S}(d) at a given time r \in \mathbb{N}_0 is then equal to the normalized scalar product

P_r(\pi) = \frac{1}{(d-1)^r} \langle \pi, Y_d^r \iota \rangle.

This random walk is not equivalent to the top-to-random shuffle, but rather to a shuffling process in which one of the d-1 cards below the top card is exchanged with the top card. This is not a good way to shuffles cards, because randomness is never achieved; equivalently, the corresponding random walk is not ergodic. A random walk on a graph is said to be ergodic if, for all sufficiently large times, the particle has positive probability of being located at each vertex. This is not the case for simple random walk on \Gamma(d) — at even times, the probability the particle is located at an odd permutation is zero, and vice versa.

A shuffling process should have the property that as r \to \infty we get closer and closer to a uniform distribution, which algebraically is represented by the vector

\mathbf{u} = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \pi.

More formally, let us introduce the \ell_1-norm on \mathcal{A}(d) corresponding to the permutation basis, which assigns to each \mathbf{a}\in \mathcal{A}(d) the number

\|\mathbf{a}\| = \sum\limits_{\pi \in \mathrm{S}(d)} |\langle \pi,\mathbf{a}\rangle|.

In probability theory, the corresponding distance \|\mathbf{a}_1-\mathbf{a}_2\| is called total variation distance. The lazy simple random walk on \Gamma(d) is the modification of simple random walk in which, at each tick of the clock, the particle either jumps to an adjacent vertex or stays put, with each such event occurring with equal probability. This random walk is generated by the group algebra element \iota + Y_d, which means that the probability the particle is located at \pi \in \mathrm{S}(d) is

P^r(\pi) = \frac{1}{d^r} \langle \pi, (\iota+Y_d)^r\iota \rangle.

This random walk is ergodic, and it follows from the Perron-Frobenius theorem in linear algebra that

\lim\limits_{d\to \infty} P^r(\pi) = \frac{1}{d!}

for each \pi \in \mathrm{S}(d), so that perfect randomness is achieved in the limit of infinitely many shuffles.

Shuffling infinitely many times is not something that can be done in the real world. Therefore what we want is an effective version of the above, which tells us how close our deck is to randomized after r shuffles have been performed. Algebraically, we want bounds on the \ell_1-norm

\|\frac{1}{d^r}(\iota+Y_d)^r\iota - \mathbf{u}\|.

This weeks lectures, which take the form of a typeset document posted to Piazza in the resources section, explain how to do this using the representation theory of the symmetric group, in particular the spectral theory of the YJM elements which we have been discussing. A remarkable insight coming out of the analysis is that the top-to-random shuffle exhibits what is known as the cutoff phenomenon: for small values of r, the total variation distance to uniform remains close to its initial value, and then suddenly crosses a d-dependent cutoff time and becomes exponentially small, indicating that near-perfect randomness has been achieved in finite time.

Math 202C: Lecture 4

In Lecture 3, we introduced the Young-Jucys-Murphy elements Y_1,\dots,Y_d in the group algebra \in \mathcal{A}(d) of the symmetric group \mathrm{S}(d). The YJM elements form a commuting family of elements in the group algebra of the symmetric group \mathrm{S}(d), and have the amazing feature that symmetric polynomials in Y_1,\dots,Y_d always belong to the class algebra \mathcal{Z}(d) despite the non-centrality of the Y_j‘s themselves. This is quite remarkable, as there is no obvious reason why an arbitrary symmetric polynomial f(Y_1,\dots,Y_d) should evaluate to a linear combination of conjugacy classes.

Since f(Y_1,\dots,Y_d) is central, Schur’s Lemma implies that it acts in any given irreducible representation of \mathcal{A}(d) as a scalar operator. More precisely, if (\mathbf{V}^\lambda,R^\lambda) is the irreducible representation of \mathcal{A}(d) corresponding to the partition \lambda \vdash d (more precisely, a representative of the isomorphism class of all such representations), then the operator R^\lambda(f(Y_1,\dots,Y_d) \in \mathrm{End}\mathbf{V}^\lambda is of the form

R^\lambda(f(Y_1,\dots,Y_d)) = \omega_f(\lambda) I_{\mathbf{V}^\lambda},

where I_{\mathbf{V}^\lambda} \in \mathrm{End}\mathbf{V}^\lambda is the identity operator and \omega_f(\lambda) \in \mathbb{R} is a scalar. Note that this scalar must in fact be real, because each Y_i, being a sum of transpositions, is a selfadjoint element of \mathcal{A}(d). Indeed,

R^\lambda(Y_1),\dots,R^\lambda(Y_d) \in \mathrm{End}\mathbf{V}^\lambda

are commuting selfadjoint operators, and hence are simultaneously diagonalizable, i.e. admit a common eigenbasis. A natural question is then how to explicitly describe the eigenvalue \omega_f(\lambda). This question has an amazing answer, which we now describe.

According to Schur’s Lemma, the class algebra \mathcal{Z}(d) may be described as the subalgebra of \mathcal{A}(d) consisting of elements which act as scalar operators in all irreducible representations. Note that we are including here a converse to Schur’s Lemma, namely the statement that if R^\lambda(C) is a scalar operator in \mathrm{End}\mathbf{V}^\lambda for all \lambda \vdash d, then C \in \mathcal{Z}(d); this follows immediately from the fact that the map

\mathcal{F} \colon \mathcal{A}(d) \longrightarrow \bigoplus\limits_{\lambda \vdash d} \mathrm{End}\mathbf{V}^\lambda

defined by

\mathcal{F}(A) = \bigoplus\limits_{\lambda \vdash d} R^\lambda(A)

is an algebra isomorphism (often referred to as the noncommutative Fourier transform). We can generalize this as follows.

As you know from Math 202B, the branching of irreducible \mathcal{A}(d)-modules is multiplicity free. More precisely, we may consider the group algebra \mathcal{A}(d-1) of the symmetric group \mathcal{S}(d-1) as the subalgebra of \mathcal{A}(d) spanned by permutations in \mathcal{S}(d) satisfying \pi(d)=d. The isotopic decomposition of an irreducible \mathcal{A}(d)-module \mathbf{V}^\lambda as an \mathcal{A}(d-1)-module is

\mathbf{V}^\lambda = \bigoplus\limits_{\mu \nearrow \lambda} \mathbf{V}^\mu,

where the direct sum is over all irreducible \mathcal{A}(d-1)-modules \mathbf{V}^\mu such that the Young diagram \lambda\vdash d can be obtained from the Young diagram \mu\vdash d-1 by adding a single cell.

Example 1: For the “staircase” diagram \lambda =(3,2,1), the isotypic decomposition of the irreducible \mathcal{A}(6)-module as an \mathcal{A}(5)-module is

\mathbf{V}^{(3,2,1)} = \mathbf{V}^{(2,2,1)} \oplus \mathbf{V}^{(3,1,1)} \oplus \mathbf{V}^{(3,2)}.

Iterating this construction, we obtain a decomposition of \mathbf{V}^\lambda into a sum of 1-dimensional spaces \mathbf{V}^T indexed by “growth sequences”

T=\mu^1 \nearrow \mu^2 \nearrow \dots \nearrow \lambda,

where \mu^i is a diagram with i-cells, which encode the possible life histories of the multicellular organism \lambda \vdash d starting from the single-celled organism \mu^1=\Box. Obviously, each such growth history corresponds uniquely to a standard Young tableau of shape \lambda — the entry of a given cell encodes the time instant at which that cell was added. Thus, multiplicity-free branching gives us a basis \{\mathbf{e}_T \colon T \in \mathrm{SYT}(\lambda)\} of each irreducible representation \mathbf{V}^\lambda canonically determined up to scaling; this basis is known as the Young basis (or sometimes the Gelfand-Tsetlin basis). By Schur’s lemma, the subalgebra

\mathcal{Y}(d) = \mathrm{Span}(\mathcal{Z}(1),\mathcal{Z}(2),\dots,\mathcal{Z}(d)

coincides with the maximal commutative subalgebra of \mathcal{A}(d) of elements which act diagonally on the Young basis of every irreducible representation \mathbf{V}^\lambda; this algebra is known as the Young subalgebra (or Gelfand-Tsetlin subalgebra) of the group algebra \mathcal{A}(d), and its construction depends crucially on the fact that we have a natural inclusion \mathcal{A}(d-1) \hookrightarrow \mathcal{A}(d). Observe that the YJM elements Y_1,\dots,Y_d belong to the algebra \mathcal{Y}(d) (explaining why is a conceptually optimal way to answer one of the homework problems). The following Theorem is an enormously useful tool, both theoretically and computationally.

Theorem 1 (Diaconis-Greene,Murphy,Okounkov-Vershik): The commutative algebra \mathcal{Y}(d) is exactly the algebra of polynomials in the YJM elements Y_1,\dots,Y_d. Moreover, for each irreducible representation (\mathbf{V}^\lambda,R^\lambda) of \mathcal{A}(d), the action of Y_k on the corresponding Young basis \{\mathbf{e}_T \colon T \in \mathrm{SYT}(d)\} is given by

R^\lambda(Y_k)\mathbf{e}_T = c_T(k) \mathbf{e}_T,

where c_T(k) is the content (i.e. column index minus row index) of the cell in T containing the number k.

Example 2: Consider the group algebra of the symmetric group of degree d=6 acting in the irreducible representation \mathbf{V}^\lambda labeled by the staircase partition \lambda=(3,2,1), and let \mathbf{e}_T \in \mathbf{V}^\lambda be the Young basis vector corresponding to the standard tableau

T = \begin{matrix} 1 & 3 & 6 \\ 2 & 5 & {} \\ 4 & {} & {} \end{matrix}.

Theorem 1 says that \mathbf{e}_T is an eigenvector of each of the operators R^\lambda(Y_1),R^\lambda(Y_2),R^\lambda(Y_3),R^\lambda(Y_4),R^\lambda(Y_5),R^\lambda(Y_6), and that the corresponding eigenvalues are, respectively,


Corollary 1: For any f \in \mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)} and any \lambda \vdash d, we have

\omega_f(\lambda) = f(c(\Box) \colon \Box \in \lambda),

where the right hand side denotes the evaluation of the symmetric polynomial f(X_1,\dots,X_d) on the numbers c(\Box_1),\dots,c(\Box_d), with \Box_1,\dots,\Box_d being an arbitrary enumeration of the cells of \lambda.

Example 2: Corollary 1 tells us that the eigenvalue of the symmetric polynomial

f(Y_1,Y_2,Y_3,Y_4,Y_5,Y_6) = Y_1^2+Y_2^2+Y_3^2+Y_4^2+Y_5^2+Y_6^2

acting in \mathbf{V}^{(3,2,1)} is

\omega_f(\lambda) = f(0,0,-1,1,-2,2) = 10.

We will black-box Theorem 1, and (possibly) return to its proof later, though I will post a resource where you can look it up if you wish. Our goal will be to mine some rather remarkable information from Theorem 1. There are two natural directions in which one might want to go.

First, there is the question of whether evaluation on the YJM elements in fact defines a surjective morphism

\mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)} \longrightarrow \mathcal{Z}(d).

If this were so, then for each \lambda \vdash d there would exist a symmetric polynomial f_\alpha(X_1,\dots,X_d) \in \mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)} such that

C_\alpha = f_\alpha(Y_1,\dots,Y_d,),

i.e. we could express each conjugacy class as a symmetric polynomial function of the YJM elements. This would in turn have the following character-theoretic consequence. One one hand, the eigenvalue \omega_\alpha(\lambda) of the conjugacy class C_\alpha acting in the irreducible representation \mathbf{V}^\lambda is given by

\omega_\alpha(\lambda) = |C_\alpha| \frac{\chi^\lambda_\alpha}{\dim \mathbf{V}^\lambda},

where as always the character \chi^\lambda_\alpha is by definition the trace of the operator R^\lambda(\pi) \in \mathrm{End}\mathbf{V}^\lambda, with \pi an arbitrary permutation in the conjugacy class C_\alpha. Indeed, on one hand we have

\mathrm{Tr} R^\lambda(C_\alpha) = \omega_\alpha(\lambda) \dim \mathbf{V}^\lambda,

since the trace of a scalar operator is simply its unique eigenvalue summed dimension-many times, and on the other hand

\mathrm{Tr} R^\lambda(C_\alpha) = \sum\limits_{\pi \in C_\alpha} \mathrm{Tr} R^\lambda(\pi) = |C_\alpha| \chi^\lambda_\alpha.

Corollary 1 would then give us the formula

\omega_\alpha(\lambda) = f_\alpha(c(\Box) \colon \Box \in \lambda),

which in turn gives us a formula for the irreducible character \chi^\lambda_\alpha as a function of the content alphabet of the Young diagram \lambda. In fact, this is all true, and is a consequence of combining Corollary 1 with the following classical theorem on the class algebra.

Theorem 2 (Farahat-Higman): Every conjugacy class C_\alpha \in \mathcal{Z}(d) can be expressed as a polynomial in the levels L_0,\dots,L_{d-1} of the Cayley graph of \mathrm{S}(d).

A good treatment of this approach to the irreducible characters of the symmetric groups has been given by our colleague Adriano Garsia, and I will post his lecture notes to our Piazza page. There is an inescapable downside to this seemingly miraculous result: the symmetric polynomials f_\alpha(X_1,\dots,X_d) are not given by any simple formula. This is in a sense inevitable, as the computation of the irreducible characters of the symmetric groups is a P^\sharp-complete problem, and consequently the quest for an explicit closed form is quixotic.

Instead, one can take the opposite approach, and start from the fact that there are many interesting (symmetric) polynomials f which we can write down explicitly, and Theorem 1 gives us a way to explicitly calculate the action of f(Y_1,\dots,Y_d) in irreducible representations. It is this direction that we shall follow in the coming lectures.

Math 202C: Lecture 3

In Lecture 2, we established the following unique factorization result for the symmetric group \mathrm{S}(d).

Theorem 1 (Jucys): Every permutation \pi \in \mathrm{S}(d) admits a unique factorization

\pi = (i_1\ j_1) \dots (i_r\ j_r)

into transpositions (i\ j), 1 \leq i<j\leq d, such that

j_1 < \dots < j_r.

Moreover, the number r of factors in this decomposition is r=|\pi|.

We now consider some algebraic ramifications of Jucys’ result. Let \mathcal{A}(d) denote the group algebra of the symmetric group. As you know from Math 202B, this is the d!-dimensional Hilbert space in which \mathrm{S}(d) sits as an orthonormal basis, and which supports the bilinear product inherited from the group law on \mathrm{S}(d). In fact, \mathbb{C}\mathrm{S}(d) is a *-algebra, with antilinear involution given by \pi^*=\pi^{-1}. All of this is true for arbitrary groups, and indeed you can view \mathcal{A} as a functor from the category of groups to the category of unital associative *-algebras which transports each group \mathrm{G} to its corresponding group algebra \mathcal{A}(\mathrm{G}).

Any subset of \mathrm{S}(d) may be identified with a vector in \mathbb{C}\mathrm{S}(d), simply by identifying it with the sum of its elements, and any family of disjoint subsets of \mathrm{S}(d) is linearly independent, since any two of its members are orthogonal in \mathbb{C}\mathrm{S}(d). In particular, for each \alpha \vdash d we may view the corresponding conjugacy class C_\alpha as an element of \mathbb{C}\mathrm{S}(d), the sum of all permutations of cycle type \alpha, and consider the linearly independent set in the group algebra,

\{ C_\alpha \colon \alpha \vdash d\}.

Let \mathcal{Z}(d) denote the subspace of \mathcal{A}(d) spanned by the conjugacy classes, the dimension of which is the number p(d) of partitions of d. Given C_\alpha,C_\beta \in \mathcal{Z}(d), we have

C_\alpha C_\beta = \sum\limits_{\pi \in \mathrm{S}(d)} c_{\alpha\beta}^\pi \pi,


c_{\alpha\beta}^\pi = \left| \{ (\rho,\sigma) \in C_\alpha \times C_\beta \colon \rho\sigma=\pi\}\right|.

Th number c_{\alpha\beta}^\pi in fact depends only on the conjugacy class of \pi, for if \rho\sigma=\pi then (\kappa\rho\kappa^{-1})(\kappa\sigma\kappa^{-1})=\kappa\pi\kappa^{-1} for any \kappa \in \mathrm{S}(d). Thus \mathcal{Z}(d) is a subalgebra of \mathbb{C}\mathrm{S}(d) called the class algebra, which is a particularly appropriate name for something encountered in an algebra class. We thus have

C_\alpha C_\beta = \sum\limits_{\gamma \vdash d} c_{\alpha\beta}^\gamma C_\gamma


c_{\alpha\beta}^\gamma = \left| \{ (\rho,\sigma) \in C_\alpha \times C_\beta \colon \rho\sigma \in C_\gamma\}\right|.

The numbers c_{\alpha\beta}^\gamma are called the connection coefficients (aka structure constants) of the class algebra \mathcal{Z}(d); they are the entries of the p(d) \times p(d) multiplication table for conjugacy classes. More generally, we can take arbitrary polynomials in conjugacy classes, and write them as linear combinations of conjugacy classes, and this often has enumerative meaning. For example, consider the class T=C_{(21^{d-2})} of transpositions. For any r \in \mathbb{N}, we have

T^r = \sum\limits_{j \in \mathrm{Fun}(r,d)}\ \sum\limits_{\substack{i \in \mathrm{Fun}(r,d) \\ i<j \text{ pointwise}}} (i(1)\ j(1)) \dots (i(r)\ j(r)),

and each term in the sum corresponds uniquely to an r-step walk on the Cayley graph \Gamma(d) of \mathrm{S}(d) beginning at \iota. Thus,

T^r = \sum\limits_{\pi \in \mathrm{S}(d)} W^r(\pi) \pi = \sum_{\alpha \vdash d} W^r(\alpha) C_\alpha,

which says that the combinatorial problem of counting r-step walks on the graph \Gamma(d) from the identity permutation to a permutation of cycle type \alpha is the same thing as the algebraic problem of resolving T^r \in \mathcal{Z}(d) into a linear combination of conjugacy classes.

For any \pi \in \mathrm{S}(d), we have

C_\alpha \pi = \sum\limits_\rho \rho\pi \quad\text{ and }\quad \pi C_\alpha= \sum\limits_\rho \pi\rho,

where both sums are over \rho \in C_\alpha, and the two sums are in fact the same since (\pi\rho\pi^{-1})\pi = \pi \rho. Thus, the class algebra is a subalgebra of the center of the group algebra. Conversely, if C \in \mathbb{C}\mathrm{S}(d) is a central element, then \pi C \pi^{-1}=C \pi \pi^{-1}=C, whence C is a linear combination of conjugacy classes, and we conclude that the class algebra \mathcal{Z}(d) coincides with the center of the group algebra \mathbb{C}\mathrm{S}(d). Again, all of the above is true for any group, and you can view \mathcal{Z} as a functor from groups to commutative, unital *-algebras which transports each group \mathrm{G} to its class algebra \mathcal{Z}(\mathrm{G}).

In addition to the conjugacy class themselves, the levels L_r of the Cayley graph of \mathrm{S}(d) belong to the class algebra, and are \mathbb{Z}_2-linear combinations of conjugacy classes,

L_r = \sum\limits_{\substack{\pi \in \mathrm{S}(d) \\ |\pi|=r}} \pi = \sum\limits_{\substack{\alpha \vdash d \\ \ell(\alpha)=d-r}} C_\alpha.

Lifted to the group algebra, Theorem 1 says that these central elements are given by the sums

L_r = \sum\limits_{\substack{j \in \mathrm{Fun}(r,d) \\ j \text{ strictly increasing}}}\ \sum\limits_{\substack{i \in \mathrm{Fun}(r,d) \\ i<j \text{ pointwise}}} (i(1)\ j(1)) \dots (i(r)\ j(r)),

where \mathrm{Fun}(r,d) is the set of functions from [r] to [d]. Equivalently, we have

L_r = \sum\limits_{\substack{ j \in \mathrm{Fun}(r,d) \\ j\text{ strictly increasing}}} Y_{j(1)} \dots Y_{j(r)},

where Y_1,Y_2,\dots,Y_d \in \mathbb{C}\mathrm{S}(d) are the sums

Y_j = \sum\limits_{\{i \in [d] \colon i<j\}} (i\ j), \quad 1 \leq j \leq d.

Note that Y_1 is an empty sum, hence equal to the zero element in \mathbb{C}\mathrm{S}(d). A nice way to picture these sums is to take the conjugacy class T \subset \mathrm{S}(d) of transpositions and view it as the d \times d strictly upper triangular matrix over \mathbb{C}\mathrm{S}(d) whose entry in row i and column j is the transposition (i\ j) whenever i<j, and 0 \in \mathbb{C}\mathrm{S}(d) when not. Then Y_j is simply the jth column sum of the matrix T. For example, in the case d=4, we have

\begin{bmatrix} 0 & (1\ 2) & (1\ 3) & (1\ 4) \\ 0 & 0 & (2\ 3) & (2\ 4) \\ 0 & 0 & 0 & (3\ 4) \\ 0 & 0 & 0 & 0\end{bmatrix},


Y_1 = 0 \\ Y_2 = (1\ 2) \\ Y_3 = (1\ 3) + (2\ 3) \\ Y_4=(1\ 4)+(2\ 4) +(3\ 4).

The transpositions sums Y_1,Y_2,\dots,Y_d are known as the Young-Jucys-Murphy elements of the group algebra \mathbb{C}\mathrm{S}(d). Note that the YJM elements are non-central, since they correspond to proper subsets of a conjugacy class (namely, the class T), and therefore cannot be expressed as linear combinations of conjugacy classes. Nevertheless, we have the following.

Proposition 2: The YJM elements commute with one another.

Proof: Problem set 2.

In view of the commutativity of the elements Y_1,\dots,Y_d, we may equivalently write Theorem 1 as

L_r = \sum\limits_{\substack{R \subseteq [d] \\ |R|=r}} \prod\limits_{j \in R} Y_j,

which says that L_r is the sum of all squarefree monomials of degree r in the commuting elements Y_1,\dots,Y_d. But this is a polynomial you know from Math 202B: it is the elementary symmetric polynomial e_r of degree r in d variables. In particular, Theorem 1 is equivalent to the following polynomial identity in the class algebra \mathcal{Z}(d).

Theorem 2: L_r = e_r(Y_1,\dots,Y_d).

As you know from Math 202B, the elementary symmetric polynomials are important because they express the coefficients of a polynomial as functions of its roots (the inverse problem, expressing the roots as functions of the coefficients, is harder). Thus Theorem 2 is equivalent to the following factorization in the class algebra.

Theorem 3: For any \hbar \in \mathbb{C}, we have

\sum\limits_{r=0}^d \hbar^r L_r = \prod\limits_{i=1}^d (\iota + \hbar Y_i).

In addition to being a direct incarnation of the relationship between the roots of a polynomial and its coefficients, there is another reason why the elementary symmetric polynomials are deserving of their name. Let X_1,\dots,X_d be commuting indeterminates, and consider the corresponding polynomial algebra \mathbb{C}[X_1,\dots,X_d]. As you know from Math 202B, polynomials f \in \mathbb{C}[X_1,\dots,X_d] satisfying

f(X_{\sigma(1)},\dots,X_{\sigma(d)}) = f(X_1,\dots,X_d) \quad\text{for all }\sigma \in \mathrm{S}(d)

are called symmetric polynomials, and their totality forms a subalgebra of \mathbb{C}[X_1,\dots,X_d] denoted \mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)}. This subalgebra is itself a polynomial algebra.

Theorem 3 (Newton): We have \mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)} = \mathbb{C}[e_1,\dots,e_d].

Theorem 3 is often referred to as the fundamental theorem on symmetric polynomials. A proof is given in Professor Sam’s 202B notes (Theorem 3.3.4). Combining Theorem 3 with Theorem 2, we learn the following: even though the JM elements Y_1,\dots,Y_d do not themselves belong to the class algebra \mathcal{Z}(d), any symmetric function of them does.

Theorem 4: For any f \in \mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)}, the group algebra element f(Y_1,\dots,Y_d) \in \mathbb{C}\mathrm{S}(d) is central.

Theorem 4 is quite remarkable, and has many interesting consequences. Here is one. Suppose we weaken our definition of strictly monotone walks on the edge-labeled Cayley graph \Gamma(d) of the symmetric group \mathrm{S}(d) to just plain monotone walks, meaning walks which have the feature that the labels of the edges they traverse form a weakly increasing sequence.

Theorem 5: The number \vec{W}^r(\pi) depends only on the conjugacy class of \pi.

Proof: The complete homogeneous symmetric polynomial of degree r in commuting variables X_1,\dots,X_d is the sum of all monomials of degree r in these variables (hence the name “complete”). Equivalently,

h_r(X_1,\dots,X_d) = \sum\limits_{\substack{j \in \mathrm {Fun}(r,d) \\ j\text{ weakly increasing}}} X_{j(1)} \dots X_{j(d)}.

Evaluating h_r on the YJM elements, we obtain

h_r(Y_1,\dots,Y_d) = \sum\limits_{\substack{j \in \mathrm{Fun}(r,d) \\ j \text{ weakly increasing}}}\ \sum\limits_{\substack{i \in \mathrm{Fun}(r,d) \\ i<j \text{ pointwise}}} (i(1)\ j(1)) \dots (i(r)\ j(r)),

which is precisely the sum of all weakly monotone walks on \Gamma(d) beginning at \iota. Thus,

h_r(Y_1,\dots,Y_d) = \sum\limits_{\pi \in \mathrm{S}(d)} \vec{W}^r(\pi) \pi.

But since h_r is a symmetric polynomial, Theorem 4 guarantees that h_r(Y_1,\dots,Y_d) \in \mathcal{Z}(d), so that we must have \vec{W}^r(\pi)=\vec{W}^r(\kappa\pi\kappa^{-1}) for all \pi,\kappa \in \mathrm{S}(d), as claimed.

— Q.E.D.

No direct combinatorial proof of Theorem 5 is known.

Exercise: In the symmetric group \mathrm{S}(4), write down all factorizations of the full cycle \gamma = (1\ 2\ 3\ 4) into three transpositions; identify the factorizations which are weakly monotone, and those which are strictly monotone. Repeat this procedure with the full cycle \gamma'=(1\ 3\ 2\ 4).

Math 202C: Lecture 2

Before leaving the world of linear algebra for multilinear algebra (i.e. tensors), let us reformulate the spectral theorem entirely “upstairs” in \mathrm{End} \mathbf{V} (as in Lecture 1, \mathbf{V} is complex vector space of dimension N < \infty equipped with a scalar product).

Theorem 1: If A \in \mathrm{End}\mathbf{V} is selfadjoint, then there is a positive integer n \leq N, real numbers \lambda_1>\dots>\lambda_n, and selfadjoint operators P_1,\dots,P_n in \mathrm{End}\mathbf{V} satisfying

P_i^2 = P_i, \quad 1 \leq i \leq n


P_iP_j = 0 \quad 1 \leq i \neq j \leq n

such that

A = \lambda_1 P_1 + \dots + \lambda_n P_n.

The operators P_1,\dots,P_n above are called orthogonal idempotents. The “idempotent” part is P_i^2=P_i, and this term is applied more generally in ring theory to refer to anything equal to its own square. The “orthogonal” part is P_iP_j = 0, the zero operator in \mathrm{End}\mathbf{V}, when i \neq j. This is a useful representation of A, because orthogonal idempotence immediately implies that

A^k = \lambda_1^k P_1 + \dots + \lambda_n^k P_n.

Given your Math 202A background, you have realized immediately that Theorem 1 is equivalent to the eigenvalue/eigenvector form of the Spectral Theorem discussed in Lecture 1, because the set \mathcal{P}(\mathbf{V}) \subset \mathrm{End}\mathbf{V} of selfadjoint idempotents acting in \mathbf{V} is in canonical bijection with the lattice \mathcal{L}(\mathbf{V}) of subspaces of \mathbf{V}, since these are exactly the operators which act by orthogonal projection onto a subspace: for any W \in \mathcal{L}(\mathbf{V}), the corresponding P \in \mathcal{P}(\mathbf{V}) maps \mathbf{v} \in \mathbf{V} to the nearest point of the subspace \mathbf{W}. The correspondence between selfadjoint idempotents — which are often simply called orthogonal projections — and subspaces even holds in the infinite-dimensional setting provided \mathbf{V} is complete with respect to the norm \|\mathbf{v}\|=\sqrt{\langle \mathbf{v},\mathbf{v}\rangle}, i.e. \mathbf{V} is a Hilbert space, provided we restrict to closed subspaces. Thus the Grassmannian \mathrm{Gr}_k(\mathbf{V}) is for all practical purposes equivalent to the set \mathrm{Pr}_k(\mathbf{V}) of selfadjoint idempotents/orthogonal projections of rank k. Theorem 1 above is obtained from the eigenvalue/eigenvector form of the Spectral Theorem by replacing the eigenspaces of A with the orthogonal projections onto those spaces, and conversely Theorem 1 gives the eigenvalue/eigenvector form of the Spectral Theorem with \lambda_1>\dots>\lambda_n being the distinct eigenvalues of A whose corresponding eigenspaces are the images of the orthogonal idempotents P_1,\dots,P_n.

Let us close out this brief recap of eigenvectors and eigenvalues (i.e. Math 202A material) by recalling the algebraic criterion for diagonalizability. An operator A \in \mathrm{End}\mathbf{V} is said to be normal if it commutes with its adjoint, AA^*=A^*A. Obviously, selfadjoint operators X are normal, XX^*=X^*X=X^2, and so are unitary operators U, since UU^*=U^*U=I.

Theorem 2: Given A \in \mathrm{End} \mathbf{V}, there exists an orthonormal basis \mathbf{e}_1,\dots,\mathbf{e}_N of \mathbf{V} such that A\mathbf{e}_i=\lambda_i\mathbf{e}_i for each 1 \leq i \leq N, where \lambda_1,\dots,\lambda_N \in \mathbb{C} are scalars, if and only if A is normal. Equivalently, A is a \mathbb{C}-linear combination of pairwise orthogonal selfadjoint idempotents if and only if AA^*=A^*A.

After revisiting this essential piece of 202A technology, my plan was to jump into multilinear algebra — tensor products, exterior products, symmetric products, and their applications to eigenvalue inequalities — and work up to the construction of Schur functors, which interpolate between these using the representation theory of the symmetric groups. This would be an application of the material you learned in 202B wherein the goal is to enhance the material you learned in Math 202A. However, I have since changed my mind. After reviewing Professor Sam’s 202B lecture notes (which I am posting to Piazza for your reference), and speaking briefly with Steven, I have realized that you already have some exposure to this material. Even though there is much left to be done in this direction, namely starting from Section 6 in Steven’s notes and working up to a proof of Theorem 6.4.1 via Schur-Weyl duality, I have ultimately decided to leave this for the second half of the course, and begin instead with some very natural and appealing applications of symmetric group representations which we can sink our teeth into immediately. Also, this will give you exposure to material which cannot be looked up in textbooks, or at least cannot be found presented in the way we’ll do it.

Definition 1: For each d \in \mathbb{N}, the symmetric group \mathrm{S}(d) of degree d consists of the set of bijective functions

\sigma \colon [d] \to [d],

i.e. automorphisms of [d]:=\{1,\dots,d\}, with the group law being composition of functions. Elements of \mathrm{S}(d) are called permutations.

As you know from Math 202B (and likely much before), any permutation \sigma can be factored into a product of disjoint cyclic permutations, and this factorization is unique up to the order of the factors. Our starting point will be a rather different unique factorization theorem for permutations. To explain, let us consider \mathrm{S}(d) from the perspective of geometric group theory. Let T \subset \mathrm{S}(d) denote the set of transpositions, i.e T is the conjugacy class of 2-cycles,

(i\ j), \quad 1 \leq i<j \leq d.

As you are also no doubt aware, T generates \mathrm{S}(d), which means that any permutation \sigma \in \mathrm{S}(d) can be written as a product of transpositions,

\sigma = \tau_1 \dots \tau_r, \quad r \in \mathbb{N}_0,\ \tau_i \in T.

This representation is very far from being unique, and indeed the question of counting the number of factorizations of a given permutation into a given number of transpositions is a classical problem in algebraic combinatorics known as the Hurwitz enumeration problem; despite its elementary formulation, this question has deep ties to enumerative algebraic geometry and mathematical physics, some of which we will hopefully be able to discuss. Nevertheless, for each \sigma there is a minimal value of r for which such a factorization exists, and this is called the word norm of \sigma and denoted |\sigma|_T, where the subscript reminds that this quantity depends on the choice of generating set T. We will omit this subscript in order to lighten the notation, and if we switch to a different generating set T' inducing a different word norm on \mathrm{S}(d), we will say so loud and clear. The following is elementary combinatorics, but you should take the time to prove it.

Proposition 1: The length r of any factorization \sigma=\tau_1\dots\tau_r of a permutation \sigma \in \mathrm{S}(d) into transpositions satisfies r=|\sigma|+2k for some k \in \mathbb{N}_0. That is, all factorizations of \sigma into transpositions have the same parity.

Somewhat less elementary but no less important is the fact that there is a tight connection between the word norm of a permutation \sigma and the number of factors c(\sigma) in its unique decomposition into disjoint cycles.

Proposition 2 (join-cut relation): We have |\sigma|=d-c(\sigma).

The key idea in the proof of Proposition 2 is expressed in its name: multiplying a permutation by a transposition will either join two of its cycles into one, or cut one of its cycles into two. Thus,

The word norm makes the symmetric group \mathrm{S}(d) into a metric space space.

Proposition 3: The function

\mathrm{d} \colon \mathrm{S}(d) \times \mathrm{S}(d) \longrightarrow \mathbb{N}_0

defined by \mathrm{d}(\rho,\sigma) = |\rho^{-1}\sigma| is a metric.

We may thus for example speak of the open ball of radius r centered at a given permutation \rho,

B_r(\rho) = \{\sigma \in \mathrm{S}(d) \colon \mathrm{d}(\rho,\sigma) < r\},

and its boundary, the sphere

\partial B_r(\rho) = \{\sigma \in \mathrm{S}(d) \colon \mathrm{d}(\rho,\sigma)= r\}.

The metric \mathrm{d} is nothing but the graph theory distance (or geodesic distance) on \Gamma(d) = \mathrm{Cay}(\mathrm{S}(d),T), the Cayley graph of the symmetric group as generated by the conjugacy class T \subset \mathrm{S}(d) of transpositions. The vertex set of this graph is \mathrm{S}(d) itself, and two permutations \rho,\sigma \in \mathrm{S}(d) are joined by an edge if and only if there is \tau \in T such that \sigma = \rho\tau. Note that this is an undirected graph because \tau^2=\iota is the identity permutation for every \tau \in T, so that \sigma=\rho\tau is equivalent to \rho=\sigma\tau. Note also that \Gamma(d) is a regular graph of degree |T|={d \choose 2}. Another feature of \Gamma(d) is that it is a graded graph: it decomposes into a disjoint union of independent sets or “levels,”

\mathrm{S}(d) = \bigsqcup\limits_{r=0}^{d-1} L_r,

with L_r being the sphere of radius r centered at the identity permutation \iota. By the join-cut relation, L_r may also be described as the set of permutations whose disjoint cycle decomposition contains d-r factors.

The Hurwitz enumeration problem has an equivalent formulation in terms of the Cayley graph: it asks for the number W^r(\pi) of r-step walks from \iota to \pi on \Gamma(d). This description also points to a refinement of the Hurwitz problem. Let us introduce the edge-labeling on \Gamma(d) in which each edge corresponding to the transposition (i\ j) is marked by j, the larger of the two numbers it interchanges. We will call this the Biane-Stanley edge labeling of the symmetric group, after Philippe Biane and Richard Stanley, in whose work it appears implicitly.

\mathrm{S}(4) with Biane-Stanley labeling.

Given an r-step walks \rho \to \sigma on \mathrm{S}(d), we define its signature to be the corresponding sequence j_1,\dots,j_r of edge labels. We say that a walk is strictly monotone if its signature is a strictly increasing sequence. Thus, an r-step strictly monotone walk from \iota to \sigma is the same thing as a strictly monotone factorization of \sigma into r transpositions, i.e. a factorization

\sigma = (i_1\ j_1) \dots (i_r\ j_r)

such that

j_1 < \dots < j_r.

Theorem (unique monotone factorization): Every permutation \pi \in \mathrm{S}(d) admits a unique strictly monotone factorization, and the length of this factorization is equal to |\pi|. That is,

\vec{\vec{W}}^r(\pi) = \begin{cases} 1, \text{ if }r=|\pi| \\ 0, \text{ otherwise}\end{cases}.

Proof: The proof is by induction on d.

In the base case we have \mathrm{S}(2) = \{\iota,\tau\} with \iota = (1)(2) and \tau=(1\ 2). The complete answer to the Hurwitz enumeration problem for d=2 is thus given by

W^r(\iota)=\begin{cases} 1,\text{ if } r \text{ even} \\ 0, \text{ if }r \text{ odd } \end{cases}


W^r(\tau)=\begin{cases} 1, \text{ if } r \text{ odd } \\0, \text{ if }r\text{ even}\end{cases}.

This corresponds to the fact that if \pi \in \mathrm{S}(2) and r \in \mathbb{N}_0 have the same parity, then we have the unique length r factorization

\pi=\underbrace{(1\ 2) \dots (1\ 2)}_{r \text{ factors}},

but no factorization exists if \pi and r have opposite parity. Imposing the strictly monotone constraint whittles this down to

\vec{\vec{W}}^r(\iota)=\begin{cases} 1, \text{ if }r=0 \\ 0, \text{ otherwise} \end{cases}


\vec{\vec{W}}^r(\tau) = \begin{cases} 1, \text{ if } r=1 \\ 0, \text{ otherwise} \end{cases}.

For the inductive step, there are two cases to consider. Let d>2 and let \pi \in \mathrm{S}(d) be any permutation. If d is a fixed point of \pi, i.e. \pi(d)=d, then \pi \in \mathrm{S}(d-1) has a unique strictly monotone factorization by the induction hypothesis. If on the other hand \pi \in \mathrm{S}(d) is one of the

d! - (d-1)! = (d-1)(d-1)!

permutations which contains d in its support (meaning \pi(d)\neq d), then \pi is of the form

\pi = \pi'(i\ d)

for a unique pair \pi' \in \mathrm{S}(d-1) and i \in \{1,\dots,d-1\}. By the induction hypothesis, \pi' admits a unique strictly monotone factorization of length equal to |\pi'|_{d-1}, and the result follows.

—- Q.E.D.

Graphically, the unique monotone factorization theorem gives a presentation of the symmetric group \mathrm{S}(d) as a starlike tree, which we call the Jucys tree after the Lithuanian physicist Algimantas Adolfas Jucys (not to be confuse with his father, the Lithuanian physicist Adolfas Jucys), in whose work it appears implicitly. We will embed this construction in the group algebra \mathbb{C}\mathrm{S}(d) in Lecture 3, and start using it in conjunction with linear representations of the group algebra.

Jucys tree for \mathrm{S}(4).

Math 202C: Lecture 1

Throughout this lecture, \mathbf{V} is a finite-dimensional complex vector space equipped with a scalar product \langle \cdot,\cdot \rangle, which is by convention a linear function of its second argument. The scalar product on \mathbf{V} determines various symmetry classes in \mathrm{End}\mathbf{V}, the algebra of linear operators A \colon \mathbf{V} \to \mathbf{V}, and the study of these symmetry classes is one of the main components of Math 202A. In particular, recall that an operator A \in \mathrm{End}\mathbf{V} is said to be selfajdoint (or Hermitian) if

\langle \mathbf{v},A\mathbf{w} \rangle = \langle A\mathbf{v},\mathbf{w}\rangle, \quad \mathbf{v},\mathbf{w} \in \mathcal{H}.

Hermitian operators are the subject of the Spectral Theorem, which may be formulated as follows.

Theorem 1: If A \in \mathrm{End} \mathbf{V} is selfadjoint, then there exists an orthonormal basis \mathbf{e}_1,\dots\mathbf{e}_N of \mathbf{V} such that A\mathbf{e}_i = \lambda_i\mathbf{e}_i, where \lambda_1 \geq \dots \geq \lambda_N are real numbers.

Theorem 1 is the eigenvalue/eigenvector form of the Spectral Theorem. We will give a proof of this result which iteratively constructs the eigenvalues/eigenvectors of A as the maxima/maximizers of the (necessarily real-valued) quadratic form Q_A(\mathbf{v}) = \langle \mathbf{v},A\mathbf{v}\rangle on a nested sequence of compact sets in \mathbf{V}. Instead of invoking the Fundamental Theorem of Algebra to claim the existence of a root of the characteristic polynomial, this argument calls the Extreme Value Theorem as a subroutine.

Lemma 1: If B \colon \mathbf{V} \to \mathbf{V} is a selfadjoint operator such that the affiliated quadratic form Q_B(\mathbf{v})=\langle \mathbf{v},B\mathbf{v}\rangle is nonnegative, then any zero of Q_B \colon \mathbf{V} \to \mathbb{R} is in the kernel of B.

Proof: In fact, the argument does not use the finite-dimensionality of \mathbf{V}. Let \mathbf{z} \in \mathbf{V} be such that Q_B(\mathbf{z})=0. Consider a real perturbation of this zero in an arbitrary direction, i.e. consider Q_B(\mathbf{z}+t\mathbf{h}), where t \in \mathbb{R} is arbitrary and \mathbf{h} \in \mathbf{V} is a fixed unit vector. We then have

Q_B(\mathbf{z}+t\mathbf{h}) = \langle \mathbf{z}+t\mathbf{h},B(\mathbf{z}+t\mathbf{h})\rangle=Q_B(\mathbf{z})+t\langle \mathbf{z},B\mathbf{h} \rangle+t\langle\mathbf{h},B\mathbf{z}\rangle+t^2Q_B(\mathbf{h}).

The first term is zero by hypothesis. The middle terms are the real number t times

\langle \mathbf{z},B\mathbf{h} \rangle+\langle\mathbf{h},B\mathbf{z}\rangle = \langle \mathbf{z},B\mathbf{h} \rangle+\langle B\mathbf{h},\mathbf{z}\rangle = \langle \mathbf{z},B\mathbf{h} \rangle+\overline{\langle \mathbf{z},B\mathbf{h} \rangle} = 2 \Re \langle \mathbf{z},B\mathbf{h} \rangle.

Thus by hypothesis the function

f(t)=2t \Re \langle \mathbf{z},B\mathbf{h} \rangle +t^2Q_B(\mathbf{h})

is a nonnegative function of the real variable t. This function has the asymptotics

f(t) \sim 2t \Re \langle \mathbf{z},B\mathbf{h} \rangle

as t \to 0, so it must be the case that \Re \langle \mathbf{z},B\mathbf{h} \rangle=\Re \langle B\mathbf{z},\mathbf{h} \rangle=0, otherwise f(t) would change sign at t=0, which it does not, being a nonnegative function. Repeating the above with a pure imaginary perturbation it\mathbf{h}, we get that \Im \langle \mathbf{z},B\mathbf{h} \rangle =\Im \langle B\mathbf{z},\mathbf{h} \rangle=0 as well. Thus B\mathbf{z} is orthogonal to every unit vector \mathbf{h}, whence it must be the zero vector, B\mathbf{z}=\mathbf{0}_\mathbf{V}.

— Q.E.D.

Proof of Theorem 1: Since \mathbf{V} is finite-dimensional, its unit sphere

S_1 = \{\mathbf{v} \in \mathbf{V} \colon \|\mathbf{v}\|=1\}

is compact, and the function Q_A \colon \mathbf{V} \to \mathbb{R} is continuous. Thus, by the Extreme Value Theorem, there exists a unit vector \mathbf{e}_1 \in \mathbf{V} at which Q_A attains its maximum value on S_1, i.e. we have

Q_A(\mathbf{v}) \leq \lambda_1

for all unit vectors \mathbf{v}. But this is equivalent to

\langle \mathbf{v},A\mathbf{v}\rangle \leq \lambda_1 \langle \mathbf{v},\mathbf{v}\rangle

for all unit vectors \mathbf{v}, and in this form it is clear that the inequality actually holds for arbitrary vectors \mathbf{v} \in \mathbf{V}, i.e. we have

\langle \mathbf{v},B_1\mathbf{v} \rangle \geq 0

for all \mathbf{v} \in \mathbf{V}, where

B_1=\lambda_1I -A.

Note that B_1 \in \mathrm{End}\mathbf{V} is selfadjoint (indeed, selfadjoint operators in \mathrm{End}\mathbf{V} form a real vector space), and the above says that the associated quadratic form Q_{B_1} is nonnegative. Since Q_{B_1}(\mathbf{e}_1)=0 by construction, we may invoke Lemma 1 to conclude that \mathbf{e}_1 is in the kernel of B_1=\lambda_1I-A, i.e. \mathbf{e}_1 is an eigenvector of A with eigenvalue \lambda_1.

Consider now the orthogonal complement

\mathbf{e}_1^\perp = \{\mathbf{v} \in \mathbf{V} \colon \langle \mathbf{e}_1,\mathbf{v} \rangle = 0\},

i.e. the (N-1)-dimensional hyperplane in \mathbf{V} perpendicular to the line through \mathbf{e}_1. Let S_2 be the unit sphere in \mathbf{e}_1^\perp, let \lambda_2\leq \lambda_1 be the maximum of the quadratic form Q_A on S_2 \subset S_1, and let \mathbf{e}_2 \in S_2 be a point at which the maximum is achieved. By exactly the same argument as above, we get that \mathbf{e}_2 is an eigenvector of A with eigenvalue \lambda_2. Indeed, we can iterate the argument N times to obtain an orthonormal set \{\mathbf{e}_1,\dots,\mathbf{e}_N\} of eigenvectors with eigenvalues \lambda_1 \geq \dots \geq \lambda_N.

— Q.E.D.

2. The subspace lattice

Let \mathcal{L}(\mathbf{V}) denote the set of all subspaces of \mathbf{V}, including the trivial subspace \{\mathbf{0}_\mathbf{V}\} and \mathbf{V} itself. This set carries a natural partial order: given \mathbf{W}_1,\mathbf{W}_2 \in \mathcal{L}(\mathbf{V}), we declare \mathbf{W}_1 \leq \mathbf{W}_2 if and only if \mathbf{W}_1 is a subspace of \mathbf{W}_2. In fact, \mathcal{L}(\mathbf{V}) is a graded poset in which the rank of a subspace \mathbf{W} \in \mathcal{L}(\mathbf{V}) is its dimension \dim \mathbf{W}. The graded poset \mathcal{L}(\mathbf{V}) is moreover a lattice: the largest subspace contained in each of two given subspaces \mathbf{W}_1 and \mathbf{W}_2 is their intersection, and the smallest subspaces containing both of them is their span. Finally, the lattice \mathcal{L}(\mathbf{V}) is self-dual: the map \mathbf{W} \to \mathbf{W}^\perp sending a subspace to its orthogonal complement is an order-reversing involution.

In formulating the above, we have just encountered our first example of a functor: we can view \mathcal{L} as a covariant functor from the category of vector spaces to the category of posets. Indeed, if \mathbf{V}_1,\mathbf{V}_2 are vector spaces and T \in \mathrm{Hom}(\mathbf{V}_1,\mathbf{V}_2) is a linear transformation, we get an order-preserving function \mathcal{L}(T) \in \mathrm{Hom}(\mathcal{L}(\mathbf{V}_1),\mathcal{L}(\mathbf{V}_2)) defined by


In this course we will be considering quite a few more functors, most of which will be endofunctors from the category of vector spaces to itself. The most basic of these are tensor powers, symmetric powers, and exterior powers, which are the basic operations of multilinear algebra and are closely related to the physical concept of Fock space, which is intended to capture the notion of symmetry classes of elementary particles in quantum mechanics — bosons and fermions and such.

3. Flags and Grassmannians

Very loosely speaking, vector spaces are continuous versions of sets. A finite set E=\{\mathbf{e}_1,\dots,\mathbf{e}_N\} might represent the possible states of some physical system, while the free vector space \mathbf{V}=\mathbb{C}[E] over this set allows for superpositions of states, i.e. linear combinations of them. Associated to E is its lattice of subsets \mathcal{L}(E), where meet and join are intersection and union. This is called a Boolean lattice, and it is graded by cardinality rather than dimension. Sometimes \mathcal{L}(E) is referred to as a hypercube, because its elements can be thought of as length N bitstrings, and these may in turn be thought of as the vertices of a polytope in \mathbf{R}^N which is a cube when N=3.

In a general poset, a chain is a totally ordered subset, while an antichain is a subset which does not contain any comparable pairs of elements. A chain or antichain is said to be saturated if adding any new element will cause it to no longer be a chain or antichain. It is easy to spot saturated chains in the Boolean lattice \mathcal{L}(E) — they are just sequences E_0,E_1,\dots,E_N of subsets of E such that |E_k|=k, and there are factorially many of these. It is equally easy to spot N+1 saturated antichains, namely the collections A_k \subset \mathcal{L}(E) consisting of all subsets of E of cardinality k. The cardinality of A_k itself is then {N \choose k}, and it is a famous result in extremal combinatorics that the size of the largest antichain in the Boolean lattice \mathcal{L}(E) is the largest number in the Nth row of Pascal’s triangle.

The lattice of subspaces of a finite-dimensional vector space is in many ways analogous to the lattice of subsets of a finite set. Indeed, if one could define a field with one element in a satisfactory way, then the two notions would coincide for vector spaces over \mathbb{F}_1. This is problematic, but one can analyze the subspace lattice of a vector space defined over \mathbb{F}_q without difficulty, and count chains, antichains, etc. For example, the antichain consisting of k-dimensional subspaces is the Gaussian binomial coefficient {N \choose k}_q, which degenerates to the ordinary binomial coefficient in the q \to 1 limit, almost as if \mathbf{F}_1 really did exist…

Anyway, we are working over \mathbb{C}, so everything is infinite. Chains in \mathcal{L}(\mathbf{V}) are typically referred to as flags, though this term is more often than not reserved for chains that contain the trivial subspace and the \mathbf{V} itself. Saturated chains are called complete flags: they are sequences

\mathbf{V}_0 \leq \mathbf{V}_1 \leq \dots \leq \mathbf{V}_N

such that \dim\mathbf{V}_k =k. The analogue of the antichain A_k in the Boolean lattice consisting of subsets of cardinality k is the collection of k-dimensional subspaces of \mathbf{V}; these antichains are called Grassmannians and denoted \mathrm{Gr}_k(\mathbf{V}). We are going to discuss Grassmannians of finite-dimensional complex vector spaces in some detail, and I hope that we will have time to discuss a certain special infinite-dimensional Grassmannian known as the Sato Grassmannian. For now, we concentrate on the following fundamental relationship between eigenvalues of selfadjoint operators and Grassmannians.

3. Eigenvalues as extremizers

Let A \in \mathrm{End}\mathbf{V} be a selfadjoint operator, and define a function

M_A \colon \mathcal{L}(\mathbf{V}) \to \mathbb{R}


M_A(\mathbf{W}) = \max\limits_{\substack{\mathbf{w} \in \mathbf{W} \\ \|\mathbf{w}\|=1}} \langle \mathbf{w},A\mathbf{w}\rangle.

Thus, M_A(\mathbf{W}) assigns to each subspace \mathbf{W} \leq \mathbf{V} the maximum of the quadratic form associated to A on the unit sphere in \mathbf{W}. In particular, we know from our proof of the Spectral Theorem above that the largest eigenvalue of A is given by

\lambda_1 = M_A(\mathbf{V}).

The following result, known as the Courant-Fisher minimax theorem, generalizes this by giving a variational characterization of each individual eigenvalue \lambda_k of A.

Theorem 2: If A \in \mathrm{End}\mathbf{V} is a selfadjoint operator with eigenvalues \lambda_1 \geq \dots \geq \lambda_N, then for each 1 \leq k \leq N we have

\lambda_k = \min \{M_A(W) \colon W \in \mathrm{Gr}_{N-k+1}(\mathbf{V})\}.

That is, the kth largest eigenvalue of A is the minimum value of M_A over the Grassmannian of (N-k+1)-dimensional subspaces of \mathbf{V}.

Proof: Let \{\mathbf{e}_1,\dots,\mathbf{e}_N\} be an orthonormal basis of \mathbf{V} such that A\mathbf{e}_i=\lambda_i\mathbf{e}_i, \quad 1 \leq i \leq N. For each 1 \leq k \leq N, consider the space

\mathbf{S}_k = \mathrm{Span} \{\mathbf{e}_1,\dots,\mathbf{e}_k\}

spanned by the eigenvectors corresponding to the k largest eigenvalues of A, and also set \mathbf{S}_0 = \mathbf{V}. In the proof of Theorem 1, we showed that

\lambda_k = M_A(\mathbf{S}_{k-1}^\perp),

so that indeed the kth largest eigenvalue \lambda_k is given by evaluating the function M_A at a point of the Grassmannian \mathrm{Gr}_{N-k+1}(\mathbf{V}), namely \mathbf{S}_{k-1}^\perp. It remains to show that the evaluation of M_A on every other subspace of dimension N-k+1 is at least \lambda_k.

To achieve this end, we establish the following squeeze inequality for the values of the quadratic form Q_A on the subspace spanned by the first k eigenvectors: for all \mathbf{v} \in \mathbf{S}_k we have

\lambda_k \langle \mathbf{v},\mathbf{v}\rangle \leq \langle \mathbf{v},A\mathbf{v}\rangle \leq \lambda_1\langle \mathbf{v},\mathbf{v} \rangle.

Indeed, let \mathbf{v}=x_1\mathbf{e}_1+\dots+x_k\mathbf{e}_k be any vector in \mathbf{S}_k. Using orthonormality of the eigenvectors we calculate

\langle \mathbf{v},A\mathbf{v} \rangle = \lambda_1x_1^2+\dots+\lambda_kx_k^2 \leq \lambda_1(x_1^2+\dots+x_k)^2 \leq \lambda_1\langle \mathbf{v},\mathbf{v}\rangle,

and the lower bound is established in exactly the same way.

Now let \mathbf{W} \in \mathrm{Gr}_{N-k+1}(\mathbf{V}) be arbitrary. Since

\dim \mathbf{S}_k + \dim \mathbf{W} = k + N-k+1 = N+1 > \dim \mathbf{V},

the intersection of the subspaces \mathbf{S}_k and \mathbf{W} is at least one-dimensional (see Problem set 1). Let \mathbf{e} be a unit vector in \mathbf{S}_k \cap \mathbf{W}. Since \mathbf{e} \in \mathbf{S}_k, we have Q_A(\mathbf{e}) \geq \lambda_k by the above inequality. But also \mathbf{e} \in \mathbf{W}, which shows that M_A(\mathbf{W}) \geq \lambda_k, as required.

— Q.E.D.

4. Adjoint orbits

Any operator U \in \mathrm{End}\mathbf{V} which preserves the scalar product,

\langle U\mathbf{v},U\mathbf{w} \rangle = \langle \mathbf{v},\mathbf{w} \rangle, \quad \mathbf{v},\mathbf{w} \in \mathbf{V},

is called a unitary operator. It is clear that the set \mathrm{U}(\mathbf{V}) of unitary operators on \mathbf{V} is a group, indeed a subgroup of the general linear group \mathrm{GL}(\mathbf{V}). Note that the former depends on the scalar product but the latter does not; one may equivalently say that invertible operators map bases to bases, while unitary operators map orthonormal bases to orthonormal bases.

Let \lambda_1 \geq \dots \geq \lambda_N be given real numbers, and let \mathcal{O}_\lambda \subset \mathrm{End}\mathbf{V} be the set of selfadjoint operators on \mathbf{V} with this given spectrum.

Theorem 3: Choose any orthonormal basis \mathbf{e}_1,\dots,\mathbf{e}_N \in \mathbf{V}, and let \Lambda \in \mathrm{End}\mathbf{V} be the selfadjoint operator defined by

\Lambda \mathbf{e}_i = \lambda_i \mathbf{e}_i, \quad 1 \leq i \leq N.

Then the isospectral set \mathcal{O}_\lambda \subset \mathrm{End}\mathbf{V} is given by

\mathcal{O}_\lambda = \{U\Lambda U^{-1} \colon U \in \mathrm{U}(\mathbf{V}).

Proof: First we check that \mathcal{O}_\lambda consists entirely of selfadjoint operators having spectrum \lambda. The selfadjointness of U\Lambda U^{-1} follows immediately from the selfadjointness of \Lambda together with the fact that U^*=U^{-1}. To see that U\Lambda U^{-1} has spectrum \lambda, let \mathbf{f}_1,\dots,\mathbf{f}_N be the orthonormal basis of \mathbf{V} defined by

\mathbf{f}_i = U\mathbf{e}_i, \quad 1 \leq i\leq N.

Thent \mathbf{f}_i is an eigenvector of U\Lambda U^{-1} with eigenvalue \lambda_i.

Now let A be a given selfadjoint operator with eigenvalues \lambda_1,\dots,\lambda_N. By the Spectral Theorem, there exists an orthonormal basis \mathbf{f}_1,\dots,\mathbf{f}_N such that A\mathbf{f}_i=\lambda_i\mathbf{f}_i. If U is the unitary operator sending \mathbf{e}_i to \mathbf{f}_i, then

AU^{-1}\mathbf{e}_i = \lambda_i \mathbf{e}_i, \quad 1 \leq  \leq N,

which is equivalent to UAU^{-1}=\Lambda.

— Q.E.D.

In due time, we shall see that both Grassmannians \mathrm{Gr}_k(\mathbf{V} and adjoint orbits \mathcal{O}_\lambda are complex projective varieties.

Math 202C: Lecture 0

Welcome to Math 202C in the Spring quarter of 2022. This course concludes the applied algebra graduate course sequence, the previous installments of which were taught by Professor Freddie Manners and Professor Steven Sam. The goal of the course will be to unite and extend the material you learned in 202A and 202B in a natural, pleasing, and useful way. We will focus on applications of the representation theory of the symmetric groups, which was the subject of 202B. The Jucys-Murphy elements will be constructed, and their action in irreducible representations will be applied to analyze random walks (card shuffling) and permutation factorizations (Hurwitz theory). We then review some 202A material, namely the basic constructions of multilinear algebra (tensor, exterior, and symmetric algebras), and their physical interpretation as state spaces for bosons and fermions (Fock spaces). We apply the representation theory of the symmetric groups to prove the existence of a q-Fock space interpolating between the symmetric and exterior algebras. The two parts of the course are then unified using the infinite wedge space formalism, whose connection with the theory of integrable systems via the Sato Grassmannian will be discussed if time permits.

Now to the mechanics of Math 202C. This post, Lecture 0, serves as the course syllabus. The course is being taught remotely, and the plan is that the two weekly lectures will be delivered in the form of blog posts like the one you are reading right now. Each of these posts will likely cover more than I would be able to cover in an in-person ninety minute lecture. These posts will appear every Tuesday and Thursday in a temporal neighborhood of our scheduled lecture slots, and the course will feel much like a reading course, albeit a fairly intense one leading up to a qualifying exam. I will also hold live office hours over Zoom, so that we can discuss the course content in real time, and so that I can embellish and add to it verbally. This will happen on Fridays, and we will work out a time over Piazza, which we will use as a question and answer forum (signup link here). We’ll be conducting all class-related discussion on Piazza. The quicker you begin asking questions on Piazza (rather than via emails), the quicker you’ll benefit from the collective knowledge of your classmates and instructors. I encourage you to ask questions when you’re struggling to understand a concept.

As for evaluation, there are no exams (neither midterm nor final), though of course you’ll be taking the Applied Algebra qualifying exam mid-quarter. We will however have weekly problem sets which will be due on Sundays and submitted via Gradescope. More information about this will be posted on Piazza. These problem sets will help to cement the concepts you’ll be learning in the course, and will also give you an idea of the sorts of problems that are likely to appear on the qualifying exam. I believe this is everything you need to know to get started — see you on Piazza starting today, and in real time on Friday.

1Spectral theorem, flags and Grassmannians, minimax
2Jucys’s theorem: unique factorization
3Symmetric polynomials and JM elements
4Murphy’s theorem: content eigenvalues
5Top-to-random shuffle I
6Top-to-random shuffle II
7Basic category theory
8Tensor product, tensor powers, tensor algebra
9Symmetric and antisymmetric tensors, bosons and fermions
10Polynomials and varieties
11Grassmann varieties
12Selfadjoint algebras I
13Selfadjoint algebras II
14Selfadjoint algebras III
15Unitary representations I
16Unitary representations II
17Unitary representations III
18Schur-Weyl duality I
19Schur-Weyl duality II
20Invariant theory
Schedule and topics subject to change.

Math202C: Lecture 17 – Expander Graphs

Author: Haixiao Wang

As demonstrated in [1], expander graphs are a sparse graph that mimic the structure properties of complete graphs, quantified by vertex, edge or spectral expansion. At the same time, expander constructions have spawned research in pure and applied mathematics, with several applications to complexity theory, design of robust computer networks, and the theory of error-correcting codes.

This lecture is organized as follows. We introduce two different definitions of expanders in the first section, and then explore the relationship between them in second part.

17.1. Two Definitions

We start our discussion in expansion with two definitions of expansion: combinatorial and spectral. In the rest of this post, we consider an undirected graph G = (V, E) with |V| = n, where self loops and multiple edges are allowed. Given two subsets S,T \subset V, the set of edges between S and T is denoted by E(S, T) = \{(u,v ): (u, v) \in E, u\in S, v\in T\}.

17.1.1. Combinatorial expansion

Intuitively, a graph G=(V,E) is an expander if subsets S \subseteq V expand outwards into V \setminus S. In other words, all small subsets will have large boundary, respectively. Follow this intuition, we introduce the definition of boundary, based on edges and vertices respectively, to quantify the corresponding expansion.

Definition 1(Boundary):

  • The Edge Boundary of a set S, denoted by \partial S, is the set of edges \partial S = E(S, V \setminus S), which is the set of edges between disjoint set S and V \setminus S.
  • The Vertex Boundary of a set S is a subset of vertices in V \setminus S adjacent to vertices in S, denoted as N(S).

Obviously, vertex boundary |N(S)| is the same as the number of neighboring vertices of vertex sets S. With those notations, we then define the expansion over the entire graph G via maximizing over subsets S.

Definition 2(Expander): For some \varepsilon \in [0, 1], a graph G = (V, E) with |V| = n is an

  • \varepsilon-edge-expander if \min\limits_{ S\subset V, |S| \leq \frac{n}{2} } \frac{|\partial S|}{|E(S, V)|} \geq \varepsilon.
  • \varepsilon-vertex-expander if \min\limits_{ S\subset V, |S| \leq \frac{n}{2} } \frac{|N(S)|}{|S|} \geq \varepsilon.

Remark 3: The reason for the restriction |S| \leq n/2 is that we are examining the relative size of every cut in the graph, i.e., the number of edges crossing between S and V \setminus S, and we examine each case only once.

Problem 1: Show that the complete graph K_{n} is

  • at least a 1/2-edge-expander.
  • a 1-vertex-expander. (We only consider \varepsilon \in [0, 1] here.)

From the definition above, the maximum \varepsilon, which makes G = (V, E) an \varepsilon-edge-expander, is an important criteria, leading to the definition of Cheeger constant.

Definition 4(Cheeger Constant): The Edge Expansion Ratio, or Cheeger constant, of graph G = (V, E) with |V| = n, denoted by h(G), is defined as

h(G) = \min\limits_{S\subset V, |S| \leq \frac{n}{2}}\frac{|\partial S|}{|E(S, V)|}

Remark 5: Actually, Cheeger constant can be defined alternatively as \min\limits_{S\subset V, |S| \leq n/2}\frac{|\partial S|}{|S|} = O(n). We didn’t follow this definition since we want to keep \varepsilon\in [0, 1].

17.1.2. Spectral Expansion

To simplify our discussion, we focus on regular graphs in the rest of this lecture. Let A = A(G)\in \mathbb{R}^{n \times n} denote the adjacency Matrix of a d-regular graph G = (V, E) with |V| = n and A being real symmetric, then A has n real eigenvalues, denoted by \lambda_{1} \geq \lambda_{2} \geq \cdots \geq \lambda_n, and corresponding orthonormal eigenvectors v_1, \dots, v_n with A v_i = \lambda_i v_i. The spectrum of A encodes a lot of information about the graph structure. Recall the following properties, which we have seen in previous lectures.

  • The largest eigenvalue is \lambda_1 = d and the corresponding eigenvector is v_1 = (1/\sqrt{n}, \dots, 1/\sqrt{n}).
  • The graph G is connected if and only if \lambda_1 > \lambda_2.
  • The graph is bipartite if and only if \lambda_1 = -\lambda_n.

The definition of spectral expansion and spectral gap then follows.

Definition 6(Spectral Expansion): A graph G = (V, E) with |V| = n is a \lambda-spectral-expander if \lambda(G):= \max \{ |\lambda_2|, |\lambda_n| \} \leq \lambda.

In other words, the second largest eigenvalue of A(G), in absolute value, is at most \lambda. This also gives rise to the definition of Ramanujan graph.

Definition 7(Ramanujan graph): A connected d-regular graph G is a Ramanujan graph if \lambda (G)\leq 2\sqrt{d-1}.

We now refer to the spectral gap.

Definition 8(Spectral Gap): The spectral gap of d-regular graph G is \lambda_{1} - \lambda(G).

Remark 9: Sometimes the spectral gap is defined as \lambda_1 - \lambda_2, which is often referred as one-sided.

Problem 2:

  • Show that the complete graph K_{n} has spectral gap n - 2. Actually, we can also show that the one-sided spectral gap is n.
  • Show that the complete bipartite graph K_{m, n} has spectral gap 0. (Hint: use the spectral property above.)

17.2. Relating Two Definitions

It is not so obvious that the combinatorial and spectral versions of expansion are related. In this section, we will introduce two relations between edge and spectral expansion: the Expander-Mixing Lemma and Cheeger’s inequality.

17.2.1. The Expander-Mixing Lemma

Lemma 10(Expander-Mixing Lemma[2]): Let G = (V, E) be a d-regular graph with n vertices and eigenvalues \lambda_1 \geq \dots \geq \lambda_n. Then for all S, T \subset V:

\Bigg| |E(S, T)| - \frac{d|S|\cdot |T|}{n}\Bigg| \leq \lambda(G) \sqrt{|S|\cdot |T|}.

As we can see, the left-hand side measures the deviation between two quantities:

  • |E(S, T)|: the number of edges between the two subsets S and T.
  • |S|\cdot |T|\cdot d/n: the expected number of edges between two subsets S and T, drawn from an Erdos–Renyi random graph G(n, d/n).

The Expander-Mixing lemma tell us a small \lambda(G) (a.k.a. large spectral gap) implies that this discrepancy is small, leading to the fact that the graph is nearly random in this sense, as discussed in [1]. We should mention that the converse of expander-mixing is true as well, stated as follows.

Lemma 11(Expander-Mixing Lemma Converse [4]): Let G = (V, E) be a d-regular graph with n vertices and eigenvalues \lambda_1 \geq \dots \geq \lambda_n satisfying:

\Bigg| |E(S, T)| - \frac{d|S|\cdot |T|}{n}\Bigg| \leq \rho \sqrt{|S|\cdot |T|}.

for any two disjoint subsets S,T\subset V and some positive \rho. Then \lambda(G) \leq O(\rho \cdot(1 + \log(d/\rho))).

Proof of Expander-Mixing Lemma: Recall that the adjacency matrix A has eigenvalues \lambda_1 \geq \dots \geq \lambda_n with corresponding orthonormal eigenvectors v_1, \dots, v_n. Let 1_S and 1_T be the characteristic vectors of subsets S and T(a.k.a., the v-th coordinate of 1_S is 1 if v\in S and 0 otherwise). We then decompose 1_S and 1_T in the orthonormal basis of eigenvectors v_1, \dots, v_n as

1_S = \sum\limits_{i=1}^n \alpha_i v_i, \quad 1_S = \sum\limits_{j=1}^n\beta_j v_j.

Observing that |E(S, T)| = (1_S)^T A 1_T, we expand out the right hand side and apply orthogonality, which yields

(1_S)^T A 1_T = \left (\sum\limits_{i=1}^n \alpha_i v_i \right)^T A \left (\sum\limits_{j=1}^n\beta_j v_j \right) = \sum\limits_{i=1}^n \lambda_i\alpha_i\beta_i = d\frac{|S|\cdot|T|}{n} + \sum\limits_{i=2}^n \lambda_i\alpha_i\beta_i .

where the last step follows from the facts \lambda_1 = d with corresponding v_1 = (1/\sqrt{n},\dots, 1/\sqrt{n}), \alpha_1 =<1_S, v_1> = |S|/\sqrt{n} and \beta_1 =<1_T, v_1> = |T|/\sqrt{n}. Shifting the first term on right hand side to the left, the desired result follows from Cauchy-Schwarz inequality:

\left| |E(S, T)| - d\frac{|S||T|}{n} \right | = \left|\sum\limits_{i=2}^n\lambda_i\alpha_i\beta_i\right| \leq \lambda(G) \sum\limits_{i=2}^n |\alpha_i \beta_i| \leq \lambda(G) ||\alpha||_2 ||\beta ||_2 = \lambda(G) ||1_S||_2  ||1_T||_2 = \lambda(G)\sqrt{|S||T|}.

Remark 12: If we normalize the adjacency matrix A = A(G) and obtain \bar{A} = (1/d) A, where each entry of A is divided by d, then the corresponding eigenvalues of \bar{A}, namely \bar{\lambda}_{1}\geq \dots \geq \bar{\lambda}_n, lie in the interval [-1, 1]. The Expander-Mixing Lemma is in the form of

\Bigg| |E(S, T)| - \frac{d|S|\cdot |T|}{n}\Bigg| \leq \bar{\lambda}(G)d \sqrt{|S|\cdot |T|}.

It is sometimes convenient to consider the normalized spectrum. For example in random graph theory, it would be convenient to look at the normalized spectrum when the expected degree d = O(\log n). However, we can focus on the spectrum of $\latex A = A(G)$ without normalization when the expected degree $\latex d = O(1)$.

Regular \lambda-spectral expanders with small \lambda(G)(a.k.a. large spectral gap) have some of significant properties, listed as follows.

Corollary 13: An independent set , in a graph G = (V, E) with |V| = n, is a set of vertices S, where no two vertices in S are adjacent, i.e. with |E(S, S)| = 0. An independent set S, in a d-regular \alpha-spectral expander G, has cardinality at most \alpha n, i.e., |S|\leq \alpha n, which follows immediately from Lemma 10(Expander-Mixing Lemma[2]).

Corollary 14: Given a graph G = (V, E) with |V| = n, the diameter of G is defined as \mathrm{diam}(G) = \max\limits_{u, v}d_{G}(u, v), where d_{G}(u, v) is the length of shortest path between u and v. If G is a d-regular \alpha-spectral expander G with \alpha < d/2, then we have

\Omega \Big( \frac{\log n}{\log d} \Big) \leq \mathrm{diam}(G) \leq O(\log n).

17.2.2. Cheeger’s Inequality

As we have seen from previous discussion, spectral expansion implies strong structural properties on the edge-structure of graphs. In this section, we introduce another famous result, Cheeger’s inequality, indicating that the Cheeger constant(Edge Expansion Ratio) of a graph G can be approximated by the its spectral gap \lambda_1 - \lambda(G).[6]

Theorem 15(Cheeger’s inequality, [3]): Let G = (V, E) be a d-regular graph with eigenvalues \lambda_1\geq \cdots \geq \lambda_n, then

\frac{d - \lambda_2}{2} \leq \min\limits_{S\subset V, |S|\leq n/2} \frac{|\partial S|}{|S|} \leq \sqrt{2d(d - \lambda_2)}.

In the normalized scenario \bar{A} = (1/d)A(G) with eigenvalues \bar{\lambda}_{1}, \cdots, \bar{\lambda}_n \in [-1, 1], we have

\frac{1 - \bar{\lambda}_2}{2} \leq h(G):=\min\limits_{S\subset V, |S|\leq n/2} \frac{|\partial S|}{|E(S, V)|} \leq \sqrt{2(1 - \bar{\lambda}_2)}.

Remark 16: The proof for the normalized scenario is different from the regular scenario. We should start from the desired quantity and finish the calculation step by step. However, the strategy are the same. We use the Rayleigh Quotient to obtain upper bound and lower bound.

In the context of n\gg d, the estimate of spectral gap was given by Nilli Alon.

Theorem 17([5]): For every d-regular graph G, we have

\lambda(G) \geq 2\sqrt{d - 1} - o_{n}(1)

The o_{n}(1) term is a quantity that tends to zero for every fixed d as n \to \infty.

[1]. Shlomo Horry, N. L. and Wigderson, A. (2006). Expander graphs andtheir applications.BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY, 43(4).

[2]. Alon, N. and Chung, F. R. K. (1988). Explicit construction of linear sized tolerantnetworks.Discrete Mathematics, 42

[3]. Cheeger, J. (1970). A lower bound for the smallest eigenvalue of the laplacian.Problems inanalysis, 26.[Nilli, 1991] Nilli, A. (1991). On the secon

[4]. Yonatan Bilu, N. L. (2006). Lifts, discrepancy and nearly optimal spectral gap.Com-binatorica, 26

[5]. Nilli, A. (1991). On the second eigenvalue of a graph.Discrete Mathematics, 91(2):207–210.

[6]. Lovett, S. (2021). Lecture notes for expander graphs and high-dimensional expanders.

Math 202C: Lecture 12

Since the representation theory of the symmetric group is likely fresh in your mind thanks to yesterday’s exam, this is a good time to show you a very nice probabilistic application of this theory to card shuffling. This post will discuss the “top-to-random” shuffle, where the cards 1,\dots,n are arranged in a single row (or a single stack), and a randomly chosen card is repeatedly swapped with the rightmost (or top) card. The algebraic model for this procedure is the Cayley graph \Gamma(n) of the symmetric group as generated by the so-called “star transpositions”

\tau_1=(1\ n), \tau_2=(2\ n),\dots, \tau_{n-1} = (n-1\ n).

Problem 12.1: Prove that the above transpositions do in fact generate \mathrm{S}(n).

These are called star transpositions because, if one forms the graph on \{1,\dots,n\} such that \{i,j\} is an edge if and only if one of the above transpositions swaps i and j, then this graph is a star graph. The top-to-random shuffle is then the same thing as the lazy simple random walk on the Cayley graph \Gamma(n).

The paper attached below gives a detailed analysis of the top-to-random shuffle using the representation theory of \mathrm{S}(n), and gives a precise meaning to the statement that n \log n such shuffles are required in order to randomize the deck. The argument is pretty much all representation theory — only some very basic probabilistic reasoning is used towards the end.

Math 202C: Lecture 9

We begin with a brief recap of the main result from Lecture 8. Let

\chi_\sigma = \sum\limits_{\pi \in \mathrm{B}(k)} \chi_\sigma(\pi) \pi, \quad \sigma \in \mathrm{B}(k)

be the characters of the rank k-hypercube group

\mathrm{B}(k) = \langle (1\ 2), (3\ 4), \dots, (2k-1\ k) \rangle.

Definition 1: The orthogonal idempotents in the group algebra \mathbb{C}\mathrm{B}(k) are defined as the characters of \mathrm{B}(k) normalized by the order of \mathrm{B}(k):

\tilde{\chi}_\sigma = \frac{1}{2^k} \chi_\sigma, \quad \sigma \in \mathrm{B}(k).

The reason for the name is the following key property of these elements:

\tilde{\chi}_\rho \tilde{\chi}_\sigma = \tilde{\chi}_\rho [\rho = \sigma], \quad \rho,\sigma \in \mathrm{B}(k).

This is quite valuable, since expanding elements a \in \mathbb{C}\mathrm{B}(k) in the group algebra relative to the basis of orthogonal idempotents essentially makes multiplication in the group algebra trivial, which translates into the “Fourier coefficients of a product are products of Fourier coefficients” fact that we saw last Lecture. Note also that the behavior of orthogonal idempotents in the group algebra is formally identical to the behavior of orthogonal projections in Hilbert space — you perhaps know from Math 202B that this is not a coincidence, and if you don’t know this don’t worry, because we will discuss it.

The main result for today is the following.

Theorem 1: The class of multiplication operators in \mathrm{End}\mathbb{C}\mathrm{B}(k) coincides with the class of operators that are diagonal in the character basis of \mathbb{C}\mathrm{B}(k).

Proof: Let b \in \mathbb{C}\mathrm{B}(k) be an element of the group algebra, and let R_b \in \mathrm{End} \mathbb{C}\mathrm{B}(k) be the corresponding multiplication operator. Consider the character expansion of b,

b = 2^{-k} \sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,b\rangle \chi_\pi .

Now let \chi_\sigma \in \mathbb{C}\mathrm{B}(k) be another character. We have

R_b\chi_\sigma = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,b\rangle \chi_\sigma\chi_\pi = 2^{-k}\langle \chi_\sigma,b\rangle 2^k \chi_\sigma = \langle \chi_\sigma,b\rangle\chi_\sigma.

So, \chi_\sigma is an eigenvector of R_b, with eigenvalue the Fourier coefficient \langle \chi_\sigma,b\rangle.

Conversely, suppose that R \in  \mathrm{End}\mathbb{C}\mathrm{B}(k) acts diagonally in the character basis:

R \chi_\pi = \lambda_\pi \chi_\pi, \quad \pi \in \mathrm{B}(k).

Then the action of R on an arbitrary a \in \mathbb{C}\mathrm{B}(k) is

Ma = M 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a \rangle \chi_\pi = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a \rangle \lambda_\pi\chi_\pi = \left(  2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a \rangle \chi_\pi \right) \left(2^{-k}\sum\limits_{\sigma \in \mathrm{B}(k)} \lambda_\sigma \chi_\sigma \right),

so that M=R_b with

b= 2^{-k}\sum\limits_{\sigma \in \mathrm{B}(k)} \lambda_\sigma \chi_\sigma.

— Q.E.D.

With Theorem 1 in hand, we can immediately compute the the eigenvalues and eigenvectors of the Cayley graph of \mathrm{B}(k). Indeed, the adjacency operator of the Cayley graph of \mathrm{B}(k) is just the multiplication operators

R_{\sum_{i=1}^k \tau_i},

so by Theorem 1 we immediately have that

R_{\sum_{i=1}^k \tau_i} \chi_\sigma = \langle \chi_\sigma,\sum\limits_{i=1}^k \tau_i \rangle \chi_\sigma, \quad \sigma \in \mathrm{B}(k).

Now, we have that the resulting eigenvalue is

\langle \chi_\sigma,\sum\limits_{i=1}^k \tau_i \rangle = \sum\limits_{i=1}^k \langle \chi_\sigma,\tau_i \rangle,

and this is a sum we can explicitly evaluate, because we explicitly computed the characters of \mathrm{B}(k) back in Lecture 7. In particular, what we know is that

\langle \chi_\sigma,\tau_i \rangle =\chi_\sigma(\tau_i)

is equal to -1 if \tau_i is a factor of \sigma, and is equal to +1 if \tau_i is not a factor of \sigma. Since we compute this scalar product for each and every generator \tau_i of \mathrm{B}(k), the sum has |\sigma| terms equal to -1, and k - |\sigma| terms equal to +1, where |\sigma| is the word norm of \sigma, or equivalently its distance to the identity permutation in the Cayley graph. So we get

\sum\limits_{i=1}^k \langle \chi_\sigma,\tau_i \rangle = k-2|\sigma|

as our explicit eigenvalue of the character \chi_\sigma under the action of the adjacency operator. Moreover, the number of elements \sigma \in \mathrm{B}(k) such that |\sigma|=j for a given 0 \leq j \leq k is simply {k \choose j}, since this just amounts to choosing j of the generators \tau_1,\dots,\tau_k of \mathrm{B}(k). So we can summarize our results as follows.

Theorem 2: The eigenvalues of Cayley graph of \mathrm{B}(k) are the numbers

k-2j, \quad j=0,\dots,k,

and the dimension of the eigenspace corresponding to k-2j is {k \choose j}.

Since all eigenvalues of the Cayley graph of \mathrm{B}(k) are given explicitly by Theorem 2, we are in a position to explicitly enumerate walks on this graph, using the spectral approach to counting walks discussed previously.

Problem 9.1: Show that the number W^r(\rho,\sigma) of r-step walks on \mathrm{B}(k) from \rho to \sigma is

W^r(\rho,\sigma) = \frac{1}{2^k} \sum\limits_{i=0}^k \sum\limits_{j=0}^{|\rho\sigma|} (-1)^j {|\rho\sigma| \choose j} {k-|\rho\sigma| \choose i-j} (k-2i)^r.

This material, and problem 9.1, have applications in coding theory which you might find interesting to look into.

Math 202C: Lecture 8

We are at the beginning of the theory of harmonic analysis on finite abelian groups, starting with the concrete example of the the rank k hypercube group, which we have chosen to define as the permutation group \mathrm{B}(k) generated by the disjoint transpositions

\tau_1=(1\ 2), \tau_2=(3\ 4), \dots, \tau_k = (2k-1\ 2k).

Typically, analysts think of harmonic analysis on \mathrm{B}(k) (or any other group) as the study of the Hilbert space \mathbb{C}^{\mathrm{B}(k)} of functions

a \colon \mathrm{B}(k) \longrightarrow \mathbb{C}

equipped with the scalar product

\langle a,b \rangle = \sum\limits_{\pi \in \mathrm{B}(k)} \overline{a(\pi)}b(\pi),

so that one has a canonical orthonormal basis \{\delta_\pi\} of \mathbb{C}^{\mathrm{B}(k)} consisting of indicator functions. This space has the structure of an algebra when equipped with the convolution product of functions,

[a*b](\pi) = \sum\limits_{\sigma \in \mathrm{B}(k)} a(\pi\sigma^{-1}) b(\sigma),

which is exactly what one gets from bilinear extension of group law in \mathrm{B}(k) lifted to the group element basis:

\delta_\rho * \delta_\sigma = \delta_{\rho\sigma}.

In particular, the convolution product on \mathbb{C}^{\mathrm{B}(k)} is quite different from the pointwise product. This is the right point of view to take when one is dealing with non-discrete groups (e.g. Lie groups), but for finite groups there is another way of writing things out which is often simpler and cleaner, and favored by algebraists.

The algebraists’ preferred Hilbert space \mathbb{C}\mathrm{B}(k) of formal \mathbb{C}-linear combinations of permutations \pi \in \mathrm{B}(k),

a = \sum\limits_{\pi \in \mathrm{B}(k)} a(\pi)\pi.

Such linear combinations are of course equivalent to functions on the group, and you can think of the linear combination a as a generating function which are equivalent to complex valued functions on \mathrm{B}(k). From this perspective the canonical orthonormal basis of \mathbb{C}\mathrm{B}(k) consists of the permutations \pi \in \mathrm{B}(k) themselves (rather than their indicator functions), which induces the equivalent scalar product,

\langle a,b \rangle = \sum\limits_{\pi \in \mathrm{B}(k)} \overline{\langle \pi, a\rangle} \langle \pi,b \rangle.

The convolution product on \mathbb{C}\mathrm{B}(k) is likewise replaced with the equivalent multiplication (written simply as juxtaposition)

ab = \sum\limits_{(\rho,\sigma) \in \mathrm{B}(k)^2} \langle \rho,a\rangle \langle \sigma,b\rangle \rho\sigma =\sum\limits_{\pi \in \mathrm{B}(k)} \left(\sum\limits_{\sigma \in \mathrm{B}(k)} \langle \pi\sigma^{-1},a\rangle \langle \sigma,b \rangle \right)\pi.

In short, \mathbb{C}^{\mathrm{B}(k)} and \mathbb{C}\mathrm{B}(k) are isomorphic both as Hilbert spaces and as algebras, so the choice to use one rather than the other is a matter only of preference. In practice, it is quite useful to be comfortable with both the analytic and the algebraic way of doing things.

Yet another way of repackaging everything is the following probably too-general perspective. Let us zoom back out to the very beginning, where we started with an arbitrary finite graph \Gamma. The algebraic way to understand this combinatorial object is to replace the vertex set V of \Gamma with the algebra \mathbb{C}\langle x_v \colon v \in V \rangle of polynomials in noncommuting variables x_v indexed by the vertices of v. In other words, this is the free algebra of rank |V|. This algebra is graded by degree, and we have a decomposition

\mathbb{C}\langle x_v \colon v \in V \rangle = \bigoplus_{d=0}^\infty \mathbb{C}^{(d)}\langle x_v \colon v \in V \rangle,

with the dth summand being the space of homogeneous polynomials of degree d. Observe that this vectors space is isomorphic to the space of complex-valued functions on V^d, i.e. the space of functions on d-tuples of vertices, or equivalently formal linear combinations of d-tuples of vertices. In particular, the linear component \mathbb{C}^{(1)}\langle x_v \colon v \in V\rangle is isomorphic to either/both \mathbb{C}^V and \mathbb{C}V, and the adjacency operator acts on this linear component to capture the edge structure of \Gamma. Indeed, the free algebra is isomorphic to the tensor algebra of the vector space of functions on the vertex set V. In the case where the vertex set of V of \Gamma is actually a group \mathrm{G}, we can mod out by the group law and consider the quotient

\mathbb{C}\langle x_g \colon g \in \mathrm{G} \rangle/\langle x_gx_h=x_{gh}\rangle

of the free algebra by the ideal corresponding to the group multiplication. This quotient consists solely of polynomials of degree 1, so functions on the group, or equivalently linear combinations of group elements, but now with the structure of an algebra.

Coming back to reality, the fact that \mathbb{C}\mathrm{B}(k) is an algebra and not just a vector space is reflected in the presence of a certain class of operators on the group algebra which are not present in the endomorphism algebra of a vector space devoid of multiplication.

Definition 1: Given b \in \mathbb{C}\mathrm{B}(k), the corresponding multilplication operator R_b\in \mathrm{End} \mathbb{C}\mathrm{B}(k) is defined by

R_ba = ab, \quad a \in \mathbb{C}\mathrm{B}(k).

We are using the letter “R” for multiplication operators because later on we will consider the group algebras of noncommutative groups, in which left and right multiplication do not necessarily coincide, and we have typically different left and right multiplication operators L_b and R_b. To put this the Math 202B way, the operator R_b is the image of b in the right regular representation of \mathbb{C}\mathrm{B}(k).

Note that multiplication operators corresponding to pure group elements (as opposed to linear combinations of group elements) are unitary operators on \mathbb{C}\mathrm{B}(k),

\langle \rho,R_\pi\sigma \rangle = \langle \rho,\sigma\pi \rangle = \langle \rho\pi^{-1},\sigma \rangle = \langle R_{\pi}^{-1} \rho,\sigma \rangle.

However, since we are working with the permutation group \mathrm{B}(k), in which all permutations are actually involutions, R_\pi is both unitary and selfadjoint, hence has eigenvalues \pm 1.

The importance of multiplication operators in our quest to understand the eigenvalues and eigenvectors of the Cayley graph of \mathrm{B}(k) as generated by the transpositions \tau_1,\dots,\tau_k is the following.

Proposition 1: The adjacency operator of the Cayley graph of \mathrm{B}_k is the multiplication operator R_t, where

t = \sum\limits_{i=1}^k \tau_i.

Proof: For any \rho \in \mathrm{B}(k), we have

R_t\rho = \sum\limits_{i=1}^k \rho\tau_i.

— Q.E.D.

In view of Proposition 1, the problem of understanding the eigenvalues and eigenvectors of the Cayley graph is that of understanding the eigenvalues and eigenvectors of

R_t = \sum\limits_{I=1}^k R_{\tau_i}.

To solve it, we will work out the spectral analysis of arbitrary multiplication operators on \mathrm{B}(k). The main tool in this endeavor is as second basis of the group algebra, which consists of special linear combinations \chi of group elements which have the property that

\langle \rho\sigma,\chi \rangle = \langle \rho,\chi \rangle \langle \sigma,\chi \rangle.

That is, \chi is the generating function of a group homomorphism

\mathrm{B}(k) \to \mathbb{C}^\times,

the multiplicative group of nonzero complex numbers. I like this way of stating things; in the analytic formalism, the words “character” and “homomorphism” mean exactly the same thing, a redundancy, whereas in the algebraic formalism “character” really means “generating function of a homomorphism.”

In Lecture 7, we explicitly worked out all homorphisms from \mathrm{B}(k) to \mathbb{C}^\times, showing that they have values in the subgroup generated by -1, and are moreover parameterized by the points of \mathrm{B}(k). We also showed that the characters are orthogonal,

\langle \chi_\pi,\chi_\sigma \rangle = 2^k[\pi=\sigma], \quad \pi,\sigma \in \mathrm{B}(k).

In particular, for any a \in \mathbb{C}\mathrm{B}(k), we have the character expansion

a = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a\rangle \chi_\pi.

The character expansion of a is also known as its Fourier expansion, with \langle \chi_\pi,a\rangle being the Fourier coefficients of a. The main reason that character expansions are “better” than expansions in the group basis is that the Fourier coefficients of a product ab are just the products of the corresponding Fourier coefficients of a and b.

The main reason that characters — which are generating functions of homomorphisms \mathrm{B}(k) \to \{-1,+1\} — are better than the original group elements is that their multiplication is extremely simple.

Theorem 1: For any \rho,\sigma \in \mathrm{B}(k), we have

\chi_\rho\chi_\sigma =0

if \rho \neq \sigma, and

\chi_\pi \chi_\pi = 2^k\chi_\pi.

Proof: Expanding in the permutation basis, we have

\chi_\rho\chi_\sigma = \sum\limits_{\pi_1 \in \mathrm{B}(k)} \left( \sum\limits_{\pi_2 \in \mathrm{B}(k)} \chi_\rho(\pi_1\pi_2^{-1}) \chi_\sigma(\pi_2) \right) \pi_1 = \sum\limits_{\pi_1 \in \mathrm{B}(k)} \left( \sum\limits_{\pi_2 \in \mathrm{B}(k)} \chi_\rho(\pi_1) \chi_\rho(\pi_2) \chi_\sigma(\pi_2) \right) \pi_1,

where we used that \chi_\rho is a homomorphism and \pi_2^{-1}=\pi_2. The result now follows from character orthogonality, which is Theorem 2 in Lecture 7.

— Q.E.D.

An immediate consequence is that the Fourier coefficients of a product are the products of the Fourier coefficients of the factors.

Corollary 1: For any a,b \in \mathbb{C}\mathrm{G}, we have

\langle \chi_\pi,ab\rangle = \langle \chi_\pi,a \rangle \langle \chi_\pi,b\rangle.

Proof: On one hand, the character expansion of the product ab is

ab = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,ab\rangle \chi_\pi.

On the other hand, we compute the product a and b via their character expansions:

ab = \left( 2^{-k}\sum\limits_{\rho \in \mathrm{B}(k)} \langle \chi_\rho,a\rangle \chi_\rho\right)\left( 2^{-k}\sum\limits_{\sigma \in \mathrm{B}(k)} \langle \chi_\sigma,b\rangle \chi_\sigma\right) \\= 4^{-k} \sum\limits_{(\rho,\sigma) \in \mathrm{B}(k)^2}\langle \chi_\rho,a\rangle\langle \chi_\sigma,b\rangle \chi_\rho\chi_\sigma \\ = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle\chi_\pi,a\rangle \langle \chi_\pi,b\rangle \chi_\pi,

where we applied Theorem 1 to get the final equality. Since the characters are linearly independent, the result follows.

—- Q.E.D.

In Lecture 9, we will apply the above to prove that M \in \mathrm{End} \mathbb{C}\mathrm{G} is a multiplication operator if and only if it acts diagonally in the character basis. We will also find the corresponding eigenvalues, and hence count walks on \mathrm{B}(k). In Lecture 10 we will then analyze convergence of random walk on \mathrm{B}(k) to equilibrium.