Math 202C: Lecture 5

Lectures 5 and 6 of Math 202C give an application of the representation theory of the symmetric group \mathrm{S}(d) as discussed in Math 202B together with the additional material on the YJM elements in the corresponding group algebra \mathcal{A}(d). This application is probabilistic in nature, and concerns the following question: how many times do you have to shuffle a deck of cards in order to randomize it?

In order to analyze this question rigorously, we need to model it mathematically. The first step is to say exactly what we mean by “shuffle.” We consider a deck of d cards labeled 1,\dots,d laid face up on a table, arranged in a single row. Initially, this arrangement is 1,\dots,d, from left to right. At each tick of the clock, we choose a random card, and swap its position with the rightmost card. This gives us a new arrangement of cards which, reading the labels from left to right, corresponds to a permutation in the symmetric group \mathrm{S}(d). This is called the “top-to-random” shuffle. Note that it is possible the randomly chosen card is in fact the rightmost (top) card, and in this case no shuffling occurs.

The top-to-random shuffle is a stochastic process evolving in discrete time. It may equivalently be formulated as a random walk on the Cayley graph \Gamma(d)=\mathrm{Cay}(\mathrm{S}(d),R) of the symmetric group generated by the set

R=\{(1\ d), (2\ d),\dots, (d-1\ d)\}.

The elements of R are transpositions; they are called “star transpositions” because, if we picture R as the edge set of a graph on \{1,\dots,d\}, we get the star graph with hub vertex d. The simple random walk on \Gamma(d) is the following stochastic process: a particle is initially located at the identity permutation \iota \in \mathrm{S}(d), and at each tick of the clock it jumps to a neighboring vertex, with equal probability of jumping in any direction. This can be encoded algebraically as follows. Let \mathcal{A}(d) be the group algebra of \mathrm{S}(d), equipped with the scalar product in which the permutation basis is orthonormal. Identify the generating set with the sum of its elements,

Y_d = (1\ d) +\dots +(d-1\ d),

which you will recognize from our previous lectures as the top YJM element in \mathcal{A}(d). The probability that a particle on \Gamma(d) evolving by simple random walk is located at a given permutation \pi \in \mathrm{S}(d) at a given time r \in \mathbb{N}_0 is then equal to the normalized scalar product

P_r(\pi) = \frac{1}{(d-1)^r} \langle \pi, Y_d^r \iota \rangle.

This random walk is not equivalent to the top-to-random shuffle, but rather to a shuffling process in which one of the d-1 cards below the top card is exchanged with the top card. This is not a good way to shuffle cards, because randomness is never achieved; equivalently, the corresponding random walk is not ergodic. A random walk on a graph is said to be ergodic if, for all sufficiently large times, the particle has positive probability of being located at each vertex. This is not the case for simple random walk on \Gamma(d) — at even times, the probability the particle is located at an odd permutation is zero, and vice versa.

A shuffling process should have the property that as r \to \infty we get closer and closer to a uniform distribution, which algebraically is represented by the vector

\mathbf{u} = \frac{1}{d!} \sum\limits_{\pi \in \mathrm{S}(d)} \pi.

More formally, let us introduce the \ell_1-norm on \mathcal{A}(d) corresponding to the permutation basis, which assigns to each \mathbf{a}\in \mathcal{A}(d) the number

\|\mathbf{a}\| = \sum\limits_{\pi \in \mathrm{S}(d)} |\langle \pi,\mathbf{a}\rangle|.

In probability theory, the corresponding distance \|\mathbf{a}_1-\mathbf{a}_2\| is called total variation distance. The lazy simple random walk on \Gamma(d) is the modification of simple random walk in which, at each tick of the clock, the particle either jumps to an adjacent vertex or stays put, with each such event occurring with equal probability. This random walk is generated by the group algebra element \iota + Y_d, which means that the probability the particle is located at \pi \in \mathrm{S}(d) is

P^r(\pi) = \frac{1}{d^r} \langle \pi, (\iota+Y_d)^r\iota \rangle.

This random walk is ergodic, and it follows from the Perron-Frobenius theorem in linear algebra that

\lim\limits_{r\to \infty} P^r(\pi) = \frac{1}{d!}

for each \pi \in \mathrm{S}(d), so that perfect randomness is achieved in the limit of infinitely many shuffles.

Shuffling infinitely many times is not something that can be done in the real world. Therefore what we want is an effective version of the above, which tells us how close our deck is to randomized after r shuffles have been performed. Algebraically, we want bounds on the \ell_1-norm

\|\frac{1}{d^r}(\iota+Y_d)^r\iota - \mathbf{u}\|.
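To get a feel for this quantity, here is a minimal Python sketch (all names, and the choice d=5, are ours rather than part of the lectures) which computes the exact distribution of the lazy walk by iterating the transition rule over all of \mathrm{S}(5), and prints the \ell_1-distance to uniform as a function of r:

```python
from itertools import permutations

d = 5
perms = list(permutations(range(d)))
index = {p: k for k, p in enumerate(perms)}

def swap_with_last(p, i):
    # exchange the cards in position i and position d-1 (a star transposition)
    q = list(p)
    q[i], q[d - 1] = q[d - 1], q[i]
    return tuple(q)

# P[k] = probability that the lazy walk is at perms[k]; start at the identity
P = [0.0] * len(perms)
P[index[tuple(range(d))]] = 1.0

uniform = 1.0 / len(perms)
for r in range(1, 16):
    Q = [0.0] * len(perms)
    for k, prob in enumerate(P):
        if prob == 0.0:
            continue
        Q[k] += prob / d  # the lazy step: stay put with probability 1/d
        for i in range(d - 1):
            Q[index[swap_with_last(perms[k], i)]] += prob / d
    P = Q
    tv = sum(abs(p - uniform) for p in P)
    print(f"r = {r:2d}   ||P^r - u|| = {tv:.6f}")
```

Already for d=5 one can watch the distance hover near its initial value for the first few steps and then decay rapidly, in the spirit of the cutoff phenomenon described next.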

This week's lectures, which take the form of a typeset document posted to Piazza in the resources section, explain how to do this using the representation theory of the symmetric group, in particular the spectral theory of the YJM elements which we have been discussing. A remarkable insight coming out of the analysis is that the top-to-random shuffle exhibits what is known as the cutoff phenomenon: for small values of r, the total variation distance to uniform remains close to its initial value, and then, once r crosses a d-dependent cutoff time, it suddenly becomes exponentially small, indicating that near-perfect randomness has been achieved in finite time.

Math 202C: Lecture 4

In Lecture 3, we introduced the Young-Jucys-Murphy elements Y_1,\dots,Y_d in the group algebra \mathcal{A}(d) of the symmetric group \mathrm{S}(d). The YJM elements form a commuting family of elements in the group algebra of the symmetric group \mathrm{S}(d), and have the amazing feature that symmetric polynomials in Y_1,\dots,Y_d always belong to the class algebra \mathcal{Z}(d) despite the non-centrality of the Y_j's themselves. This is quite remarkable, as there is no obvious reason why an arbitrary symmetric polynomial f(Y_1,\dots,Y_d) should evaluate to a linear combination of conjugacy classes.

Since f(Y_1,\dots,Y_d) is central, Schur’s Lemma implies that it acts in any given irreducible representation of \mathcal{A}(d) as a scalar operator. More precisely, if (\mathbf{V}^\lambda,R^\lambda) is the irreducible representation of \mathcal{A}(d) corresponding to the partition \lambda \vdash d (more precisely, a representative of the isomorphism class of all such representations), then the operator R^\lambda(f(Y_1,\dots,Y_d)) \in \mathrm{End}\mathbf{V}^\lambda is of the form

R^\lambda(f(Y_1,\dots,Y_d)) = \omega_f(\lambda) I_{\mathbf{V}^\lambda},

where I_{\mathbf{V}^\lambda} \in \mathrm{End}\mathbf{V}^\lambda is the identity operator and \omega_f(\lambda) \in \mathbb{R} is a scalar. Note that this scalar must in fact be real, because each Y_i, being a sum of transpositions, is a selfadjoint element of \mathcal{A}(d). Indeed,

R^\lambda(Y_1),\dots,R^\lambda(Y_d) \in \mathrm{End}\mathbf{V}^\lambda

are commuting selfadjoint operators, and hence are simultaneously diagonalizable, i.e. admit a common eigenbasis. A natural question is then how to explicitly describe the eigenvalue \omega_f(\lambda). This question has an amazing answer, which we now describe.

According to Schur’s Lemma, the class algebra \mathcal{Z}(d) may be described as the subalgebra of \mathcal{A}(d) consisting of elements which act as scalar operators in all irreducible representations. Note that we are including here a converse to Schur’s Lemma, namely the statement that if R^\lambda(C) is a scalar operator in \mathrm{End}\mathbf{V}^\lambda for all \lambda \vdash d, then C \in \mathcal{Z}(d); this follows immediately from the fact that the map

\mathcal{F} \colon \mathcal{A}(d) \longrightarrow \bigoplus\limits_{\lambda \vdash d} \mathrm{End}\mathbf{V}^\lambda

defined by

\mathcal{F}(A) = \bigoplus\limits_{\lambda \vdash d} R^\lambda(A)

is an algebra isomorphism (often referred to as the noncommutative Fourier transform).
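As a sanity check on the Fourier isomorphism, comparing dimensions on both sides gives \sum_{\lambda \vdash d} (\dim \mathbf{V}^\lambda)^2 = d!. Here is a minimal Python sketch (the helper names are ours) confirming this for d=4, with the dimensions computed via the hook length formula from Math 202B:

```python
from math import factorial

def partitions(n, max_part=None):
    # generate the partitions of n as weakly decreasing tuples
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

def dim_irrep(lam):
    # dim V^lambda = n! / (product of hook lengths), by the hook length formula
    n = sum(lam)
    conj = [sum(1 for row in lam if row > c) for c in range(lam[0])]
    hooks = 1
    for r, row in enumerate(lam):
        for c in range(row):
            hooks *= (row - c) + (conj[c] - r) - 1
    return factorial(n) // hooks

d = 4
dims = {lam: dim_irrep(lam) for lam in partitions(d)}
print(dims)  # e.g. dims[(3, 1)] == 3
print(sum(v * v for v in dims.values()) == factorial(d))  # True
```

We can generalize this as follows.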

As you know from Math 202B, the branching of irreducible \mathcal{A}(d)-modules is multiplicity free. More precisely, we may consider the group algebra \mathcal{A}(d-1) of the symmetric group \mathrm{S}(d-1) as the subalgebra of \mathcal{A}(d) spanned by permutations \pi \in \mathrm{S}(d) satisfying \pi(d)=d. The isotypic decomposition of an irreducible \mathcal{A}(d)-module \mathbf{V}^\lambda as an \mathcal{A}(d-1)-module is

\mathbf{V}^\lambda = \bigoplus\limits_{\mu \nearrow \lambda} \mathbf{V}^\mu,

where the direct sum is over all irreducible \mathcal{A}(d-1)-modules \mathbf{V}^\mu such that the Young diagram \lambda\vdash d can be obtained from the Young diagram \mu\vdash d-1 by adding a single cell.

Example 1: For the “staircase” diagram \lambda =(3,2,1), the isotypic decomposition of the irreducible \mathcal{A}(6)-module as an \mathcal{A}(5)-module is

\mathbf{V}^{(3,2,1)} = \mathbf{V}^{(2,2,1)} \oplus \mathbf{V}^{(3,1,1)} \oplus \mathbf{V}^{(3,2)}.

Iterating this construction, we obtain a decomposition of \mathbf{V}^\lambda into a sum of 1-dimensional spaces \mathbf{V}^T indexed by “growth sequences”

T=\mu^1 \nearrow \mu^2 \nearrow \dots \nearrow \lambda,

where \mu^i is a diagram with i cells, which encode the possible life histories of the multicellular organism \lambda \vdash d starting from the single-celled organism \mu^1=\Box. Obviously, each such growth history corresponds uniquely to a standard Young tableau of shape \lambda — the entry of a given cell encodes the time instant at which that cell was added. Thus, multiplicity-free branching gives us a basis \{\mathbf{e}_T \colon T \in \mathrm{SYT}(\lambda)\} of each irreducible representation \mathbf{V}^\lambda canonically determined up to scaling; this basis is known as the Young basis (or sometimes the Gelfand-Tsetlin basis).
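As an aside, the growth-sequence description makes \dim \mathbf{V}^\lambda = |\mathrm{SYT}(\lambda)| easy to compute recursively. A minimal Python sketch (names ours), which counts growth sequences by peeling off the last-added cell:

```python
def syt_count(lam):
    # count standard Young tableaux of shape lam, i.e. growth sequences
    # Box -> ... -> lam, by deleting the cell that was added last
    if sum(lam) <= 1:
        return 1
    total = 0
    for r, row in enumerate(lam):
        # the last cell of row r is removable iff the result is still a diagram
        if r + 1 == len(lam) or lam[r + 1] < row:
            mu = tuple(x - 1 if i == r else x
                       for i, x in enumerate(lam) if not (i == r and x == 1))
            total += syt_count(mu)
    return total

print(syt_count((3, 2, 1)))  # 16 = dim of the staircase module from Example 1
```

By Schur's lemma, the subalgebra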

\mathcal{Y}(d) = \mathrm{Span}(\mathcal{Z}(1),\mathcal{Z}(2),\dots,\mathcal{Z}(d))

coincides with the maximal commutative subalgebra of \mathcal{A}(d) of elements which act diagonally on the Young basis of every irreducible representation \mathbf{V}^\lambda; this algebra is known as the Young subalgebra (or Gelfand-Tsetlin subalgebra) of the group algebra \mathcal{A}(d), and its construction depends crucially on the fact that we have a natural inclusion \mathcal{A}(d-1) \hookrightarrow \mathcal{A}(d). Observe that the YJM elements Y_1,\dots,Y_d belong to the algebra \mathcal{Y}(d) (explaining why is a conceptually optimal way to answer one of the homework problems). The following Theorem is an enormously useful tool, both theoretically and computationally.

Theorem 1 (Diaconis-Greene, Murphy, Okounkov-Vershik): The commutative algebra \mathcal{Y}(d) is exactly the algebra of polynomials in the YJM elements Y_1,\dots,Y_d. Moreover, for each irreducible representation (\mathbf{V}^\lambda,R^\lambda) of \mathcal{A}(d), the action of Y_k on the corresponding Young basis \{\mathbf{e}_T \colon T \in \mathrm{SYT}(\lambda)\} is given by

R^\lambda(Y_k)\mathbf{e}_T = c_T(k) \mathbf{e}_T,

where c_T(k) is the content (i.e. column index minus row index) of the cell in T containing the number k.

Example 2: Consider the group algebra of the symmetric group of degree d=6 acting in the irreducible representation \mathbf{V}^\lambda labeled by the staircase partition \lambda=(3,2,1), and let \mathbf{e}_T \in \mathbf{V}^\lambda be the Young basis vector corresponding to the standard tableau

T = \begin{matrix} 1 & 3 & 6 \\ 2 & 5 & {} \\ 4 & {} & {} \end{matrix}.

Theorem 1 says that \mathbf{e}_T is an eigenvector of each of the operators R^\lambda(Y_1),R^\lambda(Y_2),R^\lambda(Y_3),R^\lambda(Y_4),R^\lambda(Y_5),R^\lambda(Y_6), and that the corresponding eigenvalues are, respectively,

0,-1,1,-2,0,2.

Corollary 1: For any f \in \mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)} and any \lambda \vdash d, we have

\omega_f(\lambda) = f(c(\Box) \colon \Box \in \lambda),

where the right hand side denotes the evaluation of the symmetric polynomial f(X_1,\dots,X_d) on the numbers c(\Box_1),\dots,c(\Box_d), with \Box_1,\dots,\Box_d being an arbitrary enumeration of the cells of \lambda.

Example 3: Corollary 1 tells us that the eigenvalue of the symmetric polynomial

f(Y_1,Y_2,Y_3,Y_4,Y_5,Y_6) = Y_1^2+Y_2^2+Y_3^2+Y_4^2+Y_5^2+Y_6^2

acting in \mathbf{V}^{(3,2,1)} is

\omega_f(\lambda) = f(0,0,-1,1,-2,2) = 10.
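Both computations are easy to mechanize. A minimal Python sketch (names ours) recovering the content eigenvalues of Example 2 and the evaluation of Example 3:

```python
# Rows of the standard tableau T from Example 2.
T = [[1, 3, 6],
     [2, 5],
     [4]]

# c_T(k) = (column index) - (row index) of the cell of T containing k,
# using 1-based row and column indices as in the lecture.
contents = {}
for row, cells in enumerate(T, start=1):
    for col, entry in enumerate(cells, start=1):
        contents[entry] = col - row

print([contents[k] for k in range(1, 7)])    # [0, -1, 1, -2, 0, 2]

# Example 3: omega_f(lambda) for f = X_1^2 + ... + X_6^2 is the sum of the
# squared contents of the cells of lambda = (3,2,1), in any order.
print(sum(c * c for c in contents.values()))  # 10
```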

We will black-box Theorem 1, and (possibly) return to its proof later, though I will post a resource where you can look it up if you wish. Our goal will be to mine some rather remarkable information from Theorem 1. There are two natural directions in which one might want to go.

First, there is the question of whether evaluation on the YJM elements in fact defines a surjective morphism

\mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)} \longrightarrow \mathcal{Z}(d).

If this were so, then for each \alpha \vdash d there would exist a symmetric polynomial f_\alpha(X_1,\dots,X_d) \in \mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)} such that

C_\alpha = f_\alpha(Y_1,\dots,Y_d),

i.e. we could express each conjugacy class as a symmetric polynomial function of the YJM elements. This would in turn have the following character-theoretic consequence. The eigenvalue \omega_\alpha(\lambda) of the conjugacy class C_\alpha acting in the irreducible representation \mathbf{V}^\lambda is given by

\omega_\alpha(\lambda) = |C_\alpha| \frac{\chi^\lambda_\alpha}{\dim \mathbf{V}^\lambda},

where as always the character \chi^\lambda_\alpha is by definition the trace of the operator R^\lambda(\pi) \in \mathrm{End}\mathbf{V}^\lambda, with \pi an arbitrary permutation in the conjugacy class C_\alpha. Indeed, on one hand we have

\mathrm{Tr} R^\lambda(C_\alpha) = \omega_\alpha(\lambda) \dim \mathbf{V}^\lambda,

since the trace of a scalar operator is simply its unique eigenvalue summed dimension-many times, and on the other hand

\mathrm{Tr} R^\lambda(C_\alpha) = \sum\limits_{\pi \in C_\alpha} \mathrm{Tr} R^\lambda(\pi) = |C_\alpha| \chi^\lambda_\alpha.

Corollary 1 would then give us the formula

\omega_\alpha(\lambda) = f_\alpha(c(\Box) \colon \Box \in \lambda),

which in turn gives us a formula for the irreducible character \chi^\lambda_\alpha as a function of the content alphabet of the Young diagram \lambda. In fact, this is all true, and is a consequence of combining Corollary 1 with the following classical theorem on the class algebra.

Theorem 2 (Farahat-Higman): Every conjugacy class C_\alpha \in \mathcal{Z}(d) can be expressed as a polynomial in the levels L_0,\dots,L_{d-1} of the Cayley graph of \mathrm{S}(d).

A good treatment of this approach to the irreducible characters of the symmetric groups has been given by our colleague Adriano Garsia, and I will post his lecture notes to our Piazza page. There is an inescapable downside to this seemingly miraculous result: the symmetric polynomials f_\alpha(X_1,\dots,X_d) are not given by any simple formula. This is in a sense inevitable, as the computation of the irreducible characters of the symmetric groups is a \#P-complete problem, and consequently the quest for an explicit closed form is quixotic.

Instead, one can take the opposite approach, and start from the fact that there are many interesting (symmetric) polynomials f which we can write down explicitly, and Theorem 1 gives us a way to explicitly calculate the action of f(Y_1,\dots,Y_d) in irreducible representations. It is this direction that we shall follow in the coming lectures.

Math 202C: Lecture 3

In Lecture 2, we established the following unique factorization result for the symmetric group \mathrm{S}(d).

Theorem 1 (Jucys): Every permutation \pi \in \mathrm{S}(d) admits a unique factorization

\pi = (i_1\ j_1) \dots (i_r\ j_r)

into transpositions (i\ j), 1 \leq i<j\leq d, such that

j_1 < \dots < j_r.

Moreover, the number r of factors in this decomposition is r=|\pi|.

We now consider some algebraic ramifications of Jucys’ result. Let \mathcal{A}(d) denote the group algebra of the symmetric group. As you know from Math 202B, this is the d!-dimensional Hilbert space in which \mathrm{S}(d) sits as an orthonormal basis, and which supports the bilinear product inherited from the group law on \mathrm{S}(d). In fact, \mathbb{C}\mathrm{S}(d) is a *-algebra, with antilinear involution given by \pi^*=\pi^{-1}. All of this is true for arbitrary groups, and indeed you can view \mathcal{A} as a functor from the category of groups to the category of unital associative *-algebras which transports each group \mathrm{G} to its corresponding group algebra \mathcal{A}(\mathrm{G}).

Any subset of \mathrm{S}(d) may be identified with a vector in \mathbb{C}\mathrm{S}(d), simply by identifying it with the sum of its elements, and any family of disjoint subsets of \mathrm{S}(d) is linearly independent, since any two of its members are orthogonal in \mathbb{C}\mathrm{S}(d). In particular, for each \alpha \vdash d we may view the corresponding conjugacy class C_\alpha as an element of \mathbb{C}\mathrm{S}(d), the sum of all permutations of cycle type \alpha, and consider the linearly independent set in the group algebra,

\{ C_\alpha \colon \alpha \vdash d\}.

Let \mathcal{Z}(d) denote the subspace of \mathcal{A}(d) spanned by the conjugacy classes, the dimension of which is the number p(d) of partitions of d. Given C_\alpha,C_\beta \in \mathcal{Z}(d), we have

C_\alpha C_\beta = \sum\limits_{\pi \in \mathrm{S}(d)} c_{\alpha\beta}^\pi \pi,

where

c_{\alpha\beta}^\pi = \left| \{ (\rho,\sigma) \in C_\alpha \times C_\beta \colon \rho\sigma=\pi\}\right|.

The number c_{\alpha\beta}^\pi in fact depends only on the conjugacy class of \pi, for if \rho\sigma=\pi then (\kappa\rho\kappa^{-1})(\kappa\sigma\kappa^{-1})=\kappa\pi\kappa^{-1} for any \kappa \in \mathrm{S}(d). Thus \mathcal{Z}(d) is a subalgebra of \mathbb{C}\mathrm{S}(d) called the class algebra, which is a particularly appropriate name for something encountered in an algebra class. We thus have

C_\alpha C_\beta = \sum\limits_{\gamma \vdash d} c_{\alpha\beta}^\gamma C_\gamma

where

c_{\alpha\beta}^\gamma = \left| \{ (\rho,\sigma) \in C_\alpha \times C_\beta \colon \rho\sigma \in C_\gamma\}\right|.

The numbers c_{\alpha\beta}^\gamma are called the connection coefficients (aka structure constants) of the class algebra \mathcal{Z}(d); they are the entries of the p(d) \times p(d) multiplication table for conjugacy classes. More generally, we can take arbitrary polynomials in conjugacy classes, and write them as linear combinations of conjugacy classes, and this often has enumerative meaning. For example, consider the class T=C_{(21^{d-2})} of transpositions. For any r \in \mathbb{N}, we have

T^r = \sum\limits_{j \in \mathrm{Fun}(r,d)}\ \sum\limits_{\substack{i \in \mathrm{Fun}(r,d) \\ i<j \text{ pointwise}}} (i(1)\ j(1)) \dots (i(r)\ j(r)),

and each term in the sum corresponds uniquely to an r-step walk on the Cayley graph \Gamma(d) of \mathrm{S}(d) beginning at \iota. Thus,

T^r = \sum\limits_{\pi \in \mathrm{S}(d)} W^r(\pi) \pi = \sum_{\alpha \vdash d} W^r(\alpha) C_\alpha,

which says that the combinatorial problem of counting r-step walks on the graph \Gamma(d) from the identity permutation to a permutation of cycle type \alpha is the same thing as the algebraic problem of resolving T^r \in \mathcal{Z}(d) into a linear combination of conjugacy classes.

For any \pi \in \mathrm{S}(d), we have

C_\alpha \pi = \sum\limits_\rho \rho\pi \quad\text{ and }\quad \pi C_\alpha= \sum\limits_\rho \pi\rho,

where both sums are over \rho \in C_\alpha, and the two sums are in fact the same since (\pi\rho\pi^{-1})\pi = \pi \rho. Thus, the class algebra is a subalgebra of the center of the group algebra. Conversely, if C \in \mathbb{C}\mathrm{S}(d) is a central element, then \pi C \pi^{-1}=C \pi \pi^{-1}=C, whence C is a linear combination of conjugacy classes, and we conclude that the class algebra \mathcal{Z}(d) coincides with the center of the group algebra \mathbb{C}\mathrm{S}(d). Again, all of the above is true for any group, and you can view \mathcal{Z} as a functor from groups to commutative, unital *-algebras which transports each group \mathrm{G} to its class algebra \mathcal{Z}(\mathrm{G}).

In addition to the conjugacy classes themselves, the levels L_r of the Cayley graph of \mathrm{S}(d) belong to the class algebra, being multiplicity-free sums of conjugacy classes,

L_r = \sum\limits_{\substack{\pi \in \mathrm{S}(d) \\ |\pi|=r}} \pi = \sum\limits_{\substack{\alpha \vdash d \\ \ell(\alpha)=d-r}} C_\alpha.

Lifted to the group algebra, Theorem 1 says that these central elements are given by the sums

L_r = \sum\limits_{\substack{j \in \mathrm{Fun}(r,d) \\ j \text{ strictly increasing}}}\ \sum\limits_{\substack{i \in \mathrm{Fun}(r,d) \\ i<j \text{ pointwise}}} (i(1)\ j(1)) \dots (i(r)\ j(r)),

where \mathrm{Fun}(r,d) is the set of functions from [r] to [d]. Equivalently, we have

L_r = \sum\limits_{\substack{ j \in \mathrm{Fun}(r,d) \\ j\text{ strictly increasing}}} Y_{j(1)} \dots Y_{j(r)},

where Y_1,Y_2,\dots,Y_d \in \mathbb{C}\mathrm{S}(d) are the sums

Y_j = \sum\limits_{\{i \in [d] \colon i<j\}} (i\ j), \quad 1 \leq j \leq d.

Note that Y_1 is an empty sum, hence equal to the zero element in \mathbb{C}\mathrm{S}(d). A nice way to picture these sums is to take the conjugacy class T \subset \mathrm{S}(d) of transpositions and view it as the d \times d strictly upper triangular matrix over \mathbb{C}\mathrm{S}(d) whose entry in row i and column j is the transposition (i\ j) whenever i<j, and 0 \in \mathbb{C}\mathrm{S}(d) when not. Then Y_j is simply the jth column sum of the matrix T. For example, in the case d=4, we have

\begin{bmatrix} 0 & (1\ 2) & (1\ 3) & (1\ 4) \\ 0 & 0 & (2\ 3) & (2\ 4) \\ 0 & 0 & 0 & (3\ 4) \\ 0 & 0 & 0 & 0\end{bmatrix},

and

Y_1 = 0 \\ Y_2 = (1\ 2) \\ Y_3 = (1\ 3) + (2\ 3) \\ Y_4=(1\ 4)+(2\ 4) +(3\ 4).

The transposition sums Y_1,Y_2,\dots,Y_d are known as the Young-Jucys-Murphy elements of the group algebra \mathbb{C}\mathrm{S}(d). Note that the YJM elements are non-central, since they correspond to proper subsets of a conjugacy class (namely, the class T), and therefore cannot be expressed as linear combinations of conjugacy classes. Nevertheless, we have the following.

Proposition 2: The YJM elements commute with one another.

Proof: Problem set 2.
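While the proof is deferred to the problem set, the statement is easy to confirm by machine in a small case (a finite check, of course, not a proof). A minimal Python sketch for d=4, storing a group algebra element as a dictionary from permutations to coefficients; all names are ours:

```python
d = 4
iota = tuple(range(d))

def compose(s, t):
    # (s t)(x) = s(t(x)); permutations are tuples acting on {0, ..., d-1}
    return tuple(s[t[x]] for x in range(d))

def transposition(i, j):
    p = list(iota)
    p[i], p[j] = p[j], p[i]
    return tuple(p)

def mult(a, b):
    # product in the group algebra: a, b are {permutation: coefficient} dicts
    c = {}
    for s, x in a.items():
        for t, y in b.items():
            st = compose(s, t)
            c[st] = c.get(st, 0) + x * y
    return c

# the YJM elements Y_1, ..., Y_d (Y[0] is the empty sum, i.e. zero)
Y = [{transposition(i, j): 1 for i in range(j)} for j in range(d)]

print(all(mult(Y[j], Y[k]) == mult(Y[k], Y[j])
          for j in range(d) for k in range(d)))  # True
```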

In view of the commutativity of the elements Y_1,\dots,Y_d, we may equivalently write Theorem 1 as

L_r = \sum\limits_{\substack{R \subseteq [d] \\ |R|=r}} \prod\limits_{j \in R} Y_j,

which says that L_r is the sum of all squarefree monomials of degree r in the commuting elements Y_1,\dots,Y_d. But this is a polynomial you know from Math 202B: it is the elementary symmetric polynomial e_r of degree r in d variables. In particular, Theorem 1 is equivalent to the following polynomial identity in the class algebra \mathcal{Z}(d).

Theorem 2: L_r = e_r(Y_1,\dots,Y_d).
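Theorem 2 can likewise be checked by brute force in a small case. A self-contained Python sketch (names ours) verifying L_r = e_r(Y_1,\dots,Y_d) in \mathbb{C}\mathrm{S}(4) for r = 0,1,2,3:

```python
from itertools import combinations, permutations

d = 4
iota = tuple(range(d))

def compose(s, t):
    return tuple(s[t[x]] for x in range(d))

def transposition(i, j):
    p = list(iota)
    p[i], p[j] = p[j], p[i]
    return tuple(p)

def mult(a, b):
    # product of group algebra elements stored as {permutation: coefficient}
    c = {}
    for s, x in a.items():
        for t, y in b.items():
            st = compose(s, t)
            c[st] = c.get(st, 0) + x * y
    return c

def word_norm(p):
    # |p| = d - (number of cycles), by the join-cut relation
    seen, cycles = set(), 0
    for start in range(d):
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = p[x]
    return d - cycles

Y = [{transposition(i, j): 1 for i in range(j)} for j in range(d)]  # Y[0] = 0

for r in range(d):
    e_r = {}  # the sum of all squarefree degree-r monomials in Y_1, ..., Y_d
    for subset in combinations(range(d), r):
        term = {iota: 1}
        for j in subset:
            term = mult(term, Y[j])
        for p, x in term.items():
            e_r[p] = e_r.get(p, 0) + x
    L_r = {p: 1 for p in permutations(range(d)) if word_norm(p) == r}
    print(r, e_r == L_r)  # True for every r
```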

As you know from Math 202B, the elementary symmetric polynomials are important because they express the coefficients of a polynomial as functions of its roots (the inverse problem, expressing the roots as functions of the coefficients, is harder). Thus Theorem 2 is equivalent to the following factorization in the class algebra.

Theorem 3: For any \hbar \in \mathbb{C}, we have

\sum\limits_{r=0}^d \hbar^r L_r = \prod\limits_{i=1}^d (\iota + \hbar Y_i).

In addition to being a direct incarnation of the relationship between the roots of a polynomial and its coefficients, there is another reason why the elementary symmetric polynomials are deserving of their name. Let X_1,\dots,X_d be commuting indeterminates, and consider the corresponding polynomial algebra \mathbb{C}[X_1,\dots,X_d]. As you know from Math 202B, polynomials f \in \mathbb{C}[X_1,\dots,X_d] satisfying

f(X_{\sigma(1)},\dots,X_{\sigma(d)}) = f(X_1,\dots,X_d) \quad\text{for all }\sigma \in \mathrm{S}(d)

are called symmetric polynomials, and their totality forms a subalgebra of \mathbb{C}[X_1,\dots,X_d] denoted \mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)}. This subalgebra is itself a polynomial algebra.

Theorem (Newton): We have \mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)} = \mathbb{C}[e_1,\dots,e_d].

Newton's theorem is often referred to as the fundamental theorem on symmetric polynomials. A proof is given in Professor Sam’s 202B notes (Theorem 3.3.4). Combining Newton's theorem with Theorem 2, we learn the following: even though the YJM elements Y_1,\dots,Y_d do not themselves belong to the class algebra \mathcal{Z}(d), any symmetric function of them does.

Theorem 4: For any f \in \mathbb{C}[X_1,\dots,X_d]^{\mathrm{S}(d)}, the group algebra element f(Y_1,\dots,Y_d) \in \mathbb{C}\mathrm{S}(d) is central.

Theorem 4 is quite remarkable, and has many interesting consequences. Here is one. Suppose we weaken our definition of strictly monotone walks on the edge-labeled Cayley graph \Gamma(d) of the symmetric group \mathrm{S}(d) to just plain monotone walks, meaning walks which have the feature that the labels of the edges they traverse form a weakly increasing sequence. Let \vec{W}^r(\pi) denote the number of weakly monotone r-step walks on \Gamma(d) from \iota to \pi.

Theorem 5: The number \vec{W}^r(\pi) depends only on the conjugacy class of \pi.

Proof: The complete homogeneous symmetric polynomial of degree r in commuting variables X_1,\dots,X_d is the sum of all monomials of degree r in these variables (hence the name “complete”). Equivalently,

h_r(X_1,\dots,X_d) = \sum\limits_{\substack{j \in \mathrm {Fun}(r,d) \\ j\text{ weakly increasing}}} X_{j(1)} \dots X_{j(r)}.

Evaluating h_r on the YJM elements, we obtain

h_r(Y_1,\dots,Y_d) = \sum\limits_{\substack{j \in \mathrm{Fun}(r,d) \\ j \text{ weakly increasing}}}\ \sum\limits_{\substack{i \in \mathrm{Fun}(r,d) \\ i<j \text{ pointwise}}} (i(1)\ j(1)) \dots (i(r)\ j(r)),

in which each term corresponds uniquely to a weakly monotone r-step walk on \Gamma(d) beginning at \iota. Thus,

h_r(Y_1,\dots,Y_d) = \sum\limits_{\pi \in \mathrm{S}(d)} \vec{W}^r(\pi) \pi.

But since h_r is a symmetric polynomial, Theorem 4 guarantees that h_r(Y_1,\dots,Y_d) \in \mathcal{Z}(d), so that we must have \vec{W}^r(\pi)=\vec{W}^r(\kappa\pi\kappa^{-1}) for all \pi,\kappa \in \mathrm{S}(d), as claimed.

— Q.E.D.

No direct combinatorial proof of Theorem 5 is known.
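Theorem 5 can at least be confirmed by exhaustive search in small cases. A minimal Python sketch (names ours) for d=4 and r=3: it enumerates all weakly monotone 3-step walks from \iota on \Gamma(4) and checks that the endpoint counts \vec{W}^3(\pi) are constant on conjugacy classes:

```python
from itertools import product, permutations

d, r = 4, 3
iota = tuple(range(d))

def compose(s, t):
    return tuple(s[t[x]] for x in range(d))

def cycle_type(p):
    seen, lengths = set(), []
    for start in range(d):
        if start not in seen:
            n, x = 0, start
            while x not in seen:
                seen.add(x)
                n += 1
                x = p[x]
            lengths.append(n)
    return tuple(sorted(lengths, reverse=True))

# the transpositions (i j), i < j, tagged with their Biane-Stanley label j
labeled = []
for j in range(d):
    for i in range(j):
        p = list(iota)
        p[i], p[j] = p[j], p[i]
        labeled.append((j, tuple(p)))

counts = {}
for seq in product(labeled, repeat=r):
    labels = [j for j, _ in seq]
    if all(labels[q] <= labels[q + 1] for q in range(r - 1)):  # weakly monotone
        pi = iota
        for _, tau in seq:
            pi = compose(pi, tau)
        counts[pi] = counts.get(pi, 0) + 1

# group the walk counts by cycle type: each value set should be a singleton
by_type = {}
for p in permutations(range(d)):
    by_type.setdefault(cycle_type(p), set()).add(counts.get(p, 0))
print(all(len(v) == 1 for v in by_type.values()))  # True
```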

Exercise: In the symmetric group \mathrm{S}(4), write down all factorizations of the full cycle \gamma = (1\ 2\ 3\ 4) into three transpositions; identify the factorizations which are weakly monotone, and those which are strictly monotone. Repeat this procedure with the full cycle \gamma'=(1\ 3\ 2\ 4).

Math 202C: Lecture 2

Before leaving the world of linear algebra for multilinear algebra (i.e. tensors), let us reformulate the spectral theorem entirely “upstairs” in \mathrm{End} \mathbf{V} (as in Lecture 1, \mathbf{V} is a complex vector space of dimension N < \infty equipped with a scalar product).

Theorem 1: If A \in \mathrm{End}\mathbf{V} is selfadjoint, then there is a positive integer n \leq N, real numbers \lambda_1>\dots>\lambda_n, and selfadjoint operators P_1,\dots,P_n in \mathrm{End}\mathbf{V} satisfying

P_i^2 = P_i, \quad 1 \leq i \leq n

and

P_iP_j = 0 \quad 1 \leq i \neq j \leq n

such that

A = \lambda_1 P_1 + \dots + \lambda_n P_n.

The operators P_1,\dots,P_n above are called orthogonal idempotents. The “idempotent” part is P_i^2=P_i, and this term is applied more generally in ring theory to refer to anything equal to its own square. The “orthogonal” part is P_iP_j = 0, the zero operator in \mathrm{End}\mathbf{V}, when i \neq j. This is a useful representation of A, because orthogonal idempotence immediately implies that

A^k = \lambda_1^k P_1 + \dots + \lambda_n^k P_n.
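Numerically, this is easy to see with numpy (a sketch; the random Hermitian test matrix and all names are our own). For a generic selfadjoint A the eigenvalues are distinct, so the orthogonal idempotents are the rank-one projections onto the eigenlines:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
X = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = (X + X.conj().T) / 2  # a selfadjoint operator on C^N

evals, evecs = np.linalg.eigh(A)
# orthogonal idempotents: P_i projects onto the i-th eigenline
# (the eigenvalues of a random A are distinct with probability 1)
P = [np.outer(evecs[:, i], evecs[:, i].conj()) for i in range(N)]

k = 3
A_k = sum(evals[i] ** k * P[i] for i in range(N))
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))  # True
```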

Given your Math 202A background, you will have realized immediately that Theorem 1 is equivalent to the eigenvalue/eigenvector form of the Spectral Theorem discussed in Lecture 1. Indeed, the set \mathcal{P}(\mathbf{V}) \subset \mathrm{End}\mathbf{V} of selfadjoint idempotents acting in \mathbf{V} is in canonical bijection with the lattice \mathcal{L}(\mathbf{V}) of subspaces of \mathbf{V}, since these are exactly the operators which act by orthogonal projection onto a subspace: for any \mathbf{W} \in \mathcal{L}(\mathbf{V}), the corresponding P \in \mathcal{P}(\mathbf{V}) maps \mathbf{v} \in \mathbf{V} to the nearest point of the subspace \mathbf{W}. The correspondence between selfadjoint idempotents — which are often simply called orthogonal projections — and subspaces even holds in the infinite-dimensional setting, provided \mathbf{V} is complete with respect to the norm \|\mathbf{v}\|=\sqrt{\langle \mathbf{v},\mathbf{v}\rangle} (i.e. \mathbf{V} is a Hilbert space) and we restrict to closed subspaces. Thus the Grassmannian \mathrm{Gr}_k(\mathbf{V}) is for all practical purposes equivalent to the set \mathrm{Pr}_k(\mathbf{V}) of selfadjoint idempotents/orthogonal projections of rank k. Theorem 1 above is obtained from the eigenvalue/eigenvector form of the Spectral Theorem by replacing the eigenspaces of A with the orthogonal projections onto those spaces, and conversely Theorem 1 gives the eigenvalue/eigenvector form of the Spectral Theorem, with \lambda_1>\dots>\lambda_n being the distinct eigenvalues of A whose corresponding eigenspaces are the images of the orthogonal idempotents P_1,\dots,P_n.

Let us close out this brief recap of eigenvectors and eigenvalues (i.e. Math 202A material) by recalling the algebraic criterion for diagonalizability. An operator A \in \mathrm{End}\mathbf{V} is said to be normal if it commutes with its adjoint, AA^*=A^*A. Obviously, selfadjoint operators X are normal, XX^*=X^*X=X^2, and so are unitary operators U, since UU^*=U^*U=I.

Theorem 2: Given A \in \mathrm{End} \mathbf{V}, there exists an orthonormal basis \mathbf{e}_1,\dots,\mathbf{e}_N of \mathbf{V} such that A\mathbf{e}_i=\lambda_i\mathbf{e}_i for each 1 \leq i \leq N, where \lambda_1,\dots,\lambda_N \in \mathbb{C} are scalars, if and only if A is normal. Equivalently, A is a \mathbb{C}-linear combination of pairwise orthogonal selfadjoint idempotents if and only if AA^*=A^*A.

After revisiting this essential piece of 202A technology, my plan was to jump into multilinear algebra — tensor products, exterior products, symmetric products, and their applications to eigenvalue inequalities — and work up to the construction of Schur functors, which interpolate between these using the representation theory of the symmetric groups. This would be an application of the material you learned in 202B that enhances the material you learned in Math 202A. However, I have since changed my mind. After reviewing Professor Sam’s 202B lecture notes (which I am posting to Piazza for your reference), and speaking briefly with Steven, I have realized that you already have some exposure to this material. Even though there is much left to be done in this direction, namely starting from Section 6 in Steven’s notes and working up to a proof of Theorem 6.4.1 via Schur-Weyl duality, I have ultimately decided to leave this for the second half of the course, and begin instead with some very natural and appealing applications of symmetric group representations which we can sink our teeth into immediately. Also, this will give you exposure to material which cannot be looked up in textbooks, or at least cannot be found presented in the way we’ll do it.

Definition 1: For each d \in \mathbb{N}, the symmetric group \mathrm{S}(d) of degree d consists of the set of bijective functions

\sigma \colon [d] \to [d],

i.e. automorphisms of [d]:=\{1,\dots,d\}, with the group law being composition of functions. Elements of \mathrm{S}(d) are called permutations.

As you know from Math 202B (and likely much before), any permutation \sigma can be factored into a product of disjoint cyclic permutations, and this factorization is unique up to the order of the factors. Our starting point will be a rather different unique factorization theorem for permutations. To explain, let us consider \mathrm{S}(d) from the perspective of geometric group theory. Let T \subset \mathrm{S}(d) denote the set of transpositions, i.e. T is the conjugacy class of 2-cycles,

(i\ j), \quad 1 \leq i<j \leq d.

As you are also no doubt aware, T generates \mathrm{S}(d), which means that any permutation \sigma \in \mathrm{S}(d) can be written as a product of transpositions,

\sigma = \tau_1 \dots \tau_r, \quad r \in \mathbb{N}_0,\ \tau_i \in T.

This representation is very far from being unique, and indeed the question of counting the number of factorizations of a given permutation into a given number of transpositions is a classical problem in algebraic combinatorics known as the Hurwitz enumeration problem; despite its elementary formulation, this question has deep ties to enumerative algebraic geometry and mathematical physics, some of which we will hopefully be able to discuss. Nevertheless, for each \sigma there is a minimal value of r for which such a factorization exists, and this is called the word norm of \sigma and denoted |\sigma|_T, where the subscript reminds us that this quantity depends on the choice of generating set T. We will omit this subscript in order to lighten the notation, and if we switch to a different generating set T' inducing a different word norm on \mathrm{S}(d), we will say so loud and clear. The following is elementary combinatorics, but you should take the time to prove it.

Proposition 1: The length r of any factorization \sigma=\tau_1\dots\tau_r of a permutation \sigma \in \mathrm{S}(d) into transpositions satisfies r=|\sigma|+2k for some k \in \mathbb{N}_0. That is, all factorizations of \sigma into transpositions have the same parity.

Somewhat less elementary but no less important is the fact that there is a tight connection between the word norm of a permutation \sigma and the number of factors c(\sigma) in its unique decomposition into disjoint cycles.

Proposition 2 (join-cut relation): We have |\sigma|=d-c(\sigma).

The key idea in the proof of Proposition 2 is expressed in its name: multiplying a permutation by a transposition will either join two of its cycles into one, or cut one of its cycles into two. Thus, c(\sigma\tau)=c(\sigma)\pm 1, and hence |\sigma\tau|=|\sigma|\pm 1, for any \sigma \in \mathrm{S}(d) and \tau \in T.

The word norm makes the symmetric group \mathrm{S}(d) into a metric space.

Proposition 3: The function

\mathrm{d} \colon \mathrm{S}(d) \times \mathrm{S}(d) \longrightarrow \mathbb{N}_0

defined by \mathrm{d}(\rho,\sigma) = |\rho^{-1}\sigma| is a metric.

We may thus for example speak of the open ball of radius r centered at a given permutation \rho,

B_r(\rho) = \{\sigma \in \mathrm{S}(d) \colon \mathrm{d}(\rho,\sigma) < r\},

and its boundary, the sphere

\partial B_r(\rho) = \{\sigma \in \mathrm{S}(d) \colon \mathrm{d}(\rho,\sigma)= r\}.

The metric \mathrm{d} is nothing but the graph theory distance (or geodesic distance) on \Gamma(d) = \mathrm{Cay}(\mathrm{S}(d),T), the Cayley graph of the symmetric group as generated by the conjugacy class T \subset \mathrm{S}(d) of transpositions. The vertex set of this graph is \mathrm{S}(d) itself, and two permutations \rho,\sigma \in \mathrm{S}(d) are joined by an edge if and only if there is \tau \in T such that \sigma = \rho\tau. Note that this is an undirected graph because \tau^2=\iota is the identity permutation for every \tau \in T, so that \sigma=\rho\tau is equivalent to \rho=\sigma\tau. Note also that \Gamma(d) is a regular graph of degree |T|={d \choose 2}. Another feature of \Gamma(d) is that it is a graded graph: it decomposes into a disjoint union of independent sets or “levels,”

\mathrm{S}(d) = \bigsqcup\limits_{r=0}^{d-1} L_r,

with L_r being the sphere of radius r centered at the identity permutation \iota. By the join-cut relation, L_r may also be described as the set of permutations whose disjoint cycle decomposition contains d-r factors.

The Hurwitz enumeration problem has an equivalent formulation in terms of the Cayley graph: it asks for the number W^r(\pi) of r-step walks from \iota to \pi on \Gamma(d). This description also points to a refinement of the Hurwitz problem. Let us introduce the edge-labeling on \Gamma(d) in which each edge corresponding to the transposition (i\ j) is marked by j, the larger of the two numbers it interchanges. We will call this the Biane-Stanley edge labeling of the symmetric group, after Philippe Biane and Richard Stanley, in whose work it appears implicitly.

\mathrm{S}(4) with Biane-Stanley labeling.

Given an r-step walk \rho \to \sigma on \Gamma(d), we define its signature to be the corresponding sequence j_1,\dots,j_r of edge labels. We say that a walk is strictly monotone if its signature is a strictly increasing sequence. Thus, an r-step strictly monotone walk from \iota to \sigma is the same thing as a strictly monotone factorization of \sigma into r transpositions, i.e. a factorization

\sigma = (i_1\ j_1) \dots (i_r\ j_r)

such that

j_1 < \dots < j_r.

Theorem (unique monotone factorization): Every permutation \pi \in \mathrm{S}(d) admits a unique strictly monotone factorization, and the length of this factorization is equal to |\pi|. That is,

\vec{\vec{W}}^r(\pi) = \begin{cases} 1, \text{ if }r=|\pi| \\ 0, \text{ otherwise}\end{cases}.

Proof: The proof is by induction on d.

In the base case we have \mathrm{S}(2) = \{\iota,\tau\} with \iota = (1)(2) and \tau=(1\ 2). The complete answer to the Hurwitz enumeration problem for d=2 is thus given by

W^r(\iota)=\begin{cases} 1,\text{ if } r \text{ even} \\ 0, \text{ if }r \text{ odd } \end{cases}

and

W^r(\tau)=\begin{cases} 1, \text{ if } r \text{ odd } \\0, \text{ if }r\text{ even}\end{cases}.

This corresponds to the fact that if \pi \in \mathrm{S}(2) and r \in \mathbb{N}_0 have the same parity, then we have the unique length r factorization

\pi=\underbrace{(1\ 2) \dots (1\ 2)}_{r \text{ factors}},

but no factorization exists if \pi and r have opposite parity. Imposing the strictly monotone constraint whittles this down to

\vec{\vec{W}}^r(\iota)=\begin{cases} 1, \text{ if }r=0 \\ 0, \text{ otherwise} \end{cases}

and

\vec{\vec{W}}^r(\tau) = \begin{cases} 1, \text{ if } r=1 \\ 0, \text{ otherwise} \end{cases}.

For the inductive step, there are two cases to consider. Let d>2 and let \pi \in \mathrm{S}(d) be any permutation. If d is a fixed point of \pi, i.e. \pi(d)=d, then \pi \in \mathrm{S}(d-1) has a unique strictly monotone factorization by the induction hypothesis. If on the other hand \pi \in \mathrm{S}(d) is one of the

d! - (d-1)! = (d-1)(d-1)!

permutations which contains d in its support (meaning \pi(d)\neq d), then \pi is of the form

\pi = \pi'(i\ d)

for a unique pair \pi' \in \mathrm{S}(d-1) and i \in \{1,\dots,d-1\}. By the induction hypothesis, \pi' admits a unique strictly monotone factorization, of length |\pi'|. Appending the factor (i\ d) yields a strictly monotone factorization of \pi of length |\pi'|+1=|\pi|; conversely, the final factor of any strictly monotone factorization of \pi must be the unique transposition (i\ d) moving d, and deleting it leaves a strictly monotone factorization of \pi'. The result follows.

— Q.E.D.
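The theorem is also easy to confirm by exhaustive search in a small case. A minimal Python sketch (names ours) for d=4, which enumerates all strictly monotone factorizations of every length and checks that each of the 24 permutations admits exactly one:

```python
from itertools import product, permutations

d = 4
iota = tuple(range(d))

def compose(s, t):
    return tuple(s[t[x]] for x in range(d))

labeled = []
for j in range(d):
    for i in range(j):
        p = list(iota)
        p[i], p[j] = p[j], p[i]
        labeled.append((j, tuple(p)))  # the transposition (i j) with label j

counts = {p: 0 for p in permutations(range(d))}
counts[iota] += 1  # the empty factorization of the identity
for r in range(1, d):
    for seq in product(labeled, repeat=r):
        labels = [j for j, _ in seq]
        if all(labels[q] < labels[q + 1] for q in range(r - 1)):  # strictly monotone
            pi = iota
            for _, tau in seq:
                pi = compose(pi, tau)
            counts[pi] += 1

print(all(c == 1 for c in counts.values()))  # True: one factorization each
```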

Graphically, the unique monotone factorization theorem gives a presentation of the symmetric group \mathrm{S}(d) as a starlike tree, which we call the Jucys tree after the Lithuanian physicist Algimantas Adolfas Jucys (not to be confused with his father, the Lithuanian physicist Adolfas Jucys), in whose work it appears implicitly. We will embed this construction in the group algebra \mathbb{C}\mathrm{S}(d) in Lecture 3, and start using it in conjunction with linear representations of the group algebra.

Jucys tree for \mathrm{S}(4).

Math 202C: Lecture 1

Throughout this lecture, \mathbf{V} is a finite-dimensional complex vector space equipped with a scalar product \langle \cdot,\cdot \rangle, which is by convention a linear function of its second argument. The scalar product on \mathbf{V} determines various symmetry classes in \mathrm{End}\mathbf{V}, the algebra of linear operators A \colon \mathbf{V} \to \mathbf{V}, and the study of these symmetry classes is one of the main components of Math 202A. In particular, recall that an operator A \in \mathrm{End}\mathbf{V} is said to be selfadjoint (or Hermitian) if

\langle \mathbf{v},A\mathbf{w} \rangle = \langle A\mathbf{v},\mathbf{w}\rangle, \quad \mathbf{v},\mathbf{w} \in \mathbf{V}.

Hermitian operators are the subject of the Spectral Theorem, which may be formulated as follows.

Theorem 1: If A \in \mathrm{End} \mathbf{V} is selfadjoint, then there exists an orthonormal basis \mathbf{e}_1,\dots,\mathbf{e}_N of \mathbf{V} such that A\mathbf{e}_i = \lambda_i\mathbf{e}_i, where \lambda_1 \geq \dots \geq \lambda_N are real numbers.

Theorem 1 is the eigenvalue/eigenvector form of the Spectral Theorem. We will give a proof of this result which iteratively constructs the eigenvalues/eigenvectors of A as the maxima/maximizers of the (necessarily real-valued) quadratic form Q_A(\mathbf{v}) = \langle \mathbf{v},A\mathbf{v}\rangle on a nested sequence of compact sets in \mathbf{V}. Instead of invoking the Fundamental Theorem of Algebra to claim the existence of a root of the characteristic polynomial, this argument calls the Extreme Value Theorem as a subroutine.

Lemma 1: If B \colon \mathbf{V} \to \mathbf{V} is a selfadjoint operator such that the affiliated quadratic form Q_B(\mathbf{v})=\langle \mathbf{v},B\mathbf{v}\rangle is nonnegative, then any zero of Q_B \colon \mathbf{V} \to \mathbb{R} is in the kernel of B.

Proof: In fact, the argument does not use the finite-dimensionality of \mathbf{V}. Let \mathbf{z} \in \mathbf{V} be such that Q_B(\mathbf{z})=0. Consider a real perturbation of this zero in an arbitrary direction, i.e. consider Q_B(\mathbf{z}+t\mathbf{h}), where t \in \mathbb{R} is arbitrary and \mathbf{h} \in \mathbf{V} is a fixed unit vector. We then have

Q_B(\mathbf{z}+t\mathbf{h}) = \langle \mathbf{z}+t\mathbf{h},B(\mathbf{z}+t\mathbf{h})\rangle=Q_B(\mathbf{z})+t\langle \mathbf{z},B\mathbf{h} \rangle+t\langle\mathbf{h},B\mathbf{z}\rangle+t^2Q_B(\mathbf{h}).

The first term is zero by hypothesis. The middle terms are the real number t times

\langle \mathbf{z},B\mathbf{h} \rangle+\langle\mathbf{h},B\mathbf{z}\rangle = \langle \mathbf{z},B\mathbf{h} \rangle+\langle B\mathbf{h},\mathbf{z}\rangle = \langle \mathbf{z},B\mathbf{h} \rangle+\overline{\langle \mathbf{z},B\mathbf{h} \rangle} = 2 \Re \langle \mathbf{z},B\mathbf{h} \rangle.

Thus by hypothesis the function

f(t)=2t \Re \langle \mathbf{z},B\mathbf{h} \rangle +t^2Q_B(\mathbf{h})

is a nonnegative function of the real variable t. This function has the asymptotics

f(t) \sim 2t \Re \langle \mathbf{z},B\mathbf{h} \rangle

as t \to 0, so it must be the case that \Re \langle \mathbf{z},B\mathbf{h} \rangle=\Re \langle B\mathbf{z},\mathbf{h} \rangle=0, otherwise f(t) would change sign at t=0, which it does not, being a nonnegative function. Repeating the above with a pure imaginary perturbation it\mathbf{h}, we get that \Im \langle \mathbf{z},B\mathbf{h} \rangle =\Im \langle B\mathbf{z},\mathbf{h} \rangle=0 as well. Thus B\mathbf{z} is orthogonal to every unit vector \mathbf{h}, whence it must be the zero vector, B\mathbf{z}=\mathbf{0}_\mathbf{V}.

— Q.E.D.

Proof of Theorem 1: Since \mathbf{V} is finite-dimensional, its unit sphere

S_1 = \{\mathbf{v} \in \mathbf{V} \colon \|\mathbf{v}\|=1\}

is compact, and the function Q_A \colon \mathbf{V} \to \mathbb{R} is continuous. Thus, by the Extreme Value Theorem, there exists a unit vector \mathbf{e}_1 \in \mathbf{V} at which Q_A attains its maximum value on S_1. Setting \lambda_1 = Q_A(\mathbf{e}_1), we have

Q_A(\mathbf{v}) \leq \lambda_1

for all unit vectors \mathbf{v}. But this is equivalent to

\langle \mathbf{v},A\mathbf{v}\rangle \leq \lambda_1 \langle \mathbf{v},\mathbf{v}\rangle

for all unit vectors \mathbf{v}, and in this form it is clear that the inequality actually holds for arbitrary vectors \mathbf{v} \in \mathbf{V}, i.e. we have

\langle \mathbf{v},B_1\mathbf{v} \rangle \geq 0

for all \mathbf{v} \in \mathbf{V}, where

B_1=\lambda_1I -A.

Note that B_1 \in \mathrm{End}\mathbf{V} is selfadjoint (indeed, selfadjoint operators in \mathrm{End}\mathbf{V} form a real vector space), and the above says that the associated quadratic form Q_{B_1} is nonnegative. Since Q_{B_1}(\mathbf{e}_1)=0 by construction, we may invoke Lemma 1 to conclude that \mathbf{e}_1 is in the kernel of B_1=\lambda_1I-A, i.e. \mathbf{e}_1 is an eigenvector of A with eigenvalue \lambda_1.

Consider now the orthogonal complement

\mathbf{e}_1^\perp = \{\mathbf{v} \in \mathbf{V} \colon \langle \mathbf{e}_1,\mathbf{v} \rangle = 0\},

i.e. the (N-1)-dimensional hyperplane in \mathbf{V} perpendicular to the line through \mathbf{e}_1. Let S_2 be the unit sphere in \mathbf{e}_1^\perp, let \lambda_2\leq \lambda_1 be the maximum of the quadratic form Q_A on S_2 \subset S_1, and let \mathbf{e}_2 \in S_2 be a point at which the maximum is achieved. By exactly the same argument as above, we get that \mathbf{e}_2 is an eigenvector of A with eigenvalue \lambda_2. Indeed, we can iterate the argument N times to obtain an orthonormal set \{\mathbf{e}_1,\dots,\mathbf{e}_N\} of eigenvectors with eigenvalues \lambda_1 \geq \dots \geq \lambda_N.

— Q.E.D.

2. The subspace lattice

Let \mathcal{L}(\mathbf{V}) denote the set of all subspaces of \mathbf{V}, including the trivial subspace \{\mathbf{0}_\mathbf{V}\} and \mathbf{V} itself. This set carries a natural partial order: given \mathbf{W}_1,\mathbf{W}_2 \in \mathcal{L}(\mathbf{V}), we declare \mathbf{W}_1 \leq \mathbf{W}_2 if and only if \mathbf{W}_1 is a subspace of \mathbf{W}_2. In fact, \mathcal{L}(\mathbf{V}) is a graded poset in which the rank of a subspace \mathbf{W} \in \mathcal{L}(\mathbf{V}) is its dimension \dim \mathbf{W}. The graded poset \mathcal{L}(\mathbf{V}) is moreover a lattice: the largest subspace contained in each of two given subspaces \mathbf{W}_1 and \mathbf{W}_2 is their intersection, and the smallest subspace containing both of them is their span. Finally, the lattice \mathcal{L}(\mathbf{V}) is self-dual: the map \mathbf{W} \to \mathbf{W}^\perp sending a subspace to its orthogonal complement is an order-reversing involution.

In formulating the above, we have just encountered our first example of a functor: we can view \mathcal{L} as a covariant functor from the category of vector spaces to the category of posets. Indeed, if \mathbf{V}_1,\mathbf{V}_2 are vector spaces and T \in \mathrm{Hom}(\mathbf{V}_1,\mathbf{V}_2) is a linear transformation, we get an order-preserving function \mathcal{L}(T) \in \mathrm{Hom}(\mathcal{L}(\mathbf{V}_1),\mathcal{L}(\mathbf{V}_2)) defined by

\mathcal{L}(T)\mathbf{W}=T(\mathbf{W}).

In this course we will be considering quite a few more functors, most of which will be endofunctors from the category of vector spaces to itself. The most basic of these are tensor powers, symmetric powers, and exterior powers, which are the basic operations of multilinear algebra and are closely related to the physical concept of Fock space, which is intended to capture the notion of symmetry classes of elementary particles in quantum mechanics — bosons and fermions and such.

3. Flags and Grassmannians

Very loosely speaking, vector spaces are continuous versions of sets. A finite set E=\{\mathbf{e}_1,\dots,\mathbf{e}_N\} might represent the possible states of some physical system, while the free vector space \mathbf{V}=\mathbb{C}[E] over this set allows for superpositions of states, i.e. linear combinations of them. Associated to E is its lattice of subsets \mathcal{L}(E), where meet and join are intersection and union. This is called a Boolean lattice, and it is graded by cardinality rather than dimension. Sometimes \mathcal{L}(E) is referred to as a hypercube, because its elements can be thought of as length N bitstrings, and these may in turn be thought of as the vertices of a polytope in \mathbb{R}^N which is a cube when N=3.

In a general poset, a chain is a totally ordered subset, while an antichain is a subset which does not contain any comparable pairs of elements. A chain or antichain is said to be saturated if adding any new element will cause it to no longer be a chain or antichain. It is easy to spot saturated chains in the Boolean lattice \mathcal{L}(E) — they are just sequences E_0 \subset E_1 \subset \dots \subset E_N of subsets of E such that |E_k|=k, and there are factorially many of these. It is equally easy to spot N+1 saturated antichains, namely the collections A_k \subset \mathcal{L}(E) consisting of all subsets of E of cardinality k. The cardinality of A_k itself is then {N \choose k}, and it is a famous result in extremal combinatorics (Sperner's theorem) that the size of the largest antichain in the Boolean lattice \mathcal{L}(E) is the largest number in the Nth row of Pascal’s triangle.

The lattice of subspaces of a finite-dimensional vector space is in many ways analogous to the lattice of subsets of a finite set. Indeed, if one could define a field with one element in a satisfactory way, then the two notions would coincide for vector spaces over \mathbb{F}_1. This is problematic, but one can analyze the subspace lattice of a vector space defined over \mathbb{F}_q without difficulty, and count chains, antichains, etc. For example, the cardinality of the antichain consisting of k-dimensional subspaces is the Gaussian binomial coefficient {N \choose k}_q, which degenerates to the ordinary binomial coefficient in the q \to 1 limit, almost as if \mathbb{F}_1 really did exist…

Anyway, we are working over \mathbb{C}, so everything is infinite. Chains in \mathcal{L}(\mathbf{V}) are typically referred to as flags, though this term is more often than not reserved for chains that contain the trivial subspace and \mathbf{V} itself. Saturated chains are called complete flags: they are sequences

\mathbf{V}_0 \leq \mathbf{V}_1 \leq \dots \leq \mathbf{V}_N

such that \dim\mathbf{V}_k =k. The analogue of the antichain A_k in the Boolean lattice consisting of subsets of cardinality k is the collection of k-dimensional subspaces of \mathbf{V}; these antichains are called Grassmannians and denoted \mathrm{Gr}_k(\mathbf{V}). We are going to discuss Grassmannians of finite-dimensional complex vector spaces in some detail, and I hope that we will have time to discuss a certain special infinite-dimensional Grassmannian known as the Sato Grassmannian. For now, we concentrate on the following fundamental relationship between eigenvalues of selfadjoint operators and Grassmannians.

4. Eigenvalues as extremizers

Let A \in \mathrm{End}\mathbf{V} be a selfadjoint operator, and define a function

M_A \colon \mathcal{L}(\mathbf{V}) \to \mathbb{R}

by

M_A(\mathbf{W}) = \max\limits_{\substack{\mathbf{w} \in \mathbf{W} \\ \|\mathbf{w}\|=1}} \langle \mathbf{w},A\mathbf{w}\rangle.

Thus, M_A(\mathbf{W}) assigns to each subspace \mathbf{W} \leq \mathbf{V} the maximum of the quadratic form associated to A on the unit sphere in \mathbf{W}. In particular, we know from our proof of the Spectral Theorem above that the largest eigenvalue of A is given by

\lambda_1 = M_A(\mathbf{V}).

The following result, known as the Courant-Fischer minimax theorem, generalizes this by giving a variational characterization of each individual eigenvalue \lambda_k of A.

Theorem 2: If A \in \mathrm{End}\mathbf{V} is a selfadjoint operator with eigenvalues \lambda_1 \geq \dots \geq \lambda_N, then for each 1 \leq k \leq N we have

\lambda_k = \min \{M_A(W) \colon W \in \mathrm{Gr}_{N-k+1}(\mathbf{V})\}.

That is, the kth largest eigenvalue of A is the minimum value of M_A over the Grassmannian of (N-k+1)-dimensional subspaces of \mathbf{V}.

Proof: Let \{\mathbf{e}_1,\dots,\mathbf{e}_N\} be an orthonormal basis of \mathbf{V} such that A\mathbf{e}_i=\lambda_i\mathbf{e}_i, \quad 1 \leq i \leq N. For each 1 \leq k \leq N, consider the space

\mathbf{S}_k = \mathrm{Span} \{\mathbf{e}_1,\dots,\mathbf{e}_k\}

spanned by the eigenvectors corresponding to the k largest eigenvalues of A, and also set \mathbf{S}_0 = \{\mathbf{0}_\mathbf{V}\}, so that \mathbf{S}_0^\perp = \mathbf{V}. In the proof of Theorem 1, we showed that

\lambda_k = M_A(\mathbf{S}_{k-1}^\perp),

so that indeed the kth largest eigenvalue \lambda_k is given by evaluating the function M_A at a point of the Grassmannian \mathrm{Gr}_{N-k+1}(\mathbf{V}), namely \mathbf{S}_{k-1}^\perp. It remains to show that the evaluation of M_A on every other subspace of dimension N-k+1 is at least \lambda_k.

To achieve this end, we establish the following squeeze inequality for the values of the quadratic form Q_A on the subspace spanned by the first k eigenvectors: for all \mathbf{v} \in \mathbf{S}_k we have

\lambda_k \langle \mathbf{v},\mathbf{v}\rangle \leq \langle \mathbf{v},A\mathbf{v}\rangle \leq \lambda_1\langle \mathbf{v},\mathbf{v} \rangle.

Indeed, let \mathbf{v}=x_1\mathbf{e}_1+\dots+x_k\mathbf{e}_k be any vector in \mathbf{S}_k. Using orthonormality of the eigenvectors we calculate

\langle \mathbf{v},A\mathbf{v} \rangle = \lambda_1|x_1|^2+\dots+\lambda_k|x_k|^2 \leq \lambda_1(|x_1|^2+\dots+|x_k|^2) = \lambda_1\langle \mathbf{v},\mathbf{v}\rangle,

and the lower bound is established in exactly the same way.

Now let \mathbf{W} \in \mathrm{Gr}_{N-k+1}(\mathbf{V}) be arbitrary. Since

\dim \mathbf{S}_k + \dim \mathbf{W} = k + N-k+1 = N+1 > \dim \mathbf{V},

the intersection of the subspaces \mathbf{S}_k and \mathbf{W} is at least one-dimensional (see Problem set 1). Let \mathbf{e} be a unit vector in \mathbf{S}_k \cap \mathbf{W}. Since \mathbf{e} \in \mathbf{S}_k, we have Q_A(\mathbf{e}) \geq \lambda_k by the above inequality. But also \mathbf{e} \in \mathbf{W}, which shows that M_A(\mathbf{W}) \geq \lambda_k, as required.

— Q.E.D.
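A quick numerical illustration with numpy (a sketch; the random matrix, the seed, and the helper M_A are our own notation): the minimizing subspace is \mathbf{S}_{k-1}^\perp, and a random subspace of the same dimension gives a value at least \lambda_k.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
X = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = (X + X.conj().T) / 2  # a random selfadjoint operator

evals, evecs = np.linalg.eigh(A)  # ascending order
lam = evals[::-1]                 # lambda_1 >= ... >= lambda_N
E = evecs[:, ::-1]                # matching eigenvectors

def M_A(W):
    # max of <w, A w> over unit vectors w in the column span of W
    Q = np.linalg.qr(W)[0]
    return np.linalg.eigvalsh(Q.conj().T @ A @ Q).max()

k = 3
print(np.isclose(M_A(E[:, k - 1:]), lam[k - 1]))  # True: S_{k-1}^perp attains lambda_k
W = rng.normal(size=(N, N - k + 1)) + 1j * rng.normal(size=(N, N - k + 1))
print(M_A(W) >= lam[k - 1] - 1e-12)               # True for any (N-k+1)-dim subspace
```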

5. Adjoint orbits

Any operator U \in \mathrm{End}\mathbf{V} which preserves the scalar product,

\langle U\mathbf{v},U\mathbf{w} \rangle = \langle \mathbf{v},\mathbf{w} \rangle, \quad \mathbf{v},\mathbf{w} \in \mathbf{V},

is called a unitary operator. It is clear that the set \mathrm{U}(\mathbf{V}) of unitary operators on \mathbf{V} is a group, indeed a subgroup of the general linear group \mathrm{GL}(\mathbf{V}). Note that the former depends on the scalar product but the latter does not; one may equivalently say that invertible operators map bases to bases, while unitary operators map orthonormal bases to orthonormal bases.

Let \lambda_1 \geq \dots \geq \lambda_N be given real numbers, and let \mathcal{O}_\lambda \subset \mathrm{End}\mathbf{V} be the set of selfadjoint operators on \mathbf{V} with this given spectrum.

Theorem 3: Choose any orthonormal basis \mathbf{e}_1,\dots,\mathbf{e}_N \in \mathbf{V}, and let \Lambda \in \mathrm{End}\mathbf{V} be the selfadjoint operator defined by

\Lambda \mathbf{e}_i = \lambda_i \mathbf{e}_i, \quad 1 \leq i \leq N.

Then the isospectral set \mathcal{O}_\lambda \subset \mathrm{End}\mathbf{V} is given by

\mathcal{O}_\lambda = \{U\Lambda U^{-1} \colon U \in \mathrm{U}(\mathbf{V})\}.

Proof: First we check that \mathcal{O}_\lambda consists entirely of selfadjoint operators having spectrum \lambda. The selfadjointness of U\Lambda U^{-1} follows immediately from the selfadjointness of \Lambda together with the fact that U^*=U^{-1}. To see that U\Lambda U^{-1} has spectrum \lambda, let \mathbf{f}_1,\dots,\mathbf{f}_N be the orthonormal basis of \mathbf{V} defined by

\mathbf{f}_i = U\mathbf{e}_i, \quad 1 \leq i\leq N.

Then \mathbf{f}_i is an eigenvector of U\Lambda U^{-1} with eigenvalue \lambda_i.

Now let A be a given selfadjoint operator with eigenvalues \lambda_1,\dots,\lambda_N. By the Spectral Theorem, there exists an orthonormal basis \mathbf{f}_1,\dots,\mathbf{f}_N such that A\mathbf{f}_i=\lambda_i\mathbf{f}_i. If U is the unitary operator sending \mathbf{e}_i to \mathbf{f}_i, then

U^{-1}AU\mathbf{e}_i = \lambda_i \mathbf{e}_i, \quad 1 \leq i \leq N,

which is equivalent to U^{-1}AU=\Lambda, i.e. A = U\Lambda U^{-1} \in \mathcal{O}_\lambda.

— Q.E.D.

In due time, we shall see that both Grassmannians \mathrm{Gr}_k(\mathbf{V}) and adjoint orbits \mathcal{O}_\lambda are complex projective varieties.

Math 202C: Lecture 0

Welcome to Math 202C in the Spring quarter of 2022. This course concludes the applied algebra graduate course sequence, the previous installments of which were taught by Professor Freddie Manners and Professor Steven Sam. The goal of the course will be to unite and extend the material you learned in 202A and 202B in a natural, pleasing, and useful way. We will focus on applications of the representation theory of the symmetric groups, which was the subject of 202B. The Jucys-Murphy elements will be constructed, and their action in irreducible representations will be applied to analyze random walks (card shuffling) and permutation factorizations (Hurwitz theory). We then review some 202A material, namely the basic constructions of multilinear algebra (tensor, exterior, and symmetric algebras), and their physical interpretation as state spaces for bosons and fermions (Fock spaces). We apply the representation theory of the symmetric groups to prove the existence of a q-Fock space interpolating between the symmetric and exterior algebras. The two parts of the course are then unified using the infinite wedge space formalism, whose connection with the theory of integrable systems via the Sato Grassmannian will be discussed if time permits.

Now to the mechanics of Math 202C. This post, Lecture 0, serves as the course syllabus. The course is being taught remotely, and the plan is that the two weekly lectures will be delivered in the form of blog posts like the one you are reading right now. Each of these posts will likely cover more than I would be able to cover in an in-person ninety-minute lecture. These posts will appear every Tuesday and Thursday in a temporal neighborhood of our scheduled lecture slots, and the course will feel much like a reading course, albeit a fairly intense one leading up to a qualifying exam. I will also hold live office hours over Zoom, so that we can discuss the course content in real time, and so that I can embellish and add to it verbally. This will happen on Fridays, and we will work out a time over Piazza, which we will use as a question and answer forum (signup link here). We’ll be conducting all class-related discussion on Piazza. The quicker you begin asking questions on Piazza (rather than via emails), the quicker you’ll benefit from the collective knowledge of your classmates and instructors. I encourage you to ask questions when you’re struggling to understand a concept.

As for evaluation, there are no exams (neither midterm nor final), though of course you’ll be taking the Applied Algebra qualifying exam mid-quarter. We will however have weekly problem sets which will be due on Sundays and submitted via Gradescope. More information about this will be posted on Piazza. These problem sets will help to cement the concepts you’ll be learning in the course, and will also give you an idea of the sorts of problems that are likely to appear on the qualifying exam. I believe this is everything you need to know to get started — see you on Piazza starting today, and in real time on Friday.

Lecture 1: Spectral theorem, flags and Grassmannians, minimax
Lecture 2: Jucys's theorem: unique factorization
Lecture 3: Symmetric polynomials and JM elements
Lecture 4: Murphy's theorem: content eigenvalues
Lecture 5: Top-to-random shuffle I
Lecture 6: Top-to-random shuffle II
Lecture 7: Basic category theory
Lecture 8: Tensor product, tensor powers, tensor algebra
Lecture 9: Symmetric and antisymmetric tensors, bosons and fermions
Lecture 10: Polynomials and varieties
Lecture 11: Grassmann varieties
Lecture 12: Selfadjoint algebras I
Lecture 13: Selfadjoint algebras II
Lecture 14: Selfadjoint algebras III
Lecture 15: Unitary representations I
Lecture 16: Unitary representations II
Lecture 17: Unitary representations III
Lecture 18: Schur-Weyl duality I
Lecture 19: Schur-Weyl duality II
Lecture 20: Invariant theory
Schedule and topics subject to change.

Math 202C: Lecture 14

Let us begin with a very brief overview of symmetric function theory. Let X=\{x_1,x_2,x_3,\dots\} be a countable set of commuting indeterminates, referred to as an alphabet. Given r \in \mathbb{N}, let \mathrm{Fun}(r,\mathbb{N}) denote the set of functions

i \colon \{1,\dots,r\} \longrightarrow \mathbb{N}.

Consider the generating functions

e_r = \sum\limits_{\substack{i \in \mathrm{Fun}(r,\mathbb{N}) \\ i \text{ strictly increasing}}} \prod\limits_{q=1}^r x_{i(q)},

h_r = \sum\limits_{\substack{i \in \mathrm{Fun}(r,\mathbb{N}) \\ i \text{ weakly increasing}}} \prod\limits_{q=1}^r x_{i(q)},

p_r = \sum\limits_{\substack{i \in \mathrm{Fun}(r,\mathbb{N}) \\ i \text{ constant}}} \prod\limits_{q=1}^r x_{i(q)}.

These are respectively known as the degree r elementary, complete, and power sum symmetric functions over the alphabet X. Note that they are not actually functions, but rather elements of the algebra \mathbb{C}[[x_1,x_2,x_3,\dots]] of formal power series over X. The “fundamental theorem of symmetric function theory” asserts that these series all generate the same subalgebra of \mathbb{C}[[x_1,x_2,x_3,\dots]].

Theorem (Newton): We have

\mathbb{C}[e_1,e_2,e_3,\dots]=\mathbb{C}[h_1,h_2,h_3,\dots]=\mathbb{C}[p_1,p_2,p_3,\dots].

The subalgebra of \mathbb{C}[[x_1,x_2,x_3,\dots]] generated by any of the above three fundamental families is called the algebra of symmetric functions, and typically denoted \Lambda or \Lambda[X] if the alphabet needs to be specified. This algebra has a lot of internal structure which expresses itself in combinatorially interesting ways. To see some of this, you might try expressing the elementary symmetric functions in terms of the power sums, as Newton did.
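If you want to experiment before deriving anything, Newton's expressions are easy to check numerically. The Python sketch below (an illustration only; truncating the alphabet to finitely many variables is harmless for fixed degree) verifies two of them on a random truncated alphabet.

```python
# Numerical check of two of Newton's expressions for the e_r in terms of
# the p_r, on a random truncation of the alphabet X to eight variables.
from itertools import combinations
from math import prod
import random

x = [random.random() for _ in range(8)]   # a finite truncation of X

def e(r):                                 # elementary symmetric polynomial e_r
    return sum(prod(c) for c in combinations(x, r))

def p(r):                                 # power sum p_r
    return sum(v ** r for v in x)

print(abs(e(2) - (p(1)**2 - p(2)) / 2) < 1e-9)                    # True
print(abs(e(3) - (p(1)**3 - 3*p(1)*p(2) + 2*p(3)) / 6) < 1e-9)    # True
```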

Definition 1: A specialization of \Lambda is an algebra homomorphism from \Lambda to a commutative \mathbb{C}-algebra \mathbf{A}.

In practice, specializations typically occur as families of substitutions. Here is a very simple example. For any n \in \mathbb{N}, we have a specialization

\Xi_n \colon \Lambda \to \mathbb{C}

defined by

\Xi_n(f) = f(\underbrace{1,\dots,1}_n,0,0,0,\dots),

i.e. by substituting the numerical value 1 for each of the variables x_1,\dots,x_n (or any other specified set of n variables), and substituting the numerical value 0 for the rest. It is natural to identify the homomorphism \Xi_n with the numerical multiset

\{\{\underbrace{1,\dots,1}_n,0,0,0,\dots\}\}

and write f(\Xi_n) rather than \Xi_n(f) to reflect the substitution.

Problem 1: Explicitly compute e_r(\Xi_n),h_r(\Xi_n),p_r(\Xi_n).
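As a hint of how one might experiment (this is data generation, not a solution to Problem 1), the following sketch tabulates the three specializations for small n and r; the patterns are then easy to guess.

```python
# Data for Problem 1: tabulate e_r, h_r, p_r on the substitution consisting
# of n ones followed by zeros, for small n and r.
from itertools import combinations, combinations_with_replacement
from math import prod

def e(r, xs): return sum(prod(c) for c in combinations(xs, r))
def h(r, xs): return sum(prod(c) for c in combinations_with_replacement(xs, r))
def p(r, xs): return sum(v ** r for v in xs)

for n in range(1, 6):
    xs = [1] * n          # the zero entries contribute nothing and are omitted
    print(n,
          [e(r, xs) for r in range(1, 5)],
          [h(r, xs) for r in range(1, 5)],
          [p(r, xs) for r in range(1, 5)])
# Identifying the three patterns in the printed table is exactly Problem 1.
```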

Another interesting numerical specialization arises from taking

\Xi_n = \{\{1,\dots,n,0,0,0,\dots\}\}.

For this numerical substitution we have

p_r(\Xi_n) = 1^r + 2^r + \dots +n^r,

which is a sum people have been thinking about since antiquity. In particular, there are various formulas for this sum, many of which involve the Bernoulli numbers. In order to evaluate e_r(\Xi_n) and h_r(\Xi_n), one may proceed rather differently as follows. Let t be a new variable, and consider the generating functions

E(t) = 1 + \sum\limits_{r=1}^\infty e_r t^r \quad\text{ and }\quad H(t) = 1 + \sum\limits_{r=1}^\infty h_rt^r,

which are elements of \Lambda[[t]], the algebra of formal power series in the single variable t, with coefficients in \Lambda. These generating functions may be explicitly evaluated without too much effort.

Problem 2: Show that

E(t) = \prod\limits_{i=1}^\infty (1+tx_i) \quad\text{ and }\quad H(t) = \prod\limits_{i=1}^\infty \frac{1}{1-tx_i}.

From Problem 2, we immediately get the identities

1+\sum\limits_{r=1}^\infty e_r(\Xi_n)t^r = (1+t)(1+2t) \dots(1+nt)

and

1+\sum\limits_{r=1}^\infty h_r(\Xi_n)t^r = \frac{1}{(1-t)(1-2t) \dots(1-nt)},

from which one can conclude that e_r(\Xi_n) and h_r(\Xi_n) are in fact Stirling numbers.
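Since the precise indexing is left implicit here, let me record a hedged numerical identification (the conventions below are my own choices, computed from the standard recurrences rather than quoted from the lecture): the coefficients of the first product appear to be unsigned Stirling numbers of the first kind, and those of the second Stirling numbers of the second kind.

```python
# Comparing e_r({1,...,n}) and h_r({1,...,n}) with Stirling numbers computed
# from their recurrences; the index shifts below are one standard convention.
from itertools import combinations, combinations_with_replacement
from math import prod
from functools import lru_cache

def e(r, xs): return sum(prod(c) for c in combinations(xs, r))
def h(r, xs): return sum(prod(c) for c in combinations_with_replacement(xs, r))

@lru_cache(None)
def c1(n, k):            # unsigned Stirling numbers of the first kind
    if n == 0: return 1 if k == 0 else 0
    if k == 0: return 0
    return c1(n-1, k-1) + (n-1) * c1(n-1, k)

@lru_cache(None)
def S2(n, k):            # Stirling numbers of the second kind
    if n == 0: return 1 if k == 0 else 0
    if k == 0: return 0
    return S2(n-1, k-1) + k * S2(n-1, k)

n = 5
xs = list(range(1, n + 1))
print(all(e(r, xs) == c1(n+1, n+1-r) for r in range(n + 1)))   # True
print(all(h(r, xs) == S2(n+r, n) for r in range(5)))           # True
```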

We can repackage the content of Lecture 13 as the study of a certain family of specializations

\Lambda \longrightarrow \mathcal{Z}(n),

where \mathcal{Z}(n) is the center of the group algebra \mathbb{C}\mathrm{S}(n). These specializations are defined using the substitution multiset

\Xi_n =\{\{J_1,\dots,J_n,0,0,0,\dots\}\},

where

J_t = \sum\limits_{s=1}^{t-1} (s\ t), \quad 1 \leq t \leq n,

are the Jucys-Murphy elements in \mathbb{C}\mathrm{S}(n). The natural basis of \mathcal{Z}(n) is the conjugacy class basis,

\{C_\mu \colon \mu \vdash n\},

so understanding the JM specialization of \Lambda means being able to compute

[C_\mu]f(\Xi_n),

the coefficient of a given conjugacy class C_\mu \in \mathcal{Z}(n) in the specialization of a given symmetric function f \in \Lambda. What we saw in Lecture 13 is that for certain f, this problem has a combinatorial interpretation as counting certain walks in the Cayley graph of \mathrm{S}(n) as generated by the conjugacy class of transpositions.

Consider first the elementary symmetric functions e_r. What we saw in Lecture 13 is that

[C_\mu]e_r(\Xi_n) = E^r(\mu)

is the number of r-step strictly monotone walks from the identity permutation to any permutation \pi of cycle type \mu. Moreover, we were able to solve the class expansion problem for the elementary symmetric polynomials explicitly:

e_r(\Xi_n) = \sum\limits_{\substack{\mu \vdash n \\ \ell(\mu)=n-r}} C_\mu,

with the sum on the right being exactly the identity-centered sphere L_r of radius r in \mathrm{S}(n), i.e. the rth level of the Cayley graph. Equivalently, for the generating function

E(t) = 1 + \sum\limits_{r=1}^\infty e_r(\Xi_n)t^r,

we have

E(t) = \sum\limits_{\pi \in \mathrm{S}(n)} t^{|\pi|} \pi,

which is the generating function for permutations by norm.

For the complete symmetric functions h_r, we found that

[C_\mu]h_r(\Xi_n) = H^r(\mu)

is the number of r-step weakly monotone walks from the identity permutation to any permutation \pi of cycle type \mu. We did not give an exact formula for this number, and indeed no such formula is known. However, we can make some progress using another remarkable aspect of the JM elements, namely their interaction with the representation theory of \mathrm{S}(n).

We know that, for any f \in \Lambda, the specialization f(\Xi_n) is a central element in the group algebra \mathbb{C}\mathrm{S}(n). Thus, by Schur’s lemma, f(\Xi_n) acts in any irreducible representation (\mathbf{V}^\lambda,R^\lambda) of \mathbb{C}\mathrm{S}(n) as a scalar operator, i.e. we have

R^\lambda(f(\Xi_n)) = \omega_f(\lambda) I_{\mathbf{V}^\lambda},

with \omega_f(\lambda) \in \mathbb{C} a scalar and I_{\mathbf{V}^\lambda} \in \mathrm{End} \mathbf{V}^\lambda the identity operator. It turns out that the eigenvalue \omega_f(\lambda) can be expressed in terms of a remarkable numerical specialization of \Lambda.

Definition 2: Given a Young diagram \lambda and a cell \Box \in \lambda, the content c(\Box) of \Box is its column index minus its row index. The content alphabet C(\lambda) of \lambda is the multiset of the contents of its cells,

C(\lambda) = \{\{ c(\Box) \colon \Box \in \lambda\}\}.

Theorem (Jucys): The eigenvalue of f(\Xi_n) acting in \mathbf{V}^\lambda is the specialization of f on the content alphabet of \lambda,

\omega_f(\lambda) = f(C(\lambda)).

In fact, Jucys proved a more general result which implies the above: he showed that for any of the elements J_k, the Young basis \mathbf{e}_T of \mathbf{V}^\lambda is an eigenbasis of the operator R^\lambda(J_k), and

R^\lambda(J_k)\mathbf{e}_T = c_T(k) \mathbf{e}_T,

where c_T(k) is the content of the cell containing k in the standard Young tableau T.
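The content alphabet is easy to compute. The short sketch below (the helper names are mine, not part of the lecture) lists C(\lambda) for a partition and evaluates a symmetric polynomial on it; by Jucys' theorem the resulting number is the eigenvalue \omega_f(\lambda).

```python
# Computing the content alphabet C(lambda) of Definition 2; symmetric
# functions evaluated on C(lambda) give the scalars omega_f(lambda).
from itertools import combinations
from math import prod

def contents(partition):
    # the cell in row i, column j (0-indexed) has content j - i
    return [j - i for i, row in enumerate(partition) for j in range(row)]

def e(r, xs):                          # elementary symmetric polynomial
    return sum(prod(c) for c in combinations(xs, r))

lam = (3, 2)                           # the Young diagram with rows of length 3 and 2
C = contents(lam)
print(C)                               # [0, 1, 2, -1, 0]
print(e(2, C))                         # omega_{e_2}(lambda) for this diagram
```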

In particular, Jucys' result enables us to obtain a formula for the generating function enumerating monotone walks from the identity permutation to any permutation \pi \in \mathrm{S}(n) of cycle type \mu \vdash n in terms of the characters of \mathrm{S}(n).

Theorem (Matsumoto-Novak): We have

1+\sum\limits_{r=1}^\infty H^r(\mu) t^r = \sum\limits_{\lambda \vdash n} \frac{\chi^\lambda_\mu}{\prod_{\Box \in \lambda} h(\Box)(1-tc(\Box))},

where h(\Box) denotes the hook length of the cell \Box \in \lambda.

For certain choices of \mu, the sum on the right simplifies considerably and explicit formulas for H^r(\mu) follow. For example, in the case where \mu=(n), we may use the fact that the character of a full cycle vanishes in any representation \mathbf{V}^\lambda for which \lambda is not a hook diagram, and is \pm 1 when \lambda is a hook. This leads to an explicit formula for the number of monotone walks of arbitrary length from the identity to a full cycle in \mathrm{S}(n). Understanding all of this requires only Math 202B knowledge, and the details are in the paper referenced above.

Math 202C: Lecture 13

In Lecture 12, we analyzed the top-to-random shuffle of a deck of n cards, which is the same thing as lazy simple random walk on the Cayley graph of the symmetric group \mathrm{S}(n) generated by the star transpositions

(1\ n), \dots, (n-1\ n).

An important role in this analysis was played by the sum of these transpositions,

J_n = (1\ n) + \dots +(n-1\ n),

which is an element of the group algebra \mathbb{C}\mathrm{S}(n). In fact, J_n is one of a special family of elements in \mathbb{C}\mathrm{S}(n) called the Jucys-Murphy elements of \mathbb{C}\mathrm{S}(n). These elements have many remarkable properties, a few of which we will discuss in this lecture — in fact, these simple transposition sums are so rich in structure that they encapsulate the entire representation theory of the symmetric group, and can be taken as the starting point for a very intuitive and streamlined re-imagining of this subject.

The JM elements can be understood in terms of the Cayley graph \Gamma = \mathrm{Cay}(\mathrm{S}(n),\mathrm{T}) of the symmetric group generated by the set \mathrm{T} of all transpositions, as arising from a special edge-labeling of \Gamma called the Biane-Stanley edge labeling. To construct this labeling, mark each edge of the graph corresponding to the transposition (i\ j) with j, the larger of the two numbers interchanged. Thus, each edge of \Gamma is assigned a label from the set \{2,\dots,n\}. Emanating from each vertex \pi \in \mathrm{S}(n) we see one 2-edge, two 3-edges, three 4-edges, etc., as in the picture below.

For a full example, the figure below shows the Cayley graph of the symmetric group \mathrm{S}(4) with the Biane-Stanley edge labeling, with 2-edges drawn in cyan, 3-edges in magenta, and 4-edges in black.

This edge labeling of the Cayley graph of the symmetric group was introduced by Richard Stanley in the context of noncrossing partitions, and you can read about this here. We will use the introduction of more structure on \Gamma to break away from the Markovian paradigm of the evolution of a memoryless particle on \Gamma.

Definition 1: A walk on \Gamma is said to be monotone if the labels of the edges it traverses form a weakly increasing sequence, and strictly monotone if the labels of the edges it traverses form a strictly increasing sequence.

You can think of monotone walks as virtual histories of a particle moving around on \Gamma which learns from experience: once the particle traverses a road of value t, it refuses to traverse roads of lesser value. Strictly monotone walks are histories of an even more fastidious particle that seeks out roads of strictly increasing value as it evolves. If the motion of the particle is random, then monotone walks on \Gamma are histories of a self-interacting random walk, meaning a random walk whose next step at each time instant depends on its history up to that time.

Given permutations \rho,\sigma \in \mathrm{S}(n) and a nonnegative integer r, let H^r(\rho,\sigma) denote the number of monotone r-step walks on \Gamma from \rho to \sigma, and let E^r(\rho,\sigma) denote the number of strictly monotone r-step walks from \rho to \sigma. Obviously, we always have

0 \leq E^r(\rho,\sigma) \leq H^r(\rho,\sigma) \leq W^r(\rho,\sigma),

where we recall that W^r(\rho,\sigma) is the number of unrestricted r-step walks from \rho to \sigma.

Theorem 1: For any permutations \rho,\sigma \in \mathrm{S}(n), we have

E^r(\rho,\sigma) = \begin{cases} 1, \text{ if } r=|\rho^{-1}\sigma| \\ 0, \text{ otherwise}. \end{cases}

That is, there exists a unique strictly monotone walk between any two permutations, and this walk is a geodesic.

Proof: First, observe that the problem is vertex transitive: we have E^r(\rho,\sigma) = E^r(\tau\rho,\tau\sigma) for any \tau \in \mathrm{S}(n). Indeed, if

\sigma = \rho(s_1\ t_1) \dots (s_r\ t_r), \quad t_1< \dots < t_r

is a strictly monotone walk from \rho to \sigma, then

\tau\sigma = \tau\rho (s_1\ t_1) \dots (s_r\ t_r), \quad t_1< \dots < t_r

is a strictly monotone walk from \tau\rho to \tau\sigma, and vice versa. It is thus sufficient to prove that

E^r(\iota,\pi) = \begin{cases} 1, \text{ if } r=|\pi| \\ 0, \text{ otherwise}, \end{cases}

where \iota \in \mathrm{S}(n) is the identity permutation. Here is a proof by picture: draw all strictly monotone walks on \Gamma departing from \iota. What results is a graphical presentation of \mathrm{S}(n) as a starlike tree:

The basic idea of an actual proof is as follows. First, verify that every cyclic permutation admits a unique strictly monotone factorization, the number of factors of which is one less than the length of the cycle, which is the smallest number of factors possible. Now for an arbitrary permutation, factor each of its cycles into a minimal strictly monotone product of transpositions, and observe that there is a unique way to rearrange the transposition factors of all these factorizations so that they are strictly monotone.

— Q.E.D.
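Theorem 1 can also be confirmed by brute force for small n. In the Python sketch below (permutations are encoded as tuples acting on \{0,\dots,n-1\}, a 0-indexed shift of the lecture's conventions), we enumerate all strictly monotone products of transpositions in \mathrm{S}(4) and check that each permutation arises exactly once, at length |\pi|.

```python
# Brute-force verification of Theorem 1 in S(4): every permutation is hit by
# exactly one strictly monotone product of transpositions, whose length is
# |pi| = n minus the number of cycles of pi.
from itertools import permutations, combinations

n = 4

def compose(a, b):                     # (a*b)(x) = a(b(x))
    return tuple(a[b[x]] for x in range(n))

def transposition(s, t):
    p = list(range(n))
    p[s], p[t] = p[t], p[s]
    return tuple(p)

def num_cycles(p):
    seen, count = set(), 0
    for x in range(n):
        if x not in seen:
            count += 1
            while x not in seen:
                seen.add(x)
                x = p[x]
    return count

counts = {p: [0] * n for p in permutations(range(n))}

def build(ts, i, p):                   # extend a product by factors with labels ts[i:]
    if i == len(ts):
        counts[p][len(ts)] += 1
        return
    for s in range(ts[i]):             # any s < t gives a transposition (s t)
        build(ts, i + 1, compose(p, transposition(s, ts[i])))

for r in range(n):                     # strictly increasing labels t_1 < ... < t_r
    for ts in combinations(range(1, n), r):
        build(ts, 0, tuple(range(n)))

print(all(sum(c) == 1 and c[n - num_cycles(p)] == 1
          for p, c in counts.items()))    # True
```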

A group-theoretic way to conceptualize Theorem 1 is as a unique factorization theorem for the symmetric group: every permutation \pi \in \mathrm{S}(n) has a unique factorization of the form

\pi = (s_1\ t_1)(s_2\ t_2) \dots (s_k\ t_k), \quad s_i<t_i,

such that t_1< t_2 < \dots < t_k, and moreover k=|\pi|. In terms of the group algebra \mathbb{C}\mathrm{S}(n), we may rephrase Theorem 1 as follows: we have

L_r = \sum\limits_{\substack{2 \leq t_1 < \dots < t_r \leq n\\ s_1<t_1, \dots, s_r<t_r}} (s_1\ t_1) \dots (s_r\ t_r), \quad 0 \leq r \leq n-1

where

L_r = \sum\limits_{\substack{ \pi \in \mathrm{S}(n) \\ |\pi|=r}} \pi

is the rth level of the Cayley graph, or in other words the identity-centered sphere of radius r in the symmetric group, viewed as an element of the group algebra \mathbb{C}\mathrm{S}(n). Equivalently, this can be written

L_r = \sum\limits_{2 \leq t_1 < \dots < t_r \leq n} \left( \sum\limits_{s_1=1}^{t_1-1} (s_1\ t_1) \right) \dots \left( \sum\limits_{s_r=1}^{t_r-1} (s_r\ t_r) \right).

Definition 2: The Jucys-Murphy elements in \mathbb{C}\mathrm{S}(n) are the transposition sums

J_2= (1\ 2) \\ J_3 =(1\ 3) + (2\ 3)\\ \vdots \\ J_n=(1\ n) + (2\ n) + \dots + (n-1\ n).

For notational convenience, we set J_1:=\mathbf{0}, the zero element of the group algebra \mathbb{C}\mathrm{S}(n).

Problem 1: Prove that the JM elements commute with one another.
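This is of course not a proof, but Problem 1 is easy to test by direct computation in a toy model of the group algebra, with elements stored as dictionaries mapping permutations to coefficients (a sketch; the encoding is 0-indexed).

```python
# A direct check that the JM elements of CS(4) commute.
from collections import defaultdict

n = 4

def compose(a, b):                     # (a*b)(x) = a(b(x))
    return tuple(a[b[x]] for x in range(n))

def transposition(s, t):
    p = list(range(n))
    p[s], p[t] = p[t], p[s]
    return tuple(p)

def alg_mult(u, v):                    # multiplication in the group algebra
    w = defaultdict(int)
    for p, cp in u.items():
        for q, cq in v.items():
            w[compose(p, q)] += cp * cq
    return dict(w)

# J_t = (1 t) + ... + (t-1 t); in 0-indexed terms, transpositions (s t), s < t.
J = [{transposition(s, t): 1 for s in range(t)} for t in range(1, n)]

print(all(alg_mult(Ja, Jb) == alg_mult(Jb, Ja)
          for Ja in J for Jb in J))    # True
```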

We can now connect the group algebra formulation of Theorem 1 to the theory of symmetric polynomials.

Let the symmetric group \mathrm{S}(n) act on the algebra \mathbb{C}[x_1,\dots,x_n] of polynomials in n variables by permuting variables:

\sigma . f(x_1,\dots,x_n) = f(x_{\sigma(1)},\dots,x_{\sigma(n)}).

The algebra of symmetric polynomials is the corresponding space of invariants,

\mathbb{C}[x_1,\dots,x_n]^{\mathrm{S}(n)} = \{ f \in \mathbb{C}[x_1,\dots,x_n] \colon \sigma . f =f \text{ for all } \sigma \in \mathrm{S}(n)\}.

A non-obvious fact is that this algebra of invariants is in fact a polynomial algebra.

Theorem (Newton): We have

\mathbb{C}[x_1,\dots,x_n]^{\mathrm{S}(n)} = \mathbb{C}[e_1(x_1,\dots,x_n),\dots,e_n(x_1,\dots,x_n)],

where

e_r(x_1,\dots,x_n) = \sum\limits_{1 \leq i_1 < \dots < i_r \leq n} x_{i_1} \dots x_{i_r}, \quad 1 \leq r \leq n.

Newton’s theorem, which is often called the fundamental theorem of symmetric function theory, says that every f \in  \mathbb{C}[x_1,\dots,x_n]^{\mathrm{S}(n)} is a polynomial in the elementary symmetric polynomials e_1,\dots,e_n.

Theorem 2: The substitution rule

f(x_1,\dots,x_n) \mapsto f(J_1,\dots,J_n)

defines a homomorphism

\mathbb{C}[x_1,\dots,x_n]^{\mathrm{S}(n)} \longrightarrow \mathcal{Z}(n),

from the algebra of symmetric polynomials in n variables to the center of \mathbb{C}\mathrm{S}(n).

Proof: Theorem 1 says that

L_r = e_r(J_1,\dots,J_n).

Thus, by Newton's theorem, any symmetric polynomial evaluated on the JM elements, f(J_1,\dots,J_n), will be a polynomial in L_0,\dots,L_{n-1}. It thus suffices to show that each level of the Cayley graph (identified with the formal sum of its constituents) is an element of \mathcal{Z}(n). But this is clear: the center \mathcal{Z}(n) is spanned by the conjugacy classes C_\mu of \mathrm{S}(n) (each of which is identified with the formal sum of its constituents), and

L_r = \sum\limits_{\substack{\mu \vdash n \\ \ell(\mu)=n-r}} C_\mu.

— Q.E.D.
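The key identity L_r = e_r(J_1,\dots,J_n) can likewise be checked by machine for n=4; the sketch below reuses the same toy dictionary encoding of the group algebra as above and is self-contained.

```python
# Machine check of L_r = e_r(J_1,...,J_n) in CS(4). Since J_1 = 0, it
# suffices to sum products J_{t_1}...J_{t_r} over 2 <= t_1 < ... < t_r <= n.
from itertools import permutations, combinations
from collections import defaultdict

n = 4

def compose(a, b):
    return tuple(a[b[x]] for x in range(n))

def transposition(s, t):
    p = list(range(n))
    p[s], p[t] = p[t], p[s]
    return tuple(p)

def alg_mult(u, v):
    w = defaultdict(int)
    for p, cp in u.items():
        for q, cq in v.items():
            w[compose(p, q)] += cp * cq
    return dict(w)

def num_cycles(p):
    seen, count = set(), 0
    for x in range(n):
        if x not in seen:
            count += 1
            while x not in seen:
                seen.add(x)
                x = p[x]
    return count

J = [{transposition(s, t): 1 for s in range(t)} for t in range(1, n)]

for r in range(n):
    er = defaultdict(int)
    for js in combinations(J, r):                  # t_1 < ... < t_r
        term = {tuple(range(n)): 1}
        for j in js:
            term = alg_mult(term, j)
        for p, c in term.items():
            er[p] += c
    Lr = {p: 1 for p in permutations(range(n)) if n - num_cycles(p) == r}
    print(r, dict(er) == Lr)                       # True for r = 0, 1, 2, 3
```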

Theorem 2 has rather interesting consequences for monotone walks on the Cayley graph. Note that, if we consider all walks on the Cayley graph, then the number W^r(\iota,\pi) of r-step walks from the identity to a given permutation \pi depends only on the conjugacy class of \pi. Indeed, if \pi' is a conjugate of \pi, \pi' = \sigma\pi\sigma^{-1}, then every factorization

\pi = (s_1\ t_1) \dots (s_r\ t_r)

of \pi corresponds to the factorization

\pi' = (s_1'\ t_1') \dots (s_r'\ t_r')

in which

(s_i'\ t_i') = \sigma (s_i\ t_i) \sigma^{-1},

and this correspondence is clearly bijective. This simple argument does not show that the number of monotone factorizations

\pi = (s_1\ t_1) \dots (s_r\ t_r), \quad t_1 \leq \dots \leq t_r

is equal to the number of monotone factorizations of \pi', because conjugating each of the factors will not necessarily preserve the monotonicity condition. However, the following more sophisticated argument shows that indeed H^r(\iota,\pi)=H^r(\iota,\pi').

For each r \in \mathbb{N}, the degree r complete symmetric polynomial is defined by

h_r(x_1,\dots,x_n) = \sum\limits_{1 \leq t_1 \leq \dots \leq t_r \leq n} x_{t_1} \dots x_{t_r}.

We have

h_r(J_1,\dots,J_n) = \sum\limits_{1 \leq t_1 \leq \dots \leq t_r \leq n} J_{t_1} \dots J_{t_r} \\= \sum\limits_{1 \leq t_1 \leq \dots \leq t_r \leq n} \left( \sum\limits_{s_1=1}^{t_1-1} (s_1\ t_1) \right) \dots \left( \sum\limits_{s_r=1}^{t_r-1} (s_r\ t_r) \right) \\=\sum\limits_{\substack{ 1 \leq t_1 \leq \dots \leq t_r \leq n \\ s_i < t_i }} (s_1\ t_1) \dots (s_r\ t_r),

which is the sum of all monotone products of r transpositions. Expanding each term in the sum and collecting like terms, the coefficient of \pi is the number of monotone factorizations of \pi, and the coefficient of \pi' is the number of monotone factorizations of \pi'. But, since this sum is a linear combination of conjugacy classes by Theorem 2, and since \pi and \pi' belong to the same conjugacy class, these coefficients are equal.
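This class function property can be observed directly in \mathrm{S}(4). The following sketch (same toy group-algebra encoding as above) computes h_r(J_1,\dots,J_n) and checks that its coefficients are constant on cycle types.

```python
# H^r(iota, pi) is a class function: compute h_r(J_1,...,J_n) in CS(4) and
# check that its coefficients depend only on the cycle type of pi.
from itertools import permutations, combinations_with_replacement
from collections import defaultdict

n, r = 4, 3

def compose(a, b):
    return tuple(a[b[x]] for x in range(n))

def transposition(s, t):
    p = list(range(n))
    p[s], p[t] = p[t], p[s]
    return tuple(p)

def alg_mult(u, v):
    w = defaultdict(int)
    for p, cp in u.items():
        for q, cq in v.items():
            w[compose(p, q)] += cp * cq
    return dict(w)

def cycle_type(p):
    seen, lens = set(), []
    for x in range(n):
        if x not in seen:
            length = 0
            while x not in seen:
                seen.add(x)
                x = p[x]
                length += 1
            lens.append(length)
    return tuple(sorted(lens, reverse=True))

J = [{transposition(s, t): 1 for s in range(t)} for t in range(1, n)]

hr = defaultdict(int)
for js in combinations_with_replacement(J, r):     # t_1 <= ... <= t_r
    term = {tuple(range(n)): 1}
    for j in js:
        term = alg_mult(term, j)
    for p, c in term.items():
        hr[p] += c

coeffs_by_type = defaultdict(set)
for p in permutations(range(n)):
    coeffs_by_type[cycle_type(p)].add(hr.get(p, 0))
print(all(len(s) == 1 for s in coeffs_by_type.values()))    # True
```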

In the next lecture, we will combine the spectral theory of the JM elements with their combinatorial properties to give eigenvalue formulas for the enumeration of monotone walks on the Cayley graph.

Math 202C: Lecture 12

Since the representation theory of the symmetric group is likely fresh in your mind thanks to yesterday’s exam, this is a good time to show you a very nice probabilistic application of this theory to card shuffling. This post will discuss the “top-to-random” shuffle, where the cards 1,\dots,n are arranged in a single row (or a single stack), and a randomly chosen card is repeatedly swapped with the rightmost (or top) card. The algebraic model for this procedure is the Cayley graph \Gamma(n) of the symmetric group as generated by the so-called “star transpositions”

\tau_1=(1\ n), \tau_2=(2\ n),\dots, \tau_{n-1} = (n-1\ n).

Problem 12.1: Prove that the above transpositions do in fact generate \mathrm{S}(n).

These are called star transpositions because, if one forms the graph on \{1,\dots,n\} such that \{i,j\} is an edge if and only if one of the above transpositions swaps i and j, then this graph is a star graph. The top-to-random shuffle is then the same thing as the lazy simple random walk on the Cayley graph \Gamma(n).

The paper attached below gives a detailed analysis of the top-to-random shuffle using the representation theory of \mathrm{S}(n), and gives a precise meaning to the statement that n \log n such shuffles are required in order to randomize the deck. The argument is pretty much all representation theory — only some very basic probabilistic reasoning is used towards the end.
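Short of reading the paper, one can watch randomization set in experimentally. The Monte-Carlo sketch below simulates the shuffle on a small deck and estimates the total variation distance (half the \ell_1-distance) to the uniform distribution; the parameters are arbitrary choices, and the printed distances carry sampling noise on the order of 0.01.

```python
# Monte-Carlo sketch of the top-to-random shuffle on n = 4 cards.
import random
from itertools import permutations
from math import factorial

n, trials = 4, 200_000

def tv_after(r):
    counts = {p: 0 for p in permutations(range(n))}
    for _ in range(trials):
        deck = list(range(n))
        for _ in range(r):
            i = random.randrange(n)                 # choose a random card...
            deck[i], deck[-1] = deck[-1], deck[i]   # ...swap it with the top card
        counts[tuple(deck)] += 1
    u = 1 / factorial(n)
    return 0.5 * sum(abs(c / trials - u) for c in counts.values())

for r in (1, 2, 4, 8, 16):
    print(r, round(tv_after(r), 3))    # distances decay towards 0
```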

Math 202C: Lecture 11

If I were to start this course over again, I would not have limited the opening act to a discussion of the relationship between linear algebra and graph theory, but would instead have begun with a more general discussion of the relations between multilinear algebra and graph theory. Since I sketched this briefly in our last in-person lecture, I will record what was said here in more precise detail.

Let \mathcal{H} be a Hilbert space with finite or countably infinite orthonormal basis \mathbf{e}_i. To make things simpler, let us consider the case where \dim \mathcal{H}=n < \infty, but everything we say below works (or can be made to work) for arbitrary separable Hilbert spaces. Let \mathbb{C}\Omega be the one-dimensional Hilbert space spanned by a unit vector \Omega which does not belong to \mathcal{H}.

Let d \in \mathbb{N}, and recall that \mathcal{H}^{\otimes d}, the dth tensor power of \mathcal{H}, is the vector space of degree d tensors

\mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d, \quad \mathbf{v}_i \in \mathcal{H},

and linear combinations thereof. This is again a Hilbert space, with the scalar product defined by

\langle \mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d,\mathbf{w}_1 \otimes \dots \otimes \mathbf{w}_d \rangle = \prod\limits_{i=1}^d \langle \mathbf{v}_i,\mathbf{w}_i \rangle.

In particular, an orthonormal basis of \mathcal{H}^{\otimes d} is given by the tensors

\mathbf{e}_i = \mathbf{e}_{i(1)} \otimes \dots \otimes \mathbf{e}_{i(d)}, \quad i \in \mathrm{Fun}(d,n),

where \mathrm{Fun}(d,n) is the set of all functions

\{1,\dots,d\} \longrightarrow \{1,\dots,n\}.

Thus every tensor t \in \mathcal{H}^{\otimes d} is a linear combination

\mathbf{t} = \sum\limits_{i \in \mathrm{Fun}(d,n)} a_i \mathbf{e}_i.

The classical way to think about this is that \mathcal{H}^{\otimes d} is the space of complex-valued functions on d-tuples of vertices of a graph \Gamma with vertex set \{1,\dots,n\}. Alternatively, you can think of \mathcal{H}^{\otimes d} as the d-particle space corresponding to the one-particle space \mathcal{H}. From this point of view, each function

i \colon \{1,\dots,d\} \longrightarrow \{1,\dots,n\}

corresponds to an assignment of d distinguishable particles labeled 1,\dots,d to the vertices of \Gamma. If instead we are dealing with d quantum particles, meaning particles whose location cannot be definitely specified, then we must deal with a unit tensor

\mathbf{u}= \sum\limits_{i \in \mathrm{Fun}(d,n)} p_i \mathbf{e}_i

in \mathcal{H}^{\otimes d}, the idea being that

|p_i|^2 = |\langle \mathbf{e}_i,\mathbf{u}\rangle|^2

represents the probability to find the distinguishable particles 1,\dots,d at vertices i(1),\dots,i(d), respectively. It is convenient to define \mathcal{H}^{\otimes 0} :=\mathbb{C}\Omega; the unit vector \Omega is called the “vacuum vector,” and corresponds to no particles, i.e. a vacuum.

Now suppose that one would like to deal with a variable (but finite) number of particles: the tensor algebra F(\mathcal{H}) over \mathcal{H} is the algebraic structure that does this. It is defined as the direct sum of all tensor powers of \mathcal{H}:

F(\mathcal{H}) = \bigoplus\limits_{d=0}^\infty \mathcal{H}^{\otimes d}.

This is a graded algebra, i.e. we have a natural multiplication of tensors: if \mathbf{t}_1 \in \mathcal{H}^{\otimes d_1} and \mathbf{t}_2 \in \mathcal{H}^{\otimes d_2}, then \mathbf{t}_1 \otimes \mathbf{t}_2 \in \mathcal{H}^{\otimes (d_1+d_2)}. Note that F(\mathcal{H}) is a noncommutative algebra, and it is also infinite-dimensional, even though \mathcal{H} is finite-dimensional.

Problem 11.1: Show that F(\mathcal{H}) is (non-canonically) isomorphic to the free algebra \mathbb{C}\langle x_1,\dots,x_n\rangle of rank n, which consists of polynomials in n noncommutative indeterminates.

Returning to the viewpoint that \mathcal{H}=\bigoplus_{i=1}^n \mathbb{C}\mathbf{e}_i is the one-particle space corresponding to a graph \Gamma with vertex set \{1,\dots,n\}, we encode the edge structure of \Gamma using the adjacency operator A \in \mathrm{End}\mathcal{H} defined by

A\mathbf{e}_i = \sum\limits_{j=1}^n m(i,j) \mathbf{e}_j,

where m(i,j) is the multiplicity of the edge \{i,j\} in \Gamma. As we have discussed, the matrix elements of A^r then count r-step walks in \Gamma, i.e.

\langle A^r\mathbf{e}_i,\mathbf{e}_j \rangle

is the number of r-step walks from vertex i to vertex j in \Gamma. Let us now observe that this naturally extends to any finite number of particles. Namely, for any d \in \mathbb{N}, we have the operator A^{\otimes d} \in \mathrm{End} \mathcal{H}^{\otimes d} with action defined by

A^{\otimes d} \mathbf{e}_i = A\mathbf{e}_{i(1)} \otimes \dots \otimes A\mathbf{e}_{i(d)}, \quad i \in \mathrm{Fun}(d,n).

Proposition 1: For any d \in \mathbb{N}, any i,j \in \mathrm{Fun}(d,n), and any r \in \mathbb{N} \cup \{0\}, the number of ways in which d distinguishable particles on \Gamma can move from positions i(1),\dots,i(d) to positions j(1),\dots,j(d) in r instants of time, where at each instant each particle jumps to an adjacent vertex, is given by the scalar product

\langle (A^r)^{\otimes d} \mathbf{e}_i,\mathbf{e}_j \rangle.

Proof: We have that

\langle (A^r)^{\otimes d} \mathbf{e}_i,\mathbf{e}_j \rangle = \langle A^r\mathbf{e}_{i(1)} \otimes \dots \otimes A^r\mathbf{e}_{i(d)},\mathbf{e}_{j(1)} \otimes \dots \otimes \mathbf{e}_{j(d)} \rangle = \prod\limits_{x=1}^d \langle A^r\mathbf{e}_{i(x)},\mathbf{e}_{j(x)} \rangle.

The xth factor \langle A^r\mathbf{e}_{i(x)},\mathbf{e}_{j(x)} \rangle in the product counts the number of ways in which particle x can travel from vertex i(x) to vertex j(x) in r jumps along the edges of \Gamma. Since the particles are non-interacting, the product of these numbers is the number of possible histories corresponding to the evolution of the particle system from initial state i to state j in time r.

— Q.E.D.
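Concretely, Proposition 1 amounts to the statement that the matrix of (A^r)^{\otimes d} is the d-fold Kronecker power of the matrix of A^r, which is easy to see numerically; in the numpy sketch below the 4-cycle is an arbitrary choice of test graph.

```python
# Proposition 1, numerically: entries of the d-fold Kronecker power of A^r
# are products of d single-particle walk counts.
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])           # adjacency matrix of the 4-cycle

r, d = 3, 2
Ar = np.linalg.matrix_power(A, r)

T = Ar
for _ in range(d - 1):                 # (A^r)^{tensor d} as an iterated np.kron
    T = np.kron(T, Ar)

i, j = (0, 1), (2, 3)                  # initial and final positions of 2 particles
row = i[0] * 4 + i[1]                  # multi-index -> index into the Kronecker power
col = j[0] * 4 + j[1]
print(T[row, col] == Ar[i[0], j[0]] * Ar[i[1], j[1]])    # True
```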

Now suppose that \mathbf{f}_1,\dots,\mathbf{f}_n is an orthonormal basis of \mathcal{H} consisting of eigenvectors of A:

A\mathbf{f}_1 = \lambda_1 \mathbf{f}_1,\dots,A \mathbf{f}_n = \lambda_n\mathbf{f}_n.

Then, the tensors

\mathbf{f}_i = \mathbf{f}_{i(1)} \otimes \dots \otimes \mathbf{f}_{i(d)}, \quad i \in \mathrm{Fun}(d,n)

are an orthonormal basis of \mathcal{H}^{\otimes d} consisting of eigenvectors of A^{\otimes d}:

A^{\otimes d}\mathbf{f}_i = A\mathbf{f}_{i(1)} \otimes \dots\otimes A\mathbf{f}_{i(d)} = \lambda_{i(1)} \dots \lambda_{i(d)} \mathbf{f}_i, \quad i \in \mathrm{Fun}(d,n).

If we agree to use the multi-index notation

\lambda_i := \lambda_{i(1)} \dots \lambda_{i(d)},

then this simply reads

A^{\otimes d} \mathbf{f}_i = \lambda_i \mathbf{f}_i, \quad i \in \mathrm{Fun}(d,n).

We then have the following.

Corollary 1: For any d \in \mathbb{N}, any i,j \in \mathrm{Fun}(d,n), and any r \in \mathbb{N} \cup \{0\}, the number of ways in which d distinguishable particles on \Gamma can move from positions i(1),\dots,i(d) to positions j(1),\dots,j(d) in r instants of time, where at each instant each particle jumps to an adjacent vertex, is given by

W^r(i,j) = \sum\limits_{k \in \mathrm{Fun}(d,n)} \overline{\langle \mathbf{f}_k,\mathbf{e}_i\rangle}\, \lambda_k^r \langle \mathbf{f}_k,\mathbf{e}_j \rangle.
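Here is a direct numerical check of this formula (a sketch with numpy; for a real symmetric adjacency matrix the eigenvectors can be taken real, so the complex conjugation is invisible in this example).

```python
# Check of Corollary 1 on the triangle K_3: the spectral sum over
# multi-indices k agrees with the product of single-particle walk counts.
import numpy as np
from itertools import product

A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
evals, F = np.linalg.eigh(A)           # columns of F: eigenvectors f_1,...,f_n

n, d, r = 3, 2, 4
i, j = (0, 2), (1, 1)

spectral = sum(
    np.prod([F[i[x], k[x]] for x in range(d)])          # <f_k, e_i> (real here)
    * np.prod([evals[k[x]] for x in range(d)]) ** r     # lambda_k^r
    * np.prod([F[j[x], k[x]] for x in range(d)])        # <f_k, e_j>
    for k in product(range(n), repeat=d)
)

Ar = np.linalg.matrix_power(A, r)
print(np.isclose(spectral, Ar[i[0], j[0]] * Ar[i[1], j[1]]))    # True
```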

The point of the above discussion is that, associated to every graph \Gamma is not just a Hilbert space \mathcal{H} encoding the vertices and an operator A \in \mathrm{End}\mathcal{H} encoding the edges, but an infinite-dimensional noncommutative algebra F(\mathcal{H}) encoding \mathbb{C}-valued functions on ordered tuples of vertices of arbitrary finite length, and an operator F(A) = \bigoplus_{d=0}^\infty A^{\otimes d} \in \mathrm{End} F(\mathcal{H}) acting on these functions in a manner determined by the adjacency structure of \Gamma. Moreover, the variable multi-particle construction is not really any different from the one-particle construction: all the formulas are the same up to replacing indices with multi-indices.

The reason that we are discussing the above is that it gives a nice conceptual understanding of why the spectral analysis of Cayley graphs is so much more accessible than the spectral analysis of arbitrary graphs. In the group case the vertex set of \Gamma is not an inert finite set, but rather a finite set \mathrm{G} which comes to us with a group law. Since this is a special case of the above, we can of course still form the corresponding one-particle space \mathcal{H} with orthonormal basis \mathbf{e}_\pi indexed by the group elements \pi \in \mathrm{G} (which may or may not have a natural linear order), but it is more appropriate to denote this space \mathcal{A}, since it is already a unital, associative, and finite-dimensional algebra via bilinear extension of

\mathbf{e}_\pi \mathbf{e}_\sigma := \mathbf{e}_{\pi\sigma}.

We can still of course form the tensor algebra F(\mathcal{A}), but there is a natural map from F(\mathcal{A}) to its degree one component \mathcal{A} given by evaluation:

\mathbf{e}_{i(1)} \otimes \mathbf{e}_{i(2)} \otimes \dots \otimes \mathbf{e}_{i(d)} \to \mathbf{e}_{i(1)i(2)\dots i(d)}, \quad i \in \mathrm{Fun}(\{1,\dots,d\},\mathrm{G}),\ d \in \mathbb{N}.

Moreover, it is not only \mathcal{A} which is naturally a finite-dimensional algebra, but all of its tensor powers \mathcal{A}^{\otimes d}, via

(\mathbf{e}_{i(1)} \otimes \dots \otimes \mathbf{e}_{i(d)})(\mathbf{e}_{j(1)} \otimes \dots \otimes \mathbf{e}_{j(d)}) = \mathbf{e}_{i(1)j(1)} \otimes \dots \otimes \mathbf{e}_{i(d)j(d)},

which in multi-index notation we can succinctly write as \mathbf{e}_i\mathbf{e}_j = \mathbf{e}_{ij}. Moreover, if \mathrm{G} is an abelian group, things get even simpler: all of these finite-dimensional algebras are commutative, and character theory gives a universal method to construct the eigenvalues and eigenvectors of the adjacency operator A and all of its tensor powers.
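For a concrete instance of the abelian case (the group and connection set below are an illustrative choice): on \mathbb{Z}/8\mathbb{Z} with connection set \{+1,-1\}, the Cayley graph is the 8-cycle, and the characters are an eigenbasis of its adjacency operator.

```python
# The abelian case in miniature: the characters chi_k(x) = exp(2*pi*i*k*x/8)
# diagonalize the adjacency operator of the 8-cycle, with eigenvalues
# 2*cos(2*pi*k/8).
import numpy as np

m = 8
A = np.zeros((m, m))
for x in range(m):
    A[x, (x + 1) % m] = A[x, (x - 1) % m] = 1

ok = True
for k in range(m):
    chi = np.exp(2j * np.pi * k * np.arange(m) / m)     # the character chi_k
    ok = ok and np.allclose(A @ chi, 2 * np.cos(2 * np.pi * k / m) * chi)
print(ok)    # True
```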

One demerit of the above is that the tensor algebra F(\mathcal{H}) is “unphysical,” in the sense that if we were dealing with natural particles they would not be distinguishable. Moreover, we would want to classify them by their exchange statistics, i.e. decide between bosons and fermions populating the vertices of our graph \Gamma. This can be accomplished by letting the adjacency operator A act not on the tensor powers of the one-particle space \mathcal{H}, but rather on its symmetric and antisymmetric tensor powers. We will discuss this next time.