Math 202C: Lecture 14

Let us begin with a very brief overview of symmetric function theory. Let X=\{x_1,x_2,x_3,\dots\} be a countable set of commuting indeterminates, referred to as an alphabet. Given r \in \mathbb{N}, let \mathrm{Fun}(r,\mathbb{N}) denote the set of functions

i \colon \{1,\dots,r\} \longrightarrow \mathbb{N}.

Consider the generating functions

e_r = \sum\limits_{\substack{i \in \mathrm{Fun}(r,\mathbb{N}) \\ i \text{ strictly increasing}}} \prod\limits_{q=1}^r x_{i(q)},

h_r = \sum\limits_{\substack{i \in \mathrm{Fun}(r,\mathbb{N}) \\ i \text{ weakly increasing}}} \prod\limits_{q=1}^r x_{i(q)},

p_r = \sum\limits_{\substack{i \in \mathrm{Fun}(r,\mathbb{N}) \\ i \text{ constant}}} \prod\limits_{q=1}^r x_{i(q)}.

These are respectively known as the degree r elementary, complete, and power sum symmetric functions over the alphabet X. Note that they are not actually functions, but rather elements of the algebra \mathbb{C}[[x_1,x_2,x_3,\dots]] of formal power series over X. The “fundamental theorem of symmetric function theory” asserts that these series all generate the same subalgebra of \mathbb{C}[[x_1,x_2,x_3,\dots]].

Theorem (Newton): We have

\mathbb{C}[e_1,e_2,e_3,\dots]=\mathbb{C}[h_1,h_2,h_3,\dots]=\mathbb{C}[p_1,p_2,p_3,\dots].

The subalgebra of \mathbb{C}[[x_1,x_2,x_3,\dots]] generated by any of the above three fundamental families is called the algebra of symmetric functions, and typically denoted \Lambda or \Lambda[X] if the alphabet needs to be specified. This algebra has a lot of internal structure which expresses itself in combinatorially interesting ways. To see some of this, you might try expressing the elementary symmetric functions in terms of the power sums, as Newton did.

Definition 1: A specialization of \Lambda is an algebra homomorphism from \Lambda to a commutative \mathbb{C}-algebra \mathbf{A}.

In practice, specializations typically occur as families of substitutions. Here is a very simple example. For any n \in \mathbb{N}, we have a specialization

\Xi_n \colon \Lambda \to \mathbb{C}

defined by

\Xi_n(f) = f(\underbrace{1,\dots,1}_n,0,0,0,\dots),

i.e. by substituting the numerical value 1 for each of the variables x_1,\dots,x_n (or any other specified set of n variables), and substituting the numerical value 0 for the rest. It is natural to identify the homomorphism \Xi_n with the numerical multiset

\{\{\underbrace{1,\dots,1}_n,0,0,0,\dots\}\}

and write f(\Xi_n) rather than \Xi_n(f) to reflect the substitution.

Problem 1: Explicitly compute e_r(\Xi_n),h_r(\Xi_n),p_r(\Xi_n).
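
As a quick sanity check on Problem 1, note that \Xi_n involves only finitely many nonzero variables, so the three specializations can be evaluated by brute force on a truncated alphabet. The following Python sketch (the truncation n=5 and the range of degrees are arbitrary illustrative choices) computes each specialization directly from its definition as a sum over strictly increasing, weakly increasing, and constant index sequences.

```python
from itertools import combinations, combinations_with_replacement
from math import prod

def e_r(xs, r):
    # elementary: one monomial for each strictly increasing index sequence
    return sum(prod(c) for c in combinations(xs, r))

def h_r(xs, r):
    # complete homogeneous: one monomial for each weakly increasing index sequence
    return sum(prod(c) for c in combinations_with_replacement(xs, r))

def p_r(xs, r):
    # power sum: one monomial for each constant index sequence
    return sum(x ** r for x in xs)

n = 5
xi = [1] * n   # Xi_n: the variables x_1,...,x_n are set to 1, all others to 0
for r in range(1, 7):
    print(r, e_r(xi, r), h_r(xi, r), p_r(xi, r))
```

Comparing the output for a few values of n suggests the closed forms one is asked to find in Problem 1.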

Another interesting numerical specialization arises from taking

\Xi_n = \{\{1,\dots,n,0,0,0,\dots\}\}.

For this numerical substitution we have

p_r(\Xi_n) = 1^r + 2^r + \dots +n^r,

which is a sum people have been thinking about since antiquity. In particular, there are various formulas for this sum, many of which involve the Bernoulli numbers. In order to evaluate e_r(\Xi_n) and h_r(\Xi_n), one may proceed rather differently as follows. Let t be a new variable, and consider the generating functions

E(t) = 1 + \sum\limits_{r=1}^\infty e_r t^r \quad\text{ and }\quad H(t) = 1 + \sum\limits_{r=1}^\infty h_rt^r,

which are elements of \Lambda[[t]], the algebra of formal power series in the single variable t, with coefficients in \Lambda. These generating functions may be explicitly evaluated without too much effort.

Problem 2: Show that

E(t) = \prod\limits_{i=1}^\infty (1+tx_i) \quad\text{ and }\quad H(t) = \prod\limits_{i=1}^\infty \frac{1}{1-tx_i}.

From Problem 2, we immediately get the identities

1+\sum\limits_{r=1}^\infty e_r(\Xi_n)t^r = (1+t)(1+2t) \dots(1+nt)

and

1+\sum\limits_{r=1}^\infty h_r(\Xi_n)t^r = \frac{1}{(1-t)(1-2t) \dots(1-nt)},

from which one can conclude that e_r(\Xi_n) and h_r(\Xi_n) are in fact Stirling numbers (of the first and second kind, respectively).
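
For the skeptical reader, here is a minimal numerical check of this identification, assuming Python 3; the truncation parameters n and R are arbitrary small choices, and the Stirling recurrences used are the standard ones.

```python
from itertools import combinations_with_replacement
from math import prod

n, R = 4, 5   # alphabet {1,...,n}, degrees checked up to R

# coefficients of (1+t)(1+2t)...(1+nt); poly[r] = e_r(1,...,n)
poly = [1]
for m in range(1, n + 1):
    poly = [a + m * b for a, b in zip(poly + [0], [0] + poly)]

# unsigned Stirling numbers of the first kind: c1[m][j] = c1[m-1][j-1] + (m-1)*c1[m-1][j]
c1 = [[0] * (n + 2) for _ in range(n + 2)]
c1[0][0] = 1
for m in range(1, n + 2):
    for j in range(1, m + 1):
        c1[m][j] = c1[m - 1][j - 1] + (m - 1) * c1[m - 1][j]

print([poly[r] for r in range(n + 1)])
print([c1[n + 1][n + 1 - r] for r in range(n + 1)])   # the two rows agree

# Stirling numbers of the second kind: S2[m][j] = j*S2[m-1][j] + S2[m-1][j-1]
S2 = [[0] * (n + 1) for _ in range(n + R + 1)]
S2[0][0] = 1
for m in range(1, n + R + 1):
    for j in range(1, n + 1):
        S2[m][j] = j * S2[m - 1][j] + S2[m - 1][j - 1]

for r in range(R + 1):
    h = sum(prod(mono) for mono in combinations_with_replacement(range(1, n + 1), r))
    print(r, h, S2[n + r][n])   # h_r(1,...,n) against S(n+r, n): the columns agree
```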

We can repackage the content of Lecture 13 as the study of a certain family of specializations

\Lambda \longrightarrow \mathcal{Z}(n),

where \mathcal{Z}(n) is the center of the group algebra \mathbb{C}\mathrm{S}(n). These specializations are defined using the substitution multiset

\Xi_n =\{\{J_1,\dots,J_n,0,0,0,\dots\}\},

where

J_t = \sum\limits_{s=1}^{t-1} (s\ t), \quad 1 \leq t \leq n,

are the Jucys-Murphy elements in \mathbb{C}\mathrm{S}(n). The natural basis of \mathcal{Z}(n) is the conjugacy class basis,

\{C_\mu \colon \mu \vdash n\},

so understanding the JM specialization of \Lambda means being able to compute

[C_\mu]f(\Xi_n),

the coefficient of a given conjugacy class C_\mu \in \mathcal{Z}(n) in the specialization of a given symmetric function f \in \Lambda. What we saw in Lecture 13 is that for certain f, this problem has a combinatorial interpretation as counting certain walks in the Cayley graph of \mathrm{S}(n) as generated by the conjugacy class of transpositions.

Consider first the elementary symmetric functions e_r. What we saw in Lecture 13 is that

[C_\mu]e_r(\Xi_n) = E^r(\mu)

is the number of r-step strictly monotone walks from the identity permutation to any permutation \pi of cycle type \mu. Moreover, we were able to solve the class expansion problem for the elementary symmetric polynomials explicitly:

e_r(\Xi_n) = \sum\limits_{\substack{\mu \vdash n \\ \ell(\mu)=n-r}} C_\mu,

with the sum on the right being exactly the identity-centered sphere L_r of radius r in \mathrm{S}(n), or equivalently the rth level of the Cayley graph. Equivalently, for the generating function

E(t) = 1 + \sum\limits_{r=1}^\infty e_r(\Xi_n)t^r,

we have

E(t) = \sum\limits_{\pi \in \mathrm{S}(n)} t^{|\pi|} \pi,

which is the generating function for permutations by norm.

For the complete symmetric functions h_r, we found that

[C_\mu]h_r(\Xi_n) = H^r(\mu)

is the number of r-step weakly monotone walks from the identity permutation to any permutation \pi of cycle type \mu. We did not give an exact formula for this number, and indeed no such formula is known. However, we can make some progress using another remarkable aspect of the JM elements, namely their interaction with the representation theory of \mathrm{S}(n).
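
Although no closed formula is available, it is easy to tabulate H^r(\mu) for small n by brute force. The sketch below (Python; permutations are 0-indexed tuples, and the composition convention is an implementation choice that does not affect counts by cycle type) enumerates all weakly monotone r-step walks in \mathrm{S}(4) and groups the endpoint counts by cycle type; each type indeed sees a single value, namely H^r(\mu).

```python
from itertools import product
from collections import Counter

n, rmax = 4, 4
idp = tuple(range(n))

def compose(p, q):
    # composition of permutations written as tuples (the order convention
    # does not affect counts by cycle type)
    return tuple(q[p[i]] for i in range(n))

def cycle_type(p):
    seen, lens = set(), []
    for i in range(n):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j); j = p[j]; length += 1
            lens.append(length)
    return tuple(sorted(lens, reverse=True))

# transpositions of S(4), tagged by their larger moved point (0-indexed)
transpositions = []
for t in range(1, n):
    for s in range(t):
        tau = list(idp); tau[s], tau[t] = tau[t], tau[s]
        transpositions.append((t, tuple(tau)))

for r in range(rmax + 1):
    counts = Counter()
    for seq in product(transpositions, repeat=r):
        if all(seq[i][0] <= seq[i + 1][0] for i in range(r - 1)):   # weakly monotone
            pi = idp
            for _, tau in seq:
                pi = compose(pi, tau)
            counts[pi] += 1
    by_type = {}
    for pi, c in counts.items():
        by_type.setdefault(cycle_type(pi), set()).add(c)
    print(r, by_type)   # each cycle type should see a single count H^r(mu)
```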

We know that, for any f \in \Lambda, the specialization f(\Xi_n) is a central element in the group algebra \mathbb{C}\mathrm{S}(n). Thus, by Schur’s lemma, f(\Xi_n) acts in any irreducible representation (\mathbf{V}^\lambda,R^\lambda) of \mathbb{C}\mathrm{S}(n) as a scalar operator, i.e. we have

R^\lambda(f(\Xi_n)) = \omega_f(\lambda) I_{\mathbf{V}^\lambda},

with \omega_f(\lambda) \in \mathbb{C} a scalar and I_{\mathbf{V}^\lambda} \in \mathrm{End} \mathbf{V}^\lambda the identity operator. It turns out that the eigenvalue \omega_f(\lambda) can be expressed in terms of a remarkable numerical specialization of \Lambda.

Definition 2: Given a Young diagram \lambda and a cell \Box \in \lambda, the content c(\Box) of \Box is its column index minus its row index. The content alphabet C(\lambda) of \lambda is the multiset of the contents of its cells,

C(\lambda) = \{\{ c(\Box) \colon \Box \in \lambda\}\}.

Theorem (Jucys): The eigenvalue of f(\Xi_n) acting in \mathbf{V}^\lambda is the specialization of f on the content alphabet of \lambda,

\omega_f(\lambda) = f(C(\lambda)).

In fact, Jucys proved a more general result which implies the above: he showed that for any of the elements J_k, the Young basis \mathbf{e}_T of \mathbf{V}^\lambda is an eigenbasis of the operator R^\lambda(J_k), and

R^\lambda(J_k)\mathbf{e}_T = c_T(k) \mathbf{e}_T,

where c_T(k) is the content of the cell containing k in the standard Young tableau T.
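
Jucys' theorem can be checked numerically in small cases. The following sketch takes f = p_1, so that f(\Xi_n) is the sum of all transpositions, builds the matrix of right multiplication by this element on \mathbb{C}\mathrm{S}(4), and compares its spectrum with the prediction that the eigenvalue p_1(C(\lambda)) = \sum_{\Box \in \lambda} c(\Box) occurs in the regular representation with multiplicity (\dim \mathbf{V}^\lambda)^2. The partition data for n = 4 is hard-coded from the hook length formula, and numpy is assumed.

```python
from itertools import permutations
import numpy as np

n = 4
perms = list(permutations(range(n)))
index = {p: i for i, p in enumerate(perms)}

def compose(p, q):
    return tuple(q[p[i]] for i in range(n))

# the element J_2 + ... + J_n = sum of all transpositions in CS(4)
transpositions = []
for t in range(1, n):
    for s in range(t):
        tau = list(range(n)); tau[s], tau[t] = tau[t], tau[s]
        transpositions.append(tuple(tau))

# matrix of right multiplication by this element in the permutation basis
M = np.zeros((len(perms), len(perms)))
for p in perms:
    for tau in transpositions:
        M[index[compose(p, tau)], index[p]] += 1

eigs = sorted(int(round(x)) for x in np.linalg.eigvalsh(M))
# for n = 4 the partitions (4),(3,1),(2,2),(2,1,1),(1^4) have content sums
# 6, 2, 0, -2, -6 and dimensions 1, 3, 2, 3, 1 (hook length formula), so the
# regular representation should carry these eigenvalues with multiplicities
# 1, 9, 4, 9, 1 respectively
expected = sorted([6] * 1 + [2] * 9 + [0] * 4 + [-2] * 9 + [-6] * 1)
print(eigs == expected)   # True
```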

In particular, Jucys' result enables us to obtain a formula for the generating function enumerating monotone walks from the identity permutation to any permutation \pi \in \mathrm{S}(n) of cycle type \mu \vdash n in terms of the characters of \mathrm{S}(n).

Theorem (Matsumoto-Novak): We have

1+\sum\limits_{r=1}^\infty H^r(\mu) t^r = \sum\limits_{\lambda \vdash n} \frac{\chi^\lambda_\mu}{\prod_{\Box \in \lambda} h(\Box)(1-tc(\Box))},

where h(\Box) denotes the hook length of the cell \Box \in \lambda.

For certain choices of \mu, the sum on the right simplifies considerably and explicit formulas for H^r(\mu) follow. For example, in the case where \mu=(n), we may use the fact that the character of a full cycle vanishes in any representation \mathbf{V}^\lambda for which \lambda is not a hook diagram, and equals \pm 1 when \lambda is a hook. This leads to an explicit formula for the number of monotone walks of arbitrary length from the identity to a full cycle in \mathrm{S}(n). Understanding all of this requires only Math 202B knowledge, and the details are in the paper referenced above.
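
As a finite consistency check of the theorem (with the series truncated, and glossing over the constant-term convention for \mu \neq (1^n)), the sketch below compares brute-force monotone walk counts in \mathrm{S}(3) against the character sum on the right, using hard-coded character values, hook lengths, and contents for the three partitions of 3. Exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

n, R = 3, 4
idp = (0, 1, 2)

def compose(p, q):
    return tuple(q[p[i]] for i in range(n))

trans = []   # transpositions of S(3), tagged by their larger moved point
for t in range(1, n):
    for s in range(t):
        tau = list(idp); tau[s], tau[t] = tau[t], tau[s]
        trans.append((t, tuple(tau)))

def monotone_count(r, target):
    # brute-force count of weakly monotone r-step walks from the identity to target
    total = 0
    for seq in product(trans, repeat=r):
        if all(seq[i][0] <= seq[i + 1][0] for i in range(r - 1)):
            pi = idp
            for _, tau in seq:
                pi = compose(pi, tau)
            total += (pi == target)
    return total

# hard-coded data for the irreducible representations of S(3): character values
# on the classes (1,1,1), (2,1), (3), the product of hook lengths, the contents
irreps = [
    ({(1, 1, 1): 1, (2, 1): 1, (3,): 1}, 6, [0, 1, 2]),      # lambda = (3)
    ({(1, 1, 1): 2, (2, 1): 0, (3,): -1}, 3, [0, 1, -1]),    # lambda = (2,1)
    ({(1, 1, 1): 1, (2, 1): -1, (3,): 1}, 6, [0, -1, -2]),   # lambda = (1,1,1)
]

def content_series(contents):
    # truncated expansion of prod 1/(1 - c t) over the given contents
    series = [Fraction(1)] + [Fraction(0)] * R
    for c in contents:
        geom = [Fraction(c) ** j for j in range(R + 1)]
        series = [sum(series[i] * geom[r - i] for i in range(r + 1)) for r in range(R + 1)]
    return series

reps = {(1, 1, 1): (0, 1, 2), (2, 1): (1, 0, 2), (3,): (1, 2, 0)}
for mu, rep in reps.items():
    for r in range(1, R + 1):
        rhs = sum(Fraction(chi[mu], hooks) * content_series(cs)[r]
                  for chi, hooks, cs in irreps)
        print(mu, r, monotone_count(r, rep), rhs)   # the last two entries agree
```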

Math 202C: Lecture 13

In Lecture 12, we analyzed the top-to-random shuffle of a deck of n cards, which is the same thing as lazy simple random walk on the Cayley graph of the symmetric group \mathrm{S}(n) generated by the star transpositions

(1\ n), \dots, (n-1\ n).

An important role in this analysis was played by the sum of these transpositions,

J_n = (1\ n) + \dots +(n-1\ n),

which is an element of the group algebra \mathbb{C}\mathrm{S}(n). In fact, J_n is one of a special family of elements in \mathbb{C}\mathrm{S}(n) called the Jucys-Murphy elements of \mathbb{C}\mathrm{S}(n). These elements have many remarkable properties, a few of which we will discuss in this lecture — in fact, these simple transposition sums are so rich in structure that they encapsulate the entire representation theory of the symmetric group, and can be taken as the starting point for a very intuitive and streamlined re-imagining of this subject.

The JM elements can be understood in terms of the Cayley graph \Gamma = \mathrm{Cay}(\mathrm{S}(n),\mathrm{T}) as arising from a special edge-labeling of \Gamma called the Biane-Stanley edge labeling. To construct this labeling, mark each edge of the graph corresponding to the transposition (i\ j) with j, the larger of the two numbers interchanged. Thus, each edge of \Gamma is assigned a label from the set \{2,\dots,n\}. Emanating from each vertex \pi \in \mathrm{S}(n) we see one 2-edge, two 3-edges, three 4-edges, etc., as in the picture below.

For a full example, the figure below shows the Cayley graph of the symmetric group \mathrm{S}(4) with the Biane-Stanley edge labeling, with 2-edges drawn in cyan, 3-edges in magenta, and 4-edges in black.

This edge labeling of the Cayley graph of the symmetric group was introduced by Richard Stanley in the context of noncrossing partitions, and you can read about this here. We will use the introduction of more structure on \Gamma to break away from the Markovian paradigm of the evolution of a memoryless particle on \Gamma.

Definition 1: A walk on \Gamma is said to be monotone if the labels of the edges it traverses form a non-decreasing sequence, and strictly monotone if the labels of the edges it traverses form a strictly increasing sequence.

You can think of monotone walks as virtual histories of a particle moving around on \Gamma which learns from experience: once the particle traverses a road of value t, it refuses to traverse roads of lesser value. Strictly monotone walks are histories of an even more fastidious particle that seeks out roads of strictly increasing value as it evolves. If the motion of the particle is random, then monotone walks on \Gamma are histories of a self-interacting random walk, meaning a random walk whose next step at each time instant depends on its history up to that time.

Given permutations \rho,\sigma \in \mathrm{S}(n) and a nonnegative integer r, let H^r(\rho,\sigma) denote the number of monotone r-step walks on \Gamma from \rho to \sigma, and let E^r(\rho,\sigma) denote the number of strictly monotone r-step walks from \rho to \sigma. Obviously, we always have

0 \leq E^r(\rho,\sigma) \leq H^r(\rho,\sigma) \leq W^r(\rho,\sigma),

where we recall that W^r(\rho,\sigma) is the number of unrestricted r-step walks from \rho to \sigma.

Theorem 1: For any permutations \rho,\sigma \in \mathrm{S}(n), we have

E^r(\rho,\sigma) = \begin{cases} 1, \text{ if } r=|\rho^{-1}\sigma| \\ 0, \text{ otherwise}. \end{cases}

That is, there exists a unique strictly monotone walk between any two permutations, and this walk is a geodesic.

Proof: First, observe that the problem is vertex transitive: we have E^r(\rho,\sigma) = E^r(\tau\rho,\tau\sigma) for any \tau \in \mathrm{S}(n). Indeed, if

\sigma = \rho(s_1\ t_1) \dots (s_r\ t_r), \quad t_1< \dots < t_r

is a strictly monotone walk from \rho to \sigma, then

\tau\sigma = \tau\rho (s_1\ t_1) \dots (s_r\ t_r), \quad t_1< \dots < t_r

is a strictly monotone walk from \tau\rho to \tau\sigma, and vice versa. It is thus sufficient to prove that

E^r(\iota,\pi) = \begin{cases} 1, \text{ if } r=|\pi| \\ 0, \text{ otherwise}, \end{cases}

where \iota \in \mathrm{S}(n) is the identity permutation. Here is a proof by picture: draw all strictly monotone walks on \Gamma departing from \iota. What results is a graphical presentation of \mathrm{S}(n) as a starlike tree:

The basic idea of an actual proof is as follows. First, verify that every cyclic permutation admits a unique strictly monotone factorization, the number of factors of which is one less than the length of the cycle, which is the smallest number of factors possible. Now for an arbitrary permutation, factor each of its cycles into a minimal strictly monotone product of transpositions, and observe that there is a unique way to rearrange the transposition factors of all these factorizations so that they are strictly monotone.

— Q.E.D.
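
For small n the unique factorization statement can also be confirmed exhaustively. The sketch below (Python, 0-indexed points) enumerates every strictly monotone product of transpositions in \mathrm{S}(4) and checks that each permutation arises exactly once, from a product of exactly |\pi| factors.

```python
from itertools import product, combinations
from collections import Counter

n = 4
idp = tuple(range(n))

def compose(p, q):
    return tuple(q[p[i]] for i in range(n))

def norm(p):
    # |pi| = n minus the number of cycles of pi
    seen, cycles = set(), 0
    for i in range(n):
        if i not in seen:
            cycles += 1
            j = i
            while j not in seen:
                seen.add(j); j = p[j]
    return n - cycles

found, lengths = Counter(), {}
for r in range(n):                                  # strictly monotone forces r <= n-1
    for labels in combinations(range(1, n), r):     # strictly increasing labels t_1 < ... < t_r
        for ss in product(*[range(t) for t in labels]):   # the smaller entries s_i < t_i
            pi = idp
            for s, t in zip(ss, labels):
                tau = list(idp); tau[s], tau[t] = tau[t], tau[s]
                pi = compose(pi, tuple(tau))
            found[pi] += 1
            lengths.setdefault(pi, set()).add(r)

print(len(found) == 24 and all(c == 1 for c in found.values()))   # unique factorization
print(all(lengths[p] == {norm(p)} for p in found))                # of length |pi|
```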

A group-theoretic way to conceptualize Theorem 1 is as a unique factorization theorem for the symmetric group: every permutation \pi \in \mathrm{S}(n) has a unique factorization of the form

\pi = (s_1\ t_1)(s_2\ t_2) \dots (s_k\ t_k), \quad s_i<t_i,

such that t_1< t_2 < \dots < t_k, and moreover k=|\pi|. In terms of the group algebra \mathbb{C}\mathrm{S}(n), we may rephrase Theorem 1 as follows: we have

L_r = \sum\limits_{\substack{2 \leq t_1 < \dots < t_r \leq n\\ s_1<t_1, \dots, s_r<t_r}} (s_1\ t_1) \dots (s_r\ t_r), \quad 0 \leq r \leq n-1

where

L_r = \sum\limits_{\substack{ \pi \in \mathrm{S}(n) \\ |\pi|=r}} \pi

is the rth level of the Cayley graph, or in other words the identity-centered sphere of radius r in the symmetric group, viewed as an element of the group algebra \mathbb{C}\mathrm{S}(n). Equivalently, this can be written

L_r = \sum\limits_{2 \leq t_1 < \dots < t_r \leq n} \left( \sum\limits_{s_1=1}^{t_1-1} (s_1\ t_1) \right) \dots \left( \sum\limits_{s_r=1}^{t_r-1} (s_r\ t_r) \right).

Definition 2: The Jucys-Murphy elements in \mathbb{C}\mathrm{S}(n) are the transposition sums

J_2= (1\ 2) \\ J_3 =(1\ 3) + (2\ 3)\\ \vdots \\ J_n=(1\ n) + (2\ n) + \dots + (n-1\ n).

For notational convenience, we set J_1:=\mathbf{0}, the zero element of the group algebra \mathbb{C}\mathrm{S}(n).

Problem 1: Prove that the JM elements commute with one another.
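
A numerical check (not a proof) of Problem 1 is easy to set up: represent elements of \mathbb{C}\mathrm{S}(4) as dictionaries from permutations to coefficients and multiply them out. The indexing below is 0-based, so jm(t) corresponds to J_{t+1} in the notation of the lecture.

```python
n = 4
idp = tuple(range(n))

def compose(p, q):
    return tuple(q[p[i]] for i in range(n))

def alg_mul(a, b):
    # product in the group algebra CS(n); elements stored as {permutation: coefficient}
    out = {}
    for p, x in a.items():
        for q, y in b.items():
            pq = compose(p, q)
            out[pq] = out.get(pq, 0) + x * y
    return out

def jm(t):
    # with 0-indexed points, jm(t) is the element J_{t+1} of the lecture; jm(0) = 0
    elem = {}
    for s in range(t):
        tau = list(idp); tau[s], tau[t] = tau[t], tau[s]
        elem[tuple(tau)] = 1
    return elem

J = [jm(t) for t in range(n)]
print(all(alg_mul(J[a], J[b]) == alg_mul(J[b], J[a])
          for a in range(n) for b in range(n)))   # True: the JM elements commute
```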

We can now connect the group algebra formulation of Theorem 1 to the theory of symmetric polynomials.

Let the symmetric group \mathrm{S}(n) act on the algebra of \mathbb{C}[x_1,\dots,x_n] of polynomials in n variables by permuting variables:

\sigma . f(x_1,\dots,x_n) = f(x_{\sigma(1)},\dots,x_{\sigma(n)}).

The algebra of symmetric polynomials is the corresponding space of invariants,

\mathbb{C}[x_1,\dots,x_n]^{\mathrm{S}(n)} = \{ f \in \mathbb{C}[x_1,\dots,x_n] \colon \sigma . f =f\}.

A non-obvious fact is that this algebra of invariants is in fact a polynomial algebra.

Theorem (Newton): We have

\mathbb{C}[x_1,\dots,x_n]^{\mathrm{S}(n)} = \mathbb{C}[e_1(x_1,\dots,x_n),\dots,e_n(x_1,\dots,x_n)],

where

e_r(x_1,\dots,x_n) = \sum\limits_{1 \leq i_1 < \dots < i_r \leq n} x_{i_1} \dots x_{i_r}, \quad 1 \leq r \leq n.

Newton’s theorem, which is often called the fundamental theorem of symmetric function theory, says that every f \in  \mathbb{C}[x_1,\dots,x_n]^{\mathrm{S}(n)} is a polynomial in the elementary symmetric polynomials e_1,\dots,e_n.

Theorem 2: The substitution rule

f(x_1,\dots,x_n) \mapsto f(J_1,\dots,J_n)

defines a homomorphism

\mathbb{C}[x_1,\dots,x_n]^{\mathrm{S}(n)} \longrightarrow \mathcal{Z}(n),

from the algebra of symmetric polynomials in n variables to the center of \mathbb{C}\mathrm{S}(n).

Proof: Theorem 1 says that

L_r = e_r(J_1,\dots,J_n).

Thus, by Newton’s theorem, any symmetric polynomial f(J_1,\dots,J_n) evaluated on the JM elements will be a polynomial in L_0,\dots,L_{n-1}. It thus suffices to show that each level of the Cayley graph (identified with the formal sum of its constituents) is an element of \mathcal{Z}(n). But this is clear: the center \mathcal{Z}(n) is spanned by the conjugacy classes C_\mu of \mathrm{S}(n) (each of which is identified with the formal sum of its constituents), and

L_r = \sum\limits_{\substack{\mu \vdash n \\ \ell(\mu)=n-r}} C_\mu.

— Q.E.D.

Theorem 2 has rather interesting consequences for monotone walks on the Cayley graph. Note that, if we consider all walks on the Cayley graph, then the number W^r(\iota,\pi) of r-step walks from the identity to a given permutation \pi depends only on the conjugacy class of \pi. Indeed, if \pi' is a conjugate of \pi, \pi' = \sigma\pi\sigma^{-1}, then every factorization

\pi = (s_1\ t_1) \dots (s_r\ t_r)

of \pi corresponds to the factorization

\pi' = (s_1'\ t_1') \dots (s_r'\ t_r')

in which

(s_i'\ t_i') = \sigma (s_i\ t_i) \sigma^{-1},

and this correspondence is clearly bijective. This simple argument does not show that the number of monotone factorizations

\pi = (s_1\ t_1) \dots (s_r\ t_r), \quad t_1 \leq \dots \leq t_r

is equal to the number of monotone factorizations of \pi', because conjugating each of the factors will not necessarily preserve the monotonicity condition. However, the following more sophisticated argument shows that indeed H^r(\iota,\pi)=H^r(\iota,\pi').

For each r \in \mathbb{N}, the degree r complete symmetric polynomial is defined by

h_r(x_1,\dots,x_n) = \sum\limits_{1 \leq t_1 \leq \dots \leq t_r } x_{t_1} \dots x_{t_r}.

We have

h_r(J_1,\dots,J_n) = \sum\limits_{1 \leq t_1 \leq \dots \leq t_r \leq n} J_{t_1} \dots J_{t_r} \\= \sum\limits_{1 \leq t_1 \leq \dots \leq t_r \leq n} \left( \sum\limits_{s_1=1}^{t_1-1} (s_1\ t_1) \right) \dots \left( \sum\limits_{s_r=1}^{t_r-1} (s_r\ t_r) \right) \\=\sum\limits_{\substack{ 1 \leq t_1 \leq \dots \leq t_r \leq n \\ s_i < t_i }} (s_1\ t_1) \dots (s_r\ t_r),

which is the sum of all monotone products of r transpositions. Evaluating each term in the sum and collecting like coefficients, the coefficient of \pi is the number of monotone factorizations of \pi, and the coefficient of \pi' is the number of monotone factorizations of \pi'. But, since this sum is a linear combination of conjugacy classes by Theorem 2, and since \pi and \pi' belong to the same conjugacy class, these coefficients are equal.
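
The following sketch carries out this computation for n = 4 and r = 2: it expands h_2(J_1,\dots,J_4) in the permutation basis and groups the coefficients by cycle type, and each class indeed carries a single coefficient. (Python; permutations are 0-indexed tuples. Odd classes simply do not appear, since a product of two transpositions is even.)

```python
n = 4
idp = tuple(range(n))

def compose(p, q):
    return tuple(q[p[i]] for i in range(n))

def alg_mul(a, b):
    # product in CS(n); elements stored as {permutation: coefficient}
    out = {}
    for p, x in a.items():
        for q, y in b.items():
            pq = compose(p, q); out[pq] = out.get(pq, 0) + x * y
    return out

def cycle_type(p):
    seen, lens = set(), []
    for i in range(n):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j); j = p[j]; length += 1
            lens.append(length)
    return tuple(sorted(lens, reverse=True))

def jm(t):  # 0-indexed: jm(t) is J_{t+1} of the lecture; jm(0) = 0
    out = {}
    for s in range(t):
        tau = list(idp); tau[s], tau[t] = tau[t], tau[s]
        out[tuple(tau)] = 1
    return out

# h_2(J_1,...,J_n) = sum over t1 <= t2 of J_{t1} J_{t2}
h2 = {}
for t1 in range(n):
    for t2 in range(t1, n):
        for p, x in alg_mul(jm(t1), jm(t2)).items():
            h2[p] = h2.get(p, 0) + x

# group the coefficients by cycle type: each class should see a single value
by_type = {}
for p, x in h2.items():
    by_type.setdefault(cycle_type(p), set()).add(x)
print(by_type)
```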

In the next lecture, we will combine the spectral theory of the JM elements with their combinatorial properties to give eigenvalue formulas for the enumeration of monotone walks on the Cayley graph.

Math 202C: Lecture 12

Since the representation theory of the symmetric group is likely fresh in your mind thanks to yesterday’s exam, this is a good time to show you a very nice probabilistic application of this theory to card shuffling. This post will discuss the “top-to-random” shuffle, where the cards 1,\dots,n are arranged in a single row (or a single stack), and a randomly chosen card is repeatedly swapped with the rightmost (or top) card. The algebraic model for this procedure is the Cayley graph \Gamma(n) of the symmetric group as generated by the so-called “star transpositions”

\tau_1=(1\ n), \tau_2=(2\ n),\dots, \tau_{n-1} = (n-1\ n).

Problem 12.1: Prove that the above transpositions do in fact generate \mathrm{S}(n).

These are called star transpositions because, if one forms the graph on \{1,\dots,n\} such that \{i,j\} is an edge if and only if one of the above transpositions swaps i and j, then this graph is a star graph. The top-to-random shuffle is then the same thing as the lazy simple random walk on the Cayley graph \Gamma(n).
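
As a sanity check related to Problem 12.1, one can verify by breadth-first search that the star transpositions generate \mathrm{S}(n) for a small value of n; the sketch below does this for n = 5 (an arbitrary choice), using 0-indexed permutation tuples.

```python
from math import factorial

n = 5
idp = tuple(range(n))

def compose(p, q):
    return tuple(q[p[i]] for i in range(n))

# the star transpositions (1 n), ..., (n-1 n), written with 0-indexed points
stars = []
for i in range(n - 1):
    tau = list(idp); tau[i], tau[n - 1] = tau[n - 1], tau[i]
    stars.append(tuple(tau))

# breadth-first search from the identity in the Cayley graph
seen, frontier = {idp}, [idp]
while frontier:
    nxt = []
    for p in frontier:
        for tau in stars:
            q = compose(p, tau)
            if q not in seen:
                seen.add(q); nxt.append(q)
    frontier = nxt

print(len(seen), factorial(n))   # 120 120: the star transpositions generate S(5)
```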

The paper attached below gives a detailed analysis of the top-to-random shuffle using the representation theory of \mathrm{S}(n), and gives a precise meaning to the statement that n \log n such shuffles are required in order to randomize the deck. The argument is pretty much all representation theory — only some very basic probabilistic reasoning is used towards the end.

Math 202C: Lecture 11

If I were to start this course over again, I would not have limited the opening act to a discussion of the relationship between linear algebra and graph theory, but would have begun with a more general discussion of the relations between multilinear algebra and graph theory. Since I sketched this briefly in our last in-person lecture, I will record what was said here in more precise detail.

Let \mathcal{H} be a Hilbert space with finite or countably infinite orthonormal basis \mathbf{e}_i. To make things simpler, let us consider the case where \dim \mathcal{H}=n < \infty, but everything we say below works (or can be made to work) for arbitrary separable Hilbert spaces. Let \mathbb{C}\Omega be the one-dimensional Hilbert space spanned by a unit vector \Omega which does not belong to \mathcal{H}.

Let d \in \mathbb{N}, and recall that \mathcal{H}^{\otimes d}, the dth tensor power of \mathcal{H}, is the vector space of degree d tensors

\mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d, \quad \mathbf{v}_i \in \mathcal{H},

and linear combinations thereof. This is again a Hilbert space, with the scalar product defined by

\langle \mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d,\mathbf{w}_1 \otimes \dots \otimes \mathbf{w}_d \rangle = \prod\limits_{i=1}^d \langle \mathbf{v}_i,\mathbf{w}_i \rangle.

In particular, an orthonormal basis of \mathcal{H}^{\otimes d} is given by the tensors

\mathbf{e}_i = \mathbf{e}_{i(1)} \otimes \dots \otimes \mathbf{e}_{i(d)}, \quad i \in \mathrm{Fun}(d,n),

where \mathrm{Fun}(d,n) is the set of all functions

\{1,\dots,d\} \longrightarrow \{1,\dots,n\}.

Thus every tensor t \in \mathcal{H}^{\otimes d} is a linear combination

\mathbf{t} = \sum\limits_{i \in \mathrm{Fun}(d,n)} a_i \mathbf{e}_i.

The classical way to think about this is that \mathcal{H}^{\otimes d} is the space of complex-valued functions on d-tuples of vertices of a graph \Gamma with vertex set \{1,\dots,n\}. Alternatively, you can think of \mathcal{H}^{\otimes d} as the d-particle space corresponding to the one-particle space \mathcal{H}. From this point of view, each function

i \colon \{1,\dots,d\} \longrightarrow \{1,\dots,n\}

corresponds to an assignment of d distinguishable particles labeled 1,\dots,d to the vertices of \Gamma. If instead we are dealing with d quantum particles, meaning particles whose location cannot be definitely specified, then we must deal with a unit tensor

\mathbf{u}= \sum\limits_{i \in \mathrm{Fun}(d,n)} p_i \mathbf{e}_i

in \mathcal{H}^{\otimes d}, the idea being that

|p_i|^2 = |\langle \mathbf{e}_i,\mathbf{u}\rangle|^2

represents the probability to find the distinguishable particles 1,\dots,d at vertices i(1),\dots,i(d), respectively. It is convenient to define \mathcal{H}^{\otimes 0} :=\mathbb{C}\Omega, where \Omega is a unit vector not in \mathcal{H} called the “vacuum vector,” which corresponds to no particles, i.e. a vacuum.

Now suppose that one would like to deal with a variable (but finite) number of particles: the tensor algebra F(\mathcal{H}) over \mathcal{H} is the algebraic structure that does this. It is defined as the direct sum of all tensor powers of \mathcal{H}:

F(\mathcal{H}) = \bigoplus\limits_{d=0}^\infty \mathcal{H}^{\otimes d}.

This is a graded algebra, i.e. we have a natural multiplication of tensors: if \mathbf{t}_1 \in \mathcal{H}^{\otimes d_1} and \mathbf{t}_2 \in \mathcal{H}^{\otimes d_2}, then \mathbf{t_1} \otimes \mathbf{t}_2 \in \mathcal{H}^{\otimes (d_1+d_2)}. Note that F(\mathcal{H}) is a noncommutative algebra, and it is also infinite-dimensional, even though \mathcal{H} is finite-dimensional.

Problem 11.1: Show that F(\mathcal{H}) is (non-canonically) isomorphic to the free algebra \mathbb{C}\langle x_1,\dots,x_n\rangle of rank n, which consists of polynomials in n noncommutative indeterminates.

Returning to the viewpoint that \mathcal{H}=\bigoplus_{i=1}^n \mathbb{C}\mathbf{e}_i is the one-particle space corresponding to a graph \Gamma with vertex set \{1,\dots,n\}, we encode the edge structure of \Gamma using the adjacency operator A \in \mathrm{End}\mathcal{H} defined by

A\mathbf{e}_i = \sum\limits_{j=1}^n m(i,j) \mathbf{e}_j,

where m(i,j) is the multiplicity of the edge \{i,j\} in \Gamma. As we have discussed, the matrix elements of A^r then count r-step walks in \Gamma, i.e.

\langle A^r\mathbf{e}_i,\mathbf{e}_j \rangle

is the number of r-step walks from vertex i to vertex j in \Gamma. Let us now observe that this naturally extends to any finite number of particles. Namely, for any d \in \mathbb{N}, we have the operator A^{\otimes d} \in \mathrm{End} \mathcal{H}^{\otimes d} with action defined by

A^{\otimes d} \mathbf{e}_i = A\mathbf{e}_{i(1)} \otimes \dots \otimes A\mathbf{e}_{i(d)}, \quad i \in \mathrm{Fun}(d,n).

Proposition 1: For any d \in \mathbb{N}, any i,j \in \mathrm{Fun}(d,n), and any r \in \mathbb{N} \cup \{0\}, the number of ways in which d distinguishable particles on \Gamma can move from positions i(1),\dots,i(d) to positions j(1),\dots,j(d) in r instants of time, where at each instant each particle jumps to an adjacent vertex, is given by the scalar product

\langle (A^r)^{\otimes d} \mathbf{e}_i,\mathbf{e}_j \rangle.

Proof: We have that

\langle (A^r)^{\otimes d} \mathbf{e}_i,\mathbf{e}_j \rangle = \langle A^r\mathbf{e}_{i(1)} \otimes \dots \otimes A^r\mathbf{e}_{i(d)},\mathbf{e}_{j(1)} \otimes \dots \otimes \mathbf{e}_{j(d)} \rangle = \prod\limits_{x=1}^d \langle A^r\mathbf{e}_{i(x)},\mathbf{e}_{j(x)} \rangle.

The xth factor \langle A^r\mathbf{e}_{i(x)},\mathbf{e}_{j(x)} \rangle in the product counts the number of ways in which particle x can travel from vertex i(x) to vertex j(x) in r jumps along the edges of \Gamma. Since the particles are non-interacting, the product of these numbers is the number of possible histories corresponding to the evolution of the particle system from initial state i to state j in time r.

— Q.E.D.
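
Here is a small numerical illustration of Proposition 1, for two distinguishable particles on a 3-vertex path graph (an arbitrary toy example, with numpy assumed): the relevant entry of (A^r)^{\otimes 2}, realized as a Kronecker product, agrees with the product of brute-force one-particle walk counts.

```python
import numpy as np
from itertools import product

# adjacency matrix of the path graph on vertices 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
nverts, r = 3, 4          # two distinguishable particles, four time steps

def walks(i, j, steps):
    # brute-force count of walks of the given length from i to j
    count = 0
    for path in product(range(nverts), repeat=steps - 1):
        pts = (i,) + path + (j,)
        count += all(A[pts[k], pts[k + 1]] for k in range(steps))
    return count

Ar = np.linalg.matrix_power(A, r)
T = np.kron(Ar, Ar)       # the matrix of (A^r)^{tensor 2} in the basis e_{i(1)} (x) e_{i(2)}

i, j = (0, 1), (2, 1)     # initial and final positions of the two particles
print(T[i[0] * nverts + i[1], j[0] * nverts + j[1]])
print(walks(i[0], j[0], r) * walks(i[1], j[1], r))    # the same number of two-particle histories
```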

Now suppose that \mathbf{f}_1,\dots,\mathbf{f}_n is an orthonormal basis of \mathcal{H} consisting of eigenvectors of A:

A\mathbf{f}_1 = \lambda_1 \mathbf{f}_1,\dots,A \mathbf{f}_n = \lambda_n\mathbf{f}_n.

Then, the tensors

\mathbf{f}_i = \mathbf{f}_{i(1)} \otimes \dots \otimes \mathbf{f}_{i(d)}, \quad i \in \mathrm{Fun}(d,n)

are an orthonormal basis of \mathcal{H}^{\otimes d} consisting of eigenvectors of A^{\otimes d}:

A^{\otimes d}\mathbf{f}_i = A\mathbf{f}_{i(1)} \otimes \dots\otimes A\mathbf{f}_{i(d)} = \lambda_{i(1)} \dots \lambda_{i(d)} \mathbf{f}_i, \quad i \in \mathrm{Fun}(d,n).

If we agree to use the multi-index notation

\lambda_i := \lambda_{i(1)} \dots \lambda_{i(d)},

then this simply reads

A^{\otimes d} \mathbf{f}_i = \lambda_i \mathbf{f}_i, \quad i \in \mathrm{Fun}(d,n).

We then have the following.

Corollary 1: For any d \in \mathbb{N}, any i,j \in \mathrm{Fun}(d,n), and any r \in \mathbb{N} \cup \{0\}, the number of ways in which d distinguishable particles on \Gamma can move from positions i(1),\dots,i(d) to positions j(1),\dots,j(d) in r instants of time, where at each instant each particle jumps to an adjacent vertex, is given by

W^r(i,j) = \sum\limits_{k \in \mathrm{Fun}(d,n)} \overline{\langle \mathbf{f}_k,\mathbf{e}_i\rangle} \lambda_k^r \langle \mathbf{f}_k,\mathbf{e}_j \rangle.

The point of the above discussion is that, associated to every graph \Gamma is not just a Hilbert space \mathcal{H} encoding the vertices and an operator A \in \mathrm{End}\mathcal{H} encoding the edges, but an infinite dimensional noncommutative algebra F(\mathcal{H}) encoding \mathbb{C}-valued functions on ordered tuples of vertices of arbitrary finite length, and an operator F(A) \in \mathrm{End} F(\mathcal{H}) acting on these functions in a manner determined by the adjacency structure of \Gamma. Moreover, the variable multi-particle construction is not really any different from the one-particle construction: all the formulas are the same up to replacing indices with multi-indices.

The reason that we are discussing the above is that it gives a nice conceptual understanding of why the spectral analysis of Cayley graphs is so much more accessible than the spectral analysis of arbitrary graphs. In the group case the vertex set of \Gamma is not an inert finite set, but rather a finite set \mathrm{G} which comes to us with a group law. Since this is a special case of the above, we can of course still form the corresponding one-particle space \mathcal{H} with orthonormal basis \mathbf{e}_\pi indexed by the group elements \pi \in \mathrm{G} (which may or may not have a natural linear order), but it is more appropriate to denote this space \mathcal{A} since it is already a unital, associative, and finite-dimensional algebra via bilinear extension of

\mathbf{e}_\pi \mathbf{e}_\sigma := \mathbf{e}_{\pi\sigma}.

We can still of course form the tensor algebra F(\mathcal{A}), but there is a natural map from F(\mathcal{A}) to its degree one component \mathcal{A} given by evaluation:

\mathbf{e}_{i(1)} \otimes \mathbf{e}_{i(2)} \otimes \dots \otimes \mathbf{e}_{i(d)} \to \mathbf{e}_{i(1)i(2)\dots i(d)}, \quad i \in \mathrm{Fun}(\{1,\dots,d\},\mathrm{G}),\ d \in \mathbb{N}.

Moreover, it is not only \mathcal{A} which is naturally a finite-dimensional algebra, but all of its tensor powers \mathcal{A}^{\otimes d}, via

(\mathbf{e}_{i(1)} \otimes \dots \otimes \mathbf{e}_{i(d)})(\mathbf{e}_{j(1)} \otimes \dots \otimes \mathbf{e}_{j(d)}) = \mathbf{e}_{i(1)j(1)} \otimes \dots \otimes \mathbf{e}_{i(d)j(d)},

which in multi-index notation we can succinctly write as \mathbf{e}_i\mathbf{e}_j = \mathbf{e}_{ij}. Moreover, if \mathrm{G} is an abelian group, things get even simpler and all these finite-dimensional algebras are commutative, and character theory gives a universal method to construct the eigenvalues and eigenvectors of the adjacency operator A and all of its tensor powers.

One demerit of the above is that the tensor algebra F(\mathcal{H}) is “unphysical,” in the sense that if we were dealing with natural particles they would not be distinguishable. Moreover, we would want to classify them as either bosons or fermions populating the vertices of our graph \Gamma. This can be accomplished by letting the adjacency operator A act not on the tensor powers of the one-particle space \mathcal{H}, but rather on its symmetric and antisymmetric tensor powers. We will discuss this next time.

Math 202C: Lecture 10

In this Lecture, we analyze simple random walk on the group \mathrm{B}(k) of permutations generated by the disjoint transpositions

\tau_1=(1\ 2), \dots, \tau_k=(2k-1\ 2k).

The corresponding Cayley graph \Gamma=\mathrm{Cay}(\mathrm{B}(k),\{\tau_1,\dots,\tau_k\}) has vertex set \mathrm{B}(k), with \{\rho,\sigma\} an edge if and only if \sigma=\rho\tau_i for some 1 \leq i \leq k.

Simple random walk on \Gamma has an interesting equivalent formulation known as the Ehrenfest diffusion model, which is used in thermodynamics. The Ehrenfest model consists of two urns, U_1 and U_2, and k balls b_1,\dots,b_k. At time zero, all balls are in urn U_2. At each instant of discrete time thereafter, a ball is chosen uniformly at random, and moved to the other urn.

Problem 10.1: Explain the equivalence between Ehrenfest diffusion and simple random walk on \mathrm{B}(k).

Simple random walk on \Gamma begins with a single particle positioned at the identity permutation \iota. At each tick of the clock, the particle makes a random jump to an adjacent permutation, with equal probability of jumping to any neighboring vertex on the k-regular graph \Gamma. However, because \Gamma is bipartite, the distribution of this random walk does not converge.

Problem 10.2: Prove that \Gamma is a bipartite graph.

To fix the bipartiteness issue, we replace \Gamma with a new graph \overline{\Gamma}, which differs from \Gamma only by the addition of exactly one loop at each vertex, resulting in a (k+1)-regular graph with the same vertex set. By the Markov limit theorem, simple random walk on \overline{\Gamma} converges to uniform measure on \mathrm{B}(k). Note that one can equivalently think of simple random walk on \overline{\Gamma} as a “lazy” random walk on \Gamma, where at each time instant the particle either stays put or jumps to a neighboring vertex, with all events being equally likely.

We proceed to analysis of simple random walk on \overline{\Gamma}. Let \mathbb{C}\mathrm{B}(k) be the group algebra of \mathrm{B}(k), i.e. the Hilbert space with orthonormal basis

\{\pi \colon \pi \in \mathrm{B}(k)\}

and multiplication given by bilinear extension of the group law

\rho \sigma = \rho\sigma.

The probability operator P \in \mathrm{End} \mathbb{C}\mathrm{B}(k) for lazy simple random walk on \overline{\Gamma} is then implemented as multiplication by

\frac{1}{k+1}\iota + \frac{1}{k+1}\tau_1 + \dots + \frac{1}{k+1}\tau_k,

where \iota \in \mathrm{B}(k) is the identity permutation. In other words, the distribution of the particle at time r is given by the vector P^r\iota. We would like to compare this to the vector

u=\sum\limits_{\pi \in \mathrm{B}(k)} \frac{1}{2^k} \pi,

which corresponds to uniform measure on \mathrm{B}(k). More precisely, we would like to estimate the quantity

\|P^r\iota - u\|_1 = \sum\limits_{\pi \in \mathrm{B}(k)} \left| \langle P^r \iota,\pi\rangle -\frac{1}{2^k}\right|,

which, up to a factor of 1/2, coincides with the total variation distance between the distribution of the particle at time r and the uniform distribution.

We will get an upper bound on the above total variation distance using the eigenvalues and eigenvectors of P. We have ample information concerning the eigenvalues and eigenvectors of the operator P from Lecture 9, where we analyzed the operator A of multiplication by

\tau_1 + \dots + \tau_k,

which is the adjacency operator of \Gamma. We know that the eigenvalues of A are the numbers

k-2j, \quad 0 \leq j \leq k,

and that E_j, the eigenspace of k-2j, has dimension {k \choose j} (an orthogonal basis of E_j is given by the characters \chi_\pi of \mathrm{B}(k) indexed by permutations with |\pi|=j.) Now, it is immediate to see that

P = \frac{1}{k+1}(I + A),

where I \in \mathrm{End}\mathbb{C}\mathrm{B}(k) is the identity operator, so that P has the same eigenspaces as A, and more precisely that P acts in E_j as multiplication by

\lambda_j=1-\frac{2j}{k+1}.

In particular, the largest eigenvalue \lambda_0 of P is 1 and it is a simple eigenvalue, which is consistent with Perron-Frobenius, and moreover \lambda_0 is the eigenvalue of largest absolute value, since the smallest eigenvalue is

\lambda_k= 1-\frac{2k}{k+1} > 1- \frac{2k}{k} = -1,

which is also consistent with Perron-Frobenius. We will now plug this information into the famous Upper Bound Lemma of Diaconis-Shahshahani to get an estimate on the total variation distance between the distribution of lazy simple random walk on \Gamma at time r, and the uniform distribution on the vertices of \Gamma. We skip the proof, and comment only that it follows from combining character expansion and the Cauchy-Schwarz inequality. A proof can be found in the “Harmonic Analysis on Finite Groups” reference text.

Theorem (Upper Bound Lemma): We have

\|P^r\iota - u \|_1^2 \leq \sum\limits_{j=1}^k \lambda_j^{2r} \dim E_j.

with \lambda_j the eigenvalue by which P acts on the eigenspace E_j; the unique eigenvalue equal to 1, namely \lambda_0, is omitted from the sum.

Let us apply the Upper Bound Lemma. Right off the bat, it tells us that

\|P^r\iota - u\|_1^2 \leq \sum\limits_{j=1}^k {k \choose j} \left(1-\frac{2j}{k+1}\right)^{2r},

so we just have to estimate this sum of explicit quantities. Observing that

\left(1-\frac{2j}{k+1}\right)^{2r} = \left(1-\frac{2(k+1-j)}{k+1}\right)^{2r},

we have the estimate

\sum\limits_{j=1}^k {k \choose j} \left(1-\frac{2j}{k+1}\right)^{2r} \leq  2\sum\limits_{j=1}^{\lfloor \frac{k+1}{2} \rfloor} {k \choose j} \left(1-\frac{2j}{k+1}\right)^{2r},

and the factor of two which appears cancels with the missing factor of 1/2 which distinguishes the 1-norm from the actual total variation distance. Thus we get that \mathrm{d}_r, the total variation distance between the distribution of lazy simple random walk on \Gamma at time r and the uniform distribution, satisfies

\mathrm{d}_r \leq \sum\limits_{j=1}^{\lfloor \frac{k+1}{2} \rfloor} {k \choose j} \left(1-\frac{2j}{k+1}\right)^{2r}.

To estimate the right hand side, first observe that

{k \choose j} = \frac{k(k-1) \dots (k-j+1)}{j!} \leq \frac{k^j}{j!},

so that

\mathrm{d}_r \leq \sum\limits_{j=1}^{\lfloor \frac{k+1}{2} \rfloor} \frac{k^j}{j!}\left(1-\frac{2j}{k+1}\right)^{2r}.

Using that

e^{-x} = 1-x+\frac{x^2}{2} - \dots \geq 1-x

for x \geq 0, we have

\left(1-\frac{2j}{k+1}\right)^{2r} \leq e^{-\frac{4jr}{k+1}},

so that we have arrived at the upper bound

\mathrm{d}_r \leq \sum\limits_{j=1}^{\lfloor \frac{k+1}{2} \rfloor} \frac{k^j}{j!}e^{-\frac{4jr}{k+1}}.

Now observe that if r is large enough so that the exponentially small factor e^{-\frac{4jr}{k+1}} defeats the polynomially large factor k^j even for small values of j, then the distance \mathrm{d}_r will be very small. More precisely, if

r \geq \frac{1}{4}(k+1)(\log k + c)

for some constant c > 0, then

e^{-\frac{4jr}{k+1}} \leq k^{-j}e^{-cj},

and we have

\mathrm{d}_r \leq \sum\limits_{j=1}^{\lfloor \frac{k+1}{2} \rfloor} \frac{1}{j!} e^{-cj} \leq \sum\limits_{j=1}^{\infty} \frac{1}{j!} e^{-cj} = e^{e^{-c}}-1,

a number exponentially close to 0. Thus as soon as the number of steps r in lazy simple random walk on \Gamma exceeds the threshold

r=\frac{1}{4}(k+1)\log k,

the distribution of the walk becomes exponentially close to its limiting distribution. On the other hand, if instead

r \leq \frac{1}{4}(k+1)(\log k - c)

is below this threshold, then the total variation distance between the distribution of the walk and its limiting distribution is still exponentially close to its initial value of 1. Thus the transition from not very random to exponentially close to total randomness happens suddenly, in a neighborhood of the “cutoff” \frac{1}{4}(k+1)\log k.
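
To see the cutoff numerically, one can simply tabulate the right-hand side of the Upper Bound Lemma as a function of r. The sketch below does this for k = 200 (an arbitrary choice) at times r = \frac{1}{4}(k+1)(\log k + c) for a few values of c; the printed quantity is the resulting upper bound on the 1-norm distance, which collapses rapidly once c becomes positive.

```python
from math import comb, log, sqrt

k = 200

def ubl_bound(r):
    # the right-hand side of the Upper Bound Lemma for lazy simple random walk on B(k)
    return sum(comb(k, j) * (1 - 2 * j / (k + 1)) ** (2 * r) for j in range(1, k + 1))

threshold = 0.25 * (k + 1) * log(k)     # the cutoff time discussed above
for c in (-2, -1, 0, 1, 2, 4):
    r = int(threshold + 0.25 * (k + 1) * c)
    print(c, r, sqrt(ubl_bound(r)))     # an upper bound on the 1-norm distance at time r
```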

This so-called “cutoff phenomenon” was discovered by Diaconis and Shahshahani for card shuffles, i.e. random walks on the symmetric group, which we will discuss next. There is no known characterization of the class of Markov chains for which this phenomenon occurs; observe that to establish it for \mathrm{B}(k) we needed total knowledge of the eigenvalues and eigenvectors of the associated probability operator P. For random walks on groups this information can often be obtained thanks to representation theory, which is a testament to the power of symmetry. For random walks on graphs which are not Cayley graphs of groups, complete knowledge of eigenvalues and eigenvectors is typically unavailable, and it is correspondingly much harder to determine effective bounds on the speed of convergence of random walk to its stationary distribution.

Math 202C: Lecture 9

We begin with a brief recap of the main result from Lecture 8. Let

\chi_\sigma = \sum\limits_{\pi \in \mathrm{B}(k)} \chi_\sigma(\pi) \pi, \quad \sigma \in \mathrm{B}(k)

be the characters of the rank k hypercube group

\mathrm{B}(k) = \langle (1\ 2), (3\ 4), \dots, (2k-1\ 2k) \rangle.

Definition 1: The orthogonal idempotents in the group algebra \mathbb{C}\mathrm{B}(k) are defined as the characters of \mathrm{B}(k) normalized by the order of \mathrm{B}(k):

\tilde{\chi}_\sigma = \frac{1}{2^k} \chi_\sigma, \quad \sigma \in \mathrm{B}(k).

The reason for the name is the following key property of these elements:

\tilde{\chi}_\rho \tilde{\chi}_\sigma = \tilde{\chi}_\rho [\rho = \sigma], \quad \rho,\sigma \in \mathrm{B}(k).

This is quite valuable, since expanding elements a \in \mathbb{C}\mathrm{B}(k) in the group algebra relative to the basis of orthogonal idempotents essentially makes multiplication in the group algebra trivial, which translates into the “Fourier coefficients of a product are products of Fourier coefficients” fact that we saw last Lecture. Note also that the behavior of orthogonal idempotents in the group algebra is formally identical to the behavior of orthogonal projections in Hilbert space — you perhaps know from Math 202B that this is not a coincidence, and if you don’t know this don’t worry, because we will discuss it.

The main result for today is the following.

Theorem 1: The class of multiplication operators in \mathrm{End}\mathbb{C}\mathrm{B}(k) coincides with the class of operators that are diagonal in the character basis of \mathbb{C}\mathrm{B}(k).

Proof: Let b \in \mathbb{C}\mathrm{B}(k) be an element of the group algebra, and let R_b \in \mathrm{End} \mathbb{C}\mathrm{B}(k) be the corresponding multiplication operator. Consider the character expansion of b,

b = 2^{-k} \sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,b\rangle \chi_\pi .

Now let \chi_\sigma \in \mathbb{C}\mathrm{B}(k) be another character. We have

R_b\chi_\sigma = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,b\rangle \chi_\sigma\chi_\pi = 2^{-k}\langle \chi_\sigma,b\rangle 2^k \chi_\sigma = \langle \chi_\sigma,b\rangle\chi_\sigma.

So, \chi_\sigma is an eigenvector of R_b, with eigenvalue the Fourier coefficient \langle \chi_\sigma,b\rangle.

Conversely, suppose that R \in  \mathrm{End}\mathbb{C}\mathrm{B}(k) acts diagonally in the character basis:

R \chi_\pi = \lambda_\pi \chi_\pi, \quad \pi \in \mathrm{B}(k).

Then the action of R on an arbitrary a \in \mathbb{C}\mathrm{B}(k) is

Ra = R\, 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a \rangle \chi_\pi = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a \rangle \lambda_\pi\chi_\pi = \left(  2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a \rangle \chi_\pi \right) \left(2^{-k}\sum\limits_{\sigma \in \mathrm{B}(k)} \lambda_\sigma \chi_\sigma \right),

so that R=R_b with

b= 2^{-k}\sum\limits_{\sigma \in \mathrm{B}(k)} \lambda_\sigma \chi_\sigma.

— Q.E.D.

With Theorem 1 in hand, we can immediately compute the eigenvalues and eigenvectors of the Cayley graph of \mathrm{B}(k). Indeed, the adjacency operator of the Cayley graph of \mathrm{B}(k) is just the multiplication operator

R_{\sum_{i=1}^k \tau_i},

so by Theorem 1 we immediately have that

R_{\sum_{i=1}^k \tau_i} \chi_\sigma = \langle \chi_\sigma,\sum\limits_{i=1}^k \tau_i \rangle \chi_\sigma, \quad \sigma \in \mathrm{B}(k).

Now, we have that the resulting eigenvalue is

\langle \chi_\sigma,\sum\limits_{i=1}^k \tau_i \rangle = \sum\limits_{i=1}^k \langle \chi_\sigma,\tau_i \rangle,

and this is a sum we can explicitly evaluate, because we explicitly computed the characters of \mathrm{B}(k) back in Lecture 7. In particular, what we know is that

\langle \chi_\sigma,\tau_i \rangle =\chi_\sigma(\tau_i)

is equal to -1 if \tau_i is a factor of \sigma, and is equal to +1 if \tau_i is not a factor of \sigma. Since we compute this scalar product for each and every generator \tau_i of \mathrm{B}(k), the sum has |\sigma| terms equal to -1, and k - |\sigma| terms equal to +1, where |\sigma| is the word norm of \sigma, or equivalently its distance to the identity permutation in the Cayley graph. So we get

\sum\limits_{i=1}^k \langle \chi_\sigma,\tau_i \rangle = k-2|\sigma|

as our explicit eigenvalue of the character \chi_\sigma under the action of the adjacency operator. Moreover, the number of elements \sigma \in \mathrm{B}(k) such that |\sigma|=j for a given 0 \leq j \leq k is simply {k \choose j}, since this just amounts to choosing j of the generators \tau_1,\dots,\tau_k of \mathrm{B}(k). So we can summarize our results as follows.

Theorem 2: The eigenvalues of the Cayley graph of \mathrm{B}(k) are the numbers

k-2j, \quad j=0,\dots,k,

and the dimension of the eigenspace corresponding to k-2j is {k \choose j}.
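
Theorem 2 is easy to confirm numerically for small k, using the identification of \mathrm{B}(k) with bitstrings from Problem 7.1. The sketch below (numpy assumed, k = 5 an arbitrary choice) builds the adjacency matrix of the Cayley graph and compares its spectrum, with multiplicities, to the prediction of Theorem 2.

```python
import numpy as np
from math import comb

k = 5
N = 2 ** k

# vertices of B(k) encoded as bitstrings (Problem 7.1); multiplying by tau_i flips bit i
A = np.zeros((N, N))
for v in range(N):
    for i in range(k):
        A[v, v ^ (1 << i)] = 1

eigs = {}
for x in np.linalg.eigvalsh(A):
    eigs[int(round(x))] = eigs.get(int(round(x)), 0) + 1

print(eigs == {k - 2 * j: comb(k, j) for j in range(k + 1)})   # True: Theorem 2
```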

Since all eigenvalues of the Cayley graph of \mathrm{B}(k) are given explicitly by Theorem 2, we are in a position to explicitly enumerate walks on this graph, using the spectral approach to counting walks discussed previously.

Problem 9.1: Show that the number W^r(\rho,\sigma) of r-step walks on \mathrm{B}(k) from \rho to \sigma is

W^r(\rho,\sigma) = \frac{1}{2^k} \sum\limits_{i=0}^k \sum\limits_{j=0}^{|\rho\sigma|} (-1)^j {|\rho\sigma| \choose j} {k-|\rho\sigma| \choose i-j} (k-2i)^r.
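
A numerical check (not a solution) of Problem 9.1: the sketch below compares the stated formula against the entries of A^r for the Cayley graph of \mathrm{B}(4), with |\rho\sigma| computed as the Hamming distance between the corresponding bitstrings, and binomial coefficients with a negative lower entry interpreted as zero.

```python
import numpy as np
from math import comb

k, r = 4, 6
N = 2 ** k

# adjacency matrix of the Cayley graph of B(k), with vertices encoded as bitstrings
A = np.zeros((N, N), dtype=np.int64)
for v in range(N):
    for i in range(k):
        A[v, v ^ (1 << i)] = 1

def predicted(d, steps):
    # the formula of Problem 9.1, with d = |rho sigma|; binomials with a negative
    # lower entry are treated as zero
    total = 0
    for i in range(k + 1):
        for j in range(min(d, i) + 1):
            total += (-1) ** j * comb(d, j) * comb(k - d, i - j) * (k - 2 * i) ** steps
    return total // 2 ** k

Ar = np.linalg.matrix_power(A, r)
print(all(Ar[u, v] == predicted(bin(u ^ v).count("1"), r)
          for u in range(N) for v in range(N)))   # True
```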

This material, and problem 9.1, have applications in coding theory which you might find interesting to look into.

Math 202C: Lecture 8

We are at the beginning of the theory of harmonic analysis on finite abelian groups, starting with the concrete example of the rank k hypercube group, which we have chosen to define as the permutation group \mathrm{B}(k) generated by the disjoint transpositions

\tau_1=(1\ 2), \tau_2=(3\ 4), \dots, \tau_k = (2k-1\ 2k).

Typically, analysts think of harmonic analysis on \mathrm{B}(k) (or any other group) as the study of the Hilbert space \mathbb{C}^{\mathrm{B}(k)} of functions

a \colon \mathrm{B}(k) \longrightarrow \mathbb{C}

equipped with the scalar product

\langle a,b \rangle = \sum\limits_{\pi \in \mathrm{B}(k)} \overline{a(\pi)}b(\pi),

so that one has a canonical orthonormal basis \{\delta_\pi\} of \mathbb{C}^{\mathrm{B}(k)} consisting of indicator functions. This space has the structure of an algebra when equipped with the convolution product of functions,

[a*b](\pi) = \sum\limits_{\sigma \in \mathrm{B}(k)} a(\pi\sigma^{-1}) b(\sigma),

which is exactly what one gets from bilinear extension of group law in \mathrm{B}(k) lifted to the group element basis:

\delta_\rho * \delta_\sigma = \delta_{\rho\sigma}.

In particular, the convolution product on \mathbb{C}^{\mathrm{B}(k)} is quite different from the pointwise product. This is the right point of view to take when one is dealing with non-discrete groups (e.g. Lie groups), but for finite groups there is another way of writing things out which is often simpler and cleaner, and favored by algebraists.

The algebraists' preferred Hilbert space is \mathbb{C}\mathrm{B}(k), the space of formal \mathbb{C}-linear combinations of permutations \pi \in \mathrm{B}(k),

a = \sum\limits_{\pi \in \mathrm{B}(k)} a(\pi)\pi.

Such linear combinations are of course equivalent to complex-valued functions on the group; you can think of the linear combination a as the generating function of the function \pi \mapsto a(\pi). From this perspective the canonical orthonormal basis of \mathbb{C}\mathrm{B}(k) consists of the permutations \pi \in \mathrm{B}(k) themselves (rather than their indicator functions), which induces the equivalent scalar product,

\langle a,b \rangle = \sum\limits_{\pi \in \mathrm{B}(k)} \overline{\langle \pi, a\rangle} \langle \pi,b \rangle.

The convolution product on \mathbb{C}\mathrm{B}(k) is likewise replaced with the equivalent multiplication (written simply as juxtaposition)

ab = \sum\limits_{(\rho,\sigma) \in \mathrm{B}(k)^2} \langle \rho,a\rangle \langle \sigma,b\rangle \rho\sigma =\sum\limits_{\pi \in \mathrm{B}(k)} \left(\sum\limits_{\sigma \in \mathrm{B}(k)} \langle \pi\sigma^{-1},a\rangle \langle \sigma,b \rangle \right)\pi.

In short, \mathbb{C}^{\mathrm{B}(k)} and \mathbb{C}\mathrm{B}(k) are isomorphic both as Hilbert spaces and as algebras, so the choice to use one rather than the other is a matter only of preference. In practice, it is quite useful to be comfortable with both the analytic and the algebraic way of doing things.

Yet another way of repackaging everything is the following probably too-general perspective. Let us zoom back out to the very beginning, where we started with an arbitrary finite graph \Gamma. The algebraic way to understand this combinatorial object is to replace the vertex set V of \Gamma with the algebra \mathbb{C}\langle x_v \colon v \in V \rangle of polynomials in noncommuting variables x_v indexed by the vertices v \in V. In other words, this is the free algebra of rank |V|. This algebra is graded by degree, and we have a decomposition

\mathbb{C}\langle x_v \colon v \in V \rangle = \bigoplus_{d=0}^\infty \mathbb{C}^{(d)}\langle x_v \colon v \in V \rangle,

with the dth summand being the space of homogeneous polynomials of degree d. Observe that this vector space is isomorphic to the space of complex-valued functions on V^d, i.e. the space of functions on d-tuples of vertices, or equivalently formal linear combinations of d-tuples of vertices. In particular, the linear component \mathbb{C}^{(1)}\langle x_v \colon v \in V\rangle is isomorphic to either/both \mathbb{C}^V and \mathbb{C}V, and the adjacency operator acts on this linear component to capture the edge structure of \Gamma. Indeed, the free algebra is isomorphic to the tensor algebra of the vector space of functions on the vertex set V. In the case where the vertex set V of \Gamma is actually a group \mathrm{G}, we can mod out by the group law and consider the quotient

\mathbb{C}\langle x_g \colon g \in \mathrm{G} \rangle/\langle x_gx_h=x_{gh}\rangle

of the free algebra by the ideal corresponding to the group multiplication. This quotient consists solely of polynomials of degree 1, so functions on the group, or equivalently linear combinations of group elements, but now with the structure of an algebra.

Coming back to reality, the fact that \mathbb{C}\mathrm{B}(k) is an algebra and not just a vector space is reflected in the presence of a certain class of operators on the group algebra which are not present in the endomorphism algebra of a vector space devoid of multiplication.

Definition 1: Given b \in \mathbb{C}\mathrm{B}(k), the corresponding multiplication operator R_b\in \mathrm{End} \mathbb{C}\mathrm{B}(k) is defined by

R_ba = ab, \quad a \in \mathbb{C}\mathrm{B}(k).

We are using the letter “R” for multiplication operators because later on we will consider the group algebras of noncommutative groups, in which left and right multiplication do not necessarily coincide, and we have typically different left and right multiplication operators L_b and R_b. To put this the Math 202B way, the operator R_b is the image of b in the right regular representation of \mathbb{C}\mathrm{B}(k).

Note that multiplication operators corresponding to pure group elements (as opposed to linear combinations of group elements) are unitary operators on \mathbb{C}\mathrm{B}(k),

\langle \rho,R_\pi\sigma \rangle = \langle \rho,\sigma\pi \rangle = \langle \rho\pi^{-1},\sigma \rangle = \langle R_{\pi}^{-1} \rho,\sigma \rangle.

However, since we are working with the permutation group \mathrm{B}(k), in which all permutations are actually involutions, R_\pi is both unitary and selfadjoint, hence has eigenvalues \pm 1.

The importance of multiplication operators in our quest to understand the eigenvalues and eigenvectors of the Cayley graph of \mathrm{B}(k) as generated by the transpositions \tau_1,\dots,\tau_k is the following.

Proposition 1: The adjacency operator of the Cayley graph of \mathrm{B}(k) is the multiplication operator R_t, where

t = \sum\limits_{i=1}^k \tau_i.

Proof: For any \rho \in \mathrm{B}(k), we have

R_t\rho = \sum\limits_{i=1}^k \rho\tau_i.

— Q.E.D.

In view of Proposition 1, the problem of understanding the eigenvalues and eigenvectors of the Cayley graph is that of understanding the eigenvalues and eigenvectors of

R_t = \sum\limits_{i=1}^k R_{\tau_i}.

To solve it, we will work out the spectral analysis of arbitrary multiplication operators on \mathbb{C}\mathrm{B}(k). The main tool in this endeavor is a second basis of the group algebra, which consists of special linear combinations \chi of group elements which have the property that

\langle \rho\sigma,\chi \rangle = \langle \rho,\chi \rangle \langle \sigma,\chi \rangle.

That is, \chi is the generating function of a group homomorphism

\mathrm{B}(k) \to \mathbb{C}^\times,

the multiplicative group of nonzero complex numbers. I like this way of stating things; in the analytic formalism, the words “character” and “homomorphism” mean exactly the same thing, a redundancy, whereas in the algebraic formalism “character” really means “generating function of a homomorphism.”

In Lecture 7, we explicitly worked out all homomorphisms from \mathrm{B}(k) to \mathbb{C}^\times, showing that they have values in the subgroup generated by -1, and are moreover parameterized by the points of \mathrm{B}(k). We also showed that the characters are orthogonal,

\langle \chi_\pi,\chi_\sigma \rangle = 2^k[\pi=\sigma], \quad \pi,\sigma \in \mathrm{B}(k).

In particular, for any a \in \mathbb{C}\mathrm{B}(k), we have the character expansion

a = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a\rangle \chi_\pi.

The character expansion of a is also known as its Fourier expansion, with \langle \chi_\pi,a\rangle being the Fourier coefficients of a. The main reason that character expansions are “better” than expansions in the group basis is that the Fourier coefficients of a product ab are just the products of the corresponding Fourier coefficients of a and b.

The main reason that characters — which are generating functions of homomorphisms \mathrm{B}(k) \to \{-1,+1\} — are better than the original group elements is that their multiplication is extremely simple.

Theorem 1: For any \rho,\sigma \in \mathrm{B}(k), we have

\chi_\rho\chi_\sigma =0

if \rho \neq \sigma, and

\chi_\pi \chi_\pi = 2^k\chi_\pi.

Proof: Expanding in the permutation basis, we have

\chi_\rho\chi_\sigma = \sum\limits_{\pi_1 \in \mathrm{B}(k)} \left( \sum\limits_{\pi_2 \in \mathrm{B}(k)} \chi_\rho(\pi_1\pi_2^{-1}) \chi_\sigma(\pi_2) \right) \pi_1 = \sum\limits_{\pi_1 \in \mathrm{B}(k)} \left( \sum\limits_{\pi_2 \in \mathrm{B}(k)} \chi_\rho(\pi_1) \chi_\rho(\pi_2) \chi_\sigma(\pi_2) \right) \pi_1,

where we used that \chi_\rho is a homomorphism and \pi_2^{-1}=\pi_2. The result now follows from character orthogonality, which is Theorem 2 in Lecture 7.

— Q.E.D.
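
Theorem 1 is easy to confirm numerically for small k, using the bitstring model of \mathrm{B}(k) from Problem 7.1, in which the character indexed by \sigma takes the value (-1) raised to the number of generators dividing both permutations. The sketch below (k = 3 an arbitrary choice) multiplies all pairs of characters in the group algebra and checks the stated relations.

```python
k = 3
N = 2 ** k

def alg_mul(a, b):
    # multiplication in the group algebra of B(k), with B(k) identified with
    # bitstrings under XOR (the identification of Problem 7.1)
    out = {}
    for p, x in a.items():
        for q, y in b.items():
            out[p ^ q] = out.get(p ^ q, 0) + x * y
    return out

def chi(s):
    # the character indexed by s: chi_s(pi) = (-1)^(number of generators dividing both)
    return {p: (-1) ** bin(s & p).count("1") for p in range(N)}

ok = True
for rho in range(N):
    for sigma in range(N):
        lhs = alg_mul(chi(rho), chi(sigma))
        rhs = ({p: 2 ** k * v for p, v in chi(rho).items()} if rho == sigma
               else {p: 0 for p in range(N)})
        ok = ok and lhs == rhs
print(ok)   # True
```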

An immediate consequence is that the Fourier coefficients of a product are the products of the Fourier coefficients of the factors.

Corollary 1: For any a,b \in \mathbb{C}\mathrm{B}(k), we have

\langle \chi_\pi,ab\rangle = \langle \chi_\pi,a \rangle \langle \chi_\pi,b\rangle.

Proof: On one hand, the character expansion of the product ab is

ab = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,ab\rangle \chi_\pi.

On the other hand, we compute the product a and b via their character expansions:

ab = \left( 2^{-k}\sum\limits_{\rho \in \mathrm{B}(k)} \langle \chi_\rho,a\rangle \chi_\rho\right)\left( 2^{-k}\sum\limits_{\sigma \in \mathrm{B}(k)} \langle \chi_\sigma,b\rangle \chi_\sigma\right) \\= 4^{-k} \sum\limits_{(\rho,\sigma) \in \mathrm{B}(k)^2}\langle \chi_\rho,a\rangle\langle \chi_\sigma,b\rangle \chi_\rho\chi_\sigma \\ = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle\chi_\pi,a\rangle \langle \chi_\pi,b\rangle \chi_\pi,

where we applied Theorem 1 to get the final equality. Since the characters are linearly independent, the result follows.

— Q.E.D.

In Lecture 9, we will apply the above to prove that M \in \mathrm{End} \mathbb{C}\mathrm{B}(k) is a multiplication operator if and only if it acts diagonally in the character basis. We will also find the corresponding eigenvalues, and hence count walks on \mathrm{B}(k). In Lecture 10 we will then analyze convergence of random walk on \mathrm{B}(k) to equilibrium.

Math 202C: Lecture 7

Let the symmetric group \mathrm{S}(n) on \{1,\dots,n\} be identified with its right Cayley graph as generated by the conjugacy class of transpositions: two permutations \rho,\sigma \in \mathrm{S}(n) are joined by a single edge if and only if \sigma = \rho\tau for some transposition \tau \in \mathrm{S}(n). We equip \mathrm{S}(n) with the corresponding word norm, so that |\pi| is the minimal length of a factorization of \pi into transpositions, or equivalently the length of a geodesic path from the identity permutation to \pi in the Cayley graph.

Our current two-part goal is to compute the number of walks of given length between two specified vertices in \mathrm{S}(n), and to estimate the total variation distance between the distribution of simple random walk on \mathrm{S}(n) and its stationary distribution, which is the uniform measure U(\pi)=\frac{1}{n!} on \mathrm{S}(n).

We will start by asking these same questions about certain simple subgroups of \mathrm{S}(n). Let k be a positive integer such that 2k \leq n, and let \mathrm{B}(k) be the subgroup of \mathrm{S}(n) generated by k disjoint transpositions, say

\tau_1=(1\ 2),\ \tau_2=(3\ 4),\ \dots, \tau_k = (2k-1\ 2k).

Thus \mathrm{B}(k) is an abelian group of order 2^k, whose elements are all permutations of the form

\pi =\tau_1^{e_1} \dots \tau_k^{e_k}, \quad e_i \in \{0,1\}.

For example,

\mathrm{B}(2) = \{(1)(2)(3)(4), (1\ 2), (3\ 4),(1\ 2)(3\ 4)\}.

We identify \mathrm{B}(k) with its Cayley graph, which is an induced subgraph of the Cayley graph of \mathrm{S}(n); this Cayley graph of \mathrm{B}(k) is often referred to as the k-dimensional hypercube graph.

Problem 7.1: Prove that \mathrm{B}(k) is isomorphic to the group \mathbb{Z}_2^k of bitstrings of length k, and that under this isomorphism the word norm |\pi| is given by the number of nonzero bits.

There are two ways to implement the linear algebraic approach to the study of (random) walks on the hypercube \mathrm{B}(k). The first employs the Hilbert space

\mathbb{C}\mathrm{B}(k) = \left\{\sum\limits_{\pi \in \mathrm{B}(k)} a_\pi \pi \colon a_\pi \in \mathbb{C}\right\}

of formal \mathbb{C}-linear combinations of permutations \pi \in \mathrm{B}(k), with the scalar product defined by declaring the permutation basis

\{\pi \colon \pi \in \mathrm{B}(k)\}

to be orthonormal. This is the space favored by algebraists. Analysts prefer to use the space

\mathbb{C}^{\mathrm{B}(k)} = \left\{ f \colon \mathrm{B}(k) \to \mathbb{C} \right\}

of \mathbb{C}-valued functions of permutations \pi \in \mathrm{B}(k), with the scalar product defined by declaring the indicator function basis

\{ \delta_\pi(\sigma) := [\pi=\sigma] \colon \pi \in \mathrm{B}(k)\}

to be orthonormal (the square bracket above is the Iverson bracket). These two spaces are isomorphic, with a Hilbert space isomorphism

\mathbb{C}^{\mathrm{B}(k)} \longrightarrow \mathbb{C}\mathrm{B}(k)

given by

f \mapsto \sum\limits_{\pi \in \mathrm{B}(k)} f(\pi) \pi.

Because \mathrm{B}(k) is not just a set, but a group, the corresponding Hilbert spaces \mathbb{C}\mathrm{B}(k) and \mathbb{C}^{\mathrm{B}(k)} are algebras. The multiplication in \mathbb{C}\mathrm{B}(k) is defined by

\left( \sum\limits_{\rho \in \mathrm{B}(k)} a_\rho \rho \right)\left( \sum\limits_{\sigma \in \mathrm{B}(k)} b_\sigma \sigma \right) = \sum\limits_{(\rho,\sigma) \in \mathrm{B}(k)^2} a_\rho b_\sigma \rho\sigma = \sum\limits_{\pi \in \mathrm{B}(k)} \left( \sum\limits_{\sigma \in \mathrm{B}(k)} a_{\pi\sigma^{-1}} b_\sigma\right)\pi.

The multiplication in \mathbb{C}^{\mathrm{B}(k)} is defined by

\left( f*g \right)(\pi) = \sum\limits_{\sigma \in \mathrm{B}(k)} f(\pi\sigma^{-1}) g(\sigma).

This latter multiplication is typically referred to as convolution because of its formal similarity to convolution of functions on the real line,

(f*g)(x) = \int_{-\infty}^{\infty} f(x-t)g(t)\mathrm{d}t,

and indeed this similarity is part of a meaningful analogy which we are just starting to discuss. It is clear that the isomorphism \mathbb{C}^{\mathrm{B}(k)} \to \mathbb{C}\mathrm{B}(k) discussed above is an algebra isomorphism, i.e. it intertwines the two product operations.

In the algebraic formulation, the adjacency operator A \in \mathbb{C}\mathrm{B}(k) is defined by its action on the permutation basis,

A\pi = \sum\limits_{i=1}^k \pi\tau_i.

In the functional implementation, the adjacency operator A \in \mathbb{C}^{\mathrm{B}(k)} is defined by its action on the indicator function basis,

A\delta_{\pi} = \sum\limits_{i=1}^k \delta_{\pi\tau_i}.

Both implementations of the linearization of \mathrm{B}(k) — meaning ways of embedding this group into a vector space — are useful, and it is advisable to keep both in mind. In what follows, the functional implementation is somewhat more natural, but we will move back and forth between the two freely.
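
As a concrete illustration (a sketch using the bitstring identification of Problem 7.1, not part of the lecture; numpy is assumed), one can realize the adjacency operator as a 2^k \times 2^k matrix in Python. In coordinates, the permutation basis and the indicator function basis give the same matrix, namely the adjacency matrix of the hypercube graph.

import numpy as np

k = 3
N = 2 ** k

A = np.zeros((N, N), dtype=int)
for pi in range(N):
    for i in range(k):
        # pi tau_i corresponds to flipping bit i of pi, so A delta_pi picks up delta_{pi tau_i}
        A[pi ^ (1 << i), pi] = 1

# A is the adjacency matrix of the k-dimensional hypercube graph: symmetric, k-regular,
# with an edge exactly between bitstrings at Hamming distance 1
assert np.array_equal(A, A.T)
assert all(A[:, pi].sum() == k for pi in range(N))
assert all(A[rho, sigma] == (bin(rho ^ sigma).count("1") == 1)
           for rho in range(N) for sigma in range(N))
print("adjacency operator of B(%d) realized as a %d x %d matrix" % (k, N, N))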

Intuition leads us to believe that the keys to the spectrum of A must somehow come from the group structure of \mathrm{B}(k) — not just from linear algebra, but from the interaction of linear algebra and group theory. Indeed, inside the linearization \mathbb{C}^{\mathrm{B}(k)} as a function space, there is a natural family of functions linked to the lingering group structure.

Definition 1: The characters of \mathrm{B}(k) are the elements \chi of \mathbb{C}^{\mathrm{B}(k)} which are group homomorphisms

\chi \colon \mathrm{B}(k) \longrightarrow \mathbb{C}^\times

from the hypercube to the multiplicative group of nonzero complex numbers. The set of characters of \mathrm{B}(k) is called the dual of \mathrm{B}(k), and denoted \widehat{\mathrm{B}(k)}.

Much more generally, for any finite abelian group \mathrm{G}, one defines its Pontryagin dual \widehat{\mathrm{G}} to be the set of all group homomorphisms

\chi \colon \mathrm{G} \longrightarrow \mathbb{C}^\times,

these being referred to as the characters of \mathrm{G}. The Pontryagin dual can be defined for much more general abelian groups, but in this course we will almost exclusively deal with finite groups.

Returning to the case \mathrm{G}=\mathrm{B}(k) at hand, let us note that Definition 1 is actually even simpler and more natural than it appears, since every element of \mathrm{B}(k) squares to the identity permutation \iota, i.e. is an involution. This means that we have the constraint

\chi(\pi)^2 = \chi(\pi^2) = \chi(\iota) = 1,

so that every value of a character \chi of \mathrm{B}(k) must be a square root of 1, i.e. either 1 or -1. From this it follows that the number of characters of \mathrm{B}(k) is at most 2^k, the order of \mathrm{B}(k), since each character \chi is determined by the values

\chi(\tau_i), \quad 1 \leq i \leq k,

and we have 2 choices, \pm 1, for each of these values. This suggests that we can in fact parameterize the characters of \mathrm{B}(k) by the elements of \mathrm{B}(k), which offers the beginnings of an explanation as to why \widehat{\mathrm{B}(k)} is called the dual of \mathrm{B}(k).

Theorem 1: The characters of \mathrm{B}(k) are precisely the functions defined by

\chi_{\tau_1^{x_1} \dots \tau_k^{x_k}}(\tau_1^{y_1} \dots \tau_k^{y_k}) = (-1)^{x_1y_1+\dots+x_ky_k}.

Proof: Clearly, these putative characters are 2^k in number and pairwise distinct, since they differ on the generators. We have also seen that \mathrm{B}(k) has at most 2^k characters, so it is sufficient to check that the putative characters are in fact group homomorphisms. But this is clear, since exponents of the generators \tau_i only matter modulo 2:

\chi_{\tau_1^{x_1} \dots \tau_k^{x_k}}\left(\tau_1^{y_1} \dots \tau_k^{y_k}\, \tau_1^{z_1} \dots \tau_k^{z_k}\right) = (-1)^{x_1(y_1+z_1)+\dots+x_k(y_k+z_k)} = (-1)^{x_1y_1+\dots+x_ky_k}\,(-1)^{x_1z_1+\dots+x_kz_k} = \chi_{\tau_1^{x_1} \dots \tau_k^{x_k}}(\tau_1^{y_1} \dots \tau_k^{y_k})\,\chi_{\tau_1^{x_1} \dots \tau_k^{x_k}}(\tau_1^{z_1} \dots \tau_k^{z_k}).

— Q.E.D.

Problem 7.2: Show that the multiplication defined by

\chi_{\rho}\chi_\sigma := \chi_{\rho\sigma}

gives \widehat{\mathrm{B}(k)} the structure of a group, and that this dual group is isomorphic to \mathrm{B}(k).

Theorem 2: The set \widehat{\mathrm{B}(k)} is an orthogonal basis of \mathbb{C}^{\mathrm{B}(k)}.

Proof: Since the cardinality of \widehat{\mathrm{B}(k)} equals the cardinality of \mathrm{B}(k), which is in turn the dimension of \mathbb{C}^{\mathrm{B}(k)}, it is sufficient to prove orthogonality. Let \rho,\sigma \in \mathrm{B}(k) be arbitrary. Then

\langle \chi_\rho,\chi_\sigma \rangle = \sum\limits_{\pi \in \mathrm{B}(k)} \chi_\rho(\pi)\chi_\sigma(\pi) = \sum\limits_{\pi \in \mathrm{B}(k)} \chi_{\rho\sigma}(\pi) = \langle \chi_{\rho\sigma},\chi_\iota\rangle,

where we have used that \chi_\rho(\pi)\chi_\sigma(\pi) = \chi_{\rho\sigma}(\pi), which is immediate from the formula of Theorem 1, and \iota \in \mathrm{B}(k) denotes the identity permutation. So it is sufficient to consider scalar products of the form \langle \chi_\rho,\chi_\iota\rangle. In the case \rho=\iota, we have

\langle \chi_\iota,\chi_\iota \rangle = \sum\limits_{\pi \in \mathrm{B}(k)} \chi_\iota(\pi)\chi_\iota(\pi) = \sum\limits_{\pi \in \mathrm{B}(k)} 1 = 2^k.

If \rho \neq \iota, then there exists \sigma \in \mathrm{B}(k) such that \chi_\rho(\sigma)=-1. We then have

- \langle \chi_\rho,\chi_\iota \rangle = \chi_\rho(\sigma) \sum\limits_{\pi \in \mathrm{B}(k)} \chi_\rho(\pi) = \sum\limits_{\pi \in \mathrm{B}(k)} \chi_\rho(\sigma\pi) = \langle \chi_\rho,\chi_\iota \rangle,

whence \langle \chi_\rho,\chi_\iota \rangle=0.

— Q.E.D.
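
If you like to verify such statements by machine, here is a short Python sketch (illustrative only, using the bitstring encoding of Problem 7.1; numpy is assumed and all names are ad hoc) which checks both Theorem 1 and Theorem 2 for a small value of k.

import numpy as np

k = 3
N = 2 ** k

def chi(rho, pi):
    # chi_rho(pi) = (-1)^(x_1 y_1 + ... + x_k y_k) in the bitstring encoding
    return -1 if bin(rho & pi).count("1") % 2 else 1

# homomorphism property (Theorem 1): chi_rho(pi sigma) = chi_rho(pi) chi_rho(sigma),
# where the product pi sigma is the bitwise XOR of the bitstrings
assert all(chi(rho, pi ^ sigma) == chi(rho, pi) * chi(rho, sigma)
           for rho in range(N) for pi in range(N) for sigma in range(N))

# orthogonality (Theorem 2): <chi_rho, chi_sigma> = 2^k [rho = sigma]
X = np.array([[chi(rho, pi) for pi in range(N)] for rho in range(N)])
assert np.array_equal(X @ X.T, N * np.eye(N, dtype=int))
print("the %d characters of B(%d) form an orthogonal basis of C^{B(%d)}" % (N, k, k))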

Problem 7.3: Prove the dual orthogonality relation

\sum\limits_{\pi \in \mathrm{B}(k)} \chi_\pi(\rho)\chi_\pi(\sigma)= 2^k[\rho=\sigma].

In Lecture 8, we will link the characters of \mathrm{B}(k) to the adjacency operator of its Cayley graph.

Math 202C: Lecture 6

In Lecture 5, we worked out the Markov limit theorem, which says that simple random walk on a connected, non-bipartite graph converges to the stationary distribution on its vertices in the large time limit. This is an important general result, but it gives no understanding of how quickly this convergence occurs. Let us try to quantify the rate of convergence for \mathrm{K}(n), the complete graph on \{1,\dots,n\}.

Intuitively, because \mathrm{K}(n) is so highly connected, we expect convergence of simple random walk on this graph to the stationary measure to be very rapid. Indeed, at time r=1, a particle which was located at vertex 1 at time r=0 and performs simple random walk has equal probability of being at any vertex except 1, where it cannot be:

P^1(1,1) = 0,\ P^1(1,2)=\frac{1}{n-1},\dots,\ P^1(1,n) = \frac{1}{n-1}.

This seems close to the stationary distribution for simple random walk on \mathrm{K}(n), which according to the Markov limit theorem is the uniform distribution

U(1) = \frac{1}{n},\ U(2) = \frac{1}{n},\dots,\ U(n)=\frac{1}{n}

on \{1,\dots,n\}, except for the aberration P^1(1,1)=0. However, this aberration continues indefinitely: there cannot be any finite time r such that

P^r(1,j)=U(j), \quad 1 \leq j \leq n.

Indeed, since the walk cannot stay put we have P^r(1,j) = \frac{1-P^{r-1}(1,j)}{n-1}, so if the distribution at time r were exactly uniform then the distribution at time r-1 would be exactly uniform as well, and iterating this backwards we would conclude that P^1(1,\cdot) is uniform, which it is not. So, what do we really mean by saying that “P^r gets close to U fast?”

Definition 1: If P,Q are probability measures on \{1,\dots,n\}, their total variation distance is

\mathrm{d}_{TV}(P,Q) = \frac{1}{2}\sum\limits_{i=1}^n |P(i)-Q(i)|.

This is quite an intuitive notion: the total variation distance simply sums up the discrepancy between P and Q over all points, giving a quantitative meaning to the total discrepancy between these measures. In particular, the maximum total variation occurs when P places all its mass at a single point, and Q places all its mass at a single different point; we then have

\mathrm{d}_{TV}(P,Q) =1.
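
For concreteness, here is a minimal Python helper (an illustrative sketch, not part of the lecture; the name tv_distance is ad hoc) computing total variation distance as in Definition 1.

def tv_distance(P, Q):
    # total variation distance of Definition 1, for probability vectors of equal length
    return 0.5 * sum(abs(p - q) for p, q in zip(P, Q))

n = 5
point_1 = [1, 0, 0, 0, 0]         # all mass at vertex 1
point_2 = [0, 1, 0, 0, 0]         # all mass at vertex 2
uniform = [1 / n] * n

print(tv_distance(point_1, point_2))   # 1.0, the maximum possible value
print(tv_distance(point_1, uniform))   # 0.8, i.e. 1 - 1/n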

We now have a clearly defined objective: estimate the total variation distance between the distribution of simple random walk on \mathrm{K}(n) at time r, and its stationary (uniform) distribution. Following the same pedagogical template as in our first few lectures, let us emphasize that this can be done in a completely elementary way, since we were able to obtain explicit formulas for counting walks on \mathrm{K}(n) just by being clever. Back in Lecture 1, we showed that the number of r-step loops on \mathrm{K}(n) based at any vertex is

W^r(1,1) = \frac{1}{n}(n-1)^r +\frac{(-1)^r}{n}(n-1).

Now, since \mathrm{K}(n) is (n-1)-regular, the total number of r-step walks departing from 1 is (n-1)^r, and consequently the probability that simple random walk on \mathrm{K}(n) returns to its starting point at time r is

P^r(1,1) = \frac{1}{(n-1)^r}W^r(1,1) = \frac{1}{n} +\frac{(-1)^r}{n(n-1)^{r-1}},

from which we see directly that

\lim\limits_{r\to \infty} P^r(1,1) = \frac{1}{n},

exactly as the Markov limit theorem predicts. But actually, since we have simple explicit formulas, we can see more, namely we have the error estimate

\left| P^r(1,1)-\frac{1}{n}\right| = \frac{1}{n(n-1)^{r-1}},

which is actually an identity (the best kind of estimate). As for the number of r-step walks from 1 to 2, this is given by the complementary formula

W^r(1,2) = \frac{1}{n}(n-1)^r -\frac{(-1)^r}{n},

whence

P^r(1,2) = \frac{1}{(n-1)^r}W^r(1,2) = \frac{1}{n} -\frac{(-1)^r}{n(n-1)^r}.

So again we immediately obtain the limit

\lim\limits_{r \to \infty} P^r(1,2) = \frac{1}{n},

along with the discrepancy formula

\left| P^r(1,2)-\frac{1}{n}\right| = \frac{1}{n(n-1)^r}.

All told, our maneuvering above comes down to the exact formula

\mathrm{d}_{TV}(P^r,U) = \frac{1}{n(n-1)^{r-1}}

for the total variation distance between the distribution of simple random walk on \mathrm{K}(n) at time r and the uniform distribution, which is its r \to \infty limit. Thus we know not just that P^r converges to U, but exactly how fast, as an explicit function of both the spatial parameter n and the time parameter r.
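
Before proceeding, here is a quick numerical sanity check of the exact total variation formula (an illustrative Python sketch, not part of the lecture; the choice of n and the time horizon are arbitrary and numpy is assumed).

import numpy as np

n, r_max = 7, 6
T = (np.ones((n, n)) - np.eye(n)) / (n - 1)   # transition matrix of simple random walk on K(n)
U = np.full(n, 1 / n)                         # uniform (stationary) distribution

p = np.zeros(n)
p[0] = 1.0                                    # the walk starts at vertex 1
for r in range(1, r_max + 1):
    p = p @ T                                 # distribution of the walk at time r
    tv = 0.5 * np.abs(p - U).sum()
    exact = 1 / (n * (n - 1) ** (r - 1))
    assert abs(tv - exact) < 1e-12
    print(r, tv, exact)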

Now let us reproduce the above formulas algebraically. We form the Hilbert space \mathcal{H} with orthonormal basis \{\mathbf{e}_1,\dots,\mathbf{e}_n\} labeled by the vertices of \mathrm{K}(n). From this point of view, a probability measure P on \{1,\dots,n\} is replaced with the vector \mathbf{p} \in \mathcal{H} defined by

\mathbf{p} = \sum\limits_{i=1}^n P(i) \mathbf{e}_i.

Note that even though P is a probability measure on \{1,\dots,n\}, \mathbf{p} is not in general a unit vector with respect to the Hilbert space (or \ell^2) norm,

\|\mathbf{v}\|_2= \sqrt{\langle \mathbf{v},\mathbf{v} \rangle}.

Rather, it is a unit vector with respect to the \ell^1-norm, which is defined by

\|\mathbf{v}\|_1 = \sum\limits_{i=1}^n |\langle \mathbf{e}_i,\mathbf{v}\rangle|,

and it is this norm that corresponds to the total variation distance: for probability measures P,Q on \{1,\dots,n\} with corresponding probability vectors \mathbf{p},\mathbf{q} \in \mathcal{H}, we have

\mathrm{d}_{TV}(P,Q) = \frac{1}{2}\|\mathbf{p}-\mathbf{q}\|_1.

So, we would like to compute \|\mathbf{p}_r-\mathbf{u}\|_1, where

\mathbf{p}_r = \sum\limits_{j=1}^n P^r(1,j) \mathbf{e}_j

is the probability vector corresponding to the distribution of simple random walk on \mathrm{K}(n) at time r, and

\mathbf{u} = \sum\limits_{j=1}^n \frac{1}{n} \mathbf{e}_j

is the probability vector corresponding to the uniform distribution.

The adjacency operator A \in \mathrm{End}\mathcal{H} is defined by its action in the vertex basis,

A\mathbf{e}_j = \sum\limits_{\substack{i=1 \\ i\neq j}}^n \mathbf{e}_i, \quad 1 \leq j \leq n,

and since \mathrm{K}(n) is (n-1)-regular the transition operator for simple random walk on \mathrm{K}(n) is just T=\frac{1}{n-1}A. By inspection, we have

A\mathbf{u} = (n-1)\mathbf{u}

so that \mathbf{u} is an eigenvector of A with eigenvalue (n-1). Let \mathcal{U} be the line in \mathcal{H} spanned by \mathbf{u}. Since A is selfadjoint, \mathcal{U}^\perp is an invariant subspace of A, and in fact, as you saw in last week’s problem set, A acts in \mathcal{U}^\perp as multiplication by -1. Thus, the adjacency operator decomposes as

A = (n-1)P + (-1)Q,

where P,Q \in \mathrm{End}\mathcal{H} are the orthogonal projections of \mathcal{H} onto the complementary subspaces \mathcal{U} and \mathcal{U}^\perp, respectively. We thus have

A^r=(n-1)^rP + (-1)^rQ,

and hence

T^r= P +\frac{(-1)^r}{(n-1)^r}Q.

We now have the formula

\|\mathbf{p}_r-\mathbf{u}\|_1 = \|T^r\mathbf{e}_1-\mathbf{u}\|_1,

so it remains only to compute T^r\mathbf{e}_1 in order to get the total variation distance we seek. Now this is just

T^r\mathbf{e}_1 = P\mathbf{e}_1 +\frac{(-1)^r}{(n-1)^r}Q\mathbf{e}_1,

where P\mathbf{e}_1 is the orthogonal projection of \mathbf{e}_1 onto \mathcal{U} and Q\mathbf{e}_1 is the orthogonal projection of \mathbf{e}_1 onto \mathcal{U}^\perp. By basic linear algebra, keeping in mind that \mathbf{u} is not a unit vector (indeed \langle \mathbf{u},\mathbf{u}\rangle = \frac{1}{n} = \langle \mathbf{e}_1,\mathbf{u}\rangle), we have

P \mathbf{e}_1 = \frac{\langle \mathbf{e}_1,\mathbf{u} \rangle}{\langle \mathbf{u},\mathbf{u}\rangle} \mathbf{u} = \mathbf{u},

and hence for the complementary projection we have

Q\mathbf{e}_1 = (I-P)\mathbf{e}_1 = \mathbf{e}_1 -\mathbf{u}.

This allows us to explicitly compute the 1-norm of T^r\mathbf{e}_1-\mathbf{u}: we have T^r\mathbf{e}_1-\mathbf{u} = \frac{(-1)^r}{(n-1)^r}(\mathbf{e}_1-\mathbf{u}), and since \|\mathbf{e}_1-\mathbf{u}\|_1 = \left(1-\frac{1}{n}\right)+(n-1)\cdot\frac{1}{n} = \frac{2(n-1)}{n}, we recover

\mathrm{d}_{TV}(P^r,U) = \frac{1}{2}\|T^r\mathbf{e}_1-\mathbf{u}\|_1 = \frac{1}{n(n-1)^{r-1}},

in agreement with the barehands computation above; the factor of 2 is exactly the one in the definition of total variation distance.
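
If you would like to double-check this numerically, here is a short Python sketch (not part of the lecture; numpy is assumed and the choice of n and r is arbitrary) verifying the projection formulas and the 1-norm computation.

import numpy as np

n, r = 6, 4
e1 = np.eye(n)[0]                            # the basis vector e_1
u = np.full(n, 1 / n)                        # the uniform probability vector

P = np.full((n, n), 1 / n)                   # orthogonal projection onto the line U spanned by u
Q = np.eye(n) - P                            # orthogonal projection onto the orthogonal complement of U
T = (np.ones((n, n)) - np.eye(n)) / (n - 1)  # transition matrix of simple random walk on K(n)

assert np.allclose(P @ e1, u)                # P e_1 = u
assert np.allclose(Q @ e1, e1 - u)           # Q e_1 = e_1 - u
Tr = P + ((-1) ** r / (n - 1) ** r) * Q      # T^r = P + (-1)^r (n-1)^{-r} Q
assert np.allclose(Tr, np.linalg.matrix_power(T, r))

l1 = np.abs(Tr @ e1 - u).sum()               # || T^r e_1 - u ||_1
assert abs(l1 - 2 / (n * (n - 1) ** (r - 1))) < 1e-12
print("total variation distance at time r:", 0.5 * l1)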

Problem 6.1: Analyze simple random walk on \overline{\mathrm{K}}(n), the graph obtained by adding a loop at each vertex of \mathrm{K}(n). This is exactly the same thing as “lazy” simple random walk on \mathrm{K}(n), where the particle can either jump to an adjacent vertex or stay put.

The main message of this lecture is the following. We would like to analyze random walk on the Cayley graph of the symmetric group \mathrm{S}(n) as generated by the conjugacy class of transpositions. Of course, being a Cayley graph, this is a regular graph and hence the Markov limit theorem tells us that the distribution of the walk tends to uniform in the long-time limit. But in real world terms, this is just the statement that “after infinitely many shuffles, a deck of cards is randomized,” and now it is clear that this is not a useful statement: what we really want to be able to calculate is the total variation distance between the distribution of the walk at time r and uniform. This will allow us to quantitatively state how many shuffles are required to get within a stipulated distance of the uniform measure. This endeavor will require everything you learned in Math 202B, and will (hopefully) help to elucidate that material. In the next lecture, we will start down this road by analyzing the same question for some fairly simple abelian subgroups of \mathrm{S}(n).

Math 202C: Lecture 5

We continue with our study of random walks on finite graphs via linear algebra. Let \mathrm{G} be a graph on \{1,\dots,n\}, possibly with loops and/or multiple edges. As in Lecture 4, let P^r(i,j) denote the probability that simple random walk on \mathrm{G} starting from vertex i is located at vertex j at time r. In this Lecture, we will use the linear algebra setup from Lecture 4 to analyze the r \to \infty asymptotic behavior of the probability P^r(i,j), which corresponds to the “long time behavior” of simple random walk on \mathrm{G}.

The linear algebraic setup is as follows. Let \mathcal{H} be a Hilbert space with orthonormal basis \{\mathbf{e}_1,\dots,\mathbf{e}_n\} indexed by the vertices of \mathrm{G}. In the language of quantum mechanics, by replacing the vertex set of \mathrm{G} with the Hilbert space \mathcal{H} we allow ourselves to consider not just vertices, but “superpositions” of vertices. Recall from Lecture 4 that the transition operator T \in \mathrm{End}\mathcal{H} for simple random walk on \mathrm{G} is defined by its action on the vertex basis as

T\mathbf{e}_i = \sum\limits_{j=1}^n \frac{m_{ij}}{d_i} \mathbf{e}_j,

where m_{ij} is the number of edges of \mathrm{G} incident to both i and j, and d_i=\sum_{j=1}^n m_{ij} is the total number of edges incident to i. In Lecture 4, we showed that

P^r(i,j) = \langle T^r\mathbf{e}_i,\mathbf{e}_j \rangle,

so that understanding the r \to \infty asymptotics of the probability P^r(i,j) is equivalent to understanding the r \to \infty asymptotics of the scalar product \langle T^r\mathbf{e}_i,\mathbf{e}_j \rangle. We would like to accomplish this via spectral analysis of T.

We adopt the same assumptions as in Lecture 4: \mathrm{G} is a connected graph on \{1,\dots,n\} and n \geq 2, so that in particular every vertex has strictly positive degree. Under these assumptions, we showed previously that, although T need not be selfadjoint, or even normal, the “symmetrized” transition operator D^{-1}TD is selfadjoint, with D \in \mathrm{GL}(\mathcal{H}) being the operator that acts diagonally in the vertex basis according to

D\mathbf{e}_j = \sqrt{d_j}\mathbf{e}_j.

We may thus invoke the spectral theorem to assert that

D^{-1}TD = \lambda_1P_1 + \dots + \lambda_mP_m,

where

\lambda_1 >\dots > \lambda_m,

are the distinct eigenvalues of the selfadjoint operator D^{-1}TD, with P_i \in \mathrm{End}\mathcal{H} the orthogonal projection of \mathcal{H} onto the \lambda_i-eigenspace. Because the P_i‘s are orthogonal projections, we have

D^{-1}T^rD = \lambda_1^rP_1 + \dots + \lambda_m^rP_m,

so that for the rth power of the transition operator we have the decomposition

T^r = \lambda_1^r DP_1D^{-1} + \dots + \lambda_m^r DP_mD^{-1}.
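
As a sanity check, here is a short Python sketch (illustrative only; the small graph is a hypothetical example and numpy is assumed) which builds T from a multiplicity matrix, symmetrizes it, and confirms the decomposition of T^r numerically.

import numpy as np

# multiplicity matrix m_ij of a small connected graph (a triangle 1-2-3 with a pendant edge 3-4)
m = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = m.sum(axis=1)                            # vertex degrees

T = (m / d[:, None]).T                       # matrix of T: <T e_i, e_j> = m_ij / d_i
Dmat = np.diag(np.sqrt(d))                   # the operator D
Dinv = np.diag(1 / np.sqrt(d))               # its inverse

S = Dinv @ T @ Dmat                          # the symmetrized operator D^{-1} T D
assert np.allclose(S, S.T)                   # it is selfadjoint
lam, V = np.linalg.eigh(S)                   # S = V diag(lam) V^T; grouping the columns of V
                                             # by eigenvalue yields the projections P_i

r = 5
Tr = Dmat @ V @ np.diag(lam ** r) @ V.T @ Dinv
assert np.allclose(Tr, np.linalg.matrix_power(T, r))
print("T^r recovered from the spectral decomposition of D^{-1} T D")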

So far everything we have said is valid for an arbitrary operator in \mathrm{End}\mathcal{H} whose orbit under the conjugation action of \mathrm{GL}(\mathcal{H}) contains a selfadjoint operator. To go further we naturally have to reference some of the particular features of the transition operator T.

Theorem 1: The largest eigenvalue of T is \lambda_1=1.

Proof: Let \|\cdot\|_1 be the 1-norm on \mathcal{H} corresponding to the basis \mathbf{e}_1,\dots,\mathbf{e}_n, i.e.

\|\mathbf{v}\|_1 = \sum\limits_{j=1}^n |\langle \mathbf{v},\mathbf{e}_j \rangle|.

We claim that T has norm at most 1 in the corresponding operator norm on \mathrm{End}\mathcal{H}, i.e.

\|T\mathbf{v}\|_1 \leq \|\mathbf{v}\|_1 \quad \forall \mathbf{v} \in \mathcal{H}.

Indeed, we have

\|T\mathbf{v}\|_1 = \sum\limits_{j=1}^n |\langle T\mathbf{v},\mathbf{e}_j\rangle| = \sum\limits_{j=1}^n \left| \left\langle \sum\limits_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}\rangle T\mathbf{e}_i,\mathbf{e}_j \right\rangle \right| \leq \sum\limits_{i=1}^n|\langle \mathbf{v},\mathbf{e}_i\rangle|\sum\limits_{j=1}^n |\langle T\mathbf{e}_i,\mathbf{e}_j\rangle|,

and the terminal double sum is exactly \|\mathbf{v}\|_1, since the entries \langle T\mathbf{e}_i,\mathbf{e}_j\rangle = \frac{m_{ij}}{d_i} are nonnegative and sum to 1 over j:

\sum\limits_{j=1}^n \langle T\mathbf{e}_i,\mathbf{e}_j \rangle = \sum\limits_{j=1}^n \frac{m_{ij}}{d_i} = \frac{d_i}{d_i}=1.

It follows that any eigenvalue of T has modulus at most 1: if T\mathbf{v}=\lambda\mathbf{v} with \mathbf{v} \neq 0, then |\lambda|\,\|\mathbf{v}\|_1 = \|T\mathbf{v}\|_1 \leq \|\mathbf{v}\|_1.

Now we exhibit an eigenvector of T with eigenvalue 1. Consider the vector

\mathbf{s} = \sum\limits_{i=1}^n \frac{d_i}{D}\mathbf{e}_i,

where

D=\sum\limits_{i=1}^n d_i

is the sum of the degrees of all vertices of \mathrm{G}, which is equal to twice the number of edges in \mathrm{G} less the number of loops (this numerical constant D is not to be confused with the diagonal operator D appearing in the spectral decomposition above). We then have

T\mathbf{s} = \sum\limits_{i=1}^n \frac{d_i}{D}T\mathbf{e}_i = \sum\limits_{i=1}^n \frac{d_i}{D}\sum\limits_{j=1}^n \frac{m_{ij}}{d_i} \mathbf{e}_j = \sum\limits_{j=1}^n \frac{d_j}{D}\mathbf{e}_j = \mathbf{s}.

— Q.E.D.

The eigenvector \mathbf{s} above is called the stationary vector for simple random walk on \mathrm{G}, and it encodes the probability measure on the vertices of \mathrm{G} which assigns mass \frac{d_i}{D} to vertex i. Note that, if \mathrm{G} is a simple graph with constant vertex degree equal to d, which is always the case for Cayley graphs corresponding to a symmetric generating set, then the stationary vector

\mathbf{s} = \sum\limits_{i=1}^n \frac{d}{dn}\mathbf{e}_i = \sum\limits_{i=1}^n \frac{1}{n}\mathbf{e}_i

corresponds to uniform probability measure on \{1,\dots,n\}.
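
Here is a small numerical check (a sketch with a hypothetical multigraph, not part of the lecture; numpy is assumed) that the vector with coordinates \frac{d_i}{D} is indeed fixed by T, loops and multiple edges included.

import numpy as np

# multiplicity matrix: a double edge between vertices 1 and 2, an edge between 2 and 3, a loop at 3
m = np.array([[0, 2, 0],
              [2, 0, 1],
              [0, 1, 1]], dtype=float)
d = m.sum(axis=1)                  # degrees [2, 3, 2]
D = d.sum()                        # 7 = 2*(number of edges) - (number of loops) = 2*4 - 1

T = (m / d[:, None]).T             # matrix of T: <T e_i, e_j> = m_ij / d_i
s = d / D                          # the stationary vector s
assert np.allclose(T @ s, s)       # T s = s
print("stationary distribution:", s)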

It is important to note that Theorem 1 does not preclude the possibility that the smallest eigenvalue of the transition operator T could also have maximum modulus — it could well be that \lambda_m=-1. However, this can only be the case if \mathrm{G} is bipartite (perhaps I should fill in a proof of this later, but if I don’t you can find one in any text on spectral graph theory). Therefore, we have the following.

Corollary 1: If \mathrm{G} is not bipartite, then

\lim\limits_{r \to \infty} \|T^r-DP_1D^{-1}\| = 0,

where \|\cdot\| is any norm on \mathrm{End}\mathcal{H}.

Proof: Let \|\cdot\| be an arbitrary norm on \mathrm{End}\mathcal{H} (indeed, since \mathrm{End}\mathcal{H} is finite-dimensional, all norms are equivalent). Since \lambda_1=1, we have T^r - DP_1D^{-1} = \sum_{i=2}^m \lambda_i^r DP_iD^{-1}, whence

\|T^r-DP_1D^{-1}\| \leq \sum\limits_{i=2}^m |\lambda_i|^r \|DP_iD^{-1}\|

by the triangle inequality. Since \mathrm{G} is not bipartite, we have |\lambda_i|<1 for i > 1, and the statement follows from the squeeze theorem.

— Q.E.D.

At this point we can finally say something about the probability P^r(i,j) = \langle T^r\mathbf{e}_i,\mathbf{e}_j \rangle that simple random walk on \mathrm{G} starting at i arrives at j at time r: for \mathrm{G} a connected non-bipartite graph on n \geq 2 vertices, the limit

t(i,j) = \lim\limits_{r \to \infty} P^r(i,j)

exists. Note that the role of bipartiteness here is quite clear from a combinatorial perspective: if \mathrm{G} is bipartite, then with probability one simple random walk started at i is located in the color class of i at even times, and in the opposite color class at odd times, so that the above limit obviously does not exist.
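
The following small Python experiment (illustrative only; the 4-cycle and the triangle are hypothetical examples and numpy is assumed) shows the dichotomy: on a bipartite graph the return probabilities oscillate with the parity of r, while on a non-bipartite graph they settle down to the stationary values.

import numpy as np

def transition(m):
    # transition matrix of simple random walk built from the multiplicity matrix m
    d = m.sum(axis=1)
    return (m / d[:, None]).T

cycle4 = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]], dtype=float)   # bipartite
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]], dtype=float)    # not bipartite

for name, m in [("4-cycle", cycle4), ("triangle", triangle)]:
    T = transition(m)
    returns = [np.linalg.matrix_power(T, r)[0, 0] for r in range(1, 9)]
    print(name, [round(p, 3) for p in returns])
# 4-cycle: P^r(1,1) alternates between 0 and 1/2, so the limit does not exist;
# triangle: P^r(1,1) converges to 1/3 = d_1 / D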

Continuing on under the assumption that \mathrm{G} is not bipartite, let us define a vector \mathbf{t}_i \in \mathcal{H} by

\mathbf{t}_i = \sum\limits_{j=1}^n t(i,j) \mathbf{e}_j,

which encodes the limit distribution of simple random walk on \mathrm{G} started at i.

Theorem 2: For any 1 \leq i \leq n, the limit vector \mathbf{t}_i is an eigenvector of T with eigenvalue 1.

Proof: We have

T\mathbf{t}_i = T\lim\limits_{r \to \infty} T^r \mathbf{e}_i = \lim\limits_{r \to \infty} T^{r+1} \mathbf{e}_i = \mathbf{t}_i.

— Q.E.D.

Theorem (Markov limit theorem): For \mathrm{G} a connected non-bipartite graph on \{1,\dots,n\}, n \geq 2, we have

\mathbf{t}_1=\mathbf{t}_2=\dots = \mathbf{t}_n = \mathbf{s}.

Proof: The statement follows if we can show that \lambda_1=1 is a simple eigenvalue of T, i.e. that the \lambda_1-eigenspace is a line. Indeed, all the vectors \mathbf{t}_i as well as \mathbf{s} have nonnegative coordinates summing to 1 relative to the vertex basis, so if any of these is a scalar multiple of another then the scalar in question must be 1.

The proof that the top eigenspace is indeed one-dimensional uses a bit of a deus ex machina, namely the Perron-Frobenius theorem. Look up this theorem, and complete the proof!

— Q.E.D.