Math 202C: Lecture 11

If I were to start this course over again, I would not limit the opening act to a discussion of the relationship between linear algebra and graph theory, but would instead begin with a more general discussion of the relations between multilinear algebra and graph theory. Since I sketched this briefly in our last in-person lecture, I will record what was said here in more precise detail.

Let \mathcal{H} be a Hilbert space with finite or countably infinite orthonormal basis \mathbf{e}_i. To make things simpler, let us consider the case where \dim \mathcal{H}=n < \infty, but everything we say below works (or can be made to work) for arbitrary separable Hilbert spaces. Let \mathbb{C}\Omega be the one-dimensional Hilbert space spanned by a unit vector \Omega which does not belong to \mathcal{H}.

Let d \in \mathbb{N}, and recall that \mathcal{H}^{\otimes d}, the dth tensor power of \mathcal{H}, is the vector space of degree d tensors

\mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d, \quad \mathbf{v}_i \in \mathcal{H},

and linear combinations thereof. This is again a Hilbert space, with the scalar product defined by

\langle \mathbf{v}_1 \otimes \dots \otimes \mathbf{v}_d,\mathbf{w}_1 \otimes \dots \otimes \mathbf{w}_d \rangle = \prod\limits_{i=1}^d \langle \mathbf{v}_i,\mathbf{w}_i \rangle.

In particular, an orthonormal basis of \mathcal{H}^{\otimes d} is given by the tensors

\mathbf{e}_i = \mathbf{e}_{i(1)} \otimes \dots \otimes \mathbf{e}_{i(d)}, \quad i \in \mathrm{Fun}(d,n),

where \mathrm{Fun}(d,n) is the set of all functions

\{1,\dots,d\} \longrightarrow \{1,\dots,n\}.

Thus every tensor \mathbf{t} \in \mathcal{H}^{\otimes d} is a linear combination

\mathbf{t} = \sum\limits_{i \in \mathrm{Fun}(d,n)} a_i \mathbf{e}_i.

The classical way to think about this is that \mathcal{H}^{\otimes d} is the space of complex-valued functions on d-tuples of vertices of a graph \Gamma with vertex set \{1,\dots,n\}. Alternatively, you can think of \mathcal{H}^{\otimes d} as the d-particle space corresponding to the one-particle space \mathcal{H}. From this point of view, each function

i \colon \{1,\dots,d\} \longrightarrow \{1,\dots,n\}

corresponds to an assignment of d distinguishable particles labeled 1,\dots,d to the vertices of \Gamma. If we are dealing with d quantum particles, meaning particles whose location cannot be definitely specified, then we must instead deal with a unit tensor

\mathbf{u}= \sum\limits_{i \in \mathrm{Fun}(d,n)} p_i \mathbf{e}_i

in \mathcal{H}^{\otimes d}, the idea being that

|p_i|^2 = |\langle \mathbf{e}_i,\mathbf{u}\rangle|^2

represents the probability of finding the distinguishable particles 1,\dots,d at vertices i(1),\dots,i(d), respectively. It is convenient to define \mathcal{H}^{\otimes 0} :=\mathbb{C}\Omega, where \Omega is the unit vector introduced above, called the “vacuum vector,” which corresponds to the absence of particles, i.e. a vacuum.

Now suppose that one would like to deal with a variable (but finite) number of particles: the tensor algebra F(\mathcal{H}) over \mathcal{H} is the algebraic structure that does this. It is defined as the direct sum of all tensor powers of \mathcal{H}:

F(\mathcal{H}) = \bigoplus\limits_{d=0}^\infty \mathcal{H}^{\otimes d}.

This is a graded algebra, i.e. we have a natural multiplication of tensors: if \mathbf{t}_1 \in \mathcal{H}^{\otimes d_1} and \mathbf{t}_2 \in \mathcal{H}^{\otimes d_2}, then \mathbf{t_1} \otimes \mathbf{t}_2 \in \mathcal{H}^{\otimes (d_1+d_2)}. Note that F(\mathcal{H}) is a noncommutative algebra, and it is also infinite-dimensional, even though \mathcal{H} is finite-dimensional.

Problem 11.1: Show that F(\mathcal{H}) is (non-canonically) isomorphic to the free algebra \mathbb{C}\langle x_1,\dots,x_n\rangle of rank n, which consists of polynomials in n noncommutative indeterminates.

Returning to the viewpoint that \mathcal{H}=\bigoplus_{i=1}^n\mathbb{C}\mathbf{e}_i is the one-particle space corresponding to a graph \Gamma with vertex set \{1,\dots,n\}, we encode the edge structure of \Gamma using the adjacency operator A \in \mathrm{End}\mathcal{H} defined by

A\mathbf{e}_i = \sum\limits_{j=1}^n m(i,j) \mathbf{e}_j,

where m(i,j) is the multiplicity of the edge \{i,j\} in \Gamma. As we have discussed, the matrix elements of A^r then count r-step walks in \Gamma, i.e.

\langle A^r\mathbf{e}_i,\mathbf{e}_j \rangle

is the number of r-step walks from vertex i to vertex j in \Gamma. Let us now observe that this naturally extends to any finite number of particles. Namely, for any d \in \mathbb{N}, we have the operator A^{\otimes d} \in \mathrm{End} \mathcal{H}^{\otimes d} with action defined by

A^{\otimes d} \mathbf{e}_i = A\mathbf{e}_{i(1)} \otimes \dots \otimes A\mathbf{e}_{i(d)}, \quad i \in \mathrm{Fun}(d,n).

Proposition 1: For any d \in \mathbb{N}, any i,j \in \mathrm{Fun}(d,n), and any r \in \mathbb{N} \cup \{0\}, the number of ways in which d distinguishable particles on \Gamma can move from positions i(1),\dots,i(d) to positions j(1),\dots,j(d) in r instants of time, where at each instant each particle jumps to an adjacent vertex, is given by the scalar product

\langle (A^r)^{\otimes d} \mathbf{e}_i,\mathbf{e}_j \rangle.

Proof: We have that

\langle (A^r)^{\otimes d} \mathbf{e}_i,\mathbf{e}_j \rangle = \langle A^r\mathbf{e}_{i(1)} \otimes \dots \otimes A^r\mathbf{e}_{i(d)},\mathbf{e}_{j(1)} \otimes \dots \otimes \mathbf{e}_{j(d)} \rangle = \prod\limits_{x=1}^d \langle A^r\mathbf{e}_{i(x)},\mathbf{e}_{j(x)} \rangle.

The xth factor \langle A^r\mathbf{e}_{i(x)},\mathbf{e}_{j(x)} \rangle in the product counts the number of ways in which particle x can travel from vertex i(x) to vertex j(x) in r jumps along the edges of \Gamma. Since the particles are non-interacting, the product of these numbers is the number of possible histories corresponding to the evolution of the particle system from initial state i to state j in time r.

— Q.E.D.
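As a quick numerical sanity check of Proposition 1 (an illustration, not part of the lecture; the choice of graph, of d and r, and the use of numpy are arbitrary assumptions), one can verify that the entries of (A^r)^{\otimes d} are products of single-particle walk counts:

```python
import numpy as np
from itertools import product

# Adjacency matrix of a small test graph Gamma: the triangle on 3 vertices.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

n, d, r = 3, 2, 3                          # two particles, three time steps
Ar = np.linalg.matrix_power(A, r)          # single-particle r-step walk counts

# Build (A^r)^{tensor d} as an iterated Kronecker product.
Ard = Ar
for _ in range(d - 1):
    Ard = np.kron(Ard, Ar)

# Proposition 1: the entry of (A^r)^{tensor d} indexed by multi-indices i, j
# is the product of the single-particle walk counts.
for i in product(range(n), repeat=d):
    for j in product(range(n), repeat=d):
        row = sum(i[x] * n ** (d - 1 - x) for x in range(d))
        col = sum(j[x] * n ** (d - 1 - x) for x in range(d))
        assert Ard[row, col] == np.prod([Ar[i[x], j[x]] for x in range(d)])
print("Proposition 1 checked on the triangle for d = 2, r = 3.")
```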

Now suppose that \mathbf{f}_1,\dots,\mathbf{f}_n is an orthonormal basis of \mathcal{H} consisting of eigenvectors of A:

A\mathbf{f}_1 = \lambda_1 \mathbf{f}_1,\dots,A \mathbf{f}_n = \lambda_n\mathbf{f}_n.

Then, the tensors

\mathbf{f}_i = \mathbf{f}_{i(1)} \otimes \dots \otimes \mathbf{f}_{i(d)}, \quad i \in \mathrm{Fun}(d,n)

are an orthonormal basis of \mathcal{H}^{\otimes d} consisting of eigenvectors of A^{\otimes d}:

A^{\otimes d}\mathbf{f}_i = A\mathbf{f}_{i(1)} \otimes \dots\otimes A\mathbf{f}_{i(d)} = \lambda_{i(1)} \dots \lambda_{i(d)} \mathbf{f}_i, \quad i \in \mathrm{Fun}(d,n).

If we agree to use the multi-index notation

\lambda_i := \lambda_{i(1)} \dots \lambda_{i(d)},

then this simply reads

A^{\otimes d} \mathbf{f}_i = \lambda_i \mathbf{f}_i, \quad i \in \mathrm{Fun}(d,n).

We then have the following.

Corollary 1: For any d \in \mathbb{N}, any i,j \in \mathrm{Fun}(d,n), and any r \in \mathbb{N} \cup \{0\}, the number of ways in which d distinguishable particles on \Gamma can move from positions i(1),\dots,i(d) to positions j(1),\dots,j(d) in r instants of time, where at each instant each particle jumps to an adjacent vertex, is given by

W^r(i,j) = \sum\limits_{k \in \mathrm{Fun}(d,n)} \overline{\langle \mathbf{f}_k,\mathbf{e}_i\rangle}\, \lambda_k^r\, \langle \mathbf{f}_k,\mathbf{e}_j \rangle.
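For completeness, here is the one-line derivation of Corollary 1 from Proposition 1: expanding \mathbf{e}_i in the eigenbasis as \mathbf{e}_i = \sum_{k \in \mathrm{Fun}(d,n)} \langle \mathbf{f}_k,\mathbf{e}_i \rangle \mathbf{f}_k and using (A^r)^{\otimes d}\mathbf{f}_k = \lambda_k^r \mathbf{f}_k together with the fact that the eigenvalues \lambda_k are real, we get

W^r(i,j) = \langle (A^r)^{\otimes d}\mathbf{e}_i,\mathbf{e}_j \rangle = \sum\limits_{k \in \mathrm{Fun}(d,n)} \overline{\langle \mathbf{f}_k,\mathbf{e}_i\rangle}\, \lambda_k^r\, \langle \mathbf{f}_k,\mathbf{e}_j \rangle.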

The point of the above discussion is that, associated to every graph \Gamma, there is not just a Hilbert space \mathcal{H} encoding the vertices and an operator A \in \mathrm{End}\mathcal{H} encoding the edges, but an infinite-dimensional noncommutative algebra F(\mathcal{H}) encoding \mathbb{C}-valued functions on ordered tuples of vertices of arbitrary finite length, and an operator F(A) \in \mathrm{End} F(\mathcal{H}), acting as A^{\otimes d} on the degree d component, whose action on these functions is determined by the adjacency structure of \Gamma. Moreover, the variable multi-particle construction is not really any different from the one-particle construction: all the formulas are the same up to replacing indices with multi-indices.

The reason that we are discussing the above is that it gives a nice conceptual understanding of why the spectral analysis of Cayley graphs is so much more accessible than the spectral analysis of arbitrary graphs. In the group case the vertex set of \Gamma is not an inert finite set, but rather a finite set \mathrm{G} which comes to us with a group law. Since this is a special case of the above, we can of course still form the corresponding one-particle space \mathcal{H} with orthonormal basis \mathbf{e}_\pi indexed by the group elements \pi \in \mathrm{G} (which may or may not have a natural linear order), but it is more appropriate to denote this space \mathcal{A} since it is already a unital, associative, and finite-dimensional algebra via bilinear extension of

\mathbf{e}_\pi \mathbf{e}_\sigma := \mathbf{e}_{\pi\sigma}.

We can still of course form the tensor algebra F(\mathcal{A}), but there is a natural map from F(\mathcal{A}) to its degree one component \mathcal{A} given by evaluation:

\mathbf{e}_{i(1)} \otimes \mathbf{e}_{i(2)} \otimes \dots \otimes \mathbf{e}_{i(d)} \to \mathbf{e}_{i(1)i(2)\dots i(d)}, \quad i \in \mathrm{Fun}(\{1,\dots,d\},\mathrm{G}),\ d \in \mathbb{N}.

Moreover, it is not only \mathcal{A} which is naturally a finite-dimensional algebra, but all of its tensor powers \mathcal{A}^{\otimes d}, via

(\mathbf{e}_{i(1)} \otimes \dots \otimes \mathbf{e}_{i(d)})(\mathbf{e}_{j(1)} \otimes \dots \otimes \mathbf{e}_{j(d)}) = \mathbf{e}_{i(1)j(1)} \otimes \dots \otimes \mathbf{e}_{i(d)j(d)},

which in multi-index notation we can succinctly write as \mathbf{e}_i\mathbf{e}_j = \mathbf{e}_{ij}. Moreover, if \mathrm{G} is an abelian group, things get even simpler and all these finite-dimensional algebras are commutative, and character theory gives a universal method to construct the eigenvalues and eigenvectors of the adjacency operator A and all of its tensor powers.

One demerit of the above is that the tensor algebra F(\mathcal{H}) is “unphysical,” in the sense that if we were dealing with natural particles they would not be distinguishable. Moreover, we would want to specify their exchange statistics, i.e. decide whether the particles populating the vertices of our graph \Gamma are bosons or fermions. This can be accomplished by letting the adjacency operator A act not on the tensor powers of the one-particle space \mathcal{H}, but rather on its symmetric and antisymmetric tensor powers. We will discuss this next time.

Math 202C: Lecture 10

In this Lecture, we analyze simple random walk on the group \mathrm{B}(k) of permutations generated by the disjoint transpositions

\tau_1=(1\ 2), \dots, \tau_k=(2k-1\ 2k).

The corresponding Cayley graph \Gamma=\mathrm{Cay}(\mathrm{B}(k),\{\tau_1,\dots,\tau_k\}) has vertex set \mathrm{B}(k), with \{\rho,\sigma\} an edge if and only if \sigma=\rho\tau_i for some 1 \leq i \leq k.

Simple random walk on \Gamma has an interesting equivalent formulation known as the Ehrenfest diffusion model, which is used in thermodynamics. The Ehrenfest model consists of two urns, U_1 and U_2, and k balls b_1,\dots,b_k. At time zero, all balls are in urn U_2. At each instant of discrete time thereafter, a ball is chosen uniformly at random, and moved to the other urn.

Problem 10.1: Explain the equivalence between Ehrenfest diffusion and simple random walk on \mathrm{B}(k).

Simple random walk on \Gamma begins with a single particle positioned at the identity permutation \iota. At each tick of the clock, the particle makes a random jump to an adjacent permutation, with equal probability of jumping to any neighboring vertex on the k-regular graph \Gamma. However, because \Gamma is bipartite, the distribution of this random walk does not converge.

Problem 10.2: Prove that \Gamma is a bipartite graph.

To fix this, we replace \Gamma with a new graph \overline{\Gamma}, which differs from \Gamma only by the addition of exactly one loop at each vertex, resulting in a (k+1)-regular graph with the same vertex set. This resolves the bipartiteness issue: by the Markov limit theorem, simple random walk on \overline{\Gamma} converges to uniform measure on \mathrm{B}(k). Note that one can equivalently think of simple random walk on \overline{\Gamma} as a “lazy” random walk on \Gamma, where at each time instant the particle either stays put or jumps to a neighboring vertex, with all k+1 of these events being equally likely.

We proceed to analysis of simple random walk on \overline{\Gamma}. Let \mathbb{C}\mathrm{B}(k) be the group algebra of \mathrm{B}(k), i.e. the Hilbert space with orthonormal basis

\{\pi \colon \pi \in \mathrm{B}(k)\}

and multiplication given by bilinear extension of the group law

\rho \sigma = \rho\sigma.

The probability operator P \in \mathrm{End} \mathbb{C}\mathrm{B}(k) for lazy simple random walk on \overline{\Gamma} is then implemented as multiplication by

\frac{1}{k+1}\iota + \frac{1}{k+1}\tau_1 + \dots + \frac{1}{k+1}\tau_k,

where \iota \in \mathrm{B}(k) is the identity permutation. In other words, the distribution of the particle at time r is given by the vector P^r\iota. We would like to compare this to the vector

u=\sum\limits_{\pi \in \mathrm{B}(k)} \frac{1}{2^k} \pi,

which corresponds to uniform measure on \mathrm{B}(k). More precisely, we would like to estimate the quantity

\|P^r\iota - u\|_1 = \sum\limits_{\pi \in \mathrm{B}(k)} \left| \langle P^r \iota,\pi\rangle -\frac{1}{2^k}\right|,

which, up to a factor of 1/2, coincides with the total variation distance between the distribution of the particle at time r and the uniform distribution.
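Before estimating this quantity, here is a brute-force computation of it, which may be useful for experimentation (a numerical illustration only, not part of the lecture; it identifies \mathrm{B}(k) with bitstrings of length k as in Problem 7.1 and assumes numpy):

```python
import numpy as np

# Exact distribution of lazy simple random walk on B(k), with B(k) identified
# with bitstrings of length k, and its 1-norm distance to the uniform measure.

def one_norm_distance(k, r):
    N = 1 << k                       # |B(k)| = 2^k
    p = np.zeros(N)
    p[0] = 1.0                       # start at the identity permutation
    for _ in range(r):
        q = p / (k + 1)              # with probability 1/(k+1): stay put...
        p = q.copy()
        for i in range(k):           # ...or multiply by tau_{i+1},
            p += q[np.arange(N) ^ (1 << i)]   # i.e. flip the ith bit
    return np.abs(p - 1.0 / N).sum()

k = 10
for r in (5, 10, 20, 40):
    print(r, one_norm_distance(k, r))
```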

We will get an upper bound on the above total variation distance using the eigenvalues and eigenvectors of P. We have ample information concerning the eigenvalues and eigenvectors of the operator P from Lecture 9, where we analyzed the operator A of multiplication by

\tau_1 + \dots + \tau_k,

which is the adjacency operator of \Gamma. We know that the eigenvalues of A are the numbers

k-2j, \quad 0 \leq j \leq k,

and that E_j, the eigenspace of k-2j, has dimension {k \choose j} (an orthogonal basis of E_j is given by the characters \chi_\pi of \mathrm{B}(k) indexed by permutations with |\pi|=j). Now, it is immediate to see that

P = \frac{1}{k+1}(I + A),

where I \in \mathrm{End}\mathbb{C}\mathrm{B}(k) is the identity operator, so that P has the same eigenspaces as A, and more precisely that P acts in E_j as multiplication by

\lambda_j=1-\frac{2j}{k+1}.

In particular, the largest eigenvalue \lambda_0 of P is 1 and it is a simple eigenvalue, which is consistent with Perron-Frobenius, and moreover \lambda_0 is the eigenvalue of largest absolute value, since the smallest eigenvalue is

\lambda_k= 1-\frac{2k}{k+1} > 1- \frac{2k}{k} = -1,

which is also consistent with Perron-Frobenius. We will now plug this information into the famous Upper Bound Lemma of Diaconis-Shahshahani to get an estimate on the total variation distance between the distribution of lazy simple random walk on \Gamma at time r, and the uniform distribution on the vertices of \Gamma. We skip the proof, and comment only that it follows from combining character expansion and the Cauchy-Schwarz inequality. A proof can be found in the “Harmonic Analysis on Finite Groups” reference text.

Theorem (Upper Bound Lemma): We have

\|P^r\iota - u \|_1^2 \leq \sum\limits_{j=1}^k \lambda_j^{2r} \dim E_j.

Here \lambda_j is the eigenvalue with which P acts on the eigenspace E_j; the sum starts at j=1 because the unique eigenvalue equal to 1, namely \lambda_0 (the eigenvalue of the trivial character \chi_\iota), is omitted.

Let us apply the Upper Bound Lemma. Right off the bat, it tells us that

\|P^r\iota - u\|_1^2 \leq \sum\limits_{j=1}^k {k \choose j} \left(1-\frac{2j}{k+1}\right)^{2r},

so we just have to estimate this sum of explicit quantities. Observing that

\left(1-\frac{2j}{k+1}\right)^{2r} = \left(1-\frac{2(k+1-j)}{k+1}\right)^{2r},

we have the estimate

\sum\limits_{j=1}^k {k \choose j} \left(1-\frac{2j}{k+1}\right)^{2r} \leq  2\sum\limits_{j=1}^{\lfloor \frac{k+1}{2} \rfloor} {k \choose j} \left(1-\frac{2j}{k+1}\right)^{2r},

and since \mathrm{d}_r, the total variation distance between the distribution of lazy simple random walk on \Gamma at time r and the uniform distribution, is half of the 1-norm \|P^r\iota - u\|_1, the factor of two which appears is absorbed by the factor of four coming from squaring this relation. Thus we get that \mathrm{d}_r satisfies

\mathrm{d}_r^2 \leq \sum\limits_{j=1}^{\lfloor \frac{k+1}{2} \rfloor} {k \choose j} \left(1-\frac{2j}{k+1}\right)^{2r}.

To estimate the right hand side, first observe that

{k \choose j} = \frac{k(k-1) \dots (k-j+1)}{j!} \leq \frac{k^j}{j!},

so that

\mathrm{d}_r^2 \leq \sum\limits_{j=1}^{\lfloor \frac{k+1}{2} \rfloor} \frac{k^j}{j!}\left(1-\frac{2j}{k+1}\right)^{2r}.

Using that

e^{-x} = 1-x+\frac{x^2}{2} - \dots \geq 1-x

for x \geq 0, we have

\left(1-\frac{2j}{k+1}\right)^{2r} \leq e^{-\frac{4jr}{k+1}},

so that we have arrived at the upper bound

\mathrm{d}_r^2 \leq \sum\limits_{j=1}^{\lfloor \frac{k+1}{2} \rfloor} \frac{k^j}{j!}e^{-\frac{4jr}{k+1}}.

Now observe that if r is large enough so that the exponentially small factor e^{-\frac{4jr}{k+1}} defeats the polynomially large factor k^j even for small values of j, then the distance \mathrm{d}_r will be very small. More precisely, if

r \geq \frac{1}{4}(k+1)(\log k + c)

for some constant c > 0, then

e^{-\frac{4jr}{k+1}} \leq k^{-j}e^{-cj},

and we have

\mathrm{d}_r^2 \leq \sum\limits_{j=1}^{\lfloor \frac{k+1}{2} \rfloor} \frac{1}{j!} e^{-cj} \leq \sum\limits_{j=1}^{\infty} \frac{1}{j!} e^{-cj} = e^{e^{-c}}-1,

a number exponentially close to 0, so that \mathrm{d}_r \leq \sqrt{e^{e^{-c}}-1} is itself exponentially small in c. Thus as soon as the number of steps r in lazy simple random walk on \Gamma exceeds the threshold

r=\frac{1}{4}(k+1)\log k,

the distribution of the walk becomes exponentially close to its limiting distribution. On the other hand, if instead

r \leq \frac{1}{4}(k+1)(\log k - c)

is below this threshold, then the total variation distance between the distribution of the walk and its limit distribution is still exponentially close to its initial value of 1. Thus the transition from not very random to exponentially close to total randomness happens suddenly, in a neighborhood of the “cutoff” \frac{1}{4}(k+1)\log k.

This so-called “cutoff phenomenon” was discovered by Diaconis and Shahshahani for card shuffles, i.e. random walks on the symmetric group, which we will discuss next. There is no known characterization of the class of Markov chains for which this phenomenon occurs; observe that to establish it for \mathrm{B}(k) we needed total knowledge of the eigenvalues and eigenvectors of the associated probability operator P. For random walks on groups this information can often be obtained thanks to representation theory, which is a testament to the power of symmetry. For random walks on graphs which are not Cayley graphs of groups, complete knowledge of eigenvalues and eigenvectors is typically unavailable, and it is correspondingly much harder to obtain effective bounds on the speed of convergence of random walk to its stationary distribution.

Math 202C: Lecture 9

We begin with a brief recap of the main result from Lecture 8. Let

\chi_\sigma = \sum\limits_{\pi \in \mathrm{B}(k)} \chi_\sigma(\pi) \pi, \quad \sigma \in \mathrm{B}(k)

be the characters of the rank k hypercube group

\mathrm{B}(k) = \langle (1\ 2), (3\ 4), \dots, (2k-1\ 2k) \rangle.

Definition 1: The orthogonal idempotents in the group algebra \mathbb{C}\mathrm{B}(k) are defined as the characters of \mathrm{B}(k) normalized by the order of \mathrm{B}(k):

\tilde{\chi}_\sigma = \frac{1}{2^k} \chi_\sigma, \quad \sigma \in \mathrm{B}(k).

The reason for the name is the following key property of these elements:

\tilde{\chi}_\rho \tilde{\chi}_\sigma = \tilde{\chi}_\rho [\rho = \sigma], \quad \rho,\sigma \in \mathrm{B}(k).

This is quite valuable, since expanding elements a \in \mathbb{C}\mathrm{B}(k) in the group algebra relative to the basis of orthogonal idempotents essentially makes multiplication in the group algebra trivial, which translates into the “Fourier coefficients of a product are products of Fourier coefficients” fact that we saw last Lecture. Note also that the behavior of orthogonal idempotents in the group algebra is formally identical to the behavior of orthogonal projections in Hilbert space — you perhaps know from Math 202B that this is not a coincidence, and if you don’t know this don’t worry, because we will discuss it.

The main result for today is the following.

Theorem 1: The class of multiplication operators in \mathrm{End}\mathbb{C}\mathrm{B}(k) coincides with the class of operators that are diagonal in the character basis of \mathbb{C}\mathrm{B}(k).

Proof: Let b \in \mathbb{C}\mathrm{B}(k) be an element of the group algebra, and let R_b \in \mathrm{End} \mathbb{C}\mathrm{B}(k) be the corresponding multiplication operator. Consider the character expansion of b,

b = 2^{-k} \sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,b\rangle \chi_\pi .

Now let \chi_\sigma \in \mathbb{C}\mathrm{B}(k) be another character. We have

R_b\chi_\sigma = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,b\rangle \chi_\sigma\chi_\pi = 2^{-k}\langle \chi_\sigma,b\rangle 2^k \chi_\sigma = \langle \chi_\sigma,b\rangle\chi_\sigma.

So, \chi_\sigma is an eigenvector of R_b, with eigenvalue the Fourier coefficient \langle \chi_\sigma,b\rangle.

Conversely, suppose that R \in  \mathrm{End}\mathbb{C}\mathrm{B}(k) acts diagonally in the character basis:

R \chi_\pi = \lambda_\pi \chi_\pi, \quad \pi \in \mathrm{B}(k).

Then the action of R on an arbitrary a \in \mathbb{C}\mathrm{B}(k) is

Ra = R \left( 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a \rangle \chi_\pi \right) = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a \rangle \lambda_\pi\chi_\pi = \left(  2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a \rangle \chi_\pi \right) \left(2^{-k}\sum\limits_{\sigma \in \mathrm{B}(k)} \lambda_\sigma \chi_\sigma \right),

so that R=R_b with

b= 2^{-k}\sum\limits_{\sigma \in \mathrm{B}(k)} \lambda_\sigma \chi_\sigma.

— Q.E.D.

With Theorem 1 in hand, we can immediately compute the eigenvalues and eigenvectors of the Cayley graph of \mathrm{B}(k). Indeed, the adjacency operator of the Cayley graph of \mathrm{B}(k) is just the multiplication operator

R_{\sum_{i=1}^k \tau_i},

so by Theorem 1 we immediately have that

R_{\sum_{i=1}^k \tau_i} \chi_\sigma = \langle \chi_\sigma,\sum\limits_{i=1}^k \tau_i \rangle \chi_\sigma, \quad \sigma \in \mathrm{B}(k).

Now, we have that the resulting eigenvalue is

\langle \chi_\sigma,\sum\limits_{i=1}^k \tau_i \rangle = \sum\limits_{i=1}^k \langle \chi_\sigma,\tau_i \rangle,

and this is a sum we can explicitly evaluate, because we explicitly computed the characters of \mathrm{B}(k) back in Lecture 7. In particular, what we know is that

\langle \chi_\sigma,\tau_i \rangle =\chi_\sigma(\tau_i)

is equal to -1 if \tau_i is a factor of \sigma, and is equal to +1 if \tau_i is not a factor of \sigma. Since we compute this scalar product for each and every generator \tau_i of \mathrm{B}(k), the sum has |\sigma| terms equal to -1, and k - |\sigma| terms equal to +1, where |\sigma| is the word norm of \sigma, or equivalently its distance to the identity permutation in the Cayley graph. So we get

\sum\limits_{i=1}^k \langle \chi_\sigma,\tau_i \rangle = k-2|\sigma|

as our explicit eigenvalue of the character \chi_\sigma under the action of the adjacency operator. Moreover, the number of elements \sigma \in \mathrm{B}(k) such that |\sigma|=j for a given 0 \leq j \leq k is simply {k \choose j}, since this just amounts to choosing j of the generators \tau_1,\dots,\tau_k of \mathrm{B}(k). So we can summarize our results as follows.

Theorem 2: The eigenvalues of the Cayley graph of \mathrm{B}(k) are the numbers

k-2j, \quad j=0,\dots,k,

and the dimension of the eigenspace corresponding to k-2j is {k \choose j}.
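Theorem 2 is easy to confirm numerically for small k (an illustration only; the choice k=6 and the use of numpy are arbitrary):

```python
import numpy as np
from collections import Counter
from math import comb

# Adjacency matrix of the Cayley graph of B(k): vertices are bitstrings of
# length k, and multiplication by tau_i corresponds to flipping the ith bit.
k = 6
N = 1 << k
A = np.zeros((N, N))
for v in range(N):
    for i in range(k):
        A[v, v ^ (1 << i)] = 1

# Theorem 2: the eigenvalues are k - 2j, with multiplicity C(k, j).
observed = Counter(int(round(x)) for x in np.linalg.eigvalsh(A))
expected = {k - 2 * j: comb(k, j) for j in range(k + 1)}
assert observed == expected
print(sorted(observed.items(), reverse=True))
```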

Since all eigenvalues of the Cayley graph of \mathrm{B}(k) are given explicitly by Theorem 2, we are in a position to explicitly enumerate walks on this graph, using the spectral approach to counting walks discussed previously.

Problem 9.1: Show that the number W^r(\rho,\sigma) of r-step walks on \mathrm{B}(k) from \rho to \sigma is

W^r(\rho,\sigma) = \frac{1}{2^k} \sum\limits_{i=0}^k \sum\limits_{j=0}^{|\rho\sigma|} (-1)^j {|\rho\sigma| \choose j} {k-|\rho\sigma| \choose i-j} (k-2i)^r.
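The formula can also be checked numerically, which is a useful sanity test though of course not a proof. A minimal sketch (identifying \mathrm{B}(k) with bitstrings as in Problem 7.1, so that |\rho\sigma| is the Hamming distance between \rho and \sigma; the parameters k=6, r=5 are arbitrary):

```python
import numpy as np
from math import comb

k, r = 6, 5
N = 1 << k

# Adjacency matrix of the hypercube B(k) and its exact r-step walk counts.
A = np.zeros((N, N), dtype=np.int64)
for v in range(N):
    for i in range(k):
        A[v, v ^ (1 << i)] = 1
W = np.linalg.matrix_power(A, r)

def formula(dist):
    # Right-hand side of Problem 9.1 with |rho sigma| = dist; terms with
    # i - j < 0 vanish and are skipped.
    total = 0
    for i in range(k + 1):
        for j in range(min(i, dist) + 1):
            total += (-1) ** j * comb(dist, j) * comb(k - dist, i - j) * (k - 2 * i) ** r
    return total // (1 << k)

# By vertex-transitivity it suffices to take rho equal to the identity.
for sigma in range(N):
    assert W[0, sigma] == formula(bin(sigma).count("1"))
print("Problem 9.1 formula confirmed numerically for k = 6, r = 5.")
```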

This material, and problem 9.1, have applications in coding theory which you might find interesting to look into.

Math 202C: Lecture 8

We are at the beginning of the theory of harmonic analysis on finite abelian groups, starting with the concrete example of the rank k hypercube group, which we have chosen to define as the permutation group \mathrm{B}(k) generated by the disjoint transpositions

\tau_1=(1\ 2), \tau_2=(3\ 4), \dots, \tau_k = (2k-1\ 2k).

Typically, analysts think of harmonic analysis on \mathrm{B}(k) (or any other group) as the study of the Hilbert space \mathbb{C}^{\mathrm{B}(k)} of functions

a \colon \mathrm{B}(k) \longrightarrow \mathbb{C}

equipped with the scalar product

\langle a,b \rangle = \sum\limits_{\pi \in \mathrm{B}(k)} \overline{a(\pi)}b(\pi),

so that one has a canonical orthonormal basis \{\delta_\pi\} of \mathbb{C}^{\mathrm{B}(k)} consisting of indicator functions. This space has the structure of an algebra when equipped with the convolution product of functions,

[a*b](\pi) = \sum\limits_{\sigma \in \mathrm{B}(k)} a(\pi\sigma^{-1}) b(\sigma),

which is exactly what one gets from bilinear extension of the group law in \mathrm{B}(k) lifted to the group element basis:

\delta_\rho * \delta_\sigma = \delta_{\rho\sigma}.

In particular, the convolution product on \mathbb{C}^{\mathrm{B}(k)} is quite different from the pointwise product. This is the right point of view to take when one is dealing with non-discrete groups (e.g. Lie groups), but for finite groups there is another way of writing things out which is often simpler and cleaner, and favored by algebraists.

The algebraists’ preferred Hilbert space is \mathbb{C}\mathrm{B}(k), the space of formal \mathbb{C}-linear combinations of permutations \pi \in \mathrm{B}(k),

a = \sum\limits_{\pi \in \mathrm{B}(k)} a(\pi)\pi.

Such linear combinations are of course equivalent to complex-valued functions on the group: you can think of the linear combination a as the generating function of the function \pi \mapsto a(\pi) on \mathrm{B}(k). From this perspective the canonical orthonormal basis of \mathbb{C}\mathrm{B}(k) consists of the permutations \pi \in \mathrm{B}(k) themselves (rather than their indicator functions), which induces the equivalent scalar product,

\langle a,b \rangle = \sum\limits_{\pi \in \mathrm{B}(k)} \overline{\langle \pi, a\rangle} \langle \pi,b \rangle.

The convolution product on \mathbb{C}\mathrm{B}(k) is likewise replaced with the equivalent multiplication (written simply as juxtaposition)

ab = \sum\limits_{(\rho,\sigma) \in \mathrm{B}(k)^2} \langle \rho,a\rangle \langle \sigma,b\rangle \rho\sigma =\sum\limits_{\pi \in \mathrm{B}(k)} \left(\sum\limits_{\sigma \in \mathrm{B}(k)} \langle \pi\sigma^{-1},a\rangle \langle \sigma,b \rangle \right)\pi.

In short, \mathbb{C}^{\mathrm{B}(k)} and \mathbb{C}\mathrm{B}(k) are isomorphic both as Hilbert spaces and as algebras, so the choice to use one rather than the other is a matter only of preference. In practice, it is quite useful to be comfortable with both the analytic and the algebraic way of doing things.

Yet another way of repackaging everything is the following probably too-general perspective. Let us zoom back out to the very beginning, where we started with an arbitrary finite graph \Gamma. The algebraic way to understand this combinatorial object is to replace the vertex set V of \Gamma with the algebra \mathbb{C}\langle x_v \colon v \in V \rangle of polynomials in noncommuting variables x_v indexed by the vertices v of \Gamma. In other words, this is the free algebra of rank |V|. This algebra is graded by degree, and we have a decomposition

\mathbb{C}\langle x_v \colon v \in V \rangle = \bigoplus_{d=0}^\infty \mathbb{C}^{(d)}\langle x_v \colon v \in V \rangle,

with the dth summand being the space of homogeneous polynomials of degree d. Observe that this vector space is isomorphic to the space of complex-valued functions on V^d, i.e. the space of functions on d-tuples of vertices, or equivalently formal linear combinations of d-tuples of vertices. In particular, the linear component \mathbb{C}^{(1)}\langle x_v \colon v \in V\rangle is isomorphic to either/both \mathbb{C}^V and \mathbb{C}V, and the adjacency operator acts on this linear component to capture the edge structure of \Gamma. Indeed, the free algebra is isomorphic to the tensor algebra of the vector space of functions on the vertex set V. In the case where the vertex set V of \Gamma is actually a group \mathrm{G}, we can mod out by the group law and consider the quotient

\mathbb{C}\langle x_g \colon g \in \mathrm{G} \rangle/\langle x_gx_h=x_{gh}\rangle

of the free algebra by the ideal corresponding to the group multiplication. This quotient consists solely of polynomials of degree 1, so functions on the group, or equivalently linear combinations of group elements, but now with the structure of an algebra.

Coming back to reality, the fact that \mathbb{C}\mathrm{B}(k) is an algebra and not just a vector space is reflected in the presence of a certain class of operators on the group algebra which are not present in the endomorphism algebra of a vector space devoid of multiplication.

Definition 1: Given b \in \mathbb{C}\mathrm{B}(k), the corresponding multiplication operator R_b\in \mathrm{End} \mathbb{C}\mathrm{B}(k) is defined by

R_ba = ab, \quad a \in \mathbb{C}\mathrm{B}(k).

We are using the letter “R” for multiplication operators because later on we will consider the group algebras of noncommutative groups, in which left and right multiplication do not necessarily coincide, and we have typically different left and right multiplication operators L_b and R_b. To put this the Math 202B way, the operator R_b is the image of b in the right regular representation of \mathbb{C}\mathrm{B}(k).

Note that multiplication operators corresponding to pure group elements (as opposed to linear combinations of group elements) are unitary operators on \mathbb{C}\mathrm{B}(k),

\langle \rho,R_\pi\sigma \rangle = \langle \rho,\sigma\pi \rangle = \langle \rho\pi^{-1},\sigma \rangle = \langle R_{\pi}^{-1} \rho,\sigma \rangle.

Moreover, since we are working with the permutation group \mathrm{B}(k), in which all permutations are actually involutions, R_\pi is both unitary and selfadjoint, hence has eigenvalues \pm 1.

The importance of multiplication operators in our quest to understand the eigenvalues and eigenvectors of the Cayley graph of \mathrm{B}(k) as generated by the transpositions \tau_1,\dots,\tau_k is the following.

Proposition 1: The adjacency operator of the Cayley graph of \mathrm{B}(k) is the multiplication operator R_t, where

t = \sum\limits_{i=1}^k \tau_i.

Proof: For any \rho \in \mathrm{B}(k), we have

R_t\rho = \rho t = \sum\limits_{i=1}^k \rho\tau_i,

which is exactly the sum of the permutations adjacent to \rho in the Cayley graph.

— Q.E.D.

In view of Proposition 1, the problem of understanding the eigenvalues and eigenvectors of the Cayley graph is that of understanding the eigenvalues and eigenvectors of

R_t = \sum\limits_{i=1}^k R_{\tau_i}.

To solve it, we will work out the spectral analysis of arbitrary multiplication operators on \mathbb{C}\mathrm{B}(k). The main tool in this endeavor is a second basis of the group algebra, which consists of special linear combinations \chi of group elements which have the property that

\langle \rho\sigma,\chi \rangle = \langle \rho,\chi \rangle \langle \sigma,\chi \rangle.

That is, \chi is the generating function of a group homomorphism

\mathrm{B}(k) \to \mathbb{C}^\times,

the multiplicative group of nonzero complex numbers. I like this way of stating things; in the analytic formalism, the words “character” and “homomorphism” mean exactly the same thing, a redundancy, whereas in the algebraic formalism “character” really means “generating function of a homomorphism.”

In Lecture 7, we explicitly worked out all homomorphisms from \mathrm{B}(k) to \mathbb{C}^\times, showing that they take values in the subgroup generated by -1, and are moreover parameterized by the points of \mathrm{B}(k). We also showed that the characters are orthogonal,

\langle \chi_\pi,\chi_\sigma \rangle = 2^k[\pi=\sigma], \quad \pi,\sigma \in \mathrm{B}(k).

In particular, for any a \in \mathbb{C}\mathrm{B}(k), we have the character expansion

a = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,a\rangle \chi_\pi.

The character expansion of a is also known as its Fourier expansion, with \langle \chi_\pi,a\rangle being the Fourier coefficients of a. The main reason that character expansions are “better” than expansions in the group basis is that the Fourier coefficients of a product ab are just the products of the corresponding Fourier coefficients of a and b.

The main reason that characters — which are generating functions of homomorphisms \mathrm{B}(k) \to \{-1,+1\} — are better than the original group elements is that their multiplication is extremely simple.

Theorem 1: For any \rho,\sigma \in \mathrm{B}(k), we have

\chi_\rho\chi_\sigma =0

if \rho \neq \sigma, and

\chi_\pi \chi_\pi = 2^k\chi_\pi.

Proof: Expanding in the permutation basis, we have

\chi_\rho\chi_\sigma = \sum\limits_{\pi_1 \in \mathrm{B}(k)} \left( \sum\limits_{\pi_2 \in \mathrm{B}(k)} \chi_\rho(\pi_1\pi_2^{-1}) \chi_\sigma(\pi_2) \right) \pi_1 = \sum\limits_{\pi_1 \in \mathrm{B}(k)} \left( \sum\limits_{\pi_2 \in \mathrm{B}(k)} \chi_\rho(\pi_1) \chi_\rho(\pi_2) \chi_\sigma(\pi_2) \right) \pi_1,

where we used that \chi_\rho is a homomorphism and \pi_2^{-1}=\pi_2. The result now follows from character orthogonality, which is Theorem 2 in Lecture 7.

— Q.E.D.
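To see Theorem 1 in the smallest possible case, take k=1, so that \mathrm{B}(1)=\{\iota,\tau_1\}, \chi_\iota = \iota + \tau_1, and \chi_{\tau_1} = \iota - \tau_1. Using \tau_1^2=\iota, we compute

\chi_\iota \chi_{\tau_1} = (\iota+\tau_1)(\iota-\tau_1) = \iota - \tau_1 + \tau_1 - \iota = 0, \quad \chi_{\tau_1}\chi_{\tau_1} = (\iota-\tau_1)(\iota-\tau_1) = 2\iota - 2\tau_1 = 2\chi_{\tau_1},

in agreement with the theorem (here 2^k=2).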

An immediate consequence is that the Fourier coefficients of a product are the products of the Fourier coefficients of the factors.

Corollary 1: For any a,b \in \mathbb{C}\mathrm{B}(k), we have

\langle \chi_\pi,ab\rangle = \langle \chi_\pi,a \rangle \langle \chi_\pi,b\rangle.

Proof: On one hand, the character expansion of the product ab is

ab = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle \chi_\pi,ab\rangle \chi_\pi.

On the other hand, we compute the product a and b via their character expansions:

ab = \left( 2^{-k}\sum\limits_{\rho \in \mathrm{B}(k)} \langle \chi_\rho,a\rangle \chi_\rho\right)\left( 2^{-k}\sum\limits_{\sigma \in \mathrm{B}(k)} \langle \chi_\sigma,b\rangle \chi_\sigma\right) \\= 4^{-k} \sum\limits_{(\rho,\sigma) \in \mathrm{B}(k)^2}\langle \chi_\rho,a\rangle\langle \chi_\sigma,b\rangle \chi_\rho\chi_\sigma \\ = 2^{-k}\sum\limits_{\pi \in \mathrm{B}(k)} \langle\chi_\pi,a\rangle \langle \chi_\pi,b\rangle \chi_\pi,

where we applied Theorem 1 to get the final equality. Since the characters are linearly independent, the result follows.

— Q.E.D.

In Lecture 9, we will apply the above to prove that M \in \mathrm{End} \mathbb{C}\mathrm{B}(k) is a multiplication operator if and only if it acts diagonally in the character basis. We will also find the corresponding eigenvalues, and hence count walks on \mathrm{B}(k). In Lecture 10 we will then analyze convergence of random walk on \mathrm{B}(k) to equilibrium.

Math 202C: Lecture 7

Let the symmetric group \mathrm{S}(n) on \{1,\dots,n\} be identified with its right Cayley graph as generated by the conjugacy class of transpositions: two permutations \rho,\sigma \in \mathrm{S}(n) are joined by a single edge if and only if \sigma = \rho\tau for some transposition \tau \in \mathrm{S}(n). We equip \mathrm{S}(n) with the corresponding word norm, so that |\pi| is the minimal length of a factorization of \pi into transpositions, or equivalently the length of a geodesic path from the identity permutation to \pi in the Cayley graph.

Our current two-part goal is to compute the number of walks of given length between two specified vertices in \mathrm{S}(n), and to estimate the total variation distance between the distribution of simple random walk on \mathrm{S}(n) and its stationary distribution, which is the uniform measure U(\pi)=\frac{1}{n!} on \mathrm{S}(n).

We will start by asking these same questions about certain simple subgroups of \mathrm{S}(n). Let k be a positive integer such that 2k \leq n, and let \mathrm{B}(k) be the subgroup of \mathrm{S}(n) generated by k disjoint transpositions, say

\tau_1=(1\ 2),\ \tau_2=(3\ 4),\ \dots, \tau_k = (2k-1\ 2k).

Thus \mathrm{B}(k) is an abelian group of order 2^k, whose elements are all permutations of the form

\pi =\tau_1^{e_1} \dots \tau_k^{e_k}, \quad e_i \in \{0,1\}.

For example,

\mathrm{B}(2) = \{(1)(2)(3)(4), (1\ 2), (3\ 4),(1\ 2)(3\ 4)\}.

We identify \mathrm{B}(k) with its Cayley graph, which is an induced subgraph of the Cayley graph of \mathrm{S}(n) often referred to as the k-dimensional hypercube graph.

Problem 7.1: Prove that \mathrm{B}(k) is isomorphic to the group \mathbb{Z}_2^k of bitstrings of length k, and that under this isomorphism the word norm |\pi| is given by the number of nonzero bits.

There are two ways to implement the linear algebraic approach to the study of (random) walks on the hypercube \mathrm{B}(k). The first employs the Hilbert space

\mathbb{C}\mathrm{B}(k) = \left\{\sum\limits_{\pi \in \mathrm{B}(k)} a_\pi \pi \colon a_\pi \in \mathbb{C}\right\}

of formal \mathbb{C}-linear combinations of permutations \pi \in \mathrm{B}(k), with the scalar product defined by declaring the permutation basis

\{\pi \colon \pi \in \mathrm{B}(k)\}

to be orthonormal. This is the space favored by algebraists. Analysts prefer to use the space

\mathbb{C}^{\mathrm{B}(k)} = \left\{ f \colon \mathrm{B}(k) \to \mathbb{C} \right\}

of \mathbb{C}-valued functions of permutations \pi \in \mathrm{B}(k), with the scalar product defined by declaring the indicator function basis

\{ \delta_\pi(\sigma) := [\pi=\sigma] \colon \pi \in \mathrm{B}(k)\}

to be orthonormal (the square bracket above is the Iverson bracket). These two spaces are isomorphic, with a Hilbert space isomorphism

\mathbb{C}^{\mathrm{B}(k)} \longrightarrow \mathbb{C}\mathrm{B}(k)

given by

f \mapsto \sum\limits_{\pi \in \mathrm{B}(k)} f(\pi) \pi.

Because \mathrm{B}(k) is not just a set, but a group, the corresponding Hilbert spaces \mathbb{C}\mathrm{B}(k) and \mathbb{C}^{\mathrm{B}(k)} are algebras. The multiplication in \mathbb{C}\mathrm{B}(k) is defined by

\left( \sum\limits_{\rho \in \mathrm{B}(k)} a_\rho \rho \right)\left( \sum\limits_{\sigma \in \mathrm{B}(k)} b_\sigma \sigma \right) = \sum\limits_{(\rho,\sigma) \in \mathrm{B}(k)^2} a_\rho b_\sigma \rho\sigma = \sum\limits_{\pi \in \mathrm{B}(k)} \left( \sum\limits_{\sigma \in \mathrm{B}(k)} a_{\pi\sigma^{-1}} b_\sigma\right)\pi.

The multiplication in \mathbb{C}^{\mathrm{B}(k)} is defined by

\left( f*g \right)(\pi) = \sum\limits_{\sigma \in \mathrm{B}(k)} f(\pi\sigma^{-1}) g(\sigma).

This latter multiplication is typically referred to as convolution because of its formal similarity to convolution of functions on the real line,

(f*g)(x) = \int_{-\infty}^{\infty} f(x-t)g(t)\mathrm{d}t,

and indeed this similarity is part of a meaningful analogy which we are just starting to discuss. It is clear that the isomorphism \mathbb{C}^{\mathrm{B}(k)} \to \mathbb{C}\mathrm{B}(k) discussed above is an algebra isomorphism, i.e. it intertwines the two product operations.

In the algebraic formulation, the adjacency operator A \in \mathrm{End}\mathbb{C}\mathrm{B}(k) is defined by its action on the permutation basis,

A\pi = \sum\limits_{i=1}^k \pi\tau_i.

In the functional implementation, the adjacency operator A \in \mathrm{End}\mathbb{C}^{\mathrm{B}(k)} is defined by its action on the indicator function basis,

A\delta_{\pi} = \sum\limits_{i=1}^k \delta_{\pi\tau_i}.

Both implementations of the linearization of \mathrm{B}(k) — meaning ways of embedding this group into a vector space — are useful, and it is advisable to keep both in mind. In what follows, the functional implementation is somewhat more natural, but we will move back and forth between the two freely.

Intuition leads us to believe that the keys to the spectrum of A must somehow come from the group structure of \mathrm{B}(k) — not just from linear algebra, but from the interaction of linear algebra and group theory. Indeed, inside the linearization \mathbb{C}^{\mathrm{B}(k)} as a function space, there is a natural family of functions linked to the lingering group structure.

Definition 1: The characters of \mathrm{B}(k) are the elements \chi of \mathbb{C}^{\mathrm{B}(k)} which are group homomorphisms

\chi \colon \mathrm{B}(k) \longrightarrow \mathbb{C}^\times

from the hypercube to the multiplicative group of nonzero complex numbers. The set of characters of \mathrm{B}(k) is called the dual of \mathrm{B}(k), and denoted \widehat{\mathrm{B}(k)}.

Much more generally, for any finite abelian group \mathrm{G}, one defines its Pontryagin dual \widehat{\mathrm{G}} to be the set of all group homomorphisms

\chi \colon \mathrm{G} \longrightarrow \mathbb{C}^\times,

these being referred to as the characters of \mathrm{G}. The Pontryagin dual can be defined for much more general abelian groups, but in this course we will almost exclusively deal with finite groups.

Returning to the case \mathrm{G}=\mathrm{B}(k) at hand, let us note that Definition 1 is actually even simpler and more natural than it appears, since all elements of \mathrm{B}(k) square to the identity, i.e. are involutions. This means that we have the constraint

\chi(\pi)^2 = \chi(\pi^2) = \chi(\iota) = 1,

so that in fact every value of any character \chi of \mathrm{B}(k) must be a square root of 1, i.e. either 1 or -1. From this it follows that the number of characters of \mathrm{B}(k) is at most 2^k, the order of \mathrm{B}(k), since each character \chi is determined by the values

\chi(\tau_i), \quad 1 \leq i \leq k,

and we have 2 choices, \pm 1, for each of these values. This suggests that we can in fact parameterize the characters of \mathrm{B}(k) by the elements of \mathrm{B}(k), which offers the beginnings of an explanation as to why \widehat{\mathrm{B}(k)} is called the dual of \mathrm{B}(k).

Theorem 1: The characters of \mathrm{B}(k) are precisely the functions defined by

\chi_{\tau_1^{x_1} \dots \tau_k^{x_k}}(\tau_1^{y_1} \dots \tau_k^{y_k}) = (-1)^{x_1y_1+\dots+x_ky_k}.

Proof: These putative characters are pairwise distinct and 2^k in number, which by the above is the maximum possible number of characters, so that it is sufficient to check that they are in fact group homomorphisms. But this is clear:

\chi_{\tau_1^{x_1} \dots \tau_k^{x_k}}(\tau_1^{y_1} \dots \tau_k^{y_k}) = (-1)^{x_1y_1+\dots+x_ky_k} = (-1)^{x_1y_1} \dots (-1)^{x_ky_k} = \chi_{\tau_1^{x_1} \dots \tau_k^{x_k}}(\tau_1^{y_1}) \dots \chi_{\tau_1^{x_1} \dots \tau_k^{x_k}}(\tau_k^{y_k}).

— Q.E.D.
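For example, when k=2, listing the values of each character on (\iota, \tau_1, \tau_2, \tau_1\tau_2) in that order, Theorem 1 gives

\chi_\iota = (+1,+1,+1,+1), \quad \chi_{\tau_1} = (+1,-1,+1,-1), \quad \chi_{\tau_2} = (+1,+1,-1,-1), \quad \chi_{\tau_1\tau_2} = (+1,-1,-1,+1),

and one sees directly that these four vectors are pairwise orthogonal, as Theorem 2 below asserts in general.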

Problem 7.2: Show that the multiplication defined by

\chi_{\rho}\chi_\sigma := \chi_{\rho\sigma}

gives \widehat{\mathrm{B}(k)} the structure of a group, and that this dual group is isomorphic to \mathrm{B}(k).

Theorem 2: The set \widehat{\mathrm{B}(k)} is an orthogonal basis of \mathbb{C}^{\mathrm{B}(k)}.

Proof: Since the cardinality of \widehat{\mathrm{B}(k)} equals the cardinality of \mathrm{B}(k), which is in turn the dimension of \mathbb{C}^{\mathrm{B}(k)}, it is sufficient to prove orthogonality. Let \rho,\sigma \in \mathrm{B}(k) be arbitrary. Then

\langle \chi_\rho,\chi_\sigma \rangle = \sum\limits_{\pi \in \mathrm{B}(k)} \chi_\rho(\pi)\chi_\sigma(\pi) = \sum\limits_{\pi \in \mathrm{B}(k)} \chi_{\rho\sigma}(\pi) = \langle \chi_{\rho\sigma},\chi_\iota\rangle,

where the complex conjugation in the scalar product has been dropped because the characters of \mathrm{B}(k) are \pm 1-valued, and we have used the explicit formula of Theorem 1 (cf. Problem 7.2); here \iota \in \mathrm{B}(k) denotes the identity permutation. So it is sufficient to consider scalar products of the form \langle \chi_\rho,\chi_\iota\rangle. In the case \rho=\iota, we have

\langle \chi_\iota,\chi_\iota \rangle = \sum\limits_{\pi \in \mathrm{B}(k)} \chi_\iota(\pi)\chi_\iota(\pi) = \sum\limits_{\pi \in \mathrm{B}(k)} 1 = 2^k.

If \rho \neq \iota, then there exists \sigma \in \mathrm{B}(k) such that \chi_\rho(\sigma)=-1. We then have

- \langle \chi_\rho,\chi_\iota \rangle = \chi_\rho(\sigma) \sum\limits_{\pi \in \mathrm{B}(k)} \chi_\rho(\pi) = \sum\limits_{\pi \in \mathrm{B}(k)} \chi_\rho(\sigma\pi) = \langle \chi_\rho,\chi_\iota \rangle,

whence \langle \chi_\rho,\chi_\iota \rangle=0.

— Q.E.D.
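Numerically, the full character table of \mathrm{B}(k) is the k-fold Kronecker power of the 2 \times 2 sign matrix with rows (1,1) and (1,-1) (a Walsh-Hadamard matrix), and the orthogonality just proved can be checked directly (an illustration only, assuming numpy; the choice k=4 is arbitrary):

```python
import numpy as np

# Character table of B(k): rows are the characters chi_sigma, columns the group
# elements pi (both identified with bitstrings), so the (x, y) entry is (-1)^(x.y).
k = 4
block = np.array([[1, 1], [1, -1]])
table = block
for _ in range(k - 1):
    table = np.kron(table, block)

# Orthogonality of characters: the Gram matrix of the rows is 2^k times the identity.
assert np.array_equal(table @ table.T, (1 << k) * np.eye(1 << k, dtype=int))
print("characters of B(4) are pairwise orthogonal, each of norm squared 16")
```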

Problem 7.3: Prove the dual orthogonality relation

\sum\limits_{\pi \in \mathrm{B}(k)} \chi_\pi(\rho)\chi_\pi(\sigma)= 2^k[\rho=\sigma].

In Lecture 8, we will link the characters of \mathrm{B}(k) to the adjacency operator of its Cayley graph.

Math 202C: Lecture 6

In Lecture 5, we worked out the Markov limit theorem which says that simple random walk on a connected, non-bipartite graph will converge to the stationary distribution on its vertices in the large time limit. This is an important general result, but it gives no understanding of how quickly this convergence occurs. Let us try to do this for \mathrm{K}(n), the complete graph on \{1,\dots,n\}.

Intuitively, because \mathrm{K}(n) is so highly connected, we expect convergence of simple random walk on this graph to the stationary measure to be very rapid. Indeed, at time r=1, a particle located at vertex 1 at time r=0 which performs simple random walk has equal probability of being at any vertex except 1, where it cannot be:

P^1(1,1) = 0,\ P^1(1,2)=\frac{1}{n-1},\dots,\ P^1(1,n) = \frac{1}{n-1}.

This seems close to the stationary distribution for simple random walk on \mathrm{K}(n), which according to the Markov limit theorem is the uniform distribution

U(1) = \frac{1}{n},\ U(2) = \frac{1}{n},\dots,\ U(n)=\frac{1}{n}

on \{1,\dots,n\}, except for the aberration P^1(1,1)=0. However, this aberration continues indefinitely: there cannot be any finite time r such that

P^r(1,j)=U(j), \quad 1 \leq j \leq n,

because the particle has zero probability of being located at vertex j conditional on the event that it was located at vertex j at time r-1, and this event has positive probability. So, what do we really mean by “P^r gets close to U fast?”

Definition 1: If P,Q are probability measures on \{1,\dots,n\}, their total variation distance is

\mathrm{d}_{TV}(P,Q) = \frac{1}{2}\sum\limits_{i=1}^n |P(i)-Q(i)|.

This is quite an intuitive notion: the total variation distance simply sums up the discrepancy between P and Q over all points, giving a quantitative meaning to the total discrepancy between these measures. In particular, the maximum total variation occurs when P places all its mass at a single point, and Q places all its mass at a single different point; we then have

\mathrm{d}_{TV}(P,Q) =1.

We now have a clearly defined objective: estimate the total variation distance between the distribution of simple random walk on \mathrm{K}(n) at time r, and its stationary (uniform) distribution. Following the same pedagogical template as in our first few lectures, let us emphasize that this can be done in a completely elementary way, since we were able to obtain explicit formulas for counting walks on \mathrm{K}(n) just by being clever. Back in Lecture 1, we showed that the number of r step loops on \mathrm{K}(n) based at any vertex is

W^r(1,1) = \frac{1}{n}(n-1)^r +\frac{(-1)^r}{n}(n-1).

Now, since \mathrm{K}(n) is (n-1)-regular, the total number of r-step walks departing from 1 is (n-1)^r, and consequently the probability that simple random walk on \mathrm{K}(n) returns to its starting point at time r is

P^r(1,1) = \frac{1}{(n-1)^r}W^r(1,1) = \frac{1}{n} +\frac{(-1)^r}{n(n-1)^{r-1}},

from which we see directly that

\lim\limits_{r\to \infty} P^r(1,1) = \frac{1}{n},

exactly as the Markov limit theorem predicts. But actually, since we have simple explicit formulas, we can see more, namely we have the error estimate

\left| P^r(1,1)-\frac{1}{n}\right| = \frac{1}{n(n-1)^{r-1}},

which is actually an identity (the best kind of estimate). As for the number of r-step walks from 1 to 2, this is given by the complementary formula

W^r(1,2) = \frac{1}{n}(n-1)^r -\frac{(-1)^r}{n},

whence

P^r(1,2) = \frac{1}{(n-1)^r}W^r(1,2) = \frac{1}{n} -\frac{(-1)^r}{n(n-1)^r}.

So again we immediately obtain the limit

\lim\limits_{r \to \infty} P^r(1,2) = \frac{1}{n},

along with the discrepancy formula

\left| P^r(1,2)-\frac{1}{n}\right| = \frac{1}{n(n-1)^r}.

All told, our maneuvering above comes down to the exact formula

\mathrm{d}_{TV}(P^r,U) = \frac{1}{n(n-1)^{r-1}}

for the total variation distance between the distribution of simple random walk on \mathrm{K}(n) at time r and the uniform distribution, which is its r \to \infty limit. Thus we know not just that P^r converges to U, but exactly how fast, as an explicit function of both the spatial parameter n and the time parameter r.

Now let us reproduce the above formulas algebraically. We form the Hilbert space \mathcal{H} with orthonormal basis \{\mathbf{e}_1,\dots,\mathbf{e}_n\} labeled by the vertices of \mathrm{K}(n). From this point of view, a probability measure P on \{1,\dots,n\} is replaced with the vector \mathbf{p} \in \mathcal{H} defined by

\mathbf{p} = \sum\limits_{i=1}^n P(i) \mathbf{e}_i.

Note that even though P is a probability measure on \{1,\dots,n\}, \mathbf{p} is not a unit vector with respect to the Hilbert space (or \ell^2) norm,

\|\mathbf{v}\|_2= \sqrt{\langle \mathbf{v},\mathbf{v} \rangle}.

Rather, it is a unit vector with respect to the \ell^1-norm, which is defined by

\|\mathbf{v}\|_1 = \sum\limits_{i=1}^n |\langle \mathbf{e}_i,\mathbf{v}\rangle|,

and it is this norm that corresponds to the total variation distance: for probability measures P,Q on \{1,\dots,n\} with corresponding probability vectors \mathbf{p},\mathbf{q} \in \mathcal{H}, we have

\mathrm{d}_{TV}(P,Q) = \frac{1}{2}\|\mathbf{p}-\mathbf{q}\|_1.

So, we would like to compute \|\mathbf{p}_r-\mathbf{u}\|_1, where

\mathbf{p}_r = \sum\limits_{j=1}^n P^r(1,j) \mathbf{e}_j

is the probability vector corresponding to the distribution of simple random walk on \mathrm{K}(n) at time r, and

\mathbf{u} = \sum\limits_{j=1}^n \frac{1}{n} \mathbf{e}_j

is the probability vector corresponding to the uniform distribution.

The adjacency operator A \in \mathrm{End}\mathcal{H} is defined by its action in the vertex basis,

A\mathbf{e}_j = \sum\limits_{\substack{i=1 \\ i\neq j}}^n \mathbf{e}_i, \quad 1 \leq j \leq n,

and since \mathrm{K}(n) is (n-1)-regular the transition operator for simple random walk on \mathrm{K}(n) is just T=\frac{1}{n-1}A. By inspection, we have

A\mathbf{u} = (n-1)\mathbf{u}

so that \mathbf{u} is an eigenvector of A with eigenvalue (n-1). Let \mathcal{U} be the line in \mathcal{H} spanned by \mathbf{u}. Since A is selfadjoint, \mathcal{U}^\perp is an invariant subspace of A, and in fact, as you saw in last week’s problem set, A acts in \mathcal{U}^\perp as multiplication by -1. Thus, the adjacency operator decomposes as

A = (n-1)P + (-1)Q,

where P,Q \in \mathrm{End}\mathcal{H} are the orthogonal projections of \mathcal{H} onto the complementary subspaces \mathcal{U} and \mathcal{U}^\perp, respectively. We thus have

A^r=(n-1)^rP + (-1)^rQ,

and hence

T^r= P +\frac{(-1)^r}{(n-1)^r}Q.

We now have the formula

\|\mathbf{p}_r-\mathbf{u}\|_1 = \|T^r\mathbf{e}_1-\mathbf{u}\|_1,

so it remains only to compute T^r\mathbf{e}_1 in order to get the total variation distance we seek. Now this is just

T^r\mathbf{e}_1 = P\mathbf{e}_1 +\frac{(-1)^r}{(n-1)^r}Q\mathbf{e}_1,

where P\mathbf{e}_1 is the orthogonal projection of \mathbf{e}_1 onto \mathcal{U} and Q\mathbf{e}_1 is the orthogonal projection of \mathbf{e}_1 onto \mathcal{U}^\perp. By basic linear algebra (keeping in mind that \mathbf{u} is not a unit vector in the 2-norm),

P \mathbf{e}_1 = \frac{\langle \mathbf{u},\mathbf{e}_1 \rangle}{\langle \mathbf{u},\mathbf{u} \rangle}\,\mathbf{u} = \frac{1/n}{1/n}\,\mathbf{u} = \mathbf{u},

and hence for the complementary projection we have

Q\mathbf{e}_1 = (I-P)\mathbf{e}_1 = \mathbf{e}_1 -\mathbf{u}.

This allows us to explicitly compute the 1-norm of T^r\mathbf{e}_1-\mathbf{u} and check that we recover the same formula as by the barehands method, up to a factor of 2.
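Carrying out the simplification: combining the last three displays gives

T^r\mathbf{e}_1-\mathbf{u} = \frac{(-1)^r}{(n-1)^r}\left(\mathbf{e}_1-\mathbf{u}\right),

so that

\|T^r\mathbf{e}_1-\mathbf{u}\|_1 = \frac{1}{(n-1)^r}\left( \left(1-\frac{1}{n}\right) + (n-1)\cdot\frac{1}{n}\right) = \frac{2}{n(n-1)^{r-1}} = 2\,\mathrm{d}_{TV}(P^r,U),

which is indeed twice the total variation distance computed by the barehands method.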

Problem 6.1: Analyze simple random walk on \overline{\mathrm{K}}(n), the graph obtained by adding a loop at each vertex of \mathrm{K}(n). This is exactly the same thing as “lazy” simple random walk on \mathrm{K}(n), where the particle can either jump to an adjacent vertex or stay put.

The main message of this lecture is the following. We would like to analyze random walk on the Cayley graph of the symmetric group \mathrm{S}(n) as generated by the conjugacy class of transpositions. Of course, being a Cayley graph, this is a regular graph and hence the Markov limit theorem tells us that the distribution of the walk tends to uniform in the long-time limit. But in real world terms, this is just the statement that “after infinitely many shuffles, a deck of cards is randomized,” and now it is clear that this is not a useful statement: what we really want to be able to calculate is the total variation distance between the distribution of the walk at time r and uniform. This will allow us to quantitatively state how many shuffles are required to get within a stipulated distance of the uniform measure. This endeavor will require everything you learned in Math 202B, and will (hopefully) help to elucidate that material. In the next lecture, we will start down this road by analyzing the same question for some fairly simple abelian subgroups of \mathrm{S}(n).

Math 202C: Lecture 5

We continue with our study of random walks on finite graphs via linear algebra. Let \mathrm{G} be a graph on \{1,\dots,n\}, possibly with loops and/or multiple edges. As in Lecture 4, let P^r(i,j) denote the probability that simple random walk on \mathrm{G} starting from vertex i is located at vertex j at time r. In this Lecture, we will use the linear algebra setup from Lecture 4 to analyze the r \to \infty asymptotic behavior of the probability P^r(i,j), which corresponds to the “long time behavior” of simple random walk on \mathrm{G}.

The linear algebraic setup is as follows. Let \mathcal{H} be a Hilbert space with orthonormal basis \{\mathbf{e}_1,\dots,\mathbf{e}_n\} indexed by the vertices of \mathrm{G}. In the language of quantum mechanics, by replacing the vertex set of \mathrm{G} with the Hilbert space \mathcal{H} we allow ourselves to consider not just vertices, but “superpositions” of vertices. Recall from Lecture 4 that the transition operator T \in \mathrm{End}\mathcal{H} for simple random walk on \mathrm{G} is defined by its action on the vertex basis as

T\mathbf{e}_i = \sum\limits_{j=1}^n \frac{m_{ij}}{d_i} \mathbf{e}_j,

where m_{ij} is the number of edges of \mathrm{G} incident to both i and j, and d_i=\sum_{j=1}^n m_{ij} is the total number of edges incident to i. In Lecture 4, we showed that

P^r(i,j) = \langle T^r\mathbf{e}_i,\mathbf{e}_j \rangle,

so that understanding the r \to \infty asymptotics of the probability P^r(i,j) is equivalent to understanding the r \to \infty asymptotics of the scalar product \langle T^r\mathbf{e}_i,\mathbf{e}_j \rangle. We would like to accomplish this via spectral analysis of T.

We adopt the same assumptions as in Lecture 4: \mathrm{G} is a connected graph on \{1,\dots,n\} and n \geq 2, so that in particular every vertex has strictly positive degree. Under this assumption, we showed previously that, although T need not be selfadjoint, or even normal, the “symmetrized” transition operator D^{-1}TD is selfadjoint, with D \in \mathrm{GL}(\mathcal{H}) being the operator that acts diagonally in the vertex basis according to

D\mathbf{e}_j = \sqrt{d_j}\mathbf{e}_j.

We may thus invoke the spectral theorem to assert that

D^{-1}TD = \lambda_1P_1 + \dots + \lambda_mP_m,

where

\lambda_1 >\dots > \lambda_m,

are the distinct eigenvalues of the selfadjoint operator D^{-1}TD, with P_i \in \mathrm{End}\mathcal{H} the orthogonal projection of \mathcal{H} onto the \lambda_i-eigenspace. Because the P_i‘s are orthogonal projections, we have

D^{-1}T^rD = \lambda_1^rP_1 + \dots + \lambda_m^rP_m,

so that for the rth power of the transition operator we have the decomposition

T^r = \lambda_1^r DP_1D^{-1} + \dots + \lambda_m^r DP_mD^{-1}.

So far everything we have said is valid for an arbitrary operator in \mathrm{End}\mathcal{H} whose orbit under the conjugation action of \mathrm{GL}(\mathcal{H}) contains a selfadjoint operator. To go further we naturally have to reference some of the particular features of the transition operator T.

Theorem 1: The largest eigenvalue of T is \lambda_1=1.

Proof: Let \|\cdot\|_1 be the 1-norm on \mathcal{H} corresponding to the basis \mathbf{e}_1,\dots,\mathbf{e}_n, i.e.

\|\mathbf{v}\|_1 = \sum\limits_{j=1}^n |\langle \mathbf{v},\mathbf{e}_j \rangle|.

We claim that T has norm at most 1 in the corresponding operator norm on \mathrm{End}\mathcal{H}, i.e.

\|T\mathbf{v}\|_1 \leq \|\mathbf{v}\|_1 \quad \forall \mathbf{v} \in \mathcal{H}.

Indeed, we have

\|T\mathbf{v}\|_1 = \sum\limits_{j=1}^n |\langle T\mathbf{v},\mathbf{e}_j\rangle| = \sum\limits_{j=1}^n \left| \left\langle \sum\limits_{i=1}^n \langle \mathbf{e}_i,\mathbf{v}\rangle T\mathbf{e}_i,\mathbf{e}_j \right\rangle \right| \leq \sum\limits_{i=1}^n|\langle \mathbf{v},\mathbf{e}_i\rangle|\sum\limits_{j=1}^n |\langle T\mathbf{e}_i,\mathbf{e}_j\rangle|,

and the terminal double sum is exactly \|\mathbf{v}\|_1: the matrix elements of T in the vertex basis are nonnegative, so that

\sum\limits_{j=1}^n |\langle T\mathbf{e}_i,\mathbf{e}_j \rangle| = \sum\limits_{j=1}^n \frac{m_{ij}}{d_i} = \frac{d_i}{d_i}=1.

It follows that any eigenvalue of T has modulus at most 1.

Now we exhibit an eigenvector of T with eigenvalue 1. Consider the vector

\mathbf{s} = \sum\limits_{i=1}^n \frac{d_i}{D}\mathbf{e}_i,

where

D=\sum\limits_{i=1}^n d_i

is the sum of the degrees of all vertices of \mathrm{G}, which is equal to twice the number of edges in \mathrm{G} less the number of loops. We then have

T\mathbf{s} = \sum\limits_{i=1}^n \frac{d_i}{D}T\mathbf{e}_i = \sum\limits_{i=1}^n \frac{d_i}{D}\sum\limits_{j=1}^n \frac{m_{ij}}{d_i} \mathbf{e}_j = \sum\limits_{j=1}^n \frac{d_j}{D}\mathbf{e}_j = \mathbf{s}.

— Q.E.D.

The eigenvector \mathbf{s} above is called the stationary vector for simple random walk on \mathrm{G}, and it encodes the probability measure on the vertices of \mathrm{G} which assigns mass \frac{d_i}{D} to vertex i. Note that, if \mathrm{G} is a simple graph with constant vertex degree equal to d, which is always the case for Cayley graphs corresponding to a symmetric generating set, then the stationary vector

\mathbf{s} = \sum\limits_{i=1}^n \frac{d}{dn}\mathbf{e}_i = \sum\limits_{i=1}^n \frac{1}{n}\mathbf{e}_i

corresponds to uniform probability measure on \{1,\dots,n\}.
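
As a quick numerical check of stationarity, again on the toy multigraph from the sketches above (which is not regular, so \mathbf{s} is not uniform there):

import numpy as np

# Same toy multigraph as in the earlier sketches.
m = np.array([[1., 2., 0.], [2., 0., 1.], [0., 1., 0.]])
d = m.sum(axis=1)
T = (m / d[:, None]).T

s = d / d.sum()               # stationary vector: mass d_i / D at vertex i
print(np.allclose(T @ s, s))  # True: T s = s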

It is important to note that Theorem 1 does not preclude the possibility that the smallest eigenvalue of the transition operator T could also have maximum modulus — it could well be that \lambda_m=-1. However, this can only be the case if \mathrm{G} is bipartite (perhaps I should fill in a proof of this later, but if I don’t you can find one in any text on spectral graph theory). Therefore, we have the following.

Corollary 1: If \mathrm{G} is not bipartite, then

\lim\limits_{r \to \infty} \|T^r-DP_1D^{-1}\| = 0,

where \|\cdot\| is any norm on \mathrm{End}\mathcal{H}.

Proof: Let \|\cdot\| be an arbitrary norm on \mathrm{End}\mathcal{H} (indeed, since \mathrm{End}\mathcal{H} is finite-dimensional, all norms are equivalent). Then for any r we have that

\|T^r-DP_1D^{-1}\| = \left\| \sum\limits_{i=2}^m \lambda_i^r DP_iD^{-1} \right\| \leq \sum\limits_{i=2}^m |\lambda_i|^r \|DP_iD^{-1}\|,

by the triangle inequality. But since \lambda_1=1 and |\lambda_i|<1 for i > 1, the statement follows from the squeeze theorem.

— Q.E.D.
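
Numerically, the convergence asserted by Corollary 1 is easy to watch on the toy multigraph from the earlier sketches, which is non-bipartite thanks to its loop; the norm below is the Frobenius norm, but by equivalence of norms any other choice would do.

import numpy as np

# Same toy multigraph as in the earlier sketches (connected and non-bipartite).
m = np.array([[1., 2., 0.], [2., 0., 1.], [0., 1., 0.]])
d = m.sum(axis=1)
T = (m / d[:, None]).T
D = np.diag(np.sqrt(d))
Dinv = np.diag(1.0 / np.sqrt(d))

evals, evecs = np.linalg.eigh(Dinv @ T @ D)
v1 = evecs[:, [-1]]                       # unit eigenvector for the top eigenvalue lambda_1 = 1
P1 = v1 @ v1.T                            # orthogonal projection onto the top eigenspace
limit = D @ P1 @ Dinv

for r in (1, 5, 20, 80):
    print(r, np.linalg.norm(np.linalg.matrix_power(T, r) - limit))   # tends to 0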

At this point we can finally say something about the probability P^r(i,j) = \langle T^r\mathbf{e}_i,\mathbf{e}_j \rangle that simple random walk on \mathrm{G} starting at i arrives at j at time r: for \mathrm{G} a connected non-bipartite graph on n \geq 2 vertices, the limit

t(i,j) = \lim\limits_{r \to \infty} P^r(i,j)

exists. Note that the role of bipartiteness here is quite clear from a combinatorial perspective: if \mathrm{G} is bipartite, then with probability one simple random walk started at i is located in the color class of i at even times, and in the opposite color class at odd times, so that the above limit obviously does not exist.

Continuing on under the assumption that \mathrm{G} is not bipartite, let us define a vector \mathbf{t}_i \in \mathcal{H} by

\mathbf{t}_i = \sum\limits_{j=1}^n t(i,j) \mathbf{e}_j,

which encodes the limit distribution of simple random walk on \mathrm{G} started at i.

Theorem 2: For any 1 \leq i \leq n, the limit vector \mathbf{t}_i is an eigenvector of T with eigenvalue 1.

Proof: We have

T\mathbf{t}_i = T\lim\limits_{r \to \infty} T^r \mathbf{e}_i = \lim\limits_{r \to \infty} T^{r+1} \mathbf{e}_i = \mathbf{t}_i,

where T passes inside the limit because T, being a linear operator on a finite-dimensional space, is continuous.

— Q.E.D.

Theorem (Markov limit theorem): For \mathrm{G} a connected non-bipartite graph on \{1,\dots,n\}, n \geq 2, we have

\mathbf{t}_1=\mathbf{t}_2=\dots = \mathbf{t}_n = \mathbf{s}.

Proof: The statement follows if we can show that \lambda_1=1 is a simple eigenvalue of T, i.e. that the \lambda_1-eigenspace is a line. Indeed, all the vectors \mathbf{t}_i as well as \mathbf{s} have nonnegative coordinates summing to 1 relative to the vertex basis, so if any of these is a scalar multiple of another then the scalar in question must be 1.

The proof that the top eigenspace is indeed one-dimensional uses a bit of a deus ex machina, namely the Perron-Frobenius theorem. Look up this theorem, and complete the proof!

— Q.E.D.
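
On the same toy multigraph, one can watch the Markov limit theorem happen: each column of T^r is the distribution at time r of the walk started at the corresponding vertex, and all of the columns settle down to the stationary vector.

import numpy as np

# Same toy multigraph as in the earlier sketches.
m = np.array([[1., 2., 0.], [2., 0., 1.], [0., 1., 0.]])
d = m.sum(axis=1)
T = (m / d[:, None]).T

print(np.linalg.matrix_power(T, 200).round(4))   # every column is (numerically) the same vector t_i ...
print((d / d.sum()).round(4))                    # ... namely the stationary vector s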

Math 202C: Lecture 4

In this Lecture and the next, we will discuss random walks on graphs. This is a randomized version of the discussion we had last week: we ask not for the number W^r(i,j) of r-step walks between two specified vertices i,j of a given graph \mathrm{G} — possibly with loops and/or multiple edges — on \{1,\dots,n\}, but rather for the probability P^r(i,j) that a particle initially situated at i which, at each instant of discrete time, jumps randomly to a neighboring vertex, with equal probability of jumping along any edge, will arrive at j at time r. We set P^0(i,j) := \delta_{ij}. Thus for each fixed vertex i and each fixed r \in \mathbb{Z}_{\geq 0}, we have a probability mass function P^r(i,\cdot) on the vertex set \{1,\dots,n\}, meaning that

\sum\limits_{j=1}^n P^r(i,j)= 1.

This probability mass function in turn defines a probability measure on subsets S \subseteq \{1,\dots,n\} according to

P^r(i,S) = \sum\limits_{j \in S} P^r(i,j).

Thus P^r(i,S) is the probability that the particle is located at some vertex in the set S at time r.

The number P^r(i,j) is a probabilistic version of the combinatorial number W^r(i,j), and the relationship between these two quantities is not totally straightforward.

Problem 4.1: Prove that W^r(i,j) = W^r(j,i), and show by example that sometimes P^r(i,j) \neq P^r(j,i).

As the above example shows, random walk on \mathrm{G} departing from a specified vertex i doesn’t have as much symmetry as its non-random version; this is because the probability of traversing a given walk depends on the degrees of the vertices it visits, whereas W^r(i,j) counts all walks with the specified boundary conditions equally. More precisely, a given walk \omega \in \Omega^r(i,j) is traversed by the particle with probability \prod_{x=0}^{r-1} \frac{1}{d_{\omega(x)}}, where d_v denotes the degree of the vertex v (see below for the precise definition), and consequently

P^r(i,j) = \sum\limits_{\omega \in \Omega^r(i,j)} \prod\limits_{x=0}^{r-1} \frac{1}{d_{\omega(x)}}.

In particular, P^r(i,j)=0 whenever r < \mathrm{d}(i,j), where \mathrm{d}(i,j) is the length of a geodesic path from i to j in \mathrm{G}. One could also study random walk on \mathrm{G} started at i and “conditioned” to arrive at j after r steps, which means studying a uniformly random sample from the set of walks counted by W^r(i,j), but this is quite different from simple random walk on \mathrm{G}. Let us note that, in the important special case where \mathrm{G} is a k-regular graph, every r-step walk is traversed with probability exactly \frac{1}{k^r}, so that one has the straightforward relation

P^r(i,j) = \frac{1}{k^r}W^r(i,j),

so that P^r(i,j) = P^r(j,i) for regular graphs.
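
Here is a small numerical check of this relation, taking the 4-cycle, a 2-regular graph of my own choosing, as the example.

import numpy as np

# Adjacency matrix of the 4-cycle, a 2-regular graph.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
k, r = 2, 6

W = np.linalg.matrix_power(A, r)        # W[i, j] = W^r(i,j), the number of r-step walks from i to j
P = np.linalg.matrix_power(A / k, r)    # r-step transition probabilities of simple random walk
print(np.allclose(P, W / k**r))         # True: P^r(i,j) = W^r(i,j) / k^r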

Problem 4.2: Compute P^r(i,j) for all values of i,j,r in the case that \mathrm{G}=\mathrm{K}(n) is the complete graph on \{1,\dots,n\}.

We will approach the study of random walks on graphs from a linear algebraic perspective, as we did with the enumeration of non-random walks. Let \mathrm{G} be a graph on \{1,\dots,n\}, where n \geq 2. We will assume that \mathrm{G} is connected, meaning that there exists at least one walk between every pair of vertices in \mathrm{G}. Employing the same construction as in the previous lectures, let \mathcal{H}=\mathcal{H}(\mathrm{G}) be a Hilbert space with orthonormal basis \{\mathbf{e}_1,\dots,\mathbf{e}_n\} labeled by the vertices of \mathrm{G}. We define the transition operator T \in \mathrm{End}\mathcal{H} by its action on the vertex basis, which is as follows:

T\mathbf{e}_i = \sum\limits_{j=1}^n \frac{m_{ij}}{d_i}\mathbf{e}_j,

where m_{ij} is the number of edges joining i to j in \mathrm{G}, and d_i is the degree of i, i.e. the total number of distinct edges incident to the vertex i (a loop contributes 1 to this count). The meaning of the ratio \frac{m_{ij}}{d_i} is that it is a conditional probability: it is the probability that the particle is located at vertex j at a specified time instant, given that it was located at vertex i immediately prior to this.

The connection between random walks on \mathrm{G} and the transition operator T is conceptually quite similar to the relationship between ordinary (i.e. non-random) walks on \mathrm{G} and the adjacency operator A, as given in Theorem 1 of Lecture 2, but there are some important differences. First, as is clear from the above discussion, the transition operator T is not necessarily selfadjoint, so one cannot invoke the spectral theorem to immediately conclude that it is diagonalizable (though this is the case, as proved below). Another difference is that, in the random walk setting, we would like to allow for the possibility that the point of departure of the walk (i.e. the location of the particle at time zero) is itself random. To encode this algebraically, we consider a vector \mathbf{p} \in \mathcal{H} such that

p_i:=\langle \mathbf{e}_i,\mathbf{p} \rangle \geq 0

for all i \in \{1,\dots,n\} and

\sum\limits_{i=1}^n p_i =1.

The vector \mathbf{p} represents the random location of the particle at time zero; note that taking \mathbf{p} to be one of the vectors in the vertex basis \{\mathbf{e}_1,\dots,\mathbf{e}_n\} corresponds to the situation where the initial position of the particle is deterministic. For each r \in \mathbb{Z}_{\geq 0} and j \in \{1,\dots,n\}, let P^r(\mathbf{p},j) denote the probability that a particle randomly evolving on \mathrm{G} (in the manner described above) is located at vertex j at time r, given that its position at time zero was the vertex i with probability \langle \mathbf{e}_i,\mathbf{p} \rangle. We set P^0(\mathbf{p},j) := p_j. In the case that \mathbf{p}=\mathbf{e}_i is a vector in the vertex basis, we default to our old notation and write P^r(i,j) = P^r(\mathbf{p},j).

Theorem 4.1: For each r \in \mathbb{Z}_{\geq 0}, we have

P^r(\mathbf{p},j) = \langle T^r\mathbf{p}, \mathbf{e}_j \rangle.

Proof: The proof is by induction on r. The case r=0 is immediate. For r \in \mathbb{N}, by the induction hypothesis we have

T^{r-1}\mathbf{p} = \sum\limits_{i=1}^n P^{r-1}(\mathbf{p},i)\mathbf{e}_i.

Thus

T^r\mathbf{p} = T\left(T^{r-1}\mathbf{p}\right) =\sum\limits_{i=1}^n P^{r-1}(\mathbf{p},i)T\mathbf{e}_i = \sum\limits_{i=1}^n P^{r-1}(\mathbf{p},i) \sum\limits_{j=1}^n \frac{m_{ij}}{d_i} \mathbf{e}_j = \sum\limits_{j=1}^n \left( \sum\limits_{i=1}^n P^{r-1}(\mathbf{p},i) \frac{m_{ij}}{d_i} \right)\mathbf{e}_j,

so that

\langle T^r\mathbf{p},\mathbf{e}_j \rangle = \sum\limits_{i=1}^n P^{r-1}(\mathbf{p},i) \frac{m_{ij}}{d_i}.

The ith term in this sum is the conditional probability that a particle performing simple random walk with initial distribution \mathbf{p} in \mathrm{G} is located at j at time r, given that it is located at i at time r-1; summing over i gives the unconditional probability of this event.

— Q.E.D.
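
As a sketch of Theorem 4.1 in action, the following snippet compares the exact probabilities \langle T^r\mathbf{p},\mathbf{e}_j \rangle with a Monte Carlo simulation of the walk, on the toy multigraph from the earlier sketches and with an initial distribution \mathbf{p} made up purely for illustration.

import numpy as np

# Same toy multigraph as in the earlier sketches.
m = np.array([[1., 2., 0.], [2., 0., 1.], [0., 1., 0.]])
d = m.sum(axis=1)
M = m / d[:, None]                      # row-stochastic matrix: M[i, j] = m_ij / d_i
T = M.T                                 # matrix of the operator T in the vertex basis
p = np.array([0.5, 0.3, 0.2])           # initial distribution (an arbitrary choice)
r = 4

exact = np.linalg.matrix_power(T, r) @ p        # coordinates of T^r p, i.e. the numbers P^r(p, j)

rng = np.random.default_rng(0)
trials = 100_000
counts = np.zeros(3)
for _ in range(trials):
    v = rng.choice(3, p=p)                      # random starting vertex, distributed according to p
    for _ in range(r):
        v = rng.choice(3, p=M[v])               # jump along a uniformly random edge incident to v
    counts[v] += 1

print(exact.round(3))
print((counts / trials).round(3))               # the two vectors agree up to Monte Carlo error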

Theorem 4.1 means that we can use the transition operator to analyze simple random walk on \mathrm{G} in much the same way as we used the adjacency operator — provided that we can diagonalize it. We conclude this lecture by showing that this is at least theoretically possible.

Theorem 4.2: There exists a basis of \mathcal{H} consisting of eigenvectors of the transition operator T, and moreover every eigenvalue of T is real.

Proof: Let D \in \mathrm{End}\mathcal{H} be the operator defined by

D\mathbf{e}_j = \sqrt{d_j}\mathbf{e}_j.

Since by hypothesis \mathrm{G} is connected, every vertex has strictly positive degree, and hence D \in \mathrm{GL}(\mathcal{H}), i.e. D is a vector space automorphism. Note that D is not necessarily a Hilbert space automorphism, because it need not be an isometry (you might want to think about when it is). The inverse of D is simply

D^{-1}\mathbf{e}_j = \frac{1}{\sqrt{d_j}} \mathbf{e}_j.

Consider now the operator \tilde{T} = D^{-1}TD. This is a “symmetrized” version of the transition operator T, in that it is selfadjoint. Indeed, we have

\langle \mathbf{e}_i, \tilde{T}\mathbf{e}_j \rangle = \frac{m_{ij}}{\sqrt{d_id_j}},

which is a symmetric function of i and j. By the spectral theorem, \mathcal{H} admits an orthonormal basis \{\tilde{\mathbf{f}}_1,\dots,\tilde{\mathbf{f}}_n\} such that

\tilde{T}\tilde{\mathbf{f}}_i = \lambda_i \tilde{\mathbf{f}}_i, \quad 1 \leq i \leq n,

with each \lambda_i \in \mathbb{R}. But since \tilde{T}=D^{-1}TD, this means that

T(D\tilde{\mathbf{f}}_i) = \lambda_i (D\tilde{\mathbf{f}}_i), 1 \leq i \leq n.

Since D \in \mathrm{GL}(\mathcal{H}), the vectors

\mathbf{f}_i = D\tilde{\mathbf{f}}_i, 1 \leq i \leq n

are a basis of \mathcal{H}, and as we have just seen each is an eigenvector of T, with the corresponding eigenvalue real.

— Q.E.D.
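
In matrix form, the proof just given is a short computation; here it is as a sketch, once more on the toy multigraph used in the earlier sketches.

import numpy as np

# Same toy multigraph as in the earlier sketches.
m = np.array([[1., 2., 0.], [2., 0., 1.], [0., 1., 0.]])
d = m.sum(axis=1)
T = (m / d[:, None]).T
D = np.diag(np.sqrt(d))
Dinv = np.diag(1.0 / np.sqrt(d))

evals, Ftilde = np.linalg.eigh(Dinv @ T @ D)    # orthonormal eigenbasis of the symmetrized operator
F = D @ Ftilde                                  # columns f_i = D f~_i form an eigenbasis of T
print(np.allclose(T @ F, F * evals))            # True: T f_i = lambda_i f_i with lambda_i real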

Math 202C: Lecture 3

In Lecture 2, we showed that the problem of computing the number W^r(i,j) of r-step walks on a finite undirected graph \mathrm{G} with vertex set \{1,\dots,n\} is equivalent to computing the matrix elements \langle \mathbf{e}_i,A^r \mathbf{e}_j \rangle of the rth power of the adjacency operator A \in \mathrm{End}\ \mathbb{C}\mathrm{G} relative to the vertex basis \{\mathbf{e}_1,\dots,\mathbf{e}_n\} of the Hilbert space \mathbb{C}\mathrm{G}. This translates the combinatorial problem of counting walks into the algebraic problem of computing the matrix elements of an operator in a specified basis. We now give this problem the Math 202A treatment.

Consider linear operators A_1,\dots,A_r acting in a finite-dimensional Hilbert space \mathcal{H}. Recall the following theorem from Math 202A.

Theorem 1: The operators A_1,\dots,A_r admit a common orthonormal eigenbasis if and only if \{A_1,A_1^*,\dots,A_r,A_r^*\} is a commutative family. In particular, a single operator admits an orthonormal eigenbasis if and only if it commutes with its adjoint, i.e. is a normal operator.

Now let \{\mathbf{e}_1,\dots,\mathbf{e}_n\} be a given orthonormal basis of \mathcal{H}, let A \in \mathrm{End}\mathcal{H} be a normal operator, let \{\mathbf{f}_1,\dots,\mathbf{f}_n\} be an orthonormal basis of \mathcal{H} such that

A \mathbf{f}_j = \lambda_j \mathbf{f}_j \quad 1 \leq j \leq n,

and let U \in \mathrm{Aut}\mathcal{H} be the automorphism of \mathcal{H} defined by

U\mathbf{f}_j = \mathbf{e}_j, \quad 1 \leq j \leq n,

i.e. U is the unitary operator which transforms the eigenvectors of A into our preferred basis of \mathcal{H}.

Problem 3.1: Prove that the operator U above is indeed unitary, which by definition means that it is invertible and preserves the scalar product.

Problem 3.2: Prove that U \in \mathrm{End} \mathcal{H} is unitary if and only if it is invertible and satisfies \langle \mathbf{v},U\mathbf{w}\rangle = \langle U^{-1}\mathbf{v},\mathbf{w} \rangle for all \mathbf{v},\mathbf{w} \in \mathcal{H}.

We now compute the matrix elements of a power A^r in the initial basis as follows:

\langle \mathbf{e}_i,A^r\mathbf{e}_j \rangle = \langle U\mathbf{f}_i,A^rU\mathbf{f}_j \rangle \\=\left\langle \sum\limits_{k=1}^n \langle \mathbf{f}_k,U\mathbf{f}_i\rangle \mathbf{f}_k,A^r\sum\limits_{l=1}^n \langle \mathbf{f}_l,U\mathbf{f}_j\rangle \mathbf{f}_l \right\rangle \\=\sum\limits_{k=1}^n\sum\limits_{l=1}^n \overline{\langle \mathbf{f}_k,U\mathbf{f}_i\rangle}\langle \mathbf{f}_l,U\mathbf{f}_j\rangle \langle \mathbf{f}_k,A^r\mathbf{f}_l\rangle \\ =\sum\limits_{k=1}^n\sum\limits_{l=1}^n \overline{\langle \mathbf{f}_k,U\mathbf{f}_i\rangle} \lambda_k^r \langle \mathbf{f}_l,U\mathbf{f}_j\rangle \langle \mathbf{f}_k,\mathbf{f}_l\rangle \\ =\sum\limits_{k=1}^n \overline{\langle \mathbf{f}_k,U\mathbf{f}_i\rangle} \lambda_k^r \langle \mathbf{f}_k,U\mathbf{f}_j\rangle \\ = \sum\limits_{k=1}^n \overline{\langle \mathbf{f}_k,\mathbf{e}_i\rangle} \lambda_k^r \langle \mathbf{f}_k,\mathbf{e}_j\rangle \\ =  \sum\limits_{k=1}^n \langle \mathbf{e}_i,\mathbf{f}_k\rangle \lambda_k^r \overline{\langle \mathbf{e}_j,\mathbf{f}_k\rangle}.

In particular, noting that the adjacency operator of a given undirected graph \mathrm{G} is selfadjoint and hence normal, we combine the above with Theorem 1 from Lecture 2 to obtain the following result.

Theorem 2: The number of r-step walks from vertex i to vertex j in \mathrm{G} is

W^r(i,j) = \sum\limits_{k=1}^n \langle \mathbf{e}_i,\mathbf{f}_k\rangle \lambda_k^r \overline{\langle \mathbf{e}_j,\mathbf{f}_k\rangle},

where \mathbf{f}_1,\dots,\mathbf{f}_n is an orthonormal basis of the Hilbert space \mathbb{C}\mathrm{G} consisting of eigenvectors of the adjacency operator, A\mathbf{f}_j=\lambda_j\mathbf{f}_j.

Remark: The adjacency operator A of a graph \mathrm{G} is not only selfadjoint: its matrix in the vertex basis is real and symmetric. Consequently, the eigenvectors \mathbf{f}_k may be chosen to have real coordinates relative to the vertex basis, and the complex conjugate can then be omitted from the formula.

Theorem 2 is a conceptually clear result, but actually applying it to a particular graph could be difficult, since one must find all spectral information, i.e. both the eigenvectors and the eigenvalues of the adjacency operator. For some “easy” graphs, this procedure is straightforward.
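
Numerically, of course, one can always hand the adjacency matrix to an eigensolver. Here is Theorem 2 checked on a small graph of my own choosing (the 4-cycle with one chord added), comparing the spectral formula with a direct matrix power.

import numpy as np

# Adjacency matrix of the 4-cycle with one chord added (vertices indexed from 0 in the code).
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [1., 1., 1., 0.]])
evals, F = np.linalg.eigh(A)            # columns of F: orthonormal eigenvectors f_k with A f_k = lambda_k f_k
r = 5

# W^r(i,j) = sum_k <e_i, f_k> lambda_k^r <e_j, f_k>   (real eigenvectors, so no conjugate is needed)
W_spectral = F @ np.diag(evals**r) @ F.T
print(np.allclose(W_spectral, np.linalg.matrix_power(A, r)))   # True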

Problem 3.3: Find the eigenvalues and eigenvectors of the adjacency operator of the complete graph \mathrm{K}(n) on \{1,\dots,n\}, and use Theorem 2 to recover the formulas for W^r(1,1) and W^r(1,2) previously derived using combinatorics and commutative algebra.

In order to make progress on the question of card shuffling as discussed in Lecture 1, we need to do two things. First, we need a version of Theorem 2 for random walks, and second we need to implement this for \mathrm{S}(n) rather than \mathrm{K}(n). Concerning the first point, this is trivial for regular graphs, being an instance of the following principle:

Probability is just combinatorics divided by n.

Gian-Carlo Rota

For non-regular graphs Rota’s principle fails, and there are some additional complications, which we will deal with next week. A more serious obstacle is that the Cayley graph of \mathrm{S}(n) as generated by the conjugacy class \mathrm{T} of transpositions is enormous and complicated: compared to \mathrm{K}(n), the number of vertices goes up from n to n!, and the edge structure is far more involved as well. It may thus seem impossible to implement Theorem 2 for \mathrm{S}(n). The reason it’s not impossible is that, unlike \{1,\dots,n\}, the vertex set \mathrm{S}(n) is not just a set but a group.

Indeed, the perspective that we are developing is that the representation theory of finite groups is the subject that one must invent in order to successfully use spectral graph theory to analyze Cayley graphs. It is not entirely historical fiction to say that this was one of the main motivations for the development of the theory: in the late 1800s, Adolf Hurwitz wanted to count walks on the Cayley graph of \mathrm{S}(n) as generated by the conjugacy class of transpositions in order to solve a problem in enumerative geometry, and he was able to make progress in doing this using the Fourier transform on \mathrm{S}(n). So, after a necessarily brief discussion about the linear-algebraic approach to random walks on finite graphs, we will begin developing Fourier analysis on groups, which is just representation theory viewed a little differently. We’ll start with some easy abelian groups, like the discrete circle (aka the subgroup \mathrm{C}(n) of \mathrm{S}(n) generated by the full forward cycle \gamma = (1\ 2\ \dots\ n)), and go from there.

Math 202C: Lecture 2

It is no secret that one can encode a finite undirected graph \mathrm{G} as a finite-dimensional Hilbert space together with a self-adjoint operator on that space. There are two ways to implement this: functional and formal. Suppose the vertex set of \mathrm{G} is \{1,\dots,n\}. The functional implementation is to take the Hilbert space to be

\mathbb{C}^{\mathrm{G}}=\{f \colon \{1,\dots,n\} \to \mathbb{C}\},

the space of complex-valued functions on the vertex set of the graph equipped with the scalar product

\langle f,g \rangle = \sum\limits_{i=1}^n \overline{f(i)} g(i),

and the self-adjoint operator to be the adjacency operator defined by

A\mathbf{1}_j = \sum\limits_{\{i,j\} \in E(\mathrm{G})} \mathbf{1}_i,

where \mathbf{1}_i \in \mathbb{C}^\mathrm{G} is the indicator function

\mathbf{1}_i(x) = \begin{cases} 1, \text{ if } x= i\\ 0, \text{ if } x \neq i \end{cases}.

The formal implementation is to take the Hilbert space to be

\mathbb{C}\mathrm{G} = \mathrm{Span} \{\mathbf{e}_1,\dots,\mathbf{e}_n\}

where \{\mathbf{e}_1,\dots,\mathbf{e}_n\} is an orthonormal set of abstract vectors indexed by the vertices of \mathrm{G}, and define the adjacency operator by

A \mathbf{e}_j = \sum\limits_{\{i,j\} \in E(\mathrm{G})} \mathbf{e}_i.

Both implementations are useful, and it’s wise to stay flexible and not be too wedded to either one. For the time being, we’ll go with the formal one.

The purpose of the above trade-off is to replace combinatorial questions about graphs with algebraic questions about linear operators: since linear algebra is such a powerful tool, this is often very helpful, and is an instructive example of how helpful it can be to “linearize” a problem. In particular, let us consider the problem of counting walks in graphs.

Definition 1: Given a graph \mathrm{G} on \{1,\dots,n\}, a pair of vertices i,j\in \{1,\dots,n\}, and a nonnegative integer r, an r-step walk from i to j in \mathrm{G} is a function \omega \colon \{0,\dots,r\} \to \{1,\dots,n\} such that \omega(0) = i, \omega(r)=j, and \omega(x),\omega(x+1) are adjacent vertices for all x \in \{0,\dots,r-1\}. We write \Omega^r(i,j) for the set of all such functions, and set W^r(i,j) = |\Omega^r(i,j)|.

Let us take \mathrm{G}=K_n, the complete graph on \{1,\dots,n\}, as a case study.

First, it is possible to count walks in the complete graph by direct combinatorial reasoning, without using any fancy linear algebra: thinking about the problem carefully reduces it to solving a system of two linear equations in two unknowns. Pick an arbitrary vertex of K_n, say the vertex 1. Observe that

\sum\limits_{j=1}^n W^r(1,j) = (n-1)^r,

since the sum on the left hand side counts all r-step walks beginning at 1, while the right hand side reflects the fact that such a walk amounts to choosing between n-1 options r times. Thinking about the left hand side a bit more, we realize that its terms are really only of two types: there’s the term W^r(1,1) which counts loops of length r based at 1, and then there are terms of the form W^r(1,j) with j \neq 1 corresponding to r-step walks which aren’t loops. Moreover, for any two vertices j_1,j_2 both of which are distinct from 1, we have that W^r(1,j_1) = W^r(1,j_2) because K_n is “everywhere the same.” More precisely, verify the following.

Problem 2.1: The map

\Omega^r(1,j_1) \longrightarrow \Omega^r(1,j_2)

defined by \omega \mapsto \tau \circ \omega, where \tau \in \mathrm{S}(n) is the transposition \tau = (j_1\ j_2), is a bijection.

In view of Problem 2.1, the above summation identity simplifies to

W^r(1,1) + (n-1)W^r(1,2) = (n-1)^r.

We thus have a linear equation with two unknowns: if we can find one more equation we’re done. Looking at small values of r, one observes that the relation

W^r(1,2) = W^r(1,1)+(-1)^{r-1}

appears to hold, and this can be verified by induction. Indeed, for the inductive step, we write

W^{r+1}(1,2) = \sum\limits_{j \neq 2} W^r(1,j),

since an (r+1)-step walk from 1 to 2 can be located at any vertex other than the target vertex 2 at its penultimate step, and also

W^{r+1}(1,1) = \sum\limits_{j \neq 1} W^r(1,j)

for exactly the same reason. Subtracting, we get

W^{r+1}(1,2) - W^{r+1}(1,1) = W^r(1,1)-W^r(1,2) = (-1)^r

by the induction hypothesis. Thus we obtain the formula

W^r(1,1) = \frac{(n-1)^r+(-1)^{r}(n-1)}{n},

for loops, and I leave it to you to complete the mission and obtain the formula for W^r(1,2).

To solve the above problem through the systematic use of linear algebra rather than combinatorial reasoning, we can make use of the following general result.

Theorem 1: Let \mathrm{G} be a finite undirected graph on \{1,\dots,n\}, and let A \in \mathrm{End}\ \mathbb{C}\mathrm{G} be its adjacency operator. Then for any i,j \in \{1,\dots,n\} and r \in \mathbb{Z}_{\geq 0}, we have

W^r(i,j) = \langle \mathbf{e}_i,A^r\mathbf{e}_j \rangle.

Proof: For r=0, this is obvious: since A^0=I is the identity operator, we have \langle\mathbf{e}_i,A^0\mathbf{e}_j \rangle = \delta_{ij}, and clearly W^0(i,j)=\delta_{ij}. For r \in \mathbb{N}, the result follows from (a special case of) the formula for the matrix elements of a product A_1\dots A_r of linear operators relative to a preferred basis:

\langle \mathbf{e}_i, A_1 \dots A_r \mathbf{e}_j \rangle = \sum\limits_{\substack{\omega \colon \{0,\dots,r\} \to \{1,\dots,n\} \\ \omega(0)=i, \omega(r)=j}} \langle \mathbf{e}_{\omega(0)}, A_1  \mathbf{e}_{\omega(1)} \rangle \langle \mathbf{e}_{\omega(1)}, A_2 \mathbf{e}_{\omega(2)} \rangle \dots \langle \mathbf{e}_{\omega(r-1)},A_r \mathbf{e}_{\omega(r)} \rangle.

In the particular case at hand, we have A_1=\dots=A_r=A, the adjacency operator, so that for any x \in \{1,\dots,r\} the scalar product \langle \mathbf{e}_{\omega(x-1)}, A \mathbf{e}_{\omega(x)} \rangle is equal to zero unless \omega(x-1) and \omega(x) are adjacent, in which case it is equal to 1. Thus we have

\langle \mathbf{e}_i, A^r \mathbf{e}_j \rangle = \sum\limits_{\omega \in \Omega^r(i,j)} 1 = W^r(i,j).

— Q.E.D.
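
Here is a brute-force numerical check of Theorem 1 on a small graph of my own choosing: enumerate all r-step walks directly and compare the count with the matrix element of A^r.

import numpy as np
from itertools import product

# A small graph of my own choosing: edges {1,2}, {1,3}, {2,3}, {2,4} (vertices indexed from 0 in the code).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
n, r, i, j = 4, 5, 0, 3

count = 0
for interior in product(range(n), repeat=r - 1):      # all candidate values of w(1), ..., w(r-1)
    walk = (i,) + interior + (j,)
    if all(A[walk[x], walk[x + 1]] for x in range(r)):
        count += 1

print(count, np.linalg.matrix_power(A, r)[i, j])       # the two numbers agree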

I’m sure you think you know what’s coming next: an invocation of the Spectral Theorem. However, let’s leave that for next lecture, and end this one by observing that sometimes Theorem 1 is useful simply because of the efficacy of algebraic operations with linear operators. In particular, in the case of K_n, we have A=J-I, where I,J \in \mathrm{End}\ \mathbb{C}\mathrm{K}_n are the operators

I\mathbf{e}_j = \mathbf{e}_j \quad\text{ and }\quad J\mathbf{e}_i = \sum_{j=1}^n \mathbf{e}_j.

Clearly, these operators commute, so we can compute A^r by means of the binomial theorem:

A^r=(J-I)^r = \sum\limits_{s=0}^r {r \choose s} J^s (-I)^{r-s} =  \sum\limits_{s=1}^r {r \choose s} (-1)^{r-s} J^s + (-I)^r.

Problem 2.2: Prove that J^s = n^{s-1}J.

Using Problem 2.2, we can continue our above computation to

A^r=(J-I)^r=  \frac{1}{n}\sum\limits_{s=1}^r{r \choose s}n^s (-1)^{r-s} J + (-1)^rI,

and using the binomial theorem again (in the other direction), this is

A^r=\frac{1}{n}\left((n-1)^r -(-1)^r\right) J + (-1)^rI.

In particular, we can compute the missing quantity W^r(1,2) from the combinatorial argument above as the scalar product

\langle \mathbf{e}_1,A^r \mathbf{e}_2 \rangle = \frac{1}{n}\left((n-1)^r -(-1)^r\right).
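
A quick numerical verification of this closed form, for a few small values of n and r (my own check, not part of the lecture):

import numpy as np

for n in (3, 4, 5):
    A = np.ones((n, n)) - np.eye(n)        # adjacency matrix of the complete graph: A = J - I
    for r in range(7):
        closed = ((n - 1)**r - (-1)**r) / n * np.ones((n, n)) + (-1)**r * np.eye(n)
        assert np.allclose(np.linalg.matrix_power(A, r), closed)
print("closed form for A^r verified")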

Of course, all of this follows from the decomposition A=J-I of the adjacency operator, which is a special feature of \mathrm{K}_n. For general graphs, we really do have to resort to spectral theory, and we will start down this road next lecture.