Math 202C: Lecture 6

In Lecture 5, we worked out the Markov limit theorem which says that simple random walk on a connected, non-bipartite graph will converge to the stationary distribution on its vertices in the large time limit. This is an important general result, but it gives no understanding of how quickly this convergence occurs. Let us try to do this for \mathrm{K}(n), the complete graph on \{1,\dots,n\}.

Intuitively, because \mathrm{K}(n) is so highly connected, we expect convergence of simple random walk on this graph to the stationary measure to be very rapid. Indeed, at time r=1, a particle located at vertex 1 at time r=0 which performs simple random has equal probability of being at any vertex except 1, where it cannot be:

P^1(1,1) = 0,\ P^1(1,2)=\frac{1}{n-1},\dots,\ P^1(1,n) = \frac{1}{n-1}.

This seems close to the stationary distribution for simple random walk on \mathrm{K}(n), which according to the Markov limit theorem is uniform distribution

U(1) = \frac{1}{n},\ U(2) = \frac{1}{n},\dots,\ U(n)=\frac{1}{n}

on \{1,\dots,n\}, except for the aberration P^1(1,1)=0. However, this aberration continues indefinitely: there cannot be any finite time r such that

P^r(1,j)=U^r(j), \quad 1 \leq j \leq n,

because the particle has zero probability of being located at vertex j conditional on the event that it was located at vertex j at time r-1, and this event has positive probability. So, what really do we mean by “P^r gets close to U fast?”

Definition 1: If P,Q are probability measures on \{1,\dots,n\}, their total variation distance is

\mathrm{d}_{TV}(P,Q) = \frac{1}{2}\sum\limits_{i=1}^n |P(i)-Q(i)|.

This is quite an intuitive notion: the total variation distance simply sums up the discrepancy between P and Q over all points, giving a quantitative meaning to the total discrepancy between these measures. In particular, the maximum total variation occurs when P places all its mass at a single point, and Q places all its mass at a single different point; we then have

\mathrm{d}_{TV}(P,Q) =1.

We now have a clearly defined objective: estimate the total variation distance between the distribution of simple random walk on \mathrm{K}(n) at time r, and its stationary (uniform) distribution. Following the same pedagogical template as in our first few lectures, let us emphasize that this can be done in a completely elementary way, since we were able to obtain explicit formulas for counting walks on \mathrm{K}(n) just by being clever. Back in Lecture 1, we showed that the number of r step loops on \mathrm{K}(n) based at any vertex is

W^r(1,1) = \frac{1}{n}(n-1)^r +\frac{(-1)^r}{n}(n-1).

Now, since \mathrm{K}(n) is (n-1)-regular, the total number of r-step walks departing from 1 is (n-1)^r, and consequently the probability that simple random walk on \mathrm{K}(n) returns to its starting point at time r is

P^r(1,1) = \frac{1}{(n-1)^r}W^r(1,1) = \frac{1}{n} +\frac{(-1)^r}{n(n-1)^{r-1}},

from which we see directly that

\lim\limits_{r\to \infty} P^r(1,1) = \frac{1}{n},

exactly as the Markov limit theorem predicts. But actually, since we have simple explicit formulas, we can see more, namely we have the error estimate

\left| P^r(1,1)-\frac{1}{n}\right| = \frac{1}{n(n-1)^{r-1}},

which is actually an identity (the best kind of estimate). As for the number of r-step walks from 1 to 2, this is given by the complementary formula

W^r(1,2) = \frac{1}{n}(n-1)^r -\frac{(-1)^r}{n},


P^r(1,2) = \frac{1}{(n-1)^r}W^r(1,2) = \frac{1}{n} -\frac{(-1)^r}{n(n-1)^r}.

So again we immediately obtain the limit

\lim\limits_{r \to \infty} P^r(1,2) = \frac{1}{n},

along with the discrepancy formula

\left| P^r(1,2)-\frac{1}{n}\right| = \frac{1}{n(n-1)^r}.

All told, our maneuvering above comes down to the exact formula

\mathrm{d}_{TV}(P^r,U) = \frac{1}{n(n-1)^{r-1}}

for the total variation distance between the distribution of simple random walk on \mathrm{K}(n) at time r and the uniform distribution, which is its r \to \infty limit. Thus we know not just that P^r converges to r, but exactly how fast, as an explicit function both of spatial parameter n and the time parameter r.

Now let us reproduce the above formulas algebraically. We form the Hilbert space \mathcal{H} with orthonormal basis \{\mathbf{e}_1,\dots,\mathbf{e}_n\} labeled by the vertices of \mathrm{K}(n). From this point of view, a probability measure P on \{1,\dots,n\} is replaced with the vector \mathbf{p} \in \mathcal{H} defined by

\mathbf{p} = \sum\limits_{I=1}^n P(i) \mathbf{e}_i.

Note that even though P is probability measure on \{1,\dots,n\}, \mathbf{p} is not a unit vector with respect to the Hilbert space norm (or \ell^2) norm,

\|\mathbf{v}\|_2= \sqrt{\langle \mathbf{v},\mathbf{v} \rangle}.

Rather, it is a unit vector with respect to the \ell^1-norm, which is defined by

\|\mathbf{v}\|_1 = \sum\limits_{i=1}^n |\langle \mathbf{e}_i,\mathbf{v}\rangle|,

and it is this norm that corresponds to the total variation distance: for probability measures P,Q on \{1,\dots,n\} with corresponding probability vectors \mathbf{p},\mathbf{q} \in \mathcal{H}, we have

\mathrm{d}_{TV}(P,Q) = \frac{1}{2}\|\mathbf{p}-\mathbf{q}\|_1.

So, we would like to compute \|\mathbf{p}_r-\mathbf{u}\|_1, where

\mathbf{p}_r = \sum\limits_{j=1}^n P^r(1,j) \mathbf{e}_j

is the probability vector corresponding to the distribution of simple random walk on \mathrm{K}(n) at time r, and

\mathbf{u} = \sum\limits_{j=1}^n \frac{1}{n} \mathbf{e}_j

is the probability vector corresponding to the uniform distribution.

The adjacency operator A \in \mathrm{End}\mathcal{H} is defined by its action in the vertex basis,

A\mathbf{e}_j = \sum\limits_{\substack{i=1 \\ i\neq j}}^n \mathbf{e}_i, \quad 1 \leq j \leq n,

and since \mathrm{K}(n) is (n-1)-regular the transition operator for simple random walk on \mathrm{K}(n) is just T=\frac{1}{n-1}A. By inspection, we have

A\mathbf{u} = (n-1)\mathbf{u}

so that \mathbf{u} is an eigenvector of A with eigenvalue (n-1). Let \mathcal{U} be the line in \mathcal{H} spanned by \mathbf{u}. Since A is selfadjoint, \mathcal{U}^\perp is an invariant subspace of A, and in fact, as you saw in last week’s problem set, A acts in \mathcal{U}^\perp as multiplication by -1. Thus, the adjacency operator decomposes as

A = (n-1)P + (-1)Q,

where P,Q \in \mathrm{End}\mathcal{H} are the orthogonal projections of \mathcal{H} onto the complementary subspaces \mathcal{U} and \mathcal{U}^\perp, respectively. We thus have

A^r=(n-1)^rP + (-1)^rQ,

and hence

T^r= P +\frac{(-1)^r}{(n-1)^r}Q.

We now have the formula

\|\mathbf{p}_r-\mathbf{u}\|_1 = \|T^r\mathbf{e}_1-\mathbf{u}\|_1,

so it remains only to compute T^r\mathbf{e}_1 in order to get the total variation distance we seek. Now this is just

T^r\mathbf{e}_1 = P\mathbf{e}_1 +\frac{(-1)^r}{(n-1)^r}Q\mathbf{e}_1,

where P\mathbf{e}_1 is the orthogonal projection of \mathbf{e}_1 onto \mathcal{U} and Q\mathbf{e}_1 is the orthogonal projection of \mathbf{e}_1 onto \mathcal{U}^\perp. By basic linear algebra,

P \mathbf{e}_1 = \langle \mathbf{e}_1,\mathbf{u} \rangle \mathbf{u} = \frac{1}{n}\mathbf{u},

and hence for the complementary projection we have

Q\mathbf{e}_1 = (I-P)\mathbf{e}_1 = \mathbf{e}_1 -\frac{1}{n}\mathbf{u}.

This allows to explicitly compute the 1-norm of T^r\mathbf{e}_1-\mathbf{u}, though I have to admit I stopped short of doing the required simplifications to check that we get the same formulas as by the barehands method, up to a factor of 2.

Problem 6.1: Analyze simple random walk on \overline{\mathrm{K}}(n), the graph obtained by adding a loop at each vertex of \mathrm{K}(n). This is exactly the same thing as “lazy” simple random walk on \mathrm{K}(n), where the particle can either jump to an adjacent vertex or stay put.

The main message of this lecture is the following. We would like to analyze random walk on the Cayley graph of the symmetric group \mathrm{S}(n) as generated by the conjugacy class of transpositions. Of course, being a Cayley graph, this is a regular graph and hence the Markov limit theorem tells us that the distribution of the walk tends to uniform in the long-time limit. But in real world terms, this is just the statement that “after infinitely many shuffles, a deck of cards is randomized,” and now it is clear that this is not a useful statement: what we really want to be able to calculate is the total variation distance between the distribution of the walk at time r and uniform. This will allow us to quantitatively state how many shuffles are required to get within a stipulated distance of the uniform measure. This endeavor will require everything you learned in Math 202B, and will (hopefully) help to elucidate that material. In the next lecture, we will start down this road by analyzing the same question for some fairly simple abelian subgroups of \mathrm{S}(n).

1 Comment

Leave a Reply