Math 202C: Lecture 5

We continue with our study of random walks on finite graphs via linear algebra. Let \mathrm{G} be a graph on \{1,\dots,n\}, possibly with loops and/or multiple edges. As in Lecture 4, let P^r(i,j) denote the probability that simple random walk on \mathrm{G} starting from vertex i is located at vertex j at time r. In this Lecture, we will use the linear algebra setup from Lecture 4 to analyze the r \to \infty asymptotic behavior of the probability P^r(i,j), which corresponds to the “long time behavior” of simple random walk on \mathrm{G}.

The linear algebraic setup is as follows. Let \mathcal{H} be a Hilbert space with orthonormal basis \{\mathbf{e}_1,\dots,\mathbf{e}_n\} indexed by the vertices of \mathrm{G}. In the language of quantum mechanics, by replacing the vertex set of \mathrm{G} with the Hilbert space \mathcal{H} we allow ourselves to consider not just vertices, but “superpositions” of vertices. Recall from Lecture 4 that the transition operator T \in \mathrm{End}\mathcal{H} for simple random walk on \mathrm{G} is defined by its action on the vertex basis as

T\mathbf{e}_i = \sum\limits_{j=1}^n \frac{m_{ij}}{d_i} \mathbf{e}_j,

where m_{ij} is the number of edges of \mathrm{G} incident to both i and j, and d_i=\sum_{j=1}^n m_{ij} is the total number of edges incident to i. In Lecture 4, we showed that

P^r(i,j) = \langle T^r\mathbf{e}_i,\mathbf{e}_j \rangle,

so that understanding the r \to \infty asymptotics of the probability P^r(i,j) is equivalent to understanding the r \to \infty asymptotics of the scalar product \langle T^r\mathbf{e}_i,\mathbf{e}_j \rangle. We would like to accomplish this via spectral analysis of T.
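
Before continuing, here is a minimal numerical illustration of this setup (a sketch in Python with NumPy, on a small example graph of my choosing). In matrix terms, the row-stochastic matrix with (i,j) entry m_{ij}/d_i encodes the same data as T (it is the transpose of the matrix of T in the vertex basis), and its rth power tabulates the probabilities P^r(i,j):

import numpy as np

# Adjacency matrix of a small connected graph on {1,2,3,4}: a 4-cycle
# with the chord {2,4}, so m_ij is 0 or 1 and d = (2, 3, 2, 3).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)

deg = A.sum(axis=1)       # d_i = number of edges incident to i
P = A / deg[:, None]      # P[i, j] = m_ij / d_i, the one-step probabilities

r = 5
Pr = np.linalg.matrix_power(P, r)
print(Pr[0])              # the probabilities P^5(1, j) for j = 1, 2, 3, 4
print(Pr[0].sum())        # each row sums to 1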

We adopt the same assumptions as in Lecture 4: \mathrm{G} is a connected graph on \{1,\dots,n\} and n \geq 2, so that in particular every vertex has strictly positive degree. Under this assumption, we showed previously that, although T need not be selfadjoint, or even normal, the “symmetrized” transition operator D^{-1}TD is selfadjoint, with D \in \mathrm{GL}(\mathcal{H}) being the operator that acts diagonally in the vertex basis according to

D\mathbf{e}_j = \sqrt{d_j}\mathbf{e}_j.

We may thus invoke the spectral theorem to assert that

D^{-1}TD = \lambda_1P_1 + \dots + \lambda_mP_m,

where

\lambda_1 >\dots > \lambda_m,

are the distinct eigenvalues of the selfadjoint operator D^{-1}TD, with P_i \in \mathrm{End}\mathcal{H} the orthogonal projection of \mathcal{H} onto the \lambda_i-eigenspace. Because the P_i's are orthogonal projections, we have

D^{-1}T^rD = \lambda_1^rP_1 + \dots + \lambda_m^rP_m,

so that for the rth power of the transition operator we have the decomposition

T^r = \lambda_1^r DP_1D^{-1} + \dots + \lambda_m^r DP_mD^{-1}.
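
Numerically, the symmetrization and the resulting spectral computation of T^r look as follows (continuing the Python sketch above; the symmetric matrix S below, with entries m_{ij}/\sqrt{d_id_j}, is exactly the matrix of the selfadjoint operator D^{-1}TD in the vertex basis):

# Continuing the previous snippet: A, deg, P as above.
S = A / np.sqrt(np.outer(deg, deg))       # matrix of D^{-1} T D; symmetric
assert np.allclose(S, S.T)

evals, U = np.linalg.eigh(S)              # S = U diag(evals) U^T
Dhalf = np.diag(np.sqrt(deg))
Dhalf_inv = np.diag(1.0 / np.sqrt(deg))

r = 5
Sr = U @ np.diag(evals**r) @ U.T          # spectral computation of S^r
Pr = Dhalf_inv @ Sr @ Dhalf               # conjugating back gives P^r
assert np.allclose(Pr, np.linalg.matrix_power(P, r))
print(evals)                              # the largest eigenvalue is 1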

So far everything we have said is valid for an arbitrary operator in \mathrm{End}\mathcal{H} whose orbit under the conjugation action of \mathrm{GL}(\mathcal{H}) contains a selfadjoint operator. To go further we naturally have to reference some of the particular features of the transition operator T.

Theorem 1: The largest eigenvalue of T is \lambda_1=1.

Proof: Let \|\cdot\|_1 be the 1-norm on \mathcal{H} corresponding to the basis \mathbf{e}_1,\dots,\mathbf{e}_n, i.e.

\|\mathbf{v}\|_1 = \sum\limits_{j=1}^n |\langle \mathbf{v},\mathbf{e}_j \rangle|.

We claim that T has norm at most 1 in the corresponding operator norm on \mathrm{End}\mathcal{H}, i.e.

\|T\mathbf{v}\|_1 \leq \|\mathbf{v}\|_1 \quad \forall \mathbf{v} \in \mathcal{H}.

Indeed, we have

\|T\mathbf{v}\|_1 = \sum\limits_{j=1}^n |\langle T\mathbf{v},\mathbf{e}_j\rangle| = \sum\limits_{j=1}^n \left| \left\langle \sum\limits_{i=1}^n \langle \mathbf{v},\mathbf{e}_i\rangle T\mathbf{e}_i,\mathbf{e}_j \right\rangle \right| \leq \sum\limits_{i=1}^n|\langle \mathbf{v},\mathbf{e}_i\rangle|\sum\limits_{j=1}^n |\langle T\mathbf{e}_i,\mathbf{e}_j\rangle|,

and the terminal double sum is exactly \|\mathbf{v}\|_1, since

\sum\limits_{j=1}^n \langle T\mathbf{e}_i,\mathbf{e}_j \rangle = \sum\limits_{j=1}^n \frac{m_{ij}}{d_i} = \frac{d_i}{d_i}=1.

It follows that any eigenvalue of T has modulus at most 1.

Now we exhibit an eigenvector of T with eigenvalue 1. Consider the vector

\mathbf{s} = \sum\limits_{i=1}^n \frac{d_i}{D}\mathbf{e}_i,

where

D=\sum\limits_{i=1}^n d_i

is the sum of the degrees of all vertices of \mathrm{G}, which is equal to twice the number of edges in \mathrm{G} less the number of loops. We then have

T\mathbf{s} = \sum\limits_{i=1}^n \frac{d_i}{D}T\mathbf{e}_i = \sum\limits_{i=1}^n \frac{d_i}{D}\sum\limits_{j=1}^n \frac{m_{ij}}{d_i} \mathbf{e}_j = \sum\limits_{j=1}^n \frac{d_j}{D}\mathbf{e}_j = \mathbf{s}.

— Q.E.D.

The eigenvector \mathbf{s} above is called the stationary vector for simple random walk on \mathrm{G}, and it encodes the probability measure on the vertices of \mathrm{G} which assigns mass \frac{d_i}{D} to vertex i. Note that, if \mathrm{G} is a simple graph with constant vertex degree equal to d, which is always the case for Cayley graphs corresponding to a symmetric generating set, then the stationary vector

\mathbf{s} = \sum\limits_{i=1}^n \frac{d}{dn}\mathbf{e}_i = \sum\limits_{i=1}^n \frac{1}{n}\mathbf{e}_i

corresponds to uniform probability measure on \{1,\dots,n\}.

It is important to note that Theorem 1 does not preclude the possibility that the smallest eigenvalue of the transition operator T could also have maximum modulus — it could well be that \lambda_m=-1. However, this can only be the case if \mathrm{G} is bipartite (perhaps I should fill in a proof of this later, but if I don’t you can find one in any text on spectral graph theory). Therefore, we have the following.

Corollary 1: If \mathrm{G} is not bipartite, then

\lim\limits_{r \to \infty} \|T^r-DP_1D^{-1}\| = 0,

where \|\cdot\| is any norm on \mathrm{End}\mathcal{H}.

Proof: Let \|\cdot\| be an arbitrary norm on \mathrm{End}\mathcal{H} (indeed, since \mathrm{End}\mathcal{H} is finite-dimensional, all norms are equivalent). Then for any r we have that

\|T^r-\lambda_1^rDP_1D^{-1}\| \leq \sum\limits_{i=2}^m |\lambda_i|^r \|DP_iD^{-1}\|,

by the triangle inequality. But since \lambda_1=1 and |\lambda_i|<1 for i > 1, the statement follows from the squeeze theorem.

— Q.E.D.

At this point we can finally say something about the probability P^r(i,j) = \langle T^r\mathbf{e}_i,\mathbf{e}_j \rangle that simple random walk on \mathrm{G} starting at i arrives at j at time r: for \mathrm{G} a connected non-bipartite graph on n \geq 2 vertices, the limit

t(i,j) = \lim\limits_{r \to \infty} P^r(i,j)

exists. Note that the role of bipartiteness here is quite clear from a combinatorial perspective: if \mathrm{G} is bipartite, then with probability one simple random walk started at i is located in the color class of i at even times, and in the opposite color class at odd times, so that the above limit obviously does not exist.

Continuing on under the assumption that \mathrm{G} is not bipartite, let us define a vector \mathbf{t}_i \in \mathcal{H} by

\mathbf{t}_i = \sum\limits_{j=1}^n t(i,j) \mathbf{e}_j,

which encodes the limit distribution of simple random walk on \mathrm{G} started at i.

Theorem 2: For any 1 \leq i \leq n, the limit vector \mathbf{t}_i is an eigenvector of T with eigenvalue 1.

Proof: We have

T\mathbf{t}_i = T\lim\limits_{r \to \infty} T^r \mathbf{e}_i = \lim\limits_{r \to \infty} T^{r+1} \mathbf{e}_i = \mathbf{t}_i.

— Q.E.D.

Theorem (Markov limit theorem): For \mathrm{G} a connected non-bipartite graph on \{1,\dots,n\}, n \geq 2, we have

\mathbf{t}_1=\mathbf{t}_2=\dots = \mathbf{t}_n = \mathbf{s}.

Proof: The statement follows if we can show that \lambda_1=1 is a simple eigenvalue of T, i.e. that the \lambda_1-eigenspace is a line. Indeed, all the vectors \mathbf{t}_i as well as \mathbf{s} have nonnegative coordinates summing to 1 relative to the vertex basis, so if any of these is a scalar multiple of another then the scalar in question must be 1.

The proof that the top eigenspace is indeed one-dimensional uses a bit of a deus ex machina, namely the Perron-Frobenius theorem. Look up this theorem, and complete the proof!

— Q.E.D.
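
With the same small example graph as in the sketches above (which is connected and not bipartite, since it contains a triangle), the Markov limit theorem is easy to witness numerically; this is of course a sanity check, not a proof:

# Every row of P^r converges to the stationary distribution (d_1/D, ..., d_n/D).
pi = deg / deg.sum()
Pr = np.linalg.matrix_power(P, 50)
print(np.abs(Pr - pi).max())    # very small: all rows are essentially pi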

Math 202C: Lecture 1

Since this is a course in applied algebra, let us begin with a real-world problem. Consider a deck of playing cards numbered 1,2,\dots,n arranged in a single row on a table. At each instant of discrete time, choose two cards independently at random and swap them. How long until you can be sure that the order of the cards is uniformly random?

As with every applied problem, in order to get started we have to think about how best to formulate the above as a precise mathematical question. This is what it means to “model” a real world problem.

First, each of the possible configurations of cards can be uniquely encoded as a permutation of the numbers 1,\dots,n — just write down the labels of the cards one at a time, from left to right, and you obtain a permutation of 1,\dots,n written in one-line notation. This is a good first step: we have identified the state space of the system we wish to understand, and it is the set \mathrm{S}(n) of permutations of 1,\dots,n.

Next, we need to make precise the way in which our system evolves in time, i.e. transitions from one state to another at each time instant. If the configuration of cards at time t is encoded by the permutation \pi \in \mathrm{S}(n), what are the possible states that our system can be in at time t+1? An obvious but in fact crucially important fact is that the answer to this question is not just combinatorial, but also algebraic: our configuration space \mathrm{S}(n) is not just a set, it is a group, the symmetric group of degree n. This means that the evolution equation we seek can be represented as multiplication in this group: given that the configuration of cards at time t is the permutation \pi, the possible states at time t+1 are precisely the permutations

\sigma = \pi(x\ y), \quad 1 \leq x < y \leq n,

where (x\ y) denotes the transposition in \mathrm{S}(n) which interchanges the numbers x,y \in \{1,\dots,n\}. This is a good time to fix our notational conventions. Permutations in \mathrm{S}(n) are bijective functions \{1,\dots,n\} \to \{1,\dots,n\}, and multiplication of permutations means composition of functions. The concatenation of two permutations \pi_1,\pi_2 means the following:

\pi_1\pi_2 := \pi_2 \circ \pi_1.

Although this looks messy, the reason for taking this convention rather than the seemingly more natural one \pi_2\pi_1=\pi_2 \circ \pi_1 is that we are more interested in doing computations “upstairs” where permutations are elements in a group, as opposed to “downstairs” where they are functions acting on points, and it is nice to multiply strings of elements from left to right.

It is useful to take the above one step further, and think of the symmetric group \mathrm{S}(n) as not just an algebraic object, but a geometric one, in the manner of geometric group theory. To do this, first solve the following problem.

Problem 1: The set \mathrm{T} \subset \mathrm{S}(n) of all transpositions \tau=(x\ y),\ 1 \leq x < y \leq n generates S(n). That is, for any permutation \pi \in \mathrm{S}(n), there exists a nonnegative integer k and transpositions \tau_1,\dots,\tau_k \in \mathrm{T} such that

\pi = \tau_1 \tau_2 \dots \tau_k.

With the solution to Problem 1 under our belts, we can define the word norm on \mathrm{S}(n) corresponding to the generating set \mathrm{T} by

\|\pi\|_T = \min \{k \in \mathbb{Z}_{\geq 0} \colon \pi=\tau_1\dots\tau_k,\ \tau_i\in \mathrm{T}\}.

Problem 2: Prove that \|\pi\sigma\|_{\mathrm{T}} \leq \|\pi\|_{\mathrm{T}} + \|\sigma\|_{\mathrm{T}}.

Once we have a norm on \mathrm{S}(n), meaning a way to measure the “size” of a permutation, it is very natural to define a corresponding notion of distance between two permutations by

\mathrm{d}_\mathrm{T}(\pi_1,\pi_2) = \|\pi_1^{-1}\pi_2\|_\mathrm{T}.

Problem 3: Check that the pair (\mathrm{S}(n),\mathrm{d}_\mathrm{T}) satisfies the metric space axioms.

To summarize, the state space \mathrm{S}(n) of the system we wish to understand is not just a set, but a group, and further not just a group, but a metric space. It is nice to have a pictorial representation of the metric space (\mathrm{S}(n),\mathrm{d}_T), and this is afforded by the corresponding Cayley graph \Gamma(\mathrm{S}(n),\mathrm{T}). The vertex set of \Gamma(\mathrm{S}(n),\mathrm{T}) is \mathrm{S}(n), and two vertices \pi,\sigma are adjacent if and only if there exists \tau \in \mathrm{T} such that \sigma = \pi\tau. Below is a picture of the Cayley graph of \mathrm{S}(4) as generated by \mathrm{T}=\{(1\ 2), (1\ 3), (2\ 3), (1\ 4),(2\ 4),(3\ 4)\}.

Cayley graph of S(4) generated by transpositions.

Let us note a few basic features of this graph. First, it is {n \choose 2}-regular, because |\mathrm{T}|={n \choose 2}. Second, it is a graded graph, meaning that its vertex set decomposes as a disjoint union of independent sets,

\mathrm{S}(n) = \bigsqcup_{k=0}^{n-1} L_k,

where L_k is the set of permutations whose disjoint cycle decomposition consists of n-k cycles. Third, from the geometric perspective, the kth level L_k of the Cayley graph is the sphere of radius k in \mathrm{S}(n) centered at the identity permutation,

L_k=\{ \pi \in \mathrm{S}(n) \colon \|\pi\|_T=k\}.
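
In fact the word norm has a closed form: \|\pi\|_{\mathrm{T}} equals n minus the number of cycles of \pi, which is exactly what the description of the levels L_k above says. Here is a brute-force verification for n=4 (a sketch in Python; permutations are tuples in one-line notation, and distances in the Cayley graph are computed by breadth-first search):

from itertools import permutations, combinations
from collections import deque

n = 4
identity = tuple(range(n))

def num_cycles(p):
    # number of cycles in the disjoint cycle decomposition of p
    seen, count = [False] * n, 0
    for i in range(n):
        if not seen[i]:
            count += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = p[j]
    return count

# Breadth-first search from the identity in Gamma(S(n), T); a neighbor is
# obtained by multiplying by a transposition, i.e. swapping two entries.
dist = {identity: 0}
queue = deque([identity])
while queue:
    p = queue.popleft()
    for x, y in combinations(range(n), 2):
        q = list(p)
        q[x], q[y] = q[y], q[x]
        q = tuple(q)
        if q not in dist:
            dist[q] = dist[p] + 1
            queue.append(q)

# The sphere of radius k is exactly the level L_k of permutations with
# n - k cycles, and the diameter is n - 1 (compare Problem 4 below).
assert all(dist[p] == n - num_cycles(p) for p in permutations(range(n)))
print(max(dist.values()))   # prints 3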

Problem 4: Prove that \max \{\mathrm{d}_\mathrm{T}(\pi,\sigma) \colon \pi,\sigma \in \mathrm{S}(n)\}=n-1.

At this point we have quite a clear and detailed understanding of the system we want to analyze: the state space of the system is the Cayley graph of \mathrm{S}(n) as generated by the set \mathrm{T} of transpositions, and the evolution occurring is discrete time random walk on this graph. Now we have to make precise what exactly we mean by “random walk” in this sentence.

The most natural thing to consider is simple random walk: at time zero, we have a particle situated at the identity permutation on \Gamma(\mathrm{S}(n),\mathrm{T}), and at each instant of discrete time the particle jumps to a neighboring vertex, with equal probability of selecting any neighbor. We would like to understand how many jumps must be made before the probability to observe the particle at any vertex of the graph is the same as at any other vertex. This is what it means for a deck of cards to be shuffled: every configuration is equally likely, so that a person choosing a card from the deck cannot do anything more than guess what this card is going to be. Now, there is an immediate problem: this will never happen. The reason is annoying but unavoidable: every permutation is either even or odd, meaning that for any \pi \in \mathrm{S}(n), all walks from the identity to \pi have either an even or an odd number of steps. So, at even times, the particle must be situated in the alternating group \mathrm{A}(n), and at odd times in its complement. But actually, this foible reminds us that we are trying to solve a real world problem in which two cards are selected independently and then swapped — it is possible that the two cards selected are actually the same, and no swap occurs.

Problem 5: Prove that, when the two cards are chosen independently and uniformly at random, the same card, no-swap event occurs with probability \frac{1}{n}.

Thus, what we really want to understand is the lazy random walk on \mathrm{S}(n), where at each instant the particle either stays put with the above probability, or jumps to an adjacent vertex, with equal probability of jumping in any direction. Analyzing this model, Persi Diaconis and Mehrdad Shahshahani were able to prove the following landmark result in applied algebra.

Theorem (Diaconis-Shahshahani): A deck of cards is fully shuffled after \frac{1}{2}n \log n transpositions.

Although the Diaconis-Shahshahani result is ostensibly a theorem in probability theory, one may legitimately refer to it as a theorem of applied algebra: its proof uses everything you learned about the representation theory of the symmetric group in Math 202B. Moreover, as far as the author knows, there is no way to obtain this result without using representation theory, i.e. by purely probabilistic reasoning.
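
If you would like to see the \frac{1}{2}n\log n timescale emerge empirically before we prove anything, here is a small exact computation (a sketch in Python for n=5, using the implementation of laziness in which the two positions are chosen independently and uniformly, so that nothing happens when they coincide). It iterates the lazy transition kernel started at the identity and prints the total variation distance to the uniform distribution:

from itertools import permutations

n = 5
perms = list(permutations(range(n)))
index = {p: k for k, p in enumerate(perms)}

def step(dist):
    # one lazy step: positions x, y chosen independently and uniformly,
    # then the cards at x and y are swapped (a no-op when x == y)
    new = [0.0] * len(perms)
    w = 1.0 / n**2
    for k, mass in enumerate(dist):
        if mass == 0.0:
            continue
        p = perms[k]
        for x in range(n):
            for y in range(n):
                q = list(p)
                q[x], q[y] = q[y], q[x]
                new[index[tuple(q)]] += mass * w
    return new

dist = [0.0] * len(perms)
dist[index[tuple(range(n))]] = 1.0     # start at the identity permutation
u = 1.0 / len(perms)
for r in range(1, 21):
    dist = step(dist)
    if r % 5 == 0:
        tv = 0.5 * sum(abs(m - u) for m in dist)
        print(r, round(tv, 4))         # total variation distance to uniform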

In this course, we will use simple stochastic processes like card shuffling as motivating questions whose solution leads to the development of harmonic analysis on finite groups. This is a fascinating application of algebra, and taking this route leads not just through the character theory of finite groups, but all the way to Gelfand pairs, spherical functions, and other related algebraic constructions. Even though you already know the representation theory of \mathrm{S}(n), so that in principle we could start with the Diaconis-Shahshahani theorem, we will instead begin with the analysis of random walks on much simpler groups, such as the discrete circle and hypercube. This motivates the development of character theory for finite abelian groups, where all irreducible representations are one-dimensional and the whole theory is carried by characters, and sets the stage nicely for the harmonic analysis interpretation of the representation theory of general finite groups.

Math 202C: Lecture 0

As a student enrolled in Math 202C, you have necessarily completed Math 202B under the tutelage of Professor Brendon Rhoades, and as such are an expert on the representation theory of the symmetric group \mathrm{S}(d). You may however not realize that this theory is directly applicable to natural and compelling real world problems, some of which are sufficiently rich and interesting that they can be adopted as a posteriori motivations for the development of the subject.

In this course, we are going to take another look at the representation theory of the symmetric group through the lens of the following real world question: how many times do you have to shuffle a deck of cards in order to completely randomize it? We will try to answer this question completely and thoroughly, down to the last detail, and in doing so expand on your existing knowledge of the representation theory of the symmetric group, as well as place this theory into the context of the more general subject known as harmonic analysis on finite groups. This is a beautiful and very classical branch of algebra which enriches the representation theory of finite groups you learned in Math 202B with new perspectives gleaned from the theory of the Fourier transform in classical analysis. Representation theory and harmonic analysis can be completely united, and the resulting subject is the representation theory of topological groups, which we may briefly discuss towards the end of the course. However, we will primarily remain in the finite setting where everything is totally algebraic and easy to understand without any sophisticated machinery.

The mechanics of the course are as follows. I will provide you with a complete text on harmonic analysis on finite groups, in the form of blog posts which will appear here every Monday and Wednesday, in the approximate neighborhood of our regularly scheduled lecture time. Thus there will be two posts per week; reading through and understanding each of these will likely require somewhat more effort than absorbing a standard in-person fifty minute lecture. These two lecture posts form the asynchronous component of the course. That leaves our Friday lecture slot, which will be a two hour live stream session: the first hour will be a flipped classroom style experience, rather than a planned live lecture, in which we can collaboratively review, discuss, and expand upon the material from the lecture posts. The second hour will be my office hours, so open to any and all questions. In practice, it is likely that there won’t be much of a distinction between the first and second hour, but batching the lecture slot and office hours gives us a good chunk of time to work through things. This all happens on the Math 202C Discord server, which will be our main means of communication throughout the quarter: please sign up immediately using this link, https://discord.gg/SRKNEvNP. When signing up, please use your real name as your username.

The grading scheme for the course is as follows. The lecture posts will contain embedded problems for solution. You will be expected to solve these problems, typeset your solutions in LaTeX, and submit them for grading each week, prior to Sunday at midnight. The exact method for submission and return of your graded solutions will be decided this week, and announced on the Discord server. Furthermore, you will write a “guest lecture post” to be published on this blog on a topic related to the course material; I will prepare a list of possible topics, or you may choose your own topic and clear it with me. Your grade for the course will be an average of your graded problem solutions and your guest lecture post, and you may choose the weighting of this average however you wish, anywhere from 100% problem solutions to 100% guest lecture post. Thus, at the extremes, if you prefer to solve problems and have your solutions graded, you can skip the lecture post, and conversely if you are getting tired of solving problems and would rather throw yourself into a more long term project, you can put all your effort into writing a lecture post and forego submission of typed solutions to the problems.

At this point, I expect you have questions. Please sign up for the Discord server, and we can start talking and working out various logistics. The mathematical content of the course begins on Wednesday, with Lecture 1.

Math 262A: Lecture 18

The main point of this topics course (if there was one) is that there is a surprising and useful relationship between the asymptotics of integrals (a la Laplace) and the combinatorics of topological maps (Feynman diagrams). There are a few levels of sophistication to this relationship, which may be thought of as corresponding to zero-dimensional scalar, vector, and matrix-valued quantum field theories. We will try to summarize this relationship now from a high-level perspective, meaning that the details might be not quite right, but the basic idea is more or less sound.

Formally, for the scalar theory the connection is something like

\int\limits_\mathbb{R} e^{-N(\frac{x^2}{2} + \sum_{d=3}^\infty p_d\frac{x^d}{d})}\mathrm{d}x \sim \sum\limits_\Gamma \frac{1}{|\mathrm{Aut} \Gamma|} N^{v(\Gamma)-e(\Gamma)} p_\Gamma, \quad N \to \infty

where the sum is over maps in which every vertex has degree at least 3, the numbers v(\Gamma) and e(\Gamma) are the number of vertices and edges of \Gamma, respectively, and the “amplitude” p_\Gamma is the product of the coefficients p_d from the perturbation of the Gaussian density on the integral side according to the vertex degree sequence of \Gamma,

p_\Gamma = \prod\limits_{V \in \Gamma} p_{\mathrm{deg}V}.

Now, we have that

e(\Gamma)-v(\Gamma)=r(\Gamma)-1,

where r(\Gamma) is the circuit rank of \Gamma, valid when \Gamma is connected; in general e(\Gamma)-v(\Gamma)=r(\Gamma)-c(\Gamma), with c(\Gamma) the number of connected components. This means that we can organize the asymptotic expansion of the integral as a generating function for maps sorted by their circuit rank:

\int\limits_\mathbb{R} e^{-N(\frac{x^2}{2} + \sum\limits_{d=3}^\infty p_d\frac{x^d}{d})} \mathrm{d}x \sim \sum\limits_{k=0}^\infty \frac{1}{N^k}\sum\limits_{\substack{\Gamma \\ r(\Gamma)=k+1}} \frac{1}{|\mathrm{Aut}\Gamma|} p_\Gamma.

This is nice, and it allows us to compute graphically the coefficients in the asymptotic expansions of various integrals, for example the integral which figures in Stirling’s formula. Physicists call it the “loop expansion.” Importantly, it is actually true in the analytic (as opposed to just formal) sense, meaning that if you truncate the (divergent) expansion at order N^{-k} then the error is o(N^{-k}), because of the Laplace method, which shows that the integral in question really is concentrated in a small neighborhood of the minimizer of the action. If you take logs, meaning that you are only interested in the order of magnitude of the integral you started with, then the loop expansion only involves connected graphs, which reduces the complexity quite a bit. An unfortunate aspect of the loop expansion is that it misses out on the fact that the relevant diagrams are not just combinatorial objects — graphs — but actually graphs embedded in surfaces. This expansion does not see the surface, meaning that it does not encode its topology.

We saw in Lecture 17 that it is indeed possible to get topological information by changing the space over which we integrate from \mathbb{R} to its noncommutative counterpart \mathrm{H}(N), the Euclidean space of N \times N Hermitian matrices. Formally, the expansion from the scalar case is modified to

\int\limits_{\mathrm{H}(N)} e^{-\mathrm{Tr} N(\frac{x^2}{2} + \sum_{d=3}^\infty p_d\frac{x^d}{d})}\mathrm{d}x \sim \sum\limits_\Gamma \frac{1}{|\mathrm{Aut} \Gamma|} N^{v(\Gamma)-e(\Gamma)+f(\Gamma)} p_\Gamma, \quad N \to \infty,

where f(\Gamma) is the number of faces of the topological map \Gamma. Using the Euler formula,

v(\Gamma)-e(\Gamma)+f(\Gamma)=2-2g(\Gamma),

we can reorganize the formal N \to \infty expansion of the above matrix integral into not a loop expansion, but a genus expansion,

\int\limits_{\mathrm{H}(N)} e^{-\mathrm{Tr} N(\frac{x^2}{2} + \sum_{d=3}^\infty p_d\frac{x^d}{d})}\mathrm{d}x \sim \sum\limits_{k=0}^\infty N^{2-2k}\sum\limits_{\substack{\Gamma \\ g(\Gamma)=k}} \frac{1}{|\mathrm{Aut} \Gamma|}  p_\Gamma, \quad N \to \infty,

where now the inner sum is over maps with vertices of degree at least three of a fixed topological genus. This is even nicer than the loop expansion, because it sees the topology of maps (genus) as opposed to just their combinatorics (circuit rank), but it is much harder to prove that it is correct analytically, meaning that stopping the (divergent) expansion after k terms leaves an o(N^{-2k}) error. The basic problem is that the Laplace principle doesn’t work: the principle relies on the fact that the contribution to an integral over \mathbb{R} with integrand of the form e^{-NS(x)} is dominated by a small neighborhood of the minimum of S(x), but if we swap out \mathbb{R} for \mathrm{H}(N), then a box in this N^2-dimensional Euclidean space has volume O(e^{N^2}), so that contributions to the integral from Lebesgue measure are on the same scale as contributions from the integrand itself.

This is not at all an easy problem to deal with, but I can at least give you an idea of how analysts have gotten past this obstruction. The starting point is a classical result due to Hermann Weyl: the subset of \mathrm{H}(N) consisting of Hermitian matrices with a given list of eigenvalues,

\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_N,

has Euclidean volume

C_N \prod\limits_{1 \leq i < j \leq N} (\lambda_i-\lambda_j)^2,

where C_N is a constant depending only on N. Now, each such isospectral set is, by the Spectral Theorem, an orbit of the unitary group \mathrm{U}(N) acting on \mathrm{H}(N) by conjugation, i.e. a set of the form

\mathcal{O}_\lambda = \{U\mathrm{diag}(\lambda_1,\dots,\lambda_N)U^{-1} \colon U \in \mathrm{U}(N)\}.

Incidentally, the orbits \mathcal{O}_\lambda are symplectic manifolds, but we don’t need to pay attention to this here. The point that is relevant for us is the Weyl integration formula, which is just the change of variables formula resulting from Weyl’s volume computation: if f(X) is a function on \mathrm{H}(N) which is invariant under the conjugation action of \mathrm{U}(N), then

\int\limits_{\mathrm{H}(N)} f(X) \mathrm{d}X = c_N\int\limits_{\mathbb{W}^N} f(\lambda_1,\dots,\lambda_N) \prod_{i<j} (\lambda_i-\lambda_j)^2 \mathrm{d}\lambda,

where the integration is with respect to Lebesgue measure on the Weyl chamber \mathbb{W}^N \subset \mathbb{R}^N, i.e. the convex set

\mathbb{W}^N = \{(\lambda_1,\dots,\lambda_N) \in \mathbb{R}^N \colon \lambda_1 > \dots > \lambda_N\},

and f(\lambda_1,\dots,\lambda_N) is, by abuse of notation, the value of the original function f on any Hermitian matrix with eigenvalues \lambda_1,\dots,\lambda_N. In particular, a matrix integral of the Laplace form

\int\limits_{\mathrm{H}(N)} e^{-N\mathrm{Tr}S(X)} \mathrm{d}X,

can be rewritten via the Weyl integration formula as

c_N\int\limits_{\mathbb{W}^N} e^{-N\sum_{i=1}^N S(\lambda_i)} \prod_{i<j} (\lambda_i-\lambda_j)^2\mathrm{d}\lambda,

and the remaining integral over eigenvalues can yet again be rewritten in the more compelling form

\int\limits_{\mathbb{W}^N} e^{-N^2(\frac{1}{N} \sum_{i=1}^N S(\lambda_i)-\frac{2}{N^2} \sum_{i<j} \log(\lambda_i-\lambda_j))} \mathrm{d}\lambda.

There are a few things to say here. First and foremost, the point of all of this is that we have reduced from integrating over the huge space \mathrm{H}(N) of dimension N^2 to integrating over the N-dimensional set \mathbb{W}^N, which is a massive reduction in degrees of freedom, and in particular contributions from the Lebesgue measure are now order e^N. Second, the integrand is of order e^{-N^2}, so it seems like the Laplace principle may yet be correct in this context: the main contributions to the integrand come from a small neighborhood of the minimizer of the function

\mathcal{S}(\lambda_1,\dots,\lambda_N) = \frac{1}{N} \sum_{i=1}^N S(\lambda_i)-\frac{2}{N^2} \sum_{i<j} \log(\lambda_i-\lambda_j).

Third, we have actually met this action before, back when we looked at the asymptotic enumeration of permutations with restricted decreasing subsequence length: it represents the potential energy of a system of N identical point charges on a wire with logarithmic repulsion (which is indeed the electrostatic repulsion in two dimensions), but confined by the potential well S(\lambda), which was S(\lambda)=\frac{1}{2}\lambda^2 last time we met it, but now can be anything. As N\to \infty, the distribution of these charges will crystallize around the configuration which minimizes the action \mathcal{S}, which in particular means that the empirical distribution \mu_N of these N point charges — i.e. the probability measure on \mathbb{R} which places mass 1/N at each particle — will converge as N \to \infty to a continuous probability measure \mu_\mathcal{S} on \mathbb{R} called the “equilibrium measure” corresponding to S. More precisely, in the language of calculus of variations, we have

\lim_{N\to \infty} \frac{1}{N^2} \log \int\limits_{\mathrm{H}(N)} e^{-N\mathrm{Tr}S(X)} \mathrm{d}X = - \inf\limits_{\mu \in \mathcal{P}(\mathbb{R})} \mathcal{S}(\mu),

where

\mathcal{S}(\mu) = \int\limits_\mathbb{R} S(x)\mu(\mathrm{d}x) - \int\limits_\mathbb{R}\int\limits_\mathbb{R} \log |x-y|\mu(\mathrm{d}x)\mu(\mathrm{d}y).

It is a theorem that for nice enough S, the equilibrium measure always exists and is unique. This is the Laplace principle for matrix integrals. Although it is technically very involved, a fine analysis of the convergence to equilibrium measure shows that under reasonable hypotheses the formal genus expansion above is indeed correct, analytically, as an asymptotic expansion. This means that the Feynman diagrams for matrix integrals really are maps on surfaces. It is interesting that this connection is in fact more typically used to say something about maps, not about integrals: there are independent methods for computing the coefficients in the asymptotic expansion of Hermitian matrix integrals based on orthogonal polynomials, and once these coefficients are found in some independent fashion one has also determined the generating function for maps of a given genus. For example, taking the potential S(x) = \frac{x^2}{2} + p_4\frac{x^4}{4} and calculating the corresponding equilibrium measure, one can obtain an explicit formula for the number of ways to glue a sphere from a given number of squares. A good entry point into all of this is here.
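
For the Gaussian potential S(\lambda)=\frac{\lambda^2}{2}, the equilibrium measure is Wigner's semicircle distribution on [-2,2], and the crystallization of the eigenvalues around it is easy to observe numerically. Here is a quick sketch in Python (the sampling convention, complex Gaussian entries with \mathbb{E}|X_{ij}|^2=\frac{1}{N}, matches the weight e^{-\frac{N}{2}\mathrm{Tr}X^2}):

import numpy as np

N = 1000
rng = np.random.default_rng(0)

# Sample a GUE matrix with weight proportional to e^{-(N/2) Tr X^2}.
G = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
X = (G + G.conj().T) / (2.0 * np.sqrt(N))
evals = np.linalg.eigvalsh(X)

# Compare the empirical eigenvalue distribution with the semicircle
# density sqrt(4 - x^2) / (2 pi) supported on [-2, 2].
hist, edges = np.histogram(evals, bins=40, range=(-2.2, 2.2), density=True)
mids = (edges[:-1] + edges[1:]) / 2
sc = np.sqrt(np.maximum(4.0 - mids**2, 0.0)) / (2.0 * np.pi)
print(np.abs(hist - sc).max())   # small, and shrinking as N grows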

Math 262A: Lecture 15

The N\to \infty Feynman diagram expansion

\int\limits_\mathbb{R} e^{-N S(x)}\frac{\mathrm{d}x}{\sqrt{2\pi N^{-1}}}\sim \sum\limits_\Gamma \frac{\mathrm{Amp}_S\ \Gamma}{|\mathrm{Aut}\ \Gamma|} N^{V(\Gamma)-E(\Gamma)}

is both fascinating and frustrating: while it is fascinating that there exists a close connection between the asymptotics of integrals and the combinatorics of maps, it is frustrating that this connection is purely combinatorial, and does not “see” the topology of the surface on which the diagram \Gamma is drawn. Wouldn’t it be nice if the exponent of N in the expansion had an extra term, and thus became Euler’s formula

V(\Gamma)-E(\Gamma)+F(\Gamma) = 2-2g(\Gamma),

relating the alternating sum of vertices, edges, and faces of a graph to the genus of the surface on which it is drawn? Could there be some way to tinker with the integral \int e^{-NS} to bring this about?

Note first that it is impossible to get the missing F(\Gamma) term by somehow choosing the right action S in the above integral, because the diagrammatic expansion is essentially universal: the only part of it which depends on S is the “amplitude” \mathrm{Amp}_S\ \Gamma, which is a function of \Gamma determined by the Taylor expansion of S. In order to make the jump from the combinatorics of graphs to the topology of maps, we have to alter not the integrand, but the domain of integration itself.

Perhaps the first such alteration one would try is to introduce extra dimensions: instead of integrating over \mathbb{R}, we would integrate over \mathbb{R}^d for d > 1. It turns out that there is a Laplace method and a corresponding Feynman diagram expansion for the multidimensional integral

\int\limits_{\mathbb{R}^d} e^{-N S(x)}\mathrm{d}x,

which you can read about for example in these notes of Pavel Etingof. I was hoping to cover the vector Laplace principle in this course, but unfortunately was not able to do so. However, the answer is more or less that both the principle and the corresponding diagrammatic expansion are essentially the same, and in particular moving from scalars to vectors does not allow one to make contact with the topology of maps by counting faces.

It turns out that what must be done is not to replace numbers with vectors, but rather with their quantum qounterparts (sorry, couldn’t resist): operators. Consider a Hilbert space \mathrm{V}, i.e. a complex vector space equipped with a scalar product \langle \cdot,\cdot \rangle, which is required to be complete with respect to the norm induced by the scalar product. We can consider the operator algebra \mathrm{End} \mathrm{V} as a noncommutative generalization of \mathbb{C}: it actually is \mathbb{C} if \mathrm{dim} \mathrm{V}=1, and in any dimension we can add and multiply operators in a manner analogous to the addition and multiplication of numbers. What subsets of \mathrm{End} \mathrm{V} would be the counterparts of the real line \mathbb{R} \subset \mathbb{C} and the unit circle \mathbb{T} \subset \mathbb{C}? Answer: the Hermitian operators

\mathrm{H}(\mathrm{V}) = \{ X \in \mathrm{End}\mathrm{V} \colon X^*=X\}

are like real numbers, in that they are self-conjugate and consequently have real spectrum, while the unitary operators

\mathrm{U}(\mathrm{V}) = \{U \in \mathrm{End}\mathrm{V} \colon U^*=U^{-1}\}

are like complex numbers of modulus one, in that conjugation is the same as inversion and consequently the spectrum lies in \mathbb{T}. Thus, a noncommutative analogue of the Laplace method would mean a statement on the N \to \infty asymptotics of the integral

\int\limits_{\mathrm{H}(\mathrm{V})} e^{-NS(X)} \mathrm{d}X,

where S is a scalar-valued function on \mathrm{H}(\mathrm{V}). Similarly, a noncommutative analogue of the imaginary version of the Laplace method, which is called the method of stationary phase and deals with integrals over the circle \mathbb{T} rather than the line \mathbb{R}, would be something about the N \to \infty asymptotics of

\int\limits_{\mathrm{U}(\mathrm{V})} e^{-NS(U)} \mathrm{d}U,

where S is a scalar-valued function on unitary operators. Here there is an immediate issue in that these are ill-defined functional integrals as soon as \mathrm{V} is infinite-dimensional, because of the lack of Lebesgue (or Haar) measure in infinite dimensions. So in order to rigorously study the above integrals, we have to restrict to the case that \mathrm{V} is of finite dimension M<\infty, in which case we can identify \mathrm{H}(\mathrm{V}) with the set \mathrm{H}(M) of M \times M Hermitian matrices, and \mathrm{U}(\mathrm{V}) with the set of M \times M unitary matrices. The former is a real vector space of dimension M^2, which is isomorphic to the Euclidean space \mathbb{R}^{M^2} when given the scalar product \langle X,Y \rangle = \mathrm{Tr} XY, where \mathrm{Tr} is the usual matrix trace, and the latter is a compact real Lie group of dimension M^2. So, we are led to study the N\to \infty asymptotics of the matrix integrals

\int\limits_{\mathrm{H}(M)} e^{-NS(X)} \mathrm{d}X

and

\int\limits_{\mathrm{U}(M)} e^{-NS(U)} \mathrm{d}U,

which are perfectly well-defined objects. Our goal in the remainder of this course is to develop versions of the Laplace principle and the method of stationary phase for these objects, and see that their diagrammatic expansions actually do connect up with the topology of maps.

The first step in this direction is to identify what the counterpart of the scalar integral

\int\limits_\mathbb{R} x^d e^{-N\frac{x^2}{2}} \mathrm{d}x = \sqrt{\frac{2\pi}{N}} |\mathrm{Inv}(d)| N^{-\frac{d}{2}},

which computes the moments of the Gaussian measure on \mathbb{R}, becomes when we replace the real line with Hermitian matrices. The answer to this was found by physicists long ago, and was rediscovered by mathematicians somewhat later in this fundamental paper of Harer and Zagier.

Theorem 1: For any d \in \mathbb{N}, we have

\int\limits_{\mathrm{H}(N)} \mathrm{Tr} X^d e^{-\frac{N}{2}\mathrm{Tr}X^2}\mathrm{d}X = \mathcal{Z}_N \sum\limits_{g=0}^\infty \frac{\varepsilon_g(d)}{N^{2g}},

where \mathcal{Z}_N is an explicit number depending only on N, and \varepsilon_g(d) is the number of ways in which the edges of a d-gon can be identified in pairs so as to produce a compact orientable surface of genus g. On the right hand side, if we set N=1, we obtain

\sum\limits_{g=0}^\infty \varepsilon_g(d) = |\mathrm{Inv}(d)|,

since if we forget the stratification by genus encoded by N^{-2g}, the (finite) sum is simply counting ways to glue the edges of a d-gon in pairs. Note however that setting N=1 on the left hand side of Theorem 1 does *not* produce

\int\limits_\mathbb{R} x^d e^{-N\frac{x^2}{2}} \mathrm{d}x,

but rather

\int\limits_\mathbb{R} x^d e^{-\frac{x^2}{2}} \mathrm{d}x.

In particular, the fact that we are integrating over \mathrm{H}(N) rather than \mathrm{H}(M) in Theorem 1 is not a typo: to get the connection with surface topology we need M=N. This simple fact causes massive problems on the analytic side, effectively invalidating the fundamental notion underlying Laplace’s method, which is that the main contribution to an integral containing a large parameter comes from a small neighborhood of a single point in the domain of integration. We will discuss both the combinatorics and the analysis of this “quantum Laplace method” next week.
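
In the meantime, the genus zero part of Theorem 1 is easy to probe by simulation: \varepsilon_0(2k) is the kth Catalan number, and in the normalization used here the normalized moments \frac{1}{N}\mathbb{E}\,\mathrm{Tr}X^{2k} should approach \varepsilon_0(2k) with O(N^{-2}) corrections. A Monte Carlo sketch in Python:

import numpy as np

def normalized_moment(N, k, samples, rng):
    # Monte Carlo estimate of (1/N) E Tr X^{2k} for the weight e^{-(N/2) Tr X^2}
    acc = 0.0
    for _ in range(samples):
        G = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
        X = (G + G.conj().T) / (2.0 * np.sqrt(N))
        acc += np.sum(np.linalg.eigvalsh(X) ** (2 * k)) / N
    return acc / samples

rng = np.random.default_rng(1)
catalan = {1: 1, 2: 2, 3: 5}               # epsilon_0(2k) for k = 1, 2, 3
for k in (1, 2, 3):
    print(k, round(normalized_moment(200, k, 40, rng), 3), catalan[k])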

Math 262A: Lecture 13

Continuing with Lecture 12, we are analyzing the Taylor series

T_I(\hbar) = \sum\limits_{k=0}^\infty a_k\hbar^k

of the smooth function

I(\hbar) = \frac{1}{\sqrt{2\pi\hbar}}\int\limits_{\mathbb{R}} e^{-\frac{1}{\hbar}S(x)} \mathrm{d}x,

where S is a smooth function on the line such that the above integral converges, and whose Taylor series is of the form

T_S(x) = \frac{x^2}{2} + \sum\limits_{d=3}^\infty p_d \frac{x^d}{d}.

I should say here that I’m trying out a new notation: if F is a function of the real variable y which is smooth in a neighborhood of y=0, then I write

T_F(y) = \sum\limits_{k=0}^\infty F^{(k)}(0)\frac{y^k}{k!},

this being a formal power series in a single indeterminate which by abuse of notation I also call y. So it does not make sense to write

F(y) = T_F(y),

since the LHS is a number and the RHS is a formal power series. However, Taylor’s theorem with remainder tells you that if you take a finite number of terms of the formal power series T_F and trade the formal variable y for a real number y, you get a good approximation to the real number F(y).

Back to the problem at hand, what we are trying to do is to express the formal power series T_I in terms of the formal power series T_S. What we currently know is that

T_I(\hbar) = 1 + \sum\limits_{d=1}^\infty \frac{1}{d!}\sum\limits_{\alpha \vdash d} t_d|C_\alpha|p_\alpha\hbar^{\frac{d}{2}-\ell(\alpha)},

where the internal (finite) sum is over Young diagrams with d cells, with t_d the number of fixed point free involutions in \mathrm{S}(d) and C_\alpha the conjugacy class in \mathrm{S}(d) of permutations of cycle type \alpha, and

p_\alpha=\prod\limits_{i=1}^{\ell(\alpha)} p_{\alpha_i}

subject to p_1=p_2=0. So, in completely algebraic language, one could say that we are studying the “Feynman transform”

\mathcal{F} \colon x^3\mathbb{R}[p_1,p_2,p_3,\dots][[x]] \to \mathbb{R}[p_1,p_2,p_3,\dots][[\hbar]]

defined by

\mathcal{F}\left( \sum\limits_{d=3}^\infty p_d \frac{x^d}{d}\right):=\sum\limits_{d=1}^\infty \frac{1}{d!} \sum\limits_{\alpha \vdash d} t_d|C_\alpha|p_\alpha \hbar^{\frac{d}{2}-\ell(\alpha)}.

The domain of this map is the principal ideal generated by x^3 in the algebra of formal power series in x with coefficients being polynomials in the p_1,p_2,p_3,\dots, and the codomain is formal power series in \hbar with coefficients in the same polynomial ring. The problem then is to simplify the output of the Feynman map as a power series in \hbar, i.e. to determine the coefficient of \hbar^k for each k \in \mathbb{N}.

What we want to do in this lecture is give a diagrammatic solution to the above algebraic problem. Note that the solution to this problem has analytic value via Laplace’s method, but the solution itself does not involve any analysis and is purely combinatorial.

The first step is to think of the number |C_\alpha|t_d as the cardinality of C_\alpha \times \mathrm{Inv}(d), where again C_\alpha is the conjugacy class of permutations in the symmetric group \mathrm{S}(d), and \mathrm{Inv}(d) is the set of fixed point free involutions in \mathrm{S}(d). Note that \mathrm{Inv}(d) is nonempty if and only if d is even, in which case

\mathrm{Inv}(d) = C_{(2^{\frac{d}{2}})},

where (2^{\frac{d}{2}}) is the Young diagram with \frac{d}{2} rows of length 2. Every pair (\sigma,\tau) \in C_\alpha \times \mathrm{Inv}(d) is a combinatorial map. Associated to this map are two topological maps, which are dual to one another. To construct these topological maps, we use the following construction. First, for each cycle of \sigma we draw a polygon whose number of sides is equal to the length of the cycle, and whose edges are labeled counterclockwise in the cyclic order prescribed by \sigma. At the same time, we place into this polygon the corresponding dual object, namely a vertex in the interior of the polygon together with line segments, called “half edges” or “darts,” which emanate from the vertex, one for each edge of the polygon and labeled in the same way. Note that we need only consider the case where d is even, since t_d=0 otherwise, and moreover where all rows of \alpha have length at least 3, since p_1=p_2=0, meaning that all polygons in this construction have at least three sides, or dually that all stars have degree at least 3. Now we glue the sides of the polygons together according to the pairing permutation \tau, and at the same time pair half edges of stars in the same manner, i.e. using \tau. This produces a pair of dual topological maps \Gamma,\Gamma' on a compact oriented surface S. Our convention is that we view \Gamma as the topological map obtained by glueing stars, and \Gamma' as the topological map obtained by glueing polygons. The figure below illustrates this construction for the combinatorial map

\left( (1\ 2\ 3)(4\ 5\ 6), (1\ 2)(3\ 4)(5\ 6) \right).

Glueing a pair of triangles/3-stars to get a pair of maps.

Now let us consider the Feynman transform

\mathcal{F}\left( \sum\limits_{d=3}^\infty p_d \frac{x^d}{d}\right):=\sum\limits_{d=1}^\infty \frac{1}{d!} \sum\limits_{\alpha \vdash d} t_d|C_\alpha|p_\alpha \hbar^{\frac{d}{2}-\ell(\alpha)}

from the point of view of topological rather than combinatorial maps. First, the powers of \hbar occurring in the transformed power series have a clear interpretation: we have

\hbar^{\frac{d}{2}-\ell(\alpha)} = \hbar^{e(\Gamma)-v(\Gamma)},

where v(\Gamma),e(\Gamma) are the number of vertices and edges in \Gamma. Observe that the difference e(\Gamma)-v(\Gamma) depends only on the underlying graph, not on its embedding, so that this is a purely combinatorial parameter. Indeed, it is closely related to the so-called circuit rank of a graph, which is the number of independent cycles in the graph, or equivalently the minimal number of edges which must be deleted in order to obtain an acyclic graph, i.e. a forest. The precise relationship between these parameters is

e(\Gamma)-v(\Gamma)=r(\Gamma)-c(\Gamma)

with r(\Gamma) the circuit rank and c(\Gamma) the number of connected components. This is pretty obvious if you think about it: consider the extremal case of a single connected component which is a tree. In this case both sides are equal to -1, and if we start adding edges the circuit rank goes up by one every time we add an edge, and moreover both sides are additive in the number of components (graph theory people – did I get that right?).

So, at the coarsest level, the output of the Feynman transform is apparently a generating function for (possibly disconnected) graphs sorted by circuit rank, where we are restricting to graphs in which every vertex has degree at least 3. Now this is not quite the case, since many combinatorial maps will give rise to the same topological map. Indeed, if we take the combinatorial map (\sigma,\tau) and conjugate it by an arbitrary permutation \pi \in \mathrm{S}(d), then we get another combinatorial map (\pi\sigma\pi^{-1},\pi\tau\pi^{-1}) which obviously yields the same topological map when the construction described above is performed, since we have simply relabeled everything by conjugating. Thus we expect that the correspondence between combinatorial and topological maps should be d!-to-one, which would be excellent news, since then the factor

\frac{1}{d!} |C_\alpha|t_d

would become simply the number of topological maps arising this way, each counted exactly once. However, this is not precisely true, since there may be permutations \pi whose conjugation action on the combinatorial map (\sigma,\tau) has no effect, leaving this combinatorial map unaltered; this happens precisely when \pi commutes with both \sigma and \tau, or in other words belongs to the centralizer of the subgroup \langle \sigma,\tau \rangle of the symmetric group \mathrm{S}(d) which they generate. It is thus reasonable to define the automorphism group of a combinatorial map to be this centralizer. We then see that the Feynman transform is indeed a generating function for graphs arranged by circuit rank, but in which each graph is counted with a weight equal to the reciprocal of the order of its automorphism group:

\mathcal{F}\left( \sum\limits_{d=3}^\infty p_d \frac{x^d}{d}\right):=\sum\limits_\Gamma \frac{\hbar^{r(\Gamma)-1}}{|\mathrm{Aut} \Gamma|}p_\Gamma,

the sum being over all topological maps of minimum vertex degree 3, with

p_\Gamma = \prod_{v \in \Gamma} p_{\deg(v)},

where the product is over all vertices of \Gamma. This product is called the “Feynman amplitude” of \Gamma, and it is the only part of the above construction which actually depends on the generating function S we started with — everything else is universal, being exactly the same irrespective of what S is. And that’s the general idea of the Feynman diagram expansion in zero-dimensional scalar-valued quantum field theory. It’s pretty useful, because for example we can say that the leading contribution in the asymptotic expansion of the integral we’re trying to approximate in the \hbar \to 0 limit comes from graphs which have minimum vertex degree 3, and two independent cycles, and there are only three of those:

Leading Feynman diagrams.
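
As a concrete check on the automorphism discussion, the centralizer is easy to compute by brute force for the pair of glued triangles pictured earlier (a sketch in Python; permutations are zero-indexed tuples in one-line notation):

from itertools import permutations

def compose(p, q):
    # composition as functions: x -> q[p[x]]
    return tuple(q[p[x]] for x in range(len(p)))

sigma = (1, 2, 0, 4, 5, 3)   # (1 2 3)(4 5 6) in zero-indexed notation
tau   = (1, 0, 3, 2, 5, 4)   # (1 2)(3 4)(5 6)

# Aut of the combinatorial map (sigma, tau): all permutations commuting
# with both sigma and tau, i.e. the centralizer of <sigma, tau> in S(6).
aut = [pi for pi in permutations(range(6))
       if compose(pi, sigma) == compose(sigma, pi)
       and compose(pi, tau) == compose(tau, pi)]
print(len(aut))              # the order |Aut| of this combinatorial map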

We’ll clean this up a bit next week, and then go on to the matrix-valued case, where the expansion depends not just on the underlying graph of a topological map, but also on the topology of the surface into which it is embedded.

Math 262A: Lecture 12

In Lecture 11, we were getting to the exciting part of this meandering topics course, where two seemingly disparate pieces of mathematics (Laplace’s Principle, the Exponential Formula) collide and produce something new (Feynman diagrams). Let’s review the two ingredients.

Take an interval [a,b] \subseteq \mathbb{R} which may be finite or infinite, and let S \colon [a,b] \to \mathbb{R} be a smooth function which attains a global minimum at a unique interior point c \in (a,b), and which is such that the integral

\int_a^b e^{-S(x)} \mathrm{d}x

is finite.

Theorem (Laplace Principle): The quantity

I(\hbar) = \sqrt{\frac{S''(c)}{2\pi\hbar}} e^{\frac{1}{\hbar}S(c)} \int_a^b e^{-\frac{1}{\hbar}S(x)} \mathrm{d}x

extends to a smooth function of \hbar \in [0,\infty), with I(0)=1.

The utility of this result is the following. Let

\sum\limits_{j=0}^\infty a_j\hbar^j

be the Maclaurin series of the smooth function I(\hbar). Taylor’s theorem with remainder tells us that for any nonnegative integer k, we have

\left|  I(\hbar) - \sum\limits_{j=0}^ka_j\hbar^j \right| \leq M_k \hbar^{k+1},

where M_k \geq 0 depends on k but not on \hbar. In other words, we have the \hbar \to 0 asymptotic expansion

I(\hbar) = \sum\limits_{j=0}^k a_j \hbar^j + o(\hbar^k),

where the error term is uniform in \hbar. Often times this is used to get the N \to \infty asymptotic expansion of an integral of the form

\int_a^b e^{-NS(x)} \mathrm{d}x,

in which case setting \hbar = 1/N the Laplace principle gives us the N \to \infty asymptotics

\sqrt{\frac{NS''(c)}{2\pi}} e^{NS(c)} \int_a^b e^{-NS(x)} \mathrm{d}x = 1+  \frac{a_1}{N} + \dots + \frac{a_k}{N^k} + o\left(\frac{1}{N^k} \right).

The prototypical example is Stirling’s approximation to N! via the Euler integral.

Now we recall the Exponential Formula, which tells us how to compute the exponential of a formal power series

S(x) = \sum\limits_{d=1}^\infty s_d \frac{x^d}{d!},

or equivalently how to compute the Maclaurin series of e^S when S is a smooth function with derivatives s_d=S^{(d)}(0) such that S(0)=0. Writing

e^{S(x)} = 1 + \sum\limits_{d=1}^\infty h_d \frac{x^d}{d!},

the Exponential Formula says that

h_d = \sum\limits_{\pi \in \mathrm{Par}(d)} \prod\limits_{B \in \pi} s_{|B|},

the sum being over all partitions \pi of \{1,\dots,d\}, and the product being over the blocks B of each partition \pi. An alternative form of the Exponential Formula, which is more useful for our current purpose, arises when we instead write

S(x) = \sum\limits_{d=1}^\infty p_d \frac{x^d}{d},

so s_d=(d-1)!p_d. In terms of the scaled derivatives p_d, the formula for h_d becomes

h_d = \sum\limits_{\pi \in \mathrm{Par}(d)} \prod\limits_{B \in \pi} (|B|-1)!p_{|B|}.

Since a finite set of cardinality c can be cyclically ordered in (c-1)! ways, we can rewrite the above as

h_d = \sum\limits_{\sigma \in \mathrm{S}(d)} \prod\limits_{C \in \sigma} p_{|C|},

where the sum is over all permutations \sigma of \{1,\dots,d\} and the product is over the cycles C of each permutation \sigma. Observe that this product does not depend on the internal structure of the cycles C of \sigma, but only their sizes, so that any two permutations of the same cycle type (i.e. any two permutations which are conjugates of one another) give the same contribution to the sum. That is to say, we may rewrite the above as

h_d = \sum\limits_{\alpha \vdash d} |C_\alpha| p_\alpha,

where the sum is over all Young diagrams \alpha, and for each diagram C_\alpha \subseteq \mathrm{S}(d) is the conjugacy class of permutations of cycle type \alpha and

p_\alpha = \prod\limits_{i=1}^{\ell(\alpha)} p_{\alpha_i}.
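
The permutation form of the Exponential Formula is easy to verify by machine for small d, which makes a good check before we put it to work (a sketch in Python, with arbitrary numerical values of my choosing for the p_d; the permutation sum is compared against the coefficients of the truncated power series of the exponential):

from itertools import permutations
from math import factorial

p = {1: 0.3, 2: -0.7, 3: 1.1, 4: 0.25, 5: -0.4, 6: 0.9}  # test values

def cycle_lengths(sigma):
    seen, lengths = set(), []
    for i in range(len(sigma)):
        if i not in seen:
            c, j = 0, i
            while j not in seen:
                seen.add(j)
                j = sigma[j]
                c += 1
            lengths.append(c)
    return lengths

def h_perm(d):
    # h_d as a sum over S(d) of the products of p_{|C|} over cycles C
    total = 0.0
    for sigma in permutations(range(d)):
        term = 1.0
        for c in cycle_lengths(sigma):
            term *= p[c]
        total += term
    return total

# Coefficients of exp(sum_d p_d x^d / d), by truncated series arithmetic.
D = 6
S = [0.0] * (D + 1)
for d, val in p.items():
    S[d] = val / d
exp_coeffs = [0.0] * (D + 1)
exp_coeffs[0] = 1.0
power = [0.0] * (D + 1)
power[0] = 1.0
for k in range(1, D + 1):              # accumulate S^k / k!
    nxt = [0.0] * (D + 1)
    for a in range(D + 1):
        for b in range(1, D + 1 - a):
            nxt[a + b] += power[a] * S[b]
    power = nxt
    for m in range(D + 1):
        exp_coeffs[m] += power[m] / factorial(k)

for d in range(1, D + 1):
    # h_d versus d! times the coefficient of x^d in the exponential
    print(d, round(h_perm(d), 8), round(factorial(d) * exp_coeffs[d], 8))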

We now combine the Laplace Principle with the (permutation form of) the Exponential Formula to calculate the asymptotic expansion coefficients a_j. To simplify the computation, we will make several assumptions on the Laplace side, none of which are essential — these can all be eliminated without too much difficulty and the end result is more or less the same. First, we assume the integration is over the whole real line. Second, we assume that the critical point c where the unique minimum of S occurs is c=0, and that S(c)=0. Finally, let us assume that the positive number S''(0) is equal to one.

With these assumptions, we have

I(\hbar)=\frac{1}{\sqrt{2\pi\hbar}}\int_\mathbb{R} e^{-\frac{1}{\hbar}S(x)}\mathrm{d}x,

and the Maclaurin series of S has the form

\frac{x^2}{2} + \sum\limits_{d=3}^\infty p_d\frac{x^d}{d},

with p_d:=\frac{S^{(d)}(0)}{(d-1)!}. We factor the integrand as

e^{-\frac{1}{\hbar}S(x)} = e^{-\frac{1}{\hbar}\frac{x^2}{2}} e^{-\frac{1}{\hbar}(S(x)-\frac{x^2}{2})},

and by the Exponential Formula the Maclaurin series of the exponential of the non-quadratic part of the action S is

1+\sum\limits_{d=1}^\infty \left(\sum\limits_{\alpha \vdash d} \frac{|C_\alpha|}{d!} (-1)^{\ell(\alpha)}p_\alpha \hbar^{-\ell(\alpha)} \right)x^d

with p_1=p_2=0. We get the Maclaurin series of I by integrating this term-by-term against the Gaussian density:

1+\sum\limits_{d=1}^\infty \left(\sum\limits_{\alpha \vdash d} \frac{|C_\alpha|}{d!} (-1)^{\ell(\alpha)}p_\alpha \hbar^{-\ell(\alpha)} \right)\frac{1}{\sqrt{2\pi\hbar}}\int_\mathbb{R}x^d e^{-\frac{x^2}{2\hbar}}\mathrm{d}x = 1+\sum\limits_{d=1}^\infty \sum\limits_{\alpha \vdash d} \frac{t_d|C_\alpha|}{d!} (-1)^{\ell(\alpha)}p_\alpha \hbar^{\frac{d}{2}-\ell(\alpha)}

where we used the exact evaluation

\frac{1}{\sqrt{2\pi\hbar}}\int_\mathbb{R} x^d e^{-\frac{x^2}{2\hbar}} \mathrm{d}x = t_d\hbar^{\frac{d}{2}},

with t_d the number of fixed point free involutions in the symmetric group \mathrm{S}(d).

In order to determine the expansion coefficients a_j, we now want to arrange the sum

1+\sum\limits_{d=1}^\infty \sum\limits_{\alpha \vdash d} \frac{t_d|C_\alpha|}{d!} (-1)^{\ell(\alpha)}p_\alpha \hbar^{\frac{d}{2}-\ell(\alpha)}

as a power series in \hbar. At this point, it is not even clear that the series involves only nonnegative integer powers of \hbar. First of all, the presence of \frac{d}{2} in the exponent of \hbar makes it look like we might have half-integer powers, but this is not the case since t_d=0 whenever d is an odd number. Next, the actual exponent of \hbar is \frac{d}{2}-\ell(\alpha), which seems like it could be negative, but in fact since p_1=p_2=0 the quantity p_\alpha vanishes if \alpha \vdash d has more than \frac{d}{3} rows. Combining these conditions, we see that the first positive power of \hbar occurs already at d=4, where the only contributing diagram is \alpha=(4), the single row of length 4, so that in this case we have

\hbar^{\frac{d}{2}-\ell(\alpha)}=\hbar^{2-1}=\hbar^1.

Another thing which is not quite obvious is that there are only finitely many situations which produce a power of 1 in the exponent of \hbar. Let us see why this is the case. In order for this exponent to arise, we must have that \alpha \vdash d is a Young diagram with an even number of cells such that \ell(\alpha)=\frac{d}{2}-1. Since all rows of \alpha have length at least three, we get the inequality

\frac{d}{2}-1 \leq \frac{d}{3} \implies d \leq 6.

So, besides d=4 with \alpha=(4), the only other linear term comes from d=6 with \alpha=(3,3), the diagram with two rows each of length 3. Keeping track of the signs (-1)^{\ell(\alpha)}, the total coefficient of \hbar in the sum is

\frac{1}{6!}t_6|C_{(3,3)}|p_3^2-\frac{1}{4!}t_4|C_{(4)}|p_4,

and we conclude that

a_1=\frac{1}{6!}t_6|C_{(3,3)}|p_3^2-\frac{1}{4!}t_4|C_{(4)}|p_4.

We can explicitly evaluate the “universal” part of this formula, i.e. the part that has no dependence on S. As previously discussed, the numbers of fixed point free involutions in \mathrm{S}(6) and \mathrm{S}(4) are

t_6=5!! = 5 \cdot 3 \cdot 1 = 15, \quad t_4 = 3!! = 3 \cdot 1 = 3.

Next, |C_{(4)}| is the number of 4-cycles in \mathrm{S}(4), namely 3!=6, while the number of permutations in \mathrm{S}(6) whose disjoint cycle decomposition is of the form (*\ *\ *)(*\ *\ *) is

\frac{{6 \choose 3} \cdot 2 \cdot 2}{2} = \frac{20 \cdot 4}{2}=40,

because there are {6 \choose 3} ways to choose the elements in cycles and 2 ways to cyclically arrange each, and this overcounts by a factor of 2 since the cycles are unordered. We thus conclude that

a_1 = \frac{40 \cdot 15}{720}p_3^2 - \frac{6 \cdot 3}{24}p_4 = \frac{5}{6}p_3^2 - \frac{3}{4}p_4.

(Note: as a check, for the action in Stirling’s formula we have p_3=-1 and p_4=1, so that a_1 = \frac{5}{6}-\frac{3}{4}=\frac{1}{12}, which is precisely the known subleading correction in Stirling’s formula.)
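
The coefficient is also easy to confirm by direct numerical integration (a sketch in Python using scipy; the action S(x)=x-\log(1+x) from Stirling's formula satisfies the normalizations above and has p_3=-1, p_4=1):

import numpy as np
from scipy.integrate import quad

def S(x):
    return x - np.log1p(x)   # S(0) = 0, S'(0) = 0, S''(0) = 1

def I(hbar):
    # the integrand is negligible outside the finite window (-0.999, 10)
    val, _ = quad(lambda x: np.exp(-S(x) / hbar), -0.999, 10.0, limit=200)
    return val / np.sqrt(2.0 * np.pi * hbar)

for hbar in (0.1, 0.05, 0.01):
    print(hbar, (I(hbar) - 1.0) / hbar)   # approaches a_1 = 1/12 = 0.0833...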

Clearly, to continue this calculation to find all the coefficients a_j, we need a better way to organize this information. This can be done pictorially, and is the simplest example of the use of Feynman diagrams in perturbation theory. The number t_d|C_\alpha| is equal to the number of pairs (\gamma,\varphi) \in \mathrm{S}(d) \times \mathrm{S}(d) such that \gamma is a fixed point free involution and \varphi \in C_\alpha is a permutation of cycle type \alpha. As we discussed in Lecture 11, such pairs of permutations are referred to as combinatorial maps, or rotation systems, because they can be converted into topological maps, which are graphs drawn on surfaces. So, graphs on surfaces are the Feynman diagrams of zero dimensional quantum field theory. We will pick this up in Lecture 13, where we will give the full diagrammatic interpretation of the expansion coefficients a_j. After that, the plan is to tackle the same problem when we replace integration over \mathbb{R} with integration over the real vector space \mathrm{H}(N) of N \times N Hermitian matrices; in this situation we can actually distinguish the genus of the maps which arise. After that we will generalize further to integrating over the real vector space of selfadjoint elements in a finite-dimensional von Neumann algebra.

Math 262A: Lecture 6

In this Lecture we continue the N \to \infty asymptotic analysis of N!_d with d \in \mathbb{N} arbitrary but fixed. Recall that N!_d was defined in Lecture 5 as the number of permutations in \mathrm{S}(N) which have no increasing subsequence of length d+1.

Our starting point is the “gluing” trick we used to recover Knuth’s formula

N!_2 = \frac{1}{N+1}{2N \choose N}.
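For small N, the definition of N!_d can be tested directly against this formula. A minimal brute-force sketch (the helpers lis and N_fact_d are ours; lis is the standard patience sorting computation of the longest increasing subsequence):

    from bisect import bisect_left
    from itertools import permutations
    from math import comb

    def lis(seq):
        """Length of the longest increasing subsequence, via patience sorting."""
        piles = []
        for x in seq:
            i = bisect_left(piles, x)
            if i == len(piles):
                piles.append(x)
            else:
                piles[i] = x
        return len(piles)

    def N_fact_d(N, d):
        """Permutations of {1,...,N} with no increasing subsequence of length d+1."""
        return sum(1 for p in permutations(range(1, N + 1)) if lis(p) <= d)

    for N in range(1, 8):
        assert N_fact_d(N, 2) == comb(2 * N, N) // (N + 1)  # Knuth's formula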

Based on this, we can hope that the asymptotics of N!_d might coincide with the asymptotics of

\dim R(d, \frac{2N}{d}),

where R(a,b) is the a \times b rectangular Young diagram. This would be excellent, since the asymptotics of the dimension of a very long rectangle of fixed height are easy to deduce from Frobenius’s formula for \dim \lambda together with Stirling’s approximation of N!.

Conjecture 1: For any fixed d \in \mathbb{N}, we have

(dN)!_d =\dim R(d,2N)+o\left(\dim R(d,2N)\right)

as N \to \infty.

To begin, we write

(dN)!_d = \sum\limits_{\lambda \in \mathrm{Y}_d(dN)} (\dim \lambda)^2,

which is just RSK. We want to compare this sum to \dim R(d,2N) using our “gluing” intuition. To do this, let us define the notion of the complement \mu^* of a diagram \mu \subseteq R(d,2N). This is exactly what you think it is: if \mu=(\mu_1,\dots,\mu_d), then \mu^*=(2N-\mu_d,\dots,2N-\mu_1) — draw yourself a picture. The basis of our gluing trick was that in the case d=2, every diagram is self-complementary: for \mu \in \mathrm{Y}_2(2N), we have

\mu=(\mu_1,\mu_2) = (\mu_1, 2N-\mu_1)

and

\mu^*=(2N-(2N-\mu_1),2N-\mu_1) = (\mu_1,2N-\mu_1).

This is not necessarily true for d > 2, but for any d we certainly have that the set of standard Young tableaux (SYT) of shape R(d,2N) is in bijection with the set of pairs (P,Q) of SYT in which the shape of P is a diagram \mu \in \mathrm{Y}_d(dN) and the shape of Q is the complementary diagram \mu^*. The bijection is obtained simply by splitting any SYT of shape R(d,2N) in half according to the decomposition

\{1,\dots,2dN\} = \{1,\dots,dN\} \sqcup \{dN+1,\dots,2dN\}

of its entries, rotating the half carrying the entries \{dN+1,\dots,2dN\} by 180 degrees, and replacing each entry k of this rotated half by 2dN+1-k, which yields an SYT of shape \mu^*. In formulas, this bijection gives

\dim R(d,2N) = \sum\limits_{\substack{\mu \in \mathrm{Y}_d(dN) \\ \mu \subseteq R(d,2N)}} (\dim \mu)(\dim \mu^*).
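Both this identity and the RSK sum formula above are easy to test numerically for small d and N, evaluating \dim via the hook length formula. Here is a sketch (the helpers dim_, diagrams, and complement are ours, written just for this check); for d=3, N=2 it reports (dN)!_d=513, which is indeed the number of permutations of 6 with no increasing subsequence of length 4, and \dim R(3,4)=462.

    from math import factorial

    def dim_(shape):
        """dim of a Young diagram (non-increasing tuple), by the hook length formula."""
        n = sum(shape)
        if n == 0:
            return 1
        cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
        hooks = 1
        for i, row in enumerate(shape):
            for j in range(row):
                hooks *= row - j + cols[j] - i - 1  # arm + leg + 1
        return factorial(n) // hooks

    def diagrams(n, rows, max_part=None):
        """Yield partitions of n with at most `rows` rows, as non-increasing tuples."""
        if max_part is None:
            max_part = n
        if n == 0:
            yield ()
            return
        if rows == 0:
            return
        for part in range(min(n, max_part), 0, -1):
            for rest in diagrams(n - part, rows - 1, part):
                yield (part,) + rest

    def complement(mu, d, width):
        """Complement of mu relative to the rectangle R(d, width)."""
        rows = list(mu) + [0] * (d - len(mu))
        return tuple(width - r for r in reversed(rows))

    d, N = 3, 2
    rsk = sum(dim_(lam) ** 2 for lam in diagrams(d * N, d))
    rect = sum(dim_(mu) * dim_(complement(mu, d, 2 * N))
               for mu in diagrams(d * N, d) if mu[0] <= 2 * N)
    print(rsk, rect, dim_((2 * N,) * d))  # 513 462 462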

We can now compare (dN)!_d and \dim R(d,2N) by comparing the above sums. The RSK sum formula for (dN)!_d can be decomposed according to whether \lambda \in \mathrm{Y}_d(dN) is contained in R(d,2N) or not,

(dN)!_d = \sum\limits_{\substack{\lambda \in \mathrm{Y}_d(dN) \\ \lambda \subseteq R(d,2N)}} (\dim \lambda)^2 + \sum\limits_{\substack{\nu \in \mathrm{Y}_d(dN) \\ \nu_1 > 2N}} (\dim \nu)^2,

and the first group of terms can be further decomposed into two sums according to whether \lambda \subseteq R(d,2N) is self-complementary or not,

(dN)!_d = \sum\limits_{\substack{\mu \in \mathrm{Y}_d(dN) \\ \mu \subseteq R(d,2N) \\ \mu = \mu^*}} (\dim \mu)^2 +\sum\limits_{\substack{\mu \in \mathrm{Y}_d(dN) \\ \mu \subseteq R(d,2N) \\ \mu \neq \mu^*}} (\dim \mu)^2 + \sum\limits_{\substack{\nu \in \mathrm{Y}_d(dN) \\ \nu_1 > 2N}} (\dim \nu)^2.

Similarly, we have

\dim R(d,2N) = \sum\limits_{\substack{\mu \in \mathrm{Y}_d(dN) \\ \mu \subseteq R(d,2N) \\ \mu = \mu^*}} (\dim \mu)^2 +\sum\limits_{\substack{\mu \in \mathrm{Y}_d(dN) \\ \mu \subseteq R(d,2N) \\ \mu \neq \mu^*}} (\dim \mu)(\dim \mu^*).

Now use the second formula to substitute for the first group of terms in the first formula, obtaining

(dN)!_d = \dim R(d,2N)+\sum\limits_{\substack{\mu \in \mathrm{Y}_d(dN) \\ \mu \subseteq R(d,2N) \\ \mu \neq \mu^*}} (\dim \mu)^2 - \sum\limits_{\substack{\mu \in \mathrm{Y}_d(dN) \\ \mu \subseteq R(d,2N) \\ \mu \neq \mu^*}} (\dim \mu)(\dim \mu^*)+ \sum\limits_{\substack{\nu \in \mathrm{Y}_d(dN) \\ \nu_1 > 2N}} (\dim \nu)^2.

Observing that

\sum\limits_{\substack{\mu \in \mathrm{Y}_d(dN) \\ \mu \subseteq R(d,2N) \\ \mu \neq \mu^*}} (\dim \mu)^2 = \sum\limits_{\substack{\mu \in \mathrm{Y}_d(dN) \\ \mu \subseteq R(d,2N) \\ \mu \neq \mu^*}} (\dim \mu^*)^2,

we complete the square to obtain the following.

Proposition 1: We have (dN)!_d = \dim R(d,2N) + E_d(N), where the error term is

E_d(N) = \frac{1}{2} \sum\limits_{\substack{\mu \in \mathrm{Y}_d(dN) \\ \mu \subseteq R(d,2N) \\ \mu \neq \mu^*}} (\dim \mu-\dim \mu^*)^2 + \sum\limits_{\substack{\nu \in \mathrm{Y}_d(dN) \\ \nu_1 > 2N}} (\dim \nu)^2.

The error term E_d(N) reflects the two sources of error in the approximation

(dN)!_d \approx \dim R(d,2N),

namely that the right hand side fails to account for diagrams which aren’t self-complementary relative to R(d,2N), or which don’t fit inside R(d,2N). In particular, we have that E_2(N)=0, corresponding to Knuth’s theorem, but E_d(N)>0 for d>2. Conjecture 1 says that E_d(N) is negligible for large N.
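Proposition 1 can likewise be checked numerically; here is a sketch reusing dim_, diagrams, and complement from the previous snippet.

    def E(d, N):
        """The error term E_d(N) of Proposition 1."""
        width, half, tail = 2 * N, 0, 0
        for mu in diagrams(d * N, d):
            if mu[0] > width:
                tail += dim_(mu) ** 2  # diagrams that don't fit inside R(d,2N)
            else:
                mc = tuple(r for r in complement(mu, d, width) if r > 0)
                if mc != mu:           # non-self-complementary diagrams
                    half += (dim_(mu) - dim_(mc)) ** 2
        return half // 2 + tail

    assert E(2, 4) == 0                          # Knuth: E_2(N) = 0
    d, N = 3, 2
    assert dim_((2 * N,) * d) + E(d, N) == 513   # 462 + 51 = (dN)!_d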

If E_d(N) is in fact negligible, it must be the case that the main contributions to the sum expressing (dN)!_d come from diagrams \lambda \in \mathrm{Y}_d(dN) which are contained in R(d,2N), and are moreover self-complementary relative to this rectangle. The simplest diagram with these features is of course half of R(d,2N), i.e. the rectangle R(d,N). So we are hopeful that the dominant contributions come from diagrams in a neighborhood of this rectangle, which should play the role of a neighborhood of the maximizer of \dim \lambda, just like in Laplace’s method.

To investigate the above, we want to consider the large N behavior of the functional

f_N(y_1,\dots,y_d;\varepsilon) =\dim (N+y_1N^\varepsilon, \dots, N+y_dN^\varepsilon),

where \varepsilon \in (0,1) and (y_1,\dots,y_d) \in \mathbb{R}^d is a vector such that

y_1 > \dots > y_d \quad\text{ and }\quad y_1+\dots+y_d=0.

The set \Omega_{d-1} of such vectors (y_1,\dots,y_d) is a (d-1)-dimensional convex subset of \mathbb{R}^d, which is in fact a hyperplane slice of the type A Weyl chamber in \mathbb{R}^d (you can safely ignore these words if you don’t know what they mean).

Note that we are evaluating \dim at possibly non-integral numbers, so we are in effect taking Frobenius’s formula as a way to extend the domain of \dim, just like Euler’s integral extends the domain of N! to continuous arguments (and actually we’re using Euler’s integral inside the Frobenius formula because it contains factorials). The functional f_N captures the behavior of the dimension function in a neighborhood of the suspected maximizer, which in the comparison with Laplace’s method is like looking at the Taylor expansion of the integrand e^{-NS(x)} in a neighborhood of its maximum.

Let’s see how f_N(\cdot;\varepsilon) behaves as N \to \infty. We have

f_N(y_1,\dots,y_d;\varepsilon) = \frac{(dN)!}{\prod_{i=1}^d(N+y_iN^\varepsilon + d-i)!}\prod\limits_{1 \leq i<j \leq d} \left((y_i-y_j)N^{\varepsilon} + j-i \right),

where x! = \int_0^\infty t^x e^{-t} \mathrm{d}t as we have discussed. We now have to think about the N \to \infty behavior of the various pieces of this formula. The easiest piece is (dN)!, for which we have

(dN)! \sim \sqrt{2\pi} \frac{(dN)^{dN+\frac{1}{2}}}{e^{dN}}

by Stirling’s approximation. Now consider the product in the denominator,

\prod\limits_{i=1}^d(N+y_iN^\varepsilon + d-i)!.

For each factor in this product, we have that

(N+y_iN^\varepsilon + d-i)! \sim N^{d-i+1} (N+y_iN^\varepsilon-1)!

by the usual recurrence x!=x(x-1)! for factorials, which follows from Euler’s integral. Thus

\prod\limits_{i=1}^d(N+y_iN^\varepsilon + d-i)! \sim N^{1+2+\dots+d}\prod\limits_{i=1}^d(N+y_iN^\varepsilon-1)! = N^{\frac{d(d+1)}{2}}\prod\limits_{i=1}^d(N+y_iN^\varepsilon-1)!.

To estimate the remaining simplified product, let’s take a logarithm; this will convert a product of big numbers into a sum of smaller numbers, which ought to be easier to understand. We obtain

\log \prod\limits_{i=1}^d(N+y_iN^\varepsilon-1)! = \sum\limits_{i=1}^d \log (N+y_iN^\varepsilon-1)!

Now using Stirling’s approximation in the form

\log (N-1)! \sim \frac{1}{2}\log 2\pi + (N-\frac{1}{2})\log N-N,

the asymptotic form of the above sum is

\sum\limits_{i=1}^d \log (N+y_iN^\varepsilon-1)!\sim \frac{d}{2} \log 2\pi -dN + \sum\limits_{i=1}^d \left(N+y_iN^\varepsilon - \frac{1}{2} \right)\log \left( N+y_iN^\varepsilon\right),

where we have already used y_1+\dots+y_d=0 to reduce -\sum_{i=1}^d \left(N+y_iN^\varepsilon\right) to -dN.
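As a quick numerical sanity check of this form of Stirling’s approximation (using Python’s math.lgamma, since \log (N-1)! = \log\Gamma(N)):

    from math import lgamma, log, pi

    N = 1000.0
    exact = lgamma(N)  # log (N-1)!
    approx = 0.5 * log(2 * pi) + (N - 0.5) * log(N) - N
    print(exact - approx)  # ~ 8.3e-05, consistent with the 1/(12N) correction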

Nothing for it but to keep hammering away. We have

\sum\limits_{i=1}^d \left(N+y_iN^\varepsilon - \frac{1}{2} \right)\log \left( N+y_iN^\varepsilon\right) = dN\log N - \frac{d}{2} \log N + \sum\limits_{i=1}^d \left(N+y_iN^\varepsilon - \frac{1}{2} \right)\log \left( 1+y_iN^{\varepsilon-1}\right),

and the purpose of this step was to allow us to use the fact that \varepsilon <1 together with the Maclaurin series of \log(1+x) to write

\log \left( 1+y_iN^{\varepsilon-1}\right) = y_iN^{\varepsilon-1} - \frac{1}{2} y_i^2N^{2\varepsilon-2} + O(N^{3\varepsilon-3}),

which finally gets us to

\sum\limits_{i=1}^d \left(N+y_iN^\varepsilon - \frac{1}{2} \right)\log \left( 1+y_iN^{\varepsilon-1}\right) \sim \frac{N^{2\varepsilon-1}}{2} \sum\limits_{i=1}^d y_i^2,

where, crucially, we used the fact that y_1+\dots+y_d=0. Notice that \varepsilon = \frac{1}{2} is emerging as the right scaling here — do you see why? Assembling all these calculations into one master estimate, we obtain

\frac{(dN)!}{\prod_{i=1}^d(N+y_iN^\varepsilon + d-i)!} \sim \frac{(dN)!\,e^{dN}}{(2\pi)^{\frac{d}{2}}N^{dN+\frac{d^2}{2}}} e^{-\frac{N^{2\varepsilon-1}}{2} \sum_{i=1}^d y_i^2}.

There is one more piece of Frobenius’s formula which we still have to estimate, but this one is easy:

\prod\limits_{1 \leq i<j \leq d} \left((y_i-y_j)N^{\varepsilon} + j-i \right) \sim N^{\varepsilon \frac{d(d-1)}{2}} \prod\limits_{1 \leq i<j \leq d} (y_i-y_j).

At the end of the day (which it is), if we take \varepsilon = \frac{1}{2} then we have the following.

Theorem 1: For any fixed d \in \mathbb{N} and any (y_1,\dots,y_d) \in \Omega_{d-1}, we have

\dim\left(N+y_1\sqrt{N},\dots,N+y_d\sqrt{N} \right) \sim \frac{(dN)!\,e^{dN}\,N^{\frac{d(d-1)}{4}}}{(2\pi)^{\frac{d}{2}} N^{dN+\frac{d^2}{2}}}e^{-S(y_1,\dots,y_d)},

where

S(y_1,\dots,y_d) = \frac{1}{2} \sum\limits_{i=1}^d y_i^2 - \sum\limits_{1 \leq i<j \leq d} \log(y_i-y_j).
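Theorem 1 can be sanity-checked numerically by evaluating both sides through Euler’s integral, i.e. via log-gamma. Here is a sketch (the helpers log_dim and log_asym are ours; the gap between the two sides should decay roughly like N^{-1/2}):

    from math import lgamma, log, pi, sqrt

    def log_dim(shape):
        """log dim via Frobenius's formula, extended to real row lengths by lgamma."""
        d, n = len(shape), sum(shape)
        val = lgamma(n + 1) - sum(lgamma(l + d - i + 1)
                                  for i, l in enumerate(shape, start=1))
        for i in range(d):
            for j in range(i + 1, d):
                val += log(shape[i] - shape[j] + j - i)
        return val

    def log_asym(y, N):
        """log of the right hand side of Theorem 1."""
        d = len(y)
        n = d * N
        S = 0.5 * sum(t * t for t in y) - sum(log(y[i] - y[j])
            for i in range(d) for j in range(i + 1, d))
        return (lgamma(n + 1) + n - 0.5 * d * log(2 * pi)
                - (n + d * d / 2 - d * (d - 1) / 4) * log(N) - S)

    y = (1.0, 0.0, -1.0)  # a point of Omega_2: decreasing entries summing to zero
    for N in (10 ** 3, 10 ** 5):
        shape = tuple(N + t * sqrt(N) for t in y)
        print(log_dim(shape) - log_asym(y, N))  # tends to 0 as N grows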

The action S which has emerged from this herculean computation is an extremely interesting and important special function. We will discuss this further next lecture, and complete our analogue of Stirling’s approximation for N!_d.