Math 262A: Lecture 9

This week, we will discuss a remarkable law of nature discovered by George Polya. Consider a particle situated at a given site of the lattice \mathbb{Z}^d. At each tick of the clock, the particle makes a random jump to a neighboring lattice site, with equal probability of jumping in any direction. In other words, the particle is executing simple random walk on \mathbb{Z}^d. A random walk is said to be recurrent if it returns to its initial position with probability one; otherwise, it is said to be transient.

Theorem (Polya): Simple random walk on \mathbb{Z}^d is recurrent for d=1,2 and transient for d \geq 3.

Many proofs of this theorem are known, and the extent to which any two such proofs are different is a perennial subject of debate. In this lecture and the next, we will present the proof given here, which combines a number of the techniques we have so far encountered in this course (exponential generating functions, Laplace method), and introduces new ones that we will need going forward (ordinary generating functions, Borel transform). We will use all of these new methods when we begin our discussion of Feynman diagrams next week.

In case you are frustrated with the admittedly slow pace of the course so far and want to know what is planned for its remainder, here is a rough outline. First I want to discuss Feynman diagrams for zero-dimensional scalar-valued QFTs, which are combinatorial maps. Then I would like to discuss Feynman diagrams for zero-dimensional matrix-valued QFTs, which are again maps on surfaces but with more of their topological structure visible. After this I want to look at a framework which subsumes both of these, namely Feynman diagrams for zero-dimensional QFTs with values in a finite-dimensional von Neumann algebra. Finally, I would like to discuss the combinatorics of zero-dimensional QFTs with values in a matrix group, a situation which generalizes the method of stationary phase and for which the appropriate Feynman diagrams seem to be branched covers of compact Riemann surfaces.

Back to Polya’s theorem. For n \in \mathbb{N}, let E_n denote the event that simple random walk on \mathbb{Z}^d returns to its initial position for the first time after n steps. It is convenient to set E_0=\emptyset, corresponding to the fact that the particle is tautologically situated at its initial position at time zero and this should not count as a return. Write p_n for the probability of E_n. The events E_n are mutually exclusive for different values of n, and

E=\bigsqcup\limits_{n=0}^\infty E_n

is the event that the particle returns to initial position after finitely many steps. We would like to calculate the probability

p= \sum\limits_{n=0}^\infty p_n

that E occurs.

Let us clarify that we are viewing \mathbb{Z}^d as a graph, whose vertices are the points of \mathbb{Z}^d with two vertices being adjacent if and only if they are unit Euclidean distance apart. A loop on \mathbb{Z}^d is then just a closed walk in the usual sense of graph theory. We consider the set of loops based at an arbitrary but fixed vertex of \mathbb{Z}^d, and denote by \ell_n the number of length n loops based at this point. It is convenient to consider the base point itself as a loop of length zero, the trivial loop, so that \ell_0=1.

A loop is said to be indecomposable if it is not the concatenation of two loops. Let r_n denote the number of length n indecomposable loops based at the given point, with the convention that the trivial loop is not indecomposable so that r_0=0 (this is like excluding 1 from the primes).

Proposition 1: For any n \geq 1, we have

\ell_n = \sum\limits_{k=0}^n r_k \ell_{n-k}.

Proof: Every loop of length n \geq 1 consists of an indecomposable loop of some length k \geq 1, followed by a (possibly trivial) loop of length n-k.
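Proposition 1 is easy to check by brute force in the case d=1. The following Python sketch (illustrative, with an arbitrary cutoff of 10 steps) enumerates all walks on \mathbb{Z}^1 and verifies the recursion:

```python
from itertools import product

def loops_and_first_returns(n_max):
    """Brute-force count of loops (closed walks based at 0) and indecomposable
    loops (first return to 0 at step n) on the lattice Z^1."""
    ell = [1] + [0] * n_max   # ell[0] = 1: the trivial loop
    r = [0] * (n_max + 1)     # r[0] = 0: the trivial loop is not indecomposable
    for n in range(1, n_max + 1):
        for steps in product([-1, 1], repeat=n):
            positions, pos = [], 0
            for s in steps:
                pos += s
                positions.append(pos)
            if pos == 0:
                ell[n] += 1
                if all(p != 0 for p in positions[:-1]):  # no earlier return
                    r[n] += 1
    return ell, r

ell, r = loops_and_first_returns(10)
# Proposition 1: ell_n = sum_{k=0}^n r_k ell_{n-k} for n >= 1
for n in range(1, 11):
    assert ell[n] == sum(r[k] * ell[n - k] for k in range(n + 1))
print(ell)  # [1, 0, 2, 0, 6, 0, 20, 0, 70, 0, 252]
```

The even entries are the central binomial coefficients, in agreement with the count of one-dimensional loops given later in the lecture.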


If we divide the formula in Proposition 1 by (2d)^n, which is the total number of n-step walks on \mathbb{Z}^d beginning at the given base point, we get the recursion

q_n = \sum\limits_{k=0}^n p_k q_{n-k},

where p_n is the probability of a first return at time n as above, and q_n is the probability that the particle is situated at its initial position at time n.

Exercise 1: Which is bigger, p_n or q_n? (Hint: this is a trick question whose purpose is to make sure you are clear on the notation).

We now consider the ordinary generating functions of the sequences (p_n)_{n=0}^\infty and (q_n)_{n=0}^\infty, which by definition are the formal power series

P(z) = \sum\limits_{n=0}^\infty p_nz^n \quad\text{ and }\quad Q(z) = \sum\limits_{n=0}^\infty q_nz^n.

The product of these generating functions is

P(z)Q(z) =  \sum\limits_{n=0}^\infty \left(\sum\limits_{k=0}^n p_kq_{n-k}\right)z^n = \sum\limits_{n=1}^\infty \left(\sum\limits_{k=0}^n p_kq_{n-k}\right)z^n = Q(z)-1,

where we used the fact that p_0=0 together with the recursion above (the renormalized form of Proposition 1). Now, each of the series P(z) and Q(z) has radius of convergence at least 1, because each has coefficients of absolute value at most 1, and therefore we may consider the equality

P(z)Q(z) = Q(z)-1

as an identity in the algebra of analytic functions on the open unit disc in \mathbb{C}. Since the coefficients of Q(z) are nonnegative numbers, the function Q(z) is non-vanishing on the interval [0,1), and we thus have

P(z) = 1- \frac{1}{Q(z)}, \quad z \in [0,1).
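Inverting the recursion q_n = \sum_{k=1}^n p_k q_{n-k} gives the first-return probabilities p_n = q_n - \sum_{k=1}^{n-1} p_k q_{n-k}, and the identity P(z)Q(z) = Q(z)-1 can then be checked coefficient by coefficient in exact arithmetic. Here is a sketch for d=1 (the truncation order 20 is arbitrary):

```python
from fractions import Fraction
from math import comb

NMAX, TWO_D = 20, 2   # d = 1, truncation order 20 (illustrative)
# q_n = probability of being at the start at time n; on Z^1 this is binom(n, n/2)/2^n
q = [Fraction(comb(n, n // 2), TWO_D ** n) if n % 2 == 0 else Fraction(0)
     for n in range(NMAX + 1)]
# invert q_n = sum_{k=1}^n p_k q_{n-k} to recover the first-return probabilities
p = [Fraction(0)] * (NMAX + 1)
for n in range(1, NMAX + 1):
    p[n] = q[n] - sum(p[k] * q[n - k] for k in range(1, n))
# coefficient-by-coefficient check of P(z)Q(z) = Q(z) - 1
for n in range(1, NMAX + 1):
    assert sum(p[k] * q[n - k] for k in range(n + 1)) == q[n]
print(float(sum(p)))  # partial sums increase toward p = 1: d = 1 is recurrent
```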

Now, since

P(1) = \sum_{n=0}^\infty p_n = p,

we can apply Abel’s power series theorem to obtain the following formula for the probability p that we are trying to calculate:

p = \lim\limits_{\substack{z \to 1 \\ z \in [0,1)}} P(z) = 1 - \frac{1}{\lim\limits_{\substack{z \to 1 \\ z \in [0,1)}}Q(z)}.

So, everything hinges on the limiting behavior of Q(z) as z approaches 1 through positive reals: we have either

\lim\limits_{\substack{z \to 1 \\ z \in [0,1)}}Q(z) = \infty,

in which case p=1 (recurrence), or

\lim\limits_{\substack{z \to 1 \\ z \in [0,1)}}Q(z) < \infty,

in which case p<1 (transience).

In order to determine which of the two limiting behaviors above Q(z) actually exhibits, we need some kind of tractable expression for this function. Combinatorially, this amounts to asking what can be said about the loop generating function

L(z) = \sum\limits_{n=0}^\infty \ell_n z^n

since we have Q(z) = L(\frac{z}{2d}) (Gian-Carlo Rota is said to have proclaimed that “probability is just combinatorics divided by n,” and that is exactly what is going on here). While there is not much that we can say about the ordinary generating function L(z) for lattice loops, it turns out that the corresponding exponential generating function,

E(z) = \sum\limits_{n=0}^\infty \ell_n \frac{z^n}{n!},

is analyzable. This is because of the basic features of exponential generating functions, as discussed back in Lecture 3.

Let’s recall how this works. In this paragraph it is important to keep the dimension d of \mathbb{Z}^d in mind, so we write E_d(z) for the exponential generating function of the sequence (\ell_n^{(d)})_{n=0}^\infty counting loops on \mathbb{Z}^d. To be concrete, let us consider what is going on for d=2. A walk on \mathbb{Z}^2 is a sequence of steps, each of which is either horizontal or vertical, and so it is in effect a shuffle of two walks on \mathbb{Z}^1, one corresponding to horizontal steps and the other to vertical steps. Thus, a loop on \mathbb{Z}^2 is a shuffle of two loops on \mathbb{Z}^1. More quantitatively, we have

\ell_n^{(2)} = \sum\limits_{k=0}^n {n \choose k}\ell_k^{(1)}\ell_{n-k}^{(1)},

since to build a length n loop on \mathbb{Z}^2 we must choose a number k corresponding to the number of horizontal steps in the loop, then choose a cardinality k subset of \{1,\dots,n\} corresponding to the times at which horizontal steps occur, and finally choose one of the \ell_k^{(1)} one-dimensional k-step loops to build on the chosen subset, together with one of the \ell_{n-k}^{(1)} one-dimensional (n-k)-step loops to build on its complement. The algebraic translation of this combinatorial reasoning is the generating function identity

E_2(z) = E_1(z)E_1(z).

Exactly the same reasoning applies in any dimension d: we have

E_d(z) = E_1(z)^d.
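As a sanity check on the shuffle argument, the binomial convolution identity for \mathbb{Z}^2 can be verified by brute-force enumeration (an illustrative sketch; the cutoff n \leq 8 keeps the 4^n enumeration small):

```python
from itertools import product
from math import comb

def l1(n):
    """Loops of length n on Z^1: choose the times of the n/2 positive steps."""
    return comb(n, n // 2) if n % 2 == 0 else 0

STEPS2 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
for n in range(9):   # cutoff 8 is illustrative
    # brute-force count of length-n loops on Z^2
    brute = sum(1 for w in product(STEPS2, repeat=n)
                if sum(s[0] for s in w) == 0 and sum(s[1] for s in w) == 0)
    # shuffle formula: choose horizontal times, then a loop in each direction
    shuffle = sum(comb(n, k) * l1(k) * l1(n - k) for k in range(n + 1))
    assert brute == shuffle
print("ok")
```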

The above makes it seem at least plausible that we will be able to achieve some fairly explicit understanding of the exponential loop generating function E_d(z), since we have reduced to the case d=1 which ought to be straightforward. Indeed, we clearly have

E_1(z) = \sum\limits_{n=0}^\infty \ell_n^{(1)} \frac{z^n}{n!} = \sum\limits_{k=0}^\infty \ell_{2k}^{(1)} \frac{z^{2k}}{(2k)!},

since any loop on \mathbb{Z}^1 must consist of an even number of steps, half of which are positive and half of which are negative. Moreover, the number of loops is just the number of ways to choose the times at which the positive steps occur, so that

E_1(z) = \sum\limits_{k=0}^\infty {2k \choose k} \frac{z^{2k}}{(2k)!} = \sum\limits_{k=0}^\infty \frac{z^{2k}}{k!k!},

which is a nice, explicit power series. The question now is: can we sum this series? What this really means is: can we find the function whose Maclaurin series is E_1(z), and if so do we know anything useful about this function? Luckily, the answer turns out to be an emphatic yes: E_1(z) is a Bessel function.
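Before going on, the coefficient identity \binom{2k}{k}/(2k)! = 1/(k!k!) underlying this series, together with the numerical value E_1(1) = I_0(2) \approx 2.2796, is easy to check (an illustrative sketch):

```python
from math import comb, factorial

# binom(2k, k) / (2k)! == 1 / (k! k!): the two forms of the series agree
for k in range(10):
    assert comb(2 * k, k) * factorial(k) ** 2 == factorial(2 * k)

def E1(z, terms=40):
    """Partial sum of E_1(z) = sum_k z^{2k}/(k! k!) = I_0(2z); 40 terms is plenty here."""
    return sum(z ** (2 * k) / factorial(k) ** 2 for k in range(terms))

print(E1(1.0))  # I_0(2) = 2.2795853...
```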

The Bessel functions form a large class of special functions which crop up everywhere in mathematics, physics, and engineering, usually as solutions to physically meaningful ordinary differential equations. The particular class of Bessel functions relevant for us are called the modified Bessel functions of the first kind, and typically denoted I_\alpha(z), with \alpha being a parameter. The function I_\alpha(x) is one of two linearly independent solutions to the second order ODE

z^2 F''(z) + zF'(z) - (z^2+\alpha^2)F(z) = 0,

which is known as the modified Bessel equation (the other solution is of course the modified Bessel function of the second kind). Nineteenth century mathematicians found two ways to describe I_\alpha(z): as a power series,

I_\alpha(2z) = \sum\limits_{k=0}^\infty \frac{z^{2k+\alpha}}{k!(k+\alpha)!},

and as an integral,

I_\alpha(2z) = \frac{z^\alpha}{\sqrt{\pi}(\alpha-\frac{1}{2})!}\int_0^\pi e^{2z\cos\theta} (\sin \theta)^{2\alpha}\mathrm{d}\theta.

From the power series representation, we see that we have

E_1(z) = I_0(2z),

so that we now know

E_d(z) = I_0(2z)^d.

The above connection with Bessel functions leaves us in good shape to analyze the exponential loop generating function. However, this isn’t what we actually want — remember that we want to analyze the ordinary loop generating function in order to solve our problem. So we need a way to convert exponential generating functions into ordinary generating functions. This can be done using an integral transform called the Borel transform, which is in fact just an application of Euler’s integral representation of the factorial,

n! = \int_0^\infty t^n e^{-t} \mathrm{d}t,

which we discussed way back in Lecture 1. Indeed, if

E(z) = \sum\limits_{n=0}^\infty a_n \frac{z^n}{n!}

is the exponential generating function of the sequence (a_n)_{n=0}^\infty, then its Borel transform

\mathcal{B}E(z) = \int_0^\infty E(tz) e^{-t} \mathrm{d}t = \sum\limits_{n=0}^\infty a_n \frac{z^n}{n!} \int_0^\infty t^ne^{-t} \mathrm{d}t = \sum_{n=0}^\infty a_nz^n

is the ordinary generating function of this same sequence.
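To see the Borel transform in action, take a_n = 1 for all n, so that E(z) = e^z and the ordinary generating function is \frac{1}{1-z}; the integral can then be checked numerically (an illustrative sketch; the truncation T=80 and step count are arbitrary):

```python
import math

def borel(E, z, T=80.0, steps=200_000):
    """Trapezoid approximation of the Borel transform (BE)(z) = ∫_0^∞ E(tz) e^{-t} dt,
    truncated at t = T (the neglected tail is negligible for this example)."""
    h = T / steps
    total = 0.5 * (E(0.0) + E(T * z) * math.exp(-T))
    for i in range(1, steps):
        t = i * h
        total += E(t * z) * math.exp(-t)
    return total * h

# a_n = 1 for all n: E(z) = e^z, and the ordinary generating function is 1/(1-z)
z = 0.5
approx = borel(math.exp, z)
print(approx)  # ≈ 1/(1 - 0.5) = 2
```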

Now it is clear what must be done: we have to Borel transform the exponential loop generating function, which is a Bessel function, in order to get to the ordinary loop generating function. That is, we have

\mathcal{B}E_d(z) = \int\limits_0^\infty I_0(2tz)^d e^{-t} \mathrm{d}t,

so that, since Q(z) = L(\frac{z}{2d}) = \mathcal{B}E_d(\frac{z}{2d}), we finally have the representation

Q(z) = \int\limits_0^\infty I_0(\frac{tz}{d})^d e^{-t} \mathrm{d}t

for the function whose limit behavior we want to analyze.

Recall that what we want to do is analyze the z\to 1 behavior of

Q(z) = \int\limits_0^\infty I_0(\frac{tz}{d})^d e^{-t} \mathrm{d}t,

and determine whether the limit is finite or infinite, that is, whether the above integral converges or diverges as z \to 1. Since all we care about is convergence or divergence, it suffices to consider the tail of the integral,

\int\limits_N^\infty I_0(\frac{tz}{d})^d e^{-t} \mathrm{d}t,

with N large. Now because the integrand here is a power of a Bessel function, it is a power of an integral: by the integral representation above with \alpha=0,

I_0(\frac{tz}{d})^d = \left(\frac{1}{\pi}\int_0^\pi e^{tf(\theta)} \mathrm{d}\theta\right)^d,

where f(\theta) = \frac{z}{d}\cos \theta and t \geq N is large. We are thus perfectly positioned for an application of the Laplace approximation (end point case),

\int_0^\pi e^{tf(\theta)} \mathrm{d}\theta \sim e^{tf(0)} \sqrt{\frac{\pi}{2t|f''(0)|}},\ t \to \infty.
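One can watch the endpoint Laplace approximation kick in numerically for the model case f(\theta)=\cos\theta, i.e. for the integral \pi I_0(t) (an illustrative sketch; working with e^{t(\cos\theta-1)} avoids overflow in the integrand):

```python
import math

def endpoint_ratio(t, steps=200_000):
    """Compare the trapezoid value of e^{-t} ∫_0^π e^{t cos θ} dθ with the endpoint
    Laplace prediction e^{-t} e^{t f(0)} sqrt(π / (2t|f''(0)|)) for f(θ) = cos θ."""
    h = math.pi / steps
    s = 0.5 * (1.0 + math.exp(t * (math.cos(math.pi) - 1.0)))
    for i in range(1, steps):
        s += math.exp(t * (math.cos(i * h) - 1.0))
    integral = s * h                    # equals e^{-t} ∫_0^π e^{t cos θ} dθ
    return integral / math.sqrt(math.pi / (2 * t))

ratio = endpoint_ratio(200.0)
print(ratio)  # close to 1; the leading 1/(8t) correction is just visible
```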

Wading back through the formulas, what we have found is that

I_0(\frac{tz}{d})^d e^{-t} \sim \text{const. } e^{t(z-1)}(tz)^{-\frac{d}{2}},\ t \to \infty.

Now, by the monotone convergence theorem we have

\lim\limits_{\substack{z \to 1 \\ z \in [0,1)}} \int_N^\infty e^{t(z-1)}(tz)^{-\frac{d}{2}} \mathrm{d}t = \int_N^\infty \lim\limits_{\substack{z \to 1 \\ z \in [0,1)}} e^{t(z-1)}(tz)^{-\frac{d}{2}} \mathrm{d}t = \int_N^\infty t^{-\frac{d}{2}}\mathrm{d}t,

an integral which diverges for d=1,2 and converges for d \geq 3. Polya’s theorem is proved, and if you’re willing to retrace our steps a bit you will find that you have a reasonable idea of what the mechanism is that is responsible for recurrence of simple random walk in dimensions d=1,2, and transience in dimensions d\geq 3. Indeed, if you want to get your hands dirty you can even try to estimate p by really staring at the above argument in order to get a feeling for how likely it is that simple random walk will escape to infinity.
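If you do want to get your hands dirty, the partial sums \sum_{n \leq N} p_n can be computed exactly from the loop counts, using the identity E_d = E_1^d in the form of a binomial convolution together with the first-return recursion. The following sketch (truncation orders are illustrative) exhibits the dichotomy: for d=1 the partial sums creep up to 1, while for d=3 they increase toward Polya's constant 1-\frac{1}{Q(1)} \approx 0.34:

```python
from math import comb

def return_prob_partial(d, n_max):
    """Partial sum sum_{n <= n_max} p_n of first-return probabilities for simple
    random walk on Z^d, computed from exact loop counts via E_d = E_1^d."""
    l1 = [comb(n, n // 2) if n % 2 == 0 else 0 for n in range(n_max + 1)]
    ell = [1] + [0] * n_max            # loop counts on Z^0: only the trivial loop
    for _ in range(d):                 # EGF product with E_1 = binomial convolution
        ell = [sum(comb(n, k) * l1[k] * ell[n - k] for k in range(n + 1))
               for n in range(n_max + 1)]
    q = [ell[n] / (2 * d) ** n for n in range(n_max + 1)]
    p = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):      # invert q_n = sum_{k=1}^n p_k q_{n-k}
        p[n] = q[n] - sum(p[k] * q[n - k] for k in range(1, n))
    return sum(p)

print(return_prob_partial(1, 40))    # creeps up toward 1 (recurrence)
print(return_prob_partial(3, 100))   # increases toward Polya's constant 0.3405...
```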

I cannot resist mentioning that Bessel functions are intimately related to the enumeration of permutations with bounded decreasing subsequence length, which was our topic last week. The relation is via the following beautiful identity due to Ira Gessel.

Theorem (Gessel): For any d \in \mathbb{N}, we have

\sum\limits_{n=0}^\infty (n!)_d \frac{z^{2n}}{n!n!} = \det [I_{i-j}(2z)]_{i,j=1}^d,

where (n!)_d is the number of permutations in \mathrm{S}(n) with no decreasing subsequence of length d+1.

If you would like to see a proof of this theorem together with various pointers to the literature, I can refer you to this paper. In fact, since there will be no lecture post on Thursday, you can substitute this reading assignment for the missing lecture.

Math 262A: Lecture 8

Recall from Lecture 7 that the sequence of functions (f_N)_{N=1}^\infty defined on the region

\Omega_{d-1} = \{(y_1,\dots,y_d) \in \mathbb{R}^{d} \colon y_1 > \dots > y_d,\ y_1+\dots+y_d=0\}


by

f_N(y_1,\dots,y_d) = C_N(d) \dim (N+y_1\sqrt{N},\dots,N+y_d\sqrt{N}),


where

C_N(d) = (2\pi)^{\frac{d}{2}} \frac{N^{dN+\frac{d^2}{2}}}{(dN)!e^{dN}},

converges pointwise on \Omega_{d-1}: we have

\lim\limits_{N \to \infty} f_N(y_1,\dots,y_d) = e^{-S(y_1,\dots,y_d)},

where

S(y_1,\dots,y_d) = \frac{1}{2}\sum\limits_{i=1}^d y_i^2-\sum\limits_{1 \leq i< j \leq d} \log(y_i-y_j)

is the potential energy of a system of identical two-dimensional point charges on \mathbb{R} at locations y_1> \dots > y_d.

For arbitrary \beta > 0, define

N!_d^{(\beta)} = \sum\limits_{\lambda \in \mathbf{Y}_d(dN)} (\dim \lambda)^\beta,

the sum being over the set \mathbf{Y}_d(dN) of Young diagrams with dN boxes and at most d rows. According to RSK, at \beta=2 this is the number of permutations in \mathrm{S}(dN) with no decreasing subsequence of length d+1, and at \beta=1 it is the number of involutions in \mathrm{S}(dN) with the same property.
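For small parameters, the \beta=2 statement can be checked directly with the hook length formula and a brute-force scan of the symmetric group (an illustrative sketch for d=2, N=3, where the count is the Catalan number 132):

```python
from itertools import permutations
from math import factorial

def dim(shape):
    """Number of standard Young tableaux of a partition, via the hook length formula."""
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    hooks = 1
    for i, r in enumerate(shape):
        for j in range(r):
            hooks *= (r - j) + (cols[j] - i) - 1
    return factorial(sum(shape)) // hooks

def longest_decreasing(perm):
    """Length of the longest decreasing subsequence (quadratic DP)."""
    best = []
    for i, x in enumerate(perm):
        best.append(1 + max((best[j] for j in range(i) if perm[j] > x), default=0))
    return max(best, default=0)

d, N = 2, 3
partitions = [(6,), (5, 1), (4, 2), (3, 3)]    # diagrams with dN = 6 boxes, <= 2 rows
lhs = sum(dim(lam) ** 2 for lam in partitions)
rhs = sum(1 for p in permutations(range(d * N)) if longest_decreasing(p) <= d)
assert lhs == rhs
print(lhs)  # 132, the Catalan number C_6
```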

We now use our limit theorem for f_N to calculate the asymptotics of N!_d^{(\beta)}. The first step is to rewrite the sum defining N!_d^{(\beta)} so that it is centered on the rectangular diagram R(d,N) \in \mathbf{Y}_d(dN), which, as we have discussed in the previous lectures, is the canonical self-complementary diagram relative to the twice-longer rectangle R(d,2N) appearing in Conjecture 1 of Lecture 6, the statement we are trying to prove. We have

C_N(d)^\beta N!_d^{(\beta)} = \sum\limits_{\substack{\frac{(d-1)N}{\sqrt{N}} \geq y_1 \geq \dots \geq y_{d-1} \geq \frac{-N}{\sqrt{N}} \\ y_i \in \frac{1}{\sqrt{N}}\mathbb{Z}}} (f_N(y_1,\dots,y_d))^\beta,

where y_d:=-(y_1+\dots+y_{d-1}). Now, the lattice \left(\frac{1}{\sqrt{N}}\mathbb{Z} \right)^{d-1} partitions \mathbb{R}^{d-1} into cells

\left[\frac{k_1}{\sqrt{N}},\frac{k_1+1}{\sqrt{N}} \right) \times \dots \times \left[\frac{k_{d-1}}{\sqrt{N}},\frac{k_{d-1}+1}{\sqrt{N}} \right), \quad k_i \in \mathbb{Z},

of volume \left(\frac{1}{\sqrt{N}} \right)^{d-1}, so that

\left(\frac{1}{\sqrt{N}} \right)^{d-1}C_N(d)^\beta N!_d^{(\beta)} = \left(\frac{1}{\sqrt{N}} \right)^{d-1}\sum\limits_{\substack{\frac{(d-1)N}{\sqrt{N}} \geq y_1 \geq \dots \geq y_{d-1} \geq \frac{-N}{\sqrt{N}} \\ y_i \in \frac{1}{\sqrt{N}}\mathbb{Z}}} (f_N(y_1,\dots,y_d))^\beta

is a Riemann sum for the integral

\int\limits_{\Omega_{d-1}} f_N(y_1,\dots,y_d)^\beta \mathrm{d}y_1 \dots \mathrm{d}y_{d-1}.

We thus have that

\lim\limits_{N \to \infty} \left(\frac{1}{\sqrt{N}} \right)^{d-1}C_N(d)^\beta N!_d^{(\beta)} = \int\limits_{\Omega_{d-1}} \lim\limits_{N \to \infty} f_N(y_1,\dots,y_d)^\beta \mathrm{d}y_1 \dots \mathrm{d}y_{d-1}\\ = \int\limits_{\Omega_{d-1}} e^{-\beta S(y_1,\dots,y_d)}\mathrm{d}y_1 \dots \mathrm{d}y_{d-1},

by the dominated convergence theorem. Thus, we have found the N \to \infty asymptotics of N!_d^{(\beta)} if we can evaluate the integral on the right, which up to the linear constraint y_1+\dots+y_d=0 is the partition function of the two-dimensional Coulomb gas we have been discussing. This integral is not particularly easy to evaluate, and to compute it one needs to know some version of the Selberg integral formula.

In fact, we don’t have to evaluate the integral to prove Conjecture 1 if we take advantage of symmetry. Define the sum

S_N(d,\alpha,\beta) = \sum\limits_{\substack{\mu \in \mathbf{Y}_d(dN) \\ \mu \subset R(d,2N)}} (\dim \mu)^\alpha (\dim \mu^*)^{\beta-\alpha},

where 0 \leq \alpha \leq \beta and \mu^* is the complement of \mu \subset R(d,2N). Repeating the argument above essentially verbatim (change variables from \lambda_i \in \mathbb{Z}_{\geq 0} to y_i \in N^{-1/2}\mathbb{Z} in the sum and scale by the mesh volume), we find that

\lim\limits_{N \to \infty} \left(\frac{1}{\sqrt{N}}\right)^{d-1} C_N(d)^\beta S_N(d,\alpha,\beta) = \int\limits_{\Omega_{d-1}} e^{-\alpha S(y_1,\dots,y_d)} e^{-(\beta-\alpha)S(-y_d,\dots,-y_1)} \mathrm{d}y_1 \dots \mathrm{d}y_{d-1}.

This integral is even worse than the last one, but we claim that in fact the two are equal:

\int\limits_{\Omega_{d-1}}  e^{-\beta S(y_1,\dots,y_d)}\mathrm{d}y_1 \dots \mathrm{d}y_{d-1} = \int\limits_{\Omega_{d-1}} e^{-\alpha S(y_1,\dots,y_d)} e^{-(\beta-\alpha)S(-y_d,\dots,-y_1)} \mathrm{d}y_1 \dots \mathrm{d}y_{d-1}.

This follows immediately if we can show that the action S has the symmetry

S(y_1,\dots,y_d) = S(-y_d,\dots,-y_1).
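Numerically, this symmetry is easy to confirm (a quick illustrative sketch with arbitrary sample parameters):

```python
import math, random

def S(y):
    """S(y_1,...,y_d) = (1/2) sum y_i^2 - sum_{i<j} log(y_i - y_j), for y_1 > ... > y_d."""
    quad = 0.5 * sum(v * v for v in y)
    coulomb = sum(math.log(y[i] - y[j])
                  for i in range(len(y)) for j in range(i + 1, len(y)))
    return quad - coulomb

random.seed(0)
for _ in range(100):
    y = sorted((random.uniform(-5, 5) for _ in range(5)), reverse=True)
    y_reflected = [-v for v in reversed(y)]    # reflect and relabel: still decreasing
    assert abs(S(y) - S(y_reflected)) < 1e-9
print("ok")
```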

The symmetry is in turn physically obvious: reflecting each point charge relative to the attractive harmonic potential preserves the distance of each particle to the center of the potential, as well as the distance between every pair of particles, and hence does not change the potential energy of the system (Exercise: prove this algebraically). We have thus proved the following generalization of Conjecture 1 from Lecture 6.

Theorem 1: For any d \in \mathbb{N} and 0 \leq \alpha \leq \beta, we have that

N!_d^{(\beta)} \sim S_N(d,\alpha,\beta)

as N \to \infty. In particular, taking \alpha=1 and \beta =2 we get

(dN)!_d \sim \dim R(d,2N)

as N \to \infty.

The moral of this story is that using physical intuition to guide mathematical reasoning can be quite useful. This is just a small example. For something more challenging, you might try the following exercise suggested by Polya and Szego. Let me know if you solve it.

Exercise 1: Prove the Riemann hypothesis by showing that the imaginary coordinates of the non-trivial zeros of the zeta function are the energy levels of a quantum mechanical system, i.e. the eigenvalues of the selfadjoint operator which is the Hamiltonian of the system.

The fact that we can obtain the N \to \infty asymptotics of N!_d is really a consequence of the fact that we can describe this combinatorial function as a sum over Young diagrams. One could try various other versions of this. For example, let \sigma \in \mathrm{S}(d+1) be any fixed permutation, and define N!_\sigma to be the number of permutations \pi in the symmetric group \mathrm{S}(N) which avoid \sigma, meaning that no subsequence of \pi, when written in one-line notation, is order isomorphic to \sigma. Our exactly solvable function N!_d corresponds to choosing \sigma to be the reverse identity permutation, i.e. the longest element of the Weyl group \mathrm{S}(d+1)=A_d. Not much is known about the N \to \infty asymptotics of the generalized factorial N!_\sigma for other choices of \sigma. The growth of this function is the subject of the Stanley-Wilf ex-conjecture, which asserts that N!_\sigma \leq C_\sigma^N, where C_\sigma > 0 does not depend on N. Stirling’s formula tells us that this is false for N!, which exhibits super-exponential growth, and the theorem of Regev we have proved over the course of the last few lectures tells us that this is true for N!_d.

The Stanley-Wilf conjecture was proved by Adam Marcus and Gabor Tardos in 2004. Their proof gives a value for C_\sigma which is exponential in d. On the other hand, we know from the asymptotics of N!_d, which we can explicitly compute, that in this case C_\sigma is polynomial in d (Exercise: check this by computing the asymptotics of \dim R(d,2N) using the usual Stirling formula; this was already suggested as an Exercise in Lecture 5). For a while, it was believed that C_\sigma should be of polynomial growth for all choices of \sigma. This was disproved by Jacob Fox, who showed that in a certain precise sense C_\sigma is almost always of exponential growth. So N!_d really is special among pattern avoiding generalizations of the factorial function.

Math 262A: Lecture 7

We continue with the N \to \infty asymptotic analysis of N!_d, where d \in \mathbb{N} is arbitrary but fixed and N!_d is the number of permutations in the symmetric group \mathrm{S}(N) with no decreasing subsequence of length d+1.

At the end of Lecture 6, we computed the N \to \infty asymptotics of the dimension of a diagram in \mathbf{Y}_d(dN) fluctuating around the canonical self-complementary diagram R(d,N) (relative to the twice-larger rectangle R(d,2N), whose dimension we expect to asymptotically coincide with (dN)!_d) on a square root scale. Recall that

\Omega_{d-1} = \{(y_1,\dots,y_d) \in \mathbb{R}^d \colon y_1> \dots>y_d,\ y_1+\dots+y_d=0\}.

Theorem 1: For any (y_1,\dots,y_d) \in \Omega_{d-1}, we have

\lim\limits_{N \to \infty} C_d(N) \dim(N+y_1\sqrt{N},\dots,N+y_d\sqrt{N}) = e^{-S(y_1,\dots,y_d)},

where C_d(N) is

C_d(N) = (2\pi)^{\frac{d}{2}} \frac{N^{dN+\frac{d^2}{2}}}{(dN)!e^{dN}},

and S is

S(y_1,\dots,y_d) = \frac{1}{2} \sum\limits_{i=1}^d y_i^2 - \sum\limits_{1 \leq i < j \leq d} \log(y_i-y_j).

At the end of Lecture 5, we indicated that the action S is an important special function which merits further discussion; we will have that discussion now.

What Theorem 1 reveals is the remarkable fact that, as N \to \infty, the dimension of a Young diagram \lambda \in \mathbf{Y}_d(dN) fluctuates on a square root scale, and the fluctuations themselves behave like a 2D Coulomb gas of d identical point charges on a wire at inverse temperature \beta=1, confined by a harmonic potential. Let us unpack this statement.

First, the Weyl chamber

\mathbb{W}^d = \{(y_1,\dots,y_d) \in \mathbb{R}^d \colon y_1> \dots>y_d\}

is the configuration space of d hard particles on a wire, labeled from right to left. If S(y_1,\dots,y_d) is the potential energy of this system of particles when it is in a given configuration, then the formalism of statistical mechanics says that the density which determines the probability to observe the system in a given configuration is

\frac{1}{Z_d(\beta)} e^{-\beta S(y_1,\dots,y_d)},

where \beta is the inverse temperature and

Z_d(\beta) = \int_{\mathbb{W}^d} e^{-\beta S(y_1,\dots,y_d)} \mathrm{d}y_1 \dots \mathrm{d}y_d

is the partition function which ensures that this is the density of a probability measure. The partition function corresponding to this particular S is a famous integral called Mehta’s integral, and it can be computed exactly using the Selberg integral formula.

What kind of a system has energy functional S as in Theorem 1? According to Coulomb’s law, the repulsive force F between two identical point charges in \mathbb{R}^n is inversely proportional to the distance between them raised to the power (n-1). So in our usual n=3 world, F has an inverse square law, but in an n=2 world the repulsion is proportional to the reciprocal of the distance. Since energy is the integral of force, this accounts for the logarithmic terms in S — they represent the electrostatic repulsion between d identical two dimensional point charges confined to a wire. Left to their own devices, these charges would drift infinitely far away from each other on the unbounded wire \mathbb{R}. The quadratic part of S has the meaning of a harmonic potential which attracts each point charge even as they repel one another, leading to the existence of a non-trivial ground state, the minimizer of S.

What configuration minimizes S? This classical problem in electrostatics was solved by Stieltjes, and the answer is: the roots of the dth Hermite polynomial, aka the matching polynomial of the complete graph on d vertices. Thus fluctuations of Young diagrams have something to do with the roots of Hermite polynomials. On the other hand, the Hermite polynomials themselves are orthogonal polynomials with respect to the Gaussian density, so fluctuations of Young diagrams should somehow be linked to Gaussian objects of some kind. In fact, the right objects are Gaussian random matrices.
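With the normalization of S used here, the relevant Hermite polynomials turn out to be the physicists' ones H_d; for instance, for d=3 the minimizer should be the root set 0, \pm\sqrt{3/2} of H_3(x) = 8x^3-12x. A naive gradient descent on S confirms this (an illustrative sketch; the step size, iteration count, and starting configuration are arbitrary):

```python
import math

def grad_S(y):
    """Gradient of S: dS/dy_i = y_i - sum_{j != i} 1/(y_i - y_j)."""
    d = len(y)
    return [y[i] - sum(1.0 / (y[i] - y[j]) for j in range(d) if j != i)
            for i in range(d)]

# plain gradient descent on S for d = 3 point charges
y = [2.0, 0.1, -1.5]
for _ in range(5000):
    g = grad_S(y)
    y = [y[i] - 0.01 * g[i] for i in range(3)]

# Stieltjes: the minimizing configuration is the root set of H_3(x) = 8x^3 - 12x
roots = [math.sqrt(1.5), 0.0, -math.sqrt(1.5)]
assert all(abs(y[i] - roots[i]) < 1e-6 for i in range(3))
print(y)
```

The descent converges because S is strictly convex on the Weyl chamber: the quadratic term is convex and each -\log(y_i-y_j) is convex, with the logarithmic barrier keeping the particles from colliding.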

Consider the real vector space of d \times d symmetric matrices. This is a Euclidean space when equipped with the standard scalar product \langle X,Y \rangle = \mathrm{Tr} XY, i.e. the scalar product with respect to which the matrices E_{ii} together with \frac{1}{\sqrt{2}}(E_{ij}+E_{ji}), i<j, form an orthonormal basis. In particular, one can define the standard Gaussian measure on this Euclidean space by stipulating its density, e^{-\frac{1}{2} \mathrm{Tr} X^2}. A random matrix with this distribution will have Gaussian random variables as its entries, and these variables are independent up to the symmetry constraint, as you can see by canonically identifying such a random matrix with a d^2 dimensional random vector and taking its Laplace transform. However, the eigenvalues x_1 > \dots > x_d of a d \times d Gaussian random symmetric matrix are neither Gaussian nor independent: the distribution of the random vector (x_1,\dots,x_d) in \mathbb{W}^d has density proportional to

e^{-S(x_1,\dots,x_d)},

where S is just as in Theorem 1. This is essentially due to a classical result of Hermann Weyl which says that the Euclidean volume of the set of d \times d symmetric matrices with given eigenvalues x_1 \geq \dots \geq x_d is proportional to the Vandermonde determinant in these eigenvalues,

\det [x_j^{d-i}]_{1 \leq i,j \leq d} = \prod\limits_{1 \leq i<j \leq d} (x_i-x_j).

In particular, the volume of the set of symmetric matrices with two or more equal eigenvalues is zero, as it should be, since one can see from the spectral theorem that this is not a full-dimensional subset of the space of symmetric matrices.

Theorem 1 is thus saying that Young diagram fluctuations behave like the eigenvalues y_1>\dots > y_d of the traceless random matrix Y_d = X_d - \frac{1}{d}(\mathrm{Tr}X_d)I. This surprising coincidence of Young diagram formulas and random matrix formulas is just the tip of a huge iceberg linking the two subjects, the culmination of which is the Baik-Deift-Johansson Theorem linking the fluctuations of the length of the longest increasing subsequence in a large random permutation with the eigenvalues of random matrices.

Concerning random matrices, the Coulomb gas with Boltzmann weight e^{-\beta S(x_1,\dots,x_d)} can be interpreted as the spectrum of a d \times d random matrix for all values of the inverse temperature \beta. For \beta=2,4, the relationship is as above: one takes Gaussian complex selfadjoint matrices to get the \beta=2 case, and Gaussian quaternion selfadjoint matrices to get the \beta =4 case. For the other values of \beta, the random matrix realization of the 2D Coulomb gas confined to a wire is more subtle, and was found by our colleague Ioana Dumitriu together with Alan Edelman.

The above was pretty sketchy but also hopefully somewhat interesting. This being a topics course, you may want to take the time to run down some of the details (you can always ask me for further explanation, or pointers to the literature).

In our next lecture, we will complete the derivation of the first-order asymptotics of N!_d. For now, you might want to ask yourself: how is this related to the symmetry

S(y_1,\dots,y_d) = S(-y_d,\dots,-y_1)

of the energy of the Coulomb gas, and why is this symmetry physically obvious? In the next lecture, I will also tell you a bit about the Stanley-Wilf Conjecture (now a theorem of Marcus and Tardos), which is related to the growth of an even more general factorial function N!_\sigma which counts the number of permutations in \mathrm{S}(N) which avoid a fixed permutation \sigma \in \mathrm{S}(d).

Math 262A: Lecture 4

Let’s begin by going over the computation from the end of Lecture 3, where we were trying to compute the coefficients m_d(N) in the expansion

e^{\sum_{d=1}^\infty c_d(N) \frac{z^d}{d!}} = 1 + \sum\limits_{d=1}^\infty m_d(N) \frac{z^d}{d!}


where

c_d(N) = \delta_{d \geq 3} N(-1)^{d-1} (d-1)!.

From the Exponential Formula, we get

m_d(N) = \sum\limits_{\pi \in \mathrm{P}(d)} \prod\limits_{B \in \pi} c_{|B|}(N),

so for each partition \pi of \{1,\dots,d\} the corresponding term in the sum m_d(N) is

\prod\limits_{B \in \pi} \delta_{|B| \geq 3} N(-1)^{|B|-1} (|B|-1)! = N^{b(\pi)} (-1)^{d-b(\pi)} \prod\limits_{B \in \pi} \delta_{|B| \geq 3} (|B|-1)!,

where b(\pi) is the number of blocks in \pi. The product

\prod\limits_{B \in \pi} \delta_{|B| \geq 3} (|B|-1)!

is counting the number of ways to build a cyclic permutation of length at least 3 on each block of \pi, so we conclude that

m_d(N) = \sum\limits_{k=1}^d (-1)^{d-k} N^k t_k(d)

with t_k(d) the number of permutations in the symmetric group \mathrm{S}(d) made up of exactly k cycles, each of length at least 3. We can write this a bit more precisely as

m_d(N) = \sum\limits_{k=1}^{\lfloor \frac{d}{3} \rfloor} (-1)^{d-k} N^k t_k(d).
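The formula for m_d(N) can be checked against the Exponential Formula directly: compute the truncated series \exp(\sum_j c_j(N) \frac{z^j}{j!}) in exact arithmetic and compare with the signed cycle count, evaluated by brute force over \mathrm{S}(d) (an illustrative sketch; the truncation order and the value N=5 are arbitrary):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

D, NVAL = 7, 5   # truncation order and a concrete numeric value of N (illustrative)

# A(z) = sum_{j >= 3} c_j(N) z^j / j! with c_j(N) = N (-1)^{j-1} (j-1)!
A = [Fraction(0)] * (D + 1)
for j in range(3, D + 1):
    A[j] = Fraction(NVAL * (-1) ** (j - 1) * factorial(j - 1), factorial(j))

def mul(a, b):
    """Product of power series truncated at order D."""
    c = [Fraction(0)] * (D + 1)
    for i in range(D + 1):
        for j in range(D + 1 - i):
            c[i + j] += a[i] * b[j]
    return c

# exp(A) = sum_m A^m / m! (A has no constant term, so the sum truncates at order D)
expA = [Fraction(1)] + [Fraction(0)] * D
power = list(expA)
for m in range(1, D + 1):
    power = mul(power, A)
    expA = [expA[i] + power[i] / factorial(m) for i in range(D + 1)]

def m_direct(d):
    """Sum of (-1)^{d-k} N^k over permutations of S(d) with all cycles of
    length >= 3, where k is the number of cycles."""
    total = 0
    for p in permutations(range(d)):
        seen, lens = set(), []
        for s in range(d):
            if s not in seen:
                x, length = s, 0
                while x not in seen:
                    seen.add(x)
                    x = p[x]
                    length += 1
                lens.append(length)
        if all(length >= 3 for length in lens):
            total += (-1) ** (d - len(lens)) * NVAL ** len(lens)
    return total

for d in range(1, D + 1):
    assert expA[d] * factorial(d) == m_direct(d)
print([m_direct(d) for d in range(1, D + 1)])  # m_1, ..., m_7 at N = 5
```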

It remains to solve Problem 2 from Lecture 3, which was to evaluate the integral

\int\limits_{-\varepsilon}^{+\varepsilon} x^d e^{-\frac{N}{2}x^2} \mathrm{d}x.

Unlike Problem 1, Problem 2 cannot be solved exactly. In the case d=0, this was discussed in Lecture 2: we can approximate

\int\limits_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}x^2} \mathrm{d}x = \int\limits_{-\infty}^{+\infty} e^{-\frac{N}{2}x^2} \mathrm{d}x + E_0(\varepsilon,N)

with E_0(\varepsilon,N) an exponentially small error, and then evaluate the Gaussian integral

\int\limits_{-\infty}^{+\infty} e^{-\frac{N}{2}x^2} \mathrm{d}x = \sqrt{\frac{2\pi}{N}}.

For any fixed d \in \mathbb{N} we have an analogous approximation

\int\limits_{-\varepsilon}^{+\varepsilon} x^d e^{-\frac{N}{2}x^2} \mathrm{d}x = \int\limits_{-\infty}^{+\infty}x^d e^{-\frac{N}{2}x^2} \mathrm{d}x +E_d(\varepsilon,N),

with E_d(\varepsilon,N) an error term that remains to be quantified precisely but is clearly rapidly decaying given the exponentially small tails of the Gaussian density. We now evaluate the total integral

\int\limits_{-\infty}^{+\infty}x^d e^{-\frac{N}{2}x^2} \mathrm{d}x.

In fact, we will evaluate all such integrals simultaneously by looking instead at the integral

L(z) = \int\limits_{-\infty}^{+\infty} e^{-\frac{N}{2}x^2+xz} \mathrm{d}x,

which is the Laplace transform of the Gaussian density. On one hand, observe that the integral converges to define an entire function of z \in \mathbb{C} whose derivatives at z=0 can be computed by differentiating under the integral sign, and are exactly the integrals we’re trying to evaluate:

\bigg[\frac{\mathrm{d}}{\mathrm{d}z}\bigg]^d L(z) \bigg|_{z=0}=\int\limits_{-\infty}^{+\infty}x^d e^{-\frac{N}{2}x^2} \mathrm{d}x.

(Differentiation under the integral sign is our first contact with the ghost of Richard Feynman, a congenial specter that will continue to haunt us). On the other hand, we can compute L(z) exactly by completing the square: since

-\frac{N}{2}x^2+xz = -\frac{N}{2}\left( x^2 -\frac{2}{N}xz + \frac{1}{N^2}z^2\right) + \frac{1}{2N}z^2 = -\frac{N}{2}\left( x-\frac{1}{N}z\right)^2 + \frac{1}{2N}z^2,

we obtain

L(z) = e^{\frac{1}{2N}z^2} \int\limits_{-\infty}^{+\infty} e^{-\frac{N}{2}\left(x-\frac{1}{N}z\right)^2} \mathrm{d}x = e^{\frac{1}{2N}z^2} \int\limits_{-\infty}^{+\infty} e^{-\frac{N}{2}x^2} \mathrm{d}x,

by translation invariance of Lebesgue measure, and hence

L(z) = \sqrt{\frac{2\pi}{N}} e^{\frac{1}{2N}z^2}.

This is an important feature of the Gaussian density: up to minor details, it is a fixed point of the Laplace transform. In particular, this makes the Maclaurin series of L(z) easy to compute directly:

L(z) = \sqrt{\frac{2\pi}{N}}\sum\limits_{k=0}^\infty \frac{(2k)!}{N^k 2^k k!} \frac{z^{2k}}{(2k)!}.

Taking a closer look at the nonzero coefficients of this series, we observe that

\frac{(2k)!}{2^k k!} = \frac{(2k)(2k-1)(2(k-1))(2k-3) \dots 1}{2^k k!} = (2k-1)(2k-3) \dots 1,

the product of all odd numbers below 2k. Thus in terms of the double factorial n!!, which is the product of all positive integers not exceeding n of the same parity as n (and should not be confused with the much larger number (n!)!), we have

L(z) = \sqrt{\frac{2\pi}{N}}\sum\limits_{k=0}^\infty N^{-k} (2k-1)!!\frac{z^{2k}}{(2k)!}.

We thus obtain the integral formula

\int\limits_{-\infty}^{+\infty}x^d e^{-\frac{N}{2}x^2} \mathrm{d}x = \sqrt{\frac{2\pi}{N}}\begin{cases} 0, \text{ if } d \text{ odd } \\ N^{-\frac{d}{2}} (d-1)!!, \text{ if } d \text{ even.} \end{cases}
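This integral formula is straightforward to test numerically. Here is a minimal sketch using a pure-Python trapezoid rule (the cutoff L, the step count, and the function names are arbitrary choices of mine):

```python
import math

def gaussian_moment(d, N, L=12.0, steps=100000):
    """Trapezoid-rule approximation of \\int_{-L}^{L} x^d e^{-(N/2) x^2} dx;
    for L this large the neglected tails are utterly negligible."""
    h = 2.0 * L / steps
    total = 0.0
    for i in range(steps + 1):
        x = -L + i * h
        w = 0.5 if i in (0, steps) else 1.0  # trapezoid endpoint weights
        total += w * x ** d * math.exp(-N / 2.0 * x * x)
    return total * h

def moment_formula(d, N):
    """sqrt(2 pi / N) * N^(-d/2) * (d-1)!! for d even, and 0 for d odd."""
    if d % 2 == 1:
        return 0.0
    dbl = 1
    for j in range(d - 1, 0, -2):  # (d-1)!! = (d-1)(d-3)...1
        dbl *= j
    return math.sqrt(2.0 * math.pi / N) * N ** (-d / 2.0) * dbl
```

The quadrature and the closed formula agree to many digits for small d, and the odd moments vanish by symmetry.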

Exercise 1: Show that for d even, (d-1)!! is the number of permutations in \mathrm{S}(d) all of whose cycles have length exactly 2, i.e. the number of fixed point free involutions.
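If you want to check your answer to Exercise 1 experimentally, a brute-force count is easy to set up (a sketch; the function names are mine):

```python
from itertools import permutations

def count_fpf_involutions(d):
    """Count permutations of S(d) all of whose cycles have length exactly 2,
    i.e. fixed-point-free involutions: p(p(i)) = i and p(i) != i for all i."""
    count = 0
    for p in permutations(range(d)):
        if all(p[p[i]] == i and p[i] != i for i in range(d)):
            count += 1
    return count

def double_factorial_odd(n):
    """n!! for odd n: n(n-2)...1."""
    out = 1
    while n > 0:
        out *= n
        n -= 2
    return out
```

For d = 2, 4, 6 the counts are 1, 3, 15, matching 1!!, 3!!, 5!!, and for odd d the count is 0.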

We have finally reached the point where we can legitimately write down Stirling’s formula. Here’s a step-by-step reconstruction of the argument we’ve been building.

Step 1: Obtain the integral representation

N! = \frac{N^{N+1}}{e^N}Z_N(S),

where S(w) = w- \log(1+w) and

Z_N(S) =\int\limits_{-1}^{+\infty} e^{-N(w-\log(1+w))} \mathrm{d}w.

Step 2: Demonstrate that the integral Z_N(S) localizes in an \varepsilon-neighborhood of the minimizer of S:

Z_N(S) = Z_{N,\varepsilon}(S) + O(e^{-c(\varepsilon)N}),

where

Z_{N,\varepsilon}(S) = \int_{-\varepsilon}^{+\varepsilon}e^{-NS(w)}\mathrm{d}w.

Step 3: Write the integrand of Z_{N,\varepsilon}(S) in the form

e^{-NS(w)} = e^{-\frac{N}{2}w^2} e^{\sum_{d=3}^\infty c_d(N) \frac{w^d}{d!}}.

Step 4: Expand the non-Gaussian part of the integrand,

e^{\sum_{d=3}^\infty c_d(N) \frac{w^d}{d!}}  = 1 + \sum\limits_{d=1}^\infty m_d(N) \frac{w^d}{d!},

with the coefficients m_d(N) being the “disconnected” version of the coefficients c_d(N).

Step 5: Integrate term-by-term:

Z_{N,\varepsilon}(S) = \int_{-\varepsilon}^{+\varepsilon}e^{-\frac{N}{2}w^2}\mathrm{d}w + \sum\limits_{d=1}^\infty m_d(N) \int_{-\varepsilon}^{+\varepsilon}w^de^{-\frac{N}{2}w^2}\mathrm{d}w.

Step 6: Replace each truncated Gaussian integral with a total Gaussian integral at the cost of an exponentially small error E_{d,\varepsilon}(N), and evaluate:

m_d(N) \int_{-\varepsilon}^{+\varepsilon}w^de^{-\frac{N}{2}w^2}\mathrm{d}w = [d \text{ even}]\, m_d(N) \sqrt{\frac{2\pi}{N}}\, N^{-\frac{d}{2}} (d-1)!! + E_{d,\varepsilon}(N).

The bracket in Step 6 is an Iverson bracket.

Problem 1: Carefully executing the above steps, show that we have obtained Stirling’s approximation: as N \to \infty, we have

N! = \sqrt{2\pi}\frac{N^{N+\frac{1}{2}}}{e^N}\left(1 + O(\frac{1}{N}) \right).
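The claim in Problem 1 can be probed numerically before being proved. The sketch below (helper names are mine) works in log space via math.lgamma so that large N does not overflow, and exhibits a relative error of order 1/N:

```python
import math

def log_stirling(N):
    """Logarithm of the approximation sqrt(2 pi) N^(N + 1/2) e^(-N)."""
    return 0.5 * math.log(2.0 * math.pi) + (N + 0.5) * math.log(N) - N

def relative_error(N):
    """N! / (sqrt(2 pi) N^(N + 1/2) e^(-N)) - 1, computed in log space
    (via lgamma(N+1) = log N!) to avoid overflow."""
    return math.exp(math.lgamma(N + 1) - log_stirling(N)) - 1.0
```

Numerically the relative error is close to 1/(12N), consistent with (and sharper than) the O(1/N) statement.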

Problem 2: Develop a good bookkeeping system allowing the computation of subleading terms in this approximation.

Problem 3: Go through Steps 1 to 6, consider what features of S were used, and examine to what extent these features were needed.

These are the problems which we will focus on in the next lectures. In particular, Problem 2 leads to the concept of Feynman diagrams. You can start thinking about these problems now, and see what you can see.

Math 262A: Lecture 2

We continue with the estimation of N! for large N via Euler’s integral,

N! = \int_0^\infty t^N e^{-t} \mathrm{d}t.

As at the end of Lecture 1, we make the substitution t=Nx, thereby obtaining

N! = \int_0^\infty (Nx)^N e^{-Nx} N\mathrm{d}x = N^{N+1} \int_0^\infty e^{-N(x-\log x)} \mathrm{d}x.

Now, one way to characterize an algebraic combinatorialist is to say that such a person loathes \log x, this being some horrible transcendental thing, but loves \log(1+x), this being an exponential generating function for cyclic permutations:

\log(1+x) = \sum\limits_{d=1}^\infty (-1)^{d-1} (d-1)! \frac{x^d}{d!} = \sum\limits_{d=1}^\infty (-1)^{d-1} \frac{x^d}{d}.
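The combinatorial content of this series — that the coefficient of x^d/d! is, up to sign, the number of cyclic permutations of d objects — can be verified by brute force, and the series itself checked against the logarithm (a sketch; the names are mine):

```python
import math
from itertools import permutations

def count_d_cycles(d):
    """Number of permutations in S(d) consisting of a single d-cycle."""
    count = 0
    for p in permutations(range(d)):
        seen, j = set(), 0
        while j not in seen:   # follow the orbit of 0 under p
            seen.add(j)
            j = p[j]
        if len(seen) == d:     # a single cycle visits all d points
            count += 1
    return count

def log1p_series(x, terms=60):
    """Partial sum of log(1+x) = sum_{d>=1} (-1)^(d-1) x^d / d."""
    return sum((-1) ** (d - 1) * x ** d / d for d in range(1, terms + 1))
```

The cycle counts come out to (d-1)!, and for |x| < 1 the partial sums converge rapidly to log(1+x).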

Accordingly, we make the further change of variables x=1+w, which gets us to the formula

N!= \frac{N^{N+1}}{e^N} \int_{-1}^\infty e^{-N(w-\log (1+w))} \mathrm{d}w.

We now focus on the problem of determining the N \to \infty asymptotics of

I_N = \int_{-1}^\infty e^{-NS(w)} \mathrm{d}w,

where the “action” S(w) = w-\log(1+w) is a smooth function on (-1,\infty), the Maclaurin series of which is

S(w) = \frac{w^2}{2} - \frac{w^3}{3} + \frac{w^4}{4} - \dots.

Exercise 1: What is the radius of convergence of this series?

Having solved Exercise 1 (which is singularly easy), you know that the series expansion of S(w) is only valid in a neighborhood of w=0, so ostensibly is not of any help in the estimation of I_N, since the integrand involves S(w) with arbitrarily large inputs w. On the other hand, we’re trying to work out an approximation, not perform an exact evaluation, so perhaps we can get away with something local. Indeed, the shape of S indicates that this might well be the case. Since

S'(w) = 1- \frac{1}{1+w},

our action S has a unique stationary point at w=0, and the value S(0)=0 is the global minimum of S on its domain of definition, (-1,\infty).

Plot of S(w) near w=0.

This means that e^{-NS(0)}=1 is the global maximum of e^{-NS(w)}, the integrand of I_N: we have

0 < e^{-NS(w)} < 1 \quad \forall w \in (-1,\infty)-\{0\}.

In particular, for each fixed w \neq 0, the integrand of I_N is exponentially small in N. This means that the integrand of I_N is sharply peaked at the unique minimizer w=0 of the action S(w), and the larger N is, the more exaggerated this effect becomes, as in the plots below.

Plot of e^{-10S(w)} near w=0.
Plot of e^{-1000S(w)} near w=0.

Accordingly, we expect that as N \to \infty the integral I_N “localizes” at the stationary point of the action, meaning that

I_N \approx e^{-NS(0)}

might not be such a bad approximation. Let us investigate this more closely: we fix \varepsilon >0, and seek to estimate the integral

\int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w.

Being combinatorialists, we will count the complement, meaning that we will try to estimate the above integral by instead estimating

\int_{-1}^{-\varepsilon} e^{-NS(w)} \mathrm{d}w \quad\text{ and }\quad \int_{+\varepsilon}^{+\infty} e^{-NS(w)} \mathrm{d}w.

To estimate the left integral, observe that left of zero we have

S(w) > \frac{w^2}{2},

and hence

\int_{-1}^{-\varepsilon} e^{-NS(w)} \mathrm{d}w < \int_{-1}^{-\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w < \int_{-\infty}^{-\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w,

where the rightmost integral is convergent (say, by comparison with \int_{-\infty}^{-\varepsilon} e^{\frac{N}{2}w} \mathrm{d}w). To control the rightmost integral, we can observe that

e^{-\frac{N}{2}w^2} = e^{-\frac{N}{4}w^2}e^{-\frac{N}{4}w^2} \leq e^{-\frac{N}{4}\varepsilon^2}e^{-\frac{N}{4}w^2}

for w \in (-\infty,-\varepsilon), so that we have the bound

\int_{-1}^{-\varepsilon} e^{-NS(w)} \mathrm{d}w < C_N e^{-\frac{N}{4}\varepsilon^2},

where

C_N = \int_{-\infty}^{-\varepsilon} e^{-\frac{N}{4}w^2} \mathrm{d}w < \infty

is a convergent integral, and moreover

\lim\limits_{N \to \infty} C_N=0

by the Bounded Convergence Theorem. In particular, we can conclude that

\int_{-1}^{-\varepsilon} e^{-NS(w)} \mathrm{d}w = O(e^{-\frac{\varepsilon^2}{4} N}).

Now we estimate the \int_{+\varepsilon}^{+\infty}-integral using a similar strategy. Since S(w) is strictly increasing for w>0, we have

e^{-NS(w)} = e^{-\frac{N}{2}S(w)}e^{-\frac{N}{2}S(w)} \leq e^{-\frac{N}{2}S(\varepsilon)}e^{-\frac{N}{2}S(w)}

for w \in (\varepsilon,\infty). Consequently, we have the bound

\int_{+\varepsilon}^{+\infty} e^{-NS(w)} \mathrm{d}w < C_N(\varepsilon)e^{-\frac{N}{2}S(\varepsilon)},

where

C_N(\varepsilon) = \int_{+\varepsilon}^{+\infty} e^{-\frac{N}{2}S(w)} \mathrm{d}w <\infty

is a convergent integral, and moreover

\lim\limits_{N \to \infty} C_N(\varepsilon) = 0

by the Bounded Convergence Theorem. So we similarly have that

\int_{+\varepsilon}^\infty e^{-NS(w)} \mathrm{d}w = O(e^{-\frac{S(\varepsilon)}{2} N}).

We thus conclude that

I_N = \int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w + \text{exponentially decaying error in }N.

Just to remind ourselves where we are vis-a-vis our principal goal of approximating N!, we have arrived at the estimate

N! = \frac{N^{N+1}}{e^N}\left( \int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w + \text{exponentially decaying error in }N\right).
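The localization phenomenon is easy to witness numerically. The following sketch compares the full integral with its truncation to [-\varepsilon,+\varepsilon]; the cutoffs, step counts, and names are my own choices, not part of the lecture:

```python
import math

def S(w):
    """The action S(w) = w - log(1 + w), defined on (-1, infinity)."""
    return w - math.log(1.0 + w)

def integral(a, b, N, steps=200000):
    """Trapezoid-rule approximation of \\int_a^b e^{-N S(w)} dw."""
    h = (b - a) / steps
    total = 0.5 * (math.exp(-N * S(a)) + math.exp(-N * S(b)))
    for i in range(1, steps):
        total += math.exp(-N * S(a + i * h))
    return total * h

N, eps = 200.0, 0.25
# "full" is truncated at a large cutoff, beyond which the integrand has
# long since underflowed to zero; "local" is the integral over [-eps, eps]
full = integral(-1.0 + 1e-9, 40.0, N)
local = integral(-eps, eps, N)
```

Already at N = 200 the truncated integral captures the full one to a fraction of a percent, and both are close to the Gaussian value \sqrt{2\pi/N}.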

It remains to estimate the \int_{-\varepsilon}^{+\varepsilon} integral. This is ostensibly more difficult than what we did above, since it apparently requires some actual insight into the behavior of S(w). On the other hand, we are now reduced to considering the behavior of the action on a small interval where we can represent it as a series: we have that

\int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w = \int_{-\varepsilon}^{+\varepsilon} e^{-N\sum_{d=2}^\infty (-1)^d \frac{w^d}{d}} \mathrm{d}w

for all \varepsilon sufficiently small. Now, since

S(w) = \frac{w^2}{2} + O(w^3) \quad\text{ as }w \rightarrow 0,

we expect a further approximation of the form

\int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w \approx \int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w.

Of course, such an approximation is not exactly true, and we have to understand how accurate it actually is in order to proceed rigorously. For now, however, let us see what would happen if we had the exact identity

\int_{-\varepsilon}^{+\varepsilon} e^{-NS(w)} \mathrm{d}w ``="\int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w,

where the quotes are meant to indicate that we are considering a hypothetical truth which we know to be false, but might still be conceptually useful.

Consider the integral

\int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w,

which we would like to approximate. An ideal outcome would be that we could exactly evaluate it. Unfortunately, even though the integrand here doesn’t look so bad, trying to compute the indefinite integral

\int e^{-\frac{w^2}{2}} \mathrm{d}w

is a fool’s errand, being in some sense analogous to trying to solve the quintic. On the other hand, by the same sort of rough estimates we have been using so far in this lecture, we have

\int_{-\varepsilon}^{+\varepsilon} e^{-\frac{N}{2}w^2} \mathrm{d}w =\int_{-\infty}^{+\infty} e^{-\frac{N}{2}w^2} \mathrm{d}w + \text{exponentially decaying error in }N.

The integral

\int_\mathbb{R} e^{-x^2} \mathrm{d}x,

which is known as the Gaussian integral, can in fact be exactly evaluated — not using the Fundamental Theorem of Calculus, but by exploiting symmetry. The answer is

\int_\mathbb{R} e^{-x^2} \mathrm{d}x = \sqrt{\pi},

a remarkable formula which is the basis of various amusing proclamations, including Lord Kelvin’s assertion that “a mathematician is one to whom this is as obvious as that twice two makes four is to you.” I leave you to either try the derivation of this formula on your own, or look it up somewhere.

Changing variables in the Gaussian integral, we get the evaluation we actually need:

\int_{-\infty}^{+\infty} e^{-\frac{N}{2}w^2} \mathrm{d}w = \sqrt{\frac{2\pi}{N}}.

Following through on our thought experiment, this would imply the approximation

N! = \sqrt{2\pi}\frac{N^{N+\frac{1}{2}}}{e^N}\left(1 + \text{exponentially decaying error in }N\right).

The meat of this formula is correct, but the error term is the result of our wishful thinking, and it is false — we will correct it in Lecture 3.

Math 262A: Lecture 1

The goal of this topics course is to introduce you to asymptotic algebraic combinatorics. Each of these words is familiar individually, but when combined in this order the resulting phrase is ambiguous; indeed, looking at the list of lectures from a recent conference dedicated to asymptotic algebraic combinatorics, it is clear that it means different things to different people.

In this course, we will try to get a feeling for what asymptotic algebraic combinatorics is about by going on a quest: we are going to start at Stirling’s approximation, which might be considered the historically first result in asymptotic algebraic combinatorics, and from there forge a path to what physicists simply call “Large N,” which has something to do with quantum field theory. There will be various (hopefully edifying) digressions along the way, sometimes leading to unsolved problems — for example, I hope to discuss the pattern-restricted factorial, for which an analogue of Stirling’s formula is only sometimes known, and the Bhargava factorial, for which no analogue of Stirling’s formula is known at all. Let’s start the quest.

We want to estimate N!. Let’s start by writing down the stupidest thing we can think of — that way, things can only get better.

Proposition 1: We have

1 \leq N! \leq N^N.

Proof: By definition,

N! = \prod\limits_{x=1}^N x.

On the interval [1,N], the linear function l(x) = x has a minimum value of 1 at x=1, and a maximum value of N at x=N.

— Q.E.D.

To improve on the shameful bounds in Proposition 1, we can use a multiplicative version of Gauss’s famous “replica trick” for summing consecutive numbers.

Proposition 2: We have

N^{N/2} \leq N! \leq \left(\frac{N+1}{2} \right)^N.

Proof: Take two copies of N!, multiply them together

N! \cdot N! = \prod\limits_{x=1}^Nx \cdot \prod\limits_{x=1}^N x,

write the second product backwards,

N! \cdot N! = \prod\limits_{x=1}^Nx \cdot \prod\limits_{x=1}^N (N-x+1),

and interlace the factors of the two products to get

(N!)^2 = \prod\limits_{x=1}^N x(N-x+1).

The minimum of the quadratic function

q(x) = x(N-x+1) = -(x-\frac{1}{2}(N+1))^2 +\frac{1}{4}(N+1)^2

on the interval [1,N] occurs at the endpoints x=1 and x=N, with

q(1) = q(N) = N,

and the maximum occurs at x=\frac{1}{2}(N+1), with

q\left(\frac{1}{2}(N+1)\right) = \frac{1}{4}(N+1)^2.

Multiplying the bounds N \leq q(x) \leq \frac{1}{4}(N+1)^2 over x=1,\dots,N and taking square roots yields the claim.

— Q.E.D.
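Proposition 2 is easy to verify numerically (a minimal sketch):

```python
import math

# Verify the bounds N^(N/2) <= N! <= ((N+1)/2)^N of Proposition 2
# for a range of N; all three quantities fit comfortably in floats here.
for N in range(1, 60):
    f = math.factorial(N)
    assert N ** (N / 2) <= f <= ((N + 1) / 2) ** N
```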

Problem 1: Suppose we use k replicas of N!, meaning that we write (N!)^k = \prod\limits_{x=1}^N \prod\limits_{j=1}^k\pi_j(x), where \pi_1,\dots,\pi_k \in \mathrm{S}(N) may be any permutations. Anything to be gained from this?

The fact that we now have a non-trivial lower bound on N! has some nice consequences. In particular, let us consider the prime factorization of N!,

N! = \prod\limits_p p^{e_p(N!)},

where the product is over distinct primes p and e_p(N!) is the multiplicity of p in N!, which is given by Legendre’s formula:

e_p(N!) = \sum\limits_{k=1}^\infty \lfloor \frac{N}{p^k} \rfloor.

Since floors are lower than chairs and tables, we get the estimate

e_p(N!) < \sum\limits_{k=1}^\infty \frac{N}{p^k}=\frac{N}{p-1},

so that the “isotypic component” of N! corresponding to the prime p is

p^{e_p(N!)} < p^{\frac{N}{p-1}}.

Now, since p \leq 2^{p-1}, we have

p^{e_p(N!)} < 2^N,

which gives us the estimate

N! < 2^{N\pi(N)}

with \pi(N) the number of primes less than or equal to N. Comparing this with the lower bound N^{N/2} from Proposition 2, we get

\pi(N) > \frac{1}{2} \log_2 N,

which isn’t great from the perspective of actually counting primes, but it does tell you that there are infinitely many.
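Both Legendre's formula and the resulting bound on \pi(N) can be checked directly (a sketch; the helper names are mine):

```python
import math

def e_p(p, N):
    """Multiplicity of the prime p in N! via Legendre's formula:
    sum over k >= 1 of floor(N / p^k)."""
    total, pk = 0, p
    while pk <= N:
        total += N // pk
        pk *= p
    return total

def primes_up_to(N):
    """Sieve of Eratosthenes: all primes <= N."""
    sieve = [True] * (N + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(N ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, N + 1, i):
                sieve[j] = False
    return [i for i in range(2, N + 1) if sieve[i]]

# reconstruct 20! from its prime factorization
assert math.prod(p ** e_p(p, 20) for p in primes_up_to(20)) == math.factorial(20)
```

For example e_2(10!) = 5 + 2 + 1 = 8, and \pi(N) comfortably exceeds \frac{1}{2}\log_2 N for every N one cares to try.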

Another approach to estimating factorials is to use calculus: instead of squaring N!, we take its logarithm and interpret the resulting sum as a Riemann sum.

Proposition 3: We have

e\left(\frac{N}{e}\right)^N \leq N! \leq eN\left(\frac{N}{e}\right)^N.

Proof: Write

\sum\limits_{x=1}^{N-1} \log(x) \leq \int\limits_1^N \log(x) \mathrm{d}x \leq \sum\limits_{x=1}^{N} \log(x),

where the lower and upper bounds are, respectively, lower and upper Riemann sums for the integral between them. This integral can be evaluated:

\int\limits_1^N \log(x) \mathrm{d}x = N \log N - N +1.

The upper bound on the integral thus yields

e^{N \log N - N +1} \leq N!,

which is the claimed lower bound on N!. The lower bound on the integral yields

(N-1)! \leq e^{N \log N - N +1},

and multiplying through by N we get the claimed upper bound on N!.

— Q.E.D.
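Again, a quick numerical check of Proposition 3 (a sketch; we start at N=2 because at N=1 the lower bound equals N! exactly and floating-point rounding can spoil the comparison):

```python
import math

# Verify e*(N/e)^N <= N! <= e*N*(N/e)^N for a range of N.
for N in range(2, 100):
    f = math.factorial(N)
    lower = math.e * (N / math.e) ** N
    upper = math.e * N * (N / math.e) ** N
    assert lower <= f <= upper
```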

Problem 2: Which is better — Proposition 2 or Proposition 3?

Instead of comparing N! to an integral, we can represent it as an integral.

Proposition 4: We have

N! = \int_0^\infty t^N e^{-t} \mathrm{d}t.

Proof: This can be proved by induction on N. For N=0, we have

\int_0^\infty e^{-t} \mathrm{d}t = -e^{-t}|_{t=0}^{t=\infty}=1 = 0!.

Now, for any N \in \mathbb{N}, integrating by parts we have

\int t^N e^{-t} \mathrm{d}t = -t^Ne^{-t} + N\int t^{N-1} e^{-t}\mathrm{d}t.

Since f(t)=t^Ne^{-t} satisfies f(0)=0 and \lim_{t \to \infty} f(t) =0, this gives us

\int_0^\infty t^N e^{-t} \mathrm{d}t = N\int_0^\infty t^{N-1} e^{-t} \mathrm{d}t = N(N-1)! = N!.

— Q.E.D.
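Proposition 4 can be confirmed numerically with any quadrature rule; here is a trapezoid-rule sketch (the cutoff T, step count, and function name are arbitrary choices of mine):

```python
import math

def euler_integral(N, T=200.0, steps=200000):
    """Trapezoid-rule approximation of \\int_0^T t^N e^{-t} dt. For
    moderate N and T this large, the neglected tail (T, infinity) is
    negligible, so this approximates N!."""
    h = T / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        w = 0.5 if i in (0, steps) else 1.0  # trapezoid endpoint weights
        total += w * t ** N * math.exp(-t)
    return total * h
```

For small N the quadrature reproduces N! to many digits of relative accuracy.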

The integral representation of N! given by Proposition 4 was found by Euler, who wanted to define z! for z a complex variable. Indeed, the integral

z! := \int_0^\infty t^z e^{-t} \mathrm{d}t

converges to define a holomorphic function on the half plane H=\{ z\in \mathbb{C} \colon \Re z >0\}, so that we have an analytic factorial function on this domain. In some sense, this is the only way to define z! (precise statements are the Bohr-Mollerup Theorem and Wielandt’s Theorem).

This is not the direction we’re headed. For us, the main significance of Euler’s integral representation is that it can be re-written as follows. Making the change of variables x=\frac{1}{N}t, we get

N! = N^{N+1} Z_N(S),

where

Z_N(S) = \int_\mathbb{R} e^{-NS(x)} \mathrm{d}x,

where S(x) = x - \log x for x>0 and S(x)=\infty for x \leq 0. What we want to do is to consider the generalization of the factorial function obtained by considering all integrals of the form Z_N(S) with

S \colon \mathbb{R} \to \mathbb{R}

a given function called the “action.” This term comes from physics, and is related to the fact that Z_N(S) is the partition function of a zero-dimensional quantum field theory with action S. We will see that, for S drawn from a broad class of functions, the N \to \infty behavior of Z_N(S) can be understood in a uniform way using a robust technique known as the Laplace method. Eventually, we will see that all such integrals admit asymptotic expansions whose coefficients can be understood in terms of certain graphs known as Feynman diagrams. Next time, we’ll work out Stirling’s formula from the integral representation of N!, and start developing Laplace’s method in general.