Math 289A: Formalism of Schur-Horn Arrays

The right way to set up the classical limit theorems of probability theory is to start with a triangular array of real random variables

\begin{matrix} X_{11} & {} & {} \\ X_{21} & X_{22} & {} \\ X_{31} & X_{32} & X_{33} \\ \vdots & \vdots & \vdots\end{matrix}.

Your first exposure to the law of large numbers and central limit theorem likely involved a different setup, namely an infinite sequence

X_1,X_2,X_3,\dots

of iid random variables, whose partial sums

S_N=X_1 + \dots + X_N

are the main object of interest. There are some non-negligible advantages to the triangular array setup, for example the fact that the rows do not have to be defined on a common underlying probability space. The N \to \infty behavior of the row sums

S_N = X_{N1}+ \dots + X_{NN}

is the topic of slightly more advanced and flexible versions of the central limit theorem. This result and other classical limit theorems apply to the situation when the random variables in each row of the triangular array are independent or nearly independent.

We are interested in triangular arrays of random variables as above which arise from an underlying deterministic data set

\begin{matrix} B_{11} & {} & {} \\ B_{21} & B_{22} & {} \\ B_{31} & B_{32} & B_{33} \\ \vdots & \vdots & \vdots\end{matrix}.

The triangular array of random variables associated to this data has rows defined by

\begin{bmatrix} X_{N1} \\ \vdots \\ X_{NN} \end{bmatrix} = \begin{bmatrix} {} & \vdots & {}\\ \dots & |U_{ij}|^2 & \dots \\ {} & \vdots & {}\end{bmatrix}\begin{bmatrix} B_{N1} \\ \vdots \\ B_{NN} \end{bmatrix}

where U_N=[U_{ij}] is a uniformly random unitary matrix of rank N. Thus, the random vector (X_{N1},\dots,X_{NN}) is a random linear transformation of the deterministic data (B_{N1},\dots,B_{NN}). Because the entries of U_N are highly correlated (rows are orthonormal, columns are orthonormal) the random variables X_{N1},\dots,X_{NN} are far from independent. They are, however, exchangeable.

There are two ways to think about a Schur-Horn array. Geometrically, (X_{N1},\dots,X_{NN}) is a random point inside the permutahedron in \mathbb{R}^N generated by (B_{N1},\dots,B_{NN}). Algebraically, (X_{N1},\dots,X_{NN}) is the diagonal of the random matrix U_NB_NU_N^*, where B_N=\mathrm{diag}(B_{N1},\dots,B_{NN}).
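To make this concrete, here is a minimal numerical sketch (numpy; the QR-with-phase-fix recipe is a standard sampler for Haar measure, and the helper name, sizes, and seed are our choices) checking that the algebraic and doubly stochastic descriptions of a row agree:

    import numpy as np

    rng = np.random.default_rng(0)

    def haar_unitary(N, rng):
        # QR of a complex Ginibre matrix, with the standard phase correction,
        # yields a Haar-distributed unitary.
        Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
        Q, R = np.linalg.qr(Z)
        d = np.diagonal(R)
        return Q * (d / np.abs(d))

    N = 6
    B = np.sort(rng.standard_normal(N))[::-1]           # one row of the B-array
    U = haar_unitary(N, rng)

    X_alg = np.diag(U @ np.diag(B) @ U.conj().T).real   # diagonal of U B U*
    X_geo = (np.abs(U) ** 2) @ B                        # [|U_{ij}|^2] applied to B
    print(np.allclose(X_alg, X_geo), np.isclose(X_alg.sum(), B.sum()))  # True True

The second check reflects the geometric picture: the matrix [|U_{ij}|^2] is doubly stochastic, so the random point always has the same coordinate sum as (B_{N1},\dots,B_{NN}), a constraint satisfied by every point of the permutahedron.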

We are interested in proving limit theorems for Schur-Horn arrays. Such theorems will describe N \to \infty behavior of the rows of the X-array in terms of prescribed N \to \infty behavior of the rows of the B-array. In particular, we are interested in the situation where the empirical distribution of the vector

\left(\frac{B_{N1}}{N^t}, \dots, \frac{B_{NN}}{N^t}\right)

converges in moments as N \to \infty, where t \geq 0 is a scaling exponent. The empirical distribution is the probability measure on \mathbb{R} which places mass 1/N at each coordinate of the vector (B_{N1},\dots,B_{NN}). If we think of these data points as beads on a wire, then the empirical measure of S \subseteq \mathbb{R} is the fraction of beads which lie in S. Convergence in moments of the empirical measure means that the limit

\gamma_d = \lim\limits_{N \to \infty} \frac{p_d\left(\frac{B_{N1}}{N^t}, \dots, \frac{B_{NN}}{N^t}\right)}{N}

exists for each fixed d \in \mathbb{N}, where p_d(z_1,\dots,z_N)=z_1^d + \dots + z_N^d is the power sum polynomial of degree d in N variables.
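For instance, with the hypothetical row data B_{Nj} = \sqrt{N}\,(j-\frac{1}{2})/N and scaling exponent t=1/2, a few lines suffice to watch the empirical moments converge (a sketch; any row data can be substituted):

    import numpy as np

    def empirical_moment(B, t, d):
        # p_d(B / N^t) / N for a row B = (B_N1, ..., B_NN)
        N = len(B)
        return np.sum((B / N ** t) ** d) / N

    for N in (10, 100, 1000):
        B = np.sqrt(N) * (np.arange(1, N + 1) - 0.5) / N
        print(N, [round(empirical_moment(B, 0.5, d), 4) for d in range(1, 5)])
    # gamma_d -> 1/(d+1), the moments of the uniform measure on [0, 1]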

Let \langle \cdot \rangle denote expected value, and let

K_N(a_1,\dots,a_N) = \left\langle e^{i(a_1X_{N1}+\dots+a_NX_{NN})}\right\rangle

be the characteristic function of (X_{N1},\dots,X_{NN}). Because the distribution of (X_{N1},\dots,X_{NN}) in \mathbb{R}^N is compactly supported, K_N is an analytic function on \mathbb{R}^N. The Maclaurin expansion of the characteristic function is

K_N(a_1,\dots,a_N) = 1+\sum\limits_{d=1}^\infty \frac{i^d}{d!} K_N^d(a_1,\dots,a_N),

where

K_N^d(a_1,\dots,a_N) = \sum\limits_\alpha a_1^{\alpha_1} \dots a_N^{\alpha_N} \langle X_{N1}^{\alpha_1}\dots X_{NN}^{\alpha_N}\rangle {d \choose \alpha_1,\dots,\alpha_N}

and the sum is over all vectors \alpha=(\alpha_1,\dots,\alpha_N) of nonnegative integers whose coordinates sum to d. Thus, K_N^d(a_1,\dots,a_N) is a homogeneous degree d polynomial in N variables whose coefficients are, up to a transparent factor, the degree d joint moments of the random variables X_{N1},\dots,X_{NN}. Since K_N(0,\dots,0)=1, the logarithm L_N(a_1,\dots,a_N)=\log K_N(a_1,\dots,a_N) of the characteristic function is defined and analytic on an open neighborhood of the origin in \mathbb{R}^N. We write the Maclaurin expansion of L_N as

L_N(a_1,\dots,a_N) = \sum\limits_{d=1}^\infty \frac{i^d}{d!} L_N^d(a_1,\dots,a_N),

where L_N^d(a_1,\dots,a_N) is a homogeneous degree d polynomial in N variables. The polynomials L_N^1,L_N^2,L_N^3,\dots are called the cumulant polynomials of the random variables X_{N1},\dots,X_{NN}. They are determined by the moment polynomials K_N^1,K_N^2,K_N^3,\dots according to a certain recursion explained on pages 2 and 3 of these notes. We write the cumulant polynomials in the form

L_N^d(a_1,\dots,a_N) = \sum\limits_\alpha a_1^{\alpha_1} \dots a_N^{\alpha_N} \langle X_{N1}^{\alpha_1}\dots X_{NN}^{\alpha_N}\rangle_c {d \choose \alpha_1,\dots,\alpha_N},

where the quantities \langle X_{N1}^{\alpha_1}\dots X_{NN}^{\alpha_N}\rangle_c are the degree d joint cumulants of the random variables X_{N1},\dots,X_{NN} as \alpha ranges over nonnegative integer vectors of sum d. Note that this is a definition, not a proposition. In statistical mechanics and quantum field theory, joint cumulants are referred to as connected correlation functions.

The main fact about Schur-Horn arrays that allows us to understand them is a second formula for the cumulant polynomials L_N^d(a_1,\dots,a_N) of the random variables X_{N1},\dots,X_{NN} in terms of the moments of the empirical distribution of the deterministic data (B_{N1},\dots,B_{NN}).

Theorem (Cumulant Formula for Schur-Horn Arrays). For each 1 \leq d \leq N, we have

L_N^d(a_1,\dots,a_N) = N^{2-d}\sum\limits_\alpha \frac{p_\alpha(a_1,\dots,a_N)}{N^{\ell(\alpha)}}\sum\limits_\beta \frac{p_\beta(B_{N1},\dots,B_{NN})}{N^{\ell(\beta)}}L_N(\alpha,\beta),

where the sums are over nonnegative integer vectors \alpha=(\alpha_1,\dots,\alpha_N) and \beta=(\beta_1,\dots,\beta_N) with non-increasing coordinates which sum to d, and \ell(\alpha) and \ell(\beta) are the number of nonzero coordinates of \alpha and \beta respectively. For each such pair \alpha,\beta the polynomials p_\alpha(z_1,\dots,z_N) and p_\beta(b_1,\dots,b_N) are the corresponding products of power sum symmetric polynomials, i.e.

p_\alpha = \prod\limits_{i=1}^{\ell(\alpha)} p_{\alpha_i}\qquad\text{and}\qquad p_\beta = \prod\limits_{i=1}^{\ell(\beta)} p_{\beta_i}.

Finally,

L_N(\alpha,\beta) = (-1)^{\ell(\alpha)+\ell(\beta)} \sum\limits_{g=0}^\infty N^{-2g} \vec{H}_g(\alpha,\beta)

is a convergent series in which the coefficients \vec{H}_g(\alpha,\beta) are positive integers called monotone Hurwitz numbers which admit an explicit combinatorial description.

The cumulant formula for Schur-Horn arrays looks complicated, but it is fairly easy to use in practice. For example, let us set our scaling exponent to t=1/2. Then, for any fixed r \in \mathbb{N} we have

L_N^d(a_1,\dots,a_r,0,\dots,0) = N^{1-\frac{d}{2}}\sum\limits_\alpha \frac{p_\alpha(a_1,\dots,a_r)}{N^{\ell(\alpha)-1}}\sum\limits_\beta \frac{p_\beta(\frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}})}{N^{\ell(\beta)}}L_N(\alpha,\beta).

As we saw (twice) in lecture, you can calculate the N \to \infty limit of this polynomial. Assuming \gamma_1=0 and \gamma_2=1, the only nonzero limit is

L_N^2(a_1,\dots,a_r) \longrightarrow a_1^2 + \dots + a_r^2.

This means that

\lim\limits_{N \to \infty} \langle X_{N1}^{\alpha_1}\dots X_{Nr}^{\alpha_r}\rangle_c = 0

unless \alpha=(\alpha_1,\dots,\alpha_r) is a nonnegative integer vector with total sum 2 and only one nonzero coordinate. From this we conclude that for any fixed r \in \mathbb{N}, the random variables X_{N1},\dots,X_{Nr} converge in joint distribution to r iid standard Gaussians.

When our scaling exponent is t=1, we do not get such a simple Gaussian limit. Rather, we find that

\lim\limits_{N \rightarrow \infty} \frac{1}{N}L_N^d(a_1,\dots,a_r) = (a_1^d + \dots + a_r^d) \kappa_d

for each fixed r \in \mathbb{N}, where

\kappa_d = \sum\limits_{\beta \vdash d} \gamma_\beta (-1)^{1+\ell(\beta)} \vec{H}_0(d,\beta),

where \gamma_\beta = \gamma_{\beta_1} \gamma_{\beta_2} \gamma_{\beta_3} \dots. Thus we still have a sum of pure powers indicating asymptotic independence between the random variables X_{N1},\dots,X_{Nr}, but the cumulants of these random variables are asymptotically N \kappa_d. We are going to see that

\kappa_1,\kappa_2,\kappa_3,\dots

is the free cumulant sequence corresponding to the moment sequence

\gamma_1,\gamma_2,\gamma_3,\dots.
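As a preview, here is a sketch of one standard route between these two sequences, Speicher's recursion M(z) = 1 + \sum_s \kappa_s z^s M(z)^s for the moment series M(z); this is not the monotone Hurwitz formula above, but, granting the claim just made, it computes the same sequence (the semicircle test data below is an arbitrary choice):

    def free_cumulants(m):
        # Solve m_n = sum_{s=1}^n kappa_s * [z^{n-s}] M(z)^s for the kappa's,
        # where M(z) = 1 + m_1 z + m_2 z^2 + ... (Speicher's recursion).
        mm = [1.0] + [float(x) for x in m]      # mm[j] = m_j, with m_0 = 1
        kappa = []
        for n in range(1, len(mm)):
            total = 0.0
            for s in range(1, n):
                # [z^{n-s}] M(z)^s via repeated truncated convolution
                dp = [1.0] + [0.0] * (n - s)
                for _ in range(s):
                    dp = [sum(dp[a] * mm[b - a] for a in range(b + 1))
                          for b in range(n - s + 1)]
                total += kappa[s - 1] * dp[n - s]
            kappa.append(mm[n] - total)         # the s = n term is kappa_n itself
        return kappa

    # Semicircle moments 0, 1, 0, 2, 0, 5 (Catalan numbers in even degrees):
    print(free_cumulants([0, 1, 0, 2, 0, 5]))   # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]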

Math 289A: Lecture 15

Today we want to prove the “inverse Wigner theorem” we started discussing last time. Let (B_N)_{N=1}^\infty be a sequence of traceless deterministic matrices, and for each N \in \mathbb{N} let X_N=U_NB_NU_N^* be a uniformly random Hermitian matrix isospectral with B_N. Suppose that the empirical eigenvalue distribution of \frac{1}{\sqrt{N}}B_N converges in moments as N \to \infty, meaning that the limit

\gamma_d=\lim\limits_{N \to \infty} \frac{1}{N} \mathrm{Tr} \left(\frac{1}{\sqrt{N}}B_N\right)^d

exists for each d \in \mathbb{N}. Equivalently, we have

\gamma_d=\lim\limits_{N \to \infty} \frac{p_d\left( \frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}}\right)}{N}

with B_{N1},\dots,B_{NN} any enumeration of the eigenvalues of B_N and p_d(x_1,\dots,x_N)=x_1^d + \dots + x_N^d the power sum polynomial of degree d. Let X_{N1},\dots,X_{NN} denote the diagonal matrix elements of the random matrix X_N, so

\begin{bmatrix} X_{N1} \\ \vdots \\ X_{NN} \end{bmatrix} = \begin{bmatrix} {} & \vdots & {} \\ \dots & |U_N(i,j)|^2 & \dots \\ {} & \vdots & {}\end{bmatrix}\begin{bmatrix} B_{N1} \\ \vdots \\ B_{NN} \end{bmatrix}

with U_N a uniformly random unitary matrix.

Theorem 15.1. For any fixed r \in \mathbb{N}, the random vector (X_{N1},\dots,X_{Nr}) converges in moments to a vector (Y_1,\dots,Y_r) of iid centered Gaussians of variance \gamma_2. More precisely, we have

\lim\limits_{N \to \infty} \left\langle X_{N1}^{\alpha_1} \dots X_{Nr}^{\alpha_r}\right\rangle = \left\langle Y_1^{\alpha_1} \dots Y_r^{\alpha_r}\right\rangle

for any vector (\alpha_1,\dots,\alpha_r) of nonnegative integers.

Here is a corollary which further explains why we are calling Theorem 15.1 an inverse Wigner theorem.

Corollary 15.1. For any r \in \mathbb{N}, the r \times r principal submatrix of X_N converges to \mathrm{GUE}_r(\gamma_2), i.e. to an r \times r random Hermitian matrix following a centered Gaussian distribution of variance \gamma_2.

Problem 15.1. Deduce Corollary 15.1 from Theorem 15.1.

The rest of the lecture consists of the proof of Theorem 15.1. First, let us explain the assumption that our deterministic data B_N has trace zero. The random variables X_{N1},\dots,X_{NN} are identically distributed (but not independent), with first moment

\langle X_{N1} \rangle = \frac{B_{N1}+\dots+B_{NN}}{N}= \frac{1}{N}\mathrm{Tr}\, B_N.

So, assuming \mathrm{Tr}\, B_N=0 is equivalent to centering X_{N1},\dots,X_{NN}.

Now we start the proof. The joint distribution of X_{N1},\dots,X_{NN} is a compactly supported probability measure on \mathbb{R}^N — indeed, it is supported on the compact convex set whose extreme points are the distinct permutations of the deterministic vector (B_{N1},\dots,B_{NN}). Therefore, the Laplace transform

K_N(z,a_1,\dots,a_N) = \left\langle e^{z(a_1X_{N1} + \dots + a_NX_{NN})}\right\rangle

is an entire function of the complex variables z,a_1,\dots,a_N, i.e. K_N is an analytic function on \mathbb{C}^{N+1}. Note that if we set z=i and restrict the remaining variables (a_1,\dots,a_N) to \mathbb{R}^N, the Laplace transform specializes to the characteristic function of (X_{N1},\dots,X_{NN}), so that it completely determines the distribution of this random vector. The variable z is redundant, but it is a convenient marker of homogeneity: we write the Maclaurin expansion of K_N as

K_N(z,a_1,\dots,a_N) = 1 + \sum\limits_{d=1}^\infty \frac{z^d}{d!} K_N^d(a_1,\dots,a_N),

where K_N^d \colon \mathbb{C}^N \to \mathbb{C} is a homogeneous polynomial of degree d. Indeed, this polynomial is

K_N^d(a_1,\dots,a_N) = \left\langle (a_1X_{N1} + \dots + a_NX_{NN})^d \right\rangle,

which in the standard monomial basis of homogeneous polynomials is

K_N^d(a_1,\dots,a_N) = \sum\limits_{\alpha} a_1^{\alpha_1} \dots a_N^{\alpha_N} \langle X_{N1}^{\alpha_1} \dots X_{NN}^{\alpha_N}\rangle {d \choose \alpha_1,\dots,\alpha_N},

the sum being over all vectors \alpha=(\alpha_1,\dots,\alpha_N) of nonnegative integers whose coordinates sum to d, aka weak N-part compositions of d. We call K_N^d the degree d moment polynomial of X_{N1},\dots,X_{NN}, because its coefficients in the monomial basis are the degree d joint moments of these random variables (up to a multinomial coefficient). So our objective is to understand the N \to \infty behavior of this polynomial restricted to the r-dimensional subspace of \mathbb{C}^N spanned by the first r coordinate vectors, where r \in \mathbb{N} is arbitrary but fixed. To be completely explicit, this is the polynomial

K_N^d(a_1,\dots,a_r) := K_N^d(a_1,\dots,a_r,0,\dots,0)=\sum\limits_{\alpha} a_1^{\alpha_1} \dots a_r^{\alpha_r} \langle X_{N1}^{\alpha_1} \dots X_{Nr}^{\alpha_r}\rangle {d \choose \alpha_1,\dots,\alpha_r},

the sum being over weak r-part compositions of d. This is an important simplification, since the number of variables of K_N^d(a_1,\dots,a_r) is fixed, i.e. its domain does not vary with N.

In fact, it is more convenient to work with the log-Laplace transform of X_{N1},\dots,X_{NN}. More precisely, since K_N(0,\dots,0)=1, there exists a simply connected open neighborhood \Omega_N of the origin in \mathbb{C}^{N+1} on which K_N is nonvanishing, and there is a unique analytic function L_N \colon \Omega_N \to \mathbb{C} vanishing at the origin which satisfies

K_N(z,a_1,\dots,a_N) = e^{L_N(z,a_1,\dots,a_N)}, \quad (z,a_1,\dots,a_N) \in \Omega_N.

Let

L_N(z,a_1,\dots,a_N) = \sum\limits_{d=1}^\infty \frac{z^d}{d!} L_N^d(a_1,\dots,a_N)

be the Maclaurin expansion of this analytic function, so L_N^d(a_1,\dots,a_N) is a homogeneous degree d polynomial. This is called the degree d cumulant polynomial of the random variables X_{N1},\dots,X_{NN}. According to the Exponential Formula (aka moment-cumulant formula), these cumulants are recursively determined by the moment polynomials by a triangular system of equations, the first three of which are (suppressing the variables a_1,\dots,a_N)

K_N^1=L_N^1,\ K_N^2=L_N^2+L_N^1L_N^1,\ K_N^3 =L_N^3+3L_N^2L_N^1+L_N^1L_N^1L_N^1.
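In a single variable this triangular system is mechanical to invert; here is a small sketch of the univariate specialization, using the standard recursion c_n = m_n - \sum_{k=1}^{n-1} {n-1 \choose k-1} c_k m_{n-k}:

    from math import comb

    def cumulants_from_moments(m):
        # m[d-1] is the d-th moment; returns the cumulant list of the same length
        c = []
        for n in range(1, len(m) + 1):
            c.append(m[n - 1] - sum(comb(n - 1, k - 1) * c[k - 1] * m[n - k - 1]
                                    for k in range(1, n)))
        return c

    # Moments 0, 1, 0, 3, 0, 15 of a standard Gaussian: every cumulant
    # past the second vanishes, which is the Gaussian signature.
    print(cumulants_from_moments([0, 1, 0, 3, 0, 15]))  # [0, 1, 0, 0, 0, 0]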

We can write the polynomial L_N^d as

L_N^d(a_1,\dots,a_N) = \sum\limits_{\alpha} a_1^{\alpha_1} \dots a_N^{\alpha_N} \langle X_{N1}^{\alpha_1} \dots X_{NN}^{\alpha_N}\rangle_c {d \choose \alpha_1,\dots,\alpha_N},

which mirrors the monomial expansion of K_N^d except for the subscript on the expectation-like quantity \langle X_{N1}^{\alpha_1} \dots X_{NN}^{\alpha_N}\rangle_c, known as a degree d joint cumulant of X_{N1},\dots,X_{NN}. Note that this is a definition, not a proposition – the relationship between the Maclaurin expansions of the Laplace transform and the log-Laplace transform implicitly defines these statistics.

Problem 15.2. Show that \langle X_{N1}X_{N2}\rangle_c=\langle X_{N1}X_{N2}\rangle-\langle X_{N1}\rangle \langle X_{N2}\rangle is the covariance of X_{N1} and X_{N2}.

Our discussion so far pertains to any finite family of random variables with compactly supported joint distribution. Now we will start invoking special features of the family X_{N1},\dots,X_{NN} we are actually dealing with. The first of these is that these random variables are exchangeable, which is equivalent to saying that the cumulant polynomials L_N^d(a_1,\dots,a_N) are symmetric polynomials. This means that we can also expand them as linear combinations of Newton polynomials, and, as previously discussed, we have a powerful understanding of this expansion.

Theorem 15.2 (Leading Cumulants Formula). For any 1 \leq d \leq N, we have

L_N^d(a_1,\dots,a_N) = N^{2-d} \sum\limits_{\alpha \vdash d} \frac{p_\alpha(a_1,\dots,a_N)}{N^{\ell(\alpha)}}\sum\limits_{\beta \vdash d} \frac{p_\beta(B_{N1},\dots,B_{NN})}{N^{\ell(\beta)}} L_N(\alpha,\beta),

where each sum is over the set of integer partitions of d and

L_N(\alpha,\beta)=(-1)^{\ell(\alpha)+\ell(\beta)} \sum\limits_{g=0}^\infty N^{-2g} \vec{H}_g(\alpha,\beta)

is a convergent power series in N^{-2} whose coefficients \vec{H}_g(\alpha,\beta) are positive integers.

The combinatorial coefficients \vec{H}_g(\alpha,\beta) are quite intricate and there is no simple closed formula for them (you can read this if you want to know more), but luckily we do not need to know very much about them in order to prove our inverse Wigner theorem. All that matters for us is that this magic formula is expressing the cumulant polynomials of X_{N1},\dots,X_{NN} in terms of the moments of the empirical eigenvalue distribution of our deterministic data matrices B_N in a quite transparent way.

To leverage this, let us restrict to the cumulant polynomials of X_{N1},\dots,X_{Nr}, which just means setting the last N-r variables of L_N^d(a_1,\dots,a_N) equal to zero (this eventually makes sense because r is fixed and N increases without bound). We write the resulting polynomial in the form

L_N^d(a_1,\dots,a_r) = N^{1-\frac{d}{2}}\sum\limits_{\alpha \vdash d} \frac{p_\alpha(a_1,\dots,a_r)}{N^{\ell(\alpha)-1}}\sum\limits_{\beta \vdash d} \frac{p_\beta(\frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}})}{N^{\ell(\beta)}} L_N(\alpha,\beta).

Ignore the factor of N^{1-\frac{d}{2}} for a moment and just think about the behavior of the double sum as N \to \infty. The number of terms in this sum is independent of N, which means that we only need to understand the asymptotics of each individual term

\frac{p_\alpha(a_1,\dots,a_r)}{N^{\ell(\alpha)-1}}\frac{p_\beta(\frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}})}{N^{\ell(\beta)}} L_N(\alpha,\beta).

Reading from right to left, we have that L_N(\alpha,\beta) \to (-1)^{\ell(\alpha)+\ell(\beta)}\vec{H}_0(\alpha,\beta) as N \to \infty. By hypothesis, we have

\frac{p_\beta(\frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}})}{N^{\ell(\beta)}} \longrightarrow \prod_{i=1}^{\ell(\beta)}\gamma_{\beta_i}=:\gamma_\beta.

Finally, since

|p_\alpha(a_1,\dots,a_r)| \leq r^{\ell(\alpha)} \|(a_1,\dots,a_r)\|_\infty^d,

we have

\frac{p_\alpha(a_1,\dots,a_r)}{N^{\ell(\alpha)-1}} = O\left(\frac{1}{N^{\ell(\alpha)-1}}\right)

as N \to \infty. Thus, the whole double sum is O(1) as N \to \infty, so now is a good time to remember the prefactor N^{1-\frac{d}{2}} and immediately conclude that

L_N^d(a_1,\dots,a_r) \longrightarrow 0

as N \to \infty, uniformly on compact subsets of \mathbb{C}^r, for all d \geq 3. For d=2, we have

L_N^2(a_1,\dots,a_r) \to p_2(a_1,\dots,a_r)(\gamma_2\vec{H}_0(2,2)-\gamma_1^2\vec{H}_0(2,11)),

uniformly on compact subsets of \mathbb{C}^r, and since \gamma_1=0 and \vec{H}_0(d,d)=(d-1)! this simplifies to

L_N^2(a_1,\dots,a_r) \to p_2(a_1,\dots,a_r)\gamma_2.

Finally, the polynomial L_N^1(a_1,\dots,a_N) is identically zero by our trace zero hypothesis. What we have shown is that all joint cumulants of X_{N1},\dots,X_{Nr} converge to zero as N \to \infty except for pure cumulants of degree two,

\langle X_{N1}^2\rangle_c,\dots,\langle X_{Nr}^2\rangle_c,

each of which converges to \gamma_2.

Problem 15.3. Compute the joint cumulants of a finite family of centered iid Gaussians and complete the proof.
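Finally, here is a Monte Carlo sanity check of Theorem 15.1 with the hypothetical two-point data B_{Nj} = \pm\sqrt{N}, for which \gamma_2 = 1 (a sketch; scipy's unitary_group supplies the Haar samples):

    import numpy as np
    from scipy.stats import unitary_group

    N, samples = 100, 500
    B = np.sqrt(N) * np.array([1.0] * (N // 2) + [-1.0] * (N // 2))  # traceless

    rows = np.empty((samples, 2))
    for s in range(samples):
        U = unitary_group.rvs(N)
        rows[s] = ((np.abs(U) ** 2) @ B)[:2]       # (X_N1, X_N2)

    print(np.cov(rows.T))   # approximately the 2 x 2 identity: iid N(0, gamma_2)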

Math 289A: Lecture 14

In this lecture I wanted to connect our (admittedly meandering) discussion so far to the standard approach to limit theorems in random matrix theory. Let X_N be a random Hermitian matrix with characteristic function

\varphi_{X_N}(T) = e^{-\frac{1}{2} \mathrm{Tr}\, T^2}, \quad T \in \mathbb{H}_N.

This means precisely that the distribution of X_N in \mathbb{H}_N with Hilbert-Schmidt scalar product is the standard Gaussian measure on this Euclidean space.

Definition 14.1. The Gaussian Unitary Ensemble is the sequence (X_N)_{N=1}^\infty.

The word “unitary” here means that the distribution of X_N is unitarily invariant. Let B_{N1} \geq \dots \geq B_{NN} be the eigenvalues of X_N.

Theorem 14.1 (Wigner). For each d \in \mathbb{N}, the limit

\gamma_d = \lim\limits_{N \to \infty} \mathbf{E}\,\frac{p_d\left( \frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}}\right)}{N}

exists and is given by

\gamma_d = \begin{cases} 0, \text{ if }d \text{ odd}\\ \mathrm{Cat}(\frac{d}{2}), \text{ if }d \text{ even} \end{cases}.

In Theorem 14.1, p_d(x_1,\dots,x_N)=x_1^d+\dots+x_N^d is the power sum symmetric polynomial of degree d, and \mathrm{Cat}(k)=\frac{1}{k+1}{2k \choose k} is the kth Catalan number. There are two things we should discuss: what the result means, and how to prove it.

Concerning the meaning of Theorem 14.1, the random variable

\frac{p_d(B_{N1},\dots,B_{NN})}{N} = \frac{B_{N1}^d + \dots + B_{NN}^d}{N}

is the degree d moment of the empirical eigenvalue distribution of X_N, which is the random discrete probability measure \eta_N on \mathbb{R} which places equal mass at each eigenvalue. Thus, Theorem 14.1 is saying that each moment of the empirical eigenvalue distribution of the scaled random matrix \frac{1}{\sqrt{N}}X_N converges in expectation to an explicit limit. It is therefore natural to ask whether \gamma_1,\gamma_2,\gamma_3,\dots is the moment sequence of a probability measure on \mathbb{R}. To answer this, one can recognize that the exponential generating function

E(z) = \sum\limits_{d=0}^\infty \frac{z^d}{d!}\, \gamma_d = \sum_{k=0}^\infty \frac{z^{2k}}{(k+1)!\,k!}

is essentially a Bessel function. Then, a classical integral representation for Bessel functions allows one to determine that (\gamma_d)_{d=1}^\infty is the moment sequence of the probability measure \omega on \mathbb{R} which is supported on [-2,2] with density

\omega(\mathrm{d}x) = \frac{1}{2\pi}\sqrt{4-x^2}\,\mathrm{d}x.

You can find the details of this derivation here, towards the end of Lecture One.
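If you would rather check the moment claim numerically than via Bessel functions, a few lines of quadrature suffice (a sketch; scipy's quad is assumed):

    import numpy as np
    from math import comb
    from scipy.integrate import quad

    for d in range(1, 9):
        val, _ = quad(lambda x, d=d: x ** d * np.sqrt(4 - x ** 2) / (2 * np.pi), -2, 2)
        print(d, round(val, 6))
    # odd moments vanish; even moments 1, 2, 5, 14 are the Catalan numbers:
    print([comb(2 * k, k) // (k + 1) for k in range(1, 5)])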

Now we discuss how one can prove Theorem 14.1. The proof uses the fact that power sums in the eigenvalues of a matrix are also traces of powers of that matrix. This means that we have

\frac{p_d\left( \frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}}\right)}{N}=\frac{1}{N} \mathrm{Tr} \left(\frac{1}{\sqrt{N}}X_N\right)^d,

the point being that the right hand side is also a polynomial in the elements X_N(i,j) of the Gaussian random matrix X_N. Indeed, by matrix algebra we have

\mathrm{Tr}\, X_N^d = \sum\limits_\phi X_N(\phi(1),\phi(2))X_N(\phi(2),\phi(3)) \dots X_N(\phi(d),\phi(1)),

where the sum is over all functions

\phi \colon \{1,\dots,d\} \longrightarrow \{1,\dots,N\}.

Using the fact that the matrix elements of X_N are Gaussian and independent (up to selfadjointness), the following can be established.

Theorem 14.2 (Harer-Zagier). For each d \in \mathbb{N}, we have

\mathbf{E}\left[\frac{1}{N} \mathrm{Tr} (\frac{1}{\sqrt{N}}X_N)^d\right] = \sum\limits_{g=0}^\infty N^{-2g} E_g^d,

where E_g^d is the number of ways to glue the edges of a d-gon in pairs so as to produce a compact orientable surface of genus g.

The proof is a beautiful piece of mathematics and a nice treatment can be found here. Theorem 14.2 implies Theorem 14.1 because the number of ways to glue a sphere from a polygon with 2k sides is \mathrm{Cat}(k), and only the genus zero term survives in the N \to \infty limit.
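The gluing counts E_g^d can be tabulated by brute force for small polygons. The sketch below encodes a gluing as a fixed-point-free involution \pi of the edge set and reads the genus off Euler's formula V - E + F = 2-2g, computing V as the number of cycles of the permutation i \mapsto \gamma(\pi(i)), with \gamma the boundary d-cycle (a standard encoding, assumed here):

    from math import comb

    def pairings(elts):
        # all perfect matchings of a list of even length
        if not elts:
            yield []
            return
        a, rest = elts[0], elts[1:]
        for i, b in enumerate(rest):
            for p in pairings(rest[:i] + rest[i + 1:]):
                yield [(a, b)] + p

    def genus_counts(d):
        # F = 1 face, E = d/2 glued edges, V = number of cycles of gamma(pi(.))
        counts = {}
        for p in pairings(list(range(d))):
            pi = {}
            for a, b in p:
                pi[a], pi[b] = b, a
            perm = [(pi[i] + 1) % d for i in range(d)]
            seen, V = set(), 0
            for i in range(d):
                if i not in seen:
                    V += 1
                    j = i
                    while j not in seen:
                        seen.add(j)
                        j = perm[j]
            g = (2 - (V - d // 2 + 1)) // 2
            counts[g] = counts.get(g, 0) + 1
        return counts

    for k in (1, 2, 3):
        print(2 * k, genus_counts(2 * k), "Cat(k) =", comb(2 * k, k) // (k + 1))
    # 2 {0: 1}; 4 {0: 2, 1: 1}; 6 {0: 5, 1: 10}: genus-zero counts are Catalan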

There is something unsatisfying about the above sequence of ideas. We know that the entire distribution of X_N is completely determined by the joint distribution of its diagonal matrix elements

X_N(1,1),\dots,X_N(N,N),

which are simply iid standard Gaussians. In fact, at this point in the course we know much more: the spectral analysis of unitarily invariant random Hermitian matrices is equivalent to the analysis of pairs

\begin{matrix} X_{11} & {} & {} \\ X_{21} & X_{22} & {} \\ X_{31} & X_{32} & X_{33} \\ \vdots & \vdots & \vdots \end{matrix}

and

\begin{matrix} B_{11} & {} & {} \\ B_{21} & B_{22} & {} \\ B_{31} & B_{32} & B_{33} \\ \vdots & \vdots & \vdots \end{matrix},

of triangular arrays of real random variables related by

\begin{bmatrix} X_{N1} \\ \vdots \\ X_{NN}\end{bmatrix}=\begin{bmatrix} {} & \vdots & {} \\ \dots & |U_N(i,j)|^2 & \dots \\ {} & \vdots & {} \end{bmatrix}\begin{bmatrix} B_{N1} \\ \vdots \\ B_{NN}\end{bmatrix}

where U_N=[U_N(i,j)] is a uniformly random unitary matrix. Initially, X_{N1},\dots,X_{NN} were viewed as the diagonal matrix elements of a random Hermitian matrix with unitarily invariant distribution and B_{N1},\dots,B_{NN} were the eigenvalues of this selfadjoint matrix. But, we can in fact forget about this original setup entirely, and simply be probabilists thinking about two randomly coupled triangular arrays of real random variables. The question then how to recover Theorem 14.1 simply from the given fact that our X-array has rows X_N=(X_{N1},\dots,X_{NN}) of iid standard Gaussians.

This point of view is why the paper of Olshanski-Vershik is so relevant for our purposes, and so far ahead of its time: they are proving inverse theorems of this kind. More precisely, they are considering the case where the B-array is a given deterministic data set, and proving limit theorems about the corresponding X-array. A special case of their main result is the following “inverse Wigner theorem.”

Theorem 14.3 (Olshanski-Vershik). Suppose that the limit

\gamma_d = \lim\limits_{N \to \infty}\frac{p_d\left( \frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}}\right)}{N}

exists for each d \in \mathbb{N}. Then, for any fixed r \in \mathbb{N}, the random variables X_{N1},\dots,X_{Nr} converge in distribution to an r-tuple Y_1,\dots,Y_r of iid centered Gaussians with variance \gamma_2+\gamma_1^2.

We have proved the one-dimensional version of this Gaussian limit theorem (the case r=1), and we will finish the proof of the multivariate version next time (the case r>1). In our approach to these inverse results of Olshanski-Vershik we are using a combinatorial result which is analogous to Theorem 14.2, but arguably even more miraculous: a universal genus expansion for joint cumulants of the X-array in our theory of unistochastically coupled triangular arrays.

Math 289A: Lecture 13

Let (B_N)_{N=1}^\infty be a sequence of deterministic Hermitian matrices, and let b_{N1},\dots,b_{NN} be any enumeration of the eigenvalues of B_N \in\mathbb{H}_N. Consider the corresponding sequence (X_N)_{N=1}^\infty of random Hermitian matrices defined by X_N=U_NB_NU_N^*, where U_N is a random unitary matrix whose distribution in \mathbb{U}_N is Haar measure. We then have a corresponding triangular array of real random variables,

\begin{matrix} X_{11} & {} & {} \\ X_{21} & X_{22} & {} \\ X_{31} & X_{32} & X_{33} \\ \vdots & \vdots & \vdots \end{matrix},

whose Nth row

X_{N1}=X_N(1,1), \dots, X_{NN} = X_N(N,N)

consists of the diagonal elements of the random matrix X_N. In terms of our input data B_N, we have

X_{Nj} = \sum\limits_{k=1}^N |U_N(j,k)|^2 b_{Nk}.

This is superficially similar to the setup for the Central Limit Theorem, but it is different: X_{N1},\dots,X_{NN} are exchangeable but not independent (make sure you understand why). Thus our first objective is simply to understand the N \to \infty asymptotic distribution of X_{N1}, the (1,1)-matrix element of a uniformly random Hermitian matrix with spectrum B_N.

In this lecture I will use the angled brackets notation favored by physicists to denote expectation. I will also use the angled bracket with a subscript “c” for cumulants (the subscript could also stand for connected). So for example the variance of X_{N1} is

\langle X_{N1}^2\rangle_c = \langle X_{N1}^2\rangle - \langle X_{N1}\rangle\langle X_{N1}\rangle.

Problem 13.1. Prove that \langle X_{N1} \rangle = \frac{1}{N}\mathrm{Tr}\,B_N. Also show that for d > 1 we have \langle (X_{N1} + x)^d\rangle_c = \langle X_{N1}^d\rangle_c for any constant x (this translation invariance is a general property of cumulants).

Our Optimized Leading Cumulants Formula says that for any 1 \leq d \leq N we have

\langle X_{N1}^d \rangle_c = N^{2-d} \sum\limits_{\alpha,\beta \vdash d} \frac{p_\beta(b_{N1},\dots,b_{NN})}{N^{\ell(\alpha)+\ell(\beta)}}L_N(\alpha,\beta),

where

p_\beta(b_{N1},\dots,b_{NN}) = \prod\limits_{i=1}^{\ell(\beta)} \mathrm{Tr}(B_N^{\beta_i})

and

L_N(\alpha,\beta) = (-1)^{\ell(\alpha)+\ell(\beta)} \sum_{g=0}^\infty N^{-2g}\vec{H}_g(\alpha,\beta)

is \pm 1 times a convergent positive series.

What is nice about the Optimized Leading Cumulants Formula is that it almost immediately suggests the possibility of Gaussian limiting behavior: because of the factor N^{2-d} in the formula, it looks like cumulants of degree d \geq 3 should vanish in the N \to \infty limit, which is the Gaussian signature. To actually establish this we need to determine how the rest of the formula behaves as N grows large.

This is not so difficult. We have the finite sum

\sum\limits_{\alpha,\beta \vdash d} \frac{p_\beta(b_{N1},\dots,b_{NN})}{N^{\ell(\alpha)+\ell(\beta)}}L_N(\alpha,\beta),

and the number of terms in the sum has no dependence on N. The quantity L_N(\alpha,\beta) is O(1) as N \to \infty (make sure you understand why). This leaves the fraction

\frac{p_\beta(b_{N1},\dots,b_{NN})}{N^{\ell(\alpha)+\ell(\beta)}}.

The denominator is literally just a power of N. As for the numerator, we have the bound

|p_\beta(b_{N1},\dots,b_{NN})| \leq N^{\ell(\beta)} \|B_N\|^d,

where \|B_N\| = \max |b_{Nj}| is the spectral radius of B_N. Thus, we have

\left| \frac{p_\beta(b_{N1},\dots,b_{NN})}{N^{\ell(\alpha)+\ell(\beta)}} \right| \leq \frac{\|B_N\|^d}{N^{\ell(\alpha)}} \leq \frac{\|B_N\|^d}{N}.

We therefore have the cumulant bound

\left| \langle X_{N1}^d\rangle_c\right| \leq t_d N^{1-d} \|B_N\|^d

where t_d>0 depends only on d.
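These bounds are easy to probe numerically. The sketch below samples X_{N1} directly, using the standard fact that the first row of a Haar unitary is a uniform unit vector in \mathbb{C}^N, so the weights |U_N(1,j)|^2 form a normalized vector of iid exponentials; the \pm 1 data and sample sizes are arbitrary choices, and the cumulant recursion is the univariate moment-cumulant inversion:

    import numpy as np
    from math import comb

    def cumulants_from_moments(m):
        c = []
        for n in range(1, len(m) + 1):
            c.append(m[n - 1] - sum(comb(n - 1, k - 1) * c[k - 1] * m[n - k - 1]
                                    for k in range(1, n)))
        return c

    rng = np.random.default_rng(0)

    def sample_X_N1(b, samples, rng):
        # |U_N(1,j)|^2 has the Dirichlet(1,...,1) law of normalized exponentials
        w = rng.exponential(size=(samples, len(b)))
        return (w / w.sum(axis=1, keepdims=True)) @ b

    for N in (20, 80, 320):
        b = np.array([1.0] * (N // 2) + [-1.0] * (N // 2))   # ||B_N|| = 1, trace 0
        X = sample_X_N1(b, 100000, rng)
        m = [np.mean(X ** d) for d in range(1, 5)]
        print(N, ["%+.1e" % c for c in cumulants_from_moments(m)])
    # the d-th cumulant is O(N^{1-d}): the variance drops like 1/N, while the
    # degree 3 and 4 cumulants sink into Monte Carlo noise even faster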

Theorem 13.1. If the input sequence B_N of Hermitian matrices is such that \|B_N\| \leq M\sqrt{N} for some constant M and all N \in \mathbb{N}, and if \gamma_2 = \lim_N N^{-2} \mathrm{Tr}\, B_N^2 exists, then the centered random variable Y_N = X_{N1} - N^{-1}\mathrm{Tr}\, B_N converges to a centered Gaussian of variance \gamma_2.

Proof: By definition the first cumulant of Y_N is zero for every N \in \mathbb{N}, and from translation invariance of cumulants and the bound established above we have that all cumulants of degree d >2 converge to zero. From the Optimized Leading Cumulants Formula and the estimates above, the only O(1) term in the second cumulant is exactly N^{-2}\mathrm{Tr}\, B_N^2. Therefore, Y_N converges to a centered Gaussian random variable Y of variance \gamma_2 as N \to \infty, and the convergence holds in distribution by the Moment Method.

-QED

Math 289A: Lecture 12

This week we will finish working through the Olshanski-Vershik limit theorem, and explain how the tools we have developed along the way have set us up to prove other kinds of limit theorems about random matrices.

Recall the setting of the OV paper: we have an infinite triangular array of real numbers,

\begin{matrix} b_{11} & {} & {} \\ b_{21} & b_{22} & {} \\ b_{31} & b_{32} & b_{33} \\ \vdots & \vdots & \vdots \end{matrix}.

Assume the entries are sorted so that each row is non-increasing from left to right and let

B_N=(b_{N1},\dots,b_{NN})

be the Nth row. Thinking of B_N \in \mathbb{W}_N as a diagonal matrix, define a corresponding random Hermitian matrix by

X_N=U_NB_NU_N^*.

We know that in the characteristic function of X_N,

\varphi_{X_N}(A) = \int\limits_{\mathbb{U}_N} e^{i \mathrm{Tr}\, AUB_NU^*}\mathrm{d}U,\quad A \in \mathbb{H}_N,

we can assume without loss in generality that A is diagonal, so what we are really looking at is the sequence of characteristic functions

K_N \colon \mathbb{R}^N \longrightarrow \mathbb{C}

corresponding to our data set b_{ij} given as

K_N(a_1,\dots,a_N) = \int\limits_{\mathbb{U}_N} e^{i \sum_{x,y=1}^N a_xb_y|U_{xy}|^2}\mathrm{d}U,

or equivalently

K_N(a_1,\dots,a_N) = \mathbb{E}\left[e^{i(a_1X_N(1,1)+\dots+a_NX_N(N,N))}\right], \quad (a_1,\dots,a_N) \in \mathbb{R}^N,

where the diagonal matrix elements X_N(1,1),\dots,X_N(N,N) are exchangeable (but not independent) real random variables. Part I of the Olshanski-Vershik theorem is concerned with finding necessary and sufficient conditions on the triangular array b_{ij} such that the characteristic function

K_N(a)=K_N(a,0,\dots,0), \quad a \in \mathbb{R},

of a single diagonal matrix element X_N(1,1) converges as N\to \infty. Note that extracting the (1,1)-entry on either side of X_N=U_NB_NU_N^* we have

X_N(1,1) = \sum\limits_{j=1}^N |U_{1j}|^2 b_{Nj},

showing explicitly how X_N(1,1) is built from the uniformly random unit vector (U_{11},\dots,U_{1N}) in \mathbb{C}^N and the data B_N=(b_{N1},\dots,b_{NN}).

The characteristic function K_N(a) of X_N(1,1) has the power series expansion

K_N(a) = 1 + \sum\limits_{d=1}^\infty \frac{(ia)^d}{d!}K_N^d,

where K_N^d=\mathbf{E}[X_N^d(1,1)] are the moments of this real random variable. Olshanski-Vershik begin by finding sufficient conditions on the data b_{ij} such that, for each fixed d \in \mathbb{N}, the sequence of numbers (K_N^d)_{N=1}^\infty converges to a limit K^d as N \to \infty. In other words, they begin by finding conditions on the data b_{ij} such that the random variable X_N(1,1) converges in moments. We will do the same thing, but in a different way, using the following consequence of Theorem 10.1 in Lecture 10.

Theorem 12.1 (Leading Moments Formula). For each 1 \leq d \leq N, we have

K_N^d = N^{-d}\sum\limits_{\alpha,\beta \vdash d} p_\beta(b_{N1},\dots,b_{NN})\sum\limits_{r=0}^\infty \frac{(-1)^r}{N^r}\vec{W}^r(\alpha,\beta),

where the inner sum is absolutely convergent and \vec{W}^r(\alpha,\beta) is the number of r-step monotone walks on the symmetric group of degree d which begin at a permutation of cycle type \alpha and end at a permutation of cycle type \beta.

The most efficient way to use this result is to take a logarithm, i.e. to work with the cumulants of X_N(1,1). This is also the path followed by Olshanski-Vershik, but the method they use to do the calculation is different.

Let

L_N(a) = \log K_N(a) = \sum\limits_{d=1}^\infty \frac{(ia)^d}{d!}L_N^d

be the log-characteristic function of X_N(1,1), i.e. the cumulant generating function. The cumulant sequence L_N^1,L_N^2,L_N^3,\dots contains the same information as the moment sequence K_N^1,K_N^2,K_N^3,\dots, via the moment-cumulant relations

K_N^1 =L_N^1

K_N^2 = L_N^2 + L_N^1L_N^1

K_N^3 =L_N^3 + 3L_N^2L_N^1+L_N^1L_N^1L_N^1

\vdots

The cumulants L_N^d encode the “connected” version of the combinatorial problem encoded by the moments K_N^d.

Theorem 12.2 (Leading Cumulants Formula). For each 1 \leq d \leq N, we have

L_N^d =N^{-d} \sum\limits_{\alpha,\beta \vdash d} p_\beta(b_{N1},\dots,b_{NN})\sum\limits_{r=0}^\infty \frac{(-1)^r}{N^r}\vec{H}^r(\alpha,\beta),

where \vec{H}^r(\alpha,\beta) is the number of walks counted by \vec{W}^r(\alpha,\beta) which have the additional feature that their steps and endpoints generate a transitive subgroup of the symmetric group.

The Leading Cumulants Formula is identical to the Leading Moments formula except that the walk count \vec{W}^r(\alpha,\beta) has been replaced with the connected walk count \vec{H}^r(\alpha,\beta). This seemingly minor change is in fact very advantageous.

Theorem 12.3 (Riemann-Hurwitz Formula). The number \vec{H}^r(\alpha,\beta) is nonzero if and only if there is g \in \mathbb{N}_0 such that

r=2g-2+\ell(\alpha)+\ell(\beta).

Due to the Riemann-Hurwitz formula it is more convenient to parameterize the numbers \vec{H}^r(\alpha,\beta) by the parameter g, i.e. to set

\vec{H}_g(\alpha,\beta):= \vec{H}^{2g-2+\ell(\alpha)+\ell(\beta)}(\alpha,\beta).
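For small d these counts can be tabulated by brute force, which makes the Riemann-Hurwitz constraint visible. The sketch below fixes a starting 3-cycle in the symmetric group of degree 3 (one natural reading of "begin at a permutation of cycle type \alpha"; the notes' counting conventions may differ by symmetry factors), takes "monotone" to mean that the larger moved points of the successive steps are non-decreasing, and tallies transitive walks by the cycle type of their endpoint:

    from itertools import product

    d = 3                                    # small enough for brute force

    def compose(p, q):                       # (p after q)(i) = p[q[i]]
        return tuple(p[q[i]] for i in range(d))

    def cycle_type(p):
        seen, ct = set(), []
        for i in range(d):
            if i not in seen:
                n, j = 0, i
                while j not in seen:
                    seen.add(j)
                    j, n = p[j], n + 1
                ct.append(n)
        return tuple(sorted(ct, reverse=True))

    def transitive(perms):                   # do the perms act transitively?
        orbit, frontier = {0}, [0]
        while frontier:
            i = frontier.pop()
            for p in perms:
                if p[i] not in orbit:
                    orbit.add(p[i])
                    frontier.append(p[i])
        return len(orbit) == d

    steps = []
    for b in range(d):
        for a in range(b):
            p = list(range(d))
            p[a], p[b] = p[b], p[a]
            steps.append((tuple(p), b))      # transposition, tagged by its max

    start = (1, 2, 0)                        # a fixed 3-cycle: alpha = (3)
    for r in range(3):
        counts = {}
        for walk in product(steps, repeat=r):
            maxima = [b for _, b in walk]
            if any(maxima[i] > maxima[i + 1] for i in range(r - 1)):
                continue                     # not monotone
            end = start
            for t, _ in walk:
                end = compose(end, t)
            if transitive([t for t, _ in walk] + [start, end]):
                counts[cycle_type(end)] = counts.get(cycle_type(end), 0) + 1
        print(r, counts)
    # r=0: {(3,): 1}; r=1: {(2, 1): 3}; r=2: {(3,): 5, (1, 1, 1): 2}; every
    # type that appears satisfies r = 2g - 2 + l(alpha) + l(beta), g >= 0

In particular, cycle types violating the parity constraint never appear, exactly as Theorem 12.3 predicts.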

Theorem 12.4 (Optimized Leading Cumulants Formula). For each 1 \leq d \leq N, we have

L_N^d =N^{2-d} \sum\limits_{\alpha,\beta \vdash d} \frac{p_\beta(b_{N1},\dots,b_{NN})}{N^{\ell(\alpha)+\ell(\beta)}}(-1)^{\ell(\alpha)+\ell(\beta)}\sum\limits_{g=0}^\infty N^{-2g}\vec{H}_g(\alpha,\beta).

Now we are in position to prove some interesting limit theorems. We begin by proving one such theorem corresponding to an explicit choice of the dataset b_{ij} (this corresponds to Remark 4.5 in Olshanski-Vershik).

Let \gamma >0 be a positive constant, and suppose our dataset b_{ij} has Nth row given by

B_N=(\sqrt{\gamma N},\dots,\sqrt{\gamma N},-\sqrt{\gamma N},\dots,-\sqrt{\gamma N}),

where the number of positive and negative entries is as near to equal as possible – equal if N even and differing by 1 if N odd, where we assume the data favors positive numbers in the latter case. In this situation we can establish the following N \to \infty limit theorem for a single diagonal matrix element of X_N=U_NB_NU_N^*.

Theorem 12.5 (Gaussian Limit Theorem). Any diagonal matrix element of X_N=U_NB_NU_N^* converges in distribution to a centered Gaussian random variable Y of variance \gamma.

Proof: Let \beta \vdash d be a partition of d \in \mathbb{N}. Then, we have

p_\beta(B_N) = (\gamma N)^{\frac{d}{2}}p_\beta(1,\dots,1,-1,\dots,-1),

and therefore

p_\beta(B_N) = O(N^{\ell(\beta)+\frac{d}{2}}).

Thus, each ratio p_\beta(B_N)/N^{\ell(\alpha)+\ell(\beta)} is O(N^{\frac{d}{2}-1}), and from the Optimized Leading Cumulants Formula we have

L_N^d=O(N^{1-\frac{d}{2}}),

showing that L_N^d converges to zero for all d>2. This means that X_N(1,1) converges in cumulants (equivalently, in moments) to a Gaussian random variable Y, whose mean and variance remain to be determined.

The first cumulant of X_N(1,1) is

L_N^1 =N^{2-1} \frac{\sqrt{\gamma N}}{N^2}=N^{-\frac{1}{2}}\sqrt{\gamma} \quad (N \text{ odd};\ L_N^1=0 \text{ for } N \text{ even}),

and this converges to 0 as N \to \infty. For the second cumulant,

L_N^2 =\frac{p_2(B_N)}{N^2} + O(\frac{1}{N}),

and

\frac{p_2(B_N)}{N^2} = \frac{N(\gamma N)}{N^2}=\gamma.

Thus, we have shown that X_N(1,1) converges in moments/cumulants to Y \sim \mathcal{N}(0,\gamma). In fact, this implies X_N(1,1) \to Y in distribution as N \to \infty, thanks to the Method of Moments.

-QED

Math 289A: Lecture 10

We start with a brief recap of what we are trying to achieve – since things are getting complicated it might be helpful to step back for a moment before stepping further in.

Let \mathbb{H}_N be the Euclidean space of N \times N Hermitian matrices, and let \mathbb{U}_N be the compact group of N \times N unitary matrices. We have a group action

\Phi \colon \mathbb{U}_N \longrightarrow \mathrm{Aut}(\mathbb{H}_N),

where \mathrm{Aut}(\mathbb{H}_N) is the orthogonal group of the Euclidean space \mathbb{H}_N, defined by

\Phi(U)X = UXU^*, \quad X \in \mathbb{H}_N,\ U \in \mathbb{U}_N.

This group action is called unitary conjugation. Let

\mathcal{O}_X = \{UXU^* \colon U \in \mathbb{U}_N\}

denote the orbit of a given Hermitian matrix X \in \mathbb{H}_N under unitary conjugation.

Theorem (Spectral Theorem). For every X \in \mathbb{H}_N, the orbit \mathcal{O}_X contains a diagonal matrix B which is unique up to permutations of its diagonal elements.

Let

\mathbb{W}_N = \{(b_1,\dots,b_N) \in \mathbb{R}^N \colon b_1 \geq \dots \geq b_N\}.

We will refer to this orthant of \mathbb{R}^N as the Weyl chamber. This term comes from Lie theory – we are not doing Lie theory, we just need a name for \mathbb{W}_N. By the Spectral Theorem, the orbits of \mathbb{H}_N under unitary conjugation are indexed by points of the Weyl chamber: we set

\mathcal{O}(B) = \mathcal{O}_B, \quad B \in \mathbb{W}_N,

where on the right hand side we are viewing B as a diagonal matrix.

A standard problem in linear algebra is the following: we are given a matrix X \in \mathbb{H}_N, meaning that we are explicitly given all entries of this matrix, and from this information we are asked to find B \in \mathbb{W}_N such that X \in \mathcal{O}(B). The Spectral Theorem guarantees that this problem has a unique solution. The coordinates of the solution B \in \mathbb{R}^N are the eigenvalues of the given matrix X.
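In software this standard problem is a single library call; a minimal numpy sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    X = (Z + Z.conj().T) / 2                 # an arbitrary point of H_4

    lam, V = np.linalg.eigh(X)               # X = V diag(lam) V* with V unitary
    B = lam[::-1]                            # eigenvalues sorted into W_4
    print(B)
    print(np.allclose(X, V @ np.diag(lam) @ V.conj().T))   # X lies on O(B)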

We are interested in solving a generic version of the inverse problem: given B \in \mathbb{W}_N, determine the matrix elements of a uniformly random point X \in \mathcal{O}(B). Since B is given, we are being asked about the random Hermitian matrix X_N=U_NBU_N^*, where U_N is a uniformly random unitary matrix from \mathbb{U}_N. This is something we know how to define precisely (Haar measure).

We want to solve this inverse problem by computing the characteristic function of X_N, which is

\varphi_{X_N}(T) = \mathbf{E}\left[e^{i\langle T,U_NBU_N^*\rangle}\right]=\int\limits_{\mathbb{U}_N}e^{i \mathrm{Tr}TUBU^*} \mathrm{d}U.

We know that it is in fact sufficient to do this for T=A being a diagonal Hermitian matrix, and we have introduced the notation

K_N(A,B) = \int\limits_{\mathbb{U}_N} e^{i \mathrm{Tr} AUBU^*} \mathrm{d}U

in order to emphasize the fact that we can view this characteristic function as defining a bounded symmetric kernel

K_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}.

Furthermore, K_N is actually the characteristic function of the diagonal D_N=(X_N(1,1),\dots,X_N(N,N)) of X_N, so a solution to our problem is the same thing as determining the distribution of the diagonal of a uniformly random Hermitian matrix with prescribed eigenvalues.

We can view K_N as a generating function for the joint moments of the diagonal matrix elements X_N(1,1),\dots,X_N(N,N), meaning that

K_N(A,B) = 1 + \sum\limits_{d=1}^\infty \frac{i^d}{d!} \sum\limits_{\gamma \in \mathsf{C}_N^d} a_1^{\gamma_1} \dots a_N^{\gamma_N} \mathbf{E}\left[ X_N^{\gamma_1}(1,1) \dots X_N^{\gamma_N}(N,N)\right] K_N(\gamma),

where the internal sum runs over the set \mathsf{C}_N^d of weak N-part compositions of the number d, which are simply vectors \gamma=(\gamma_1,\dots,\gamma_N) of nonnegative integers whose coordinates sum to d. The coefficient K_N(\gamma) corresponding to a given \gamma \in \mathsf{C}_N^d is a very simple quantity, namely the multinomial coefficient

K_N(\gamma) = \frac{d!}{\gamma_1! \dots \gamma_N!}.

But, it is Haard (sorry, couldn’t resist) to see how to directly compute all mixed moments of the random variables X_N(1,1),\dots,X_N(N,N). Indeed, we actually have a formula for each of these random variables, namely

X_N(k,k) = \sum\limits_{j=1}^N |U_{jk}|^2 b_j,

which is obtained from X_N=U_NBU_N^* simply by matrix multiplication. This formula more or less shows that a direct computation of, say, the moment sequence of just X_N(1,1) involves computing a family of Haar integrals over \mathbb{U}_N which we don’t know how to contend with.

But K_N is a moment generating function in another sense: we have

K_N(A,B) = 1 + \sum\limits_{d=1}^\infty \frac{i^d}{d!} \sum\limits_{\alpha,\beta \in \mathsf{Y}_N^d} p_\alpha(a_1,\dots,a_N)p_\beta(b_1,\dots,b_N) K_N(\alpha,\beta),

where the internal (double) sum is over the set \mathsf{Y}_N^d of Young diagrams with first row of length at most N; such diagrams are pictorial representations of partitions of d with largest part at most N. We know that such an expansion exists by a “soft” symmetry argument combined with a classical theorem of Newton: we use that K_N(A,B) is invariant under independent permutations of the coordinates of the vectors A=(a_1,\dots,a_N) and B=(b_1,\dots,b_N). The moments involved in this expansion are the moments of the (non-normalized) empirical distributions of these vectors, i.e. the discrete measures \eta_A and \eta_B on \mathbb{R} which place unit mass at each coordinate. The moments of these measures are exactly sums of powers of the coordinates of A and B.

This leaves the problem of computing the coefficients K_N(\alpha,\beta) in this nonstandard moment generating function. This can be done using representation theory – in fact one needs to know quite a bit about representations of the unitary and symmetric groups, as well as about the relationship between them. We will mostly black box this to avoid going too deeply into algebra (though an indication of what needs to be done was given in the previous lecture, where we discussed Schur polynomials). Instead, we will focus on two facts: first, the coefficients K_N(\alpha,\beta) have an intrinsically interesting combinatorial interpretation, which is something that often happens with moment generating functions of interesting random variables (we discussed the example of Gaussians in lecture); second, this combinatorial interpretation will allow us to prove probabilistically meaningful limit theorems. The simplest such theorem is the result of Olshanski and Vershik on the N \to \infty limiting joint distribution of X_N(1,1),\dots,X_N(k,k) where k \in \mathbb{N} is arbitrary but fixed (you should be actively reading the Olshanski-Vershik paper while we are doing this).

Theorem 10.1. For every integer 1 \leq d \leq N, and every pair of Young diagrams \alpha,\beta \in \mathsf{Y}_N^d, we have

K_N(\alpha,\beta) = N^{-d}\sum\limits_{r=0}^\infty \frac{(-1)^r}{N^r}\vec{W}^r(\alpha,\beta),

where the series is absolutely convergent and \vec{W}^r(\alpha,\beta) is the number of monotone r-step walks on the Cayley graph of the symmetric group \mathfrak{S}^d which begin at a permutation of cycle type \alpha and end at a permutation of cycle type \beta.

Clearly there is much to be unpacked here, but ultimately it is not so complicated, and moreover only a small amount of the information contained in this statement is needed to prove limit theorems about random matrices. A very basic observation is that in the range 1 \leq d \leq N, the set \mathsf{Y}_N^d is simply the set \mathsf{Y}^d of all Young diagrams with exactly d cells.

Now we have to explain what we mean by the Cayley graph of the symmetric group, and what we mean by a monotone walk on this graph. For this I am linking to a paper which hopefully provides a reasonably clear account of this combinatorial construction (all you need to read is Sections 2.1 and 2.2, just a couple of pages). After you have read this you will know what the Cayley graph of the symmetric group is, and what is meant by a strictly monotone walk on this graph. Furthermore, you will know the remarkable fact that there is a unique strictly monotone walk between any two permutations, and this walk is a geodesic.

As explained in lecture, it is often the case that the logarithm of the characteristic function is easier to work with than the characteristic function itself – it is a generating function for cumulants rather than moments. Let us consider the logarithm of the characteristic function of X_N = U_NBU_N^*,

L_N(A,B) = \log K_N(A,B) = \log\int\limits_{\mathbb{U}_N} e^{i\mathrm{Tr}\, AUBU^*} \mathrm{d}U.

Math 289A: Lecture 9

This week we are looking at a landmark paper by Grigori Olshanski and Anatoly Vershik, “Ergodic Unitarily Invariant Measures on the Space of Infinite Hermitian Matrices.” This paper is among the first to consider characteristic functions in the context of random matrix theory. Namely, it considers the problem of approximating the function

K_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}

defined by

K_N(A,B) = \int\limits_{\mathbb{U}_N} e^{i\mathrm{Tr}AUBU^*}\mathrm{d}U,

where the integration is over the unitary group of rank N against Haar measure. We have referred to this function as the “quantum Fourier kernel,” but as we have stressed it is also the classical characteristic function of the diagonal (X_N(1,1),\dots,X_N(N,N)) of the uniformly random Hermitian matrix X_N=U_NB_NU_N^* with deterministic eigenvalues B \in \mathbb{R}^N. The paper of Olshanski-Vershik is about approximating K_N(A,B) as N \to \infty, which probabilistically means that we are trying to understand how the joint distribution of the diagonal matrix elements of a very large uniformly random Hermitian matrix with prescribed eigenvalues B is determined by the vector B. This is an example of a high-dimensional problem: our goal is to approximate a function

K_N(A,B) = K_N(a_1,\dots,a_N;b_1,\dots,b_N)=\int\limits_{\mathbb{U}_N} e^{i\sum_{x,y=1}^N a_xb_y|U_{xy}|^2}\mathrm{d}U

of 2N real variables as the number of variables goes to infinity. Typically such problems are much harder than traditional problems of asymptotic analysis, which consider the approximation of some parameter-dependent family of functions defined on a fixed domain as the parameter approaches a given value.

One way to simplify the question is to consider the restriction of K_N to the subspace \mathbb{R}^r \times \mathbb{R}^N with r \in \mathbb{N} fixed. That is, we consider the characteristic function

K_N(a_1,\dots,a_r,0,\dots,0;b_1,\dots,b_N) = \mathbb{E}\left[e^{i(a_1X_N(1,1) + \dots +a_r X_N(r,r))}\right]

of only the first r diagonal matrix elements of the random Hermitian matrix X_N=U_NBU_N^*. Recall that we have previously shown that all diagonal matrix elements of X_N are identically distributed, but not necessarily independent. The paper of Olshanski and Vershik shows that if the eigenvalues B=(b_1,\dots,b_N), scaled by a factor of N^{-1}, converge to a point configuration on \mathbb{R} as N \to \infty, then the random variables X_N(1,1),\dots,X_N(r,r) converge to independent random variables whose characteristic function can be calculated explicitly in terms of this point configuration. We will try to work through this result, which I believe to be the first high-dimensional limit theorem in RMT obtained via characteristic functions.

The Olshanski-Vershik approach of computing the asymptotics of the characteristic function

K_N(A,B) = \int\limits_{\mathbb{U}_N}e^{i\mathrm{Tr}AUBU^*}\mathrm{d}U

is based on series expansion of this matrix integral. I am a big proponent of this approach, but it does require some quite extensive background from representation theory and algebraic combinatorics which is not traditionally found in the realm of probability. We will try not to get too bogged down in the algebra, and in order to do this it will be necessary to accept some results from representation theory as black boxes.

First, let us analytically continue the characteristic function K_N \colon \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{C} to a function

K_N \colon \mathbb{C} \times \mathbb{C}^N \times \mathbb{C}^N \longrightarrow \mathbb{C}

by declaring

K_N(z,A,B) = \int\limits_{\mathbb{U}_N} e^{z \mathrm{Tr} AUBU^*} \mathrm{d}U,

where z \in \mathbb{C} is a complex number and A,B \in \mathbb{C}^N are complex vectors. Thus, by compactness of \mathbb{U}_N, we have that K_N is an entire function of 1+2N complex variables z,a_1,\dots,a_N,b_1,\dots,b_N, and the Maclaurin series of this function, which is a multivariate Taylor series centered at the origin, converges absolutely and uniformly on compact subsets of \mathbb{C}^{1+2N}. We recover our original characteristic function K_N by setting z=i and restricting A,B to \mathbb{R}^N \subset \mathbb{C}^N.

The analytic function K_N has various symmetries that allow us to present its Maclaurin series in a more advantageous form. More precisely, K_N(z,A,B)=K_N(z,a_1,\dots,a_N,b_1,\dots,b_N) is invariant under independent permutations of a_1,\dots,a_N and b_1,\dots,b_N, and also stable under swapping these two sets of variables. This means that the Maclaurin series of K_N can be presented in the form

K_N(z,A,B) = 1 + \sum_{d=1}^\infty \frac{z^d}{d!} K_N^d(A,B),

where K_N^d(A,B) = K_N^d(a_1,\dots,a_N,b_1,\dots,b_N) is a homogeneous degree d symmetric polynomial in a_1,\dots,a_N and also a homogeneous degree d symmetric polynomial in b_1,\dots,b_N. Therefore, by Newton’s theorem on symmetric polynomials, we can write this polynomial as

K_N^d(A,B) = \sum\limits_{\alpha,\beta \in \mathsf{Y}^d} p_\alpha(a_1,\dots,a_N)p_\beta(b_1,\dots,b_N)K_N(\alpha,\beta),

where \mathsf{Y}^d is the set of integer partitions of the degree d, which we identify with the set of Young diagrams with d cells, and p_\alpha(a_1,\dots,a_N) and p_\beta(b_1,\dots,b_N) are the Newton power-sum symmetric polynomials corresponding to \alpha,\beta \in \mathsf{Y}^d. For example, if d=3 and \alpha=(1,1,1), \beta=(2,1), then

p_\alpha(a_1,\dots,a_N)p_\beta(b_1,\dots,b_N)= (a_1+\dots+a_N)^3(b_1^2+\dots+b_N^2)(b_1+\dots+b_N).

Our objective then is to calculate the numerical coefficients K_N(\alpha,\beta) of the polynomial K_N^d(A,B) which we will refer to as its Newton coefficients.

Problem 9.1. Show that K_N^d(A,B)=K_N^d(B,A), and use this to show that K_N(\alpha,\beta)=K_N(\beta,\alpha).

What is good about the Newton polynomials is that they connect us to point configurations, which is what we want. More precisely, if B=(b_1,\dots,b_N) \in \mathbb{R}^N is a real vector, then we can view it not as a point in N-dimensional space but as a configuration of particles b_1,\dots,b_N on the real line. The empirical distribution \eta_B of B is the discrete measure on \mathbb{R} which places unit mass at each particle, and the degree d moment of this measure is

\int\limits_{\mathbb{R}} x^d \eta_B(\mathrm{d}x) = b_1^d + \dots +b_N^d=p_d(b_1,\dots,b_N).

This means that writing the Maclaurin expansion of K_N(z,A,B) in terms of Newton symmetric polynomials is exactly what we want to do, because it is expressing this function in terms of the empirical distributions of its arguments A,B, and the empirical distribution of a point configuration is a natural way to describe the “shape” of the configuration in the infinite particle limit where N \to \infty.

Unfortunately, there is no obvious way to calculate the Newton coefficients K_N(\alpha,\beta). What we can do is compute the Maclaurin series of K_N(z,A,B) = K_N(z,a_1,\dots,a_N,b_1,\dots,b_N) by repeatedly differentiating under the integral sign with respect to z. Equivalently, we just expand the exponential function of z in the integrand and then integrate term-by-term, which gives

K_N(z,A,B) = 1 + \sum\limits_{d=1}^\infty \frac{z^d}{d!} \int\limits_{\mathbb{U}_N} \left(\mathrm{Tr} AUBU^* \right)^d \mathrm{d}U.

This gives us the formula

K_N^d(A,B) = \int\limits_{\mathbb{U}_N} \left(\mathrm{Tr} AUBU^* \right)^d \mathrm{d}U,

which at least reduces our initial problem of computing an exponential integral over \mathbb{U}_N to computing a polynomial integral over \mathbb{U}_N,

K_N^d(A,B) = \int\limits_{\mathbb{U}_N} \left(\sum\limits_{x,y=1}^N a_xb_y U_{xy} \bar{U}_{xy} \right)^d \mathrm{d}U,

but still there is no obvious path forward.

At this point we should ask ourselves what integrals over \mathbb{U}_N we actually can compute. To make the discussion as simple as possible, take N=1, so \mathbb{U}_1 is just the unit circle in \mathbb{C}. One thing we can confidently state is the orthogonality relation

\int\limits_{\mathbb{U}_1} U^k \bar{U}^l \mathrm{d}U = \int\limits_0^{2\pi} e^{i(k-l)\theta} \frac{\mathrm{d}\theta}{2\pi} = \delta_{kl}.

Representation theory gives us analogous formulas for all N, namely a system of multivariate polynomials which are orthogonal polynomials for integration with respect to Haar measure on \mathbb{U}_N. Let \mathsf{Y}_N be the set of all Young diagrams with at most N rows.

Definition 9.1. For each \lambda \in \mathsf{Y}_N, the corresponding Schur symmetric polynomial is defined by

s_\lambda(t_1,\dots,t_N) = \frac{\det [t_j^{N-i+\lambda_i}]}{\det[t_j^{N-i}]},

where \lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_N \geq 0 are the row lengths of \lambda.

Out of the many equivalent definitions of Schur polynomials, we are taking this one because it is a fairly explicit and elementary formula. Indeed, the determinant in the denominator is the Vandermonde determinant, which has the property that it divides any other antisymmetric polynomial in t_1,\dots,t_N to yield a symmetric polynomial.
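Definition 9.1 is also effective numerically: the sketch below evaluates the bialternant with floating-point determinants (fine for small N at generic points), and the symmetry of the output illustrates the Vandermonde division just described. You can use it to spot-check your answers to the next problem.

    import numpy as np

    def schur(lam, t):
        # numerator and denominator determinants of Definition 9.1
        N = len(t)
        row = list(lam) + [0] * (N - len(lam))
        num = np.linalg.det(np.array([[tj ** (N - i - 1 + row[i]) for tj in t]
                                      for i in range(N)]))
        den = np.linalg.det(np.array([[tj ** (N - i - 1) for tj in t]
                                      for i in range(N)]))
        return num / den

    # evaluating at a point and at a permutation of that point gives the
    # same value: the ratio of determinants is a symmetric polynomial
    print(schur([2, 1], [0.3, 1.7, 2.2]), schur([2, 1], [2.2, 0.3, 1.7]))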

Problem 9.2. Let \lambda=\Box be the single-cell Young diagram. Compute the Schur polynomial s_\Box(t_1,\dots,t_N) directly from Definition 9.1. Do the same for a few more small Young diagrams.

The disadvantage of Definition 9.1 is that it makes the orthogonality relations for Schur polynomials, which are really consequences of the fact that these polynomials are characters of irreducible representations of \mathbb{U}_N, into a theorem whose proof would take us too far afield. So we will have to treat this property as a black box. Given a matrix A \in \mathbb{C}^{N \times N}, let us write s_\lambda(A) for the evaluation s_\lambda(a_1,\dots,a_N) of the Schur polynomial s_\lambda(t_1,\dots,t_N) on the eigenvalues of A.

Theorem 9.1 (Schur orthogonality). For any two Young diagrams \lambda,\mu \in \mathsf{Y}_N and any two matrices A,B \in \mathbb{C}^{N \times N}, we have

\int\limits_{\mathbb{U}_N} s_\lambda(AUBU^*) \mathrm{d}U = \frac{s_\lambda(A)s_\lambda(B)}{\dim \mathbf{W}_N^\lambda}

and

\int\limits_{\mathbb{U}_N} s_\lambda(AU)s_\mu(BU^*) \mathrm{d}U =\delta_{\lambda\mu} \frac{s_\lambda(AB)}{\dim \mathbf{W}_N^\lambda},

where \dim \mathbf{W}_N^\lambda is the number of semistandard Young tableaux of shape \lambda with entries from \{1,\dots,N\}.

We also need a remarkable “multinomial formula” due to Frobenius. Let \mathsf{Y}_N^d= \mathsf{Y}^d \cap \mathsf{Y}_N be the set of Young diagrams with exactly d cells and at most N rows.

Theorem 9.2 (Frobenius formula). We have

(t_1 + \dots + t_N)^d = \sum\limits_{\lambda \in \mathsf{Y}_N^d} (\dim \mathbf{V}^\lambda) s_\lambda(t_1,\dots,t_N),

where \dim \mathbf{V}^\lambda is the number of standard Young tableaux of shape \lambda.

The reason the tableaux counts \dim \mathbf{V}^\lambda and \dim \mathbf{W}_N^\lambda are denoted this way is that the former is the dimension of an irreducible representation of the symmetric group \mathbb{S}^d and the latter is the dimension of an irreducible representation of the unitary group \mathbb{U}_N. At this point we have enough material to explicitly calculate a series expansion of K_N(z,A,B). This expansion is the starting point of Olshanski and Vershik’s asymptotic analysis, and is stated as Theorem 5.1 in their paper.

Theorem 9.3 (Schur expansion of K_N). We have

K_N(z,A,B) = \int\limits_{\mathbb{U}_N} e^{z \mathrm{Tr} AUBU^*} \mathrm{d}U=1+ \sum\limits_{d=1}^\infty \frac{z^d}{d!} \sum\limits_{\lambda \in \mathsf{Y}_N^d} s_\lambda(A)s_\lambda(B) \frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_N^\lambda}.

Proof: From Theorem 9.2, we have

K_N^d(A,B) = \int\limits_{\mathbb{U}_N} (\mathrm{Tr} AUBU^*)^d \mathrm{d}U=\sum\limits_{\lambda \in \mathsf{Y}_N^d} (\dim \mathbf{V}^\lambda) \int\limits_{\mathbb{U}_N} s_\lambda(AUBU^*) \mathrm{d}U,

and applying the first integration formula in Theorem 9.1 this becomes

K_N^d(A,B) =  \sum\limits_{\lambda \in \mathsf{Y}_N^d} s_\lambda(A)s_\lambda(B) \frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_N^\lambda}.

– QED

Problem 9.3. Compute the Schur expansion of K_N up to O(z^4). This is straightforward but will give you a feel for the structure of the formula.
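Here is a Monte Carlo sanity check of Theorem 9.3 in the smallest nontrivial case N = d = 2 (a sketch; scipy's unitary_group supplies the Haar samples, and the test vectors are arbitrary):

    import numpy as np
    from scipy.stats import unitary_group

    A = np.diag([0.7, -0.2])
    B = np.diag([1.1, 0.4])

    # Monte Carlo estimate of K_N^2(A,B), the integral of (Tr AUBU*)^2
    est, trials = 0.0, 50000
    for _ in range(trials):
        U = unitary_group.rvs(2)
        est += np.trace(A @ U @ B @ U.conj().T).real ** 2
    print(est / trials)

    # Schur expansion with d = 2: partitions (2) and (1,1); dim V = 1 for both,
    # dim W_2^{(2)} = 3 and dim W_2^{(1,1)} = 1 (semistandard tableau counts)
    s2 = lambda x, y: x * x + x * y + y * y        # s_(2)
    s11 = lambda x, y: x * y                       # s_(1,1)
    a, b = np.diag(A), np.diag(B)
    print(s2(*a) * s2(*b) / 3 + s11(*a) * s11(*b))

Both printed numbers should agree to Monte Carlo accuracy (about 0.17 for these inputs).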

Let us include the analogue of Theorem 9.3 for the quantum Bessel kernel

K_{MN}(A,B) = \int\limits_{\mathbb{U}_{MN}} e^{i \mathrm{Tr}(A^*UBV^* + VB^*U^*A)} \mathrm{d}(U,V),

where M \geq N and the integration is against Haar measure on the product group \mathbb{U}_{MN}= \mathbb{U}_M \times \mathbb{U}_N. Here A,B \in \mathbb{R}^N are real vectors viewed as diagonal matrices in \mathbb{C}^{M \times N}. Since K_{MN}(A,B) is invariant under signed permutations of the coordinates of A=(a_1,\dots,a_N) and B=(b_1,\dots,b_N), we can assume without loss in generality that these are non increasing lists of nonnegative numbers. The quantum Bessel kernel is the characteristic function of a uniformly random M \times N rectangular matrix with singular values B=(b_1,\dots,b_N). Once again we can analytically continue this to

K_{MN}(z,A,B) = \int\limits_{\mathbb{U}_{MN}} e^{z \mathrm{Tr}(A^*UBV^* + VB^*U^*A)} \mathrm{d}(U,V),

where z \in \mathbb{C} and A,B \in \mathbb{C}^N are viewed as complex diagonal M \times N matrices in the integral. The following expansion of K_{MN} is not in the Olshanski-Vershik paper, but it can be obtained in essentially the same way; a more general result is proved here.

Theorem 9.4 (Schur expansion of K_{MN}). We have

K_{MN}(z,A,B) = \int\limits_{\mathbb{U}_{MN}} e^{z \mathrm{Tr}(A^*UBV^* + VB^*U^*A)} \mathrm{d}(U,V)=1+ \sum\limits_{d=1}^\infty \frac{z^{2d}}{d!d!} \sum\limits_{\lambda \in \mathsf{Y}_N^d} s_\lambda(A^*A)s_\lambda(B^*B) \frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_M^\lambda}\frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_N^\lambda}.

We close this lecture with a pointer to interesting work in the engineering literature which considers exactly these integrals in an applied context.