Math 289A: Formalism of Schur-Horn Arrays

The right way to set up the classical limit theorems of probability theory is to start with a triangular array of real random variables

\begin{matrix} X_{11} & {} & {} \\ X_{21} & X_{22} & {} \\ X_{31} & X_{32} & X_{33} \\ \vdots & \vdots & \vdots\end{matrix}.

Your first exposure to the law of large numbers and central limit theorem likely involved a different setup, namely an infinite sequence

X_1,X_2,X_3,\dots

of iid random variables, whose partial sums

S_N=X_1 + \dots + X_N

are the main object of interest. There are some non-negligible advantages to the triangular array setup, for example the fact that the rows do not have to be defined on a common underlying probability space. The N \to \infty behavior of the row sums

S_N = X_{N1}+ \dots + X_{NN}

is the topic of slightly more advanced and flexible versions of the central limit theorem. This result and other classical limit theorems apply to the situation when the random variables in each row of the triangular array are independent or nearly independent.

We are interested in triangular arrays of random variables as above which arise from an underlying deterministic data set

\begin{matrix} B_{11} & {} & {} \\ B_{21} & B_{22} & {} \\ B_{31} & B_{32} & B_{33} \\ \vdots & \vdots & \vdots\end{matrix}.

The triangular array of random variables associated to this data has rows defined by

\begin{bmatrix} X_{N1} \\ \vdots \\ X_{NN} \end{bmatrix} = \begin{bmatrix} {} & \vdots & {}\\ \dots & |U_{ij}|^2 & \dots \\ {} & \vdots & {}\end{bmatrix}\begin{bmatrix} B_{N1} \\ \vdots \\ B_{NN} \end{bmatrix}

where U_N=[U_{ij}] is a uniformly random unitary matrix of rank N. Thus, the random vector (X_{N1},\dots,X_{NN}) is a random linear transformation of the deterministic data (B_{N1},\dots,B_{NN}). Because the entries of U_N are highly correlated (rows are orthonormal, columns are orthonormal) the random variables X_{N1},\dots,X_{NN} are far from independent. They are, however, exchangeable.

There are two ways to think about a Schur-Horn array. Geometrically, (X_{N1},\dots,X_{NN}) is a random point inside the permutahedron in \mathbb{R}^N generated by (B_{N1},\dots,B_{NN}). Algebraically, (X_{N1},\dots,X_{NN}) is the diagonal of the random matrix U_NB_NU_N^*, where B_N=\mathrm{diag}(B_{N1},\dots,B_{NN}).
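To make this concrete, here is a minimal numerical sketch (numpy; the QR-with-phase-fix recipe is a standard sampler for Haar measure, and the helper name, sizes, and seed are our choices) checking that the algebraic and doubly stochastic descriptions of a row agree:

    import numpy as np

    rng = np.random.default_rng(0)

    def haar_unitary(N, rng):
        # QR of a complex Ginibre matrix, with the standard phase correction,
        # yields a Haar-distributed unitary.
        Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
        Q, R = np.linalg.qr(Z)
        d = np.diagonal(R)
        return Q * (d / np.abs(d))

    N = 6
    B = np.sort(rng.standard_normal(N))[::-1]           # one row of the B-array
    U = haar_unitary(N, rng)

    X_alg = np.diag(U @ np.diag(B) @ U.conj().T).real   # diagonal of U B U*
    X_geo = (np.abs(U) ** 2) @ B                        # [|U_{ij}|^2] applied to B
    print(np.allclose(X_alg, X_geo), np.isclose(X_alg.sum(), B.sum()))  # True True

The second check reflects the geometric picture: the matrix [|U_{ij}|^2] is doubly stochastic, so the random point always has the same coordinate sum as (B_{N1},\dots,B_{NN}), a constraint satisfied by every point of the permutahedron.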

We are interested in proving limit theorems for Schur-Horn arrays. Such theorems will describe N \to \infty behavior of the rows of the X-array in terms of prescribed N \to \infty behavior of the rows of the B-array. In particular, we are interested in the situation where the empirical distribution of the vector

\left(\frac{B_{N1}}{N^t}, \dots, \frac{B_{NN}}{N^t}\right)

converges in moments as N \to \infty, where t \geq 0 is a scaling exponent. The empirical distribution is the probability measure on \mathbb{R} which places mass 1/N at each coordinate of the vector (B_{N1},\dots,B_{NN}). If we think of these data points as beads on a wire, then the empirical measure of S \subseteq \mathbb{R} is the fraction of beads which lie in S. Convergence in moments of the empirical measure means that the limit

\gamma_d = \lim\limits_{N \to \infty} \frac{p_d\left(\frac{B_{N1}}{N^t}, \dots, \frac{B_{NN}}{N^t}\right)}{N}

exists for each fixed d \in \mathbb{N}, where p_d(z_1,\dots,z_N)=z_1^d + \dots + z_N^d is the power sum polynomial of degree d in N variables.
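For instance, with the hypothetical row data B_{Nj} = \sqrt{N}\,(j-\frac{1}{2})/N and scaling exponent t=1/2, a few lines suffice to watch the empirical moments converge (a sketch; any row data can be substituted):

    import numpy as np

    def empirical_moment(B, t, d):
        # p_d(B / N^t) / N for a row B = (B_N1, ..., B_NN)
        N = len(B)
        return np.sum((B / N ** t) ** d) / N

    for N in (10, 100, 1000):
        B = np.sqrt(N) * (np.arange(1, N + 1) - 0.5) / N
        print(N, [round(empirical_moment(B, 0.5, d), 4) for d in range(1, 5)])
    # gamma_d -> 1/(d+1), the moments of the uniform measure on [0, 1]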

Let \langle \cdot \rangle denote expected value, and let

K_N(a_1,\dots,a_N) = \left\langle e^{i(a_1X_{N1}+\dots+a_NX_{NN})}\right\rangle

be the characteristic function of (X_{N1},\dots,X_{NN}). Because the distribution of (X_{N1},\dots,X_{NN}) in \mathbb{R}^N is compactly supported, K_N is an analytic function on \mathbb{R}^N. The Maclaurin expansion of the characteristic function is

K_N(a_1,\dots,a_N) = 1+\sum\limits_{d=1}^\infty \frac{i^d}{d!} K_N^d(a_1,\dots,a_N),

where

K_N^d(a_1,\dots,a_N) = \sum\limits_\alpha a_1^{\alpha_1} \dots a_N^{\alpha_N} \langle X_{N1}^{\alpha_1}\dots X_{NN}^{\alpha_N}\rangle {d \choose \alpha_1,\dots,\alpha_N}

and the sum is over all vectors \alpha=(\alpha_1,\dots,\alpha_N) of nonnegative integers whose coordinates sum to d. Thus, K_N^d(a_1,\dots,a_N) is a homogeneous degree d polynomial in N variables whose coefficients are, up to a transparent factor, the degree d joint moments of the random variables X_{N1},\dots,X_{NN}. Since K_N(0,\dots,0)=1, the logarithm L_N(a_1,\dots,a_N)=\log K_N(a_1,\dots,a_N) of the characteristic function is defined and analytic on an open neighborhood of the origin in \mathbb{R}^N. We write the Maclaurin expansion of L_N as

L_N(a_1,\dots,a_N) = \sum\limits_{d=1}^\infty \frac{i^d}{d!} L_N^d(a_1,\dots,a_N),

where L_N^d(a_1,\dots,a_N) is a homogeneous degree d polynomial in N variables. The polynomials L_N^1,L_N^2,L_N^3,\dots are called the cumulant polynomials of the random variables X_{N1},\dots,X_{NN}. They are determined by the moment polynomials K_N^1,K_N^2,K_N^3,\dots according to a certain recursion explained on pages 2 and 3 of these notes. We write the cumulant polynomials in the form

L_N^d(a_1,\dots,a_N) = \sum\limits_\alpha a_1^{\alpha_1} \dots a_N^{\alpha_N} \langle X_{N1}^{\alpha_1}\dots X_{NN}^{\alpha_N}\rangle_c {d \choose \alpha_1,\dots,\alpha_N},

where the quantities \langle X_{N1}^{\alpha_1}\dots X_{NN}^{\alpha_N}\rangle_c are the degree d joint cumulants of the random variables X_{N1},\dots,X_{NN} as \alpha ranges over nonnegative integer vectors of sum d. Note that this is a definition, not a proposition. In statistical mechanics and quantum field theory, joint cumulants are referred to as connected correlation functions.

The main fact about Schur-Horn arrays that allows us to understand them is a second formula for the cumulant polynomials L_N^d(a_1,\dots,a_N) of the random variables X_{N1},\dots,X_{NN} in terms of the moments of the empirical distribution of the deterministic data (B_{N1},\dots,B_{NN}).

Theorem (Cumulant Formula for Schur-Horn Arrays). For each 1 \leq d \leq N, we have

L_N^d(a_1,\dots,a_N) = N^{2-d}\sum\limits_\alpha \frac{p_\alpha(a_1,\dots,a_N)}{N^{\ell(\alpha)}}\sum\limits_\beta \frac{p_\beta(B_{N1},\dots,B_{NN})}{N^{\ell(\beta)}}L_N(\alpha,\beta),

where the sums are over nonnegative integer vectors \alpha=(\alpha_1,\dots,\alpha_N) and \beta=(\beta_1,\dots,\beta_N) with non-increasing coordinates which sum to d, and \ell(\alpha) and \ell(\beta) are the number of nonzero coordinates of \alpha and \beta respectively. For each such pair \alpha,\beta the polynomials p_\alpha(z_1,\dots,z_N) and p_\beta(b_1,\dots,b_N) are the corresponding products of power sum symmetric polynomials, i.e.

p_\alpha = \prod\limits_{i=1}^{\ell(\alpha)} p_{\alpha_i}\qquad\text{and}\qquad p_\beta = \prod\limits_{i=1}^{\ell(\beta)} p_{\beta_i}.

Finally,

L_N(\alpha,\beta) = (-1)^{\ell(\alpha)+\ell(\beta)} \sum\limits_{g=0}^\infty N^{-2g} \vec{H}_g(\alpha,\beta)

is a convergent series in which the coefficients \vec{H}_g(\alpha,\beta) are positive integers called monotone Hurwitz numbers which admit an explicit combinatorial description.

The cumulant formula for Schur-Horn arrays looks complicated, but it is fairly easy to use in practice. For example, let us set our scaling exponent to t=1/2. Then, for any fixed r \in \mathbb{N} we have

L_N^d(a_1,\dots,a_r,0,\dots,0) = N^{1-\frac{d}{2}}\sum\limits_\alpha \frac{p_\alpha(a_1,\dots,a_r)}{N^{\ell(\alpha)-1}}\sum\limits_\beta \frac{p_\beta(\frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}})}{N^{\ell(\beta)}}L_N(\alpha,\beta).

As we saw (twice) in lecture, you can calculate the N \to \infty limit of this polynomial. Assuming \gamma_1=0 and \gamma_2=1, the only nonzero limit is

L_N^2(a_1,\dots,a_r) \longrightarrow a_1^2 + \dots + a_r^2.

This means that

\lim\limits_{N \to \infty} \langle X_{N1}^{\alpha_1}\dots X_{Nr}^{\alpha_r}\rangle_c = 0

unless \alpha=(\alpha_1,\dots,\alpha_r) is a nonnegative integer vector with total sum 2 and only one nonzero coordinate. From this we conclude that for any fixed r \in \mathbb{N}, the random variables X_{N1},\dots,X_{Nr} converge in joint distribution to r iid standard Gaussians.

When our scaling exponent is t=1, we do not get such a simple Gaussian limit. Rather, we find that

\lim\limits_{N \rightarrow \infty} \frac{1}{N}L_N^d(a_1,\dots,a_r) = (a_1^d + \dots + a_r^d) \kappa_d

for each fixed r \in \mathbb{N}, where

\kappa_d = \sum\limits_{\beta \vdash d} \gamma_\beta (-1)^{1+\ell(\beta)} \vec{H}_0(d,\beta),

where \gamma_\beta = \gamma_{\beta_1} \gamma_{\beta_2} \gamma_{\beta_3} \dots. Thus we still have a sum of pure powers indicating asymptotic independence between the random variables X_{N1},\dots,X_{Nr}, but the cumulants of these random variables are asymptotically N \kappa_d. We are going to see that

\kappa_1,\kappa_2,\kappa_3,\dots

is the free cumulant sequence corresponding to the moment sequence

\gamma_1,\gamma_2,\gamma_3,\dots.
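As a preview, here is a sketch of one standard route between these two sequences, Speicher's recursion M(z) = 1 + \sum_s \kappa_s z^s M(z)^s for the moment series M(z); this is not the monotone Hurwitz formula above, but, granting the claim just made, it computes the same sequence (the semicircle test data below is an arbitrary choice):

    def free_cumulants(m):
        # Solve m_n = sum_{s=1}^n kappa_s * [z^{n-s}] M(z)^s for the kappa's,
        # where M(z) = 1 + m_1 z + m_2 z^2 + ... (Speicher's recursion).
        mm = [1.0] + [float(x) for x in m]      # mm[j] = m_j, with m_0 = 1
        kappa = []
        for n in range(1, len(mm)):
            total = 0.0
            for s in range(1, n):
                # [z^{n-s}] M(z)^s via repeated truncated convolution
                dp = [1.0] + [0.0] * (n - s)
                for _ in range(s):
                    dp = [sum(dp[a] * mm[b - a] for a in range(b + 1))
                          for b in range(n - s + 1)]
                total += kappa[s - 1] * dp[n - s]
            kappa.append(mm[n] - total)         # the s = n term is kappa_n itself
        return kappa

    # Semicircle moments 0, 1, 0, 2, 0, 5 (Catalan numbers in even degrees):
    print(free_cumulants([0, 1, 0, 2, 0, 5]))   # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]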

Math 289A: Lecture 15

Today we want to prove the “inverse Wigner theorem” we started discussing last time. Let (B_N)_{N=1}^\infty be a sequence of traceless deterministic matrices, and for each N \in \mathbb{N} let X_N=U_NB_NU_N^* be a uniformly random Hermitian matrix isospectral with B_N. Suppose that the empirical eigenvalue distribution of \frac{1}{\sqrt{N}}B_N converges in moments as N \to \infty, meaning that the limit

\gamma_d=\lim\limits_{N \to \infty} \frac{1}{N} \mathrm{Tr} \left(\frac{1}{\sqrt{N}}B_N\right)^d

exists for each d \in \mathbb{N}. Equivalently, we have

\gamma_d=\lim\limits_{N \to \infty} \frac{p_d\left( \frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}}\right)}{N}

with B_{N1},\dots,B_{NN} any enumeration of the eigenvalues of B_N and p_d(x_1,\dots,x_N)=x_1^d + \dots + x_N^d the power sum polynomial of degree d. Let X_{N1},\dots,X_{NN} denote the diagonal matrix elements of the random matrix X_N, so

\begin{bmatrix} X_{N1} \\ \vdots \\ X_{NN} \end{bmatrix} = \begin{bmatrix} {} & \vdots & {} \\ \dots & |U_N(i,j)|^2 & \dots \\ {} & \vdots & {}\end{bmatrix}\begin{bmatrix} B_{N1} \\ \vdots \\ B_{NN} \end{bmatrix}

with U_N a uniformly random unitary matrix.

Theorem 15.1. For any fixed r \in \mathbb{N}, the random vector (X_{N1},\dots,X_{Nr}) converges in moments to a vector (Y_1,\dots,Y_r) of iid centered Gaussians of variance \gamma_2. More precisely, we have

\lim\limits_{N \to \infty} \left\langle X_{N1}^{\alpha_1} \dots X_{Nr}^{\alpha_r}\right\rangle = \left\langle Y_1^{\alpha_1} \dots Y_r^{\alpha_r}\right\rangle

for any vector (\alpha_1,\dots,\alpha_r) of nonnegative integers.

Here is a corollary which further explains why we are calling Theorem 15.1 an inverse Wigner theorem.

Corollary 15.1. For any r \in \mathbb{N}, the r \times r principal submatrix of X_N converges to \mathrm{GUE}_r(\gamma_2), i.e. to an r \times r random Hermitian matrix following a centered Gaussian distribution of variance \gamma_2.

Problem 15.1. Deduce Corollary 15.1 from Theorem 15.1.

The rest of the lecture consists of the proof of Theorem 15.1. First, let us explain the assumption that our deterministic data B_N has trace zero. The random variables X_{N1},\dots,X_{NN} are identically distributed (but not independent), with first moment

\langle X_{N1} \rangle = \frac{B_{N1}+\dots+B_{NN}}{N}= \frac{1}{N}\mathrm{Tr}\, B_N.

So, assuming \mathrm{Tr}\, B_N=0 is equivalent to centering X_{N1},\dots,X_{NN}.

Now we start the proof. The joint distribution of X_{N1},\dots,X_{NN} is a compactly supported probability measure on \mathbb{R}^N — indeed, it is supported on the compact convex set whose extreme points are the distinct permutations of the deterministic vector (B_{N1},\dots,B_{NN}). Therefore, the Laplace transform

K_N(z,a_1,\dots,a_N) = \left\langle e^{z(a_1X_{N1} + \dots + a_NX_{NN})}\right\rangle

is an entire function of the complex variables z,a_1,\dots,a_N, i.e. K_N is an analytic function on \mathbb{C}^{N+1}. Note that if we set z=i and restrict the remaining variables (a_1,\dots,a_N) to \mathbb{R}^N, the Laplace transform specializes to the characteristic function of (X_{N1},\dots,X_{NN}), so that it completely determines the distribution of this random vector. The variable z is redundant, but it is a convenient marker of homogeneity: we write the Maclaurin expansion of K_N as

K_N(z,a_1,\dots,a_N) = 1 + \sum\limits_{d=1}^\infty \frac{z^d}{d!} K_N^d(a_1,\dots,a_N),

where K_N^d \colon \mathbb{C}^N \to \mathbb{C} is a homogeneous polynomial of degree d. Indeed, this polynomial is

K_N^d(a_1,\dots,a_N) = \left\langle (a_1X_{N1} + \dots + a_NX_{NN})^d \right\rangle,

which in the standard monomial basis of homogeneous polynomials is

K_N^d(a_1,\dots,a_N) = \sum\limits_{\alpha} a_1^{\alpha_1} \dots a_N^{\alpha_N} \langle X_{N1}^{\alpha_1} \dots X_{NN}^{\alpha_N}\rangle {d \choose \alpha_1,\dots,\alpha_N},

the sum being over all vectors \alpha=(\alpha_1,\dots,\alpha_N) of nonnegative integers whose coordinates sum to d, aka weak N-part compositions of d. We call K_N^d the degree d moment polynomial of X_{N1},\dots,X_{NN}, because its coefficients in the monomial basis are the degree d joint moments of these random variables (up to a multinomial coefficient). So our objective is to understand the N \to \infty behavior of this polynomial restricted to the r-dimensional subspace of \mathbb{C}^N spanned by the first r coordinate vectors, where r \in \mathbb{N} is arbitrary but fixed. To be completely explicit, this is the polynomial

K_N^d(a_1,\dots,a_r) := K_N^d(a_1,\dots,a_r,0,\dots,0)=\sum\limits_{\alpha} a_1^{\alpha_1} \dots a_r^{\alpha_r} \langle X_{N1}^{\alpha_1} \dots X_{Nr}^{\alpha_r}\rangle {d \choose \alpha_1,\dots,\alpha_r},

the sum being over weak r-part compositions of d. This is an important simplification, since the number of variables of K_N^d(a_1,\dots,a_r) is fixed, i.e. its domain does not vary with N.

In fact, it is more convenient to work with the log-Laplace transform of X_{N1},\dots,X_{NN}. More precisely, since K_N(0,\dots,0)=1, there exists a simply connected open neighborhood \Omega_N of the origin in \mathbb{C}^{N+1} on which K_N is nonvanishing, and there is a unique analytic function L_N \colon \Omega_N \to \mathbb{C} vanishing at the origin which satisfies

K_N(z,a_1,\dots,a_N) = e^{L_N(z,a_1,\dots,a_N)}, \quad (z,a_1,\dots,a_N) \in \Omega_N.

Let

L_N(z,a_1,\dots,a_N) = \sum\limits_{d=1}^\infty \frac{z^d}{d!} L_N^d(a_1,\dots,a_N)

be the Maclaurin expansion of this analytic function, so L_N^d(a_1,\dots,a_N) is a homogeneous degree d polynomial. This is called the degree d cumulant polynomial of the random variables X_{N1},\dots,X_{NN}. According to the Exponential Formula (aka moment-cumulant formula), these cumulants are recursively determined by the moment polynomials by a triangular system of equations, the first three of which are (suppressing the variables a_1,\dots,a_N)

K_N^1=L_N^1,\ K_N^2=L_N^2+L_N^1L_N^1,\ K_N^3 =L_N^3+3L_N^2L_N^1+L_N^1L_N^1L_N^1.
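In a single variable this triangular system is mechanical to invert; here is a small sketch of the univariate specialization, using the standard recursion c_n = m_n - \sum_{k=1}^{n-1} {n-1 \choose k-1} c_k m_{n-k}:

    from math import comb

    def cumulants_from_moments(m):
        # m[d-1] is the d-th moment; returns the cumulant list of the same length
        c = []
        for n in range(1, len(m) + 1):
            c.append(m[n - 1] - sum(comb(n - 1, k - 1) * c[k - 1] * m[n - k - 1]
                                    for k in range(1, n)))
        return c

    # Moments 0, 1, 0, 3, 0, 15 of a standard Gaussian: every cumulant
    # past the second vanishes, which is the Gaussian signature.
    print(cumulants_from_moments([0, 1, 0, 3, 0, 15]))  # [0, 1, 0, 0, 0, 0]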

We can write the polynomial L_N^d as

L_N^d(a_1,\dots,a_N) = \sum\limits_{\alpha} a_1^{\alpha_1} \dots a_N^{\alpha_N} \langle X_{N1}^{\alpha_1} \dots X_{NN}^{\alpha_N}\rangle_c {d \choose \alpha_1,\dots,\alpha_N},

which mirrors the monomial expansion of K_N^d except for the subscript on the expectation-like quantity \langle X_{N1}^{\alpha_1} \dots X_{NN}^{\alpha_N}\rangle_c, known as a degree d joint cumulant of X_{N1},\dots,X_{NN}. Note that this is a definition, not a proposition – the relationship between the Maclaurin expansions of the Laplace transform and the log-Laplace transform implicitly defines these statistics.

Problem 15.2. Show that \langle X_{N1}X_{N2}\rangle_c=\langle X_{N1}X_{N2}\rangle-\langle X_{N1}\rangle \langle X_{N2}\rangle is the covariance of X_{N1} and X_{N2}.

Our discussion so far pertains to any finite family of random variables with compactly supported joint distribution. Now we will start invoking special features of the family X_{N1},\dots,X_{NN} we are actually dealing with. The first of these is that these random variables are exchangeable, which is equivalent to saying that the cumulant polynomials L_N^d(a_1,\dots,a_N) are symmetric polynomials. This means that we can also expand them as linear combinations of Newton polynomials, and, as previously discussed, we have a powerful understanding of this expansion.

Theorem 15.2 (Leading Cumulants Formula). For any 1 \leq d \leq N, we have

L_N^d(a_1,\dots,a_N) = N^{2-d} \sum\limits_{\alpha \vdash d} \frac{p_\alpha(a_1,\dots,a_N)}{N^{\ell(\alpha)}}\sum\limits_{\beta \vdash d} \frac{p_\beta(B_{N1},\dots,B_{NN})}{N^{\ell(\beta)}} L_N(\alpha,\beta),

where each sum is over the set of integer partitions of d and

L_N(\alpha,\beta)=(-1)^{\ell(\alpha)+\ell(\beta)} \sum\limits_{g=0}^\infty N^{-2g} \vec{H}_g(\alpha,\beta)

is a convergent power series in N^{-2} whose coefficients \vec{H}_g(\alpha,\beta) are positive integers.

The combinatorial coefficients \vec{H}_g(\alpha,\beta) are quite intricate and there is no simple closed formula for them (you can read this if you want to know more), but luckily we do not need to know very much about them in order to prove our inverse Wigner theorem. All that matters for us is that this magic formula is expressing the cumulant polynomials of X_{N1},\dots,X_{NN} in terms of the moments of the empirical eigenvalue distribution of our deterministic data matrices B_N in a quite transparent way.

To leverage this, let us restrict to the cumulant polynomials of X_{N1},\dots,X_{Nr}, which just means setting the last N-r variables of L_N^d(a_1,\dots,a_N) equal to zero (this eventually makes sense because r is fixed and N increases without bound). We write the resulting polynomial in the form

L_N^d(a_1,\dots,a_r) = N^{1-\frac{d}{2}}\sum\limits_{\alpha \vdash d} \frac{p_\alpha(a_1,\dots,a_r)}{N^{\ell(\alpha)-1}}\sum\limits_{\beta \vdash d} \frac{p_\beta(\frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}})}{N^{\ell(\beta)}} L_N(\alpha,\beta).

Ignore the factor of N^{1-\frac{d}{2}} for a moment and just think about the behavior of the double sum as N \to \infty. The number of terms in this sum is independent of N, which means that we only need to understand the asymptotics of each individual term

\frac{p_\alpha(a_1,\dots,a_r)}{N^{\ell(\alpha)-1}}\frac{p_\beta(\frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}})}{N^{\ell(\beta)}} L_N(\alpha,\beta).

Reading from right to left, we have that L_N(\alpha,\beta) \to (-1)^{\ell(\alpha)+\ell(\beta)}\vec{H}_0(\alpha,\beta) as N \to \infty. By hypothesis, we have

\frac{p_\beta(\frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}})}{N^{\ell(\beta)}} \longrightarrow \prod_{i=1}^{\ell(\beta)}\gamma_{\beta_i}=:\gamma_\beta.

Finally, since

|p_\alpha(a_1,\dots,a_r)| \leq r^{\ell(\alpha)} \|(a_1,\dots,a_r)\|_\infty^d,

we have

\frac{p_\alpha(a_1,\dots,a_r)}{N^{\ell(\alpha)-1}} = O\left(\frac{1}{N^{\ell(\alpha)-1}}\right)

as N \to \infty. Thus, the whole double sum is O(1) as N \to \infty, so now is a good time to remember the prefactor N^{1-\frac{d}{2}} and immediately conclude that

L_N^d(a_1,\dots,a_r) \longrightarrow 0

as N \to \infty, uniformly on compact subsets of \mathbb{C}^r, for all d \geq 3. For d=2, we have

L_N^2(a_1,\dots,a_r) \to p_2(a_1,\dots,a_r)(\gamma_2\vec{H}_0(2,2)-\gamma_1^2\vec{H}_0(2,11)),

uniformly on compact subsets of \mathbb{C}^r, and since \gamma_1=0 and \vec{H}_0(d,d)=(d-1)! this simplifies to

L_N^2(a_1,\dots,a_r) \to p_2(a_1,\dots,a_r)\gamma_2.

Finally, the polynomial L_N^1(a_1,\dots,a_N) is identically zero by our trace zero hypothesis. What we have shown is that all joint cumulants of X_{N1},\dots,X_{Nr} converge to zero as N \to \infty except for pure cumulants of degree two,

\langle X_{N1}^2\rangle_c,\dots,\langle X_{Nr}^2\rangle_c,

each of which converges to \gamma_2.

Problem 15.3. Compute the joint cumulants of a finite family of centered iid Gaussians and complete the proof.
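Finally, here is a Monte Carlo sanity check of Theorem 15.1 with the hypothetical two-point data B_{Nj} = \pm\sqrt{N}, for which \gamma_2 = 1 (a sketch; scipy's unitary_group supplies the Haar samples):

    import numpy as np
    from scipy.stats import unitary_group

    N, samples = 100, 500
    B = np.sqrt(N) * np.array([1.0] * (N // 2) + [-1.0] * (N // 2))  # traceless

    rows = np.empty((samples, 2))
    for s in range(samples):
        U = unitary_group.rvs(N)
        rows[s] = ((np.abs(U) ** 2) @ B)[:2]       # (X_N1, X_N2)

    print(np.cov(rows.T))   # approximately the 2 x 2 identity: iid N(0, gamma_2)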

Math 289A: Lecture 14

In this lecture I wanted to connect our (admittedly meandering) discussion so far to the standard approach to limit theorems in random matrix theory. Let X_N be a random Hermitian matrix with characteristic function

\varphi_{X_N}(T) = e^{-\frac{1}{2} \mathrm{Tr}\, T^2}, \quad T \in \mathbb{H}_N.

This means precisely that the distribution of X_N in \mathbb{H}_N with Hilbert-Schmidt scalar product is the standard Gaussian measure on this Euclidean space.

Definition 14.1. The Gaussian Unitary Ensemble is the sequence (X_N)_{N=1}^\infty.

The word “unitary” here means that the distribution of X_N is unitarily invariant. Let B_{N1} \geq \dots \geq B_{NN} be the eigenvalues of X_N.

Theorem 14.1 (Wigner). For each d \in \mathbb{N}, the limit

\gamma_d = \lim\limits_{N \to \infty} \mathbf{E}\,\frac{p_d\left( \frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}}\right)}{N}

exists and is given by

\gamma_d = \begin{cases} 0, \text{ if }d \text{ odd}\\ \mathrm{Cat}(\frac{d}{2}), \text{ if }d \text{ even} \end{cases}.

In Theorem 14.1, p_d(x_1,\dots,x_N)=x_1^d+\dots+x_N^d is the power sum symmetric polynomial of degree d, and \mathrm{Cat}(k)=\frac{1}{k+1}{2k \choose k} is the kth Catalan number. There are two things we should discuss: what the result means, and how to prove it.

Concerning the meaning of Theorem 14.1, the random variable

\frac{p_d(B_{N1},\dots,B_{NN})}{N} = \frac{B_{N1}^d + \dots + B_{NN}^d}{N}

is the degree d moment of the empirical eigenvalue distribution of X_N, which is the random discrete probability measure \eta_N on \mathbb{R} which places equal mass at each eigenvalue. Thus, Theorem 14.1 is saying that each moment of the empirical eigenvalue distribution of the scaled random matrix \frac{1}{\sqrt{N}}X_N converges in expectation to an explicit limit. It is therefore natural to ask whether \gamma_1,\gamma_2,\gamma_3,\dots is the moment sequence of a probability measure on \mathbb{R}. To answer this, one can recognize that the exponential generating function

E(z) = \sum\limits_{d=0}^\infty \frac{z^d}{d!}\, \gamma_d = \sum_{k=0}^\infty \frac{z^{2k}}{(k+1)!\,k!}

is essentially a Bessel function. Then, a classical integral representation for Bessel functions allows one to determine that (\gamma_d)_{d=1}^\infty is the moment sequence of the probability measure \omega on \mathbb{R} which is supported on [-2,2] with density

\omega(\mathrm{d}x) = \frac{1}{2\pi}\sqrt{4-x^2}\,\mathrm{d}x.

You can find the details of this derivation here, towards the end of Lecture One.
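If you would rather check the moment claim numerically than via Bessel functions, a few lines of quadrature suffice (a sketch; scipy's quad is assumed):

    import numpy as np
    from math import comb
    from scipy.integrate import quad

    for d in range(1, 9):
        val, _ = quad(lambda x, d=d: x ** d * np.sqrt(4 - x ** 2) / (2 * np.pi), -2, 2)
        print(d, round(val, 6))
    # odd moments vanish; even moments 1, 2, 5, 14 are the Catalan numbers:
    print([comb(2 * k, k) // (k + 1) for k in range(1, 5)])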

Now we discuss how one can prove Theorem 14.1. The proof uses the fact that power sums in the eigenvalues of a matrix are also traces of powers of that matrix. This means that we have

\frac{p_d\left( \frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}}\right)}{N}=\frac{1}{N} \mathrm{Tr} \left(\frac{1}{\sqrt{N}}X_N\right)^d,

the point being that the right hand side is also a polynomial in the elements X_N(i,j) of the Gaussian random matrix X_N. Indeed, by matrix algebra we have

\mathrm{Tr}\, X_N^d = \sum\limits_\phi X_N(\phi(1),\phi(2))X_N(\phi(2),\phi(3)) \dots X_N(\phi(d),\phi(1)),

where the sum is over all functions

\phi \colon \{1,\dots,d\} \longrightarrow \{1,\dots,N\}.

Using the fact that the matrix elements of X_N are Gaussian and independent (up to selfadjointness), the following can be established.

Theorem 14.2 (Harer-Zagier). For each d \in \mathbb{N}, we have

\mathbf{E}\left[\frac{1}{N} \mathrm{Tr} (\frac{1}{\sqrt{N}}X_N)^d\right] = \sum\limits_{g=0}^\infty N^{-2g} E_g^d,

where E_g^d is the number of ways to glue the edges of a d-gon in pairs so as to produce a compact orientable surface of genus g.

The proof is a beautiful piece of mathematics and a nice treatment can be found here. Theorem 14.2 implies Theorem 14.1 because the number of ways to glue a sphere from a polygon with 2k sides is \mathrm{Cat}(k), and only the genus zero term survives in the N \to \infty limit.
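The gluing counts E_g^d can be tabulated by brute force for small polygons. The sketch below encodes a gluing as a fixed-point-free involution \pi of the edge set and reads the genus off Euler's formula V - E + F = 2-2g, computing V as the number of cycles of the permutation i \mapsto \gamma(\pi(i)), with \gamma the boundary d-cycle (a standard encoding, assumed here):

    from math import comb

    def pairings(elts):
        # all perfect matchings of a list of even length
        if not elts:
            yield []
            return
        a, rest = elts[0], elts[1:]
        for i, b in enumerate(rest):
            for p in pairings(rest[:i] + rest[i + 1:]):
                yield [(a, b)] + p

    def genus_counts(d):
        # F = 1 face, E = d/2 glued edges, V = number of cycles of gamma(pi(.))
        counts = {}
        for p in pairings(list(range(d))):
            pi = {}
            for a, b in p:
                pi[a], pi[b] = b, a
            perm = [(pi[i] + 1) % d for i in range(d)]
            seen, V = set(), 0
            for i in range(d):
                if i not in seen:
                    V += 1
                    j = i
                    while j not in seen:
                        seen.add(j)
                        j = perm[j]
            g = (2 - (V - d // 2 + 1)) // 2
            counts[g] = counts.get(g, 0) + 1
        return counts

    for k in (1, 2, 3):
        print(2 * k, genus_counts(2 * k), "Cat(k) =", comb(2 * k, k) // (k + 1))
    # 2 {0: 1}; 4 {0: 2, 1: 1}; 6 {0: 5, 1: 10}: genus-zero counts are Catalan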

There is something unsatisfying about the above sequence of ideas. We know that the entire distribution of X_N is completely determined by the joint distribution of its diagonal matrix elements

X_N(1,1),\dots,X_N(N,N),

which are simply iid standard Gaussians. In fact, at this point in the course we know much more: the spectral analysis of unitarily invariant random Hermitian matrices is equivalent to the analysis of pairs

\begin{matrix} X_{11} & {} & {} \\ X_{21} & X_{22} & {} \\ X_{31} & X_{32} & X_{33} \\ \vdots & \vdots & \vdots \end{matrix}

and

\begin{matrix} B_{11} & {} & {} \\ B_{21} & B_{22} & {} \\ B_{31} & B_{32} & B_{33} \\ \vdots & \vdots & \vdots \end{matrix},

of triangular arrays of real random variables related by

\begin{bmatrix} X_{N1} \\ \vdots \\ X_{NN}\end{bmatrix}=\begin{bmatrix} {} & \vdots & {} \\ \dots & |U_N(i,j)|^2 & \dots \\ {} & \vdots & {} \end{bmatrix}\begin{bmatrix} B_{N1} \\ \vdots \\ B_{NN}\end{bmatrix}

where U_N=[U_N(i,j)] is a uniformly random unitary matrix. Initially, X_{N1},\dots,X_{NN} were viewed as the diagonal matrix elements of a random Hermitian matrix with unitarily invariant distribution and B_{N1},\dots,B_{NN} were the eigenvalues of this selfadjoint matrix. But, we can in fact forget about this original setup entirely, and simply be probabilists thinking about two randomly coupled triangular arrays of real random variables. The question then how to recover Theorem 14.1 simply from the given fact that our X-array has rows X_N=(X_{N1},\dots,X_{NN}) of iid standard Gaussians.

This point of view is why the paper of Olshanski-Vershik is so relevant for our purposes, and so far ahead of its time: they are proving inverse theorems of this kind. More precisely, they are considering the case where the B-array is a given deterministic data set, and proving limit theorems about the corresponding X-array. A special case of their main result is the following “inverse Wigner theorem.”

Theorem 14.3 (Olshanski-Vershik). Suppose that the limit

\gamma_d = \lim\limits_{N \to \infty}\frac{p_d\left( \frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}}\right)}{N}

exists for each d \in \mathbb{N}. Then, for any fixed r \in \mathbb{N}, the random variables X_{N1},\dots,X_{Nr} converge in distribution to an r-tuple Y_1,\dots,Y_r of iid centered Gaussians with variance \gamma_2+\gamma_1^2.

We have proved the one-dimensional version of this Gaussian limit theorem (the case r=1), and we will finish the proof of the multivariate version next time (the case r>1). In our approach to these inverse results of Olshanski-Vershik we are using a combinatorial result which is analogous to Theorem 14.2, but arguably even more miraculous: a universal genus expansion for joint cumulants of the X-array in our theory of unistochastically coupled triangular arrays.

Math 289A: Lecture 13

Let (B_N)_{N=1}^\infty be a sequence of deterministic Hermitian matrices, and let b_{N1},\dots,b_{NN} be any enumeration of the eigenvalues of B_N \in\mathbb{H}_N. Consider the corresponding sequence (X_N)_{N=1}^\infty of random Hermitian matrices defined by X_N=U_NB_NU_N^*, where U_N is a random unitary matrix whose distribution in \mathbb{U}_N is Haar measure. We then have a corresponding triangular array of real random variables,

\begin{matrix} X_{11} & {} & {} \\ X_{21} & X_{22} & {} \\ X_{31} & X_{32} & X_{33} \\ \vdots & \vdots & \vdots \end{matrix},

whose Nth row

X_{N1}=X_N(1,1), \dots, X_{NN} = X_N(N,N)

consists of the diagonal elements of the random matrix X_N. In terms of our input data B_N, we have

X_{Nj} = \sum\limits_{k=1}^N |U_N(j,k)|^2 b_{Nk}.

This is superficially similar to the setup for the Central Limit Theorem, but it is different: X_{N1},\dots,X_{NN} are exchangeable but not independent (make sure you understand why). Thus our first objective is simply to understand the N \to \infty asymptotic distribution of X_{N1}, the (1,1)-matrix element of a uniformly random Hermitian matrix with spectrum B_N.

In this lecture I will use the angled brackets notation favored by physicists to denote expectation. I will also use the angled bracket with a subscript “c” for cumulants (the subscript could also stand for connected). So for example the variance of X_{N1} is

\langle X_{N1}^2\rangle_c = \langle X_{N1}^2\rangle - \langle X_{N1}\rangle\langle X_{N1}\rangle.

Problem 13.1. Prove that \langle X_{N1} \rangle = \frac{1}{N}\mathrm{Tr}\,B_N. Also show that for d > 1 we have \langle (X_{N1} + x)^d\rangle_c = \langle X_{N1}^d\rangle_c for any constant x (this translation invariance is a general property of cumulants).

Our Optimized Leading Cumulants Formula says that for any 1 \leq d \leq N we have

\langle X_{N1}^d \rangle_c = N^{2-d} \sum\limits_{\alpha,\beta \vdash d} \frac{p_\beta(b_{N1},\dots,b_{NN})}{N^{\ell(\alpha)+\ell(\beta)}}L_N(\alpha,\beta),

where

p_\beta(b_{N1},\dots,b_{NN}) = \prod\limits_{i=1}^{\ell(\beta)} \mathrm{Tr}(B_N^{\beta_i})

and

L_N(\alpha,\beta) = (-1)^{\ell(\alpha)+\ell(\beta)} \sum_{g=0}^\infty N^{-2g}\vec{H}_g(\alpha,\beta)

is \pm 1 times a convergent positive series.

What is nice about the Optimized Leading Cumulants Formula is that it almost immediately suggests the possibility of Gaussian limiting behavior: because of the factor N^{2-d} in the formula, it looks like cumulants of degree d \geq 3 should vanish in the N \to \infty limit, which is the Gaussian signature. To actually establish this we need to determine how the rest of the formula behaves as N grows large.

This is not so difficult. We have the finite sum

\sum\limits_{\alpha,\beta \vdash d} \frac{p_\beta(b_{N1},\dots,b_{NN})}{N^{\ell(\alpha)+\ell(\beta)}}L_N(\alpha,\beta),

and the number of terms in the sum has no dependence on N. The quantity L_N(\alpha,\beta) is O(1) as N \to \infty (make sure you understand why). This leaves the fraction

\frac{p_\beta(b_{N1},\dots,b_{NN})}{N^{\ell(\alpha)+\ell(\beta)}}.

The denominator is literally just a power of N. As for the numerator, we have the bound

|p_\beta(b_{N1},\dots,b_{NN})| \leq N^{\ell(\beta)} \|B_N\|^d,

where \|B_N\| = \max |b_{Nj}| is the spectral radius of B_N. Thus, we have

\left| \frac{p_\beta(b_{N1},\dots,b_{NN})}{N^{\ell(\alpha)+\ell(\beta)}} \right| \leq \frac{\|B_N\|^d}{N^{\ell(\alpha)}} \leq \frac{\|B_N\|^d}{N}.

We therefore have the cumulant bound

\left| \langle X_{N1}^d\rangle_c\right| \leq t_d N^{1-d} \|B_N\|^d

where t_d>0 depends only on d.
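These bounds are easy to probe numerically. The sketch below samples X_{N1} directly, using the standard fact that the first row of a Haar unitary is a uniform unit vector in \mathbb{C}^N, so the weights |U_N(1,j)|^2 form a normalized vector of iid exponentials; the \pm 1 data and sample sizes are arbitrary choices, and the cumulant recursion is the univariate moment-cumulant inversion:

    import numpy as np
    from math import comb

    def cumulants_from_moments(m):
        c = []
        for n in range(1, len(m) + 1):
            c.append(m[n - 1] - sum(comb(n - 1, k - 1) * c[k - 1] * m[n - k - 1]
                                    for k in range(1, n)))
        return c

    rng = np.random.default_rng(0)

    def sample_X_N1(b, samples, rng):
        # |U_N(1,j)|^2 has the Dirichlet(1,...,1) law of normalized exponentials
        w = rng.exponential(size=(samples, len(b)))
        return (w / w.sum(axis=1, keepdims=True)) @ b

    for N in (20, 80, 320):
        b = np.array([1.0] * (N // 2) + [-1.0] * (N // 2))   # ||B_N|| = 1, trace 0
        X = sample_X_N1(b, 100000, rng)
        m = [np.mean(X ** d) for d in range(1, 5)]
        print(N, ["%+.1e" % c for c in cumulants_from_moments(m)])
    # the d-th cumulant is O(N^{1-d}): the variance drops like 1/N, while the
    # degree 3 and 4 cumulants sink into Monte Carlo noise even faster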

Theorem 13.1. If the input sequence B_N of Hermitian matrices is such that \|B_N\| \leq M\sqrt{N} for some constant M and all N \in \mathbb{N}, and if \gamma_2 = \lim_N N^{-2} \mathrm{Tr}\, B_N^2 exists, then the centered random variable Y_N = X_{N1} - N^{-1}\mathrm{Tr}\, B_N converges to a centered Gaussian of variance \gamma_2.

Proof: By definition the first cumulant of Y_N is zero for every N \in \mathbb{N}, and from translation invariance of cumulants and the bound established above we have that all cumulants of degree d >2 converge to zero. From the Optimized Leading Cumulants Formula and the estimates above, the only O(1) term in the second cumulant is exactly N^{-2}\mathrm{Tr}\, B_N^2. Therefore, Y_N converges to a centered Gaussian random variable Y of variance \gamma_2 as N \to \infty, and the convergence holds in distribution by the Moment Method.

-QED

Math 289A: Lecture 12

This week we will finish working through the Olshanski-Vershik limit theorem, and explain how the tools we have developed along the way have set us up to prove other kinds of limit theorems about random matrices.

Recall the setting of the OV paper: we have an infinite triangular array of real numbers,

\begin{matrix} b_{11} & {} & {} \\ b_{21} & b_{22} & {} \\ b_{31} & b_{32} & b_{33} \\ \vdots & \vdots & \vdots \end{matrix}.

Assume the entries are sorted so that each row is non-increasing from left to right and let

B_N=(b_{N1},\dots,b_{NN})

be the Nth row. Thinking of B_N \in \mathbb{W}_N as a diagonal matrix, define a corresponding random Hermitian matrix by

X_N=U_NB_NU_N^*.

We know that in the characteristic function of X_N,

\varphi_{X_N}(A) = \int\limits_{\mathbb{U}_N} e^{i \mathrm{Tr}\, AUB_NU^*}\mathrm{d}U,\quad A \in \mathbb{H}_N,

we can assume without loss in generality that A is diagonal, so what we are really looking at is the sequence of characteristic functions

K_N \colon \mathbb{R}^N \longrightarrow \mathbb{C}

corresponding to our data set b_{ij} given as

K_N(a_1,\dots,a_N) = \int\limits_{\mathbb{U}_N} e^{i \sum_{x,y=1}^N a_xb_y|U_{xy}|^2}\mathrm{d}U,

or equivalently

K_N(a_1,\dots,a_N) = \mathbb{E}\left[e^{i(a_1X_N(1,1)+\dots+a_NX_N(N,N))}\right], \quad (a_1,\dots,a_N) \in \mathbb{R}^N,

where the diagonal matrix elements X_N(1,1),\dots,X_N(N,N) are exchangeable (but not independent) real random variables. Part I of the Olshanski-Vershik theorem is concerned with finding necessary and sufficient conditions on the triangular array b_{ij} such that the characteristic function

K_N(a)=K_N(a,0,\dots,0), \quad a \in \mathbb{R},

of a single diagonal matrix element X_N(1,1) converges as N\to \infty. Note that extracting the (1,1)-entry on either side of X_N=U_NB_NU_N^* we have

X_N(1,1) = \sum\limits_{j=1}^N |U_{1j}|^2 b_{Nj},

showing explicitly how X_N(1,1) is built from the uniformly random unit vector (U_{11},\dots,U_{1N}) in \mathbb{C}^N and the data B_N=(b_{N1},\dots,b_{NN}).

The characteristic function K_N(a) of X_N(1,1) has the power series expansion

K_N(a) = 1 + \sum\limits_{d=1}^\infty \frac{(ia)^d}{d!}K_N^d,

where K_N^d=\mathbf{E}[X_N^d(1,1)] are the moments of this real random variable. Olshanski-Vershik begin by finding sufficient conditions on the data b_{ij} such that, for each fixed d \in \mathbb{N}, the sequence of numbers (K_N^d)_{N=1}^\infty converges to a limit K^d as N \to \infty. In other words, they begin by finding conditions on the data b_{ij} such that the random variable X_N(1,1) converges in moments. We will do the same thing, but in a different way, using the following consequence of Theorem 10.1 in Lecture 10.

Theorem 12.1 (Leading Moments Formula). For each 1 \leq d \leq N, we have

K_N^d = N^{-d}\sum\limits_{\alpha,\beta \vdash d} p_\beta(b_{N1},\dots,b_{NN})\sum\limits_{r=0}^\infty \frac{(-1)^r}{N^r}\vec{W}^r(\alpha,\beta),

where the inner sum is absolutely convergent and \vec{W}^r(\alpha,\beta) is the number of r-step monotone walks on the symmetric group of degree d which begin at a permutation of cycle type \alpha and end at a permutation of cycle type \beta.

The most efficient way to use this result is to take a logarithm, i.e. to work with the cumulants of X_N(1,1). This is also the path followed by Olshanski-Vershik, but the method they use to do the calculation is different.

Let

L_N(a) = \log K_N(a) = \sum\limits_{d=1}^\infty \frac{(ia)^d}{d!}L_N^d

be the log-characteristic function of X_N(1,1), i.e. the cumulant generating function. The cumulant sequence L_N^1,L_N^2,L_N^3,\dots contains the same information as the moment sequence K_N^1,K_N^2,K_N^3,\dots, via the moment-cumulant relations

K_N^1 =L_N^1

K_N^2 = L_N^2 + L_N^1L_N^1

K_N^3 =L_N^3 + 3L_N^2L_N^1+L_N^1L_N^1L_N^1

\vdots

The cumulants L_N^d encode the “connected” version of the combinatorial problem encoded by the moments K_N^d.

Theorem 12.2 (Leading Cumulants Formula). For each 1 \leq d \leq N, we have

L_N^d =N^{-d} \sum\limits_{\alpha,\beta \vdash d} p_\beta(b_{N1},\dots,b_{NN})\sum\limits_{r=0}^\infty \frac{(-1)^r}{N^r}\vec{H}^r(\alpha,\beta),

where \vec{H}^r(\alpha,\beta) is the number of walks counted by \vec{W}^r(\alpha,\beta) which have the additional feature that their steps and endpoints generate a transitive subgroup of the symmetric group.

The Leading Cumulants Formula is identical to the Leading Moments formula except that the walk count \vec{W}^r(\alpha,\beta) has been replaced with the connected walk count \vec{H}^r(\alpha,\beta). This seemingly minor change is in fact very advantageous.

Theorem 12.3 (Riemann-Hurwitz Formula). The number \vec{H}^r(\alpha,\beta) is nonzero if and only if there is g \in \mathbb{N}_0 such that

r=2g-2+\ell(\alpha)+\ell(\beta).

Due to the Riemann-Hurwitz formula it is more convenient to parameterize the numbers \vec{H}^r(\alpha,\beta) by the parameter g, i.e. to set

\vec{H}_g(\alpha,\beta):= \vec{H}^{2g-2+\ell(\alpha)+\ell(\beta)}(\alpha,\beta).
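For small d these counts can be tabulated by brute force, which makes the Riemann-Hurwitz constraint visible. The sketch below fixes a starting 3-cycle in the symmetric group of degree 3 (one natural reading of "begin at a permutation of cycle type \alpha"; the notes' counting conventions may differ by symmetry factors), takes "monotone" to mean that the larger moved points of the successive steps are non-decreasing, and tallies transitive walks by the cycle type of their endpoint:

    from itertools import product

    d = 3                                    # small enough for brute force

    def compose(p, q):                       # (p after q)(i) = p[q[i]]
        return tuple(p[q[i]] for i in range(d))

    def cycle_type(p):
        seen, ct = set(), []
        for i in range(d):
            if i not in seen:
                n, j = 0, i
                while j not in seen:
                    seen.add(j)
                    j, n = p[j], n + 1
                ct.append(n)
        return tuple(sorted(ct, reverse=True))

    def transitive(perms):                   # do the perms act transitively?
        orbit, frontier = {0}, [0]
        while frontier:
            i = frontier.pop()
            for p in perms:
                if p[i] not in orbit:
                    orbit.add(p[i])
                    frontier.append(p[i])
        return len(orbit) == d

    steps = []
    for b in range(d):
        for a in range(b):
            p = list(range(d))
            p[a], p[b] = p[b], p[a]
            steps.append((tuple(p), b))      # transposition, tagged by its max

    start = (1, 2, 0)                        # a fixed 3-cycle: alpha = (3)
    for r in range(3):
        counts = {}
        for walk in product(steps, repeat=r):
            maxima = [b for _, b in walk]
            if any(maxima[i] > maxima[i + 1] for i in range(r - 1)):
                continue                     # not monotone
            end = start
            for t, _ in walk:
                end = compose(end, t)
            if transitive([t for t, _ in walk] + [start, end]):
                counts[cycle_type(end)] = counts.get(cycle_type(end), 0) + 1
        print(r, counts)
    # r=0: {(3,): 1}; r=1: {(2, 1): 3}; r=2: {(3,): 5, (1, 1, 1): 2}; every
    # type that appears satisfies r = 2g - 2 + l(alpha) + l(beta), g >= 0

In particular, cycle types violating the parity constraint never appear, exactly as Theorem 12.3 predicts.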

Theorem 12.4 (Optimized Leading Cumulants Formula). For each 1 \leq d \leq N, we have

L_N^d =N^{2-d} \sum\limits_{\alpha,\beta \vdash d} \frac{p_\beta(b_{N1},\dots,b_{NN})}{N^{\ell(\alpha)+\ell(\beta)}}(-1)^{\ell(\alpha)+\ell(\beta)}\sum\limits_{g=0}^\infty N^{-2g}\vec{H}_g(\alpha,\beta).

Now we are in position to prove some interesting limit theorems. We begin by proving one such theorem corresponding to an explicit choice of the dataset b_{ij} (this corresponds to Remark 4.5 in Olshanski-Vershik).

Let \gamma >0 be a positive constant, and suppose our dataset b_{ij} has Nth row given by

B_N=(\sqrt{\gamma N},\dots,\sqrt{\gamma N},-\sqrt{\gamma N},\dots,-\sqrt{\gamma N}),

where the number of positive and negative entries is as near to equal as possible – equal if N even and differing by 1 if N odd, where we assume the data favors positive numbers in the latter case. In this situation we can establish the following N \to \infty limit theorem for a single diagonal matrix element of X_N=U_NB_NU_N^*.

Theorem 12.5 (Gaussian Limit Theorem). Any diagonal matrix element of X_N=U_NB_NU_N^* converges in distribution to a centered Gaussian random variable Y of variance \gamma.

Proof: Let \beta \vdash d be a partition of d \in \mathbb{N}. Then, we have

p_\beta(B_N) = (\gamma N)^{\frac{d}{2}}p_\beta(1,\dots,1,-1,\dots,-1),

and therefore

p_\beta(B_N) = O(N^{\ell(\beta)+\frac{d}{2}}).

Thus, each ratio p_\beta(B_N)/N^{\ell(\alpha)+\ell(\beta)} is O(N^{\frac{d}{2}-1}), and from the Optimized Leading Cumulants Formula we have

L_N^d=O(N^{1-\frac{d}{2}}),

showing that L_N^d converges to zero for all d>2. This means that X_N(1,1) converges in cumulants (equivalently, in moments) to a Gaussian random variable Y, whose mean and variance remain to be determined.

The first cumulant of X_N(1,1) is

L_N^1 =N^{2-1} \frac{\sqrt{\gamma N}}{N^2}=N^{-\frac{1}{2}}\sqrt{\gamma} \quad (N \text{ odd};\ L_N^1=0 \text{ for } N \text{ even}),

and this converges to 0 as N \to \infty. For the second cumulant,

L_N^2 =\frac{p_2(B_N)}{N^2} + O(\frac{1}{N}),

and

\frac{p_2(B_N)}{N^2} = \frac{N(\gamma N)}{N^2}=\gamma.

Thus, we have shown that X_N(1,1) converges in moments/cumulants to Y \sim \mathcal{N}(0,\gamma). In fact, this implies X_N(1,1) \to Y in distribution as N \to \infty, thanks to the Method of Moments.

-QED

Math 289A: Lecture 10

We start with a brief recap of what we are trying to achieve – since things are getting complicated it might be helpful to step back for a moment before stepping further in.

Let \mathbb{H}_N be the Euclidean space of N \times N Hermitian matrices, and let \mathbb{U}_N be the compact group of N \times N unitary matrices. We have a group action

\Phi \colon \mathbb{U}_N \longrightarrow \mathrm{Aut}(\mathbb{H}_N),

where \mathrm{Aut}(\mathbb{H}_N) is the orthogonal group of the Euclidean space \mathbb{H}_N, defined by

\Phi(U)X = UXU^*, \quad X \in \mathbb{H}_N,\ U \in \mathbb{U}_N.

This group action is called unitary conjugation. Let

\mathcal{O}_X = \{UXU^* \colon U \in \mathbb{U}_N\}

denote the orbit of a given Hermitian matrix X \in \mathbb{H}_N under unitary conjugation.

Theorem (Spectral Theorem). For every X \in \mathbb{H}_N, the orbit \mathcal{O}_X contains a diagonal matrix B which is unique up to permutations of its diagonal elements.

Let

\mathbb{W}_N = \{(b_1,\dots,b_N) \in \mathbb{R}^N \colon b_1 \geq \dots \geq b_N\}.

We will refer to this orthant of \mathbb{R}^N as the Weyl chamber. This term comes from Lie theory – we are not doing Lie theory, we just need a name for \mathbb{W}_N. By the Spectral Theorem, the orbits of \mathbb{H}_N under unitary conjugation are indexed by points of the Weyl chamber: we set

\mathcal{O}(B) = \mathcal{O}_B, \quad B \in \mathbb{W}_N,

where on the right hand side we are viewing B as a diagonal matrix.

A standard problem in linear algebra is the following: we are given a matrix X \in \mathbb{H}_N, meaning that we are explicitly given all entries of this matrix, and from this information we are asked to find B \in \mathbb{W}_N such that X \in \mathcal{O}(B). The Spectral Theorem guarantees that this problem has a unique solution. The coordinates of the solution B \in \mathbb{R}^N are the eigenvalues of the given matrix X.
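In software this standard problem is a single library call; a minimal numpy sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    X = (Z + Z.conj().T) / 2                 # an arbitrary point of H_4

    lam, V = np.linalg.eigh(X)               # X = V diag(lam) V* with V unitary
    B = lam[::-1]                            # eigenvalues sorted into W_4
    print(B)
    print(np.allclose(X, V @ np.diag(lam) @ V.conj().T))   # X lies on O(B)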

We are interested in solving a generic version of the inverse problem: given B \in \mathbb{W}_N, determine the matrix elements of a uniformly random point X \in \mathcal{O}(B). Since B is given, we are being asked about the random Hermitian matrix X_N=U_NBU_N^*, where U_N is a uniformly random unitary matrix from \mathbb{U}_N. This is something we know how to define precisely (Haar measure).

We want to solve this inverse problem by computing the characteristic function of X_N, which is

\varphi_{X_N}(T) = \mathbf{E}\left[e^{i\langle T,U_NBU_N^*\rangle}\right]=\int\limits_{\mathbb{U}_N}e^{i \mathrm{Tr}TUBU^*} \mathrm{d}U.

We know that it is in fact sufficient to do this for T=A being a diagonal Hermitian matrix, and we have introduced the notation

K_N(A,B) = \int\limits_{\mathbb{U}_N} e^{i \mathrm{Tr} AUBU^*} \mathrm{d}U

in order to emphasize the fact that we can view this characteristic function as defining a bounded symmetric kernel

K_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}.

Furthermore, K_N is actually the characteristic function of the diagonal D_N=(X_N(1,1),\dots,X_N(N,N)) of X_N, so a solution to our problem is the same thing as determining the distribution of the diagonal of a uniformly random Hermitian matrix with prescribed eigenvalues.

We can view K_N as a generating function for the joint moments of the diagonal matrix elements X_N(1,1),\dots,X_N(N,N), meaning that

K_N(A,B) = 1 + \sum\limits_{d=1}^\infty \frac{i^d}{d!} \sum\limits_{\gamma \in \mathsf{C}_N^d} a_1^{\gamma_1} \dots a_N^{\gamma_N} \mathbf{E}\left[ X_N^{\gamma_1}(1,1) \dots X_N^{\gamma_N}(N,N)\right] K_N(\gamma),

where the internal sum runs over the set \mathsf{C}_N^d of weak N-part compositions of the number d, which are simply vectors \gamma=(\gamma_1,\dots,\gamma_N) of nonnegative integers whose coordinates sum to d. The coefficient K_N(\gamma) corresponding to a given \gamma \in \mathsf{C}_N^d is a very simple quantity, namely the multinomial coefficient

K_N(\gamma) = \frac{d!}{\gamma_1! \dots \gamma_N!}.

But, it is Haard (sorry, couldn’t resist) to see how to directly compute all mixed moments of the random variables X_N(1,1),\dots,X_N(N,N). Indeed, we actually have a formula for each of these random variables, namely

X_N(k,k) = \sum\limits_{j=1}^N |U_{jk}|^2 b_j,

which is obtained from X_N=U_NBU_N^* simply by matrix multiplication. This formula more or less shows that a direct computation of, say, the moment sequence of just X_N(1,1) involves computing a family of Haar integrals over \mathbb{U}_N which we don’t know how to contend with.

But K_N is a moment generating function in another sense: we have

K_N(A,B) = 1 + \sum\limits_{d=1}^\infty \frac{i^d}{d!} \sum\limits_{\alpha,\beta \in \mathsf{Y}_N^d} p_\alpha(a_1,\dots,a_N)p_\beta(b_1,\dots,b_N) K_N(\alpha,\beta),

where the internal (double) sum is over the set \mathsf{Y}_N^d of Young diagrams with first row of length at most N; such diagrams are pictorial representations of partitions of d with largest part at most N. We know that such an expansion exists by a “soft” symmetry argument combined with a classical theorem of Newton: we use that K_N(A,B) is invariant under independent permutations of the coordinates of the vectors A=(a_1,\dots,a_N) and B=(b_1,\dots,b_N). The moments involved in this expansion are the moments of the (non-normalized) empirical distributions of these vectors, i.e. the discrete measures \eta_A and \eta_B on \mathbb{R} which place unit mass at each coordinate. The moments of these measures are exactly sums of powers of the coordinates of A and B.

This leaves the problem of computing the coefficients K_N(\alpha,\beta) in this nonstandard moment generating function. This can be done using representation theory – in fact one needs to know quite a bit about representations of the unitary and symmetric groups, as well as about the relationship between them. We will mostly black box this to avoid going too deeply into algebra (though an indication of what needs to be done was given in the previous lecture, where we discussed Schur polynomials). Instead, we will focus on two facts: first, the coefficients K_N(\alpha,\beta) have an intrinsically interesting combinatorial interpretation, which is something that often happens with moment generating functions of interesting random variables (we discussed the example of Gaussians in lecture); second, this combinatorial interpretation will allow us to prove probabilistically meaningful limit theorems. The simplest such theorem is the result of Olshanski and Vershik on the N \to \infty limiting joint distribution of X_N(1,1),\dots,X_N(k,k) where k \in \mathbb{N} is arbitrary but fixed (you should be actively reading the Olshanski-Vershik paper while we are doing this).

Theorem 10.1. For every integer 1 \leq d \leq N, and every pair of Young diagrams \alpha,\beta \in \mathsf{Y}_N^d, we have

K_N(\alpha,\beta) = N^{-d}\sum\limits_{r=0}^\infty \frac{(-1)^r}{N^r}\vec{W}^r(\alpha,\beta),

where the series is absolutely convergent and \vec{W}^r(\alpha,\beta) is the number of monotone r-step walks on the Cayley graph of the symmetric group \mathfrak{S}^d which begin at a permutation of cycle type \alpha and end at a permutation of cycle type \beta.

Clearly there is much to be unpacked here, but ultimately it is not so complicated, and moreover only a small amount of the information contained in this statement is needed to prove limit theorems about random matrices. A very basic observation is that in the range 1 \leq d \leq N, the set \mathsf{Y}_N^d is simply the set \mathsf{Y}^d of all Young diagrams with exactly d cells.

Now we have to explain what we mean by the Cayley graph of the symmetric group, and what we mean by a monotone walk on this graph. For this I am linking to a paper which hopefully provides a reasonably clear account of this combinatorial construction (all you need to read is Sections 2.1 and 2.2, just a couple of pages). After you have read this you will know what the Cayley graph of the symmetric group is, and what is meant by a strictly monotone walk on this graph. Furthermore, you will know the remarkable fact that there is a unique strictly monotone walk between any two permutations, and this walk is a geodesic.

As explained in lecture, it is often the case that the logarithm of the characteristic function is easier to work with than the characteristic function itself – it is a generating function for cumulants rather than moments. Let us consider the logarithm of the characteristic function of X_N = U_NBU_N^*,

L_N(A,B) = \log K_N(A,B) = \log\int\limits_{\mathbb{U}_N} e^{i\mathrm{Tr}\, AUBU^*} \mathrm{d}U.

Math 289A: Lecture 9

This week we are looking at a landmark paper by Grigori Olshanski and Anatoly Vershik, “Ergodic Unitarily Invariant Measures on the Space of Infinite Hermitian Matrices.” This paper is among the first to consider characteristic functions in the context of random matrix theory. Namely, it considers the problem of approximating the function

K_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}

defined by

K_N(A,B) = \int\limits_{\mathbb{U}_N} e^{i\mathrm{Tr}AUBU^*}\mathrm{d}U,

where the integration is over the unitary group of rank N against Haar measure. We have referred to this function as the “quantum Fourier kernel,” but as we have stressed it is also the classical characteristic function of the diagonal (X_N(1,1),\dots,X_N(N,N)) of the uniformly random Hermitian matrix X_N=U_NB_NU_N^* with deterministic eigenvalues B \in \mathbb{R}^N. The paper of Olshanski-Vershik is about approximating K_N(A,B) as N \to \infty, which probabilistically means that we are trying to understand how the joint distribution of the diagonal matrix elements of a very large uniformly random Hermitian matrix with prescribed eigenvalues B is determined by the vector B. This is an example of a high-dimensional problem: our goal is to approximate a function

K_N(A,B) = K_N(a_1,\dots,a_N;b_1,\dots,b_N)=\int\limits_{\mathbb{U}_N} e^{i\sum_{x,y=1}^N a_xb_y|U_{xy}|^2}\mathrm{d}U

of 2N real variables as the number of variables goes to infinity. Typically such problems are much harder than traditional problems of asymptotic analysis, which consider the approximation of some parameter-dependent family of functions defined on a fixed domain as the parameter approaches a given value.

One way to simplify the question is to consider the restriction of K_N to the subspace \mathbb{R}^r \times \mathbb{R}^N with r \in \mathbb{N} fixed. That is, we consider the characteristic function

K_N(a_1,\dots,a_r,0,\dots,0;b_1,\dots,b_N) = \mathbb{E}\left[e^{i(a_1X_N(1,1) + \dots +a_r X_N(r,r))}\right]

of only the first r diagonal matrix elements of the random Hermitian matrix X_N=U_NBU_N^*. Recall that we have previously shown that all diagonal matrix elements of X_N are identically distributed, but not necessarily independent. The paper of Olshanski and Vershik shows that if the eigenvalues B=(b_1,\dots,b_N), scaled by a factor of N^{-1}, converge to a point configuration on \mathbb{R} as N \to \infty, then the random variables X_N(1,1),\dots,X_N(r,r) converge to independent random variables whose characteristic function can be calculated explicitly in terms of this point configuration. We will try to work through this result, which I believe to be the first high-dimensional limit theorem in RMT obtained via characteristic functions.

The Olshanski-Vershik approach of computing the asymptotics of the characteristic function

K_N(A,B) = \int\limits_{\mathbb{U}_N}e^{i\mathrm{Tr}AUBU^*}\mathrm{d}U

is based on series expansion of this matrix integral. I am a big proponent of this approach, but it does require some quite extensive background from representation theory and algebraic combinatorics which is not traditionally found in the realm of probability. We will try not to get too bogged down in the algebra, and in order to do this it will be necessary to accept some results from representation theory as black boxes.

First, let us analytically continue the characteristic function K_N \colon \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{C} to a function

K_N \colon \mathbb{C} \times \mathbb{C}^N \times \mathbb{C}^N \longrightarrow \mathbb{C}

by declaring

K_N(z,A,B) = \int\limits_{\mathbb{U}_N} e^{z \mathrm{Tr} AUBU^*} \mathrm{d}U,

where z \in \mathbb{C} is a complex number and A,B \in \mathbb{C}^N are complex vectors. Thus, by compactness of \mathbb{U}_N, we have that K_N is an entire function of 1+2N complex variables z,a_1,\dots,a_N,b_1,\dots,b_N, and the Maclaurin series of this function, which is a multivariate Taylor series centered at the origin, converges absolutely and uniformly on compact subsets of \mathbb{C}^{1+2N}. We recover our original characteristic function K_N by setting z=i and restricting A,B to \mathbb{R}^N \subset \mathbb{C}^N.

The analytic function K_N has various symmetries that allow us to present its Maclaurin series in a more advantageous form. More precisely, K_N(z,A,B)=K_N(z,a_1,\dots,a_N,b_1,\dots,b_N) is invariant under independent permutations of a_1,\dots,a_N and b_1,\dots,b_N, and also stable under swapping these two sets of variables. This means that the Maclaurin series of K_N can be presented in the form

K_N(z,A,B) = 1 + \sum_{d=1}^\infty \frac{z^d}{d!} K_N^d(A,B),

where K_N^d(A,B) = K_N^d(a_1,\dots,a_N,b_1,\dots,b_N) is a homogeneous degree d symmetric polynomial in a_1,\dots,a_N and also a homogeneous degree d symmetric polynomial in b_1,\dots,b_N. Therefore, by Newton’s theorem on symmetric polynomials, we can write this polynomial as

K_N^d(A,B) = \sum\limits_{\alpha,\beta \in \mathsf{Y}^d} p_\alpha(a_1,\dots,a_N)p_\beta(b_1,\dots,b_N)K_N(\alpha,\beta),

where \mathsf{Y}^d is the set of integer partitions of the degree d, which we identify with the set of Young diagrams with d cells, and p_\alpha(a_1,\dots,a_N) and p_\beta(b_1,\dots,b_N) are the Newton power-sum symmetric polynomials corresponding to \alpha,\beta \in \mathsf{Y}^d. For example, if d=3 and \alpha=(1,1,1), \beta=(2,1), then

p_\alpha(a_1,\dots,a_N)p_\beta(b_1,\dots,b_N)= (a_1+\dots+a_N)^3(b_1^2+\dots+b_N^2)(b_1+\dots+b_N).

Our objective then is to calculate the numerical coefficients K_N(\alpha,\beta) of the polynomial K_N^d(A,B) which we will refer to as its Newton coefficients.

Problem 9.1. Show that K_N^d(A,B)=K_N^d(B,A), and use this to show that K_N(\alpha,\beta)=K_N(\beta,\alpha).

What is good about the Newton polynomials is that they connect us to point configurations, which is what we want. More precisely, if B=(b_1,\dots,b_N) \in \mathbb{R}^N is a real vector, then we can view it not as a point in N-dimensional space but as a configuration of particles b_1,\dots,b_N on the real line. The empirical distribution \eta_B of B is the discrete measure on \mathbb{R} which places unit mass at each particle, and the degree d moment of this measure is

\int\limits_{\mathbb{R}} x^d \eta_B(\mathrm{d}x) = b_1^d + \dots +b_N^d=p_d(b_1,\dots,b_N).

This means that writing the Maclaurin expansion of K_N(z,A,B) in terms of Newton symmetric polynomials is exactly what we want to do, because it is expressing this function in terms of the empirical distributions of its arguments A,B, and the empirical distribution of a point configuration is a natural way to describe the “shape” of the configuration in the infinite particle limit where N \to \infty.

Unfortunately, there is no obvious way to calculate the Newton coefficients K_N(\alpha,\beta). What we can do is compute the Maclaurin series of K_N(z,A,B) = K_N(z,a_1,\dots,a_N,b_1,\dots,b_N) by repeatedly differentiating under the integral sign with respect to z. Equivalently, we just expand the exponential function of z in the integrand and then integrate term-by-term, which gives

K_N(z,A,B) = 1 + \sum\limits_{d=1}^\infty \frac{z^d}{d!} \int\limits_{\mathbb{U}_N} \left(\mathrm{Tr} AUBU^* \right)^d \mathrm{d}U.

This gives us the formula

K_N^d(A,B) = \int\limits_{\mathbb{U}_N} \left(\mathrm{Tr} AUBU^* \right)^d \mathrm{d}U,

which at least reduces our initial problem of computing an exponential integral over \mathbb{U}_N to computing a polynomial integral over \mathbb{U}_N,

K_N^d(A,B) = \int\limits_{\mathbb{U}_N} \left(\sum\limits_{x,y=1}^N a_xb_y U_{xy} \bar{U}_{xy} \right)^d \mathrm{d}U,

but still there is no obvious path forward.

At this point we should ask ourselves what integrals over \mathbb{U}_N we actually can compute. To make the discussion as simple as possible, take N=1, so \mathbb{U}_1 is just the unit circle in \mathbb{C}. One thing we can confidently state is the orthogonality relation

\int\limits_{\mathbb{U}_1} U^k \bar{U}^l \mathrm{d}U = \int\limits_0^{2\pi} e^{i(k-l)\theta} \frac{\mathrm{d}\theta}{2\pi} = \delta_{kl}.

Representation theory gives us analogous formulas for all N, namely a system of multivariate polynomials which are orthogonal polynomials for integration with respect to Haar measure on \mathbb{U}_N. Let \mathsf{Y}_N be the set of all Young diagrams with at most N rows.

Definition 9.1. For each \lambda \in \mathsf{Y}_N, the corresponding Schur symmetric polynomial is defined by

s_\lambda(t_1,\dots,t_N) = \frac{\det [t_j^{N-i+\lambda_i}]}{\det[t_j^{N-i}]},

where \lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_N \geq 0 are the row lengths of \lambda.

Out of the many equivalent definitions of Schur polynomials, we are taking this one because it is a fairly explicit and elementary formula. Indeed, the determinant in the denominator is the Vandermonde determinant, which has the property that it divides any other antisymmetric polynomial in t_1,\dots,t_N to yield a symmetric polynomial.
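Definition 9.1 is also effective numerically: the sketch below evaluates the bialternant with floating-point determinants (fine for small N at generic points), and the symmetry of the output illustrates the Vandermonde division just described. You can use it to spot-check your answers to the next problem.

    import numpy as np

    def schur(lam, t):
        # numerator and denominator determinants of Definition 9.1
        N = len(t)
        row = list(lam) + [0] * (N - len(lam))
        num = np.linalg.det(np.array([[tj ** (N - i - 1 + row[i]) for tj in t]
                                      for i in range(N)]))
        den = np.linalg.det(np.array([[tj ** (N - i - 1) for tj in t]
                                      for i in range(N)]))
        return num / den

    # evaluating at a point and at a permutation of that point gives the
    # same value: the ratio of determinants is a symmetric polynomial
    print(schur([2, 1], [0.3, 1.7, 2.2]), schur([2, 1], [2.2, 0.3, 1.7]))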

Problem 9.2. Let \lambda=\Box be the single-cell Young diagram. Compute the Schur polynomial s_\Box(t_1,\dots,t_N) directly from Definition 9.1. Do the same for a few more small Young diagrams.

The disadvantage of Definition 9.1 is that it makes the orthogonality relations for Schur polynomials, which are really consequences of the fact that these polynomials are characters of irreducible representations of \mathbb{U}_N, into a theorem whose proof would take us too far afield. So we will have to treat this property as a black box. Given a matrix A \in \mathbb{C}^{N \times N}, let us write s_\lambda(A) for the evaluation s_\lambda(a_1,\dots,a_N) of the Schur polynomial s_\lambda(t_1,\dots,t_N) on the eigenvalues of A.

Theorem 9.1 (Schur orthogonality). For any two Young diagrams \lambda,\mu \in \mathsf{Y}_N and any two matrices A,B \in \mathbb{C}^{N \times N}, we have

\int\limits_{\mathbb{U}_N} s_\lambda(AUBU^*) \mathrm{d}U = \frac{s_\lambda(A)s_\lambda(B)}{\dim \mathbf{W}_N^\lambda}

and

\int\limits_{\mathbb{U}_N} s_\lambda(AU)s_\mu(BU^*) \mathrm{d}U =\delta_{\lambda\mu} \frac{s_\lambda(AB)}{\dim \mathbf{W}_N^\lambda},

where \dim \mathbf{W}_N^\lambda is the number of semistandard Young tableaux of shape \lambda with entries from \{1,\dots,N\}.

We also need a remarkable “multinomial formula” due to Frobenius. Let \mathsf{Y}_N^d= \mathsf{Y}^d \cap \mathsf{Y}_N be the set of Young diagrams with exactly d cells and at most N rows.

Theorem 9.2 (Frobenius formula). We have

(t_1 + \dots + t_N)^d = \sum\limits_{\lambda \in \mathsf{Y}_N^d} (\dim \mathbf{V}^\lambda) s_\lambda(t_1,\dots,t_N),

where \dim \mathbf{V}^\lambda is the number of standard Young tableaux of shape \lambda.

The reason the tableaux counts \dim \mathbf{V}^\lambda and \dim \mathbf{W}_N^\lambda are denoted this way is that the former is the dimension of an irreducible representation of the symmetric group \mathbb{S}^d and the latter is the dimension of an irreducible representation of the unitary group \mathbb{U}_N. At this point we have enough material to explicitly calculate a series expansion of K_N(z,A,B). This expansion is the starting point of Olshanski and Vershik’s asymptotic analysis, and is stated as Theorem 5.1 in their paper.

Theorem 9.3 (Schur expansion of K_N). We have

K_N(z,A,B) = \int\limits_{\mathbb{U}_N} e^{z \mathrm{Tr} AUBU^*} \mathrm{d}U=1+ \sum\limits_{d=1}^\infty \frac{z^d}{d!} \sum\limits_{\lambda \in \mathsf{Y}_N^d} s_\lambda(A)s_\lambda(B) \frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_N^\lambda}.

Proof: From Theorem 9.2, we have

K_N^d(A,B) = \int\limits_{\mathbb{U}_N} (\mathrm{Tr} AUBU^*)^d \mathrm{d}U=\sum\limits_{\lambda \in \mathsf{Y}_N^d} (\dim \mathbf{V}^\lambda) \int\limits_{\mathbb{U}_N} s_\lambda(AUBU^*) \mathrm{d}U,

and applying the first integration formula in Theorem 9.1 this becomes

K_N^d(A,B) =  \sum\limits_{\lambda \in \mathsf{Y}_N^d} s_\lambda(A)s_\lambda(B) \frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_N^\lambda}.

– QED

Problem 9.3. Compute the Schur expansion of K_N up to O(z^4). This is straightforward but will give you a feel for the structure of the formula.
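Here is a Monte Carlo sanity check of Theorem 9.3 in the smallest nontrivial case N = d = 2 (a sketch; scipy's unitary_group supplies the Haar samples, and the test vectors are arbitrary):

    import numpy as np
    from scipy.stats import unitary_group

    A = np.diag([0.7, -0.2])
    B = np.diag([1.1, 0.4])

    # Monte Carlo estimate of K_N^2(A,B), the integral of (Tr AUBU*)^2
    est, trials = 0.0, 50000
    for _ in range(trials):
        U = unitary_group.rvs(2)
        est += np.trace(A @ U @ B @ U.conj().T).real ** 2
    print(est / trials)

    # Schur expansion with d = 2: partitions (2) and (1,1); dim V = 1 for both,
    # dim W_2^{(2)} = 3 and dim W_2^{(1,1)} = 1 (semistandard tableau counts)
    s2 = lambda x, y: x * x + x * y + y * y        # s_(2)
    s11 = lambda x, y: x * y                       # s_(1,1)
    a, b = np.diag(A), np.diag(B)
    print(s2(*a) * s2(*b) / 3 + s11(*a) * s11(*b))

Both printed numbers should agree to Monte Carlo accuracy (about 0.17 for these inputs).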

Let us include the analogue of Theorem 9.3 for the quantum Bessel kernel

K_{MN}(A,B) = \int\limits_{\mathbb{U}_{MN}} e^{i \mathrm{Tr}(A^*UBV^* + VB^*U^*A)} \mathrm{d}(U,V),

where M \geq N and the integration is against Haar measure on the product group \mathbb{U}_{MN}= \mathbb{U}_M \times \mathbb{U}_N. Here A,B \in \mathbb{R}^N are real vectors viewed as diagonal matrices in \mathbb{C}^{M \times N}. Since K_{MN}(A,B) is invariant under signed permutations of the coordinates of A=(a_1,\dots,a_N) and B=(b_1,\dots,b_N), we can assume without loss in generality that these are non increasing lists of nonnegative numbers. The quantum Bessel kernel is the characteristic function of a uniformly random M \times N rectangular matrix with singular values B=(b_1,\dots,b_N). Once again we can analytically continue this to

K_{MN}(z,A,B) = \int\limits_{\mathbb{U}_{MN}} e^{z \mathrm{Tr}(A^*UBV^* + VB^*U^*A)} \mathrm{d}(U,V),

where z \in \mathbb{C} and A,B \in \mathbb{C}^N are viewed as complex diagonal M \times N matrices in the integral. The following expansion of K_{MN} is not in the Olshanski-Vershik paper, but it can be obtained in essentially the same way; a more general result is proved here.

Theorem 9.4 (Schur expansion of K_{MN}). We have

K_{MN}(z,A,B) = \int\limits_{\mathbb{U}_{MN}} e^{z \mathrm{Tr}(A^*UBV^* + VB^*U^*A)} \mathrm{d}(U,V)=1+ \sum\limits_{d=1}^\infty \frac{z^{2d}}{d!d!} \sum\limits_{\lambda \in \mathsf{Y}_N^d} s_\lambda(A^*A)s_\lambda(B^*B) \frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_M^\lambda}\frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_N^\lambda}.

We close this lecture with a pointer to interesting work in the engineering literature which considers exactly these integrals in an applied context.