Math 289A: Formalism of Schur-Horn Arrays

The right way to set up the classical limit theorems of probability theory is to start with a triangular array of real random variables

\begin{matrix} X_{11} & {} & {} \\ X_{21} & X_{22} & {} \\ X_{31} & X_{32} & X_{33} \\ \vdots & \vdots & \vdots\end{matrix}.

Your first exposure to the law of large numbers and central limit theorem likely involved a different setup, namely an infinite sequence

X_1,X_2,X_3,\dots

of iid random variables, whose partial sums

S_N=X_1 + \dots + X_N

are the main object of interest. There are some non-negligible advantages to the triangular array setup, for example the fact that the rows do not have to be defined on a common underlying probability space. The N \to \infty behavior of the row sums

S_N = X_{N1}+ \dots + X_{NN}

is the topic of slightly more advanced and flexible versions of the central limit theorem. This result and other classical limit theorems apply to the situation when the random variables in each row of the triangular array are independent or nearly independent.

We are interested in triangular arrays of random variables as above which arise from an underlying deterministic data set

\begin{matrix} B_{11} & {} & {} \\ B_{21} & B_{22} & {} \\ B_{31} & B_{32} & B_{33} \\ \vdots & \vdots & \vdots\end{matrix}.

The triangular array of random variables associated to this data has rows defined by

\begin{bmatrix} X_{N1} \\ \vdots \\ X_{NN} \end{bmatrix} = \begin{bmatrix} {} & \vdots & {}\\ \dots & |U_{ij}|^2 & \dots \\ {} & \vdots & {}\end{bmatrix}\begin{bmatrix} B_{N1} \\ \vdots \\ B_{NN} \end{bmatrix}

where U_N=[U_{ij}] is an N \times N uniformly random (Haar-distributed) unitary matrix. Thus, the random vector (X_{N1},\dots,X_{NN}) is a random linear transformation of the deterministic data (B_{N1},\dots,B_{NN}). Because the entries of U_N are highly correlated (its rows are orthonormal, as are its columns), the random variables X_{N1},\dots,X_{NN} are far from independent. They are, however, exchangeable.

There are two ways to think about a Schur-Horn array. Geometrically, (X_{N1},\dots,X_{NN}) is a random point inside the permutahedron in \mathbb{R}^N generated by (B_{N1},\dots,B_{NN}). Algebraically, (X_{N1},\dots,X_{NN}) is the diagonal of the random matrix U_NB_NU_N^*, where B_N=\mathrm{diag}(B_{N1},\dots,B_{NN}).
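The two descriptions are easy to check numerically. The following sketch (with a hypothetical choice of data B_{Nj}=j and a small N) samples a Haar-distributed unitary via the QR factorization of a complex Ginibre matrix, then confirms that the diagonal of U_NB_NU_N^* agrees with the doubly stochastic mix [|U_{ij}|^2]B and that the total sum of the data is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
B = np.arange(1.0, N + 1)  # hypothetical data B_{N1},...,B_{NN}

# Sample a Haar-distributed unitary: QR of a complex Ginibre matrix,
# with the phases of R's diagonal absorbed into Q to make U exactly Haar.
Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
Q, R = np.linalg.qr(Z)
d = np.diag(R)
U = Q * (d / np.abs(d))

# Algebraic description: X is the diagonal of U B U*.
X_alg = np.real(np.diag(U @ np.diag(B) @ U.conj().T))

# Equivalent description: X = [|U_ij|^2] B, a doubly stochastic mix of the data.
X_mix = (np.abs(U) ** 2) @ B

assert np.allclose(X_alg, X_mix)
assert np.isclose(X_alg.sum(), B.sum())  # the trace of U B U* equals the trace of B
```

The second assertion reflects the geometric picture: a doubly stochastic matrix preserves the coordinate sum, so the random point stays inside the permutahedron generated by the data.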

We are interested in proving limit theorems for Schur-Horn arrays. Such theorems will describe N \to \infty behavior of the rows of the X-array in terms of prescribed N \to \infty behavior of the rows of the B-array. In particular, we are interested in the situation where the empirical distribution of the vector

\left(\frac{B_{N1}}{N^t}, \dots, \frac{B_{NN}}{N^t}\right)

converges in moments as N \to \infty, where t \geq 0 is a scaling exponent. The empirical distribution is the probability measure on \mathbb{R} which places mass 1/N at each coordinate of the vector (B_{N1},\dots,B_{NN}). If we think of these data points as beads on a wire, then the empirical measure of S \subseteq \mathbb{R} is the fraction of beads which lie in S. Convergence in moments of the empirical measure means that the limit

\gamma_d = \lim\limits_{N \to \infty} \frac{p_d\left(\frac{B_{N1}}{N^t}, \dots, \frac{B_{NN}}{N^t}\right)}{N}

exists for each fixed d \in \mathbb{N}, where p_d(z_1,\dots,z_N)=z_1^d + \dots + z_N^d is the power sum polynomial of degree d in N variables.
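For a concrete instance of convergence in moments, take the hypothetical data B_{Nj}=j with scaling exponent t=1: the scaled data points fill out [0,1], so the empirical distribution converges to the uniform measure on [0,1], whose d-th moment is 1/(d+1). A short numerical check:

```python
import numpy as np

def gamma_d(N, d, t=1.0):
    """Empirical d-th moment p_d(B_{N1}/N^t, ..., B_{NN}/N^t) / N
    for the hypothetical data B_{Nj} = j."""
    B = np.arange(1.0, N + 1)
    scaled = B / N ** t
    return np.sum(scaled ** d) / N

# For B_{Nj} = j and t = 1 the limit gamma_d should be 1/(d+1).
for d in range(1, 5):
    print(d, gamma_d(10_000, d), 1 / (d + 1))
```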

Let \langle \cdot \rangle denote expected value, and let

K_N(a_1,\dots,a_N) = \left\langle e^{i(a_1X_{N1}+\dots+a_NX_{NN})}\right\rangle

be the characteristic function of (X_{N1},\dots,X_{NN}). Because the distribution of (X_{N1},\dots,X_{NN}) in \mathbb{R}^N is compactly supported, K_N is an analytic function on \mathbb{R}^N. The Maclaurin expansion of the characteristic function is

K_N(a_1,\dots,a_N) = 1+\sum\limits_{d=1}^\infty \frac{i^d}{d!} K_N^d(a_1,\dots,a_N),

where

K_N^d(a_1,\dots,a_N) = \sum\limits_\alpha a_1^{\alpha_1} \dots a_N^{\alpha_N} \langle X_{N1}^{\alpha_1}\dots X_{NN}^{\alpha_N}\rangle {d \choose \alpha_1,\dots,\alpha_N}

and the sum is over all vectors \alpha=(\alpha_1,\dots,\alpha_N) of nonnegative integers whose coordinates sum to d. Thus, K_N^d(a_1,\dots,a_N) is a homogeneous degree d polynomial in N variables whose coefficients are, up to a transparent factor, the degree d joint moments of the random variables X_{N1},\dots,X_{NN}. Since K_N(0,\dots,0)=1, the logarithm L_N(a_1,\dots,a_N)=\log K_N(a_1,\dots,a_N) of the characteristic function is defined and analytic on an open neighborhood of the origin in \mathbb{R}^N. We write the Maclaurin expansion of L_N as

L_N(a_1,\dots,a_N) = \sum\limits_{d=1}^\infty \frac{i^d}{d!} L_N^d(a_1,\dots,a_N),

where L_N^d(a_1,\dots,a_N) is a homogeneous degree d polynomial in N variables. The polynomials L_N^1,L_N^2,L_N^3,\dots are called the cumulant polynomials of the random variables X_{N1},\dots,X_{NN}. They are determined by the moment polynomials K_N^1,K_N^2,K_N^3,\dots according to a certain recursion explained on pages 2 and 3 of these notes. We write the cumulant polynomials in the form

L_N^d(a_1,\dots,a_N) = \sum\limits_\alpha a_1^{\alpha_1} \dots a_N^{\alpha_N} \langle X_{N1}^{\alpha_1}\dots X_{NN}^{\alpha_N}\rangle_c {d \choose \alpha_1,\dots,\alpha_N},

where the quantities \langle X_{N1}^{\alpha_1}\dots X_{NN}^{\alpha_N}\rangle_c are the degree d joint cumulants of the random variables X_{N1},\dots,X_{NN} as \alpha ranges over nonnegative integer vectors of sum d. Note that this is a definition, not a proposition. In statistical mechanics and quantum field theory, joint cumulants are referred to as connected correlation functions.
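For a single random variable, the moment-cumulant relationship takes the following standard univariate form (a sketch; the multivariate recursion mentioned above specializes to this when N=1). Denoting the moments m_n = \langle X^n \rangle, the recursion is \kappa_n = m_n - \sum_{k=1}^{n-1} \binom{n-1}{k-1} \kappa_k m_{n-k}.

```python
from math import comb

def cumulants_from_moments(m):
    """Univariate moment-to-cumulant recursion:
    kappa_n = m_n - sum_{k=1}^{n-1} C(n-1, k-1) * kappa_k * m_{n-k}.
    Here m[0] holds the first moment m_1, m[1] holds m_2, and so on."""
    kappa = []
    for n in range(1, len(m) + 1):
        k_n = m[n - 1] - sum(
            comb(n - 1, k - 1) * kappa[k - 1] * m[n - k - 1]
            for k in range(1, n)
        )
        kappa.append(k_n)
    return kappa

# Standard Gaussian moments 0, 1, 0, 3 give cumulants 0, 1, 0, 0:
# only the variance survives, the hallmark of Gaussianity.
print(cumulants_from_moments([0, 1, 0, 3]))  # → [0, 1, 0, 0]
```

This vanishing of all cumulants beyond degree 2 is exactly the phenomenon that will reappear in the Schur-Horn limit theorems below.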

The main fact about Schur-Horn arrays that allows us to understand them is a second formula for the cumulant polynomials L_N^d(a_1,\dots,a_N) of the random variables X_{N1},\dots,X_{NN}, this one in terms of the moments of the empirical distribution of the deterministic data (B_{N1},\dots,B_{NN}).

Theorem (Cumulant Formula for Schur-Horn Arrays). For each 1 \leq d \leq N, we have

L_N^d(a_1,\dots,a_N) = N^{2-d}\sum\limits_\alpha \frac{p_\alpha(a_1,\dots,a_N)}{N^{\ell(\alpha)}}\sum\limits_\beta \frac{p_\beta(B_{N1},\dots,B_{NN})}{N^{\ell(\beta)}}L_N(\alpha,\beta),

where the sums are over nonnegative integer vectors \alpha=(\alpha_1,\dots,\alpha_N) and \beta=(\beta_1,\dots,\beta_N) with non-increasing coordinates which sum to d, and \ell(\alpha) and \ell(\beta) are the number of nonzero coordinates of \alpha and \beta respectively. For each such pair \alpha,\beta the polynomials p_\alpha(z_1,\dots,z_N) and p_\beta(b_1,\dots,b_N) are the corresponding products of power sum symmetric polynomials, i.e.

p_\alpha = \prod\limits_{i=1}^{\ell(\alpha)} p_{\alpha_i}\qquad\text{and}\qquad p_\beta = \prod\limits_{i=1}^{\ell(\beta)} p_{\beta_i}.

Finally,

L_N(\alpha,\beta) = (-1)^{\ell(\alpha)+\ell(\beta)} \sum\limits_{g=0}^\infty N^{-2g} \vec{H}_g(\alpha,\beta)

is a convergent series in which the coefficients \vec{H}_g(\alpha,\beta) are positive integers called monotone Hurwitz numbers which admit an explicit combinatorial description.

The cumulant formula for Schur-Horn arrays looks complicated, but it is fairly easy to use in practice. For example, let us set our scaling exponent to t=1/2. Then, for any fixed r \in \mathbb{N}, we have

L_N^d(a_1,\dots,a_r,0,\dots,0) = N^{1-\frac{d}{2}}\sum\limits_\alpha \frac{p_\alpha(a_1,\dots,a_r,0,\dots,0)}{N^{\ell(\alpha)-1}}\sum\limits_\beta \frac{p_\beta(\frac{B_{N1}}{\sqrt{N}},\dots,\frac{B_{NN}}{\sqrt{N}})}{N^{\ell(\beta)}}L_N(\alpha,\beta).

As we saw (twice) in lecture, you can calculate the N \to \infty limit of this polynomial. Assuming \gamma_1=0 and \gamma_2=1, the only nonzero limit occurs for d=2, namely

L_N^2(a_1,\dots,a_r,0,\dots,0) \longrightarrow a_1^2 + \dots + a_r^2.

This means that

\lim\limits_{N \to \infty} \langle X_{N1}^{\alpha_1}\dots X_{Nr}^{\alpha_r}\rangle_c = 0

unless \alpha=(\alpha_1,\dots,\alpha_r) is a nonnegative integer vector with total sum 2 and only one nonzero coordinate. From this we conclude that for any fixed r \in \mathbb{N}, the random variables X_{N1},\dots,X_{Nr} converge in joint distribution to r iid standard Gaussians.
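A quick Monte Carlo sanity check of this Gaussian limit, using a hypothetical data set with \gamma_1=0 and \gamma_2=1 at t=1/2: half the B_{Nj} equal +\sqrt{N} and half equal -\sqrt{N}, so that B_{Nj}/\sqrt{N} = \pm 1. Since X_{N1} depends only on the first row of U_N, and that row is a uniform point on the unit sphere in \mathbb{C}^N, we can sample it directly.

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 150, 3000

# Hypothetical data: half +sqrt(N), half -sqrt(N), so gamma_1 = 0, gamma_2 = 1.
B = np.sqrt(N) * np.concatenate([np.ones(N // 2), -np.ones(N // 2)])

def haar_row(rng, N):
    # The first row of a Haar unitary is uniform on the unit sphere in C^N;
    # normalizing a standard complex Gaussian vector produces exactly this.
    z = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    return z / np.linalg.norm(z)

# X_{N1} = sum_j |U_{1j}|^2 B_{Nj}, sampled repeatedly.
samples = np.array([np.sum(np.abs(haar_row(rng, N)) ** 2 * B)
                    for _ in range(trials)])

print(samples.mean(), samples.var())  # should be near 0 and 1
```

The sample mean and variance come out close to 0 and 1, consistent with convergence of X_{N1} to a standard Gaussian.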

When our scaling exponent is t=1, we do not get such a simple Gaussian limit. Rather, we find that

\lim\limits_{N \rightarrow \infty} \frac{1}{N}L_N^d(a_1,\dots,a_r,0,\dots,0) = (a_1^d + \dots + a_r^d) \kappa_d

for each fixed r \in \mathbb{N}, where

\kappa_d = \sum\limits_{\beta \vdash d} \gamma_\beta (-1)^{d+\ell(\beta)} \vec{H}_0(d,\beta),

with \gamma_\beta = \gamma_{\beta_1} \gamma_{\beta_2} \gamma_{\beta_3} \cdots. Thus we still have a sum of pure powers, indicating asymptotic independence of the random variables X_{N1},\dots,X_{Nr}, but now the degree d cumulants of these random variables are asymptotically N \kappa_d. We are going to see that

\kappa_1,\kappa_2,\kappa_3,\dots

is the free cumulant sequence corresponding to the moment sequence

\gamma_1,\gamma_2,\gamma_3,\dots.
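To make "free cumulant sequence" concrete, here is a sketch of the standard free moment-cumulant recursion (summing over non-crossing partitions via first blocks): m_n = \sum_{s=1}^{n} \kappa_s \sum_{i_1+\dots+i_s=n-s} m_{i_1}\cdots m_{i_s} with m_0=1, solved for \kappa_n. The test case is the semicircle distribution, whose even moments are the Catalan numbers and whose only nonzero free cumulant is \kappa_2=1, the free analogue of the Gaussian computation above.

```python
from itertools import product

def free_cumulants(m):
    """Free cumulants from moments via the recursion
    m_n = sum_{s=1}^n kappa_s * sum_{i_1+...+i_s = n-s} m_{i_1}...m_{i_s},
    with i_j >= 0 and m[0] = 1. Returns [kappa_1, ..., kappa_n]."""
    n_max = len(m) - 1
    kappa = [0.0]  # index 0 is an unused placeholder
    for n in range(1, n_max + 1):
        total = 0.0
        for s in range(1, n):
            # Enumerate compositions (i_1, ..., i_s) of n - s into s parts >= 0.
            for comp in product(range(n - s + 1), repeat=s):
                if sum(comp) == n - s:
                    term = kappa[s]
                    for i in comp:
                        term *= m[i]
                    total += term
        kappa.append(m[n] - total)
    return kappa[1:]

# Semicircle moments 1, 0, 1, 0, 2, 0, 5 (Catalan numbers interlaced with 0)
# yield free cumulants 0, 1, 0, 0, 0, 0.
print(free_cumulants([1, 0, 1, 0, 2, 0, 5]))
```

The brute-force enumeration of compositions is exponential in the degree, which is fine for checking a few cumulants but not for serious computation.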
