Math 289A: Lecture 12

This week we will finish working through the Olshanski-Vershik limit theorem, and explain how the tools we have developed along the way have set us up to prove other kinds of limit theorems about random matrices.

Recall the setting of the OV paper: we have an infinite triangular array of real numbers,

\begin{matrix} b_{11} & & \\ b_{21} & b_{22} & \\ b_{31} & b_{32} & b_{33} \\ \vdots & \vdots & \vdots \end{matrix}.

Assume the entries are sorted so that each row is non-increasing from left to right and let

B_N=(b_{N1},\dots,b_{NN})

be the Nth row. Thinking of B_N \in \mathbb{W}_N as a diagonal matrix, define a corresponding random Hermitian matrix by

X_N=U_NB_NU_N^*,

where U_N is a Haar-distributed random unitary matrix.

We know that in the characteristic function of X_N,

\varphi_{X_N}(A) = \int\limits_{\mathbb{U}_N} e^{i \mathrm{Tr}\, AUB_NU^*}\mathrm{d}U,\quad A \in \mathbb{H}_N,

we can assume without loss of generality that A is diagonal, so what we are really looking at is the sequence of characteristic functions

K_N \colon \mathbb{R}^N \longrightarrow \mathbb{C}

corresponding to our data set b_{ij} given as

K_N(a_1,\dots,a_N) = \int\limits_{\mathbb{U}_N} e^{i \sum_{x,y=1}^N a_xb_{Ny}|U_{xy}|^2}\mathrm{d}U,

or equivalently

K_N(a_1,\dots,a_N) = \mathbb{E}\left[e^{i(a_1X_N(1,1)+\dots+a_NX_N(N,N))}\right], \quad (a_1,\dots,a_N) \in \mathbb{R}^N,

where the diagonal matrix elements X_N(1,1),\dots,X_N(N,N) are exchangeable (but not independent) real random variables. Part I of the Olshanski-Vershik theorem is concerned with finding necessary and sufficient conditions on the triangular array b_{ij} such that the characteristic function

K_N(a)=K_N(a,0,\dots,0), \quad a \in \mathbb{R},

of a single diagonal matrix element X_N(1,1) converges as N\to \infty. Note that extracting the (1,1)-entry on either side of X_N=U_NB_NU_N^* we have

X_N(1,1) = \sum\limits_{j=1}^N |U_{1j}|^2 b_{Nj},

showing explicitly how X_N(1,1) is built from the uniformly random unit vector (U_{11},\dots,U_{1N}) in \mathbb{C}^N and the data B_N=(b_{N1},\dots,b_{NN}).
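Since the first row of a Haar-distributed unitary matrix is a uniformly random unit vector in \mathbb{C}^N, the distribution of X_N(1,1) is easy to simulate. Here is a minimal numerical sketch (the function name and parameters are illustrative; it uses the standard fact that normalizing a complex Gaussian vector produces a uniformly random unit vector):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_X11(b, rng, n_samples=100_000):
    """Sample X_N(1,1) = sum_j |U_{1j}|^2 b_{Nj}, with (U_{11},...,U_{1N})
    a uniformly random unit vector in C^N (the first row of a Haar unitary)."""
    N = len(b)
    z = rng.standard_normal((n_samples, N)) + 1j * rng.standard_normal((n_samples, N))
    w = np.abs(z) ** 2 / np.sum(np.abs(z) ** 2, axis=1, keepdims=True)  # |U_{1j}|^2
    return w @ b

b = np.linspace(1.0, -1.0, 50)   # an arbitrary non-increasing row B_N
x = sample_X11(b, rng)
print(x.mean(), x.var())         # empirical moments of X_N(1,1)
```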

The characteristic function K_N(a) of X_N(1,1) has the power series expansion

K_N(a) = 1 + \sum\limits_{d=1}^\infty \frac{(ia)^d}{d!}K_N^d,

where K_N^d=\mathbf{E}[X_N^d(1,1)] are the moments of this real random variable. Olshanski and Vershik begin by finding sufficient conditions on the data b_{ij} such that, for each fixed d \in \mathbb{N}, the sequence of numbers (K_N^d)_{N=1}^\infty converges to a limit K^d as N \to \infty. In other words, they begin by finding conditions on the data b_{ij} such that the random variable X_N(1,1) converges in moments. We will do the same thing, but in a different way, using the following consequence of Theorem 10.2 in Lecture 10.

Theorem 12.1 (Leading Moments Formula). For each 1 \leq d \leq N, we have

K_N^d = N^{-d}\sum\limits_{\alpha,\beta \vdash d} p_\beta(b_{N1},\dots,b_{NN})\sum\limits_{r=0}^\infty \frac{(-1)^r}{N^r}\vec{W}^r(\alpha,\beta),

where the inner sum is absolutely convergent and \vec{W}^r(\alpha,\beta) is the number of r-step monotone walks on the symmetric group of degree d which begin at a permutation of cycle type \alpha and end at a permutation of cycle type \beta.

The most efficient way to use this result is to take a logarithm, i.e. to work with the cumulants of X_N(1,1). This is also the path followed by Olshanski-Vershik, but the method they use to do the calculation is different.

Let

L_N(a) = \log K_N(a) = \sum\limits_{d=1}^\infty \frac{(ia)^d}{d!}L_N^d

be the log-characteristic function of X_N(1,1), i.e. the cumulant generating function. The cumulant sequence L_N^1,L_N^2,L_N^3,\dots contains the same information as the moment sequence K_N^1,K_N^2,K_N^3,\dots, via the moment-cumulant relations

K_N^1 =L_N^1

K_N^2 = L_N^2 + L_N^1L_N^1

K_N^3 =L_N^3 + 3L_N^2L_N^1+L_N^1L_N^1L_N^1

\vdots

The cumulants L_N^d encode the “connected” version of the combinatorial problem encoded by the moments K_N^d.
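The relations displayed above can be generated mechanically from the standard univariate recursion K_N^n = \sum_{m=1}^n \binom{n-1}{m-1} L_N^m K_N^{n-m}. A small symbolic sketch of this (using sympy; names illustrative):

```python
import sympy as sp
from math import comb

L = sp.symbols('L1:6')  # cumulants L1, ..., L5

def moments_from_cumulants(n_max):
    """K_n = sum_{m=1}^{n} C(n-1, m-1) * L_m * K_{n-m}, with K_0 = 1."""
    K = [sp.Integer(1)]
    for n in range(1, n_max + 1):
        K.append(sp.expand(sum(comb(n - 1, m - 1) * L[m - 1] * K[n - m]
                               for m in range(1, n + 1))))
    return K

K = moments_from_cumulants(3)
print(K[1])  # L1
print(K[2])  # L1**2 + L2
print(K[3])  # L1**3 + 3*L1*L2 + L3
```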

Theorem 12.2 (Leading Cumulants Formula). For each 1 \leq d \leq N, we have

L_N^d =N^{-d} \sum\limits_{\alpha,\beta \vdash d} p_\beta(b_{N1},\dots,b_{NN})\sum\limits_{r=0}^\infty \frac{(-1)^r}{N^r}\vec{H}^r(\alpha,\beta),

where \vec{H}^r(\alpha,\beta) is the number of walks counted by \vec{W}^r(\alpha,\beta) which have the additional feature that their steps and endpoints generate a transitive subgroup of the symmetric group.

The Leading Cumulants Formula is identical to the Leading Moments Formula except that the walk count \vec{W}^r(\alpha,\beta) has been replaced with the connected walk count \vec{H}^r(\alpha,\beta). This seemingly minor change is in fact very advantageous.

Theorem 12.3 (Riemann-Hurwitz Formula). The number \vec{H}^r(\alpha,\beta) is nonzero if and only if there is g \in \mathbb{N}_0 such that

r=2g-2+\ell(\alpha)+\ell(\beta).

Due to the Riemann-Hurwitz formula it is more convenient to parameterize the numbers \vec{H}^r(\alpha,\beta) by the parameter g, i.e. to set

\vec{H}_g(\alpha,\beta):= \vec{H}^{2g-2+\ell(\alpha)+\ell(\beta)}(\alpha,\beta).

Theorem 12.4 (Optimized Leading Cumulants Formula). For each 1 \leq d \leq N, we have

L_N^d =N^{2-d} \sum\limits_{\alpha,\beta \vdash d} \frac{p_\beta(b_{N1},\dots,b_{NN})}{N^{\ell(\alpha)+\ell(\beta)}}(-1)^{\ell(\alpha)+\ell(\beta)}\sum\limits_{g=0}^\infty N^{-2g}\vec{H}_g(\alpha,\beta).
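To see where this comes from, substitute r=2g-2+\ell(\alpha)+\ell(\beta) into the Leading Cumulants Formula: since (-1)^{2g-2}=1,

\frac{(-1)^r}{N^r} = (-1)^{\ell(\alpha)+\ell(\beta)}\frac{N^2}{N^{\ell(\alpha)+\ell(\beta)}N^{2g}},

and pulling the factor N^2 out of the double sum converts the prefactor N^{-d} into N^{2-d}.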

Now we are in position to prove some interesting limit theorems. We begin by proving one such theorem corresponding to an explicit choice of the dataset b_{ij} (this corresponds to Remark 4.5 in Olshanski-Vershik).

Let \gamma >0 be a positive constant, and suppose our dataset b_{ij} has Nth row given by

B_N=(\sqrt{\gamma N},\dots,\sqrt{\gamma N},-\sqrt{\gamma N},\dots,-\sqrt{\gamma N}),

where the number of positive and negative entries is as near to equal as possible – equal if N even and differing by 1 if N odd, where we assume the data favors positive numbers in the latter case. In this situation we can establish the following N \to \infty limit theorem for a single diagonal matrix element of X_N=U_NB_NU_N^*.

Theorem 12.5 (Gaussian Limit Theorem). Any diagonal matrix element of X_N=U_NB_NU_N^* converges in distribution to a centered Gaussian random variable Y of variance \gamma.

Proof: Let \beta \vdash d be a partition of d \in \mathbb{N}. Then, we have

p_\beta(B_N) = (\gamma N)^{\frac{d}{2}}p_\beta(1,\dots,1,-1,\dots,-1),

and since |p_k(1,\dots,1,-1,\dots,-1)| \leq N for every k \in \mathbb{N}, this gives

p_\beta(B_N) = O(N^{\frac{d}{2}+\ell(\beta)}).

Thus, from the Optimized Leading Cumulants Formula, using \ell(\alpha) \geq 1, we have

L_N^d=O(N^{2-d} \cdot N^{\frac{d}{2}} \cdot N^{-1})=O(N^{1-\frac{d}{2}}),

showing that L_N^d converges to zero for all d>2. This means that X_N(1,1) converges in cumulants (equivalently, in moments) to a Gaussian random variable Y, whose mean and variance remain to be determined.

The first cumulant of X_N(1,1) is

L_N^1 =N^{2-1} \frac{p_1(B_N)}{N^2}=\frac{p_1(B_N)}{N},

and since p_1(B_N) is 0 for N even and \sqrt{\gamma N} for N odd, this converges to 0 as N \to \infty. For the second cumulant,

L_N^2 =\frac{p_2(B_N)}{N^2} + O(\frac{1}{N}),

and

\frac{p_2(B_N)}{N^2} = \frac{N(\gamma N)}{N^2}=\gamma.

Thus, we have shown that X_N(1,1) converges in moments/cumulants to Y \sim \mathcal{N}(0,\gamma). In fact, this implies X_N(1,1) \to Y in distribution as N \to \infty, thanks to the Method of Moments.

-QED
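As a sanity check, Theorem 12.5 can be tested numerically with the unit-vector sampler sketched earlier in this lecture (an illustrative experiment, not part of the proof):

```python
# Monte Carlo check of Theorem 12.5, reusing sample_X11 and rng from above.
gamma, N = 2.0, 400
half = N // 2
B = np.concatenate([np.full(N - half, np.sqrt(gamma * N)),
                    np.full(half, -np.sqrt(gamma * N))])
x = sample_X11(B, rng)
print(x.mean(), x.var())   # expect roughly 0 and gamma
```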

Math 289A: Lecture 10

We start with a brief recap of what we are trying to achieve – since things are getting complicated it might be helpful to step back for a moment before stepping further in.

Let \mathbb{H}_N be the Euclidean space of N \times N Hermitian matrices, and let \mathbb{U}_N be the compact group of N \times N unitary matrices. We have a group action

\Phi \colon \mathbb{U}_N \longrightarrow \mathrm{Aut}(\mathbb{H}_N),

where \mathrm{Aut}(\mathbb{H}_N) is the orthogonal group of the Euclidean space \mathbb{H}_N, defined by

\Phi(U)X = UXU^*, \quad X \in \mathbb{H}_N,\ U \in \mathbb{U}_N.

This group action is called unitary conjugation. Let

\mathcal{O}_X = \{UXU^* \colon U \in \mathbb{U}_N\}

denote the orbit of a given Hermitian matrix X \in \mathbb{H}_N under unitary conjugation.

Theorem 10.1 (Spectral Theorem). For every X \in \mathbb{H}_N, the orbit \mathcal{O}_X contains a diagonal matrix B which is unique up to permutations of its diagonal elements.

Let

\mathbb{W}_N = \{(b_1,\dots,b_N) \in \mathbb{R}^N \colon b_1 \geq \dots \geq b_N\}.

We will refer to this convex cone in \mathbb{R}^N as the Weyl chamber. This term comes from Lie theory – we are not doing Lie theory, we just need a name for \mathbb{W}_N. By the Spectral Theorem, the orbits of \mathbb{H}_N under unitary conjugation are indexed by points of the Weyl chamber: we set

\mathcal{O}(B) = \mathcal{O}_B, \quad B \in \mathbb{W}_N,

where on the right hand side we are viewing B as a diagonal matrix.

A standard problem in linear algebra is the following: we are given a matrix X \in \mathbb{H}_N, meaning that we are explicitly given all entries of this matrix, and from this information we are asked to find B \in \mathbb{W}_N such that X \in \mathcal{O}(B). The Spectral Theorem guarantees that this problem has a unique solution. The coordinates of the solution B \in \mathbb{R}^N are the eigenvalues of the given matrix X.

We are interested in solving a generic version of the inverse problem: given B \in \mathbb{W}_N, determine the matrix elements of a uniformly random point X \in \mathcal{O}(B). Since B is given, we are being asked about the random Hermitian matrix X_N=U_NBU_N^*, where U_N is a uniformly random unitary matrix from \mathbb{U}_N. This is something we know how to define precisely (Haar measure).
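A minimal sketch of this sampling procedure, assuming the standard QR-based recipe for Haar measure (take the QR decomposition of a complex Ginibre matrix and absorb the phases of the diagonal of R into Q; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(N, rng):
    """Haar-distributed U in U_N via QR of a complex Ginibre matrix,
    correcting by the phases of R's diagonal."""
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

B = np.diag(np.arange(5.0, 0.0, -1.0))  # a point of the Weyl chamber W_5
U = haar_unitary(5, rng)
X = U @ B @ U.conj().T                  # uniformly random point of the orbit O(B)
print(np.linalg.eigvalsh(X))            # recovers the eigenvalues 1, 2, 3, 4, 5
```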

We want to solve this inverse problem by computing the characteristic function of X_N, which is

\varphi_{X_N}(T) = \mathbf{E}\left[e^{i\langle T,U_NBU_N^*\rangle}\right]=\int\limits_{\mathbb{U}_N}e^{i \mathrm{Tr}TUBU^*} \mathrm{d}U.

We know that it is in fact sufficient to do this for T=A being a diagonal Hermitian matrix, and we have introduced the notation

K_N(A,B) = \int\limits_{\mathbb{U}_N} e^{i \mathrm{Tr} AUBU^*} \mathrm{d}U

in order to emphasize the fact that we can view this characteristic function as defining a bounded symmetric kernel

K_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}.

Furthermore, K_N is actually the characteristic function of the diagonal D_N=(X_N(1,1),\dots,X_N(N,N)) of X_N, so a solution to our problem is the same thing as determining the distribution of the diagonal of a uniformly random Hermitian matrix with prescribed eigenvalues.

We can view K_N as a generating function for the joint moments of the diagonal matrix elements X_N(1,1),\dots,X_N(N,N), meaning that

K_N(A,B) = 1 + \sum\limits_{d=1}^\infty \frac{i^d}{d!} \sum\limits_{\gamma \in \mathsf{C}_N^d} a_1^{\gamma_1} \dots a_N^{\gamma_N} \mathbf{E}\left[ X_N^{\gamma_1}(1,1) \dots X_N^{\gamma_N}(N,N)\right] K_N(\gamma),

where the internal sum runs over the set \mathsf{C}_N^d of weak N-part compositions of the number d, which are simply vectors \gamma=(\gamma_1,\dots,\gamma_N) of nonnegative integers whose coordinates sum to d. The coefficient K_N(\gamma) corresponding to a given \gamma \in \mathsf{C}_N^d is a very simple quantity, namely the multinomial coefficient

K_N(\gamma) = \frac{d!}{\gamma_1! \dots \gamma_N!}.

But, it is Haard (sorry, couldn’t resist) to see how to directly compute all mixed moments of the random variables X_N(1,1),\dots,X_N(N,N). Indeed, we actually have a formula for each of these random variables, namely

X_N(k,k) = \sum\limits_{j=1}^N |U_{kj}|^2 b_j,

which is obtained from X_N=U_NBU_N^* simply by matrix multiplication. This formula more or less shows that a direct computation of, say, the moment sequence of just X_N(1,1) involves computing a family of Haar integrals over \mathbb{U}_N which we don’t know how to contend with.

But K_N is a moment generating function in another sense: we have

K_N(A,B) = 1 + \sum\limits_{d=1}^\infty \frac{i^d}{d!} \sum\limits_{\alpha,\beta \in \mathsf{Y}_N^d} p_\alpha(a_1,\dots,a_N)p_\beta(b_1,\dots,b_N) K_N(\alpha,\beta),

where the internal (double) sum is over the set \mathsf{Y}_N^d of Young diagrams with first row of length at most N; such diagrams are pictorial representations of partitions of d with largest part at most N. We know that such an expansion exists by a “soft” symmetry argument combined with a classical theorem of Newton: we use that K_N(A,B) is invariant under independent permutations of the coordinates of the vectors A=(a_1,\dots,a_N) and B=(b_1,\dots,b_N). The moments involved in this expansion are the moments of the (non-normalized) empirical eigenvalue distributions of these vectors, i.e. the discrete measures \eta_A and \eta_B on \mathbb{R} which place unit mass at each coordinate. The moments of these measures are exactly sums of powers of the coordinates of A and B.

This leaves the problem of computing the coefficients K_N(\alpha,\beta) in this nonstandard moment generating function. This can be done using representation theory – in fact one needs to know quite a bit about representations of the unitary and symmetric groups, as well as about the relationship between them. We will mostly black box this to avoid going too deeply into algebra (though an indication of what needs to be done was given in the previous lecture, where we discussed Schur polynomials). Instead, we will focus on two facts: first, the coefficients K_N(\alpha,\beta) have an intrinsically interesting combinatorial interpretation, which is something that often happens with moment generating functions of interesting random variables (we discussed the example of Gaussians in lecture); second, this combinatorial interpretation will allow us to prove probabilistically meaningful limit theorems. The simplest such theorem is the result of Olshanski and Vershik on the N \to \infty limiting joint distribution of X_N(1,1),\dots,X_N(k,k) where k \in \mathbb{N} is arbitrary but fixed (you should be actively reading the Olshanski-Vershik paper while we are doing this).

Theorem 10.2. For every integer 1 \leq d \leq N, and every pair of Young diagrams \alpha,\beta \in \mathsf{Y}_N^d, we have

K_N(\alpha,\beta) = \sum\limits_{r=0}^\infty \frac{(-1)^r}{N^r}\vec{W}^r(\alpha,\beta),

where the series is absolutely convergent and \vec{W}^r(\alpha,\beta) is the number of monotone r-step walks on the Cayley graph of the symmetric group \mathfrak{S}^d which begin at a permutation of cycle type \alpha and end at a permutation of cycle type \beta.

Clearly there is much to be unpacked here, but ultimately it is not so complicated, and moreover only a small amount of the information contained in this statement is needed to prove limit theorems about random matrices. A very basic observation is that in the range 1 \leq d \leq N, the set \mathsf{Y}_N^d is simply the set \mathbb{Y}^d of all Young diagrams with exactly d cells.

Now we have to explain what we mean by the Cayley graph of the symmetric group, and what we mean by a monotone walk on this graph. For this I am linking to a paper which hopefully provides a reasonably clear account of this combinatorial construction (all you need to read is Sections 2.1 and 2.2, just a couple of pages). After you have read this you will know what the Cayley graph of the symmetric group is, and what is meant by a monotone or strictly monotone walk on this graph. Furthermore, you will know the remarkable fact that there is a unique strictly monotone walk between any two permutations, and this walk is a geodesic.
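For experimentation, here is a brute-force counting sketch, under assumptions drawn from the monotone Hurwitz number literature: a step is a transposition (a\ b) with a<b, a walk is monotone when the sequence of larger elements b is weakly increasing, steps act by left multiplication, and the walk starts at one fixed permutation of cycle type \alpha. This is a sketch under stated assumptions, not necessarily the exact normalization of \vec{W}^r(\alpha,\beta) in Theorem 10.2; check the conventions against the linked paper. All function names are illustrative.

```python
def compose(p, q):
    """(p o q)(i) = p[q[i]]; permutations of {0,...,d-1} as tuples."""
    return tuple(p[q[i]] for i in range(len(p)))

def cycle_type(p):
    seen, ct = set(), []
    for i in range(len(p)):
        if i not in seen:
            j, c = i, 0
            while j not in seen:
                seen.add(j); j = p[j]; c += 1
            ct.append(c)
    return tuple(sorted(ct, reverse=True))

def transposition(d, a, b):
    t = list(range(d)); t[a], t[b] = t[b], t[a]
    return tuple(t)

def perm_of_type(d, alpha):
    """One fixed permutation of cycle type alpha, built from consecutive cycles."""
    p, i = list(range(d)), 0
    for part in alpha:
        for k in range(part):
            p[i + k] = i + (k + 1) % part
        i += part
    return tuple(p)

def monotone_walks(d, alpha, beta, r):
    """Count r-step monotone walks from a fixed permutation of type alpha
    to any permutation of type beta."""
    beta = tuple(sorted(beta, reverse=True))
    def count(p, last_b, steps):
        if steps == 0:
            return 1 if cycle_type(p) == beta else 0
        return sum(count(compose(transposition(d, a, b), p), b, steps - 1)
                   for b in range(max(last_b, 1), d) for a in range(b))
    return count(perm_of_type(d, alpha), 0, r)

print(monotone_walks(2, (1, 1), (2,), 1))     # 1: the single step (0 1)
print(monotone_walks(3, (1, 1, 1), (3,), 2))  # 4 with these conventions
```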

As explained in lecture, it is often the case that the logarithm of the characteristic function is easier to work with than the characteristic function itself – it is a generating function for cumulants rather than moments. Let us consider the logarithm of the characteristic function of X_N = U_NBU_N^*,

L_N(A,B) = \log K_N(A,B) = \log\int\limits_{\mathbb{U}_N} e^{i\mathrm{Tr}\, AUBU^*} \mathrm{d}U.

Math 289A: Lecture 9

This week we are looking at a landmark paper by Grigori Olshanski and Anatoly Vershik, “Ergodic Unitarily Invariant Measures on the Space of Infinite Hermitian Matrices.” This paper is among the first to consider characteristic functions in the context of random matrix theory. Namely, it considers the problem of approximating the function

K_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}

defined by

K_N(A,B) = \int\limits_{\mathbb{U}_N} e^{i\mathrm{Tr}AUBU^*}\mathrm{d}U,

where the integration is over the unitary group of rank N against Haar measure. We have referred to this function as the “quantum Fourier kernel,” but as we have stressed it is also the classical characteristic function of the diagonal (X_N(1,1),\dots,X_N(N,N)) of X_N=U_NB_NU_N^*, a uniformly random Hermitian matrix with deterministic eigenvalues B \in \mathbb{R}^N. The paper of Olshanski-Vershik is about approximating K_N(A,B) as N \to \infty, which probabilistically means that we are trying to understand how the joint distribution of the diagonal matrix elements of a very large uniformly random Hermitian matrix with prescribed eigenvalues B is determined by the vector B. This is an example of a high-dimensional problem: our goal is to approximate a function

K_N(A,B) = K_N(a_1,\dots,a_N;b_1,\dots,b_N)=\int\limits_{\mathbb{U}_N} e^{i\sum_{x,y=1}^N a_xb_y|U_{xy}|^2}\mathrm{d}U

of 2N real variables as the number of variables goes to infinity. Typically such problems are much harder than traditional problems of asymptotic analysis, which consider the approximation of some parameter-dependent family of functions defined on a fixed domain as the parameter approaches a given value.

One way to simplify the question is to consider the restriction of K_N to the subspace \mathbb{R}^r \times \mathbb{R}^N with r \in \mathbb{N} fixed. That is, we consider the characteristic function

K_N(a_1,\dots,a_r,0,\dots,0;b_1,\dots,b_N) = \mathbb{E}\left[e^{i(a_1X_N(1,1) + \dots +a_r X_N(r,r))}\right]

of only the first r diagonal matrix elements of the random Hermitian matrix X_N=U_NBU_N^*. Recall that we have previously shown that all diagonal matrix elements of X_N are identically distributed, but not necessarily independent. The paper of Olshanski and Vershik shows that if the eigenvalues B=(b_1,\dots,b_N), scaled by a factor of N^{-1}, converge to a point configuration on \mathbb{R} as N \to \infty, then the random variables X_N(1,1),\dots,X_N(r,r) converge to independent random variables whose characteristic function can be calculated explicitly in terms of this point configuration. We will try to work through this result, which I believe to be the first high-dimensional limit theorem in RMT obtained via characteristic functions.

The Olshanski-Vershik approach of computing the asymptotics of the characteristic function

K_N(A,B) = \int\limits_{\mathbb{U}_N}e^{i\mathrm{Tr}AUBU^*}\mathrm{d}U

is based on series expansion of this matrix integral. I am a big proponent of this approach, but it does require some quite extensive background from representation theory and algebraic combinatorics which is not traditionally found in the realm of probability. We will try not to get too bogged down in the algebra, and in order to do this it will be necessary to accept some results from representation theory as black boxes.

First, let us analytically continue the characteristic function K_N \colon \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{C} to a function

K_N \colon \mathbb{C} \times \mathbb{C}^N \times \mathbb{C}^N \longrightarrow \mathbb{C}

by declaring

K_N(z,A,B) = \int\limits_{\mathbb{U}_N} e^{z \mathrm{Tr} AUBU^*} \mathrm{d}U,

where z \in \mathbb{C} is a complex number and A,B \in \mathbb{C}^N are complex vectors. Thus, by compactness of \mathbb{U}_N, we have that K_N is an entire function of 1+2N complex variables z,a_1,\dots,a_N,b_1,\dots,b_N, and the Maclaurin series of this function, which is a multivariate Taylor series centered at the origin, converges absolutely and uniformly on compact subsets of \mathbb{C}^{1+2N}. We recover our original characteristic function K_N by setting z=i and restricting A,B to \mathbb{R}^N \subset \mathbb{C}^N.

The analytic function K_N has various symmetries that allow us to present its Maclaurin series in a more advantageous form. More precisely, K_N(z,A,B)=K_N(z,a_1,\dots,a_N,b_1,\dots,b_N) is invariant under independent permutations of a_1,\dots,a_N and b_1,\dots,b_N, and also stable under swapping these two sets of variables. This means that the Maclaurin series of K_N can be presented in the form

K_N(z,A,B) = 1 + \sum_{d=1}^\infty \frac{z^d}{d!} K_N^d(A,B),

where K_N^d(A,B) = K_N^d(a_1,\dots,a_N,b_1,\dots,b_N) is a homogeneous degree d symmetric polynomial in a_1,\dots,a_N and also a homogeneous degree d symmetric polynomial in b_1,\dots,b_N. Therefore, by Newton’s theorem on symmetric polynomials, we can write this coefficient as

K_N^d(A,B) = \sum\limits_{\alpha,\beta \in \mathsf{Y}^d} p_\alpha(a_1,\dots,a_N)p_\beta(b_1,\dots,b_N)K_N(\alpha,\beta),

where \mathsf{Y}^d is the set of integer partitions of the degree d, which we identify with the set of Young diagrams with d cells, and p_\alpha(a_1,\dots,a_N) and p_\beta(b_1,\dots,b_N) are the Newton power-sum symmetric polynomials corresponding to \alpha,\beta \in \mathsf{Y}^d. For example, if d=3 and \alpha=(1,1,1), \beta=(2,1), then

p_\alpha(a_1,\dots,a_N)p_\beta(b_1,\dots,b_N)= (a_1+\dots+a_N)^3(b_1^2+\dots+b_N^2)(b_1+\dots+b_N).

Our objective then is to calculate the numerical coefficients K_N(\alpha,\beta) of the polynomial K_N^d(A,B) which we will refer to as its Newton coefficients.

Problem 9.1. Show that K_N^d(A,B)=K_N^d(B,A), and use this to show that K_N(\alpha,\beta)=K_N(\beta,\alpha).

What is good about the Newton polynomials is that they connect us to point configurations, which is what we want. More precisely, if B=(b_1,\dots,b_N) \in \mathbb{R}^N is a real vector, then we can view it not as a point in N-dimensional space but as a configuration of particles b_1,\dots,b_N on the real line. The empirical distribution \eta_B of B is the discrete measure on \mathbb{R} which places unit mass at each particle, and the degree d moment of this measure is

\int\limits_{\mathbb{R}} x^d \eta_B(\mathrm{d}x) = b_1^d + \dots +b_N^d=p_d(b_1,\dots,b_N).

This means that writing the Maclaurin expansion of K_N(z,A,B) in terms of Newton symmetric polynomials is exactly what we want to do, because it is expressing this function in terms of the empirical distributions of its arguments A,B, and the empirical distribution of a point configuration is a natural way to describe the “shape” of the configuration in the infinite particle limit where N \to \infty.

Unfortunately, there is no obvious way to calculate the Newton coefficients K_N(\alpha,\beta). What we can do is compute the Maclaurin series of K_N(z,A,B) = K_N(z,a_1,\dots,a_N,b_1,\dots,b_N) by repeatedly differentiating under the integral sign with respect to z. Equivalently, we just expand the exponential function of z in the integrand and then integrate term-by-term, which gives

K_N(z,A,B) = 1 + \sum\limits_{d=1}^\infty \frac{z^d}{d!} \int\limits_{\mathbb{U}_N} \left(\mathrm{Tr} AUBU^* \right)^d \mathrm{d}U.

This gives us the formula

K_N^d(A,B) = \int\limits_{\mathbb{U}_N} \left(\mathrm{Tr} AUBU^* \right)^d \mathrm{d}U,

which at least reduces our initial problem of computing an exponential integral over \mathbb{U}_N to computing a polynomial integral over \mathbb{U}_N,

K_N^d(A,B) = \int\limits_{\mathbb{U}_N} \left(\sum\limits_{x,y=1}^N a_xb_y U_{xy} \bar{U}_{xy} \right)^d \mathrm{d}U,

but still there is no obvious path forward.

At this point we should ask ourselves what integrals over \mathbb{U}_N we actually can compute. To make the discussion as simple as possible, take N=1, so \mathbb{U}_1 is just the unit circle in \mathbb{C}. One thing we can confidently state is the orthogonality relation

\int\limits_{\mathbb{U}_1} U^k \bar{U}^l \mathrm{d}U = \int\limits_0^{2\pi} e^{i(k-l)\theta} \frac{\mathrm{d}\theta}{2\pi} = \delta_{kl}.

Representation theory gives us analogous formulas for all N, namely a system of multivariate polynomials which are orthogonal polynomials for integration with respect to Haar measure on \mathbb{U}_N. Let \mathsf{Y}_N be the set of all Young diagrams with at most N rows.

Definition 9.1. For each \lambda \in \mathsf{Y}_N, the corresponding Schur symmetric polynomial is defined by

s_\lambda(t_1,\dots,t_N) = \frac{\det [t_j^{N-i+\lambda_i}]}{\det[t_j^{N-i}]},

where \lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_N \geq 0 are the row lengths of \lambda.

Out of the many equivalent definitions of Schur polynomials, we are taking this one because it is a fairly explicit and elementary formula. Indeed, the determinant in the denominator is the Vandermonde determinant, which has the property that it divides any antisymmetric polynomial in t_1,\dots,t_N to yield a symmetric polynomial.

Problem 9.2. Let \lambda=\Box be the single-cell Young diagram. Compute the Schur polynomial s_\Box(t_1,\dots,t_N) directly from Definition 9.1. Do the same for a few more small Young diagrams.
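After working Problem 9.2 by hand, one can check answers against Definition 9.1 symbolically; a minimal sympy sketch (function name illustrative):

```python
import sympy as sp

def schur(lam, N):
    """s_lambda(t_1,...,t_N) via the bialternant formula of Definition 9.1."""
    t = sp.symbols(f't1:{N + 1}')
    lam = list(lam) + [0] * (N - len(lam))  # pad with empty rows
    num = sp.Matrix(N, N, lambda i, j: t[j] ** (N - 1 - i + lam[i]))
    den = sp.Matrix(N, N, lambda i, j: t[j] ** (N - 1 - i))
    return sp.factor(sp.cancel(num.det() / den.det()))

print(schur((1,), 3))    # t1 + t2 + t3
print(schur((2, 1), 2))  # t1*t2*(t1 + t2)
```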

The disadvantage of Definition 9.1 is that it makes the orthogonality relations for Schur polynomials, which are really consequences of the fact that these polynomials are characters of irreducible representations of \mathbb{U}_N, into a theorem whose proof would take us too far afield. So we will have to treat this property as a black box. Given a matrix A \in \mathbb{C}^{N \times N}, let us write s_\lambda(A) for the evaluation s_\lambda(a_1,\dots,a_N) of the Schur polynomial s_\lambda(t_1,\dots,t_N) on the eigenvalues of A.

Theorem 9.1 (Schur orthogonality). For any two Young diagrams \lambda,\mu \in \mathsf{Y}_N and any two matrices A,B \in \mathbb{C}^{N \times N}, we have

\int\limits_{\mathbb{U}_N} s_\lambda(AUBU^*) \mathrm{d}U = \frac{s_\lambda(A)s_\lambda(B)}{\dim \mathbf{W}_N^\lambda}

and

\int\limits_{\mathbb{U}_N} s_\lambda(AU)s_\mu(BU^*) \mathrm{d}U =\delta_{\lambda\mu} \frac{s_\lambda(AB)}{\dim \mathbf{W}_N^\lambda},

where \dim \mathbf{W}_N^\lambda is the number of semistandard Young tableaux of shape \lambda with entries from \{1,\dots,N\}.

We also need a remarkable “multinomial formula” due to Frobenius. Let \mathsf{Y}_N^d= \mathsf{Y}^d \cap \mathsf{Y}_N be the set of Young diagrams with exactly d cells and at most N rows.

Theorem 9.2 (Frobenius formula). We have

(t_1 + \dots + t_N)^d = \sum\limits_{\lambda \in \mathsf{Y}_N^d} (\dim \mathbf{V}^\lambda) s_\lambda(t_1,\dots,t_N),

where \dim \mathbf{V}^\lambda is the number of standard Young tableaux of shape \lambda.

The reason the tableaux counts \dim \mathbf{V}^\lambda and \dim \mathbf{W}_N^\lambda are denoted this way is that the former is the dimension of an irreducible representation of the symmetric group \mathbb{S}^d and the latter is the dimension of an irreducible representation of the unitary group \mathbb{U}_N. At this point we have enough material to explicitly calculate a series expansion of K_N(z,A,B). This expansion is the starting point of Olshanski and Vershik’s asymptotic analysis, and is stated as Theorem 5.1 in their paper.

Theorem 9.3 (Schur expansion of K_N). We have

K_N(z,A,B) = \int\limits_{\mathbb{U}_N} e^{z \mathrm{Tr} AUBU^*} \mathrm{d}U=1+ \sum\limits_{d=1}^\infty \frac{z^d}{d!} \sum\limits_{\lambda \in \mathsf{Y}_N^d} s_\lambda(A)s_\lambda(B) \frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_N^\lambda}.

Proof: From Theorem 9.2, we have

K_N^d(A,B) = \int\limits_{\mathbb{U}_N} (\mathrm{Tr} AUBU^*)^d \mathrm{d}U=\sum\limits_{\lambda \in \mathsf{Y}_N^d} (\dim \mathbf{V}^\lambda) \int\limits_{\mathbb{U}_N} s_\lambda(AUBU^*) \mathrm{d}U,

and applying the first integration formula in Theorem 9.1 this becomes

K_N^d(A,B) =  \sum\limits_{\lambda \in \mathsf{Y}_N^d} s_\lambda(A)s_\lambda(B) \frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_N^\lambda}.

– QED
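As a quick sanity check, take d=1: then \mathsf{Y}_N^1=\{\Box\}, s_\Box(A)=p_1(A), \dim \mathbf{V}^\Box = 1, and \dim \mathbf{W}_N^\Box = N, so Theorem 9.3 asserts that

\int\limits_{\mathbb{U}_N} \mathrm{Tr}(AUBU^*)\, \mathrm{d}U = \frac{p_1(A)p_1(B)}{N},

which also follows directly from the fact that \mathbf{E}|U_{xy}|^2 = \frac{1}{N} for every entry of a Haar unitary.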

Problem 9.3. Compute the Schur expansion of K_N up to O(z^4). This is straightforward but will give you a feel for the structure of the formula.

Let us include the analogue of Theorem 9.3 for the quantum Bessel kernel

K_{MN}(A,B) = \int\limits_{\mathbb{U}_{MN}} e^{\frac{i}{2} \mathrm{Tr}(A^*UBV^* + VB^*U^*A)} \mathrm{d}(U,V),

where M \geq N and the integration is against Haar measure on the product group \mathbb{U}_{MN}= \mathbb{U}_M \times \mathbb{U}_N. Here A,B \in \mathbb{R}^N are real vectors viewed as diagonal matrices in \mathbb{C}^{M \times N}. Since K_{MN}(A,B) is invariant under signed permutations of the coordinates of A=(a_1,\dots,a_N) and B=(b_1,\dots,b_N), we can assume without loss of generality that these are non-increasing lists of nonnegative numbers. The quantum Bessel kernel is the characteristic function of a uniformly random M \times N rectangular matrix with singular values B=(b_1,\dots,b_N). Once again we can analytically continue this to

K_{MN}(z,A,B) = \int\limits_{\mathbb{U}_{MN}} e^{z \mathrm{Tr}(A^*UBV^* + VB^*U^*A)} \mathrm{d}(U,V),

where z \in \mathbb{C} and A,B \in \mathbb{C}^N are viewed as complex diagonal M \times N matrices in the integral. The following expansion of K_{MN} is not in the Olshanski-Vershik paper, but it can be obtained in essentially the same way; a more general result is proved here.

Theorem 9.4 (Schur expansion of K_{MN}). We have

K_{MN}(z,A,B) = \int\limits_{\mathbb{U}_{MN}} e^{z \mathrm{Tr}(A^*UBV^* + VB^*U^*A)} \mathrm{d}(U,V)=1+ \sum\limits_{d=1}^\infty \frac{z^{2d}}{d!d!} \sum\limits_{\lambda \in \mathsf{Y}_N^d} s_\lambda(A^*A)s_\lambda(B^*B) \frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_M^\lambda}\frac{\dim \mathbf{V}^\lambda}{\dim \mathbf{W}_N^\lambda}.

We close this lecture with a pointer to interesting work in the engineering literature which considers exactly these integrals in an applied context.

Math 289A: Lecture 8

Let X_{MN} be a random rectangular matrix, i.e. a random variable taking values in \mathbb{C}^{M\times N} viewed as a real vector space with scalar product

\langle T,X \rangle = \Re\mathrm{Tr}\, T^*X = \frac{1}{2}\mathrm{Tr}(T^*X+X^*T).

The characteristic function of X_{MN} is the function on 2MN-dimensional Euclidean space \mathbb{C}^{M \times N} defined by the expectation

\varphi_{X_{MN}}(T) = \mathbb{E}\left[e^{i \langle T,X_{MN}\rangle}\right],

which is the Fourier transform

\varphi_{X_{MN}}(T) = \int\limits_{\mathbb{C}^{M\times N}}e^{ \frac{i}{2}\mathrm{Tr}(T^*X+X^*T)}\mu_{MN}(\mathrm{d}X)

of the distribution \mu_{MN} of X_{MN} in \mathbb{C}^{M \times N} \simeq \mathbb{R}^{2MN}.

The product group \mathbb{U}_{MN} = \mathbb{U}_M \times \mathbb{U}_N acts on \mathbb{C}^{M \times N} by bi-conjugation: we have a group homomorphism \Phi from \mathbb{U}_{MN} into the orthogonal group of the Euclidean space \mathbb{C}^{M \times N} defined by

\Phi(U,V)X = UXV^*, \quad (U,V) \in \mathbb{U}_{MN},\ X \in \mathbb{C}^{M \times N}.

Problem 8.1. Check that \Phi really is a group homomorphism, and that it really does preserve the Euclidean scalar product on \mathbb{C}^{M \times N}.

Let us assume M \geq N, so that the product T^*X is an N \times N square matrix which fits inside an M \times N rectangle, rather than containing it. The action of \mathbb{U}_{MN} on \mathbb{C}^{M \times N} is the subject of the rectangular analogue of the spectral theorem. The rectangular spectral theorem is called the singular value decomposition, and it says that the bi-conjugation orbit of any given rectangular matrix X \in \mathbb{C}^{M \times N} contains an M \times N real diagonal matrix B which is unique up to signed permutations of its diagonal elements. In particular, there is one and only one diagonal matrix B in the bi-conjugation orbit of X whose entries satisfy

b_1 \geq \dots \geq b_N \geq 0,

and these numbers are referred to as the singular values of X.

Definition 8.1. We say that a random rectangular matrix X_{MN} is unitarily bi-invariant if its characteristic function is invariant under unitary bi-conjugation,

\varphi_{X_{MN}}(T) = \varphi_{X_{MN}}(UTV^*), \quad T \in \mathbb{C}^{M \times N},\ (U,V) \in \mathbb{U}_{MN}.

Problem 8.2. We proved that a unitarily invariant random Hermitian matrix is uniquely determined by the joint law of its diagonal matrix elements. State and prove the analogue of this theorem for unitarily bi-invariant random rectangular matrices.

Theorem 8.1. A random rectangular matrix X_{MN} is unitarily bi-invariant if and only if its distribution is stable under unitary bi-conjugation.

Proof: By cyclic invariance of the trace, we have

\varphi_{UX_{MN}V^*}(T)= \varphi_{X_{MN}}(U^*TV).

-QED

Corollary 8.1. A random rectangular matrix X_{MN} is unitarily bi-invariant if and only if it has the same law as a product of the form U_MB_{MN}V_N^*, where (U_M,V_N) is a uniformly random sample from \mathbb{U}_{MN}, and B_{MN} is an N-dimensional real random vector viewed as an M \times N diagonal matrix.
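Corollary 8.1 translates directly into a sampling recipe; here is a minimal sketch (haar_unitary is the QR-based Haar sampler sketched earlier in these notes, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(N, rng):
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

M, N = 5, 3
b = np.array([3.0, 2.0, 1.0])              # prescribed singular values
B = np.zeros((M, N)); B[:N, :N] = np.diag(b)
X = haar_unitary(M, rng) @ B @ haar_unitary(N, rng).conj().T
print(np.linalg.svd(X, compute_uv=False))  # recovers 3, 2, 1
```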

Equivalently, Corollary 8.1 says that a unitarily bi-invariant random rectangular matrix X_{MN} is uniformly random conditional on the law of its singular values. In terms of characteristic functions, we can formulate this as follows.

Definition 8.2. The quantum Bessel kernel

K_{MN} \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}

is defined by

K_{MN}(A,B) = \mathbf{E}[e^{i\langle A,U_MBV_N^*\rangle}]=\int\limits_{\mathbb{U}_{MN}} e^{\frac{i}{2}\mathrm{Tr}\,(A^*UBV^*+VB^*U^*A)} \mathrm{d}(U,V),

where A,B \in \mathbb{R}^N are viewed as M \times N diagonal matrices.

The quantum Bessel kernel K_{MN} is the characteristic function of a uniformly random M \times N rectangular matrix with prescribed deterministic singular values. This is the rectangular counterpart of the quantum Fourier kernel K_N, which is the characteristic function of a uniformly random Hermitian matrix with prescribed eigenvalues. Note that, even when M=N, the kernels K_{NN} and K_N are not the same thing. In particular, look at the case M=N=1. The quantum Fourier kernel reverts to the classical Fourier kernel on the line,

K_1(a,b) = e^{iab},

which is tautologically the same thing as the characteristic function of a real random variable whose distribution is a delta measure, i.e. the Fourier transform of a point mass. However, the quantum Bessel kernel at M=N=1 is something else, namely

K_{11}(a,b) = \int\limits_{\mathbb{U}_{11}} e^{\frac{i}{2}(aub\bar{v} + vb\bar{u}a)}\mathrm{d}(u,v) = \int\limits_{\mathbb{U}_1} e^{\frac{i}{2}( abu + ab\bar{u})}\mathrm{d}u.

This is a nontrivial transform, namely the characteristic function of a uniformly random point on a circle of radius |b|, the power series expansion of which is

K_{11}(a,b) = 1 + \sum\limits_{d=1}^\infty \frac{i^{2d}}{2^{2d}d!d!}(ab)^{2d}=1 + \sum\limits_{d=1}^\infty \frac{\left(\frac{1}{2}iab\right)^{2d}}{d!d!}.

Stop a random person on the street and they will almost surely recognize this as the Bessel kernel J_0(ab) which defines the Hankel transform of a random vector in \mathbb{R}^2 with radially symmetric distribution.
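Since the exponent reduces to iab\cos\theta for u=e^{i\theta}, this identification is easy to test numerically against scipy's Bessel function (an illustrative check):

```python
import numpy as np
from scipy.special import j0

rng = np.random.default_rng(0)
a, b = 1.3, 2.1
theta = rng.uniform(0.0, 2 * np.pi, 500_000)
# K_11(a,b) = E[exp((i/2)(ab u + ab conj(u)))] = E[exp(i ab cos(theta))]
mc = np.exp(1j * a * b * np.cos(theta)).mean()
print(mc.real, j0(a * b))  # agreement to Monte Carlo accuracy
```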

Definition 8.3. Given an N-dimensional real random vector B_N and any integer M\geq N, the Mth quantum Hankel transform of B_N is defined to be the expectation

\psi_{MN}(A) = \mathbf{E}K_{MN}(A,B_N), \quad A \in \mathbb{R}^N.

The random matrix calculation we have done above can be stated as follows.

Theorem 8.2. The classical characteristic function of a unitarily bi-invariant random rectangular matrix X_{MN} is equal to the quantum Hankel transform of its singular values B_{MN}.

Math 289A: Lecture 7

In this lecture we will review what we have done so far from a slightly different perspective. Starting with the Fourier kernel

F \colon \mathbb{R} \times \mathbb{R} \longrightarrow \mathbb{C},

which is defined by F(a,b) = e^{iab}, we can define the characteristic function of a random variable X with distribution \mu by

\varphi_X(a) = \mathbf{E}F(a,X) = \int\limits_{\mathbb{R}} F(a,x)\mu(\mathrm{d}x).

This is a uniformly continuous function from \mathbb{R} into the closed unit disc in \mathbb{C} which uniquely determines the measure \mu. To get some feel for why this might be, expand the Fourier kernel in a power series,

F(a,b) = 1 + \sum_{d=1}^\infty \frac{i^d}{d!} (ab)^d.

Then, assuming we can interchange expectation with the summation we get

\mathbf{E}F(a,X) = 1 + \sum_{d=1}^\infty \frac{i^d}{d!} a^d\mathbf{E}[X^d],

where

\mathbf{E}[X^d] = \int\limits_{\mathbb{R}} x^d \mu(\mathrm{d}x),

is the degree d moment of X, or equivalently of \mu. So at least heuristically, the characteristic function of X is a generating function for the moments of its distribution. One situation in which this formal computation is quantitatively correct is when \mu has compact support, so if the characteristic function does indeed characterize the distribution of X then it must be the case that \mu is characterized by its moments. It is not difficult to see why the moments of a probability measure with compact support determine it completely: if we know how to compute the expectation of polynomial functions of X, then thanks to the Weierstrass approximation theorem we also know how to compute the expectation of continuous functions of X, and this in turn tells us how \mu measures intervals, by taking expectations of smooth bump functions. Of course, to obtain the fully general result that the characteristic function of X completely determines its distribution a different argument is needed. For example, the Cauchy distribution has no moments whatsoever, but its characteristic function still exists and determines it uniquely.

The N-dimensional Fourier kernel

F_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}

is defined by

F_N(A,B) = e^{i \langle A,B\rangle} = e^{i(a_1b_1+\dots+a_Nb_N)}, \quad A,B \in \mathbb{R}^N

and the characteristic function of an N-dimensional random vector X_N is the continuous function \mathbb{R}^N \to \mathbb{C} defined by

\varphi_{X_N}(A) = \mathbf{E}F_N(A,X_N) = \int\limits_{\mathbb{R}^N} F_N(A,X) \mu_N(\mathrm{d}X),

where \mu_N is the distribution of X_N in \mathbb{R}^N. It is still true that \varphi_{X_N} exists and uniquely determines \mu_N, and we can tell a similar story about moments which supports this statement. To do so we first have to expand the N-dimensional Fourier kernel in a power series. This is easy but let us first make some general observations about what this expansion must look like before doing the computation. Since F_N(A,B) = F_N(B,A) and

F_N(PA,PB) = e^{i\langle PA,PB\rangle} = e^{i \langle A,B \rangle} = F_N(A,B),

for any N \times N permutation matrix P, we have

F_N(A,B) =1 + \sum_{d=1}^\infty \frac{i^d}{d!} F_N^d(A,B)

where F_N^d(A,B) is a homogeneous polynomial of degree d in the products a_1b_1,\dots,a_Nb_N which is invariant under permutations of these variables. Therefore, we have

F_N^d(A,B) = \sum\limits_{\alpha \in \mathsf{C}_N^d} (a_1b_1)^{\alpha_1}\dots (a_Nb_N)^{\alpha_N}F_N(\alpha)

where \mathsf{C}_N^d is the set of weak N-part compositions \alpha of d \in \mathbb{N} and F_N(\alpha) is a numerical coefficient. It is easy to determine this coefficient explicitly: since

F_N(A,B) = e^{ia_1b_1} \dots e^{ia_Nb_N} = \left(\sum\limits_{d=0}^\infty \frac{i^d}{d!}(a_1b_1)^d \right)\dots \left(\sum\limits_{d=0}^\infty \frac{i^d}{d!}(a_Nb_N)^d \right),

we have

F_N(\alpha) = \frac{d!}{\alpha_1! \dots \alpha_N!}={d \choose \alpha_1,\dots,\alpha_N},

the multinomial coefficient. Thus, the expected value of F_N^d(A,X_N) is a generating polynomial for degree d mixed moments of the entries of the random vector X_N=(X_N(1),\dots,X_N(N)),

\mathbf{E}F_N^d(A,X_N) = \sum\limits_{\alpha \in \mathsf{C}_N^d} a_1^{\alpha_1} \dots a_N^{\alpha_N} \mathbf{E}[X^{\alpha_1}_N(1)\dots X^{\alpha_N}_N(N)]F_N(\alpha).

In the previous lectures we defined a strange new kernel

K_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}

by

K_N(A,B) = \int\limits_{\mathbb{U}_N} F_N(A,|U|^2B)\mathrm{d}U,

where the integration is against Haar probability measure on the unitary group \mathbb{U}_N, and for any U \in \mathbb{U}_N the matrix

|U|^2=\begin{bmatrix} {} & \vdots & {} \\ \dots & |U_{xy}|^2 & \dots \\ {} & \vdots & {} \end{bmatrix}

is obtained from U=[U_{xy}] by taking the squared modulus of each entry. Thus for any B \in \mathbb{R}^N the vector

|U|^2B=\begin{bmatrix} |U_{11}|^2b_1 + \dots + |U_{1N}|^2b_N \\ |U_{21}|^2b_1 + \dots + |U_{2N}|^2b_N \\ \vdots \\ |U_{N1}|^2b_1 + \dots + |U_{NN}|^2b_N\end{bmatrix} \in \mathbb{R}^N

has coordinates which are superpositions of the coordinates of B, and because of this we named K_N the N-dimensional “quantum Fourier kernel.”

Problem 7.1. Prove that K_N is a bounded symmetric kernel on \mathbb{R}^N, i.e. |K_N(A,B)|\leq 1 and K_N(A,B)=K_N(B,A) for all A,B \in \mathbb{R}^N.

Then, given a random vector B_N with distribution \sigma_N in \mathbb{R}^N, we defined its quantum characteristic function by

\psi_{B_N}(A)=\mathbf{E}K_N(A,B_N) = \int\limits_{\mathbb{R}^N} K_N(A,B)\sigma_N(\mathrm{d}B).

This is a continuous function from \mathbb{R}^N into the closed unit disc in \mathbb{C}, and we claimed that \psi_{B_N} uniquely determines the distribution \sigma_N of B_N. Since the quantum characteristic function of B_N is clearly distinct from its classical characteristic function

\varphi_{B_N}(A) = \mathbf{E}F_N(A,B_N)=\int\limits_{\mathbb{R}^N} F_N(A,B)\sigma_N(\mathrm{d}B),

this must be true for some different reason. To understand what this reason might be, at least heuristically, we could proceed along the same lines we did with F_N and try to determine if \psi_{B_N}(A) is some sort of generating function encoding natural statistics of B_N.

Let’s give this a try. The first step is to expand the quantum Fourier kernel K_N as a power series. Before doing the computation, we can look for some easy symmetries of K_N which will tell us in advance what a series expansion of K_N ought to look like. In fact, the quantum Fourier kernel is much more symmetric than the classical Fourier kernel – it is invariant under independent (as opposed to simultaneous) permutations of the coordinates of its arguments.

Problem 7.2. Prove that K_N(PA,QB)=K_N(A,B) for any A,B \in \mathbb{R}^N and any N\times N permutation matrices P,Q.

From Problems 7.1 and 7.2, we can indeed infer the general form of a series expansion of the kernel K_N. Namely, we have

K_N(A,B) = 1 + \sum\limits_{d=1}^\infty \frac{i^d}{d!} K_N^d(A,B),

where K_N^d(A,B) is a homogeneous degree d polynomial function of the coordinates a_1,\dots,a_N,b_1,\dots,b_N which is invariant under independent permutations of a_1,\dots,a_N and b_1,\dots,b_N and stable under swapping these two sets of variables. What does such a polynomial look like?

A classical theorem of Newton asserts that a symmetric homogeneous degree d polynomial in N variables is a polynomial function of the power-sum polynomials

p_1(x_1,\dots,x_N) = x_1+\dots+x_N \\ p_2(x_1,\dots,x_N)=x_1^2+\dots+x_N^2\\ \vdots \\ p_N(x_1,\dots,x_N)=x_1^N+\dots+x_N^N.

Equivalently, the vector space \Lambda_N^d of symmetric homogeneous degree d polynomials in N variables has as a linear basis the polynomials

p_\alpha(x_1,\dots,x_N)=\prod\limits_{k=1}^{\ell(\alpha)} p_{\alpha_k}(x_1,\dots,x_N),

where \alpha=(\alpha_1,\dots,\alpha_r) ranges over the set \mathsf{P}_N^d of partitions of d with largest part at most N, and \ell(\alpha)=r denotes the number of parts.

Problem 7.3. Prove that for any \alpha \in \mathsf{P}_N^d and any A \in \mathbb{R}^N, the evaluation p_\alpha(A)=p_\alpha(a_1,\dots,a_N) satisfies |p_\alpha(A)| \leq \|A\|_\infty^d N^{\ell(\alpha)}, and that this bound is sharp.

Thus, a power series expansion of the quantum Fourier kernel can be written in the form

K_N(A,B) = 1 + \sum\limits_{d=1}^\infty \frac{i^d}{d!}\sum\limits_{\alpha,\beta \in \mathsf{P}_N^d} \frac{p_\alpha(a_1,\dots,a_N)}{N^{\ell(\alpha)}}\frac{p_\beta(b_1,\dots,b_N)}{N^{\ell(\beta)}}K_N(\alpha,\beta),

where the coefficients satisfy K_N(\alpha,\beta)=K_N(\beta,\alpha). This complicated-looking expression is actually quite meaningful. For any vector A \in \mathbb{R}^N, the normalized power sum polynomial of degree k in the coordinates of A is

\frac{p_k(a_1,\dots,a_N)}{N} = \int\limits_{\mathbb{R}} x^k \eta_A(\mathrm{d}x),

where

\eta_A = \frac{1}{N} \sum\limits_{i=1}^N \delta_{a_i}

is the discrete probability measure on \mathbb{R} which places equal mass at each coordinate of A. This is called the empirical distribution of A \in \mathbb{R}^N. Thus, the expansion of the quantum Fourier kernel K_N(A,B) is a generating function for the moments of the empirical distributions of its arguments. Consequently, for B_N a random vector, the quantum characteristic function \psi_{B_N}(A) is a generating function for expectations of polynomials in the random variables

p_1(B_N),p_2(B_N),p_3(B_N),\dots,

which are the moments of the (random) empirical distribution of the random vector B_N.
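The identification of normalized power sums with moments of the empirical distribution is worth internalizing; a two-line numerical sketch:

```python
import numpy as np

B = np.array([2.0, -1.0, 0.5, 3.0])   # a point configuration on the real line
N, k = len(B), 3
print((B ** k).sum() / N)   # p_k(B)/N
print(np.mean(B ** k))      # the same number, as a moment of eta_B
```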

Math 289A: Lecture 5

Let \mathbb{H}_N be the real vector space of Hermitian matrices with the Hilbert-Schmidt scalar product

\langle X,Y \rangle =\mathrm{Tr}\, XY.

Let

f \colon \mathbb{H}_N \longrightarrow \mathbb{R}^N

be the function from Hermitian matrices to standard Euclidean space defined by

f(X) = (\text{eigenvalues of }X\text{ listed largest to smallest}).

Theorem 5.1 (Hoffman-Wielandt Inequality). For any X,Y \in \mathbb{H}_N we have

\|f(X)-f(Y)\| \leq \|X-Y\|.

Proof: By the spectral theorem, there are unitary matrices V,W \in \mathbb{U}_N and non-increasing vectors A,B \in \mathbb{R}^N such that

X=VAV^* \quad\text{and}\quad Y=WBW^*.

The square of the Hilbert-Schmidt norm of the difference X-Y is

\|X-Y\|^2 = \langle VAV^*-WBW^*,VAV^*-WBW^*\rangle = \mathrm{Tr}\, (VAV^*-WBW^*)^2,

and

\mathrm{Tr}\, (VAV^*-WBW^*)^2=\mathrm{Tr}\, A^2 -2\mathrm{Tr}\, VAV^*WBW^*+\mathrm{Tr}\, B^2.

Therefore, if we can show

\mathrm{Tr}\, AUBU^* \leq \mathrm{Tr}\, AB

holds for all U \in \mathbb{U}_N, we will have proved the claimed inequality (make sure you understand why).

Now,

\mathrm{Tr}\, AUBU^* = \sum_{i,j=1}^N a_ib_j |U_{ij}|^2,

and [|U_{ij}|^2] is a doubly stochastic matrix. Therefore, the maximum value of \mathrm{Tr}\, AUBU^* over all U \in \mathbb{U}_N is bounded by the maximum value of the function

g(M) = \sum_{i,j=1}^N a_ib_j M_{ij},

as M ranges over all doubly stochastic matrices. Since g is a linear function on the Birkhoff polytope, it achieves its maximum value at an extreme point of this convex set, so at a permutation matrix. The vertex corresponding to the identity matrix gives

g(I) = \sum\limits_{i=1}^N a_i b_i = \mathrm{Tr}\, AB,

and since the coordinates of A and B are both non-increasing, the rearrangement inequality \sum_i a_ib_{\sigma(i)} \leq \sum_i a_ib_i shows that the identity permutation is the optimal matching.

– QED
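The Hoffman-Wielandt inequality is easy to test numerically (an illustrative check; eigvalsh sorts both spectra the same way, which is all the inequality needs):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_hermitian(N, rng):
    Z = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (Z + Z.conj().T) / 2

N = 6
for _ in range(5):
    X, Y = rand_hermitian(N, rng), rand_hermitian(N, rng)
    lhs = np.linalg.norm(np.linalg.eigvalsh(X) - np.linalg.eigvalsh(Y))
    rhs = np.linalg.norm(X - Y)   # Hilbert-Schmidt (Frobenius) norm
    print(lhs <= rhs, round(lhs, 4), round(rhs, 4))
```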

Now let X_N be a random Hermitian matrix.

Theorem 5.2. The following are equivalent:

  1. The characteristic function of X_N is invariant under unitary conjugation;
  2. The distribution of X_N is invariant under unitary conjugation;
  3. X_N is equal in distribution to a random Hermitian matrix of the form U_NB_NU_N^*, where U_N is a uniformly random unitary matrix and B_N is a random real diagonal matrix independent of U_N.

We have already proved the equivalence of (1) and (2).

Problem 5.2. Complete the proof of Theorem 5.2.

At this point we have enough information to explain how the characteristic function of a unitarily invariant random Hermitian matrix determines the joint distribution of its eigenvalues.

Theorem 5.3. The characteristic function of a unitarily invariant random Hermitian matrix is equal to the quantum characteristic function of its eigenvalues.

Obviously, this statement needs to be explained.

Let B_N be a real random vector whose distribution \sigma_N in \mathbb{R}^N is arbitrary, and let U_N be an independent random unitary matrix whose distribution \eta_N in \mathbb{U}_N is uniform (Haar) measure. Then X_N=U_NB_NU_N^* is a unitarily invariant random Hermitian matrix, and by Theorem 5.2 every unitarily invariant random Hermitian matrix is equal in law to a matrix constructed in this way.

Because \mathbb{H}_N is non-compact, there is no such thing as a uniformly random Hermitian matrix. The matrix X_N = U_NB_NU_N^* is as close as we can get: it is uniform conditional on the distribution of its eigenvalues B_N. If \mu_N is the distribution of X_N in \mathbb{H}_N, a random sample from \mu_N is obtained by sampling a vector from the distribution \sigma_N on \mathbb{R}^N, and then choosing a uniformly random point on the \mathbb{U}_N-orbit in \mathbb{H}_N which contains this vector.

In terms of characteristic functions, this says that

\varphi_{X_N}(A) = \mathbf{E}_{\mu_N}\left[e^{i \mathrm{Tr} AU_NB_NU_N^*}\right] = \mathbf{E}_{\sigma_N} \left[K_N(A,B_N)\right],

where

K_N(A,B) = \mathbf{E}_{\eta_N}\left[e^{i\mathrm{Tr}\, AUBU^*}\right] = \int_{\mathbb{U}_N} e^{i \mathrm{Tr}\, AUBU^*} \eta_N(\mathrm{d}U)

is the characteristic function of a uniformly random Hermitian matrix from the \mathbb{U}_N orbit containing B \in \mathbb{R}^N viewed as a diagonal matrix. Thus, K_N(A,B) is the characteristic function of a uniformly random Hermitian matrix with prescribed eigenvalues. This defines a bounded symmetric kernel

K_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}

which we call the quantum Fourier kernel, because it is a superposition of standard N-dimensional Fourier kernels

K_N(A,B) = \int_{\mathbb{U}_N}e^{i \sum_{x,y=1}^N a_xb_y |U_{xy}|^2} \eta_N(\mathrm{d}U).

Problem 5.3. Prove that indeed K_N(A,B)=K_N(B,A) and |K_N(A,B)|\leq 1. Moreover, show that K_N(A,B) is invariant under independent permutations of the coordinates of A,B \in \mathbb{R}^N.

If K_N(A,B) is the quantum Fourier kernel, then

\psi_{B_N}(A) = \mathbf{E}[K_N(A,B_N)] = \int_{\mathbb{R}^N} K_N(A,B) \sigma_N(\mathrm{d}B)

deserves to be called the quantum characteristic function of random vector B_N, or the quantum Fourier transform of the probability measure \sigma_N. What we have shown in this lecture is that for any unitarily invariant random Hermitian matrix X_N, we have

\varphi_{X_N}(A) = \psi_{B_N}(A), \quad A \in \mathbb{R}^N,

where B_N is any N-dimensional random vector whose components are eigenvalues of X_N (this is what Theorem 5.3 means). We can also state the above as

\varphi_{D_N}(A) = \psi_{B_N}(A), \quad A \in \mathbb{R}^N,

where D_N is the diagonal of the random matrix X_N.

The conclusion to be drawn from all of this is that figuring out how to analyze the eigenvalues of a unitarily invariant random Hermitian matrix X_N given its characteristic function \varphi_{X_N} is exactly the same thing as figuring out how to analyze an arbitrary random real vector B_N given its quantum characteristic function \psi_{B_N}.

Before telling you how we will do this, let me tell you how we won’t. The main reason the Hoffman-Wielandt theorem was presented in this lecture is that the proof makes you think about maximizing the function

g(U) = \sum_{x,y=1}^N a_xb_y|U_{xy}|^2

over unitary matrices U. The action g(U) is exactly what appears in the exponent of the quantum Fourier kernel:

K_N(A,B) = \int_{\mathbb{U}_N} e^{i g(U)} \mathrm{d}U.

(I am going to stop writing the Haar measure \eta_N explicitly in Haar integrals, so just \mathrm{d}U rather than \eta_N(\mathrm{d}U)). Now, the method of stationary phase predicts that K_N(A,B) should be approximated by contributions from the local maxima of the action, which means that this integral should localize to a sum over permutations. This turns out to be exactly correct, not just a good approximation, and the reasons for this are rooted in symplectic geometry. The end result is the following determinantal formula for the quantum Fourier kernel.

Theorem 5.4 (HCIZ Formula). For any A,B \in \mathbb{R}^N with no repeated coordinates we have

K_N(A,B) = \frac{1!2! \dots (N-1)!}{i^{N \choose 2}}\frac{\det [ e^{ia_xb_y}]}{\det[a_x^{N-y}]\det[b_x^{N-y}]}.

This is a very interesting formula which makes manifest all the symmetries of the quantum Fourier kernel claimed in Problem 5.3 (but, you should solve that problem without using the determinantal formula). It is also psychologically satisfying in that it shows exactly how the quantum Fourier kernel is built out of classical one-dimensional Fourier kernels. However, it is not really good for much else. We will explain this further in the coming lectures. But first we will adapt all of the above to random rectangular matrices.
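For N=2 the HCIZ formula is simple enough to check against Monte Carlo integration (an illustrative sketch; the Haar sampler is the QR-based one used earlier in these notes, and the determinant ratio is written out by hand):

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(N, rng):
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

a, b = np.array([1.0, -0.7]), np.array([0.4, 2.0])  # no repeated coordinates
A, B = np.diag(a), np.diag(b)

samples = []
for _ in range(100_000):
    U = haar_unitary(2, rng)
    samples.append(np.exp(1j * np.trace(A @ U @ B @ U.conj().T)))
mc = np.mean(samples)

# HCIZ at N = 2: prefactor 1!/i^{1}, Vandermonde determinants (a1-a2), (b1-b2)
hciz = (np.exp(1j * (a[0] * b[0] + a[1] * b[1]))
        - np.exp(1j * (a[0] * b[1] + a[1] * b[0]))) \
       / (1j * (a[0] - a[1]) * (b[0] - b[1]))
print(mc, hciz)   # should agree to Monte Carlo accuracy
```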

Math 289A: Lecture 4

We have defined the characteristic function of a random Hermitian matrix X according to the usual recipe for random variables taking values in a Euclidean space: it is the expectation

\varphi_{X}(T) = \mathbf{E}\left[e^{i\langle T,X \rangle}\right],

where the argument T ranges over the real vector space

\mathbb{H}_N = \{X \in \mathbb{C}^{N \times N} \colon X^*=X\},

which carries the Hilbert-Schmidt scalar product

\langle T,X \rangle = \mathrm{Tr}\, TX.

Equivalently,

\varphi_{X}(T) = \int\limits_{\mathbb{H}_N} e^{i \mathrm{Tr}\, TX}\mu_X(\mathrm{d}X),

where \mu_X is the distribution of X in \mathbb{H}_N. With this definition in hand, we can consider special classes of random Hermitian matrices carved out by clauses of the form

“A random Hermitian matrix X is said to be (descriptor) if its characteristic function satisfies (property).”

A natural construction of this form arises from the Spectral Theorem. The set

\mathbb{U}_N=\{U \in \mathbb{C}^{N \times N} \colon UU^*=U^*U=I\}

of unitary matrices is a group under matrix multiplication, called the unitary group of rank N. The group \mathbb{U}_N acts on the Euclidean space \mathbb{H}_N by conjugation, which is a group homomorphism

\Phi \colon \mathbb{U}_N \longrightarrow \mathrm{Aut}(\mathbb{H}_N,\langle \cdot,\cdot\rangle)

from \mathbb{U}_N into the orthogonal group of the Euclidean space \mathbb{H}_N defined by

\Phi(U)T = UTU^*, \quad U \in \mathbb{U}_N,\ T \in \mathbb{H}_N.

Problem 4.1. Prove that \Phi really is a group homomorphism from \mathbb{U}_N into orthogonal transformations of Euclidean space \mathbb{H}_N.

One formulation of the Spectral Theorem says: the conjugation orbit

\mathcal{O}_X = \{UXU^* \colon U \in \mathbb{U}_N\}

of any Hermitian matrix X \in \mathbb{H}_N contains a diagonal matrix A, which is unique up to simultaneous permutations of its rows and columns.

Definition 4.1. A random Hermitian matrix X is said to be unitarily invariant if its characteristic function \varphi_{X} satisfies

\varphi_{X}(UTU^*) = \varphi_{X}(T)

for all T \in \mathbb{H}_N and all U \in\mathbb{U}_N. That is, X is unitarily invariant if its characteristic function \varphi_{X} is constant on each \mathbb{U}_N-orbit of \mathbb{H}_N.

Unitary invariance is a basic symmetry with many ramifications. First of all, it leads to a massive reduction in complexity by allowing us to view the characteristic function of X as a function of A \in \mathbb{R}^N rather than a function of T \in \mathbb{H}_N. Indeed, for an arbitrary Hermitian matrix T, the Spectral Theorem says that UTU^*=A for some unitary matrix U \in \mathbb{U}_N and some diagonal matrix A \in \mathbb{H}_N. Unitary invariance of the characteristic function then gives

\varphi_{X}(T) = \varphi_{X}(UTU^*) = \varphi_{X}(A).

The diagonal Hermitian matrix

A = \begin{bmatrix} a_1 & {} & {} \\ {} & \ddots & {} \\ {} & {} & a_N\end{bmatrix}

can be viewed as a vector A =(a_1,\dots,a_N) \in \mathbb{R}^N, so that the characteristic function of a unitarily invariant random Hermitian matrix X can be viewed as a continuous function

\varphi_{X} \colon \mathbb{R}^N \longrightarrow \mathbf{D}

on N-dimensional Euclidean space taking values in the closed unit disc \mathbf{D} \subset \mathbb{C}.

Problem 4.2. Prove that the characteristic function of a unitarily invariant random Hermitian matrix X is a symmetric function on \mathbb{R}^N.

Now we discuss the probabilistic ramifications of unitary invariance, i.e. what Definition 4.1 implies about the distribution of X.

Theorem 4.1. The distribution of a unitarily invariant random Hermitian matrix X is completely determined by the joint distribution of its diagonal matrix elements X_{11},\dots,X_{NN}, which are exchangeable real random variables.

Proof: For A=\mathrm{diag}(a_1,\dots,a_N) real diagonal, we have

\mathrm{Tr}\, AX = a_1X_{11} + \dots + a_NX_{NN}.

Consequently, we have

\varphi_{X}(A) = \mathbb{E}[e^{i\mathrm{Tr}\, AX}] = \mathbb{E}[e^{i(a_1X_{11} + \dots + a_NX_{NN})}],

which is exactly the characteristic function of the real random vector

D = (X_{11},\dots,X_{NN}).

Thus, the characteristic function \varphi_X(A)=\varphi_D(A). The exchangeability of X_{11},\dots,X_{NN} follows from invariance under conjugation by permutation matrices (compare Problem 4.2).

-QED

It is important to note that exchangeable random variables are identically distributed, but not necessarily independent.

Problem 4.3. Prove that the diagonal matrix elements of a unitarily invariant random Hermitian matrix X are independent if and only if the distribution \mu_{X} is a Gaussian measure on \mathbb{H}_N.

Now we characterize unitary invariance as a feature of the distribution of a random Hermitian matrix.

Theorem 4.2. A random Hermitian matrix X is unitarily invariant if and only if X and UXU^* have the same distribution for every U \in \mathbb{U}_N.

Proof: Note that for any random Hermitian matrix X and any deterministic unitary matrix U \in \mathbb{U}_N we have

\varphi_{UXU^*}(T) =\varphi_{X}(U^*TU),\quad T \in \mathbb{H}_N,

by cyclic invariance of the trace. If X is unitarily invariant, this becomes

\varphi_{UXU^*}(T) =\varphi_{X}(T), \quad T \in \mathbb{H}_N,

which is equivalent to equidistribution of X and UXU^*. Conversely, if UXU^* and X have the same distribution then their characteristic functions are equal.

-QED
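A quick numerical illustration of Theorem 4.2, using a Gaussian Hermitian ensemble (which is unitarily invariant; one normalization convention among several): the empirical characteristic function is unchanged when every sample is conjugated by a fixed unitary. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_hermitian(N, rng):
    """A unitarily invariant Gaussian random Hermitian matrix (one convention)."""
    Z = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (Z + Z.conj().T) / 2

N = 4
T = np.diag([0.3, -1.0, 0.5, 2.0])          # a fixed test direction in H_4
Q, R = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
U = Q * (np.diag(R) / np.abs(np.diag(R)))   # a fixed unitary matrix

samples = [gaussian_hermitian(N, rng) for _ in range(50_000)]
phi  = np.mean([np.exp(1j * np.trace(T @ X)) for X in samples])
phiU = np.mean([np.exp(1j * np.trace(T @ U @ X @ U.conj().T)) for X in samples])
print(phi, phiU)   # approximately equal, as Theorem 4.2 predicts
```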

Theorem 4.2 is quite important for the following reason. Define a random linear operator acting on \mathbb{C}^N by

L_Xv = Xv, \quad v \in \mathbb{C}^N.

The matrix of L_{X} in the standard basis e_1,\dots,e_N of \mathbb{C}^N is X. The matrix of L_{X} in some other orthonormal basis f_1,\dots,f_N is U^*XU, where U is the matrix of the linear transformation defined by e_i \mapsto f_i. Unitary invariance says that U^*XU has the same distribution as X – the random matrix of the random operator L_{X} has the same law for any choice of orthonormal basis in the Hilbert space \mathbb{C}^N. In this sense, unitarily invariant random Hermitian matrices are precisely those random matrices which correspond to canonically defined random Hermitian operators on finite-dimensional Hilbert space.