Math 289A: Formalism of Schur-Horn Arrays
The right way to set up the classical limit theorems of probability theory is to start with a triangular array of real random variables

$$X_{11}; \quad X_{21}, X_{22}; \quad X_{31}, X_{32}, X_{33}; \quad \dots$$

Your first exposure to the law of large numbers and central limit theorem likely involved a different setup, namely an infinite sequence

$$X_1, X_2, X_3, \dots$$

of iid random variables, whose partial sums

$$S_N = X_1 + X_2 + \dots + X_N$$

are the main object of interest. There are some non-negligible advantages to the triangular array setup, for example the fact that the rows do not have to be defined on a common underlying probability space. The behavior of the row sums

$$S_N = X_{N1} + X_{N2} + \dots + X_{NN}$$

is the topic of slightly more advanced and flexible versions of the central limit theorem. These results and other classical limit theorems apply to the situation when the random variables in each row of the triangular array are independent or nearly independent.
We are interested in triangular arrays of random variables as above which arise from an underlying deterministic data set

$$A_{11}; \quad A_{21}, A_{22}; \quad A_{31}, A_{32}, A_{33}; \quad \dots$$

The triangular array of random variables associated to this data has rows defined by

$$X_{Ni} = \sum_{j=1}^N |U_N(i,j)|^2 A_{Nj}, \qquad 1 \leq i \leq N,$$

where $U_N$ is a uniformly random unitary matrix of rank $N$. Thus, the random vector

$$(X_{N1}, \dots, X_{NN})$$

is a random linear transformation of the deterministic data

$$(A_{N1}, \dots, A_{NN}).$$

Because the entries of $U_N$ are highly correlated (rows are orthonormal, columns are orthonormal), the random variables $X_{N1}, \dots, X_{NN}$ are far from independent. They are, however, exchangeable.
There are two ways to think about a Schur-Horn array. Geometrically, $(X_{N1}, \dots, X_{NN})$ is a random point inside the permutahedron in $\mathbb{R}^N$ generated by $(A_{N1}, \dots, A_{NN})$, i.e. the convex hull of the vectors obtained by permuting the coordinates of the data. Algebraically, $(X_{N1}, \dots, X_{NN})$ is the diagonal of the random matrix $X_N = U_N A_N U_N^{-1}$, where $A_N = \operatorname{diag}(A_{N1}, \dots, A_{NN})$.
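As a sanity check on these definitions, here is a minimal numerical sketch (the helper names are mine, not from the lecture): it samples a Haar-random unitary via the QR decomposition of a complex Ginibre matrix, forms the diagonal of $U A U^*$, and checks the Schur-Horn constraint that the diagonal is majorized by the spectrum (in particular, both have the same sum).

```python
import numpy as np

def haar_unitary(N, rng):
    # QR of a complex Ginibre matrix, with the standard phase fix,
    # gives a Haar-distributed unitary.
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

def schur_horn_row(A_diag, rng):
    # One row of the Schur-Horn array: the diagonal of U A U*.
    U = haar_unitary(len(A_diag), rng)
    # X_i = sum_j |U_ij|^2 A_j
    return (np.abs(U) ** 2) @ A_diag

rng = np.random.default_rng(0)
A = np.sort(rng.standard_normal(8))[::-1]   # deterministic data, any real vector
X = schur_horn_row(A, rng)

# Trace is preserved: sum of diagonal = sum of eigenvalues.
print(np.isclose(X.sum(), A.sum()))
# Majorization: partial sums of sorted X are dominated by those of sorted A.
print(np.all(np.cumsum(np.sort(X)[::-1]) <= np.cumsum(np.sort(A)[::-1]) + 1e-12))
```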
We are interested in proving limit theorems for Schur-Horn arrays. Such theorems will describe behavior of the rows of the $X$-array in terms of prescribed behavior of the rows of the $A$-array. In particular, we are interested in the situation where the empirical distribution of the vector

$$\left( \frac{A_{N1}}{N^\gamma}, \dots, \frac{A_{NN}}{N^\gamma} \right)$$

converges in moments as $N \to \infty$, where $\gamma \geq 0$ is a scaling exponent. The empirical distribution is the probability measure on $\mathbb{R}$ which places mass $\frac{1}{N}$ at each coordinate of the vector. If we think of these data points as beads on a wire, then the empirical measure of an interval $I \subseteq \mathbb{R}$ is the fraction of beads which lie in $I$. Convergence in moments of the empirical measure means that the limit

$$m_d = \lim_{N\to\infty} \frac{1}{N}\, p_d\!\left( \frac{A_{N1}}{N^\gamma}, \dots, \frac{A_{NN}}{N^\gamma} \right)$$

exists for each fixed $d \in \mathbb{N}$, where $p_d(x_1,\dots,x_N) = x_1^d + \dots + x_N^d$ is the power sum polynomial of degree $d$ in $N$ variables.
Let $\mathbb{E}$ denote expected value, and let

$$\varphi_N(t_1,\dots,t_N) = \mathbb{E}\, e^{i(t_1 X_{N1} + \dots + t_N X_{NN})}$$

be the characteristic function of $(X_{N1},\dots,X_{NN})$. Because the distribution of $(X_{N1},\dots,X_{NN})$ in $\mathbb{R}^N$ is compactly supported, $\varphi_N$ is an analytic function on $\mathbb{R}^N$. The Maclaurin expansion of the characteristic function is

$$\varphi_N(t_1,\dots,t_N) = 1 + \sum_{d=1}^\infty \frac{i^d}{d!}\, M_d(t_1,\dots,t_N),$$

where

$$M_d(t_1,\dots,t_N) = \sum_{\alpha} \binom{d}{\alpha_1,\dots,\alpha_N}\, \mathbb{E}\!\left[ X_{N1}^{\alpha_1} \cdots X_{NN}^{\alpha_N} \right] t_1^{\alpha_1} \cdots t_N^{\alpha_N},$$

and the sum is over all vectors $\alpha = (\alpha_1,\dots,\alpha_N)$ of nonnegative integers whose coordinates sum to $d$. Thus, $M_d$ is a homogeneous degree $d$ polynomial in $N$ variables whose coefficients are, up to a transparent factor, the degree $d$ joint moments of the random variables $X_{N1},\dots,X_{NN}$. Since

$$\varphi_N(0,\dots,0) = 1,$$

the logarithm of the characteristic function is defined and analytic on an open neighborhood of the origin in $\mathbb{R}^N$. We write the Maclaurin expansion of $\log \varphi_N$ as

$$\log \varphi_N(t_1,\dots,t_N) = \sum_{d=1}^\infty \frac{i^d}{d!}\, C_d(t_1,\dots,t_N),$$

where $C_d$ is a homogeneous degree $d$ polynomial in $N$ variables. The polynomials $C_d$ are called the cumulant polynomials of the random variables $X_{N1},\dots,X_{NN}$. They are determined by the moment polynomials $M_d$ according to a certain recursion explained on pages 2 and 3 of these notes. We write the cumulant polynomials in the form

$$C_d(t_1,\dots,t_N) = \sum_{\alpha} \binom{d}{\alpha_1,\dots,\alpha_N}\, \langle X_{N1}^{\alpha_1} \cdots X_{NN}^{\alpha_N} \rangle_c\, t_1^{\alpha_1} \cdots t_N^{\alpha_N},$$

where the quantities $\langle X_{N1}^{\alpha_1} \cdots X_{NN}^{\alpha_N} \rangle_c$ are the degree $d$ joint cumulants of the random variables $X_{N1},\dots,X_{NN}$, as $\alpha$ ranges over nonnegative integer vectors of sum $d$. Note that this is a definition, not a proposition. In statistical mechanics and quantum field theory, joint cumulants are referred to as connected correlation functions.
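For a single random variable the moment-cumulant recursion is easy to implement; here is a short sketch (my own helper, following the standard recursion $m_d = \sum_{k=1}^{d} \binom{d-1}{k-1} c_k\, m_{d-k}$) which recovers cumulants from moments.

```python
from math import comb

def cumulants_from_moments(m):
    """m[d] = d-th moment, m[0] = 1; returns c with c[d] = d-th cumulant."""
    c = [0.0] * len(m)
    for d in range(1, len(m)):
        # m_d = sum_{k=1}^{d} C(d-1, k-1) c_k m_{d-k}  =>  solve for c_d
        c[d] = m[d] - sum(comb(d - 1, k - 1) * c[k] * m[d - k] for k in range(1, d))
    return c

# Standard Gaussian: moments 1, 0, 1, 0, 3, 0, 15 -> cumulants 0, 1, 0, 0, ...
print(cumulants_from_moments([1, 0, 1, 0, 3, 0, 15]))
```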
The main fact about Schur-Horn arrays that allows us to understand them is a second formula for the cumulant polynomials of the random variables $X_{N1},\dots,X_{NN}$, this one in terms of the moments of the empirical distribution of the deterministic data $(A_{N1},\dots,A_{NN})$.

Theorem (Cumulant Formula for Schur-Horn Arrays). For each $d \in \mathbb{N}$ we have

$$C_d(t_1,\dots,t_N) = \sum_{\lambda} \sum_{\mu} (-1)^{\ell(\lambda)+\ell(\mu)}\, \frac{p_\lambda(t_1,\dots,t_N)\, p_\mu(A_{N1},\dots,A_{NN})}{N^{d-2+\ell(\lambda)+\ell(\mu)}}\, \vec{H}_{\lambda\mu}\!\left(\frac{1}{N^2}\right),$$

where the sums are over nonnegative integer vectors $\lambda$ and $\mu$ with non-increasing coordinates which sum to $d$, and $\ell(\lambda)$ and $\ell(\mu)$ are the number of nonzero coordinates of $\lambda$ and $\mu$, respectively. For each such pair $(\lambda,\mu)$, the polynomials $p_\lambda$ and $p_\mu$ are the corresponding products of power sum symmetric polynomials, i.e.

$$p_\lambda = p_{\lambda_1} p_{\lambda_2} \cdots p_{\lambda_{\ell(\lambda)}}, \qquad p_\mu = p_{\mu_1} p_{\mu_2} \cdots p_{\mu_{\ell(\mu)}}.$$

Finally,

$$\vec{H}_{\lambda\mu}\!\left(\frac{1}{N^2}\right) = \sum_{g=0}^\infty \frac{\vec{H}_g(\lambda,\mu)}{N^{2g}}$$

is a convergent series in $\frac{1}{N^2}$ in which the coefficients are positive integers called monotone Hurwitz numbers which admit an explicit combinatorial description.
The cumulant formula for Schur-Horn arrays looks complicated, but it is fairly easy to use in practice. For example, let us set our scaling exponent to $\gamma = \frac{1}{2}$. Then, for any fixed $n \in \mathbb{N}$ we have

$$C_d(t_1,\dots,t_n,0,\dots,0) = \sum_{\lambda} \sum_{\mu} (-1)^{\ell(\lambda)+\ell(\mu)}\, p_\lambda(t_1,\dots,t_n)\, \frac{p_\mu(A_{N1},\dots,A_{NN})}{N^{d-2+\ell(\lambda)+\ell(\mu)}}\, \vec{H}_{\lambda\mu}\!\left(\frac{1}{N^2}\right).$$

As we saw (twice) in lecture, you can calculate the limit of this polynomial: since $p_\mu(A_{N1},\dots,A_{NN})$ grows like $N^{\ell(\mu)+\frac{d}{2}}$, each term is of order $N^{2-\ell(\lambda)-\frac{d}{2}}$. Assuming $m_1 = 0$ and $m_2 = 1$, the only nonzero limit is

$$\lim_{N\to\infty} C_2(t_1,\dots,t_n,0,\dots,0) = p_2(t_1,\dots,t_n).$$

This means that

$$\lim_{N\to\infty} \langle X_{N1}^{\alpha_1} \cdots X_{Nn}^{\alpha_n} \rangle_c = 0$$

unless $\alpha$ is a nonnegative integer vector with total sum $2$ and only one nonzero coordinate. From this we conclude that for any fixed $n$, the random variables $X_{N1},\dots,X_{Nn}$ converge in joint distribution to $n$ iid standard Gaussians.
When our scaling exponent is $\gamma = 1$, we do not get such a simple Gaussian limit. Rather, we find that

$$\lim_{N\to\infty} N^{d-1}\, C_d(t_1,\dots,t_n,0,\dots,0) = (d-1)!\, \kappa_d\, p_d(t_1,\dots,t_n)$$

for each fixed $d \in \mathbb{N}$, where

$$\kappa_d = \frac{1}{(d-1)!} \sum_{\mu} (-1)^{\ell(\mu)+1}\, \vec{H}_0(d,\mu)\, m_{\mu_1} \cdots m_{\mu_{\ell(\mu)}},$$

where the sum is over non-increasing nonnegative integer vectors $\mu$ with coordinates summing to $d$. Thus we still have a sum of pure powers, indicating asymptotic independence between the random variables $X_{N1},\dots,X_{Nn}$, but the cumulants of these random variables are asymptotically of order $\frac{\kappa_d}{N^{d-1}}$. We are going to see that $(\kappa_d)_{d=1}^\infty$ is the free cumulant sequence corresponding to the moment sequence $(m_d)_{d=1}^\infty$.
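The passage from moments to free cumulants is also easy to mechanize. Here is a small sketch (my own, based on the standard relation $M(z) = 1 + \sum_k \kappa_k z^k M(z)^k$ for the moment generating function, solved degree by degree); feeding it the Catalan numbers, the moments of the semicircle law, returns $\kappa_2 = 1$ and all other free cumulants zero.

```python
def free_cumulants_from_moments(m):
    """m[d] = d-th moment with m[0] = 1; returns kappa[d] = d-th free cumulant.
    Based on M(z) = 1 + sum_k kappa_k z^k M(z)^k, solved degree by degree."""
    D = len(m) - 1

    def coeff_M_pow(k, n):
        # [z^n] M(z)^k where M(z) = sum_j m[j] z^j, by repeated convolution.
        poly = [1.0] + [0.0] * n
        for _ in range(k):
            poly = [sum(poly[i] * m[j - i] for i in range(j + 1) if j - i < len(m))
                    for j in range(n + 1)]
        return poly[n]

    kappa = [0.0] * (D + 1)
    for d in range(1, D + 1):
        kappa[d] = m[d] - sum(kappa[k] * coeff_M_pow(k, d - k) for k in range(1, d))
    return kappa

# Semicircle law: moments 1, 0, 1, 0, 2, 0, 5, 0, 14 (the Catalan numbers).
print(free_cumulants_from_moments([1, 0, 1, 0, 2, 0, 5, 0, 14]))
# -> only the second free cumulant is nonzero: kappa_2 = 1
```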
Math 289A: Lecture 15
Today we want to prove the "inverse Wigner theorem" we started discussing last time. Let $(A_N)_{N=1}^\infty$ be a sequence of traceless deterministic Hermitian matrices, and for each $N$ let

$$X_N = U_N A_N U_N^{-1}$$

be a uniformly random Hermitian matrix isospectral with $A_N$. Suppose that the empirical eigenvalue distribution of $\frac{1}{\sqrt{N}} A_N$ converges in moments as $N \to \infty$, meaning that the limit

$$m_d = \lim_{N\to\infty} \frac{1}{N} \operatorname{Tr} \left( \frac{A_N}{\sqrt{N}} \right)^d$$

exists for each $d \in \mathbb{N}$. Equivalently, we have

$$m_d = \lim_{N\to\infty} \frac{1}{N}\, p_d\!\left( \frac{A_{N1}}{\sqrt{N}}, \dots, \frac{A_{NN}}{\sqrt{N}} \right),$$

with $A_{N1},\dots,A_{NN}$ any enumeration of the eigenvalues of $A_N$ and $p_d$ the power sum polynomial of degree $d$. Let $X_{N1},\dots,X_{NN}$ denote the diagonal matrix elements of the random matrix $X_N$, so

$$X_{Ni} = \sum_{j=1}^N |U_N(i,j)|^2 A_{Nj},$$

with $U_N$ a uniformly random unitary matrix.
Theorem 15.1. For any fixed $n \in \mathbb{N}$, the random vector $(X_{N1},\dots,X_{Nn})$ converges in moments to a vector $(G_1,\dots,G_n)$ of iid centered Gaussians of variance $m_2$. More precisely, we have

$$\lim_{N\to\infty} \mathbb{E}\left[ X_{N1}^{\alpha_1} \cdots X_{Nn}^{\alpha_n} \right] = \mathbb{E}\left[ G_1^{\alpha_1} \cdots G_n^{\alpha_n} \right]$$

for any vector $(\alpha_1,\dots,\alpha_n)$ of nonnegative integers.
Here is a corollary which further explains why we are calling Theorem 15.1 an inverse Wigner theorem.

Corollary 15.1. For any $n \in \mathbb{N}$, the $n \times n$ principal submatrix of $X_N$ converges to $G_n$, i.e. to an $n \times n$ random Hermitian matrix following a centered Gaussian distribution of variance $m_2$.

Problem 15.1. Deduce Corollary 15.1 from Theorem 15.1.
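Theorem 15.1 is easy to test numerically. The sketch below (my own; it reuses the Haar sampling trick from earlier in the course) takes $A_N$ with eigenvalues $\pm\sqrt{N}$ in equal proportion, so that $\operatorname{Tr} A_N = 0$ and $m_2 = 1$, and estimates the covariance matrix of $(X_{N1}, X_{N2}, X_{N3})$, which should be close to the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 200, 2000

# Traceless data with eigenvalues +sqrt(N) and -sqrt(N): m_2 = 1.
A = np.concatenate([np.full(N // 2, np.sqrt(N)), np.full(N // 2, -np.sqrt(N))])

samples = np.empty((trials, 3))
for t in range(trials):
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    U = Q * (np.diag(R) / np.abs(np.diag(R)))   # Haar unitary
    X = (np.abs(U) ** 2) @ A                    # diagonal of U A U*
    samples[t] = X[:3]                          # first three diagonal entries

print(np.round(np.cov(samples.T), 2))  # approximately the 3x3 identity matrix
```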
The rest of the lecture consists of the proof of Theorem 15.1. First, let us explain the assumption that our deterministic data has trace zero. The random variables $X_{N1},\dots,X_{NN}$ are identically distributed (but not independent), with first moment

$$\mathbb{E}\, X_{Nk} = \frac{1}{N} \operatorname{Tr} A_N.$$

So, assuming $\operatorname{Tr} A_N = 0$ is equivalent to centering: $\mathbb{E}\, X_{Nk} = 0$.
Now we start the proof. The joint distribution of $(X_{N1},\dots,X_{NN})$ is a compactly supported probability measure on $\mathbb{R}^N$ — indeed, it is supported on the compact convex set whose extreme points are the distinct permutations of the deterministic vector $(A_{N1},\dots,A_{NN})$. Therefore, the Laplace transform

$$L_N(z, t_1, \dots, t_N) = \mathbb{E}\, e^{z(t_1 X_{N1} + \dots + t_N X_{NN})}$$

is an entire function of the complex variables $z, t_1, \dots, t_N$, i.e. $L_N$ is an analytic function on $\mathbb{C}^{N+1}$. Note that if we set $z = i$ and restrict the remaining variables $t_1,\dots,t_N$ to $\mathbb{R}^N$, the Laplace transform specializes to the characteristic function of $(X_{N1},\dots,X_{NN})$, so that it completely determines the distribution of this random vector. The variable $z$ is redundant, but it is a convenient marker of homogeneity: we write the Maclaurin expansion of $L_N$ as

$$L_N(z, t_1,\dots,t_N) = \sum_{d=0}^\infty \frac{z^d}{d!}\, M_d(t_1,\dots,t_N),$$

where $M_d$ is a homogeneous polynomial of degree $d$. Indeed, this polynomial is

$$M_d(t_1,\dots,t_N) = \mathbb{E}\left[ (t_1 X_{N1} + \dots + t_N X_{NN})^d \right],$$

which in the standard monomial basis of homogeneous polynomials is

$$M_d(t_1,\dots,t_N) = \sum_{\alpha} \binom{d}{\alpha_1,\dots,\alpha_N}\, \mathbb{E}\left[ X_{N1}^{\alpha_1} \cdots X_{NN}^{\alpha_N} \right] t_1^{\alpha_1} \cdots t_N^{\alpha_N},$$

the sum being over all vectors $\alpha$ of nonnegative integers whose coordinates sum to $d$, aka weak $N$-part compositions of $d$.
We call $M_d$ the degree $d$ moment polynomial of $(X_{N1},\dots,X_{NN})$, because its coefficients in the monomial basis are the degree $d$ joint moments of these random variables (up to a multinomial coefficient). So our objective is to understand the $N \to \infty$ behavior of this polynomial restricted to the $n$-dimensional subspace of $\mathbb{C}^N$ spanned by the first $n$ coordinate vectors, where $n$ is arbitrary but fixed. To be completely explicit, this is the polynomial

$$M_d(t_1,\dots,t_n,0,\dots,0) = \sum_{\alpha} \binom{d}{\alpha_1,\dots,\alpha_n}\, \mathbb{E}\left[ X_{N1}^{\alpha_1} \cdots X_{Nn}^{\alpha_n} \right] t_1^{\alpha_1} \cdots t_n^{\alpha_n},$$

the sum being over weak $n$-part compositions of $d$. This is an important simplification, since the number of variables of $M_d(t_1,\dots,t_n,0,\dots,0)$ is fixed, i.e. its domain does not vary with $N$.
In fact, it is more convenient to work with the log-Laplace transform of $(X_{N1},\dots,X_{NN})$. More precisely, since $L_N(0,\dots,0) = 1$, there exists a simply connected open neighborhood of the origin in $\mathbb{C}^{N+1}$ on which $L_N$ is nonvanishing, and there is a unique analytic function $\log L_N$ vanishing at the origin which satisfies

$$e^{\log L_N} = L_N.$$

Let

$$\log L_N(z, t_1,\dots,t_N) = \sum_{d=1}^\infty \frac{z^d}{d!}\, C_d(t_1,\dots,t_N)$$

be the Maclaurin expansion of this analytic function, so $C_d$ is a homogeneous degree $d$ polynomial. This is called the degree $d$ cumulant polynomial of the random variables $X_{N1},\dots,X_{NN}$. According to the Exponential Formula (aka moment-cumulant formula), these cumulants are recursively determined by the moment polynomials by a triangular system of equations, the first three of which are (suppressing the variables $t_1,\dots,t_N$)

$$M_1 = C_1, \qquad M_2 = C_2 + C_1^2, \qquad M_3 = C_3 + 3 C_2 C_1 + C_1^3.$$
We can write the polynomial $C_d$ in the same form as $M_d$, namely

$$C_d(t_1,\dots,t_N) = \sum_{\alpha} \binom{d}{\alpha_1,\dots,\alpha_N}\, \langle X_{N1}^{\alpha_1} \cdots X_{NN}^{\alpha_N} \rangle_c\, t_1^{\alpha_1} \cdots t_N^{\alpha_N},$$

except for the subscript on the expectation-like quantity $\langle X_{N1}^{\alpha_1} \dots X_{NN}^{\alpha_N}\rangle_c$, which is known as a degree $d$ joint cumulant of $X_{N1},\dots,X_{NN}$. Note that this is a definition, not a proposition – the relationship between the Maclaurin expansions of the Laplace transform and the log-Laplace transform implicitly defines these statistics.
Problem 15.2. Show that $\langle X_{N1} X_{N2} \rangle_c$ is the covariance of $X_{N1}$ and $X_{N2}$.

Our discussion so far pertains to any finite family of random variables with compactly supported joint distribution. Now we will start invoking special features of the family we are actually dealing with. The first of these is that these random variables are exchangeable, which is equivalent to saying that the cumulant polynomials $C_d$ are symmetric polynomials. This means that we can also expand them as linear combinations of Newton polynomials, and as previously discussed we have a powerful understanding of this expansion.
Theorem 15.2 (Leading Cumulants Formula). For any $d \in \mathbb{N}$ we have

$$C_d(t_1,\dots,t_N) = \sum_{\lambda \vdash d} \sum_{\mu \vdash d} (-1)^{\ell(\lambda)+\ell(\mu)}\, \frac{p_\lambda(t_1,\dots,t_N)\, p_\mu(A_{N1},\dots,A_{NN})}{N^{d-2+\ell(\lambda)+\ell(\mu)}}\, \vec{H}_{\lambda\mu}\!\left(\frac{1}{N^2}\right),$$

where each sum is over the set of integer partitions of $d$, and

$$\vec{H}_{\lambda\mu}\!\left(\frac{1}{N^2}\right) = \sum_{g=0}^\infty \frac{\vec{H}_g(\lambda,\mu)}{N^{2g}}$$

is a convergent power series in $\frac{1}{N^2}$ whose coefficients $\vec{H}_g(\lambda,\mu)$ are positive integers.
The combinatorial coefficients $\vec{H}_g(\lambda,\mu)$ are quite intricate and there is no simple closed formula for them (you can read this if you want to know more), but luckily we do not need to know very much about them in order to prove our inverse Wigner theorem. All that matters for us is that this magic formula expresses the cumulant polynomials of $X_{N1},\dots,X_{NN}$ in terms of the moments of the empirical eigenvalue distribution of our deterministic data matrices $A_N$ in a quite transparent way.
To leverage this, let us restrict to the cumulant polynomials of $(X_{N1},\dots,X_{Nn})$, which just means setting the last $N - n$ variables of $C_d$ equal to zero (this eventually makes sense because $n$ is fixed and $N$ increases without bound). We write the resulting polynomial in the form

$$C_d(t_1,\dots,t_n,0,\dots,0) = \frac{1}{N^{d-2}} \sum_{\lambda \vdash d} \sum_{\mu \vdash d} (-1)^{\ell(\lambda)+\ell(\mu)}\, p_\lambda(t_1,\dots,t_n)\, \frac{p_\mu(A_{N1},\dots,A_{NN})}{N^{\ell(\lambda)+\ell(\mu)}}\, \vec{H}_{\lambda\mu}\!\left(\frac{1}{N^2}\right).$$

Ignore the factor of $\frac{1}{N^{d-2}}$ for a moment and just think about the behavior of the double sum as $N \to \infty$. The number of terms in this sum is independent of $N$, which means that we only need to understand the asymptotics of each individual term

$$(-1)^{\ell(\lambda)+\ell(\mu)}\, p_\lambda(t_1,\dots,t_n)\, \frac{p_\mu(A_{N1},\dots,A_{NN})}{N^{\ell(\lambda)+\ell(\mu)}}\, \vec{H}_{\lambda\mu}\!\left(\frac{1}{N^2}\right).$$

Reading from right to left, we have that

$$\vec{H}_{\lambda\mu}\!\left(\frac{1}{N^2}\right) = \vec{H}_0(\lambda,\mu) + O\!\left(\frac{1}{N^2}\right)$$

as $N \to \infty$. By hypothesis, we have

$$p_\mu(A_{N1},\dots,A_{NN}) = N^{\ell(\mu)+\frac{d}{2}} \left( m_{\mu_1} \cdots m_{\mu_{\ell(\mu)}} + o(1) \right).$$

Finally, since $p_\lambda(t_1,\dots,t_n)$ does not depend on $N$, we have that each term is

$$O\!\left(N^{\frac{d}{2}-\ell(\lambda)}\right)$$

as $N \to \infty$. Thus, the whole double sum is

$$O\!\left(N^{\frac{d}{2}-1}\right)$$

as $N \to \infty$, so now is a good time to remember the prefactor $\frac{1}{N^{d-2}}$ and immediately conclude that

$$C_d(t_1,\dots,t_n,0,\dots,0) = O\!\left(N^{1-\frac{d}{2}}\right)$$

as $N \to \infty$, uniformly on compact subsets of $\mathbb{C}^n$, for all $d \geq 1$. For $d \geq 3$, this bound converges to zero. For $d = 2$ we have

$$\lim_{N\to\infty} C_2(t_1,\dots,t_n,0,\dots,0) = (m_2 - m_1^2)\, p_2(t_1,\dots,t_n),$$

uniformly on compact subsets of $\mathbb{C}^n$, and since $\operatorname{Tr} A_N = 0$ and hence $m_1 = 0$, this simplifies to

$$\lim_{N\to\infty} C_2(t_1,\dots,t_n,0,\dots,0) = m_2\, p_2(t_1,\dots,t_n).$$

Finally, the polynomial $C_1$ is identically zero by our trace zero hypothesis. What we have shown is that all joint cumulants of $X_{N1},\dots,X_{Nn}$ converge to zero as $N \to \infty$, except for pure cumulants of degree two,

$$\langle X_{Nk}^2 \rangle_c, \qquad 1 \leq k \leq n,$$

each of which converges to $m_2$.
Problem 15.3. Compute the joint cumulants of a finite family of centered iid Gaussians and complete the proof.
Math 289A: Lecture 14
In this lecture I wanted to connect our (admittedly meandering) discussion so far to the standard approach to limit theorems in random matrix theory. Let $X_N$ be a random Hermitian matrix with characteristic function

$$\mathbb{E}\, e^{i \operatorname{Tr}(H X_N)} = e^{-\frac{1}{2} \operatorname{Tr} H^2}, \qquad H \in \mathrm{H}_N.$$

This means precisely that the distribution of $X_N$ in the Euclidean space $\mathrm{H}_N$ of $N \times N$ Hermitian matrices with Hilbert-Schmidt scalar product is the standard Gaussian measure on this Euclidean space.

Definition 14.1. The Gaussian Unitary Ensemble is the sequence $(X_N)_{N=1}^\infty$.

The word "unitary" here means that the distribution of $X_N$ is unitarily invariant. Let $B_{N1} \geq \dots \geq B_{NN}$ be the eigenvalues of $X_N$.
Theorem 14.1 (Wigner). For each $d \in \mathbb{N}$, the limit

$$\lim_{N\to\infty} \mathbb{E}\left[ \frac{1}{N}\, p_d\!\left( \frac{B_{N1}}{\sqrt{N}}, \dots, \frac{B_{NN}}{\sqrt{N}} \right) \right]$$

exists and is given by

$$\lim_{N\to\infty} \mathbb{E}\left[ \frac{1}{N}\, p_d\!\left( \frac{B_{N1}}{\sqrt{N}}, \dots, \frac{B_{NN}}{\sqrt{N}} \right) \right] = \begin{cases} \mathrm{Cat}_{d/2}, & d \text{ even}, \\ 0, & d \text{ odd}. \end{cases}$$

In Theorem 14.1, $p_d$ is the power sum symmetric polynomial of degree $d$, and

$$\mathrm{Cat}_k = \frac{1}{k+1} \binom{2k}{k}$$

is the $k$th Catalan number. There are two things we should discuss: what the result means, and how to prove it.
Concerning the meaning of Theorem 14.1, the random variable

$$\frac{1}{N}\, p_d\!\left( \frac{B_{N1}}{\sqrt{N}}, \dots, \frac{B_{NN}}{\sqrt{N}} \right)$$

is the degree $d$ moment of the empirical eigenvalue distribution of $\frac{1}{\sqrt{N}} X_N$, which is the random discrete probability measure on $\mathbb{R}$ which places equal mass at each eigenvalue. Thus, Theorem 14.1 is saying that each moment of the empirical eigenvalue distribution of the scaled random matrix $\frac{1}{\sqrt{N}} X_N$ converges in expectation to an explicit limit. It is therefore natural to ask whether

$$1, 0, 1, 0, 2, 0, 5, 0, 14, \dots$$

is the moment sequence of a probability measure on $\mathbb{R}$. To answer this, one can recognize that the exponential generating function of this sequence is essentially a Bessel function. Then, a classical integral representation for Bessel functions allows one to determine that it is the moment sequence of the probability measure on $\mathbb{R}$ which is supported on $[-2, 2]$ with density

$$\frac{1}{2\pi} \sqrt{4 - x^2}.$$

You can find the details of this derivation here, towards the end of Lecture One.
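A quick numerical check (my own sketch) that the density $\frac{1}{2\pi}\sqrt{4-x^2}$ on $[-2,2]$ really has the Catalan numbers as its even moments:

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 200001)
density = np.sqrt(4.0 - x * x) / (2.0 * np.pi)

for d in range(9):
    moment = np.trapz(x ** d * density, x)
    print(d, round(moment, 4))   # 1, 0, 1, 0, 2, 0, 5, 0, 14
```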
Now we discuss how one can prove Theorem 14.1. The proof uses the fact that power sums in the eigenvalues of a matrix are also traces of powers of that matrix. This means that we have

$$p_d(B_{N1},\dots,B_{NN}) = \operatorname{Tr} X_N^d,$$

the point being that the right hand side is also a polynomial in the elements of the Gaussian random matrix $X_N$. Indeed, by matrix algebra we have

$$\operatorname{Tr} X_N^d = \sum_{f} X_N(f(1), f(2))\, X_N(f(2), f(3)) \cdots X_N(f(d), f(1)),$$

where the sum is over all functions $f \colon \{1,\dots,d\} \to \{1,\dots,N\}$. Using the fact that the matrix elements of $X_N$ are Gaussian and independent (up to selfadjointness), the following can be established.
Theorem 14.2 (Harer-Zagier). For each $d \in \mathbb{N}$ we have

$$\mathbb{E}\left[ \frac{1}{N}\, p_{2d}\!\left( \frac{B_{N1}}{\sqrt{N}}, \dots, \frac{B_{NN}}{\sqrt{N}} \right) \right] = \sum_{g=0}^\infty \frac{\varepsilon_g(2d)}{N^{2g}},$$

where $\varepsilon_g(2d)$ is the number of ways to glue the edges of a $2d$-gon in pairs so as to produce a compact orientable surface of genus $g$.

The proof is a beautiful piece of mathematics and a nice treatment can be found here. Theorem 14.2 implies Theorem 14.1 because the number of ways to glue a sphere from a polygon with $2d$ sides is

$$\varepsilon_0(2d) = \mathrm{Cat}_d,$$

and only the genus zero term survives in the $N \to \infty$ limit.
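The genus census in Theorem 14.2 can be generated by brute force for small polygons. In the sketch below (my own), an edge-pairing of the $2d$-gon is a fixed-point-free involution $\alpha$ of the edge set, the number of vertices of the glued surface is the number of cycles of $x \mapsto \alpha(\gamma(x))$ with $\gamma$ the rotation $(0\,1\,\cdots\,2d{-}1)$, and Euler's formula $V - E + F = 2 - 2g$ with $E = d$ and $F = 1$ gives the genus.

```python
from collections import Counter

def pairings(items):
    # All perfect matchings of a list of even size.
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, second in enumerate(rest):
        for rem in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, second)] + rem

def genus_census(two_d):
    census = Counter()
    for matching in pairings(list(range(two_d))):
        alpha = {}
        for a, b in matching:
            alpha[a], alpha[b] = b, a
        # Count cycles of the permutation x -> alpha(gamma(x)).
        seen, V = set(), 0
        for start in range(two_d):
            if start not in seen:
                V += 1
                x = start
                while x not in seen:
                    seen.add(x)
                    x = alpha[(x + 1) % two_d]
        # V - E + F = 2 - 2g with E = d edges and F = 1 face.
        census[(2 - (V - two_d // 2 + 1)) // 2] += 1
    return census

for d in range(1, 5):
    print(2 * d, dict(genus_census(2 * d)))
# 2 {0: 1}; 4 {0: 2, 1: 1}; 6 {0: 5, 1: 10}; 8 {0: 14, 1: 70, 2: 21}
```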
There is something unsatisfying about the above sequence of ideas. We know that the entire distribution of $X_N$ is completely determined by the joint distribution of its diagonal matrix elements $X_{N1},\dots,X_{NN}$, which are simply iid standard Gaussians. In fact, at this point in the course we know much more: the spectral analysis of unitarily invariant random Hermitian matrices is equivalent to the analysis of pairs $(X_{Nk})$ and $(B_{Nk})$ of triangular arrays of real random variables related by

$$X_{Ni} = \sum_{j=1}^N |U_N(i,j)|^2 B_{Nj},$$

where $U_N$ is a uniformly random unitary matrix. Initially, $X_{N1},\dots,X_{NN}$ were viewed as the diagonal matrix elements of a random Hermitian matrix with unitarily invariant distribution, and $B_{N1},\dots,B_{NN}$ were the eigenvalues of this selfadjoint matrix. But we can in fact forget about this original setup entirely, and simply be probabilists thinking about two randomly coupled triangular arrays of real random variables. The question then is how to recover Theorem 14.1 simply from the given fact that our $X$-array has rows $(X_{N1},\dots,X_{NN})$ of iid standard Gaussians.
This point of view is why the paper of Olshanski-Vershik is so relevant for our purposes, and so far ahead of its time: they are proving inverse theorems of this kind. More precisely, they are considering the case where the $B$-array is a given deterministic data set, and proving limit theorems about the corresponding $X$-array. A special case of their main result is the following "inverse Wigner theorem."

Theorem 14.3 (Olshanski-Vershik). Suppose that $\operatorname{Tr} B_N = 0$ for every $N$, and that the limit

$$m_d = \lim_{N\to\infty} \frac{1}{N}\, p_d\!\left( \frac{B_{N1}}{\sqrt{N}}, \dots, \frac{B_{NN}}{\sqrt{N}} \right)$$

exists for each $d \in \mathbb{N}$. Then, for any fixed $n$, the random variables $X_{N1},\dots,X_{Nn}$ converge in distribution to an $n$-tuple $(G_1,\dots,G_n)$ of iid centered Gaussians with variance $m_2$.

We have proved the one-dimensional version of this Gaussian limit theorem (the case $n = 1$), and we will finish the proof of the multivariate version next time (the case $n > 1$). In our approach to these inverse results of Olshanski-Vershik we are using a combinatorial result which is analogous to Theorem 14.2, but arguably even more miraculous: a universal genus expansion for joint cumulants of the $X$-array in our theory of unistochastically coupled triangular arrays.
Math 289A: Lecture 13
Let $(A_N)_{N=1}^\infty$ be a sequence of deterministic Hermitian matrices, and let $A_{N1},\dots,A_{NN}$ be any enumeration of the eigenvalues of $A_N$. Consider the corresponding sequence $(X_N)_{N=1}^\infty$ of random Hermitian matrices defined by

$$X_N = U_N A_N U_N^{-1},$$

where $U_N$ is a random unitary matrix whose distribution in the unitary group $\mathrm{U}(N)$ is Haar measure. We then have a corresponding triangular array of real random variables, whose $N$th row $X_{N1},\dots,X_{NN}$ consists of the diagonal elements of the random matrix $X_N$. In terms of our input data, we have

$$X_{Ni} = \sum_{j=1}^N |U_N(i,j)|^2 A_{Nj}.$$

This is superficially similar to the setup for the Central Limit Theorem, but it is different: $X_{N1},\dots,X_{NN}$ are exchangeable but not independent (make sure you understand why). Thus our first objective is simply to understand the $N \to \infty$ asymptotic distribution of $X_{N1}$, the $(1,1)$-matrix element of a uniformly random Hermitian matrix with spectrum $A_{N1},\dots,A_{NN}$.
In this lecture I will use the angled brackets notation favored by physicists to denote expectation. I will also use the angled bracket with a subscript "$c$" for cumulants (the subscript could also stand for connected). So for example the variance of $X_{N1}$ is

$$\langle X_{N1}^2 \rangle_c = \langle X_{N1}^2 \rangle - \langle X_{N1} \rangle^2.$$

Problem 13.1. Prove that

$$\langle X_{N1} \rangle = \frac{p_1(A_{N1},\dots,A_{NN})}{N}.$$

Also show that for $d \geq 2$ we have

$$\langle (X_{N1} + c)^d \rangle_c = \langle X_{N1}^d \rangle_c$$

for any constant $c$ (this translation invariance is a general property of cumulants).
Our Optimized Leading Cumulants Formula says that for any $d \in \mathbb{N}$ we have

$$\langle X_{N1}^d \rangle_c = \sum_{\lambda \vdash d} \sum_{\mu \vdash d} \varepsilon_{\lambda\mu}\, \frac{p_\mu(A_{N1},\dots,A_{NN})}{N^{d-2+\ell(\lambda)+\ell(\mu)}}\, \vec{H}_{\lambda\mu}\!\left(\frac{1}{N^2}\right),$$

where

$$\varepsilon_{\lambda\mu} = (-1)^{\ell(\lambda)+\ell(\mu)}$$

and

$$\vec{H}_{\lambda\mu}\!\left(\frac{1}{N^2}\right) = \sum_{g=0}^\infty \frac{\vec{H}_g(\lambda,\mu)}{N^{2g}}$$

is a convergent positive series.
What is nice about the Optimized Leading Cumulants Formula is that it almost immediately suggests the possibility of Gaussian limiting behavior: because of the factor $N^{-(d-2+\ell(\lambda)+\ell(\mu))}$ in the formula, it looks like cumulants of degree $d \geq 3$ should vanish in the $N \to \infty$ limit, which is the Gaussian signature. To actually establish this we need to determine how the rest of the formula behaves as $N$ grows large.
This is not so difficult. We have the finite sum

$$\langle X_{N1}^d \rangle_c = \sum_{\lambda \vdash d} \sum_{\mu \vdash d} \varepsilon_{\lambda\mu}\, \frac{p_\mu(A_{N1},\dots,A_{NN})}{N^{d-2+\ell(\lambda)+\ell(\mu)}}\, \vec{H}_{\lambda\mu}\!\left(\frac{1}{N^2}\right),$$

and the number of terms in the sum has no dependence on $N$. The quantity $\vec{H}_{\lambda\mu}(\frac{1}{N^2})$ is

$$\vec{H}_0(\lambda,\mu) + O\!\left(\frac{1}{N^2}\right)$$

as $N \to \infty$ (make sure you understand why). This leaves the fraction

$$\frac{p_\mu(A_{N1},\dots,A_{NN})}{N^{d-2+\ell(\lambda)+\ell(\mu)}}.$$

The denominator is literally just a power of $N$. As for the numerator, we have the bound

$$|p_\mu(A_{N1},\dots,A_{NN})| \leq N^{\ell(\mu)}\, \rho(A_N)^d,$$

where $\rho(A_N) = \max_{1 \leq k \leq N} |A_{Nk}|$ is the spectral radius of $A_N$. Thus, we have

$$\left| \frac{p_\mu(A_{N1},\dots,A_{NN})}{N^{d-2+\ell(\lambda)+\ell(\mu)}} \right| \leq \frac{\rho(A_N)^d}{N^{d-2+\ell(\lambda)}}.$$

We therefore have the cumulant bound

$$\left| \langle X_{N1}^d \rangle_c \right| \leq K_d\, \frac{\rho(A_N)^d}{N^{d-1}},$$

where $K_d$ depends only on $d$.
Theorem 13.1. If the input sequence of Hermitian matrices is such that $\operatorname{Tr} A_N = 0$ and

$$\rho(A_N) \leq C \sqrt{N}$$

for some constant $C > 0$ and all $N$, and if

$$\sigma^2 = \lim_{N\to\infty} \frac{p_2(A_{N1},\dots,A_{NN})}{N^2}$$

exists, then the centered random variable $X_{N1} - \langle X_{N1} \rangle$ converges to a centered Gaussian of variance $\sigma^2$.
Proof: By definition, the first cumulant of $X_{N1} - \langle X_{N1} \rangle$ is zero for every $N$, and from translation invariance of cumulants and the bound established above we have that all cumulants of degree $d \geq 3$ converge to zero:

$$\left| \langle X_{N1}^d \rangle_c \right| \leq K_d\, \frac{(C\sqrt{N})^d}{N^{d-1}} = K_d\, C^d\, N^{1-\frac{d}{2}} \longrightarrow 0.$$

From the Optimized Leading Cumulants Formula and the estimates above, the only $O(1)$ term in the second cumulant is exactly

$$\frac{p_2(A_{N1},\dots,A_{NN})}{N^2}.$$

Therefore, $X_{N1} - \langle X_{N1} \rangle$ converges to a centered Gaussian random variable of variance $\sigma^2$ as $N \to \infty$, and the convergence holds in distribution by the Moment Method.

-QED
Math 289A: Lecture 12
This week we will finish working through the Olshanski-Vershik limit theorem, and explain how the tools we have developed along the way have set us up to prove other kinds of limit theorems about random matrices.
Recall the setting of the OV paper: we have an infinite triangular array of real numbers,

$$A_{11}; \quad A_{21}, A_{22}; \quad A_{31}, A_{32}, A_{33}; \quad \dots$$

Assume the entries are sorted so that each row is non-increasing from left to right, and let $A_N = (A_{N1},\dots,A_{NN})$ be the $N$th row. Thinking of $A_N$ as a diagonal matrix, define a corresponding random Hermitian matrix by

$$X_N = U_N A_N U_N^{-1}.$$

We know that in the characteristic function of $X_N$ we can assume without loss in generality that the argument is diagonal, so what we are really looking at is the sequence of characteristic functions corresponding to our data set, given as

$$\mathbb{E}\, e^{i \operatorname{Tr}(T X_N)}, \qquad T = \operatorname{diag}(t_1,\dots,t_N),$$

or equivalently

$$\mathbb{E}\, e^{i (t_1 X_{N1} + \dots + t_N X_{NN})},$$

where the diagonal matrix elements $X_{N1},\dots,X_{NN}$ are exchangeable (but not independent) real random variables. Part I of the Olshanski-Vershik theorem is concerned with finding necessary and sufficient conditions on the triangular array such that the characteristic function of a single diagonal matrix element converges as $N \to \infty$.
Note that extracting the $(1,1)$-entry on either side of

$$X_N = U_N A_N U_N^{-1},$$

we have

$$X_{N1} = \sum_{j=1}^N |U_N(1,j)|^2 A_{Nj},$$

showing explicitly how $X_{N1}$ is built from the uniformly random unit vector $(U_N(1,1),\dots,U_N(1,N))$ in $\mathbb{C}^N$ and the data $A_{N1},\dots,A_{NN}$.
The characteristic function of $X_{N1}$ has the power series expansion

$$\mathbb{E}\, e^{it X_{N1}} = 1 + \sum_{d=1}^\infty \frac{(it)^d}{d!}\, m_d(N),$$

where

$$m_d(N) = \mathbb{E}\left[ X_{N1}^d \right]$$

are the moments of this real random variable. Olshanski-Vershik begin by finding sufficient conditions on the data such that, for each fixed $d \in \mathbb{N}$, the sequence of numbers $m_d(N)$ converges to a limit $m_d$ as $N \to \infty$. In other words, they begin by finding conditions on the data such that the random variable $X_{N1}$ converges in moments. We will do the same thing, but in a different way, using the following consequence of Theorem 10.1 in Lecture 10.
Theorem 12.1 (Leading Moments Formula). For each $d \in \mathbb{N}$ we have

$$m_d(N) = \frac{1}{N^d} \sum_{\lambda \vdash d} \sum_{\mu \vdash d} p_\mu(A_{N1},\dots,A_{NN}) \sum_{r=0}^\infty \frac{(-1)^r\, \vec{H}^{\,r}(\lambda,\mu)}{N^r},$$

where the inner sum is absolutely convergent and $\vec{H}^{\,r}(\lambda,\mu)$ is the number of $r$-step monotone walks on the symmetric group of degree $d$ which begin at a permutation of cycle type $\lambda$ and end at a permutation of cycle type $\mu$.
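The walk counts $\vec{H}^{\,r}(\lambda,\mu)$ are easy to enumerate by brute force in small degree. The sketch below (my own; it encodes a monotone walk as a sequence of transpositions $(s_1\,t_1),\dots,(s_r\,t_r)$ with $s_i < t_i$ and $t_1 \leq \dots \leq t_r$, and counts walks from every starting permutation of the given cycle type) tabulates them in degree $d = 3$.

```python
from itertools import permutations, product

d = 3
transpositions = [(s, t) for t in range(d) for s in range(t)]  # s < t

def cycle_type(perm):
    seen, parts = set(), []
    for start in range(d):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x, length = perm[x], length + 1
            parts.append(length)
    return tuple(sorted(parts, reverse=True))

def monotone_walk_count(r, lam, mu):
    count = 0
    for sigma in permutations(range(d)):
        if cycle_type(sigma) != lam:
            continue
        for steps in product(transpositions, repeat=r):
            if any(steps[i][1] > steps[i + 1][1] for i in range(r - 1)):
                continue  # second coordinates must be non-decreasing
            perm = list(sigma)
            for s, t in steps:
                perm[s], perm[t] = perm[t], perm[s]  # right-multiply by (s t)
            if cycle_type(tuple(perm)) == mu:
                count += 1
    return count

for lam, mu in [((3,), (3,)), ((3,), (1, 1, 1)), ((2, 1), (2, 1))]:
    print(lam, mu, [monotone_walk_count(r, lam, mu) for r in range(4)])
# Only r with the parity of l(lam) + l(mu) give nonzero counts, anticipating
# the Riemann-Hurwitz constraint appearing below.
```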
The most efficient way to use this result is to take a logarithm, i.e. to work with the cumulants of $X_{N1}$. This is also the path followed by Olshanski-Vershik, but the method they use to do the calculation is different. Let

$$\log \mathbb{E}\, e^{it X_{N1}} = \sum_{d=1}^\infty \frac{(it)^d}{d!}\, c_d(N)$$

be the log-characteristic function of $X_{N1}$, i.e. the cumulant generating function. The cumulant sequence $(c_d(N))_{d=1}^\infty$ contains the same information as the moment sequence $(m_d(N))_{d=1}^\infty$ via the moment-cumulant relations

$$m_d(N) = \sum_{\pi \in \mathsf{P}(d)} \prod_{B \in \pi} c_{|B|}(N),$$

where $\mathsf{P}(d)$ is the set of partitions of $\{1,\dots,d\}$. The cumulants encode the "connected" version of the combinatorial problem encoded by the moments.
Theorem 12.2 (Leading Cumulants Formula). For each $d \in \mathbb{N}$ we have

$$c_d(N) = \frac{1}{N^d} \sum_{\lambda \vdash d} \sum_{\mu \vdash d} p_\mu(A_{N1},\dots,A_{NN}) \sum_{r=0}^\infty \frac{(-1)^r\, \vec{H}^{\,r}_c(\lambda,\mu)}{N^r},$$

where $\vec{H}^{\,r}_c(\lambda,\mu)$ is the number of walks counted by $\vec{H}^{\,r}(\lambda,\mu)$ which have the additional feature that their steps and endpoints generate a transitive subgroup of the symmetric group.

The Leading Cumulants Formula is identical to the Leading Moments Formula except that the walk count $\vec{H}^{\,r}(\lambda,\mu)$ has been replaced with the connected walk count $\vec{H}^{\,r}_c(\lambda,\mu)$. This seemingly minor change is in fact very advantageous.
Theorem 12.3 (Riemann-Hurwitz Formula). The number $\vec{H}^{\,r}_c(\lambda,\mu)$ is nonzero if and only if there is an integer $g \geq 0$ such that

$$r = 2g - 2 + \ell(\lambda) + \ell(\mu).$$

Due to the Riemann-Hurwitz formula it is more convenient to parameterize the numbers $\vec{H}^{\,r}_c(\lambda,\mu)$ by the parameter $g$, i.e. to set

$$\vec{H}_g(\lambda,\mu) = \vec{H}^{\,2g-2+\ell(\lambda)+\ell(\mu)}_c(\lambda,\mu).$$
Theorem 12.4 (Optimized Leading Cumulants Formula). For each $d \in \mathbb{N}$ we have

$$c_d(N) = \sum_{\lambda \vdash d} \sum_{\mu \vdash d} (-1)^{\ell(\lambda)+\ell(\mu)}\, \frac{p_\mu(A_{N1},\dots,A_{NN})}{N^{d-2+\ell(\lambda)+\ell(\mu)}} \sum_{g=0}^\infty \frac{\vec{H}_g(\lambda,\mu)}{N^{2g}}.$$
Now we are in position to prove some interesting limit theorems. We begin by proving one such theorem corresponding to an explicit choice of the dataset (this corresponds to Remark 4.5 in Olshanski-Vershik). Let $c > 0$ be a positive constant, and suppose our dataset has $N$th row given by

$$A_{Nk} = \begin{cases} c\sqrt{N}, & 1 \leq k \leq \lceil N/2 \rceil, \\ -c\sqrt{N}, & \lceil N/2 \rceil < k \leq N, \end{cases}$$

where the number of positive and negative entries is as near to equal as possible – equal if $N$ is even and differing by $1$ if $N$ is odd, where we assume the data favors positive numbers in the latter case. In this situation we can establish the following $N \to \infty$ limit theorem for a single diagonal matrix element of $X_N = U_N A_N U_N^{-1}$.

Theorem 12.5 (Gaussian Limit Theorem). Any diagonal matrix element of $X_N$ converges in distribution to a centered Gaussian random variable of variance $c^2$.
Proof: Let $\mu$ be a partition of $d$. Then, we have

$$|p_{\mu_i}(A_{N1},\dots,A_{NN})| \leq N\, (c\sqrt{N})^{\mu_i} = c^{\mu_i}\, N^{1+\frac{\mu_i}{2}}$$

for each part $\mu_i$, and therefore

$$|p_\mu(A_{N1},\dots,A_{NN})| \leq c^d\, N^{\ell(\mu)+\frac{d}{2}}.$$

Thus, from the Optimized Leading Cumulants Formula we have

$$|c_d(N)| \leq K_d\, c^d\, N^{1-\frac{d}{2}},$$

showing that $c_d(N)$ converges to zero for all $d \geq 3$. This means that $X_{N1}$ converges in cumulants (equivalently, in moments) to a Gaussian random variable whose mean and variance remain to be determined.

The first cumulant of $X_{N1}$ is

$$c_1(N) = \frac{p_1(A_{N1},\dots,A_{NN})}{N} \in \left\{ 0, \frac{c}{\sqrt{N}} \right\},$$

and this converges to $0$ as $N \to \infty$. For the second cumulant, the only term in the Optimized Leading Cumulants Formula which survives in the limit is the term $\lambda = \mu = (2)$, and

$$\frac{p_2(A_{N1},\dots,A_{NN})}{N^2} = \frac{N \cdot c^2 N}{N^2} = c^2.$$

Thus, we have shown that $X_{N1}$ converges in moments/cumulants to a centered Gaussian of variance $c^2$. In fact, this implies convergence in distribution as $N \to \infty$, thanks to the Method of Moments.

-QED
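A quick Monte Carlo check of Theorem 12.5 (my own sketch): with $c = 1$ and the two-atom dataset above, the empirical variance of $X_{N1}$ should approach $1$, and the empirical excess kurtosis (the normalized fourth cumulant) should approach $0$.

```python
import numpy as np

rng = np.random.default_rng(2)
c, N, trials = 1.0, 100, 5000
A = c * np.sqrt(N) * np.concatenate([np.ones(N // 2), -np.ones(N // 2)])

X1 = np.empty(trials)
for t in range(trials):
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    U = Q * (np.diag(R) / np.abs(np.diag(R)))
    X1[t] = (np.abs(U[0]) ** 2) @ A        # X_{N1}, the (1,1) entry of U A U*

var = X1.var()
kurt = ((X1 - X1.mean()) ** 4).mean() / var ** 2 - 3.0
print(round(var, 3), round(kurt, 3))       # close to 1 and 0
```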
Math 289A: Lecture 10
We start with a brief recap of what we are trying to achieve – since things are getting complicated it might be helpful to step back for a moment before stepping further in.
Let $\mathrm{H}_N$ be the Euclidean space of $N \times N$ Hermitian matrices, and let $\mathrm{U}(N)$ be the compact group of $N \times N$ unitary matrices. We have a group action

$$\mathrm{U}(N) \longrightarrow \mathrm{O}(\mathrm{H}_N),$$

where $\mathrm{O}(\mathrm{H}_N)$ is the orthogonal group of the Euclidean space $\mathrm{H}_N$, defined by

$$U \cdot A = U A U^{-1}.$$

This group action is called unitary conjugation. Let

$$\mathcal{O}_A = \{ U A U^{-1} \colon U \in \mathrm{U}(N) \}$$

denote the orbit of a given Hermitian matrix $A$ under unitary conjugation.
Theorem (Spectral Theorem). For every $A \in \mathrm{H}_N$, the orbit $\mathcal{O}_A$ contains a diagonal matrix, which is unique up to permutations of its diagonal elements.

Let

$$\mathrm{W}_N = \{ (a_1,\dots,a_N) \in \mathbb{R}^N \colon a_1 \geq a_2 \geq \dots \geq a_N \}.$$

We will refer to this orthant of $\mathbb{R}^N$ as the Weyl chamber. This term comes from Lie theory – we are not doing Lie theory, we just need a name for $\mathrm{W}_N$. By the Spectral Theorem, the orbits of $\mathrm{H}_N$ under unitary conjugation are indexed by points of the Weyl chamber: we set

$$\mathcal{O}_a = \{ U a U^{-1} \colon U \in \mathrm{U}(N) \}, \qquad a \in \mathrm{W}_N,$$

where on the right hand side we are viewing $a$ as a diagonal matrix.
A standard problem in linear algebra is the following: we are given a matrix $H \in \mathrm{H}_N$, meaning that we are explicitly given all entries of this matrix, and from this information we are asked to find $a \in \mathrm{W}_N$ such that

$$H \in \mathcal{O}_a.$$

The Spectral Theorem guarantees that this problem has a unique solution. The coordinates of the solution $a = (a_1,\dots,a_N)$ are the eigenvalues of the given matrix $H$. We are interested in solving a generic version of the inverse problem: given $a \in \mathrm{W}_N$, determine the matrix elements of a uniformly random point

$$X \in \mathcal{O}_a.$$

Since $a$ is given, we are being asked about the random Hermitian matrix

$$X = U a U^{-1},$$

where $U$ is a uniformly random unitary matrix from $\mathrm{U}(N)$. This is something we know how to define precisely (Haar measure).
We want to solve this inverse problem by computing the characteristic function of $X$, which is

$$\mathbb{E}\, e^{i \operatorname{Tr}(H X)}, \qquad H \in \mathrm{H}_N.$$

We know that it is in fact sufficient to do this for $H$ a diagonal Hermitian matrix, and we have introduced the notation

$$K_N(t, a) = \mathbb{E}\, e^{i \operatorname{Tr}(T X)}, \qquad T = \operatorname{diag}(t_1,\dots,t_N),$$

in order to emphasize the fact that we can view this characteristic function as defining a bounded symmetric kernel

$$K_N \colon \mathbb{R}^N \times \mathrm{W}_N \longrightarrow \mathbb{C}.$$

Furthermore, $K_N(t,a)$ is actually the characteristic function of the diagonal

$$(X_{11}, X_{22}, \dots, X_{NN})$$

of $X = U a U^{-1}$, so a solution to our problem is the same thing as determining the distribution of the diagonal of a uniformly random Hermitian matrix with prescribed eigenvalues.
We can view $K_N(t,a)$ as a generating function for the joint moments of the diagonal matrix elements $X_{11},\dots,X_{NN}$, meaning that

$$K_N(t, a) = 1 + \sum_{d=1}^\infty \frac{i^d}{d!} \sum_{\alpha} \binom{d}{\alpha_1,\dots,\alpha_N}\, \mathbb{E}\left[ X_{11}^{\alpha_1} \cdots X_{NN}^{\alpha_N} \right] t_1^{\alpha_1} \cdots t_N^{\alpha_N},$$

where the internal sum runs over the set of weak $N$-part compositions of the number $d$, which are simply vectors $\alpha = (\alpha_1,\dots,\alpha_N)$ of nonnegative integers whose coordinates sum to $d$. The coefficient corresponding to a given $\alpha$ is a very simple quantity, namely the multinomial coefficient

$$\binom{d}{\alpha_1,\dots,\alpha_N} = \frac{d!}{\alpha_1! \cdots \alpha_N!}.$$

But, it is Haard (sorry, couldn't resist) to see how to directly compute all mixed moments of the random variables $X_{11},\dots,X_{NN}$. Indeed, we actually have a formula for each of these random variables, namely

$$X_{ii} = \sum_{j=1}^N |U(i,j)|^2 a_j,$$

which is obtained from $X = U a U^{-1}$ simply by matrix multiplication. This formula more or less shows that a direct computation of, say, the moment sequence of just $X_{11}$ involves computing a family of Haar integrals over $\mathrm{U}(N)$ which we don't know how to contend with.
But $K_N(t,a)$ is a moment generating function in another sense: we have

$$K_N(t, a) = 1 + \sum_{d=1}^\infty \frac{i^d}{d!} \sum_{\lambda, \mu} c_{\lambda\mu}(N)\, p_\lambda(t_1,\dots,t_N)\, p_\mu(a_1,\dots,a_N),$$

where the internal (double) sum is over the set of Young diagrams with $d$ cells and first row of length at most $N$; such diagrams are pictorial representations of partitions of $d$ with largest part at most $N$. We know that such an expansion exists by a "soft" symmetry argument combined with a classical theorem of Newton: we use that $K_N(t,a)$ is invariant under independent permutations of the coordinates of the vectors $t$ and $a$. The moments involved in this expansion are the moments of the (non-normalized) empirical eigenvalue distributions of these vectors, i.e. the discrete measures

$$\sum_{k=1}^N \delta_{t_k} \quad \text{and} \quad \sum_{k=1}^N \delta_{a_k}$$

on $\mathbb{R}$ which place unit mass at each coordinate. The moments of these measures are exactly sums of powers of the coordinates of $t$ and $a$.
This leaves the problem of computing the coefficients $c_{\lambda\mu}(N)$ in this nonstandard moment generating function. This can be done using representation theory – in fact one needs to know quite a bit about representations of the unitary and symmetric groups, as well as about the relationship between them. We will mostly black box this to avoid going too deeply into algebra (though an indication of what needs to be done was given in the previous lecture, where we discussed Schur polynomials). Instead, we will focus on two facts: first, the coefficients $c_{\lambda\mu}(N)$ have an intrinsically interesting combinatorial interpretation, which is something that often happens with moment generating functions of interesting random variables (we discussed the example of Gaussians in lecture); second, this combinatorial interpretation will allow us to prove probabilistically meaningful limit theorems. The simplest such theorem is the result of Olshanski and Vershik on the $N \to \infty$ limit joint distribution of

$$X_{11}, X_{22}, \dots, X_{nn},$$

where $n$ is arbitrary but fixed (you should be actively reading the Olshanski-Vershik paper while we are doing this).
Theorem 10.1. For every integer $d \geq 1$ and every pair of Young diagrams $\lambda, \mu$ with $d$ cells and largest part at most $N$, we have

$$c_{\lambda\mu}(N) = \frac{1}{N^d} \sum_{r=0}^\infty \frac{(-1)^r\, \vec{H}^{\,r}(\lambda,\mu)}{N^r},$$

where the series is absolutely convergent and $\vec{H}^{\,r}(\lambda,\mu)$ is the number of monotone $r$-step walks on the Cayley graph of the symmetric group $S(d)$ which begin at a permutation of cycle type $\lambda$ and end at a permutation of cycle type $\mu$.

Clearly there is much to be unpacked here, but ultimately it is not so complicated, and moreover only a small amount of the information contained in this statement is needed to prove limit theorems about random matrices. A very basic observation is that in the range $N \geq d$, the indexing set above is simply the set $Y_d$ of all Young diagrams with exactly $d$ cells.
Now we have to explain what we mean by the Cayley graph of the symmetric group, and what we mean by a monotone walk on this graph. For this I am linking to a paper which hopefully provides a reasonably clear account of this combinatorial construction (all you need to read is Sections 2.1 and 2.2, just a couple of pages). After you have read this you will know what the Cayley graph of the symmetric group is, and what is meant by a monotone or strictly monotone walk on this graph. Furthermore, you will know the remarkable fact that there is a unique strictly monotone walk between any two permutations, and this walk is a geodesic.
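The uniqueness claim is easy to test by exhaustive search in small degree. The sketch below (my own) enumerates all strictly monotone walks, i.e. transposition sequences $(s_1\,t_1),\dots,(s_r\,t_r)$ with $s_i < t_i$ and $t_1 < t_2 < \dots < t_r$, starting from the identity in $S(4)$, and checks that every permutation is reached by exactly one such walk (by invariance of the Cayley graph, this settles the claim for every pair of permutations).

```python
from itertools import permutations, combinations

d = 4
# Listed in increasing order of the larger element t.
transpositions = [(s, t) for t in range(d) for s in range(t)]

count = {p: 0 for p in permutations(range(d))}
# A strictly monotone walk has distinct, increasing larger elements, so it is
# a subsequence of the list above with pairwise distinct values of t.
for r in range(d):
    for steps in combinations(transpositions, r):
        tops = [t for (s, t) in steps]
        if len(set(tops)) != len(tops):
            continue
        perm = list(range(d))
        for s, t in steps:
            perm[s], perm[t] = perm[t], perm[s]   # right-multiply by (s t)
        count[tuple(perm)] += 1

print(all(v == 1 for v in count.values()))  # True: unique walk to each target
```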
As explained in lecture, it is often the case that the logarithm of the characteristic function is easier to work with than the characteristic function itself – it is a generating function for cumulants rather than moments. Let us consider the logarithm of the characteristic function of the diagonal $(X_{11}, X_{22}, \dots, X_{NN})$.
Math 289A: Lecture 9
This week we are looking at a landmark paper by Grigori Olshanski and Anatoly Vershik, "Ergodic Unitarily Invariant Measures on the Space of Infinite Hermitian Matrices." This paper is among the first to consider characteristic functions in the context of random matrix theory. Namely, it considers the problem of approximating the function

$$K_N(A, B) = \int_{\mathrm{U}(N)} e^{i \operatorname{Tr}(A U B U^{-1})}\, \mathrm{d}U,$$

where the integration is over the unitary group of rank $N$ against Haar measure. We have referred to this function as the "quantum Fourier kernel," but as we have stressed it is also the classical characteristic function of the diagonal

$$(X_{11}, X_{22}, \dots, X_{NN})$$

of $X = U B U^{-1}$, a uniformly random Hermitian matrix with deterministic eigenvalues $B_1 \geq \dots \geq B_N$. The paper of Olshanski-Vershik is about approximating $K_N$ as $N \to \infty$, which probabilistically means that we are trying to understand how the joint distribution of the diagonal matrix elements of a very large uniformly random Hermitian matrix with prescribed eigenvalues is determined by the vector

$$(B_1, \dots, B_N).$$

This is an example of a high-dimensional problem: our goal is to approximate a function of $2N$ real variables as the number of variables goes to infinity. Typically such problems are much harder than traditional problems of asymptotic analysis, which consider the approximation of some parameter-dependent family of functions defined on a fixed domain as the parameter approaches a given value.
One way to simplify the question is to consider the restriction of $K_N$ to the subspace where

$$A = \operatorname{diag}(t_1,\dots,t_n,0,\dots,0),$$

with $n$ fixed. That is, we consider the characteristic function of only the first $n$ diagonal matrix elements

$$X_{11}, X_{22}, \dots, X_{nn}$$

of the random Hermitian matrix $X = U B U^{-1}$. Recall that we have previously shown that all diagonal matrix elements of $X$ are identically distributed, but not necessarily independent. The paper of Olshanski and Vershik shows that if the eigenvalues $B_1,\dots,B_N$, scaled by a factor of $\frac{1}{N}$, converge to a point configuration on $\mathbb{R} \setminus \{0\}$ as $N \to \infty$, then the random variables $X_{11},\dots,X_{nn}$ converge to independent random variables whose characteristic function can be calculated explicitly in terms of this point configuration. We will try to work through this result, which I believe to be the first high-dimensional limit theorem in RMT obtained via characteristic functions.
The Olshanski-Vershik approach of computing the asymptotics of the characteristic function $K_N$ is based on a series expansion of this matrix integral. I am a big proponent of this approach, but it does require some quite extensive background from representation theory and algebraic combinatorics which is not traditionally found in the realm of probability. We will try not to get too bogged down in the algebra, and in order to do this it will be necessary to accept some results from representation theory as black boxes.
First, let us analytically continue the characteristic function to a function

$$K_N \colon \mathbb{C} \times \mathbb{C}^N \times \mathbb{C}^N \longrightarrow \mathbb{C}$$

by declaring

$$K_N(z, A, B) = \int_{\mathrm{U}(N)} e^{z \operatorname{Tr}(A U B U^{-1})}\, \mathrm{d}U,$$

where $z$ is a complex number and $A = \operatorname{diag}(a_1,\dots,a_N)$, $B = \operatorname{diag}(b_1,\dots,b_N)$ are complex vectors. Thus, by compactness of $\mathrm{U}(N)$, we have that $K_N(z,A,B)$ is an entire function of the $2N+1$ complex variables

$$z, a_1, \dots, a_N, b_1, \dots, b_N,$$

and the Maclaurin series of this function, which is a multivariate Taylor series centered at the origin, converges uniformly absolutely on compact subsets of $\mathbb{C}^{2N+1}$. We recover our original characteristic function by setting $z = i$ and restricting $A$ and $B$ to $\mathbb{R}^N$.
The analytic function $K_N(z,A,B)$ has various symmetries that allow us to present its Maclaurin series in a more advantageous form. More precisely, $K_N(z,A,B)$ is invariant under independent permutations of $a_1,\dots,a_N$ and $b_1,\dots,b_N$, and also stable under swapping these two sets of variables. This means that the Maclaurin series of $K_N(z,A,B)$ can be presented in the form

$$K_N(z, A, B) = \sum_{d=0}^\infty \frac{z^d}{d!}\, P_d(A, B),$$

where $P_d(A,B)$ is a homogeneous degree $d$ symmetric polynomial in $a_1,\dots,a_N$, and also a homogeneous degree $d$ symmetric polynomial in $b_1,\dots,b_N$. Therefore, by Newton's theorem on symmetric polynomials, we can write this polynomial as

$$P_d(A, B) = \sum_{\lambda, \mu \in Y_d} c_{\lambda\mu}(N)\, p_\lambda(A)\, p_\mu(B),$$

where $Y_d$ is the set of integer partitions of the degree $d$, which we identify with the set of Young diagrams with $d$ cells, and $p_\lambda$ and $p_\mu$ are the Newton power-sum symmetric polynomials corresponding to $\lambda$ and $\mu$. For example, if $\lambda = (2,1)$ and $\mu = (1,1,1)$, then

$$p_\lambda(A) = p_2(A)\, p_1(A) = \left( \sum_{k=1}^N a_k^2 \right)\left( \sum_{k=1}^N a_k \right), \qquad p_\mu(B) = p_1(B)^3 = \left( \sum_{k=1}^N b_k \right)^3.$$

Our objective then is to calculate the numerical coefficients $c_{\lambda\mu}(N)$ of the polynomial $P_d(A,B)$, which we will refer to as its Newton coefficients.
Problem 9.1. Show that

$$\int_{\mathrm{U}(N)} |U(i,j)|^2\, \mathrm{d}U = \frac{1}{N},$$

and use this to show that

$$P_1(A, B) = \frac{1}{N}\, p_1(A)\, p_1(B).$$
What is good about the Newton polynomials is that they connect us to point configurations, which is what we want. More precisely, if $a = (a_1,\dots,a_N)$ is a real vector, then we can view it not as a point in $N$-dimensional space but as a configuration of $N$ particles

$$a_1, a_2, \dots, a_N$$

on the real line. The empirical distribution of $a$ is the discrete measure

$$\sum_{k=1}^N \delta_{a_k}$$

on $\mathbb{R}$ which places unit mass at each particle, and the degree $d$ moment of this measure is

$$p_d(a_1,\dots,a_N) = \sum_{k=1}^N a_k^d.$$

This means that writing the Maclaurin expansion of $K_N(z,A,B)$ in terms of Newton symmetric polynomials is exactly what we want to do, because it is expressing this function in terms of the empirical distributions of its arguments $A$ and $B$, and the empirical distribution of a point configuration is a natural way to describe the "shape" of the configuration in the infinite particle limit where $N \to \infty$.
Unfortunately, there is no obvious way to calculate the Newton coefficients $c_{\lambda\mu}(N)$. What we can do is compute the Maclaurin series of $K_N(z,A,B)$ by repeatedly differentiating under the integral sign with respect to $z$. Equivalently, we just expand the exponential function in the integrand and then integrate term-by-term, which gives

$$K_N(z, A, B) = \sum_{d=0}^\infty \frac{z^d}{d!} \int_{\mathrm{U}(N)} \left( \operatorname{Tr}(A U B U^{-1}) \right)^d \mathrm{d}U.$$

This gives us the formula

$$P_d(A, B) = \int_{\mathrm{U}(N)} \left( \operatorname{Tr}(A U B U^{-1}) \right)^d \mathrm{d}U,$$

which at least reduces our initial problem of computing an exponential integral over $\mathrm{U}(N)$ to computing a polynomial integral over $\mathrm{U}(N)$, but still there is no obvious path forward.
At this point we should ask ourselves what integrals over $\mathrm{U}(N)$ we actually can compute. To make the discussion as simple as possible, take $N = 1$, so that $\mathrm{U}(1)$ is just the unit circle in $\mathbb{C}$. One thing we can confidently state is the orthogonality relation

$$\int_{\mathrm{U}(1)} u^j \bar{u}^k\, \mathrm{d}u = \delta_{jk}.$$

Representation theory gives us analogous formulas for all $N$, namely a system of multivariate polynomials which are orthogonal polynomials for integration with respect to Haar measure on $\mathrm{U}(N)$. Let

$$Y(N)$$

be the set of all Young diagrams with at most $N$ rows.
Definition 9.1. For each $\lambda \in Y(N)$, the corresponding Schur symmetric polynomial is defined by

$$s_\lambda(u_1,\dots,u_N) = \frac{\det\left[ u_i^{\lambda_j + N - j} \right]_{i,j=1}^N}{\det\left[ u_i^{N-j} \right]_{i,j=1}^N},$$

where $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_N \geq 0$ are the row lengths of $\lambda$.

Out of the many equivalent definitions of Schur polynomials, we are taking this one because it is a fairly explicit and elementary formula. Indeed, the determinant in the denominator is the Vandermonde determinant, which has the property that it divides any other antisymmetric polynomial in $u_1,\dots,u_N$ to yield a symmetric polynomial.

Problem 9.2. Let $\lambda = (1)$ be the single-cell Young diagram. Compute the Schur polynomial $s_{(1)}(u_1,\dots,u_N)$ directly from Definition 9.1. Do the same for a few more small Young diagrams.
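If you want to experiment with Definition 9.1 beyond Problem 9.2, the bialternant formula is easy to implement with exact arithmetic. The sketch below (my own) uses sympy and checks, for instance, that $s_{(1)} = u_1 + \dots + u_N$ and that $s_{(2,1)}(u_1, u_2) = u_1^2 u_2 + u_1 u_2^2$.

```python
import sympy as sp

def schur(lam, n):
    """Schur polynomial s_lambda in n variables via the bialternant formula."""
    u = sp.symbols(f"u1:{n + 1}")
    lam = list(lam) + [0] * (n - len(lam))          # pad with zero rows
    num = sp.Matrix(n, n, lambda i, j: u[i] ** (lam[j] + n - 1 - j))
    den = sp.Matrix(n, n, lambda i, j: u[i] ** (n - 1 - j))  # Vandermonde
    return sp.factor(sp.cancel(num.det() / den.det()))

print(schur((1,), 3))    # u1 + u2 + u3
print(schur((2, 1), 2))  # u1*u2*(u1 + u2)
```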
The disadvantage of Definition 9.1 is that it makes the orthogonality relations for Schur polynomials, which are really consequences of the fact that these polynomials are characters of irreducible representations of $\mathrm{U}(N)$, into a theorem whose proof would take us too far afield. So we will have to treat this property as a black box. Given a matrix $M$, let us write

$$s_\lambda(M)$$

for the evaluation of the Schur polynomial $s_\lambda$ on the eigenvalues of $M$.

Theorem 9.1 (Schur orthogonality). For any two Young diagrams $\lambda, \mu \in Y(N)$ and any two matrices $A, B$, we have

$$\int_{\mathrm{U}(N)} s_\lambda(A U B U^{-1})\, \mathrm{d}U = \frac{s_\lambda(A)\, s_\lambda(B)}{\mathrm{Dim}_N \lambda}$$

and

$$\int_{\mathrm{U}(N)} s_\lambda(A U)\, s_\mu(B U^{-1})\, \mathrm{d}U = \delta_{\lambda\mu}\, \frac{s_\lambda(A B)}{\mathrm{Dim}_N \lambda},$$

where $\mathrm{Dim}_N \lambda = s_\lambda(1^N)$ is the number of semistandard Young tableaux of shape $\lambda$ with entries from $\{1, \dots, N\}$.
We also need a remarkable "multinomial formula" due to Frobenius. Let $Y_d(N)$ be the set of Young diagrams with exactly $d$ cells and at most $N$ rows.

Theorem 9.2 (Frobenius formula). We have

$$(u_1 + \dots + u_N)^d = \sum_{\lambda \in Y_d(N)} \dim\lambda\; s_\lambda(u_1,\dots,u_N),$$

where $\dim\lambda$ is the number of standard Young tableaux of shape $\lambda$.

The reason the tableaux counts $\dim\lambda$ and $\mathrm{Dim}_N\lambda$ are denoted this way is that the former is the dimension of an irreducible representation of the symmetric group $S(d)$, and the latter is the dimension of an irreducible representation of the unitary group $\mathrm{U}(N)$.
At this point we have enough material to explicitly calculate a series expansion of $K_N(z,A,B)$. This expansion is the starting point of Olshanski and Vershik's asymptotic analysis, and is stated as Theorem 5.1 in their paper.

Theorem 9.3 (Schur expansion of $K_N$). We have

$$K_N(z, A, B) = \sum_{d=0}^\infty \frac{z^d}{d!} \sum_{\lambda \in Y_d(N)} \dim\lambda\; \frac{s_\lambda(A)\, s_\lambda(B)}{\mathrm{Dim}_N \lambda}.$$

Proof: From Theorem 9.2, we have

$$P_d(A, B) = \int_{\mathrm{U}(N)} \left( \operatorname{Tr}(A U B U^{-1}) \right)^d \mathrm{d}U = \sum_{\lambda \in Y_d(N)} \dim\lambda \int_{\mathrm{U}(N)} s_\lambda(A U B U^{-1})\, \mathrm{d}U,$$

and applying the first integration formula in Theorem 9.1 this becomes

$$P_d(A, B) = \sum_{\lambda \in Y_d(N)} \dim\lambda\; \frac{s_\lambda(A)\, s_\lambda(B)}{\mathrm{Dim}_N \lambda}.$$

– QED
Problem 9.3. Compute the Schur expansion of $K_2(z, A, B)$ up to degree $d = 3$. This is straightforward but will give you a feel for the structure of the formula.
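Theorem 9.3 can also be checked numerically: the sketch below (my own; it combines the Haar sampler from earlier lectures with the hook length and hook content formulas for $\dim\lambda$ and $\mathrm{Dim}_N\lambda$) compares a Monte Carlo estimate of $K_2(1, A, B)$ with a truncation of the Schur expansion.

```python
import numpy as np
from math import factorial

def partitions(d, max_part=None):
    max_part = d if max_part is None else max_part
    if d == 0:
        yield ()
        return
    for first in range(min(d, max_part), 0, -1):
        for rest in partitions(d - first, first):
            yield (first,) + rest

def hooks(lam):
    if not lam:
        return []
    conj = [sum(1 for r in lam if r > j) for j in range(lam[0])]
    return [lam[i] - j + conj[j] - i - 1 for i in range(len(lam)) for j in range(lam[i])]

def dim_sym(lam):   # dim(lambda): hook length formula
    return factorial(sum(lam)) // int(np.prod(hooks(lam))) if lam else 1

def dim_un(lam, N):  # Dim_N(lambda): hook content formula
    cells = [(i, j) for i in range(len(lam)) for j in range(lam[i])]
    num = np.prod([N + j - i for i, j in cells]) if cells else 1.0
    return num / (np.prod(hooks(lam)) if lam else 1.0)

def schur2(lam, a, b):  # bialternant formula in N = 2 variables
    l1 = lam[0] if lam else 0
    l2 = lam[1] if len(lam) > 1 else 0
    return (a ** (l1 + 1) * b ** l2 - b ** (l1 + 1) * a ** l2) / (a - b)

N, z = 2, 1.0
a, b = (0.7, -0.3), (0.5, 0.2)
A, B = np.diag(a), np.diag(b)

series = sum(z ** d / factorial(d)
             * sum(dim_sym(l) * schur2(l, *a) * schur2(l, *b) / dim_un(l, N)
                   for l in partitions(d) if len(l) <= N)
             for d in range(16))

rng = np.random.default_rng(3)
trials, total = 100000, 0.0
for _ in range(trials):
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    U = Q * (np.diag(R) / np.abs(np.diag(R)))
    total += np.exp(z * np.trace(A @ U @ B @ U.conj().T).real)

print(series, total / trials)  # the two estimates should agree closely
```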
Let us include the analogue of Theorem 9.3 for the quantum Bessel kernel

$$K_{N,M}(A, B) = \int_{\mathrm{U}(M) \times \mathrm{U}(N)} e^{i \operatorname{Re} \operatorname{Tr}(A U B V)}\, \mathrm{d}U\, \mathrm{d}V,$$

where $N \leq M$ and the integration is against Haar measure on the product group $\mathrm{U}(M) \times \mathrm{U}(N)$. Here

$$A = (a_1, \dots, a_N), \qquad B = (b_1, \dots, b_N)$$

are real vectors viewed as rectangular diagonal matrices in $\mathbb{C}^{N \times M}$ and $\mathbb{C}^{M \times N}$, respectively. Since $K_{N,M}(A,B)$ is invariant under signed permutations of the coordinates of $A$ and $B$, we can assume without loss in generality that these are non-increasing lists of nonnegative numbers. The quantum Bessel kernel is the characteristic function of a uniformly random $N \times M$ rectangular matrix with singular values $b_1 \geq \dots \geq b_N \geq 0$. Once again we can analytically continue this to a function

$$K_{N,M}(z, A, B),$$

where $z$ is a complex number and $A$ and $B$ are viewed as complex rectangular diagonal matrices in the integral. The following expansion of $K_{N,M}(z,A,B)$ is not in the Olshanski-Vershik paper, but it can be obtained in essentially the same way; a more general result is proved here.

Theorem 9.4 (Schur expansion of $K_{N,M}$). We have

$$K_{N,M}(z, A, B) = \sum_{d=0}^\infty \frac{(z/2)^{2d}}{(d!)^2} \sum_{\lambda \in Y_d(N)} (\dim\lambda)^2\, \frac{s_\lambda(A A^*)\, s_\lambda(B^* B)}{\mathrm{Dim}_N \lambda\; \mathrm{Dim}_M \lambda}.$$
We close this lecture with a pointer to interesting work in the engineering literature which considers exactly these integrals in an applied context.