Math 289A: Lecture 5

Let \mathbb{H}_N be the real vector space of N \times N Hermitian matrices with the Hilbert-Schmidt scalar product

\langle X,Y \rangle =\mathrm{Tr}\, XY.

Let

f \colon \mathbb{H}_N \longrightarrow \mathbb{R}^N

be the function from Hermitian matrices to standard Euclidean space defined by

f(X) = (\text{eigenvalues of }X\text{ listed largest to smallest}).

Theorem 5.1 (Hoffman-Wielandt Inequality). For any X,Y \in \mathbb{H}_N we have

\|f(X)-f(Y)\| \leq \|X-Y\|.

Proof: By the spectral theorem, there are unitary matrices V,W \in \mathbb{U}_N and non-increasing vectors A,B \in \mathbb{R}^N, viewed as diagonal matrices, such that

X=VAV^* \quad\text{and}\quad Y=WBW^*.

The square of the Hilbert-Schmidt norm of the difference X-Y is

\|X-Y\|^2 = \langle VAV^*-WBW^*,VAV^*-WBW^*\rangle = \mathrm{Tr}\, (VAV^*-WBW^*)^2,

and

\mathrm{Tr}\, (VAV^*-WBW^*)^2=\mathrm{Tr}\, A^2 -2\mathrm{Tr}\, VAV^*WBW^*+\mathrm{Tr}\, B^2.

Therefore, if we can show

\mathrm{Tr}\, AUBU^* \leq \mathrm{Tr}\, AB

holds for all U \in \mathbb{U}_N, we will have proved the claimed inequality (make sure you understand why).

Now,

\mathrm{Tr}\, AUBU^* = \sum_{i,j=1}^N a_ib_j |U_{ij}|^2,

and [|U_{ij}|^2] is a doubly stochastic matrix. Therefore, the maximum value of \mathrm{Tr}\, AUBU^* over all U \in \mathbb{U}_N is bounded by the maximum value of the function

g(M) = \sum_{i,j=1}^N a_ib_j M_{ij},

as M ranges over all doubly stochastic matrices. Since g is a linear function on the Birkhoff polytope, it achieves its maximum value at an extreme point of this convex set, which by the Birkhoff-von Neumann theorem is a permutation matrix. The vertex corresponding to the identity matrix gives

g(I) = \sum\limits_{i=1}^N a_i b_i = \mathrm{Tr}\, AB,

and since the coordinates of A and B are non-increasing, the rearrangement inequality shows that the identity permutation is the optimal matching.

– QED
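
Although the proof is complete, a quick numerical experiment can make the inequality concrete. The following sketch, in Python with numpy, samples pairs of Hermitian matrices and checks \|f(X)-f(Y)\| \leq \|X-Y\|; the Gaussian sampling scheme is just an illustrative choice, not part of the theorem.

import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n):
    # An arbitrary sampling scheme: symmetrize a complex Gaussian matrix.
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (z + z.conj().T) / 2

N = 6
for _ in range(1000):
    X, Y = random_hermitian(N), random_hermitian(N)
    # f lists eigenvalues largest to smallest; eigvalsh returns them ascending.
    fX = np.linalg.eigvalsh(X)[::-1]
    fY = np.linalg.eigvalsh(Y)[::-1]
    lhs = np.linalg.norm(fX - fY)
    rhs = np.linalg.norm(X - Y)  # Hilbert-Schmidt (Frobenius) norm
    assert lhs <= rhs + 1e-10    # Hoffman-Wielandt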

Now let X_N be a random Hermitian matrix.

Theorem 5.2. The following are equivalent:

  1. The characteristic function of X_N is invariant under unitary conjugation;
  2. The distribution of X_N is invariant under unitary conjugation;
  3. X_N is equal in distribution to a random Hermitian matrix of the form U_NB_NU_N^*, where U_N is a uniformly random unitary matrix and B_N is a random real diagonal matrix independent of U_N.

We have already proved the equivalence of (1) and (2).

Problem 5.2. Complete the proof of Theorem 5.2.

At this point we have enough information to explain how the characteristic function of a unitarily invariant random Hermitian matrix determines the joint distribution of its eigenvalues.

Theorem 5.3. The characteristic function of a unitarily invariant random Hermitian matrix is equal to the quantum characteristic function of its eigenvalues.

Obviously, this statement needs to be explained.

Let B_N be a real random vector whose distribution \sigma_N in \mathbb{R}^N is arbitrary, and let U_N be an independent random unitary matrix whose distribution \eta_N in \mathbb{U}_N is uniform (Haar) measure. Then X_N=U_NB_NU_N^* is a unitarily invariant random Hermitian matrix, and by Theorem 5.2 every unitarily invariant random Hermitian matrix is equal in law to a matrix constructed in this way.

Because \mathbb{H}_N is non-compact, there is no such thing as a uniformly random Hermitian matrix. The matrix X_N = U_NB_NU_N^* is as close as we can get: conditionally on its eigenvalue vector B_N, it is uniformly distributed on the corresponding \mathbb{U}_N-orbit. If \mu_N is the distribution of X_N in \mathbb{H}_N, a random sample from \mu_N is obtained by sampling a vector from the distribution \sigma_N on \mathbb{R}^N, and then choosing a uniformly random point on the \mathbb{U}_N-orbit in \mathbb{H}_N which contains this vector.
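
Here is a minimal sampling sketch in Python with numpy, following exactly this recipe: draw B_N from \sigma_N (taken to be iid standard Gaussian coordinates purely as a placeholder), draw an independent Haar unitary U_N, and form U_NB_NU_N^*. The Haar sampler uses the standard QR-with-phase-correction trick applied to a complex Ginibre matrix.

import numpy as np

rng = np.random.default_rng(1)

def haar_unitary(n):
    # QR of a complex Ginibre matrix; rescaling the columns of Q by the
    # phases of the diagonal of R makes Q exactly Haar-distributed.
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def sample_invariant(n):
    B = rng.normal(size=n)        # placeholder eigenvalue law sigma_N
    U = haar_unitary(n)
    return (U * B) @ U.conj().T   # U diag(B) U^*

X = sample_invariant(5)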

In terms of characteristic functions, this says that

\varphi_{X_N}(A) = \mathbf{E}\left[e^{i \mathrm{Tr}\, AU_NB_NU_N^*}\right] = \mathbf{E}_{\sigma_N} \left[K_N(A,B_N)\right],

where

K_N(A,B) = \mathbf{E}_{\eta_N}\left[e^{i\mathrm{Tr}\, AUBU^*}\right] = \int_{\mathbb{U}_N} e^{i \mathrm{Tr}\, AUBU^*} \eta_N(\mathrm{d}U)

is the characteristic function of a uniformly random Hermitian matrix from the \mathbb{U}_N-orbit containing B \in \mathbb{R}^N, viewed as a diagonal matrix. Thus, K_N(A,B) is the characteristic function of a uniformly random Hermitian matrix with prescribed eigenvalues. This defines a bounded symmetric kernel

K_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}

which we call the quantum Fourier kernel, because it is a superposition of standard N-dimensional Fourier kernels

K_N(A,B) = \int_{\mathbb{U}_N}e^{i \sum_{x,y=1}^N a_xb_y |U_{xy}|^2} \eta_N(\mathrm{d}U).
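
Since K_N(A,B) is a Haar integral, it can be estimated by Monte Carlo: average the integrand over independent Haar samples. A sketch in Python with numpy (the Haar sampler is the QR construction from the earlier snippet, and the sample size is an arbitrary illustrative choice):

import numpy as np

rng = np.random.default_rng(2)

def haar_unitary(n):
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def quantum_fourier_kernel(a, b, samples=20000):
    # Average e^{i sum_{x,y} a_x b_y |U_{xy}|^2} over Haar samples.
    total = 0j
    for _ in range(samples):
        U = haar_unitary(len(a))
        total += np.exp(1j * np.sum(np.abs(U) ** 2 * np.outer(a, b)))
    return total / samples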

Problem 5.3. Prove that indeed K_N(A,B)=K_N(B,A) and |K_N(A,B)|\leq 1. Moreover, show that K_N(A,B) is invariant under independent permutations of the coordinates of A,B \in \mathbb{R}^N.

If K_N(A,B) is the quantum Fourier kernel, then

\psi_{B_N}(A) = \mathbf{E}[K_N(A,B_N)] = \int_{\mathbb{R}^N} K_N(A,B) \sigma_N(\mathrm{d}B)

deserves to be called the quantum characteristic function of the random vector B_N, or the quantum Fourier transform of the probability measure \sigma_N. What we have shown in this lecture is that for any unitarily invariant random Hermitian matrix X_N, we have

\varphi_{X_N}(A) = \psi_{B_N}(A), \quad A \in \mathbb{R}^N,

where B_N is any N-dimensional random vector whose components are eigenvalues of X_N (this is what Theorem 5.3 means). We can also state the above as

\varphi_{D_N}(A) = \psi_{B_N}(A), \quad A \in \mathbb{R}^N,

where D_N is the diagonal of the random matrix X_N.

The conclusion to be drawn from all of this is that figuring out how to analyze the eigenvalues of a unitarily invariant random Hermitian matrix X_N given its characteristic function \varphi_{X_N} is exactly the same thing as figuring out how to analyze an arbitrary random real vector B_N given its quantum characteristic function \psi_{B_N}.

Before telling you how we will do this, let me tell you how we won’t. The main reason the Hoffman-Wielandt theorem was presented in this lecture is that the proof makes you think about maximizing the function

g(U) = \sum_{x,y=1}^N a_xb_y|U_{xy}|^2

over unitary matrices U. The action g(U) is exactly what appears in the exponent of the quantum Fourier kernel:

K_N(A,B) = \int_{\mathbb{U}_N} e^{i g(U)} \mathrm{d}U.

(I am going to stop writing the Haar measure \eta_N explicitly in Haar integrals, so just \mathrm{d}U rather than \eta_N(\mathrm{d}U)). Now, the method of stationary phase predicts that K_N(A,B) should be approximated by contributions from the local maxima of the action, which means that this integral should localize to a sum over permutations. This turns out to be exactly correct, not just a good approximation, and the reasons for this are rooted in symplectic geometry. The end result is the following determinantal formula for the quantum Fourier kernel.

Theorem 5.4 (Harish-Chandra-Itzykson-Zuber (HCIZ) Formula). For any A,B \in \mathbb{R}^N with no repeated coordinates we have

K_N(A,B) = \frac{1!2! \dots (N-1)!}{i^{N \choose 2}}\frac{\det [ e^{ia_xb_y}]}{\det[a_x^{N-y}]\det[b_x^{N-y}]}.

This is a very interesting formula which makes manifest all the symmetries of the quantum Fourier kernel claimed in Problem 5.3 (but, you should solve that problem without using the determinantal formula). It is also psychologically satisfying in that it shows exactly how the quantum Fourier kernel is built out of classical one-dimensional Fourier kernels. However, it is not really good for much else. We will explain this further in the coming lectures. But first we will adapt all of the above to random rectangular matrices.
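
Before moving on, here is a sketch of Theorem 5.4 in code, implementing the determinantal formula exactly as stated; comparing its output with the Monte Carlo estimator quantum_fourier_kernel from earlier in the lecture is a good consistency check. As in the theorem, the coordinates of A and B must be distinct, and for nearly repeated coordinates the Vandermonde determinants make the formula numerically delicate.

import numpy as np
from math import comb, factorial

def hciz(a, b):
    n = len(a)
    prefactor = np.prod([factorial(k) for k in range(1, n)]) / 1j ** comb(n, 2)
    numerator = np.linalg.det(np.exp(1j * np.outer(a, b)))
    vandermonde = lambda v: np.linalg.det(np.vander(v))  # det[v_x^{N-y}]
    return prefactor * numerator / (vandermonde(a) * vandermonde(b))

a = np.array([1.3, 0.2, -0.7])
b = np.array([0.5, -0.4, 2.1])
print(hciz(a, b))
# Compare with the Monte Carlo estimate:
# print(quantum_fourier_kernel(a, b))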

Math 289A: Lecture 4

We have defined the characteristic function of a random Hermitian matrix X according to the usual recipe for random variables taking values in a Euclidean space: it is the expectation

\varphi_{X}(T) = \mathbf{E}\left[e^{i\langle T,X \rangle}\right],

where the argument T ranges over the real vector space

\mathbb{H}_N = \{X \in \mathbb{C}^{N \times N} \colon X^*=X\},

which carries the Hilbert-Schmidt scalar product

\langle T,X \rangle = \mathrm{Tr}\, TX.

Equivalently,

\varphi_{X}(T) = \int\limits_{\mathbb{H}_N} e^{i \mathrm{Tr}\, TX}\mu_X(\mathrm{d}X),

where \mu_X is the distribution of X in \mathbb{H}_N. With this definition in hand, we can consider special classes of random Hermitian matrices carved out by clauses of the form

“A random Hermitian matrix X is said to be (descriptor) if its characteristic function satisfies (property).”

A natural construction of this form arises from the Spectral Theorem. The set

\mathbb{U}_N=\{U \in \mathbb{C}^{N \times N} \colon UU^*=U^*U=I\}

of unitary matrices is a group under matrix multiplication, called the unitary group of rank N. The group \mathbb{U}_N acts on the Euclidean space \mathbb{H}_N by conjugation, which is a group homomorphism

\Phi \colon \mathbb{U}_N \longrightarrow \mathrm{Aut}(\mathbb{H}_N,\langle \cdot,\cdot\rangle)

from \mathbb{U}_N into the orthogonal group of the Euclidean space \mathbb{H}_N defined by

\Phi(U)T = UTU^*, \quad U \in \mathbb{U}_N,\ T \in \mathbb{H}_N.

Problem 4.1. Prove that \Phi really is a group homomorphism from \mathbb{U}_N into orthogonal transformations of Euclidean space \mathbb{H}_N.

One formulation of the Spectral Theorem says: the conjugation orbit

\mathcal{O}_X = \{UXU^* \colon U \in \mathbb{U}_N\}

of any Hermitian matrix X \in \mathbb{H}_N contains a diagonal matrix A, which is unique up to simultaneous permutations of its rows and columns.

Definition 4.1. A random Hermitian matrix X is said to be unitarily invariant if its characteristic function \varphi_{X} satisfies

\varphi_{X}(UTU^*) = \varphi_{X}(T)

for all T \in \mathbb{H}_N and all U \in\mathbb{U}_N. That is, X is unitarily invariant if its characteristic function \varphi_{X} is constant on each \mathbb{U}_N-orbit of \mathbb{H}_N.

Unitary invariance is a basic symmetry with many ramifications. First of all, it leads to a massive reduction in complexity by allowing us to view the characteristic function of X as a function of A \in \mathbb{R}^N rather than a function of T \in \mathbb{H}_N. Indeed, for an arbitrary Hermitian T, the Spectral Theorem says that UTU^*=A for some unitary matrix U \in \mathbb{U}_N and some diagonal matrix A \in \mathbb{H}_N. Unitary invariance of the characteristic function then gives

\varphi_{X}(T) = \varphi_{X}(UTU^*) = \varphi_{X}(A).

The diagonal Hermitian matrix

A = \begin{bmatrix} a_1 & {} & {} \\ {} & \ddots & {} \\ {} & {} & a_N\end{bmatrix}

can be viewed as a vector A =(a_1,\dots,a_N) \in \mathbb{R}^N, so that the characteristic function of a unitarily invariant random Hermitian matrix X can be viewed as a continuous function

\varphi_{X} \colon \mathbb{R}^N \longrightarrow \mathbf{D}

on N-dimensional Euclidean space taking values in the closed unit disc \mathbf{D} \subset \mathbb{C}.

Problem 4.2. Prove that the characteristic function of a unitarily invariant random Hermitian matrix X is a symmetric function on \mathbb{R}^N.

Now we discuss the probabilistic ramifications of unitary invariance, i.e. what Definition 4.1 implies about the distribution of X.

Theorem 4.1. The distribution of a unitarily invariant random Hermitian matrix X is completely determined by the joint distribution of its diagonal matrix elements X_{11},\dots,X_{NN}, which are exchangeable real random variables.

Proof: For A=\mathrm{diag}(a_1,\dots,a_N) real diagonal, we have

\mathrm{Tr}\, AX = a_1X_{11} + \dots + a_NX_{NN}.

Consequently, we have

\varphi_{X}(A) = \mathbf{E}[e^{i\mathrm{Tr}\, AX}] = \mathbf{E}[e^{i(a_1X_{11} + \dots + a_NX_{NN})}],

which is exactly the characteristic function of the real random vector

D = (X_{11},\dots,X_{NN}).

Thus, the characteristic function \varphi_X(A)=\varphi_D(A). The exchangeability of X_{11},\dots,X_{NN} follows from Problem 4.2.

-QED
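
The identity at the heart of this proof, that \mathrm{Tr}\, AX depends only on the diagonal of X when A is diagonal, is elementary but worth internalizing. A tiny check in Python with numpy:

import numpy as np

rng = np.random.default_rng(3)
N = 4
z = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
X = (z + z.conj().T) / 2               # an arbitrary Hermitian matrix
a = rng.normal(size=N)
lhs = np.trace(np.diag(a) @ X)
rhs = a @ np.real(np.diag(X))          # a_1 X_11 + ... + a_N X_NN
assert np.isclose(lhs, rhs)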

It is important to note that exchangeable random variables are identically distributed, but not necessarily independent.

Problem 4.3. Prove that the diagonal matrix elements of a unitarily invariant random Hermitian matrix X are independent if and only if the distribution \mu_{X} is a Gaussian measure on \mathbb{H}_N.

Now we characterize unitary invariance as a feature of the distribution of a random Hermitian matrix.

Theorem 4.2. A random Hermitian matrix X is unitarily invariant if and only if X and UXU^* have the same distribution for every U \in \mathbb{U}_N.

Proof: Note that for any random Hermitian matrix X and any deterministic unitary matrix U \in \mathbb{U}_N we have

\varphi_{UXU^*}(T) =\varphi_{X}(U^*TU),\quad T \in \mathbb{H}_N,

by cyclic invariance of the trace. If X is unitarily invariant, this becomes

\varphi_{UXU^*}(T) =\varphi_{X}(T), \quad T \in \mathbb{H}_N,

which is equivalent to equidistribution of X and UXU^*. Conversely, if UXU^* and X have the same distribution then their characteristic functions are equal.

-QED

Theorem 4.2 is quite important for the following reason. Define a random linear operator acting on \mathbb{C}^N by

L_Xv = Xv, \quad v \in \mathbb{C}^N.

The matrix of L_{X} in the standard basis e_1,\dots,e_N of \mathbb{C}^N is X. The matrix of L_{X} in another orthonormal basis f_1,\dots,f_N is UXU^*, where U \in \mathbb{U}_N is the matrix of the linear transformation defined by f_i \mapsto e_i. Unitary invariance says that UXU^* has the same distribution as X: the random matrix of the random operator L_{X} has the same law for any choice of orthonormal basis in the Hilbert space \mathbb{C}^N. In this sense, unitarily invariant random Hermitian matrices are precisely those random matrices which correspond to canonically defined random Hermitian operators on finite-dimensional Hilbert space.
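
Theorem 4.2 can also be illustrated empirically. The matrix X = (Z + Z^*)/2, with Z a complex Ginibre matrix, is a standard example of a unitarily invariant random Hermitian matrix (the GUE), so Monte Carlo estimates of its characteristic function at T and at U^*TU should agree up to sampling error for any fixed unitary U. A sketch in Python with numpy:

import numpy as np

rng = np.random.default_rng(4)
N = 4

def gue():
    z = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    return (z + z.conj().T) / 2

T = np.diag(np.arange(1.0, N + 1))  # a fixed Hermitian test matrix
U, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))  # a fixed unitary

def char_fn(S, samples=20000):
    # Monte Carlo estimate of E[e^{i Tr SX}] for Hermitian S.
    return np.mean([np.exp(1j * np.trace(S @ gue()).real) for _ in range(samples)])

print(char_fn(T), char_fn(U.conj().T @ T @ U))  # agree up to Monte Carlo error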

Math 289A: Lecture 1

Let \mathbb{L} be a Euclidean space and \mathfrak{B} its Borel sigma algebra.

Definition 1.1. A random variable taking values in \mathbb{L} is a measurable function X from a probability space (\Omega,\mathfrak{F},\mathbf{P}) into \mathbb{L}.

Every random variable leads a double life: it is both a function into \mathbb{L} and a measure on \mathbb{L}.

Definition 1.2. The distribution of X is the probability measure on Borel sets \mathsf{S} \subseteq \mathbb{L} defined by

\mu_X(\mathsf{S}) = \mathbf{P}\left(\{\omega \in \Omega \colon X(\omega) \in \mathsf{S}\}\right).

Strictly speaking, it is not correct to conflate X with its distribution \mu_X, since one could have a second random variable Y which is distinct from X but has the same distribution – when this happens we say that X and Y are “equal in distribution”, or “equal in law.” I heard the “double life” descriptor in a probability course taught by David Steinsaltz, and I think it is helpful provided one keeps the above caveat in mind. In fact I will go one step further and say that X leads a triple life – it is also a complex-valued function on \mathbb{L}.

Definition 1.3. The characteristic function of X is defined by the expectation

\varphi_{X}(T) = \mathbf{E}\left[ e^{i \langle T,X\rangle} \right] = \int\limits_{\mathbb{L}} e^{i \langle T,X \rangle} \mu_{X}(\mathrm{d}X), \quad T \in \mathbb{L}.

That is, the characteristic function of a random variable is the Fourier transform of its distribution.
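
As a one-dimensional warm-up, here is the definition in action in Python with numpy: the characteristic function of a standard Gaussian on \mathbb{R} is e^{-t^2/2}, and the empirical average of e^{itX} over samples recovers it up to Monte Carlo error.

import numpy as np

rng = np.random.default_rng(5)
samples = rng.normal(size=100_000)   # standard Gaussian on R

t = 1.5
empirical = np.mean(np.exp(1j * t * samples))
exact = np.exp(-t**2 / 2)
print(empirical, exact)              # agree up to Monte Carlo error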

Problem 1.1. Prove that the characteristic function of a random variable is a uniformly continuous function from \mathbb{L} into the closed unit disc in \mathbb{C}.

The characteristic function of X faithfully encodes its distribution: if Y is another random variable, then X and Y are equal in law if and only if they have the same characteristic function. This is a famous result in probability theory – a proof can be found in virtually any textbook on the subject. What makes characteristic functions so useful is that they transform probability into analysis, specifically harmonic analysis, which allows us to apply analytic methods to probabilistic questions. In particular, the famous Lévy Continuity Theorem characterizes convergence in distribution for sequences of random variables in terms of the corresponding sequence of characteristic functions. This provides an efficient approach to the classical limit theorems of probability theory, the Law of Large Numbers and the Central Limit Theorem. Again, you will find an exposition of this approach in any standard probability text.

Random Matrix Theory is one of the most active areas of research in contemporary probability theory. It is a beautiful blend of algebra and probability which enjoys striking connections with many other fields, including combinatorics, number theory, physics, statistics, and something called “data science.” Strangely, RMT makes no use of characteristic functions – I am not aware of a single textbook on random matrices which even defines the characteristic function of a random matrix, despite the fact that this is simply a special case of the above construction.

To be concrete, let us take \mathbb{H} to be the real vector space of N \times N Hermitian matrices equipped with the scalar product

\langle X,Y \rangle = \mathrm{Tr}\, XY,

where \mathrm{Tr} denotes the standard matrix trace.

Problem 1.2. Prove that the above really does define a scalar product on the space of Hermitian matrices, and write down an orthonormal basis.

Everything we have said so far about random variables taking values in a general Euclidean space \mathbb{L} is perfectly valid in the special case where \mathbb{L}=\mathbb{H}. Thus, there is no definitional obstruction to analyzing a random variable X taking values in \mathbb{H} via its characteristic function \varphi_X. But there is a conceptual issue: random matrices are random variables which lead not just a triple life, but a quadruple life. More precisely, a random Hermitian matrix X has associated to it not just a measure \mu_X and a function \varphi_X, but also an operator L_X defined by matrix multiplication

L_X(v) = Xv, \quad v \in \mathbb{C}^N.

The random operator L_X is a geometric object, a random linear transformation of \mathbb{C}^N, and as such has geometric invariants such as eigenvalues and eigenvectors.

The statistical behavior of the geometric invariants of L_X is encoded in the characteristic function \varphi_X of its matrix X. However, the characteristic function “sees” X as a random vector in the N^2-dimensional Euclidean space \mathbb{H}, not as a random operator on the N-dimensional Hilbert space \mathbb{C}^N, and the question of how to extract geometric information from \varphi_X does not have an obvious or easy answer. Because of this, researchers in random matrix theory long ago abandoned characteristic functions. The subject has developed its own specific tools which have been very successful but have also distanced it from other parts of probability theory.

In recent years the question of whether a Fourier-analytic approach to random matrices might be possible after all has begun to be reconsidered. I do not claim to have a satisfactory answer to this question, but we will explore it in this topics course and perhaps see the beginnings of an answer. To get the most out of this endeavor, it would be best to absorb the standard approach to random matrix theory at the same time. The course notes by Todd Kemp and Terence Tao are both excellent resources which are freely available online.