Math 289A: Lecture 7

In this lecture we will review what we have done so far from a slightly different perspective. Starting with the Fourier kernel

F \colon \mathbb{R} \times \mathbb{R} \longrightarrow \mathbb{C},

which is defined by F(a,b) = e^{iab}, we can define the characteristic function of a random variable X with distribution \mu by

\varphi_X(a) = \mathbf{E}F(a,X) = \int\limits_{\mathbb{R}} F(a,x)\mu(\mathrm{d}x).

This is a uniformly continuous function from \mathbb{R} into the closed unit disc in \mathbb{C} which uniquely determines the measure \mu. To get some feel for why this might be, expand the Fourier kernel in a power series,

F(a,b) = 1 + \sum_{d=1}^\infty \frac{i^d}{d!} (ab)^d.

Then, assuming we can interchange expectation with the summation we get

\mathbf{E}F(a,X) = 1 + \sum_{d=1}^\infty \frac{i^d}{d!} a^d\mathbf{E}[X^d],

where

\mathbf{E}[X^d] = \int\limits_{\mathbb{R}} x^d \mu(\mathrm{d}x),

is the degree d moment of X, or equivalently of \mu. So at least heuristically, the characteristic function of X is a generating function for the moments of its distribution. One situation in which this formal computation is quantitatively correct is when \mu has compact support, so if the characteristic function does indeed characterize the distribution of X then it must be the case that \mu is characterized by its moments. It is not difficult to see why the moments of a probability measure with compact support determine it completely: if we know how to compute the expectation of polynomial functions of X, then thanks to the Weierstrass approximation theorem we also know how to compute the expectation of continuous functions of X, and this in turn tells us how \mu measures intervals, by taking expectations of smooth bump functions. Of course, to obtain the fully general result that the characteristic function of X completely determines its distribution a different argument is needed. For example, the Cauchy distribution has no finite moments of any positive integer order, but its characteristic function still exists and determines it uniquely.
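
As a quick numerical aside (an illustration of mine, not part of the lecture), here is a minimal Python sketch of the moment heuristic for a compactly supported example, the uniform distribution on [-1,1]: its characteristic function is \sin(a)/a, its odd moments vanish, and its even moments are \mathbf{E}[X^d]=1/(d+1). The script, which assumes only numpy, compares the exact characteristic function with partial sums of the moment expansion; for compactly supported \mu the series converges for every a.

```python
import numpy as np
from math import factorial

# Uniform distribution on [-1, 1]: E[X^d] = 1/(d+1) for even d, 0 for odd d,
# and the characteristic function is phi(a) = sin(a)/a.
def phi_exact(a):
    return np.sinc(a / np.pi)  # np.sinc(x) = sin(pi x)/(pi x), so this is sin(a)/a

def phi_moment_sum(a, terms=20):
    # Partial sum  1 + sum_{d=1}^{terms} (i^d / d!) a^d E[X^d]  of the moment expansion.
    total = 1.0 + 0j
    for d in range(1, terms + 1):
        moment = 1.0 / (d + 1) if d % 2 == 0 else 0.0
        total += (1j ** d) / factorial(d) * a ** d * moment
    return total

for a in [0.5, 1.0, 3.0]:
    print(a, phi_exact(a), phi_moment_sum(a))
```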

The N-dimensional Fourier kernel

F_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}

is defined by

F_N(A,B) = e^{i \langle A,B\rangle} = e^{i(a_1b_1+\dots+a_Nb_N)}, \quad A,B \in \mathbb{R}^N

and the characteristic function of an N-dimensional random vector X_N is the continuous function \mathbb{R}^N \to \mathbb{C} defined by

\varphi_{X_N}(A) = \mathbf{E}F_N(A,X_N) = \int\limits_{\mathbb{R}^N} F_N(A,X) \mu_N(\mathrm{d}X),

where \mu_N is the distribution of X_N in \mathbb{R}^N. It is still true that \varphi_{X_N} exists and uniquely determines \mu_N, and we can tell a similar story about moments which supports this statement. To do so we first have to expand the N-dimensional Fourier kernel in a power series. This is easy but let us first make some general observations about what this expansion must look like before doing the computation. Since F_N(A,B) = F_N(B,A) and

F_N(PA,PB) = e^{i\langle PA,PB\rangle} = e^{i \langle A,B \rangle} = F_N(A,B),

for any N \times N permutation matrix P, we have

F_N(A,B) = 1 + \sum_{d=1}^\infty \frac{i^d}{d!} F_N^d(A,B)

where F_N^d(A,B) is a homogeneous polynomial of degree d in a_1b_1,\dots,a_Nb_N which is invariant under permutations of these variables. Therefore, we have

F_N^d(A,B) = \sum\limits_{\alpha \in \mathsf{C}_N^d} (a_1b_1)^{\alpha_1}\dots (a_Nb_N)^{\alpha_N}F_N(\alpha)

where \mathsf{C}_N^d is the set of weak N-part compositions \alpha of d \in \mathbb{N} and F_N(\alpha) is a numerical coefficient. It is easy to determine this coefficient explicitly: since

F_N(A,B) = e^{ia_1b_1} \dots e^{ia_Nb_N} = \left(\sum\limits_{d=0}^\infty \frac{i^d}{d!}(a_1b_1)^d \right)\dots \left(\sum\limits_{d=0}^\infty \frac{i^d}{d!}(a_Nb_N)^d \right),

we have

F_N(\alpha) = \frac{d!}{\alpha_1! \dots \alpha_N!}={d \choose \alpha_1,\dots,\alpha_N},

the multinomial coefficient. Thus, the expected value of F_N^d(A,X_N) is a generating polynomial for degree d mixed moments of the entries of the random vector X_N=(X_N(1),\dots,X_N(N)),

\mathbf{E}F_N^d(A,X_N) = \sum\limits_{\alpha \in \mathsf{C}_N^d} a_1^{\alpha_1} \dots a_N^{\alpha_N} \mathbf{E}[X^{\alpha_1}_N(1)\dots X^{\alpha_N}_N(N)]F_N(\alpha).

In the previous lectures we defined a strange new kernel

K_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}

by

K_N(A,B) = \int\limits_{\mathbb{U}_N} F_N(A,|U|^2B)\mathrm{d}U,

where the integration is against Haar probability measure on the unitary group \mathbb{U}_N, and for any U \in \mathbb{U}_N the matrix

|U|^2=\begin{bmatrix} {} & \vdots & {} \\ \dots & |U_{xy}|^2 & \dots \\ {} & \vdots & {} \end{bmatrix}

is obtained from U=[U_{xy}] by taking the squared modulus of each entry. Thus for any B \in \mathbb{R}^N the vector

|U|^2B=\begin{bmatrix} |U_{11}|^2b_1 + \dots + |U_{1N}|^2b_N \\ |U_{21}|^2b_1 + \dots + |U_{2N}|^2b_N \\ \vdots \\ |U_{N1}|^2b_1 + \dots + |U_{NN}|^2b_N\end{bmatrix} \in \mathbb{R}^N

has coordinates which are superpositions of the coordinates of B, and because of this we named K_N the N-dimensional “quantum Fourier kernel.”
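
Here is a small numerical sketch of the two ingredients above (my own illustration, assuming numpy; the helper haar_unitary is not from the lectures): we sample a Haar-distributed U \in \mathbb{U}_N via the QR decomposition of a complex Ginibre matrix, check that |U|^2 is doubly stochastic, and estimate K_N(A,B) by Monte Carlo.

```python
import numpy as np

def haar_unitary(n, rng):
    # QR of a complex Ginibre matrix, with the phases of R's diagonal fixed,
    # yields a Haar-distributed element of U(n).
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))

rng = np.random.default_rng(0)
n = 3
A = np.array([1.0, 2.0, 3.0])
B = np.array([0.5, -1.0, 2.0])

U = haar_unitary(n, rng)
M = np.abs(U) ** 2
print(M.sum(axis=0), M.sum(axis=1))  # both near (1, 1, 1): |U|^2 is doubly stochastic

# Monte Carlo estimate of K_N(A, B) = E exp(i <A, |U|^2 B>)
est = np.mean([np.exp(1j * (A @ (np.abs(haar_unitary(n, rng)) ** 2 @ B)))
               for _ in range(20000)])
print(est)  # a point in the closed unit disc
```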

Problem 7.1. Prove that K_N is a bounded symmetric kernel on \mathbb{R}^N, i.e. |K_N(A,B)|\leq 1 and K_N(A,B)=K_N(B,A) for all A,B \in \mathbb{R}^N.

Then, given a random vector B_N with distribution \sigma_N in \mathbb{R}^N, we defined its quantum characteristic function by

\psi_{B_N}(A)=\mathbf{E}K_N(A,B_N) = \int\limits_{\mathbb{R}^N} K_N(A,B)\sigma_N(\mathrm{d}B).

This is a continuous function from \mathbb{R}^N into the closed unit disc in \mathbb{C}, and we claimed that \psi_{B_N} uniquely determines the distribution \sigma_N of B_N. Since the quantum characteristic function of B_N is clearly distinct from its classical characteristic function

\varphi_{B_N}(A) = \mathbf{E}F_N(A,B_N)=\int\limits_{\mathbb{R}^N} F_N(A,B)\sigma_N(\mathrm{d}B),

this must be true for some different reason. To understand what this reason might be, at least heuristically, we could proceed along the same lines we did with F_N and try to determine if \psi_{B_N}(A) is some sort of generating function encoding natural statistics of B_N.

Let’s give this a try. The first step is to expand the quantum Fourier kernel K_N as a power series. Before doing the computation, we can look for some easy symmetries of K_N which will tell us in advance what a series expansion of K_N ought to look like. In fact, the quantum Fourier kernel is much more symmetric than the classical Fourier kernel – it is invariant under independent (as opposed to simultaneous) permutations of the coordinates of its arguments.

Problem 7.2. Prove that K_N(PA,QB)=K_N(A,B) for any A,B \in \mathbb{R}^N and any N\times N permutation matrices P,Q.

From Problems 7.1 and 7.2, we can indeed infer the general form of a series expansion of the kernel K_N. Namely, we have

K_N(A,B) = 1 + \sum\limits_{d=1}^\infty \frac{i^d}{d!} K_N^d(A,B),

where K_N^d(A,B) is a homogeneous degree d polynomial function of the coordinates a_1,\dots,a_N,b_1,\dots,b_N which is invariant under independent permutations of a_1,\dots,a_N and b_1,\dots,b_N and stable under swapping these two sets of variables. What does such a polynomial look like?

A classical theorem of Newton asserts that a symmetric homogeneous degree d polynomial in N variables is a polynomial function of the power-sum polynomials

p_1(x_1,\dots,x_N) = x_1+\dots+x_N \\ p_2(x_1,\dots,x_N)=x_1^2+\dots+x_N^2\\ \vdots \\ p_N(x_1,\dots,x_N)=x_1^N+\dots+x_N^N.

Equivalently, the vector space \Lambda_N^d of symmetric homogeneous degree d polynomials in N variables has as a linear basis the polynomials

p_\alpha(x_1,\dots,x_N)=\prod\limits_{k=1}^{\ell(\alpha)} p_{\alpha_k}(x_1,\dots,x_N),

where \alpha=(\alpha_1,\dots,\alpha_r) ranges over the set \mathsf{P}_N^d of partitions of d with largest part at most N, and \ell(\alpha)=r denotes the number of parts.

Problem 7.3. Prove that for any \alpha \in \mathsf{P}_N^d and any A \in \mathbb{R}^N, the evaluation p_\alpha(A)=p_\alpha(a_1,\dots,a_N) satisfies |p_\alpha(A)| \leq \|A\|_\infty^d N^{\ell(\alpha)}, and that this bound is sharp.

Thus, a power series expansion of the quantum Fourier kernel can be written in the form

K_N(A,B) = 1 + \sum\limits_{d=1}^\infty \frac{i^d}{d!}\sum\limits_{\alpha,\beta \in \mathsf{P}_N^d} \frac{p_\alpha(a_1,\dots,a_N)}{N^{\ell(\alpha)}}\frac{p_\beta(b_1,\dots,b_N)}{N^{\ell(\beta)}}K_N(\alpha,\beta),

where the coefficients satisfy K_N(\alpha,\beta)=K_N(\beta,\alpha). This complicated-looking expression is actually quite meaningful. For any vector A \in \mathbb{R}^N, the normalized power sum polynomial of degree d in the coordinates of A is

\frac{p_d(a_1,\dots,a_N)}{N} = \int\limits_{\mathbb{R}} x^d \eta_A(\mathrm{d}x),

where

\eta_A = \frac{1}{N} \sum\limits_{i=1}^N \delta_{a_i}

is the discrete probability measure on \mathbb{R} which places equal mass at each coordinate of A. This is called the empirical distribution of A \in \mathbb{R}^N. Thus, the expansion of the quantum Fourier kernel K_N(A,B) is a generating function for the moments of the empirical distributions of its arguments. Consequently, for B_N a random vector, the quantum characteristic function \psi_{B_N}(A) is a generating function for expectations of polynomials in the random variables

p_1(B_N),p_2(B_N),p_3(B_N),\dots,

which are the moments of the (random) empirical distribution of the random vector B_N.
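
In concrete terms (a tiny sketch of mine, assuming numpy), the dth moment of the empirical distribution \eta_A is literally the normalized power sum p_d(A)/N:

```python
import numpy as np

A = np.array([1.0, 2.0, 2.0, 5.0])
N = len(A)

# The d-th moment of the empirical distribution eta_A = (1/N) sum_i delta_{a_i}
# is the normalized power sum p_d(A)/N.
for d in range(1, 4):
    print(d, np.sum(A ** d) / N)
```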

Math 289A: Lecture 5

Let \mathbb{H}_N be the real vector space of Hermitian matrices with the Hilbert-Schmidt scalar product

\langle X,Y \rangle =\mathrm{Tr}\, XY.

Let

f \colon \mathbb{H}_N \longrightarrow \mathbb{R}^N

be the function from Hermitian matrices to standard Euclidean space defined by

f(X) = (\text{eigenvalues of }X\text{ listed largest to smallest}).

Theorem 5.1 (Hoffman-Wielandt Inequality). For any X,Y \in \mathbb{H}_N we have

\|f(X)-f(Y)\| \leq \|X-Y\|.

Proof: By the spectral theorem, there are unitary matrices V,W \in \mathbb{U}_N and non-increasing vectors A,B \in \mathbb{R}^N such that

X=VAV^* \quad\text{and}\quad Y=WBW^*.

The square of the Hilbert-Schmidt norm of the difference X-Y is

\|X-Y\|^2 = \langle VAV^*-WBW^*,VAV^*-WBW^*\rangle = \mathrm{Tr}\, (VAV^*-WBW^*)^2,

and

\mathrm{Tr}\, (VAV^*-WBW^*)^2=\mathrm{Tr}\, A^2 -2\mathrm{Tr}\, VAV^*WBW^*+\mathrm{Tr}\, B^2.

Therefore, if we can show

\mathrm{Tr}\, AUBU^* \leq \mathrm{Tr}\, AB

holds for all U \in \mathbb{U}_N, we will have proved the claimed inequality (make sure you understand why).

Now,

\mathrm{Tr}\, AUBU^* = \sum_{i,j=1}^N a_ib_j |U_{ij}|^2,

and [|U_{ij}|^2] is a doubly stochastic matrix. Therefore, the maximum value of \mathrm{Tr}\, AUBU^* over all U \in \mathbb{U}_N is bounded by the maximum value of the function

g(M) = \sum_{i,j=1}^N a_ib_j M_{ij},

as M ranges over all doubly stochastic matrices. Since g is a linear function on the Birkhoff polytope, it achieves its maximum value at an extreme point of this convex set, hence at a permutation matrix (by the Birkhoff-von Neumann theorem, the extreme points of the Birkhoff polytope are exactly the permutation matrices). The vertex corresponding to the identity matrix gives

g(I) = \sum\limits_{i=1}^N a_i b_i = \mathrm{Tr}\, AB,

and since the coordinates of A and B are nonincreasing, the rearrangement inequality shows that the identity permutation is the optimal matching.

– QED
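
As a sanity check (not part of the proof), here is a short numpy sketch that tests the Hoffman-Wielandt inequality on random Hermitian pairs; the Frobenius norm computed by numpy coincides with the Hilbert-Schmidt norm used above.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_hermitian(n):
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (z + z.conj().T) / 2

for _ in range(5):
    n = 6
    X, Y = random_hermitian(n), random_hermitian(n)
    # f(X): eigenvalues listed largest to smallest (eigvalsh returns ascending order)
    fX = np.linalg.eigvalsh(X)[::-1]
    fY = np.linalg.eigvalsh(Y)[::-1]
    lhs = np.linalg.norm(fX - fY)
    rhs = np.linalg.norm(X - Y)  # Frobenius norm = Hilbert-Schmidt norm
    print(lhs <= rhs + 1e-12, float(lhs), float(rhs))
```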

Now let X_N be a random Hermitian matrix.

Theorem 5.2. The following are equivalent:

  1. The characteristic function of X_N is invariant under unitary conjugation;
  2. The distribution of X_N is invariant under unitary conjugation;
  3. X_N is equal in distribution to a random Hermitian matrix of the form U_NB_NU_N^*, where U_N is a uniformly random unitary matrix and B_N is a random real diagonal matrix independent of U_N.

We have already proved the equivalence of (1) and (2).

Problem 5.2. Complete the proof of Theorem 5.2.

At this point we have enough information to explain how the characteristic function of a unitarily invariant random Hermitian matrix determines the joint distribution of its eigenvalues.

Theorem 5.3. The characteristic function of a unitarily invariant random Hermitian matrix is equal to the quantum characteristic function of its eigenvalues.

Obviously, this statement needs to be explained.

Let B_N be a real random vector whose distribution \sigma_N in \mathbb{R}^N is arbitrary, and let U_N be an independent random unitary matrix whose distribution \eta_N in \mathbb{U}_N is uniform (Haar) measure. Then X_N=U_NB_NU_N^*, with B_N viewed as a diagonal matrix, is a unitarily invariant random Hermitian matrix, and by Theorem 5.2 every unitarily invariant random Hermitian matrix is equal in law to a matrix constructed in this way.

Because \mathbb{H}_N is non-compact, there is no such thing as a uniformly random Hermitian matrix. The matrix X_N = U_NB_NU_N^* is as close as we can get: it is uniform conditional on the distribution of its eigenvalues B_N. If \mu_N is the distribution of X_N in \mathbb{H}_N, a random sample from \mu_N is obtained by sampling a vector from the distribution \sigma_N on \mathbb{R}^N, and then choosing a uniformly random point on the \mathbb{U}_N-orbit in \mathbb{H}_N which contains this vector.

In terms of characteristic functions, this says that

\varphi_{X_N}(A) = \mathbf{E}_{\mu_N}\left[e^{i \mathrm{Tr} AU_NB_NU_N^*}\right] = \mathbf{E}_{\sigma_N} \left[K_N(A,B_N)\right],

where

K_N(A,B) = \mathbf{E}_{\eta_N}\left[e^{i\mathrm{Tr}\, AUBU^*}\right] = \int_{\mathbb{U}_N} e^{i \mathrm{Tr}\, AUBU^*} \eta_N(\mathrm{d}U)

is the characteristic function of a uniformly random Hermitian matrix from the \mathbb{U}_N orbit containing B \in \mathbb{R}^N viewed as a diagonal matrix. Thus, K_N(A,B) is the characteristic function of a uniformly random Hermitian matrix with prescribed eigenvalues. This defines a bounded symmetric kernel

K_N \colon \mathbb{R}^N \times \mathbb{R}^N \longrightarrow \mathbb{C}

which we call the quantum Fourier kernel, because it is a superposition of standard N-dimensional Fourier kernels

K_N(A,B) = \int_{\mathbb{U}_N}e^{i \sum_{x,y=1}^N a_xb_y |U_{xy}|^2} \eta_N(\mathrm{d}U).

Problem 5.3. Prove that indeed K_N(A,B)=K_N(B,A) and |K_N(A,B)|\leq 1. Moreover, show that K_N(A,B) is invariant under independent permutations of the coordinates of A,B \in \mathbb{R}^N.

If K_N(A,B) is the quantum Fourier kernel, then

\psi_{B_N}(A) = \mathbf{E}[K_N(A,B_N)] = \int_{\mathbb{R}^N} K_N(A,B) \sigma_N(\mathrm{d}B)

deserves to be called the quantum characteristic function of the random vector B_N, or the quantum Fourier transform of the probability measure \sigma_N. What we have shown in this lecture is that for any unitarily invariant random Hermitian matrix X_N, we have

\varphi_{X_N}(A) = \psi_{B_N}(A), \quad A \in \mathbb{R}^N,

where B_N is any N-dimensional random vector whose components are eigenvalues of X_N (this is what Theorem 5.3 means). We can also state the above as

\varphi_{D_N}(A) = \psi_{B_N}(A), \quad A \in \mathbb{R}^N,

where D_N is the diagonal of the random matrix X_N.

The conclusion to be drawn from all of this is that figuring out how to analyze the eigenvalues of a unitarily invariant random Hermitian matrix X_N given its characteristic function \varphi_{X_N} is exactly the same thing as figuring out how to analyze an arbitrary random real vector B_N given its quantum characteristic function \psi_{B_N}.

Before telling you how we will do this, let me tell you how we won’t. The main reason the Hoffman-Wielandt theorem was presented in this lecture is that the proof makes you think about maximizing the function

g(U) = \sum_{x,y=1}^N a_xb_y|U_{xy}|^2

over unitary matrices U. The action g(U) is exactly what appears in the exponent of the quantum Fourier kernel:

K_N(A,B) = \int_{\mathbb{U}_N} e^{i g(U)} \mathrm{d}U.

(I am going to stop writing the Haar measure \eta_N explicitly in Haar integrals, so just \mathrm{d}U rather than \eta_N(\mathrm{d}U).) Now, the method of stationary phase predicts that K_N(A,B) should be approximated by contributions from the stationary points of the action, which means that this integral should localize to a sum over permutations. This turns out to be exactly correct, not just a good approximation, and the reasons for this are rooted in symplectic geometry. The end result is the following determinantal formula for the quantum Fourier kernel.

Theorem 5.4 (HCIZ Formula). For any A,B \in \mathbb{R}^N with no repeated coordinates we have

K_N(A,B) = \frac{1!2! \dots (N-1)!}{i^{N \choose 2}}\frac{\det [ e^{ia_xb_y}]}{\det[a_x^{N-y}]\det[b_x^{N-y}]}.

This is a very interesting formula which makes manifest all the symmetries of the quantum Fourier kernel claimed in Problem 5.3 (but, you should solve that problem without using the determinantal formula). It is also psychologically satisfying in that it shows exactly how the quantum Fourier kernel is built out of classical one-dimensional Fourier kernels. However, it is not really good for much else. We will explain this further in the coming lectures. But first we will adapt all of the above to random rectangular matrices.
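
For the curious, here is a minimal numerical sketch (assuming numpy; the Haar sampler and function names are mine) comparing the HCIZ determinantal formula with a direct Monte Carlo evaluation of the Haar integral for a small N. Note that the formula requires A and B to have no repeated coordinates, which is why the sample vectors below have distinct entries.

```python
import numpy as np
from math import factorial

def haar_unitary(n, rng):
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))

def K_hciz(a, b):
    n = len(a)
    pref = np.prod([factorial(k) for k in range(1, n)]) / 1j ** (n * (n - 1) // 2)
    num = np.linalg.det(np.exp(1j * np.outer(a, b)))
    # np.vander(a) is the matrix [a_x^{N-y}], so its determinant is det[a_x^{N-y}]
    return pref * num / (np.linalg.det(np.vander(a)) * np.linalg.det(np.vander(b)))

def K_monte_carlo(a, b, samples=50000, seed=1):
    rng = np.random.default_rng(seed)
    A, B = np.diag(a), np.diag(b)
    total = 0j
    for _ in range(samples):
        U = haar_unitary(len(a), rng)
        total += np.exp(1j * np.trace(A @ U @ B @ U.conj().T).real)
    return total / samples

a = np.array([1.0, 2.0, 3.5])
b = np.array([0.3, -0.7, 1.2])
print(K_hciz(a, b))
print(K_monte_carlo(a, b))  # should agree to Monte Carlo accuracy
```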

Math 202B: Lecture 15

Let V be a Hilbert space of dimension 1 < \dim V < \infty and \mathcal{L}(V) be the algebra of all linear operators A \colon V \to V. As we saw in Lecture 14, the trace on \mathcal{L}(V) is, up to scaling, the unique linear functional \tau \colon \mathcal{L}(V) \to \mathbb{C} such that \tau(AB)=\tau(BA). Thus, the only linear functional satisfying \tau(I)=1 and \tau(AB)=\tau(BA) is

\tau(A) = \frac{1}{\dim V}\mathrm{Tr}(A), \quad A \in \mathcal{L}(V).

This allows us to prove a fact claimed early on in the course.

Problem 15.1. Prove that there does not exist an algebra homomorphism \chi \colon \mathcal{L}(V) \to \mathbb{C}.

The fact that \mathcal{L}(V) admits not even a single homomorphism to \mathbb{C} indicates that this is an algebra with the highest degree of noncommutativity. According to our definitions in Week One, the degree of noncommutativity of an algebra is formalized as the dimension of its center. We now prove that the center of \mathcal{L}(V) is indeed one-dimensional. The key notion is the following.

Definition 15.1. Given a subalgebra \mathcal{A} of \mathcal{L}(V), a subspace W of V is said to be \mathcal{A}-invariant if for every A \in \mathcal{A} and every w \in W we have Aw \in W.

Every subalgebra \mathcal{A} of \mathcal{L}(V) has at least two invariant subspaces, namely the zero space \{0_V\} and the total space V. An important theorem of Burnside characterizes subalgebras of \mathcal{L}(V) for which this minimum is achieved.

Theorem 15.1. (Burnside) A subalgebra \mathcal{A} of \mathcal{L}(V) has exactly two invariant subspaces if and only if \mathcal{A}=\mathcal{L}(V).

Proof: An elementary proof of Burnside’s theorem, by induction on the dimension of V, can be found here.

-QED

Corollary 15.2. A proper subalgebra \mathcal{A} of \mathcal{L}(V) has at least four invariant subspaces.

Proof: The zero space and the total space are \mathcal{A}-invariant, and since \mathcal{A} \neq \mathcal{L}(V) there is a third \mathcal{A}-invariant subspace \{0_V\} < W < V, by Burnside’s theorem. The orthogonal complement W^\perp of W is again a nonzero proper subspace of V, and we claim it is \mathcal{A}-invariant. Indeed, let x \in W^\perp and A \in \mathcal{A}. Then, for any w \in W we have

\langle w,Ax \rangle = \langle A^*w,x \rangle = 0,

where we used the fact that \mathcal{A} is closed under conjugation, so that A^* \in \mathcal{A}, and that W is \mathcal{A}-invariant, so that A^*w \in W. This shows Ax \in W^\perp, i.e. W^\perp is \mathcal{A}-invariant.

– QED

Using Burnside’s theorem we can characterize the center of \mathcal{L}(V).

Theorem 15.3. The center of \mathcal{L}(V) is the set \mathbb{C}I of scalar operators.

Proof: Let Z \in \mathcal{L}(V) be a central element, and let \zeta \in \mathbb{C} be an eigenvalue of Z. Then, the \zeta-eigenspace of Z is an \mathcal{L}(V)-invariant subspace, since if v \in V is an eigenvector of Z corresponding to \zeta and A \in \mathcal{L}(V) is any operator we have

ZAv = AZv =\zeta Av,

which shows that Av is again in the \zeta-eigenspace of Z. Since this eigenspace is not the zero space, Burnside’s theorem forces it to be all of V, i.e. Z=\zeta I.

-QED

We have succeeded in describing the center of \mathcal{L}(V) explicitly, and ideally we would like to have an explicit description of all subalgebras of \mathcal{L}(V). We were able to do this for function algebras \mathcal{F}(X), where subalgebras are indexed by partitions \mathsf{P} of the finite set X. The classification is very simple: the subalgebra \mathcal{A}(\mathsf{P}) associated to a given partition \mathsf{P} of X is the set of all functions on X which are constant on the blocks of \mathsf{P}. A nice feature of this classification is that it shows every subalgebra of \mathcal{F}(X) is itself isomorphic to a function algebra, because \mathcal{A}(\mathsf{P}) is isomorphic to \mathcal{F}(\mathsf{P}).

We can attempt to classify subalgebras of \mathcal{L}(V) by adapting our approach to the classification of subalgebras of \mathcal{F}(X). The first step is to formulate the vector space version of partitions of a set.

Definition 15.2. A linear partition of V is a set \mathsf{W} of nonzero subspaces of V whose union spans V. We refer to the subspaces W \in \mathsf{W} as the blocks of \mathsf{W}.

Given a linear partition \mathsf{W} of V, it is not useful to consider the set of linear operators in \mathcal{L}(V) constant on the blocks of \mathsf{W}. Indeed, there is only one such operator, namely the zero operator. Rather, we define \mathcal{A}(\mathsf{W}) to be the set of all operators A \in \mathcal{L}(V) such that every block W \in \mathsf{W} is A-invariant.

Problem 15.2. Prove that \mathcal{A}(\mathsf{W}) is a subalgebra of \mathcal{L}(V).

Here our construction starts to diverge from what we saw in the setting of function algebras: unless \mathsf{W}=\{V\} is the linear partition of V consisting of a single block, in which case \mathcal{A}(\mathsf{W})=\mathcal{L}(V), the subalgebra \mathcal{A}(\mathsf{W}) is definitely not isomorphic to the algebra of all linear operators on a Hilbert space. Indeed, consider the case where \mathsf{W}=\{W_1,W_2\} is a linear partition of V with two blocks. Let e_1,\dots,e_m be an ordered basis of W_1, and let f_1,\dots,f_n be an ordered basis of W_2. Then e_1,\dots,e_m,f_1,\dots,f_n is an ordered basis of V, and \mathcal{A}(\mathsf{W}) consists of all operators A \in \mathcal{L}(V) whose matrix relative to this ordered basis has the form

[A]_{(e_1,\dots,e_m,f_1,\dots,f_n)}=\begin{bmatrix} M_1 & 0 \\ 0 & M_2 \end{bmatrix},

with M_1 \in \mathbb{C}^{m \times m} and M_2 \in \mathbb{C}^{n\times n}. Thus, \mathcal{A}(\{W_1,W_2\}) is not isomorphic to a linear algebra, but rather to a direct sum of linear algebras,

\mathcal{A}(\{W_1,W_2\}) = \mathcal{L}(W_1) \oplus \mathcal{L}(W_2).

The question now is whether the above can be reversed: given a subalgebra \mathcal{A} of \mathcal{L}(V), does there exist a linear partition \mathsf{W} such that \mathcal{A}=\mathcal{A}(\mathsf{W})? If \mathcal{A}=\mathcal{L}(V), the answer is clearly “yes” since we have \mathcal{L}(V) = \mathcal{A}(\{V\}). Thus we consider the case where \mathcal{A} is a proper subalgebra of \mathcal{L}(V). Our starting point is Burnside’s theorem, which guarantees the existence of an \mathcal{A}-invariant subspace \{0_V\} < W < V. Together with the corollary to Burnside’s theorem, we thus have a linear partition \mathsf{W} = \{W,W^\perp\} of V whose blocks are \mathcal{A}-invariant. We now ask whether W has a proper non-trivial subspace invariant under \mathcal{A}. If it does we can split it into the orthogonal direct sum of two smaller \mathcal{A}-invariant subspaces; if not W is said to be \mathcal{A}-irreducible. Continuing this process, for both W and W^\perp, we get the following.

Theorem 15.4. (Maschke’s theorem) There exists a linear partition \mathsf{W} of V whose blocks are \mathcal{A}-invariant and \mathcal{A}-irreducible subspaces.

Now consider the linear partition \mathsf{W} of V into \mathcal{A}-invariant, \mathcal{A}-irreducible subspaces we have constructed. It is tempting to hope that \mathcal{A} = \mathcal{A}(\mathsf{W}). This is almost correct. Let us say that two blocks W,W' \in \mathsf{W} are \mathcal{A}-equivalent if it is possible to choose a basis in W and a basis in W' such that the matrix of every A \in \mathcal{A}, viewed as an operator on W, coincides with the matrix of A viewed as an operator on W'. This is an equivalence relation on the blocks of \mathsf{W}. Let \Lambda be a set parameterizing the corresponding equivalence classes of blocks, and for each \lambda \in \Lambda choose a representative V^\lambda of the corresponding equivalence class.

Theorem 15.5. (Classification Theorem for subalgebras of linear algebras) The subalgebra \mathcal{A} is isomorphic to \bigoplus_{\lambda \in \Lambda} \mathcal{L}(V^\lambda).

We are not going to prove the above classification in full right now – in the coming lectures we will prove a special case of it for homomorphic images of convolution algebras of finite groups. If time permits, we will prove the full theorem later.

Math 202B: Lecture 12

Let \mathcal{C}(G) be the convolution algebra of a finite group G. Then, the left regular representation

\mathsf{L} \colon \mathcal{C}(G) \longrightarrow \mathcal{L}(V)

is an injective algebra homomorphism from the convolution algebra \mathcal{C}(G) into the linear algebra \mathcal{L}(V) of V=\mathcal{C}(G) viewed as a Hilbert space. That is, the regular representation converts convolution of functions into matrix multiplication.

If G is abelian, then the Fourier transform

\widehat{\cdot} \colon \mathcal{C}(G) \longrightarrow \mathcal{F}(\Lambda)

is an algebra isomorphism between the convolution algebra \mathcal{C}(G) and the function algebra of the dual group \Lambda of G. That is, the Fourier transform converts convolution into pointwise multiplication.

Let us reconcile these two points of view on the convolution algebra of a finite abelian group. Let A \in \mathcal{C}(G) be any function on G. Let

A = \sum\limits_{\lambda \in \Lambda} \widehat{A}(\lambda)F^\lambda

be the Fourier expansion of A. Taking the left regular representation, we have

\mathsf{L}(A) = \sum\limits_{\lambda \in \Lambda} \widehat{A}(\lambda)P^\lambda,

where P^\lambda=\mathsf{L}(F^\lambda) is the linear operator on the Hilbert space V=\mathcal{C}(G) given by the image of the Fourier basis vector F^\lambda under the injective algebra homomorphism \mathsf{L}. In other words,

P^\lambda, \quad \lambda \in \Lambda,

is a basis of orthogonal projections for the image of \mathcal{C}(G) in \mathcal{L}(V) under \mathsf{L}, and

\mathsf{L}(A) = \sum\limits_{\lambda \in \Lambda} \widehat{A}(\lambda)P^\lambda

is the spectral decomposition of the operator \mathsf{L}(A). This says that the operators \mathsf{L}(A) making up the image of the commutative algebra \mathcal{C}(G) in \mathcal{L}(V) are simultaneously diagonalizable, which we already knew from our characterization of commutative subalgebras of \mathcal{L}(V), and moreover that calculating the eigenvalues of the operator \mathsf{L}(A) \in \mathcal{L}(V) is the same thing as calculating the Fourier transform of the function A \in \mathcal{C}(G).

Problem 12.3. A circulant matrix is a complex square matrix of the form

C=\begin{bmatrix} c_0 & c_1 & \dots & c_{n-1} \\ c_{n-1} & c_0 & \dots & c_{n-2} \\ \vdots & \vdots & {} & \vdots \\ c_1 & c_2 & \dots & c_0 \end{bmatrix}.

Determine the eigenvalues and eigenvectors of C. Hint: show that every n \times n circulant matrix is the image of a function on the cyclic group of order n in the regular representation.

As an extension to the above problem, you may consider how the uncertainty principle for the Fourier transform relates the sparsity of a circulant matrix to the dimension of its kernel. (Note to self: perhaps do this in lecture in the next iteration of Math 202B.)
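
If you want to experiment before proving anything, the following numpy sketch (my own illustration, and a numerical preview of the answer rather than a substitute for the proof via the regular representation) builds a circulant matrix in the convention displayed above and compares its spectrum with the character sums \sum_m c_m \omega^{\lambda m}, where \omega = e^{2\pi i/n}.

```python
import numpy as np

n = 5
rng = np.random.default_rng(2)
c = rng.standard_normal(n)

# Circulant matrix with first row c_0, c_1, ..., c_{n-1}, as displayed above:
# entry (j, k) equals c_{(k - j) mod n}.
C = np.array([[c[(k - j) % n] for k in range(n)] for j in range(n)])

# Candidate eigenvalues: the character sums sum_m c_m omega^{lambda m},
# with candidate eigenvector (omega^{lambda k})_k for each lambda.
omega = np.exp(2j * np.pi / n)
predicted = np.array([sum(c[m] * omega ** (lam * m) for m in range(n))
                      for lam in range(n)])

print(np.sort_complex(np.linalg.eigvals(C)))
print(np.sort_complex(predicted))  # same multiset of eigenvalues
```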

Math 202B: Lecture 10

Let

G = C_{n_1} \times \dots \times C_{n_r}

be an abelian group, and let

\Lambda = \mathbb{Z}_{n_1} \times \dots \times \mathbb{Z}_{n_r}

be its dual group. In Lecture 9 we showed that the convolution algebra \mathcal{C}(G) admits a basis \{F^\lambda \colon \lambda \in \Lambda\} consisting of orthogonal projections, called its Fourier basis. Thus any function A \in \mathcal{C}(G) can be expanded in the group basis,

A = \sum\limits_{g \in G} A(g) E_g,

and in the Fourier basis

A = \sum\limits_{\lambda \in \Lambda} \widehat{A}(\lambda) F^\lambda.

The coefficients of the Fourier expansion of A \in \mathcal{C}(G) give a function \widehat{A} \in \mathcal{F}(\Lambda) defined by \lambda \mapsto \widehat{A}(\lambda).

Definition 9.1. The Fourier transform on \mathcal{C}(G) is the map \mathcal{C}(G) \to \mathcal{F}(\Lambda) defined by A \mapsto \widehat{A}.

In Lecture 3, we proved that any algebra which admits a basis of orthogonal projections is isomorphic to a function algebra, and the proof of this general fact given there applies here in particular, allowing us to conclude that the Fourier transform A \mapsto \widehat{A} is an algebra isomorphism \mathcal{C}(G) \to \mathcal{F}(\Lambda). Thus, we have

\widehat{A*B} = \widehat{A} \widehat{B},

where on the left hand side

[A*B](g) = \sum\limits_{h \in G} A(gh^{-1})B(h), \quad g \in G,

is the convolution product of two functions in \mathcal{C}(G), and on the right hand side

[\widehat{A}\widehat{B}](\lambda) = \widehat{A}(\lambda)\widehat{B}(\lambda), \quad \lambda \in \Lambda

is the pointwise product of two functions in \mathcal{F}(\Lambda).

By definition, the elements of the Fourier basis of \mathcal{C}(G) are

F^\lambda = \frac{1}{|G|} \sum\limits_{g \in G} \chi^\lambda(g) E_g,

where \{\chi^\lambda \colon \lambda \in \Lambda\} is the character basis of \mathcal{C}(G), which consists of all group homomorphisms G \to U(\mathbb{C}).

Theorem 9.1. For any A \in \mathcal{C}(G), we have

\widehat{A}(\lambda) = \langle \chi^\lambda,A\rangle,

where the right hand side is an \ell^2-scalar product in \mathcal{C}(G).

Proof: Taking a slightly different normalization of the Fourier basis, we get an orthonormal basis of \mathcal{C}(G) consisting of the functions

E^\lambda = \sqrt{|G|}F^\lambda = \frac{1}{\sqrt{|G|}}\chi^\lambda, \quad \lambda \in \Lambda.

Because this basis is orthonormal, we have

A = \sum\limits_{\lambda \in \Lambda} \langle E^\lambda,A\rangle E^\lambda,

for any function A \in \mathcal{C}(G), and comparing with the Fourier expansion of A this gives

\widehat{A}(\lambda) = \sqrt{|G|}\langle E^\lambda,A\rangle=\langle \chi^\lambda,A\rangle.

-QED

Theorem 9.1 is called the Fourier inversion formula, because

\widehat{A}(\lambda) = \langle \chi^\lambda,A\rangle = \sum\limits_{g \in G} \overline{\chi^\lambda(g)}A(g)

recovers the values of the transformed function \widehat{A} \in \mathcal{F}(\Lambda) in terms of the values of the original function A \in \mathcal{C}(G). As an example, let us calculate the Fourier transform of the elementary function E_h \in \mathcal{C}(G) corresponding to a given group element h \in G. From the inversion formula, the Fourier transform of the indicator function of h is

\widehat{E_h}(\lambda) = \langle \chi^\lambda,E_h\rangle = \sum\limits_{g \in G} \overline{\chi^\lambda(g)}E_h(g) = \overline{\chi^\lambda(h)}.

Equivalently, we have

\widehat{E_h} = \sum\limits_{\lambda \in \Lambda} \overline{\chi^\lambda(h)}E_\lambda,

where \{E_\lambda \colon \lambda \in \Lambda\} is the elementary basis of \mathcal{F}(\Lambda) consisting of indicator functions E_\lambda(\mu) = \delta_{\lambda\mu}.
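
Here is a minimal numerical sketch of the inversion formula and the convolution theorem for a cyclic group (an illustration of mine, assuming numpy; the function names are not from the lectures). The character table of \mathbb{Z}_n is the matrix \chi^\lambda(g) = \omega^{\lambda g} with \omega = e^{2\pi i/n}.

```python
import numpy as np

n = 6  # work in the cyclic group Z_n, written additively
omega = np.exp(2j * np.pi / n)
chi = np.array([[omega ** (lam * g) for g in range(n)]
                for lam in range(n)])  # character table chi[lam, g]

def fourier(A):
    # A-hat(lambda) = <chi^lambda, A> = sum_g conj(chi^lambda(g)) A(g)
    return chi.conj() @ A

def convolve(A, B):
    # [A*B](g) = sum_h A(g - h) B(h), the group law written additively
    return np.array([sum(A[(g - h) % n] * B[h] for h in range(n))
                     for g in range(n)])

rng = np.random.default_rng(3)
A, B = rng.standard_normal(n), rng.standard_normal(n)
print(np.allclose(fourier(convolve(A, B)), fourier(A) * fourier(B)))  # True

# the transform of an elementary function E_h is conj(chi^lambda(h)), as above
h = 2
E_h = np.eye(n)[h]
print(np.allclose(fourier(E_h), chi[:, h].conj()))  # True
```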

Theorem 9.2. (Plancherel and Parseval formulas) For any A,B \in \mathcal{C}(G), we have

\langle \widehat{A},\widehat{B}\rangle=|G|\langle A,B\rangle

and

\|\widehat{A}\|_2 = \sqrt{|G|}\|A\|_2.

Proof: From the definition of the Fourier transform, we have

\langle A,B \rangle = \left\langle \sum\limits_{\lambda \in \Lambda} \widehat{A}(\lambda)F^\lambda,\sum\limits_{\mu \in \Lambda} \widehat{B}(\mu)F^\mu\right\rangle=\sum\limits_{\lambda \in \Lambda}\sum\limits_{\mu \in \Lambda}\overline{\widehat{A}(\lambda)}\widehat{B}(\mu)\langle F^\lambda,F^\mu\rangle=\sum\limits_{\lambda \in \Lambda}\overline{\widehat{A}(\lambda)}\widehat{B}(\lambda)\frac{1}{|G|}.

Multiplying through by |G|, this is the Plancherel formula, and the Parseval formula is the special case where A=B, together with the definition of the norm associated to a scalar product.

-QED

The \ell^2-norm on functions A \colon G\to \mathbb{C} is the p=2 case of the \ell^p-norm, which for any p>0 is defined by

\|A\|_p =\left(\sum\limits_{g\in G}|A(g)|^p\right)^{\frac{1}{p}}.

Also familiar from analysis is the case p=\infty, which is defined by

\|A\|_\infty = \max\limits_{g \in G} |A(g)|.

The intuition behind this definition is that, as p \to \infty the sum \|A\|_p^p is approximately equal to its largest term, and this concentration becomes exact in the p=\infty limit. This intuition leads to the following inequality.

Proposition 11.3. For any p>0 and A \colon G\to \mathbb{C}, we have

\|A\|_p^p \leq \|A\|^p_\infty |\mathrm{supp}(A)|,

where the support of A is the complement of its vanishing set,

\mathrm{supp}(A) = \{g\in G\colon A(g) \neq 0\}.

Problem 11.2. Prove Proposition 11.3.

We now prove a discrete version of the famous uncertainty principle, whose origins are in quantum mechanics; in Fourier analysis, this principle effectively says that if the support of a function is small, then the support of its Fourier transform is large.

Theorem 11.4. (Uncertainty principle) For any nonzero function A \in \mathcal{C}(G), we have

|\mathrm{supp}(A)||\mathrm{supp}(\widehat{A})| \geq |\Lambda|.

Proof: Start from the Fourier expansion of A, which in terms of the character basis \{\chi^\lambda \colon \lambda \in \Lambda\} of \mathcal{C}(G) is

A = \frac{1}{|G|} \sum\limits_{\lambda \in \Lambda} \widehat{A}(\lambda)\chi^\lambda.

For any g \in G, we have

|A(g)| =\left|\frac{1}{|G|} \sum\limits_{\lambda \in \Lambda} \widehat{A}(\lambda)\chi^\lambda(g)\right| \leq \frac{1}{|G|}\sum\limits_{\lambda \in \Lambda} |\widehat{A}(\lambda)| =\frac{1}{|G|}\sum\limits_{\lambda \in \mathrm{supp}(\widehat{A})} |\widehat{A}(\lambda)|,

where we used the triangle inequality and the fact that |\chi^\lambda(g)|=1. Since g\in G was arbitrary, the above shows that

\|A\|_\infty \leq \frac{1}{|G|}\sum\limits_{\lambda \in \mathrm{supp}(\widehat{A})} |\widehat{A}(\lambda)|.

Squaring both sides, we thus have

\|A\|_\infty^2 \leq \frac{1}{|G|^2}\left(\sum\limits_{\lambda \in \Lambda} \delta_{\lambda \in \mathrm{supp}(\widehat{A})} |\widehat{A}(\lambda)|\right)^2\leq \frac{1}{|G|^2} |\mathrm{supp}(\widehat{A})| \|\widehat{A}\|_2^2=\frac{1}{|G|} |\mathrm{supp}(\widehat{A})| \|A\|_2^2\leq \frac{1}{|G|} |\mathrm{supp}(\widehat{A})| \|A\|_\infty^2|\mathrm{supp}(A)|,

where we applied the Cauchy-Schwarz inequality, the Parseval formula, and Proposition 11.3. Since \|A\|_\infty^2 \neq 0, it can be cancelled from both sides of the above inequality, leaving

1 \leq \frac{1}{|\Lambda|} |\mathrm{supp}(\widehat{A})||\mathrm{supp}(A)|.

-QED

Problem 11.3. Prove that the uncertainty principle is sharp by exhibiting a function A \colon G\to \mathbb{C} for which equality holds in Theorem 11.4.
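
A quick numerical sanity check of the uncertainty principle (a sketch of mine, assuming numpy): generate random sparse functions on \mathbb{Z}_n, compute their Fourier transforms from the character table, and verify the support inequality.

```python
import numpy as np

n = 12
omega = np.exp(2j * np.pi / n)
chi = np.array([[omega ** (lam * g) for g in range(n)] for lam in range(n)])

def supp_size(A, tol=1e-9):
    return int(np.sum(np.abs(A) > tol))

rng = np.random.default_rng(4)
for _ in range(5):
    # a random nonzero function on Z_n supported on a random set of coordinates
    A = np.zeros(n, dtype=complex)
    idx = rng.choice(n, size=rng.integers(1, n + 1), replace=False)
    A[idx] = rng.standard_normal(len(idx)) + 1j * rng.standard_normal(len(idx))
    A_hat = chi.conj() @ A  # Fourier transform via the inversion formula
    print(supp_size(A) * supp_size(A_hat), ">=", n)
```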

Math 202B: Lecture 9

The convolution and class algebras \mathcal{C}(G) and \mathcal{Z}(G) of a group G are apparently quite different from the other types of algebras we have seen, namely function algebras \mathcal{F}(X) and linear algebras \mathcal{L}(V). However, they are in fact closely related.

Theorem 9.1. The convolution algebra \mathcal{C}(G) is isomorphic to a subalgebra of a linear algebra.

Proof: Since \mathcal{C}(G) has a natural scalar product, the \ell^2-scalar product

\langle A,B \rangle = \sum\limits_{g \in G} \overline{A(g)}B(g),

it is a Hilbert space as well as an algebra. For each A \in \mathcal{C}(G), consider the function \mathsf{L}(A) \colon \mathcal{C}(G) \to \mathcal{C}(G) defined by

\mathsf{L}(A)B = AB, \quad B \in \mathcal{C}(G).

Observe that \mathsf{L}(A) is a linear operator on \mathcal{C}(G), i.e.

\mathsf{L}(A)(\beta B + \gamma C) = A(\beta B + \gamma C) = \beta AB + \gamma AC = \beta \mathsf{L}(A)B + \gamma \mathsf{L}(A)C.

Thus, \mathsf{L} is a function from the convolution algebra \mathcal{C}(G) to the linear algebra \mathcal{L}(\mathcal{C}(G)). Moreover, the map \mathsf{L} itself is a linear transformation, i.e.

\mathsf{L}(\alpha A + \beta B)C = (\alpha A +\beta B)C=\alpha AC +\beta BC = \alpha \mathsf{L}(A)C + \beta \mathsf{L}(B)C.

We claim that \mathsf{L} is an injective linear transformation. Since \mathsf{L} is linear, it suffices to show that its kernel is trivial, and this follows from the fact that E_e is the multiplicative unit of \mathcal{C}(G): if \mathsf{L}(A) is the zero operator, then A = AE_e = \mathsf{L}(A)E_e = 0. It is instructive to see injectivity concretely on the group basis \{E_g \colon g \in G\} of \mathcal{C}(G): since \mathsf{L}(E_g) and \mathsf{L}(E_h) are linear operators on \mathcal{C}(G), they are uniquely determined by their actions on the group basis, and we have

\mathsf{L}(E_g)E_k = E_gE_k = E_{gk} \quad\text{and}\quad \mathsf{L}(E_h)E_k = E_hE_k = E_{hk},

hence

\mathsf{L}(E_g)E_k=\mathsf{L}(E_h)E_k \iff gk = hk \iff g=h.

From this we can conclude that the image of \mathcal{C}(G) in \mathcal{L}(\mathcal{C}(G)) under \mathsf{L} is a subspace isomorphic to \mathcal{C}(G). It remains to show that \mathsf{L} is not just a linear transformation, but an algebra homomorphism, and we leave this as an exercise (that you really should do).

-QED

The proof of Theorem 9.1 is the linear version of Cayley’s theorem from group theory: instead of representing G as a subgroup of the group of permutations of G, it represents \mathcal{C}(G) as a subalgebra of the algebra of linear operators on \mathcal{C}(G). This is called the left regular representation of \mathcal{C}(G).

Corollary 9.2. The class algebra \mathcal{Z}(G) is isomorphic to a function algebra.

Proof: Since \mathcal{Z}(G) is a subalgebra of \mathcal{C}(G), and since \mathcal{C}(G) is isomorphic to a subalgebra of a linear algebra, \mathcal{Z}(G) is isomorphic to a commutative subalgebra of a linear algebra, and we have previously shown that all commutative subalgebras of linear algebras are isomorphic to function algebras.

-QED

For an example illustrating how the proof of Theorem 9.1 works, let us take G=\{g^0,g^1,g^2,g^3\} to be a cyclic group of order four with generator g, so that g^0=g^4=\dots=e is the group unit. The group basis of \mathcal{C}(G) is thus E_{g^0},E_{g^1},E_{g^2},E_{g^3}, and we denote these vectors by E_0,E_1,E_2,E_3 for brevity. Let A \in \mathcal{C}(G) be any function on G, and let

A = \alpha_0 E_0 + \alpha_1 E_1 + \alpha_2 E_2 + \alpha_3 E_3

be its expansion in the group basis, so that

\alpha_0 = A(g^0),\ \alpha_1=A(g),\ \alpha_2=A(g^2),\ \alpha_3=A(g^3).

Problem 9.1. Show by direct calculation that the matrix of \mathsf{L}(A) \in \mathcal{L}(\mathcal{C}(G)) with respect to the ordered basis E_0,E_1,E_2,E_3 of \mathcal{C}(G) is

\mathsf{L}(A) = \begin{bmatrix} \alpha_0 & \alpha_3 & \alpha_2 & \alpha_1 \\ \alpha_1 & \alpha_0 & \alpha_3 & \alpha_2 \\ \alpha_2 & \alpha_1 & \alpha_0 & \alpha_3 \\ \alpha_3 & \alpha_2 & \alpha_1 & \alpha_0 \end{bmatrix}.

In the case where G is an abelian group, as in the example above, we have that \mathcal{C}(G) = \mathcal{Z}(G). Moreover, it is possible to establish Corollary 9.2 directly, without appealing to Theorem 9.1. This is done constructively, by finding an explicit basis of orthogonal projections in the commutative convolution algebra \mathcal{C}(G) called its Fourier basis. The advantage of this direct approach is that it also gives us an explicit description of the spectrum of all matrices in the left regular representation of \mathcal{C}(G). This is very useful in applications – in particular, matrices in the left regular representation of a cyclic group are called circulant matrices and they are important in engineering.

To begin to see where a projection basis of \mathcal{C}(G) might come from, recall that we previously showed the non-existence of an algebra homomorphism \mathsf{T} \colon \mathcal{L}(V) \to \mathbb{C} for V a Hilbert space of dimension at least two. This reflects the fact that linear algebras \mathcal{L}(V) are maximally noncommutative. But we have also seen that for G a group with at least two elements, the dimension of \mathcal{Z}(G) is at least two, so convolution algebras are always at least one degree more commutative than linear algebras, and therefore might admit homomorphisms to the complex numbers.

Theorem 9.3. A linear transformation \mathsf{T} \colon \mathcal{C}(G) \to \mathbb{C} is an algebra homomorphism if and only if the function \chi \in \mathcal{C}(G) defined by

\chi(g) = \mathsf{T}(E_g), \quad g \in G,

is a group homomorphism taking values in U(\mathbb{C}), the unitary group of the complex numbers.

Problem 9.2. Prove Theorem 9.3.

Definition 9.1. A group homomorphism \chi \colon G \to U(\mathbb{C}) is called a character of G. The set of all characters of G is denoted \widehat{G}.

Observe first of all that \widehat{G} is a set of functions on G, i.e. a subset of \mathcal{C}(G), the space of all \mathbb{C}-valued functions on G. It is this special subset of functions tethered to the group law in G that we look to for orthogonal projections. No matter what the group G is, the set \widehat{G} is nonempty: it contains at least the trivial character defined by \chi(g)=1, g \in G. However, for highly noncommutative groups there may not be many nontrivial characters.

Problem 9.3. Determine the Fourier dual of the symmetric group S_n.

For abelian groups the situation is much better and in fact we always have |G|=|\widehat{G}|. To begin, recall that every finite abelian group is isomorphic to a product of cyclic groups. We thus fix a positive integer r \in \mathbb{N}, and r positive integers n_1,\dots,n_r \in \mathbb{N}, and consider the group

G = G_1 \times \dots \times G_r,

where G_i is a cyclic group of order n_i with generator g_i. Define the dual group of G to be

\Lambda = \{\alpha=(\alpha_1,\dots,\alpha_r) \colon \alpha_k \in \{0,1,\dots,n_k-1\},\ 1 \leq k \leq r\}.

That is,

\Lambda = \mathbb{Z}_{n_1} \times \dots \times \mathbb{Z}_{n_r},

where \mathbb{Z}_n is the additive group of integers modulo n. We can parameterize G by the points of \Lambda, writing

g_\alpha=(g_1^{\alpha_1},\dots,g_r^{\alpha_r}), \quad \alpha \in \Lambda.

Indeed, the parameterization \alpha \mapsto g_\alpha is a group isomorphism \Lambda \to G (Exercise: prove this, noting that because |\Lambda|=|G| it is sufficient to show the parameterization is an injective group homomorphism).

Theorem 9.4. For every \lambda \in \Lambda, the function \chi^\lambda \colon G \to U(\mathbb{C}) defined by

\chi^\lambda(g_\alpha) = \omega_1^{\alpha_1\lambda_1} \dots \omega_r^{\alpha_r\lambda_r},

where \omega_k=\exp\left(\frac{2\pi i}{n_k}\right) is a principal n_kth root of unity, is a character of G, and every character of G is of this form.

Proof: For any \lambda \in \Lambda, it is clear that \chi^\lambda(e)=1, because the identity element e \in G has parameters \alpha=(0,0,\dots,0). Moreover, for any \alpha,\beta \in \Lambda we have

\chi^\lambda(g_\alpha g_\beta) = \chi^\lambda(g_{\alpha+\beta})=(\omega_1)^{(\alpha_1+\beta_1)\lambda_1} \dots (\omega_r)^{(\alpha_r+\beta_r)\lambda_r} = (\omega_1)^{\alpha_1\lambda_1} \dots (\omega_r)^{\alpha_r\lambda_r}(\omega_1)^{\beta_1\lambda_1} \dots (\omega_r)^{\beta_r\lambda_r}=\chi^\lambda(g_\alpha)\chi^\lambda(g_\beta),

so \chi^\lambda is indeed a group homomorphism G \to U(\mathbb{C}). The fact that every homomorphism \chi \colon G \to U(\mathbb{C}) is \chi^\lambda for some \lambda \in \Lambda is left as an exercise.

-QED

We now have a special subset \widehat{G}=\{\chi^\lambda \colon \lambda \in \Lambda\} of the convolution algebra \mathcal{C}(G) of the finite abelian group G, namely the set \widehat{G} of all homomorphisms to the unitary group U(\mathbb{C}). We now claim that the characters form a basis of \mathcal{C}(G). Since the number of characters is |\widehat{G}|=|\Lambda|=|G|, which is the dimension of \mathcal{C}(G), it is sufficient to show that \widehat{G}=\{\chi^\lambda \colon \lambda \in \Lambda\} is a linearly independent set in \mathcal{C}(G).

Theorem 9.5. The set \{\chi^\lambda \colon \lambda \in \Lambda\} is orthogonal with respect to the \ell^2-scalar product on \mathcal{C}(G) – we have

\langle \chi^\lambda,\chi^\mu\rangle = \sum\limits_{g \in G} \overline{\chi^\lambda(g)} \chi^\mu(g) = \delta_{\lambda\mu}|G|.

Proof: For any \lambda,\mu \in \Lambda, we have

\langle \chi^\lambda,\chi^\mu \rangle = \sum\limits_{\alpha \in \Lambda} \overline{\chi^\lambda(g_\alpha)}\chi^\mu(g_\alpha) = \sum\limits_{\alpha \in \Lambda}(\omega_1)^{\alpha_1(\mu_1-\lambda_1)} \dots (\omega_r)^{\alpha_r(\mu_r-\lambda_r)}=\left(\sum\limits_{\alpha_1=0}^{n_1-1} \zeta_1^{\alpha_1}\right) \dots \left(\sum\limits_{\alpha_r=0}^{n_r-1} \zeta_r^{\alpha_r}\right),

where

\zeta_1 = \omega_1^{\mu_1-\lambda_1},\ \dots,\ \zeta_r=\omega_r^{\mu_r-\lambda_r}.

Thus if \lambda=\mu we have

\langle \chi^\lambda,\chi^\mu \rangle = n_1 \dots n_r = |\Lambda|=|G|,

and if \lambda \neq \mu, then \lambda_k \neq \mu_k for at least one k, so that \zeta_k=\omega_k^{\mu_k-\lambda_k} is an n_kth root of unity different from 1. For any such k the corresponding factor is the full geometric series

\sum\limits_{\alpha_k=0}^{n_k-1} \zeta_k^{\alpha_k} = \frac{1-\zeta_k^{n_k}}{1-\zeta_k} = 0,

since the denominator is nonzero and the numerator vanishes. Hence the product, and with it \langle \chi^\lambda,\chi^\mu\rangle, is zero.

-QED

The orthogonal basis \widehat{G}=\{\chi^\lambda \colon \lambda \in \Lambda\} of \mathcal{C}(G) is called its character basis. It is convenient to write \chi^\lambda_\alpha := \chi^\lambda(g_\alpha), since this highlights the symmetry \chi^\lambda_\alpha = \chi^\alpha_\lambda. The |G| \times |G| symmetric matrix X=[\chi^\lambda_\alpha] is called the character table of G, and Theorem 9.5 says that \frac{1}{\sqrt{|G|}}X is a symmetric unitary matrix. Another way to say the same thing is that the rescaled character basis

E^\lambda = \frac{1}{\sqrt{|G|}} \chi^\lambda, \quad \lambda \in \Lambda,

is an orthonormal basis of the convolution algebra \mathcal{C}(G). In fact, the further scaling

F^\lambda = \frac{1}{\sqrt{|G|}}E^\lambda = \frac{1}{|G|}\chi^\lambda, \quad \lambda \in \Lambda,

is even better, for the following reason.

Theorem 9.6. The elements of the basis \{F^\lambda \colon \lambda \in \Lambda\} are orthogonal projections in \mathcal{C}(G).

Proof: For any \lambda \in \Lambda, we have

(F^\lambda)^* = \left( \frac{1}{|G|}\sum\limits_{g \in G} \chi^\lambda(g) E_g\right)^*=\frac{1}{|G|}\sum\limits_{g \in G} \overline{\chi^\lambda(g)} E_g^* =\frac{1}{|G|} \sum\limits_{g \in G} \chi^\lambda(g^{-1}) E_{g^{-1}} = F^\lambda.

For any \lambda,\mu \in \Lambda, we have

F^\lambda F^\mu = \left( \frac{1}{|G|}\sum\limits_{g \in G} \chi^\lambda(g) E_g\right)\left( \frac{1}{|G|}\sum\limits_{g \in G} \chi^\mu(g) E_g\right)=\frac{1}{|G|^2}\sum\limits_{g \in G} \left( \sum\limits_{h \in G}\chi^\lambda(gh^{-1})\chi^\mu(h)\right)E_g = \frac{1}{|G|^2}\sum\limits_{g \in G} \chi^\lambda(g)\left( \sum\limits_{h \in G}\chi^\lambda(h^{-1})\chi^\mu(h)\right)E_g.

The internal sum is

\sum\limits_{h \in G}\chi^\lambda({h^{-1}})\chi^\mu(h)=\sum\limits_{h \in G}\overline{\chi^\lambda(h)}\chi^\mu(h)=\delta_{\lambda\mu}|G|,

where the final equality is Theorem 9.5. Thus

F^\lambda F^\mu = \frac{1}{|G|^2}\delta_{\lambda\mu}|G|\sum\limits_{g \in G} \chi^\lambda(g)E_g=\delta_{\lambda\mu}F^\lambda.

-QED

Since we know that any algebra with a basis of orthogonal projections is isomorphic to a function algebra (Lecture 3), Theorem 9.6 gives the promised second proof of the fact that the convolution algebra \mathcal{C}(G) of an abelian group G is isomorphic to a function algebra. In this particular case, the basis \{F^\lambda \colon \lambda \in \Lambda\} is known as the Fourier basis of \mathcal{C}(G).
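
As a final sanity check (an illustrative sketch of mine, assuming numpy), one can verify numerically that the Fourier basis elements of \mathcal{C}(\mathbb{Z}_n) really are orthogonal projections with respect to convolution:

```python
import numpy as np

n = 5
omega = np.exp(2j * np.pi / n)

def convolve(A, B):
    # convolution on Z_n: [A*B](g) = sum_h A(g - h) B(h)
    return np.array([sum(A[(g - h) % n] * B[h] for h in range(n))
                     for g in range(n)])

# Fourier basis of C(Z_n): F^lambda(g) = (1/n) chi^lambda(g) = (1/n) omega^{lambda g}
F = [np.array([omega ** (lam * g) for g in range(n)]) / n for lam in range(n)]

for lam in range(n):
    for mu in range(n):
        expected = F[lam] if lam == mu else np.zeros(n)
        assert np.allclose(convolve(F[lam], F[mu]), expected)
print("F^lambda F^mu = delta_{lambda mu} F^lambda holds in C(Z_%d)" % n)
```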

Math 202B: Lecture 8

The convolution algebra \mathcal{C}(G) of a group G is commutative if and only if G is abelian. As in Lecture 2, we will refine this dichotomy by giving a quantitative measurement of how (non)commutative \mathcal{C}(G) is. In Lecture 2 we defined the commutativity index of an algebra to be the dimension of its center, so our goal now is to determine the dimension of the center of a convolution algebra.

The degree of noncommutativity of \mathcal{C}(G) is determined by that of G, and in group theory we understand this using the action of G on itself defined by

g.h = ghg^{-1}.

The orbits of this action are the conjugacy classes of G, and the number of conjugacy classes in G is called its class number. The higher the class number, the more commutative the group – for abelian groups, conjugacy classes are singleton sets.

Problem 8.1. Prove that G is an abelian group if and only if its class number is equal to its cardinality, and more generally that the number of commuting pairs of elements in G is equal to the cardinality of G times its class number (hint: the orbit-stabilizer theorem may be helpful). Show also that every group except the trivial group contains at least two conjugacy classes.

For the function algebra \mathcal{F}(G), every partition of G gives rise to a subalgebra of \mathcal{F}(G), namely the set of functions constant on the blocks of the partition. As we discussed in Lecture 4, there is no reason that the convolution of two such functions will still be constant on the blocks of the given partition. On the other hand, the partition of G into conjugacy classes is determined by the group structure of G, and so functions on G which are constant on the blocks of this particular group-theoretic partition of G are relevant from the point of view of \mathcal{C}(G).

Definition 8.2. A function A \colon G \to \mathbb{C} is called a class function if it is constant on conjugacy classes, meaning that A(g) = A(hgh^{-1}) for all g,h \in G.

The following gives an alternative way to think about class functions as functions which are insensitive to any noncommutativity present in G.

Theorem 8.1. A function A \colon G \to \mathbb{C} is a class function if and only if A(gh)=A(hg) for all g,h \in G.

Proof: Suppose first that A is a class function on G. Then,

A(gh) = A(hghh^{-1}) = A(hg).

Conversely, suppose A is insensitive to noncommutativity. Then,

A(hgh^{-1}) = A(h^{-1}hg) = A(g).

-QED

We can now characterize the center of \mathcal{C}(G).

Theorem 8.2. A function A \in \mathcal{C}(G) belongs to Z(\mathcal{C}(G)) if and only if it is constant on conjugacy classes.

Proof: Suppose first that A \in \mathcal{C}(G) is a class function; we will prove that it commutes with every B \in \mathcal{C}(G). For any g \in G, we have

[AB](g) = \sum\limits_{h \in G} A(gh^{-1})B(h) = \sum\limits_{h \in G} A(g(gh)^{-1}) B(gh) = \sum\limits_{h \in G} A(gh^{-1}g^{-1})B(gh),

where the second equality follows from the fact that the substitution h \rightsquigarrow gh simply permutes the terms of the sum. Continuing the calculation, we have

[AB](g) = \sum\limits_{h \in G} A(gh^{-1}g^{-1})B(gh) =\sum\limits_{h \in G} A(h^{-1})B(gh) = \sum\limits_{h \in G} B(gh)A(h^{-1}),

where the second equality follows from the fact that A is a class function. We now conclude

[AB](g) = \sum\limits_{h \in G} B(gh^{-1})A(h) = [BA](g),

as required.

Now suppose that Z \in Z(\mathcal{C}(G)) is a central function; we will prove it is constant on conjugacy classes. Since Z commutes with all functions in \mathcal{C}(G), it commutes with every elementary function E_g. Since

[ZE_g](h) = \sum\limits_{k \in G}Z(hk^{-1})E_g(k)=Z(hg^{-1}),

and

[E_gZ](h) = \sum\limits_{k\in G}E_g(hk^{-1})Z(k)=Z(g^{-1}h),

the centrality of Z implies that Z(hg^{-1})=Z(g^{-1}h) for all g,h \in G, which is equivalent to Z(gh)=Z(hg) for all g,h \in G. Thus Z is a class function, by Theorem 8.1.

-QED

Thus, the set of all functions constant on conjugacy classes of G is a subalgebra of both \mathcal{F}(G) and \mathcal{C}(G). The subalgebra of class functions has no special significance in \mathcal{F}(G), not being any more or less special than the subalgebra associated with any other partition of G. But in \mathcal{C}(G), the subalgebra of class functions is the center of \mathcal{C}(G). The center of \mathcal{C}(G) is often called the class algebra of G and denoted \mathcal{Z}(G) rather than Z(\mathcal{C}(G)) to emphasize that it is worth thinking about as a standalone object, i.e. as a commutative algebra naturally associated to the finite group G.

Just as the convolution algebra \mathcal{C}(G) has a natural basis given by indicator functions of elements of G, the class algebra \mathcal{Z}(G) has a natural basis given by indicator functions of conjugacy classes in G. Let \Lambda be a set parameterizing the conjugacy classes of G, so the collection of these is

\{C_\alpha \colon \alpha \in \Lambda\},

and this is a subset of the power set of G. For each \alpha \in \Lambda, let K_\alpha \colon G \to \mathbb{C} be the indicator function of C_\alpha, so

K_\alpha(g) = \begin{cases} 1, \text{ if }g \in C_\alpha \\ 0, \text{ if }g \not\in C_\alpha \end{cases}.

Equivalently, in terms of the group basis \{E_g \colon g \in G\} of \mathcal{C}(G) we have

K_\alpha = \sum\limits_{g \in C_\alpha} E_g, \quad \alpha \in \Lambda.

Then, the functions \{K_\alpha \colon \alpha \in \Lambda\} span the class algebra of G, since any function Z \colon G \to \mathbb{C} constant on conjugacy classes can be written

Z = \sum\limits_{\alpha \in \Lambda} Z(\alpha) K_\alpha,

where Z(\alpha) denotes the value Z(g) for any g \in C_\alpha. We will show that \{K_\alpha \colon \alpha \in \Lambda\} is a basis of \mathcal{Z}(G) using the \ell^2-scalar product on \mathcal{C}(G), which is

\langle A,B \rangle = \sum\limits_{g \in G} \overline{A(g)}B(g).

It is clear that \{E_g \colon g \in G\} is an orthonormal basis of \mathcal{C}(G) with respect to the \ell^2-scalar product, and we therefore have

\langle K_\alpha,K_\beta \rangle = \left\langle \sum\limits_{g \in C_\alpha} E_g , \sum\limits_{h \in C_\beta} E_h \right\rangle = \sum\limits_{g \in C_\alpha,\ h \in C_\beta} \langle E_g,E_h\rangle = |C_\alpha \cap C_\beta|.

Since \{C_\alpha \colon \alpha \in \Lambda\} is a partition of G, the scalar product is

\langle K_\alpha,K_\beta \rangle = \delta_{\alpha\beta}|C_\alpha|,

which shows that \{K_\alpha \colon \alpha \in \Lambda\} is an orthogonal (but not orthonormal) set of functions in \mathcal{Z}(G), hence linearly independent.

In conclusion, if G is any finite group, the dimension of the convolution algebra \mathcal{C}(G) is the cardinality of G, and the dimension of the class algebra \mathcal{Z}(G) is the class number of G.

Problem 8.3. Consider the multiplication tensor [c_{\alpha\beta\gamma}] of \mathcal{Z}(G), i.e. K_\alpha K_\beta = \sum_\gamma c_{\alpha\beta\gamma}K_\gamma. For any \alpha,\beta,\gamma \in \Lambda and any g \in C_\gamma, show that

c_{\alpha\beta\gamma} = |\{(x,y) \in C_\alpha \times C_\beta \colon xy=g\}|.

Thus, the multiplication tensor of \mathcal{Z}(G) is quite an interesting object: c_{\alpha\beta\gamma} counts solutions to the equation xy=g in G, where g is any particular point of the conjugacy class C_\gamma, and x,y \in G are required to belong to C_\alpha and C_\beta respectively.
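
To see these structure constants concretely, here is a short self-contained Python sketch (my own illustration) that computes c_{\alpha\beta\gamma} for the symmetric group S_3, whose conjugacy classes are indexed by cycle type.

```python
from itertools import permutations

def cycle_type(p):
    # cycle type of a permutation of {0, ..., n-1} given in one-line notation
    n, seen, lens = len(p), set(), []
    for i in range(n):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j)
                j = p[j]
                length += 1
            lens.append(length)
    return tuple(sorted(lens, reverse=True))

def compose(p, q):
    # (pq)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(p)))

# conjugacy classes of S_3, grouped by cycle type
G = list(permutations(range(3)))
classes = {}
for p in G:
    classes.setdefault(cycle_type(p), []).append(p)

# c_{alpha beta gamma} = #{(x, y) in C_alpha x C_beta : xy = g}, g in C_gamma fixed
for alpha, Ca in classes.items():
    for beta, Cb in classes.items():
        for gamma, Cc in classes.items():
            g = Cc[0]
            c = sum(1 for x in Ca for y in Cb if compose(x, y) == g)
            if c:
                print(alpha, beta, gamma, c)
```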

Math 202B: Lecture 3

Let \mathcal{A} be an algebra.

Definition 3.1. A basis of orthogonal projections in \mathcal{A} is said to be a Fourier basis of \mathcal{A}.

Let X be a finite nonempty set.

Definition 3.2. The function algebra of X is the vector space

\mathcal{F}(X) = \{A \colon X \to \mathbb{C}\}

of \mathbb{C}-valued functions on X with multiplication defined by

[AB](x) = A(x)B(x), \quad x \in X,

and conjugation defined by

A^*(x) = \overline{A(x)}, \quad x \in X.

One says that the operations in \mathcal{F}(X) are defined pointwise.

Problem 3.1. Prove that \mathcal{F}(X) is indeed an algebra.

There is a natural Fourier basis in \mathcal{F}(X) indexed by the points of X. For each x \in X, define the corresponding elementary function E_x \in \mathcal{F}(X) by

E_x(y) = \begin{cases} 1, \text{ if }x=y \\ 0,\text{ if }x \neq y\end{cases}.

That is, E_x(y) = \delta_{xy} where \delta_{xy} is the Kronecker delta. The elementary functions are selfadjoint elements of \mathcal{F}(X) because they are real-valued, and they are orthogonal projections because

[E_xE_y](z)=E_x(z)E_y(z)=\delta_{xz}\delta_{yz}=\delta_{xy}E_x(z).

Thus, \{E_x \colon x \in X\} is a linearly independent set in \mathcal{F}(X) by Theorem 1.1 in Lecture 1. To see that \{E_x \colon x \in X\} spans \mathcal{F}(X), observe that any function A \in \mathcal{F}(X) can be written as

A = \sum\limits_{x \in X} A(x)E_x.

Using the elementary basis \{E_x \colon x \in X\} of \mathcal{F}(X), we can forget that the elements of this algebra are functions on X and view them as linear combinations

A = \sum\limits_{x\in X} \alpha_xE_x, \quad \alpha_x \in \mathbb{C}.

Combining the fact that the elementary functions are orthogonal projections with the axioms of bilinearity and antilinearity, we recover multiplication and conjugation in \mathcal{F}(X) in the form

AB = \left(\sum\limits_{x \in X} \alpha_xE_x\right)\left(\sum\limits_{x \in X} \beta_xE_x\right)=\sum\limits_{x,y\in X}\alpha_x\beta_yE_xE_y = \sum\limits_{x \in X} \alpha_x\beta_xE_x

and

A^*= \left(\sum\limits_{x \in X} \alpha_xE_x\right)^*=\sum\limits_{x \in X} \bar{\alpha}_xE_x^* = \sum\limits_{x \in X} \bar{\alpha}_xE_x.
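
Since we can compute entirely in the elementary basis, here is a minimal Python sketch of \mathcal{F}(X) with elements stored by their coefficients; the dict encoding is an illustrative assumption, not anything specified in the lecture.

```python
# Elements of F(X) stored as dicts x -> complex coefficient, i.e. as their
# elementary expansions A = sum_x alpha_x E_x.
X = ["a", "b", "c"]

def E(x):
    # Elementary function E_x(y) = delta_{xy}.
    return {y: (1 + 0j if y == x else 0j) for y in X}

def mult(A, B):
    # Pointwise product [AB](x) = A(x) B(x).
    return {x: A[x] * B[x] for x in X}

def star(A):
    # Pointwise conjugation A*(x) = conjugate(A(x)).
    return {x: A[x].conjugate() for x in X}

# E_x E_y = delta_{xy} E_x: the elementary functions are orthogonal projections.
assert mult(E("a"), E("a")) == E("a")
assert mult(E("a"), E("b")) == {x: 0j for x in X}
# E_x* = E_x: they are selfadjoint.
assert star(E("a")) == E("a")
```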

Two sets X and Y are said to be isomorphic if there exists a bijection between them.

Problem 3.2. Prove that two finite nonempty sets X and Y are isomorphic if and only if their function algebras \mathcal{F}(X) and \mathcal{F}(Y) are isomorphic.

Theorem 3.1. An algebra \mathcal{A} admits a Fourier basis if and only if it is isomorphic to a function algebra.

Proof: We have already shown that a function algebra has a Fourier basis. Conversely, let \mathcal{A} be an algebra, and let \{F^\lambda \colon \lambda \in \Lambda\} be a vector space basis of \mathcal{A} indexed by the points of a finite set \Lambda whose cardinality is the dimension of \mathcal{A}. Let \mathcal{F}(\Lambda) be the function algebra of this set, and define a linear transformation

\mathsf{T} \colon \mathcal{A} \longrightarrow \mathcal{F}(\Lambda)

by \mathsf{T}(F^\lambda)=E_\lambda, where E_\lambda \in \mathcal{F}(\Lambda) is the elementary function corresponding to \lambda \in \Lambda. This is a vector space isomorphism, since it maps a basis of \mathcal{A} onto a basis of \mathcal{F}(\Lambda). If \{F^\lambda \colon \lambda \in \Lambda\} is a basis of orthogonal projections in \mathcal{A}, then \mathsf{T} is also an algebra homomorphism. Indeed, by Theorem 1.2 and linearity of \mathsf{T} we have

\mathsf{T}(I_\mathcal{A}) = \sum\limits_{\lambda \in \Lambda} \mathsf{T}(F^\lambda) = \sum\limits_{\lambda \in \Lambda} E_\lambda = I_{\mathcal{F}(\Lambda)},

so \mathsf{T} maps the multiplicative unit of \mathcal{A} to that of \mathcal{F}(\Lambda). Next,

\mathsf{T}(F^\lambda F^\mu) = \mathsf{T}(\delta_{\lambda\mu}F^\lambda)=\delta_{\lambda\mu}E_\lambda = E_\lambda E_\mu = \mathsf{T}(F^\lambda)\mathsf{T}(F^\mu),

so \mathsf{T} respects multiplication. Finally, since orthogonal projections are selfadjoint, we have (F^\lambda)^* = F^\lambda, and therefore

\mathsf{T}((F^\lambda)^*)=\mathsf{T}(F^\lambda)=E_\lambda=E_\lambda^*=\mathsf{T}(F^\lambda)^*.

-QED

Theorem 3.1 is simple but important and you should go over its proof carefully. The argument shows that if \{F^\lambda \colon \lambda \in \Lambda\} is a Fourier basis of \mathcal{A}, then the linear transformation

\mathsf{T} \colon \mathcal{A} \longrightarrow \mathcal{F}(\Lambda)

defined by

\mathsf{T}(F^\lambda) = E_\lambda, \quad \lambda \in \Lambda,

is an algebra isomorphism. This isomorphism is called the Fourier transform on \mathcal{A}, and it is generally denoted as A \mapsto \widehat{A} rather than A \mapsto \mathsf{T}(A). Thus,

\widehat{F^\lambda} = E_\lambda,\quad \lambda \in \Lambda,

which determines \widehat{A} for every A \in \mathcal{A} by linearity. Note that the Fourier transform on \mathcal{A} is not canonical: it is defined in terms of a specified basis \{F^\lambda \colon \lambda \in \Lambda\} of orthogonal projections in \mathcal{A}. For any element A \in \mathcal{A}, its expansion in the Fourier basis is written

A = \sum\limits_{\lambda \in \Lambda} \widehat{A}(\lambda)F^\lambda,

and the coefficients in this expansion are called the Fourier coefficients of A. We thus have

\widehat{A} = \sum\limits_{\lambda \in \Lambda} \widehat{A}(\lambda)E_\lambda,

which is the elementary expansion of a function \widehat{A} \in \mathcal{F}(\Lambda). The function \widehat{A} \in \mathcal{F}(\Lambda) is called the Fourier transform of the algebra element A \in \mathcal{A}. The practical value of the Fourier transform lies in the fact that

\widehat{A * B} = \widehat{A}\widehat{B}

where A*B is the product of A,B \in \mathcal{A}, which might be convoluted and difficult to calculate. On the other hand, the pointwise product of the corresponding functions \widehat{A},\widehat{B}\in \mathcal{F}(\Lambda) is very simple, both conceptually and computationally.
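
The cyclic group \mathbb{Z}/n furnishes the classical instance of this principle: \mathcal{C}(\mathbb{Z}/n) carries the convolution product, and the discrete Fourier transform plays the role of A \mapsto \widehat{A}. Here is a minimal numerical check, assuming numpy (whose fft implements the DFT):

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
A = rng.standard_normal(n) + 1j * rng.standard_normal(n)
B = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Convolution on C(Z/n): [A*B](g) = sum_h A(h) B(g - h mod n).
conv = np.array([sum(A[h] * B[(g - h) % n] for h in range(n))
                 for g in range(n)])

# The Fourier transform turns the convoluted product into a pointwise one:
# (A*B)-hat = A-hat B-hat.
assert np.allclose(np.fft.fft(conv), np.fft.fft(A) * np.fft.fft(B))
print("convolution theorem verified on Z/%d" % n)
```

Computing the convolution directly costs O(n^2) operations, while transforming, multiplying pointwise, and transforming back costs O(n \log n) via the fast Fourier transform.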

Math 202B: Lecture 2

In this lecture we move beyond the commutative vs noncommutative dichotomy and quantify the degree of (non)commutativity of a given algebra \mathcal{A}. This is done using the notion of a subalgebra.

Definition 2.1. A subspace \mathcal{B} of \mathcal{A} is called a subalgebra if it contains I_\mathcal{A} and is closed under multiplication and conjugation.

Being a subalgebra is a stronger condition than being a subspace. In particular, the zero subspace \{0_\mathcal{A}\} of \mathcal{A} is not a subalgebra because it does not contain I_\mathcal{A}. Indeed, every subalgebra of \mathcal{A} must contain the one-dimensional subspace

\mathbb{C}I_\mathcal{A} = \{\alpha I_\mathcal{A} \colon \alpha \in \mathbb{C}\}

of scalar multiples of I_\mathcal{A}, which is a commutative subalgebra of \mathcal{A} isomorphic to \mathbb{C}. In this sense, \mathbb{C}I_\mathcal{A} is the smallest subalgebra of \mathcal{A}, and not only is it commutative but its elements commute with all elements in \mathcal{A}. We can consider the set of all elements which have the “commute with everything” property.

Definition 2.2. The center of \mathcal{A} is the set

Z(\mathcal{A})=\{Z \in \mathcal{A} \colon ZA=AZ \text{ for all }A \in \mathcal{A}\}.

Elements of Z(\mathcal{A}) are called central elements.

The use of the letter “Z” here comes from the German word for “center,” which is “Zentrum.” Don’t forget to take your Zentrum.

Theorem 2.1. The center Z(\mathcal{A}) is a subalgebra of \mathcal{A}.

Proof: We have to check that the conditions stipulated by Definition 2.1 hold for Z(\mathcal{A}) – it would be bad if the center did not hold. The argument is straightforward but also worth spelling out in detail, since all the defining features of an algebra (Definition 1.1) are used.

First we need to verify that Z(\mathcal{A}) is a vector subspace of \mathcal{A}, i.e. that it is closed under linear combinations. Let Z_1,Z_2 \in Z(\mathcal{A}) be any two central elements and A \in \mathcal{A} be any element. For any two scalars \alpha_1,\alpha_2 \in \mathbb{C}, we have

(\alpha_1 Z_1 + \alpha_2 Z_2)A = \alpha_1 Z_1A + \alpha_2 Z_2A = \alpha_1 AZ_1 + \alpha_2 AZ_2 = A(\alpha_1 Z_1 + \alpha_2 Z_2).

Now we check that Z(\mathcal{A}) is closed under multiplication:

Z_1Z_2A = Z_1AZ_2=AZ_1Z_2.

Now we check that Z(\mathcal{A}) is closed under conjugation:

Z^*A = Z^*(A^*)^*=(A^*Z)^*=(ZA^*)^*=AZ^*.

Finally, it is clear that I_\mathcal{A} \in Z(\mathcal{A}).

-QED

We can now quantify commutativity.

Definition 2.3. The commutativity index of \mathcal{A} is the dimension of Z(\mathcal{A}).

The commutativity index of \mathcal{A} is a positive integer between 1 (minimally commutative, maximally noncommutative) and \dim \mathcal{A} (maximally commutative, minimally noncommutative). Any algebra \mathcal{A} whose commutativity index is less than \dim \mathcal{A} is called noncommutative, which is a bit misleading because every algebra contains (uncountably) many elements which commute with one another. Algebras with one-dimensional center Z(\mathcal{A})=\mathbb{C}I have the lowest commutativity index; such maximally noncommutative algebras are called central algebras.
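
As a sanity check on the bottom of this range, here is a Python sketch computing the commutativity index of the full matrix algebra M_n(\mathbb{C}). It assumes numpy and encodes the commutation conditions via the standard identity \mathrm{vec}(ZE-EZ) = (E^{\mathsf{T}} \otimes I - I \otimes E)\,\mathrm{vec}(Z).

```python
import numpy as np

n = 2
I = np.eye(n)

# The condition ZE = EZ for each matrix unit E, written as a linear system
# in vec(Z) via vec(ZE - EZ) = (E^T kron I - I kron E) vec(Z).
rows = []
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        rows.append(np.kron(E.T, I) - np.kron(I, E))
M = np.vstack(rows)

# dim Z(M_n) = nullity of M = n^2 - rank(M).
print("commutativity index of M_%d(C):" % n, n * n - np.linalg.matrix_rank(M))
```

The printed value is 1 for every n, so Z(M_n(\mathbb{C})) = \mathbb{C}I and the full matrix algebra is maximally noncommutative.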

We now generalize the center of an algebra as follows.

Definition 2.4. Given a subalgebra \mathcal{B} of \mathcal{A}, its centralizer Z(\mathcal{B},\mathcal{A}) is the set of all elements in \mathcal{A} which commute with every element of \mathcal{B}:

Z(\mathcal{B},\mathcal{A}) = \{A \in \mathcal{A} \colon AB=BA \text{ for all }B \in \mathcal{B}\}.

The symbol Z(\mathcal{B},\mathcal{A}) is read as “the centralizer of \mathcal{B} in \mathcal{A}.” In particular, the centralizer of \mathcal{A} in \mathcal{A} is Z(\mathcal{A},\mathcal{A})=Z(\mathcal{A}).

Problem 2.1. Prove that Z(\mathcal{B},\mathcal{A}) is indeed a subalgebra of \mathcal{A}. Moreover, show that if \mathcal{B},\mathcal{C} are subalgebras of \mathcal{A} with \mathcal{B} \subseteq \mathcal{C}, then Z(\mathcal{B},\mathcal{A}) \supseteq Z(\mathcal{C},\mathcal{A}).

Problem 2.1 suggests a way to organize the set of all subalgebras of a given algebra \mathcal{A}. We know that this is a nonempty set, whose “smallest” element is \mathbb{C}I and whose “largest” element is \mathcal{A} itself — we want to order everything in between.

Definition 2.5. Given a set \Omega, a relation on \Omega is a subset \mathrm{R} of \Omega \times \Omega.

In the context of relations, instead of writing (X,Y) \in \mathrm{R} it is standard to fix a symbol for the relation and place it between the two elements, writing for instance X \sim Y or X \leq Y. For example, you are familiar with the notion of an equivalence relation on a set \Omega, which is a relation written X \sim Y with the following properties:

  • Reflexivity: X \sim X for all X \in \Omega;
  • Symmetry: X\sim Y iff Y \sim X;
  • Transitivity: if X \sim Y and Y \sim Z then X \sim Z.

There is a different kind of relation which formalizes order in the same way that the above formalizes equivalence.

Definition 2.6. A partial order on \Omega is a relation with the following properties:

  • Reflexivity: X \leq X for all X \in \Omega;
  • Antisymmetry: if X\leq Y and Y \leq X then X=Y;
  • Transitivity: if X \leq Y and Y \leq Z then X \leq Z.

A pair (\Omega,\leq) consisting of a set together with a partial order on it is called a partially ordered set, or “poset.” The symbol X \geq Y by definition means Y\leq X. Importantly, note the use of the adjective “partial”: Definition 2.6 allows for the possibility that there are pairs of elements X,Y \in \Omega for which neither X \leq Y nor X \geq Y holds true, i.e. X,Y are incomparable. The axiomatic study of partially ordered sets is a branch of algebra called order theory.

A very basic example of a partial order is obtained by taking any set A and defining a partial order on the power set \Omega = \{X\subseteq A\} by inclusion:

X \leq Y \iff X \subseteq Y.

For example, if A=\{x,y,z\}, then \Omega consists of the 2^3=8 subsets of A, ordered as in the following Hasse diagram:

[Hasse diagram: the Boolean lattice of the eight subsets of \{x,y,z\}, with \emptyset at the bottom, the three singletons above it, the three two-element subsets above those, and \{x,y,z\} at the top.]

The poset \Omega of subsets X of an arbitrary set A has an additional attribute: every pair of subsets X,Y \in \Omega has a minimum and a maximum. More precisely, if we define

\min(X,Y) := X \cap Y \quad\text{ and }\quad \max(X,Y) := X \cup Y,

then \min(X,Y) is the largest subset of A contained in both X and Y,

Z \leq X \text{ and }Z \leq Y \implies Z \leq \min(X,Y),

and \max(X,Y) is the smallest subset of A containing both X and Y,

Z \geq X \text{ and }Z \geq Y \implies Z \geq \max(X,Y).

A partially ordered set in which we have such a notion of greatest lower bound (min) and least upper bound (max) is called a lattice, and the power set \Omega of subsets of an arbitrary set A partially ordered by inclusion is such an object. Moreover, this lattice has a unique smallest element: the empty set X=\emptyset is the only subset of A satisfying X \leq Y for all Y \in \Omega. Similarly, the whole set X=A is the only subset satisfying X \geq Y for all Y \in \Omega, making it the unique largest element of \Omega.
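
The two implications above are finite checks, so they can be verified by brute force. Here is a minimal Python sketch for the eight-element example above, with frozensets standing in for subsets of A:

```python
from itertools import combinations

A = ("x", "y", "z")
# All 2^3 = 8 subsets of A, as frozensets; <= on frozensets is inclusion.
Omega = [frozenset(c) for r in range(len(A) + 1)
         for c in combinations(A, r)]

for X in Omega:
    for Y in Omega:
        glb, lub = X & Y, X | Y
        # Every common lower bound Z lies below min(X, Y) = X intersect Y...
        assert all(Z <= glb for Z in Omega if Z <= X and Z <= Y)
        # ...and every common upper bound Z lies above max(X, Y) = X union Y.
        assert all(lub <= Z for Z in Omega if X <= Z and Y <= Z)

print("glb/lub laws verified for all", len(Omega) ** 2, "pairs of subsets")
```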

We now want to use the partial order concept to organize the subalgebras of a given algebra \mathcal{A}. Starting very simply, we could forget the structure of \mathcal{A} entirely and just view it as a set, which would result in the construction above with \Omega the power set of \mathcal{A}. Of course, since \mathcal{A} is not a finite set, \Omega is uncountably infinite.

Now let us remember the vector space structure on \mathcal{A} and accordingly take \Omega' \subseteq \Omega to be the set of all vector subspaces of \mathcal{A}, partially ordered by inclusion, so \Omega' is an induced subposet of \Omega. If we want to make \Omega' into a lattice we can still use the set-theoretic definition of \min as intersection, because the intersection of two subspaces is again a subspace. However, the union of two subspaces is not in general a subspace, and we have to modify the definition to

\max(V,W) = \mathrm{span}(V \cup W).

This makes \Omega' into a lattice with largest element \mathcal{A} and smallest element the zero subspace \{0\}, as the empty subset of \mathcal{A} is not a vector space.

Now let us remember the algebra structure on \mathcal{A} and declare \Omega'' \subseteq \Omega' to be the set of all subalgebras of \mathcal{A}, partially ordered by inclusion. Then \Omega'' is an induced subposet of \Omega', and once again taking \min to be intersection gives us a greatest lower bound operation.

Problem 2.2. Show that the intersection of any nonempty set \mathfrak{F} of subalgebras of \mathcal{A} is a subalgebra. Show moreover that if all members of the family \mathfrak{F} are commutative, so is their intersection.

However, our notion of least upper bound must be modified in order to make \Omega'' into a lattice, because the span of the union \mathcal{B} \cup \mathcal{C} of two subalgebras of \mathcal{A} is a subspace but not necessarily a subalgebra.

Definition 2.7. For any subset X \subseteq \mathcal{A}, define \mathrm{alg}(X) to be the intersection of all subalgebras of \mathcal{A} containing X. This is called the subalgebra generated by X. It contains all scalar multiples, sums, products, and conjugates of elements of X, and is the smallest subalgebra of \mathcal{A} containing all of these elements.

Note that the set of subalgebras containing X is always nonempty, since it contains the whole algebra \mathcal{A}, and consequently Problem 2.2 legitimizes Definition 2.7 and allows us to define the least upper bound of a pair of subalgebras by

\max(\mathcal{B},\mathcal{C}) = \mathrm{alg}(\mathcal{B} \cup \mathcal{C}).

This makes \Omega'' into a lattice, the lattice of subalgebras of \mathcal{A}. The maximal element of this lattice is still \mathcal{A}, and its minimal element is \mathbb{C}I, since the zero space is not a subalgebra.
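
Definition 2.7 describes \mathrm{alg}(X) from above, as an intersection; for subalgebras of a matrix algebra one can also compute it from below, by growing the span of X under products and conjugates until it stabilizes. Here is a minimal Python sketch, assuming numpy; the helper names are ours.

```python
import numpy as np

def extend_to_basis(mats):
    # Greedily keep a maximal linearly independent subset of mats.
    basis = []
    for m in mats:
        test = np.array([b.ravel() for b in basis + [m]])
        if np.linalg.matrix_rank(test) > len(basis):
            basis.append(m)
    return basis

def alg_basis(gens, n):
    # Basis of the smallest subalgebra of M_n(C) containing I and the
    # generators: close the span under products and conjugate transposes.
    basis = extend_to_basis([np.eye(n, dtype=complex)]
                            + [g.astype(complex) for g in gens])
    while True:
        candidates = [a @ b for a in basis for b in basis] \
                   + [a.conj().T for a in basis]
        bigger = extend_to_basis(basis + candidates)
        if len(bigger) == len(basis):
            return basis
        basis = bigger

# alg of one selfadjoint projection: span{I, P}, a two-dimensional subalgebra.
P = np.array([[1, 0], [0, 0]])
print(len(alg_basis([P], 2)))      # 2

# alg of the matrix unit E_12 is all of M_2(C): the conjugate E_21 and the
# products E_11, E_22 appear automatically during the closure.
E12 = np.array([[0, 1], [0, 0]])
print(len(alg_basis([E12], 2)))    # 4
```

The second example shows how much larger \mathrm{alg}(X) can be than the linear span of X, which is why the least upper bound in \Omega'' had to be modified.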

Finally, let \Omega''' be the set of all commutative subalgebras of \mathcal{A}, partially ordered by inclusion, an induced subposet of the lattice \Omega'' of subalgebras of \mathcal{A}. To latticize this we can keep the same greatest lower bound operation, but must modify least upper bound to

\max(\mathcal{B},\mathcal{C}) = \mathrm{calg}(\mathcal{B} \cup \mathcal{C}),

where the right-hand side is the intersection of all commutative subalgebras of the ambient algebra \mathcal{A} which contain the set \mathcal{B} \cup \mathcal{C}. This definition fails if no commutative subalgebra of \mathcal{A} contains \mathcal{B} \cup \mathcal{C}, for instance when one of \mathcal{B},\mathcal{C} is a maximal commutative subalgebra not containing the other.

Definition 2.8. A maximal abelian subalgebra (MASA) of \mathcal{A} is a commutative subalgebra \mathcal{B} \subseteq \mathcal{A} with the following property: if \mathcal{C} is a commutative subalgebra with \mathcal{B} \leq \mathcal{C}, then \mathcal{B}=\mathcal{C}.

Note that “abelian” is a synonym for “commutative” and the two are used interchangeably.

Problem 2.3. Prove that any two distinct MASAs are incomparable.

MASAs can be characterized using centralizers.

Theorem 2.2. An abelian subalgebra \mathcal{B} of an algebra \mathcal{A} is a MASA if and only if it is its own centralizer: Z(\mathcal{B},\mathcal{A})=\mathcal{B}.

Proof: Suppose first that \mathcal{B} is a MASA. Since \mathcal{B} is abelian we have \mathcal{B} \subseteq Z(\mathcal{B},\mathcal{A}). If \mathcal{B} is a proper subset of its centralizer, then there exists C \in Z(\mathcal{B},\mathcal{A}) such that C \not\in \mathcal{B}. Since Z(\mathcal{B},\mathcal{A}) is a subalgebra (Problem 2.1), the selfadjoint elements C+C^* and i(C^*-C) also belong to Z(\mathcal{B},\mathcal{A}), and at least one of them lies outside \mathcal{B}, for otherwise C would be a linear combination of the two and hence an element of \mathcal{B}. Replacing C by this selfadjoint element, the algebra generated by \{C\} \cup \mathcal{B} is a commutative subalgebra of \mathcal{A} properly containing \mathcal{B}, and this contradicts the maximality of \mathcal{B}.

Conversely, suppose \mathcal{B} is an abelian subalgebra of \mathcal{A} such that \mathcal{B} = Z(\mathcal{B},\mathcal{A}). Then for a commutative subalgebra \mathcal{C} \geq \mathcal{B} we have

\mathcal{C} \leq Z(\mathcal{C},\mathcal{A}) \leq Z(\mathcal{B},\mathcal{A}) = \mathcal{B},

so \mathcal{C} \leq \mathcal{B}, and hence \mathcal{C} = \mathcal{B} (note that the second inequality above is the conclusion of Problem 2.1).

-QED
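
As a concrete instance of Theorem 2.2, the diagonal subalgebra \mathcal{D} \subseteq M_n(\mathbb{C}) is a MASA. Here is a minimal numerical check, assuming numpy and reusing the vectorization identity from the commutativity index computation above:

```python
import numpy as np

n = 3
I = np.eye(n)

# Impose Z E_ii = E_ii Z for the diagonal matrix units E_ii, which together
# with I generate the diagonal subalgebra D.
rows = []
for i in range(n):
    E = np.zeros((n, n))
    E[i, i] = 1.0
    rows.append(np.kron(E.T, I) - np.kron(I, E))
M = np.vstack(rows)

# dim Z(D, M_n) = nullity of M = n^2 - rank(M).
print("dim Z(D, M_%d) =" % n, n * n - np.linalg.matrix_rank(M))  # prints 3
```

Since \mathcal{D} \subseteq Z(\mathcal{D},M_n(\mathbb{C})) always holds and both spaces have dimension n, they coincide, so by Theorem 2.2 the diagonal subalgebra is maximal abelian.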

Math 220A: Lecture 0

Schedule

Lecture | Date | Topic | Modality
0 | S 23 | Course description | In person
1 | S 26 | Numbers: complex | In person
2 | S 28 | Numbers: hypercomplex | In person
3 | S 30 | Coda | Online
4 | O 03 | Formal power series: algebra structure | In person
5 | O 05 | Formal power series: group structure | In person
6 | O 07 | Coda | Online
7 | O 10 | Convergent power series: algebra structure | In person
8 | O 12 | Convergent power series: group structure | In person
9 | O 14 | Coda | Online
10 | O 17 | Analytic functions: local inverse function theorem | In person
11 | O 19 | Analytic functions: local maximum modulus principle | In person
12 | O 21 | Coda | Online
13 | O 24 | Convergent power series: partial sums | In person
14 | O 26 | Convergent power series: Jentzsch’s theorem | In person
15 | O 28 | Coda | Online
16 | O 31 | Exponential function: partial sums | In person
17 | N 02 | Exponential function: Szegő’s theorem | In person
18 | N 04 | Coda | Online
19 | N 07 | Analytic continuation: logarithm | In person
20 | N 09 | Analytic continuation: Hadamard gap theorem | In person
| N 11 | Veterans Day (no lecture) |
21 | N 14 | |
22 | N 16 | |
23 | N 18 | |
24 | N 21 | |
25 | N 23 | |
| N 25 | Thanksgiving (no lecture) |
26 | N 28 | |
27 | N 30 | |
28 | D 02 | |
Schedule subject to change.

Books

Articles

  • Buckholtz, A characterization of the exponential function. American Mathematical Monthly 73 (1966), 121-123.
  • Gronwall, On the power series for \log(1+z). Annals of Mathematics 18 (1916), 70-73.
  • Palais, The classification of real division algebras. American Mathematical Monthly 75 (1968), 366-368.

Evaluation

  • Problem sets: 70%
  • Final exam: 30%

Instructors

  • Lecturer: Jonathan Novak, APM 7157. Office hours MW after lecture.
  • Teaching Assistant: Shubham Sinha.

Communication

  • Piazza. Public posts preferred, private messages when necessary.
  • No email.