## Math 202C: Lecture 21 - The sensitivity conjecture

##### Preliminary definitions and Notations

First up, a list of preliminary notations and definitions of terms that recur throughout this blog post.

• $Q^n$ denotes the $n$-dimensional hypercube graph whose vertex set consists of all strings in $\{-1,1\}^n$, and two vertices are adjacent iff the corresponding strings differ in exactly one coordinate.
• For an undirected graph $G$, the maximum degree of a vertex in $G$ is denoted by $\Delta(G)$ and $\lambda_1(G)$ denotes the largest eigenvalue of the adjacency matrix of $G$.
• For $x \in \{-1,1\}^n$ and a subset $S$ of indices from $[n] = \{1, 2, \cdots, n\}$, the string obtained from $x$ by flipping all entries of $x$ corresponding to the indices in $S$ is denoted by $x^S$.
• Any function $f : \{-1,1\}^n \rightarrow \{-1,1\}$ is called a boolean function.

Definition (Local sensitivity and sensitivity). Given a boolean function $f : \{-1,1\}^n \rightarrow \{-1,1\}$, the local sensitivity on the input $x$, denoted by $s(f,x)$, is defined as the number of indices $i$, such that $f(x) \neq f(x^{\{i\}})$, and the sensitivity $s(f)$ of $f$ is $\max_x s(f,x)$.

Definition (Local block sensitivity and block sensitivity). Given a boolean function $f : \{-1,1\}^n \rightarrow \{-1,1\}$, the local block sensitivity on the input $x$, denoted by $bs(f,x)$, is defined as the maximum number of disjoint blocks $B_1, \cdots, B_k$ of $[n]$ such that for each $B_i, f(x) \neq f(x^{B_i})$, and the block sensitivity $bs(f)$ of $f$ is $\max_x bs(f,x)$.
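Both definitions are easy to check by brute force on small examples. Below is a quick sketch (the helper names are mine, not standard) that computes $s(f)$ and $bs(f)$ for a boolean function given as a Python callable on $\{-1,1\}^n$; the block-sensitivity search is exhaustive and only feasible for small $n$:

```python
from itertools import product, combinations

def flip(x, S):
    # x^S: negate the entries of x indexed by S
    return tuple(-v if i in S else v for i, v in enumerate(x))

def sensitivity(f, n):
    # s(f) = max over x of #{i : f(x) != f(x^{i})}
    return max(sum(f(flip(x, (i,))) != f(x) for i in range(n))
               for x in product((-1, 1), repeat=n))

def block_sensitivity(f, n):
    # bs(f) = max over x of the largest number of pairwise disjoint
    # blocks B with f(x) != f(x^B); brute-force search over subsets
    def most_disjoint(blocks):
        best = 0
        for i, B in enumerate(blocks):
            rest = [C for C in blocks[i + 1:] if not set(B) & set(C)]
            best = max(best, 1 + most_disjoint(rest))
        return best
    result = 0
    for x in product((-1, 1), repeat=n):
        sensitive = [B for k in range(1, n + 1)
                     for B in combinations(range(n), k)
                     if f(flip(x, B)) != f(x)]
        result = max(result, most_disjoint(sensitive))
    return result

maj3 = lambda x: 1 if sum(x) > 0 else -1  # majority on 3 bits
```

For the 3-bit majority function both measures equal 2, and in general $bs(f) \geq s(f)$ since singleton blocks are blocks.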

#### The sensitivity conjecture and the tale of three theorems

(Sensitivity conjecture) There exists an absolute constant $C >0$, such that for every boolean function $f$, $bs(f) \leq s(f)^C$.

Sensitivity and block sensitivity are examples of complexity measures for boolean functions. Two complexity measures $A$ and $B$ are said to be polynomially related to each other if there exist polynomials $p(t), q(t)$ such that for any Boolean function $f$, we have $A(f) \leq p(B(f))$ and $B(f) \leq q(A(f))$. The block sensitivity is known to be polynomially related to other complexity measures of boolean functions such as the degree (to be defined later). However, the question (a.k.a. the sensitivity conjecture) of whether the block sensitivity is polynomially related to the sensitivity was posed as an open problem by Nisan and Szegedy in 1989 (note that $bs(f) \geq s(f)$ is a direct consequence of their definitions; the open problem was showing whether $bs(f) \leq s(f)^C$ for some $C >0$). A few years later in 1992, Gotsman and Linial [3] showed the equivalence of the sensitivity conjecture to another open problem on the hypercube. Finally in 2019, Hao Huang resolved the sensitivity conjecture by leveraging the equivalence shown by Gotsman and Linial.

#### Cue harmonic analysis on finite abelian groups

In lecture post 7 by Prof. Novak, we saw that $\mathrm{B}(n)$ is isomorphic to the group $\mathbb{Z}_2^n$, i.e. bitstrings of length $n$ (recall that $\mathrm{B}(n)$ denotes the subgroup of $\mathrm{S}(2n)$ generated by the $n$ disjoint transpositions $\tau_1=(1\ 2),\ \tau_2=(3\ 4),\ \dots, \tau_n = (2n-1\ 2n)$). Thus, a boolean function $f: \{-1,1\}^n \rightarrow \{-1,1\}$ can be identified with a member of the Hilbert space $\mathbb{C}^{ \mathrm{B}(n) }$, and the rich tools of harmonic analysis on $\mathrm{B}(n)$ discussed in lecture posts 7-9 are applicable to boolean functions. Note that the hypercube $Q^n$ is the subgraph of the Cayley graph of $\mathrm{S}(2n)$ induced by the subgroup $\mathrm{B}(n)$. Thus, the sensitivity conjecture can be rephrased in the language of induced subgraphs of the hypercube $Q^n$, and this is exactly the essence of the work done by Gotsman and Linial in 1992.

Before we dive into the details, a few (post)-preliminary notations and remarks.

• Given an induced subgraph $H$ of $Q^n$, let $Q^n- H$ denote the subgraph of $Q^n$ induced on the vertex set $V(Q^n) \backslash V(H)$.
• For $S \subseteq [n] := \{1,2, \cdots, n\}$, the function $\chi_S : \{-1,1\}^n \rightarrow \{+1,-1\}$ is defined as $\chi_S(x) = \Pi_{i \in S}x_i$, with the convention that for all $x, \: \: \chi_{\phi}(x) = 1$ where $\phi$ denotes the empty set.
• In lecture post 7 (and also in the qual exam) we saw that the functions $\{\chi_S\}_{S \subseteq [n]}$ (which exactly correspond to the characters of $B(n)$ since $\mathbb{Z}_2^n$ is isomorphic to $\mathrm{B}(n)$) form an orthogonal basis for $\mathbb{C}^{ \mathbb{Z}_2^n }$ under the inner product $\langle f,g \rangle = \sum_{x \in \{-1,1\}^n} f(x)g(x)$.
• Thus, given a boolean function $f$, by Fourier expansion, $f$ can be written as $f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\chi_S(x)$, where $\hat{f}$ denotes the Fourier coefficients.
• The average of a boolean function $f$ on $Q^n$ is denoted as $\mathbf{E}(f) : = \frac{1}{2^n}\sum_{x \in Q^n}f(x)$. Note that since $\chi_{\phi}(x) = 1$ for all $x$, we have the identity $\hat{f}(\phi) = \mathbf{E}(f)$ which is quite useful when proving theorem 1 (see below).

Definition. Given a boolean function $f$, the degree of $f$ is defined as $deg(f) := \max _{S \subseteq [n]}\{|S| \: | \: \hat{f}(S) \neq 0\}$.
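The degree can be computed directly from the Fourier expansion above. Here is a small illustration (the helper names are mine); the majority function on 3 bits has expansion $\frac{1}{2}(x_1+x_2+x_3) - \frac{1}{2}x_1x_2x_3$, hence degree 3:

```python
from itertools import product, combinations, chain

def chi(S, x):
    # character chi_S(x) = prod_{i in S} x_i, with chi_{empty set} = 1
    out = 1
    for i in S:
        out *= x[i]
    return out

def fourier_coefficients(f, n):
    # hat f(S) = 2^{-n} * sum_x f(x) chi_S(x), for every S subset of [n]
    cube = list(product((-1, 1), repeat=n))
    subsets = chain.from_iterable(combinations(range(n), k) for k in range(n + 1))
    return {S: sum(f(x) * chi(S, x) for x in cube) / 2 ** n for S in subsets}

def degree(f, n):
    # deg(f) = largest |S| with hat f(S) != 0
    return max(len(S) for S, c in fourier_coefficients(f, n).items()
               if abs(c) > 1e-9)

maj3 = lambda x: 1 if sum(x) > 0 else -1
```

Note also that $\hat{f}(\phi)$ computed this way is exactly the average $\mathbf{E}(f)$ from the bullet above.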

Theorem 1 [Gotsman and Linial]. The following two statements are equivalent for any monotone strictly increasing function $h: \mathbb{N} \rightarrow \mathbb{R}$.
a. For any induced subgraph $H$ of $Q^n$ with $|V(H)| \neq 2^{n-1}$, we have $\max \{ \Delta(H), \Delta(Q^n - H)\} \geq h(n)$.
b. For any boolean function $f$, we have that $s(f) \geq h(deg(f))$.

Gotsman and Linial show that proving the equivalence of statements a and b above is equivalent to proving the equivalence of yet another two statements and then proceed to prove this second equivalence.

Proof of Theorem 1. First, we show the equivalence of statement a to the following statement a’:

a’. For any boolean function $g$, $\mathbf{E}(g) \neq 0$ implies there exists an input $x$ such that $n - s(g,x) \geq h(n)$.

a’ implies a : Given an induced subgraph $H$ of $Q^n$, associate to it the boolean function $g$ defined by $g(x) = 1$ iff $x \in V(H)$. By a direct consequence of the definitions of adjacency in $Q^n$ and the local sensitivity of $g$, $\text{deg}_H(x) = n - s(g,x)$ for $x \in V(H)$ and $\text{deg}_{Q^n -H}(x) = n - s(g,x)$ for $x \in V(Q^n) \backslash V(H)$ (note that here $\text{deg}_H(x)$ denotes the degree of the vertex $x$ in the subgraph $H$ and is not to be confused with the degree of a boolean function that was defined earlier). Thus, $\max \{ \Delta(H), \Delta(Q^n - H)\} = \max_x \{n - s(g,x)\}$. Let $\mathbf{E}(g)$ denote the average value of $g$ on $Q^n$, i.e. $\mathbf{E}(g) = \frac{1}{2^n}\left[ \sum_{x \in V(H)}g(x) + \sum_{x \in V(Q^n)\backslash V(H)}g(x) \right]$. Note that $|V(H)| \neq 2^{n-1}$ implies $\mathbf{E}(g) \neq 0$. Thus, statement a’ implies $\max \{ \Delta(H), \Delta(Q^n - H)\} = \max_x \{n - s(g,x)\} \geq h(n)$.

a implies a’ : Given a boolean function $g$ with $\mathbf{E}(g) \neq 0$, define $H$ to be the subgraph induced by the set of vertices $\{x \in Q^n \: | \: g(x) = 1\}$. By an identical argument as in the previous paragraph, $\max \{ \Delta(H), \Delta(Q^n - H)\} = \max_x \{n - s(g,x)\}$. Also, $\mathbf{E}(g) \neq 0$ implies $|V(H)| \neq 2^{n-1}$. Thus, statement a implies $\max \{ \Delta(H), \Delta(Q^n - H)\} = \max_x \{n - s(g,x)\} \geq h(n)$, thereby showing the equivalence of statements a and a’.

Now, we show the equivalence of statement b to statement b’:

b’. For any boolean function $f$, $s(f) < h(n)$ implies $deg(f) < n$.

Assuming b and that the premise of b’ holds, we have $h(deg(f)) \leq s(f) < h(n)$, and since $h$ is assumed to be strictly increasing this shows that $deg(f) < n$; thus b implies b’. Now, towards showing the other implication, let $f$ denote a boolean function with degree $d \leq n$. Assume without loss of generality that for $S =\{1, \cdots, d\}, \: \: \hat{f}(S) \neq 0$. Now, consider the function $g : \{-1,1\}^d \rightarrow \{-1,1\}$ defined as $g(x_1, x_2, \cdots x_d) := f(x_1, x_2, \cdots, x_d, -1, -1, \cdots -1)$. By construction $s(f) \geq s(g)$ and $deg(g) = d$, i.e. $g$ has full degree as a boolean function on $Q^d$. Now, the contrapositive of b’ says that $deg(g) = d$ implies $s(g) \geq h(d)$. Thus, $s(f) \geq s(g) \geq h(d) = h(deg(f))$ and this finishes showing the equivalence of b and b’.

Thus, proving theorem 1 has been reduced to the task of proving the equivalence of statements a’ and b’. Now, given a boolean function $f$, define another boolean function $g(x) = f(x)\chi_{[n]}(x)$ where $\chi_{[n]}(x) = \Pi_{i=1}^nx_i$ is the parity function of $x$. Note that $\hat{g}(S) = \frac{1}{2^n}\sum_{x \in Q^n}f(x) \chi_{[n]}(x) \chi_S(x)$. Thus,

$\hat{g}(S) = \frac{1}{2^n}\sum_{x \in Q^n}f(x) (\Pi_{i=1}^nx_i) (\Pi_{i \in S}x_i) = \frac{1}{2^n}\sum_{x \in Q^n}f(x) \Pi_{i \notin S}x_i = \hat{f}([n] \backslash S)$

Thus, for all $S \subseteq [n]$, $\hat{g}(S) = \hat{f}([n]\backslash S)$. Also note that for all $x \in Q^n$, $s(g,x) = n- s(f,x)$ because every bit flip which causes a change in value of $f(x)$ does not cause a change in function value of $g(x)$ and vice-versa. Thus, $\mathbf{E}(g) = \hat{g}(\phi) = \hat{f}([n])$ where $\phi$ denotes the empty set.
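Both identities in this paragraph are easy to confirm numerically. The following sketch (helper names mine) checks them for a random boolean function on $Q^3$, with $g = f\cdot\chi_{[n]}$ as above:

```python
import random
from itertools import product, combinations

n = 3
random.seed(0)
cube = list(product((-1, 1), repeat=n))
table = {x: random.choice((-1, 1)) for x in cube}  # a random boolean function

def f(x):
    return table[x]

def g(x):
    # g = f * chi_[n], where chi_[n](x) = x_1 x_2 ... x_n is the parity function
    p = 1
    for v in x:
        p *= v
    return f(x) * p

def coeff(h, S):
    # hat h(S) = 2^{-n} * sum_x h(x) prod_{i in S} x_i
    total = 0
    for x in cube:
        t = h(x)
        for i in S:
            t *= x[i]
        total += t
    return total / 2 ** n

def local_sens(h, x):
    return sum(h(tuple(-v if j == i else v for j, v in enumerate(x))) != h(x)
               for i in range(n))

for k in range(n + 1):
    for S in combinations(range(n), k):
        comp = tuple(i for i in range(n) if i not in S)
        assert abs(coeff(g, S) - coeff(f, comp)) < 1e-9  # hat g(S) = hat f([n]\S)
for x in cube:
    assert local_sens(g, x) == n - local_sens(f, x)      # s(g,x) = n - s(f,x)
```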

a’ implies b’: Towards a contradiction, assume that $s(f) < h(n)$ and $deg(f) = n$. This by definition implies $\hat{f}([n]) \neq 0$ and thus $\mathbf{E}(g) \neq 0$. By a’, there exists an $x$ such that $s(g,x) \leq n - h(n)$, and since $s(g,x) = n- s(f,x)$ this implies there exists an $x$ such that $s(f,x) \geq h(n)$, which contradicts the premise that $s(f) < h(n)$.

b’ implies a’: Again towards a contradiction, assume that there is a $g$ with $\mathbf{E}(g) \neq 0$ such that for all $x, \: s(g,x) > n- h(n)$. Since $s(g,x) = n- s(f,x)$ this implies that $s(f) < h(n)$, which by b’ implies $deg(f) < n$. Note that $deg(f) < n$ is equivalent to $\hat{f}([n]) = 0$, and thus $\hat{f}([n]) = \hat{g}(\phi) = \mathbf{E}(g) = 0$, which contradicts the premise that $\mathbf{E}(g) \neq 0$. This completes the proof of theorem 1.

Theorem 2 [Huang]. For every integer $n \geq 1$, if $H$ is an arbitrary $(2^{n-1}+1)$-vertex induced subgraph of $Q^n$, then

$\Delta(H) \geq \sqrt{n}$

The proof of theorem 2 will be the main focus for the remainder of this blog post, but before that we discuss two corollaries which show why theorem 2 resolves the sensitivity conjecture.

Corollary 1. For every boolean function $f$, $s(f) \geq \sqrt{deg(f)}$.

Proof of Corollary 1. For any induced subgraph $H$ of $Q^n$ with $|V(H)| \neq 2^{n-1}$, since the total number of vertices in $Q^n$ is $2^n$, either $H$ contains at least $2^{n-1}+1$ vertices or $Q^n - H$ does. So without loss of generality, assume $H$ contains at least $2^{n-1}+1$ vertices (if not, just replace $H$ with $Q^n -H$ in the remainder of the proof). Now, theorem 2 (applied to a $(2^{n-1}+1)$-vertex induced subgraph of $H$, whose maximum degree is at most $\Delta(H)$) implies $\max \{ \Delta(H), \Delta(Q^n - H)\} \geq \sqrt{n}$. Let $h(n) := \sqrt{n}$ and note that it is a monotone strictly increasing function on $\mathbb{N}$. Now, theorem 1 implies $s(f) \geq \sqrt{deg(f)}$ and this completes the proof.

Theorem 3 [Tal]. For every boolean function $f$, $bs(f) \leq deg(f)^2$.

It should be noted that theorem 3 is an improvement on the work by Nisan and Szegedy in 1992, wherein they showed that $bs(f) \leq 2\,deg(f)^2$.

Corollary 2. For every boolean function $f$, $bs(f) \leq s(f)^4$.

Note that corollary 2 follows immediately from theorem 3 and corollary 1. Thus, corollary 2 confirms the sensitivity conjecture by showing that the block sensitivity and the sensitivity of a boolean function are indeed polynomially related.

#### Cue matrix analysis – “A proof that came straight from THE Book”

In this section, we will focus on Hao Huang’s proof of theorem 2. The core of the proof involves a creative construction of an iterative sequence of matrices whose $n$-th instance is closely related to the adjacency matrix of $Q^n$, and an application of Cauchy’s interlace theorem, which is stated as follows.

(Cauchy’s interlace theorem) Let $A$ be a symmetric $n \times n$ matrix and $B$ be an $m \times m$ principal submatrix of $A$ for some $m < n$. If the eigenvalues of $A$ are $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$, and the eigenvalues of $B$ are $\mu_1 \geq \mu_2 \geq \cdots \geq \mu_m$, then for all $1 \leq i \leq m$,

$\lambda_i \geq \mu_i \geq \lambda_{i+n-m}$

Recall that we saw in Math 202A that Cauchy’s interlace theorem can be proven via a direct application of the Courant-Fischer theorem.

Now, we move on to the iterative matrix sequence that turns out to be a bite-sized kryptonite for the sensitivity conjecture. Consider the sequence of symmetric square matrices defined iteratively as follows,

$B_1 := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad B_n :=\begin{pmatrix} B_{n-1} & I \\ I & -B_{n-1} \end{pmatrix}$

The first exercise is to study the spectrum of $B_n$ and the second exercise is to compute an upper bound for the largest eigenvalue $\lambda_1$ of a specific type of matrix. These two exercises then set us up nicely for a clever application of Cauchy’s interlace theorem to prove theorem 2.

Exercise 1.
a) Show that $B_n^2 = nI$ where $I$ denotes the $2^n \times 2^n$ identity matrix.
b) Show that the trace of $B_n$ is $0$.
c) Conclude that the eigenvalues of $B_n$ are $\sqrt{n}$ with multiplicity $2^{n-1}$ and $-\sqrt{n}$ with multiplicity $2^{n-1}$.
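A quick numerical sketch of Exercise 1 (helper names are mine): build $B_n$ from the recursion and check parts a) and b) directly for a small $n$. Part c) then follows, since $B_n^2 = nI$ forces every eigenvalue to be $\pm\sqrt{n}$, and the zero trace forces the two multiplicities to be equal.

```python
def B(n):
    # B_1 = [[0,1],[1,0]];  B_n = [[B_{n-1}, I], [I, -B_{n-1}]]
    if n == 1:
        return [[0, 1], [1, 0]]
    P = B(n - 1)
    m = len(P)
    top = [row + [1 if i == j else 0 for j in range(m)] for i, row in enumerate(P)]
    bot = [[1 if i == j else 0 for j in range(m)] + [-v for v in row]
           for i, row in enumerate(P)]
    return top + bot

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

n = 4
Bn = B(n)
sq = matmul(Bn, Bn)
assert all(sq[i][j] == (n if i == j else 0)
           for i in range(2 ** n) for j in range(2 ** n))   # B_n^2 = nI
assert sum(Bn[i][i] for i in range(2 ** n)) == 0            # tr(B_n) = 0
```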

Exercise 2. Suppose $H$ is an $m$-vertex undirected graph and $B$ is a symmetric matrix whose entries are in $\{-1,0,1\}$, whose rows and columns are indexed by the vertex set of $H$, and such that $B_{u,v} = 0$ whenever $u$ and $v$ are non-adjacent vertices in $H$. Then show that

$\Delta(H) \geq \lambda_1 := \lambda_1(B)$

The definition of this iterative sequence of matrices $\{B_j\}_1^n$ is not without precedent. Note that the adjacency matrix of $Q^n$ can also be defined in a similarly iterative manner. Recall that a vertex of $Q^n$ is a string in $\{-1,1\}^n$. Given an $(n-1)$-length string of $-1$s and $1$s, it naturally extends to exactly two strings in $\{-1,1\}^n$, namely the only two vertices of $Q^n$ whose first $n-1$ entries agree with the original string (note that these two vertices of $Q^n$ are adjacent). Expanding on this observation, the hypercube $Q^n$ can be thought of as two $Q^{n-1}$ sub-hypercubes with a perfect matching connecting them. In other words, consider the following sequence of iteratively defined matrices,

$A_1 := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad A_n :=\begin{pmatrix} A_{n-1} & I \\ I & A_{n-1} \end{pmatrix}$

Then $A_n$ represents the adjacency matrix of $Q^n$ and in particular, note that $A_n$ is equal to the matrix obtained by changing every $(-1)$ entry of $B_n$ to $1$.
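In other words, $B_n$ is a "signing" of the adjacency matrix of $Q^n$. This can be checked entrywise: under the natural ordering of the vertices (first coordinate splitting the two subcubes), the recursion for $A_n$ matches Hamming-distance adjacency. A sketch, using $\{0,1\}$-strings, which give the same graph:

```python
from itertools import product

def B(n):
    # B_1 = [[0,1],[1,0]];  B_n = [[B_{n-1}, I], [I, -B_{n-1}]]
    if n == 1:
        return [[0, 1], [1, 0]]
    P = B(n - 1)
    m = len(P)
    top = [row + [1 if i == j else 0 for j in range(m)] for i, row in enumerate(P)]
    bot = [[1 if i == j else 0 for j in range(m)] + [-v for v in row]
           for i, row in enumerate(P)]
    return top + bot

def hypercube_adjacency(n):
    # vertices ordered so the first coordinate splits Q^n into its two Q^{n-1} subcubes
    verts = list(product((0, 1), repeat=n))
    return [[1 if sum(a != b for a, b in zip(u, v)) == 1 else 0 for v in verts]
            for u in verts]

n = 3
Bn, An = B(n), hypercube_adjacency(n)
# |B_n| entrywise equals the adjacency matrix A_n of Q^n
assert all(abs(Bn[i][j]) == An[i][j] for i in range(2 ** n) for j in range(2 ** n))
```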

Now that we have all the necessary ingredients, let’s discuss the proof of Theorem 2.

(Restated) Theorem 2 [Huang] . For every integer $n \geq 1$, let $H$ be an arbitrary $(2^{n-1}+1)$-vertex induced subgraph of $Q^n$, then

$\Delta(H) \geq \sqrt{n}$

Proof of Theorem 2 [Huang].

Let $B_n$ be the sequence of matrices that we defined above and let $H$ be a $(2^{n-1} + 1)$-vertex induced subgraph of $Q^n$. Note that $H$ naturally induces a principal submatrix $B_H$ of $B_n$ (written with respect to an appropriate permutation of the standard basis). As discussed in the preceding paragraph, by changing every $(-1)$ entry of $B_n$ to $1$, we get the adjacency matrix of $Q^n$. So, in particular, $B_H$ and $H$ satisfy the conditions of the statement in Exercise 2 and thus,

$\Delta(H) \geq \lambda_1(B_H)$

Also, since $B_H$ is a $(2^{n-1} + 1) \times (2^{n-1} + 1)$ principal submatrix of the $2^n \times 2^n$ matrix $B_n$, by Cauchy’s interlace theorem,

$\lambda_1(B_H) \geq \lambda_{2^{n-1}}(B_n)$

From Exercise 1, we know that the eigenvalues of $B_n$ are $\sqrt{n}$ with multiplicity $2^{n-1}$ and $-\sqrt{n}$ with multiplicity $2^{n-1}$. Therefore $\lambda_{2^{n-1}}(B_n) = \sqrt{n}$, and hence

$\Delta(H) \geq \lambda_1(B_H) \geq \lambda_{2^{n-1}}(B_n) = \sqrt{n}$

and this completes the proof.
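The bound is also easy to probe empirically. The sketch below (helper names mine) samples random $(2^{n-1}+1)$-vertex induced subgraphs of $Q^4$ and confirms $\Delta(H) \geq \sqrt{n}$ in every sample; a sanity check, of course, not a proof:

```python
import math
import random
from itertools import product

def hypercube_adjacency(n):
    verts = list(product((0, 1), repeat=n))
    return [[1 if sum(a != b for a, b in zip(u, v)) == 1 else 0 for v in verts]
            for u in verts]

n = 4
A = hypercube_adjacency(n)
random.seed(1)
for _ in range(500):
    # a random induced subgraph on 2^{n-1}+1 vertices
    V = random.sample(range(2 ** n), 2 ** (n - 1) + 1)
    max_deg = max(sum(A[u][v] for v in V) for u in V)   # Delta(H)
    assert max_deg >= math.sqrt(n)
```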

#### Closing Remarks

I will end this blog post for our Math 202C class with a quote from Scott Aaronson’s blog post about Hao Huang’s proof of the sensitivity conjecture.

Paul Erdös famously spoke of a book, maintained by God, in which was written the simplest, most beautiful proof of each theorem. The highest compliment Erdös could give a proof was that it “came straight from the book.” In this case, I find it hard to imagine that even God knows how to prove the Sensitivity Conjecture in any simpler way than this.

-Scott Aaronson

The funny thing is, a few days after Prof. Aaronson’s blog post, his former student Shalev Ben-David posted in the comments section a simplification of Huang’s proof which does not use Cauchy’s interlace theorem.

Anyway, talking about the sensitivity conjecture is indeed quite a fitting end to the Math 202 series, since both Gotsman & Linial’s work and Hao Huang’s work on this 30-year-old ex-open problem utilize almost exclusively concepts which we have studied over the past three quarters.

Thanks for reading my post and have a great summer everyone!

##### References
• [1] H. Huang, Induced subgraphs of hypercubes and a proof of the sensitivity conjecture, Annals of Mathematics, 190 (3) (2019), 949–955.
• [2] N. Nisan, M. Szegedy, On the degree of Boolean functions as real polynomials, Comput. Complexity, 4 (1992), 462–467.
• [3] C. Gotsman, N. Linial, The equivalence of two problems on the cube, J. Combin. Theory Ser. A, 61 (1) (1992), 142–146.
• [4] F. Chung, Z. Füredi, R. Graham, P. Seymour, On induced subgraphs of the cube, J. Combin. Theory Ser. A, 49 (1) (1988), 180–187.
• [5] A. Tal, Properties and applications of boolean function composition, Proceedings of the 4th conference on Innovations in Theoretical Computer Science, ITCS ’13, 441–454.
• [6] R. Karthikeyan, S. Sinha, V. Patil, On the resolution of the sensitivity conjecture, Bulletin of the American Mathematical Society, 57 (2020), 10.1090/bull/1697.
• [7] https://www.scottaaronson.com/blog/?p=4229
• [8] Lecture notes for Math 202A by Prof. Philip Gill.
• [9] D. Kroes, J. Naranjo, J. Nie, J. O’Neill, N. Seiger, S. Spiro, E. Zhu, Linear Algebra methods in Combinatorics, ABACUS notes Fall 2019 UCSD.

## Math 202C: Lecture 20 – Uncertainty Principle

Author: Zhaolong Han

In this lecture, we will introduce the Fourier transform on finite Abelian groups and a discrete version of the “uncertainty principle”, based on Chapter 2 of “Discrete Harmonic Analysis” by Ceccherini-Silberstein et al. [1]. In quantum mechanics, there is a well-known uncertainty principle saying that the more precisely the position of some particle is determined, the less precisely its momentum can be predicted from initial conditions, and vice versa.
For finite Abelian groups, there is a theorem that parallels this uncertainty principle, so people refer to this group-theoretic version as an “uncertainty principle” too. However, our focus today will be Tao’s uncertainty principle for cyclic groups of prime order [2], which is a refinement of the discrete uncertainty principle exploiting the special structure of the group.

First, let us introduce some notations and redefine the Discrete Fourier transform as Andrew did in Lecture 18. In fact, these materials already appeared in Lecture 8, where we studied the concrete example of the rank $k$ hypercube group $B(k)$.

Let $X$ be a finite set and denote by $L(X)=\{f:X\to \mathbb{C}\}$ the vector space of all complex-valued functions defined on $X$. Then $\mathrm{dim}\ L(X)=|X|$, where $|\cdot|$ denotes cardinality.

For $x\in X$ we denote by $\delta_x$ the Dirac function centered at $x$, that is, the element $\delta_x\in L(X)$ defined by $\delta_x(y)=1$ if $y=x$ and $\delta_x(y)=0$ if $y\neq x$ for all $y\in X$.

The set $\{\delta_x:x\in X\}$ is a natural basis for $L(X)$ and if $f\in L(X)$ then $f=\sum_{x\in X}f(x)\delta_x$.

The space $L(X)$ is endowed with the scalar product defined by setting

$\langle f_1,f_2\rangle=\sum_{x\in X}f_1(x)\overline{f_2(x)}$

for $f_1,f_2\in L(X)$, and we denote by $||f||=\sqrt{\langle f,f\rangle}$ the norm of $f\in L(X)$. Note that $\{\delta_x:x\in X\}$ is an orthonormal basis of $L(X)$ with respect to $\langle\cdot,\cdot\rangle$.

Let $A$ be a finite Abelian group. A character of $A$ is a map $\chi:A\to\mathbb{T}$ such that

$\chi(x+y)=\chi(x)\chi(y)$

for all $x,y\in A$. The set $\widehat{A}$ of all characters of $A$ is an Abelian group with respect to the product $(\chi,\psi)\mapsto \chi\cdot\psi$ defined by $(\chi\cdot\psi)(x)=\chi(x)\psi(x)$ for all $x\in A$. $\widehat{A}$ is called the dual of $A$.

Let $A=\mathbb{Z}_n$, the cyclic group of order $n$, where $n\ge 2$, $n\in \mathbb{N}$, and let $\omega:=e^{\frac{2\pi i}{n}}$. For $x\in \mathbb{Z}_n$, define $\chi_x\in L(\mathbb{Z}_n)$ to be the function $z\mapsto \omega^{zx}$.

Exercise 1: Check that for $x\in \mathbb{Z}_n$, $\chi_x$ is indeed a character of $\mathbb{Z}_n$, and $\widehat{\mathbb{Z}_n}=\{\chi_x:x\in \mathbb{Z}_n\}$ is isomorphic to $\mathbb{Z}_n$.

Now we define the Discrete Fourier transform. Let $A$ be a finite Abelian group. The Fourier transform of a function $f\in L(A)$ is the function $\hat{f}\in L(\widehat{A})$ defined by

$\widehat{f}(\chi)=\sum_{y\in A}f(y)\overline{\chi(y)}$

for all $\chi\in \widehat{A}$. Then $\widehat{f}(\chi)$ is called the Fourier coefficient of $f$ with respect to $\chi$. When $A=\mathbb{Z}_n$ and $f\in L(\mathbb{Z}_n)$, we call $\frac{1}{n}\hat{f}$ the Discrete Fourier transform (DFT) of $f$. Specifically, for $x\in\mathbb{Z}_n$ and $\omega=e^{\frac{2\pi i}{n}}$, the DFT of $f$ is

$\hat{f}(x)=\frac{1}{n}\sum_{y\in\mathbb{Z}_n}f(y)\omega^{-xy}.$
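As a quick sanity check of this definition, here is a direct implementation of the DFT on $\mathbb{Z}_n$ with the $\frac{1}{n}$ normalization used above, together with the standard inversion formula $f(y)=\sum_x \hat{f}(x)\omega^{xy}$ (the function names are mine):

```python
import cmath

def dft(f):
    # DFT on Z_n, normalized as above: hat f(x) = (1/n) * sum_y f(y) w^{-xy}
    n = len(f)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(f[y] * w ** (-x * y) for y in range(n)) / n for x in range(n)]

def idft(fh):
    # Fourier inversion: f(y) = sum_x hat f(x) w^{xy}
    n = len(fh)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(fh[x] * w ** (x * y) for x in range(n)) for y in range(n)]

f = [1, 2, 3, 4, 5]
assert max(abs(a - b) for a, b in zip(f, idft(dft(f)))) < 1e-9
```

For example, the Dirac function $\delta_0$ transforms to the constant function $\frac{1}{n}$, which is the extreme case of the uncertainty principle below.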

The uncertainty principle is the following (see Theorem 2.5.4 of [1]):
Theorem (Uncertainty principle): Let $f\in L(A)$ and suppose that $f\neq 0$. Then

$|\mathrm{supp}(f)|\cdot|\mathrm{supp}(\hat{f})|\ge |A|.$

Now we come to the main part of this lecture: Tao’s uncertainty principle for cyclic group of prime order. This improves the inequality in the above discrete uncertainty principle when the finite Abelian group $A$ is cyclic with prime order $p$.

The main theorem in [2] is the following:
Theorem 2 (Tao’s uncertainty principle): Let $p$ be a prime number and $f\in L(\mathbb{Z}_p)$ non-zero. Then

$|\mathrm{supp}(f)|+|\mathrm{supp}(\hat{f})|\ge p+1.$

Conversely, if $\emptyset\neq A,A'\subset \mathbb{Z}_p$ are two subsets such that $|A|+|A'|=p+1$, then there exists $f\in L(\mathbb{Z}_p)$ such that $\mathrm{supp}(f)=A$ and $\mathrm{supp}(\hat{f})=A'$.
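Before turning to the proof, here is a quick numerical spot check of the inequality for random integer-valued functions on $\mathbb{Z}_7$ (the DFT and support-counting helpers below are mine):

```python
import cmath
import random

p = 7

def dft(f):
    # hat f(x) = (1/p) * sum_y f(y) w^{-xy} on Z_p
    w = cmath.exp(2j * cmath.pi / p)
    return [sum(f[y] * w ** (-x * y) for y in range(p)) / p for x in range(p)]

def supp_size(v, tol=1e-8):
    return sum(1 for c in v if abs(c) > tol)

random.seed(2)
for _ in range(200):
    f = [random.choice((0, 0, 1, -1, 2)) for _ in range(p)]
    if any(f):  # the theorem requires f non-zero
        assert supp_size(f) + supp_size(dft(f)) >= p + 1   # Tao's bound
```

The Dirac function $\delta_0$ attains the bound with equality: its support has size $1$ and its DFT is supported everywhere, giving $1 + p = p+1$.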

To prove the theorem, we present some specific tools developed in Tao’s paper [2].

Recall that $\mathbb{Z}[x]$ denotes the ring of polynomials with integer coefficients.

Proposition 1:
Let $P(x_1,x_2,\cdots,x_n)$ be a polynomial in the variables $x_1,x_2,\cdots,x_n$ with integer coefficients. Suppose that for some $i\neq j$,

$P(x_1,x_2,\cdots,x_n)|_{x_i=x_j}\equiv 0.$

Then there exists a polynomial $Q(x_1,x_2,\cdots,x_n)$ with integer coefficients such that

$P(x_1,x_2,\cdots,x_n)=(x_i-x_j)Q(x_1,x_2,\cdots,x_n)$.

Remark: The proof of this proposition can be omitted for the first reading. Instead, considering $P(x_1,x_2)=x_1^2+x_1x_2-2x_2^2$ and trying to find corresponding $Q(x_1,x_2)$ by using this proposition would be helpful for understanding.
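For the example in the remark, one finds $Q(x_1,x_2) = x_1 + 2x_2$, since $x_1^2 + x_1x_2 - 2x_2^2 = (x_1 - x_2)(x_1 + 2x_2)$. A quick numerical check of this factorization:

```python
import random

def P(x1, x2):
    return x1 ** 2 + x1 * x2 - 2 * x2 ** 2

def Q(x1, x2):
    return x1 + 2 * x2   # the quotient promised by Proposition 1

assert P(1, 1) == 0      # P vanishes identically on x_1 = x_2, as required
random.seed(3)
for _ in range(100):
    a, b = random.randint(-50, 50), random.randint(-50, 50)
    assert P(a, b) == (a - b) * Q(a, b)
```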

Proof:
For the sake of simplicity, suppose that $i = 1$ and $j = 2$ so that $P(x_1,x_1,x_3,\cdots,x_n) \equiv 0$. Let us denote by $P_1(x_1,x_2,\cdots,x_n)$ (respectively $P_2(x_1,x_2,\cdots,x_n)$) the sum of the monomials of $P(x_1,x_2,\cdots,x_n)$ with positive (respectively negative) coefficients so that

$P(x_1,x_2,\cdots,x_n) = P_1(x_1,x_2,\cdots,x_n) + P_2(x_1,x_2,\cdots,x_n).$

Note that

$P_1(x_1,x_1,x_3,\cdots,x_n) =- P_2(x_1,x_1,x_3,\cdots,x_n),$

since $P(x_1 , x_1 ,x_3, \cdots , x_n ) \equiv 0$. This implies that there exists a bijection between the monomials in $P_1(x_1,x_1,x_3,\cdots,x_n)$ and those in $P_2(x_1,x_1,x_3,\cdots,x_n)$. More precisely, let us fix $m > 0$ and $k, k_3 , \cdots, k_n \ge 0$; then the monomial $mx_1^k x_3^{k_3}\cdots x_n^{k_n}$ appears in $P_1(x_1,x_1,x_3,\cdots,x_n)$ if and only if $-mx_1^k x_3^{k_3}\cdots x_n^{k_n}$ appears in $P_2(x_1,x_1,x_3,\cdots,x_n)$. Suppose this is the case. Then we can find non-negative integers $m_0 , m_1 , \cdots , m_k$ and $n_0 , n_1 , \cdots, n_k$ such that the sum of the monomials of $P(x_1,x_2,x_3,\cdots,x_n)$ whose variables $x_i$ have degree $k_i$ for $i = 1, 2,\cdots , n$ with $k_1 + k_2 = k$ is

$\sum_{l=0}^k m_lx_1^{k-l}x_2^lx_3^{k_3}\cdots x_n^{k_n}-\sum_{l=0}^k n_lx_1^{k-l}x_2^lx_3^{k_3}\cdots x_n^{k_n}$

and

$m_0+m_1+\cdots+m_k=n_0+n_1+\cdots+n_k=m$

but also such that

$m_l\neq 0\implies n_l=0 \qquad \text{and} \qquad n_l\neq 0\implies m_l=0$

(because otherwise there would be a cancellation). Thus, with every monomial $x_1^{k-l}x_2^lx_3^{k_3}\cdots x_n^{k_n}$ such that $m_l\neq 0$ we can (arbitrarily but bijectively) associate a monomial $x_1^{k-h}x_2^hx_3^{k_3}\cdots x_n^{k_n}$ with $n_h\neq 0$ and $h\neq l$. Now for $h>l$ we have the identity

$x_1^{k-l}x_2^l-x_1^{k-h}x_2^h=x_2^lx_1^{k-h}(x_1^{h-l}-x_2^{h-l})$

$=x_2^lx_1^{k-h}(x_1-x_2)(x_1^{h-l-1}+x_1^{h-l-2}x_2+\cdots+x_2^{h-l-1}).$

Exchanging $h$ with $l$ we get the analogous identity for $h<l$. This shows that

$\sum_{l=0}^k m_lx_1^{k-l}x_2^lx_3^{k_3}\cdots x_n^{k_n}-\sum_{l=0}^k n_lx_1^{k-l}x_2^lx_3^{k_3}\cdots x_n^{k_n}$

is divisible by $x_1-x_2$.

Repeating the argument for each monomial $mx_1^kx_3^{k_3}\cdots x_n^{k_n}$ with $m>0$ and $k,k_3,\cdots,k_n\ge 0$ appearing in $P_1(x_1,\cdots,x_n)$, we deduce that $P(x_1,\cdots,x_n)$ is divisible by $x_1-x_2$.

Lemma 1: Let $p$ be a prime, $n$ a positive integer, and $P(x_1,x_2,\cdots,x_n)$ a polynomial with integer coefficients. Suppose that $\omega_1,\cdots,\omega_n$ are (not necessarily distinct) $p$th roots of unity such that $P(\omega_1,\cdots,\omega_n)=0$. Then $P(1,\cdots,1)$ is divisible by $p$.
Proof:
Letting $\omega=e^{\frac{2\pi i}{p}}$ we can find integers $0\le k_j\le p-1$ such that $\omega_j=\omega^{k_j}$ for $j=1,2,\cdots,n$.

Define the polynomials $q(x),r(x)\in\mathbb{Z}[x]$ by setting

$P(x^{k_1},\cdots,x^{k_n})=(x^p-1)q(x)+r(x)$

where $\mathrm{deg}\, r < p$. Then $r(\omega)=0$, and since $\mathrm{deg}\, r < p$ we deduce that $r(x)$ is a multiple of the minimal polynomial of $\omega$, that is, $r(x)=m(1+x+\cdots+x^{p-1})$ for some $m\in\mathbb{Z}$, where we used the fact from undergraduate algebra that the minimal polynomial of $\omega$ is $1+x+\cdots+x^{p-1}$. It follows that $P(1,\cdots,1)=r(1)=mp$.

Theorem 3 (Chebotarev):
Let $p$ be a prime and $1\le n\le p$. Let $\eta_1,\cdots,\eta_n$ (respectively $\xi_1,\cdots,\xi_n$) be distinct elements of $\{0,1,\cdots,p-1\}$. Then the matrix

$A=\left(e^{\frac{2\pi i\eta_h\xi_k}{p}}\right)_{h,k=1}^n$

is nonsingular.
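Chebotarev's theorem is concrete enough to verify exhaustively for a small prime. The sketch below checks, for $p=5$, that every square submatrix of $\left(\omega^{\eta\xi}\right)$ with distinct row indices and distinct column indices is nonsingular (the `det` helper is a naive Leibniz expansion, mine, fine for these tiny matrices):

```python
import cmath
from itertools import combinations, permutations

def det(M):
    # Leibniz determinant: sum over permutations of sign * product of entries
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        term = sign
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

p = 5
w = cmath.exp(2j * cmath.pi / p)
for n in range(1, p + 1):
    for etas in combinations(range(p), n):
        for xis in combinations(range(p), n):
            M = [[w ** (e * x) for x in xis] for e in etas]
            assert abs(det(M)) > 1e-6   # nonsingular, as the theorem predicts
```

Note that this fails for non-prime moduli: for instance with $p=4$, rows $\{0,2\}$ and columns $\{0,2\}$ give a singular $2\times 2$ submatrix, so the primality hypothesis is essential.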

Proof: Set $\omega_h=e^{\frac{2\pi i\eta_h}{p}}$ for $h=1,2,\cdots,n$. Note that the $\omega_h$s are distinct $p$th roots of unity and $A=\left(\omega_h^{\xi_k}\right)_{h,k=1}^n$. Define the polynomial $D(x_1,\cdots,x_n)$ with integer coefficients by setting

$D(x_1,\cdots,x_n)=\mathrm{det}\left(x_h^{\xi_k}\right)_{h,k=1}^n.$

As the determinant is an alternating form, we have $D(x_1,\cdots,x_n)|_{x_h=x_k}\equiv 0$ whenever $1\le h\neq k\le n$, so that, by recursively applying Proposition 1, we can find a polynomial $Q(x_1,\cdots,x_n)$ with integer coefficients such that

$D(x_1,\cdots,x_n)=Q(x_1,\cdots,x_n)\prod_{1\le h<k\le n}(x_k-x_h).$

To prove the theorem, it is equivalent to show that $Q(\omega_1,\cdots,\omega_n)\neq 0$ (because $\omega_h$s are all distinct) so that, by Lemma 1 it suffices to show that $p$ does not divide $Q(1,\cdots,1)$. For this we need the next three lemmas to complete this proof. Let us first introduce some notations.

Given an $n$-tuple $\mathbf{k}=(k_1,\cdots,k_n)$ of non-negative integers, we say that the (monomial) differential operator

$L=\mathbf{L_k}=\left(x_1\frac{\partial}{\partial x_1}\right)^{k_1}\cdots\left(x_n\frac{\partial}{\partial x_n}\right)^{k_n}$

is of type $\mathbf{k}$ and order $o(\mathbf{k})=k_1+\cdots+k_n$.

Lemma 2:
Let $L$ be a differential operator of type $\mathbf{k}$ and $F(x_1,\cdots,x_n)$ and $G(x_1,\cdots,x_n)$ two polynomials. Then

$L(FG)=\sum_{(\mathbf{i},\mathbf{j})}L_{\mathbf{i}}(F)\cdot L_{\mathbf{j}}(G)$

where the sum runs over all pairs $(\mathbf{i},\mathbf{j})$ such that (componentwise) $\mathbf{i}+\mathbf{j}=\mathbf{k}$ (and therefore $o(\mathbf{i})+o(\mathbf{j})=o(\mathbf{k})$).
Proof:
Left as as exercise.

Exercise 2: Prove Lemma 2. (Hint: use induction on the order of $L$.)

Lemma 3:
For $1\le j\le n$ and $1\le h\le j-1$ we have

$\left(x_j\frac{\partial}{\partial x_j}\right)^{h}(x_j-x_1)(x_j-x_2)\cdots (x_j-x_{j-1})=\sum_{t=1}^ha_{h,t}x_j^t\sum_{\mathbf{i}_t}\prod_{1\le i\le j-1,i\neq i_1,\cdots,i_t}(x_j-x_i)$

where $\sum_{\mathbf{i}_t}$ runs over all $\mathbf{i}_t=(i_1,\cdots,i_t)$ with $1\le i_1<\cdots<i_t\le j-1$, and the $a_{h,t}=a_{h,t}(j)$s are non-negative integers such that $a_{h,h}=h!$. In particular,

$\left(x_j\frac{\partial}{\partial x_j}\right)^{j-1}(x_j-x_1)(x_j-x_2)\cdots (x_j-x_{j-1})=(j-1)!x_j^{j-1}+\text{terms containing at least one factor } (x_j-x_i)$

with $1\le i\le j-1$.
Proof:
Proof by induction on $h=1,\cdots,j-1$. For the details see Lemma 2.6.15 of section 2.6 of [1].

Lemma 4:
Let $L=L_{(0,1,\cdots,n-1)}$, that is,

$L=\left(x_1\frac{\partial}{\partial x_1}\right)^{0}\cdots\left(x_n\frac{\partial}{\partial x_n}\right)^{n-1}.$

Then if $D(x_1,\cdots,x_n)$ and $Q(x_1,\cdots,x_n)$ are given by

$D(x_1,\cdots,x_n)=Q(x_1,\cdots,x_n)\prod_{1\le h<k\le n}(x_k-x_h),$

we have

$[LD](1,\cdots,1)=\left(\prod_{j=1}^n (j-1)!\right)Q(1,\cdots,1).$

Proof:
By Lemmas 2 and 3, we have

$[LD](x_1,\cdots,x_n)=\left(\prod_{j=1}^n(j-1)!\,x_j^{j-1}\right)Q(x_1,\cdots,x_n)+\text{terms containing at least one factor }(x_j-x_i)$

with $1\le i<j\le n$. Here the first term is obtained by letting all of $L$ act on $\prod_{1\le h<k\le n}(x_k-x_h)$ (the second factor of $D$). In particular, taking $x_i=1$ for $i=1,\cdots,n$ we complete the proof.

Now we can finish the proof of Theorem 3 with these lemmas.

Proof of Theorem 3 (continued):
For $L$ as in Lemma 4, we have (where $S_n$ denotes the symmetric group of degree $n$)

$[LD](x_1,\cdots,x_n)=L\sum_{\sigma\in S_n}\epsilon(\sigma)x_1^{\xi_{\sigma(1)}}\cdots x_n^{\xi_{\sigma(n)}}$

$=\sum_{\sigma\in S_n}\epsilon(\sigma)\xi^0_{\sigma(1)}x_1^{\xi_{\sigma(1)}}\cdots \xi_{\sigma(n)}^{n-1}x_n^{\xi_{\sigma(n)}}$

since

$\left(x_j\frac{\partial}{\partial x_j}\right)^{j-1}x_j^{\xi_{\sigma(j)}}=\xi_{\sigma(j)}^{j-1}x_j^{\xi_{\sigma(j)}}$

for all $j=1,2,\cdots,n$. Thus,

$[LD](1,\cdots,1)=\sum_{\sigma\in S_n}\epsilon(\sigma)\xi_{\sigma(1)}^0\cdots\xi_{\sigma(n)}^{n-1}=\prod_{1\le i<j\le n}(\xi_j-\xi_i)$

is the Vandermonde determinant. Since $\xi_i\neq \xi_j$ for $i\neq j$, we deduce that $[LD](1,\cdots,1)$ is not divisible by $p$ because $n\le p$. Thus, by Lemma 4, $Q(1,\cdots,1)$ is not divisible by $p$ either. By Lemma 1, this completes the proof of Theorem 3.

Given a non-empty subset $A\subset \mathbb{Z}_p$ and a function $f\in L(A)$, we denote by $\overline{f}$ its extension $\overline{f}:\mathbb{Z}_p\to \mathbb{C}$ defined by setting $\overline{f}(z)=f(z)$ for $z\in A$ and $\overline{f}(z)=0$ for all $z\in \mathbb{Z}_p\backslash A$. For simplicity, we regard the DFT as a map $L(\mathbb{Z}_p)\to L(\mathbb{Z}_p)$. In other words, for $f\in L(\mathbb{Z}_p)$ and $x\in \mathbb{Z}_p$,

$\hat{f}(x)=\frac{1}{p}\sum_{y\in\mathbb{Z}_p}f(y)\omega^{-xy}.$

Proposition 2:
Let $p$ be a prime. Let $A,B\subset\mathbb{Z}_p$ such that $|A|=|B|$. Then the linear map $T=T_{A,B}:L(A)\to L(B)$ defined by $Tf=\widehat{\overline{f}}\,|_B$ is invertible.

Proof:

Set $A=\{\xi_1,\cdots,\xi_n\}$ and $B=\{\eta_1,\cdots,\eta_n\}$ and consider the basis of $L(A)$ (respectively, of $L(B)$) consisting of the Dirac functions $\delta_{\xi_j}$ with $j=1,\cdots,n$ (respectively, $\delta_{\eta_k}$ with $k=1,\cdots,n$) and let $\omega=e^{\frac{2\pi i}{p}}$. Then we have

$[T\delta_{\xi_k}](\eta_h)=\widehat{\delta_{\xi_k}}(\eta_h)=\frac{1}{p}\sum_{x\in\mathbb{Z}_p}\delta_{\xi_k}(x)\omega^{-x\eta_h}=\frac{1}{p}\omega^{-\eta_h\xi_k}.$

By virtue of Theorem 3, we have

$\mathrm{det}\left([T\delta_{\xi_k}](\eta_h)\right)_{h,k=1}^n\neq 0$,

showing that $T$ is indeed invertible.

Finally, we can prove the main result, Theorem 2.

Proof of Theorem 2:
Suppose by contradiction that there exists a non-zero $f\in L(\mathbb{Z}_p)$ such that $\mathrm{supp}(f)=A$ and $\mathrm{supp}(\hat{f})=C$ with $|A|+|C|\le p$. Then we can find a subset $B\subset \mathbb{Z}_p$ such that $|B|=|A|$ and $C\cap B=\emptyset$. We deduce that $Tf=\widehat{\overline{f}}|_B$ is identically zero. Since $f\not\equiv 0$, this contradicts the injectivity of $T$ (cf. Proposition 2).

Conversely, let $\emptyset\neq A,A'\subset\mathbb{Z}_p$ be two subsets such that $|A|+|A'|=p+1$. Let $B\subset\mathbb{Z}_p$ be such that $|B|=|A|$ and $B\cap A'=\{\xi\}$. Note that $(\mathbb{Z}_p\backslash B)\cup\{\xi\}\supset A'$. Taking cardinalities, $|A'|=p+1-|A|=p+1-|B|=|(\mathbb{Z}_p\backslash B)\cup\{\xi\}|\ge |A'|$, which yields

$(\mathbb{Z}_p\backslash B)\cup\{\xi\}=A'.$

Consider the map $T=T_{A,B}:L(A)\to L(B)$. By Proposition 2, we can find $g\in L(A)$ such that $Tg=\delta_\xi|_B$, so that $\widehat{\overline{g}}$ vanishes on $B\backslash\{\xi\}$ but $\widehat{\overline{g}}(\xi)\neq 0$. Setting $f=\overline{g}\in L(\mathbb{Z}_p)$ we clearly have $\mathrm{supp}(f)\subset A$ and $\mathrm{supp}(\hat{f})\subset (\mathbb{Z}_p\backslash B)\cup\{\xi\}$. By the first part of the theorem,

$p+1\le |\mathrm{supp}(f)|+|\mathrm{supp}(\hat{f})|\le |A|+|\mathbb{Z}_p\backslash B|+1 =|A|+p-|B|+1=p+1$

so that all the inequalities are in fact equalities, i.e., $\mathrm{supp}(f)=A$ and $\mathrm{supp}(\hat{f})= (\mathbb{Z}_p\backslash B)\cup\{\xi\}=A'$. This completes the second part of the theorem.

An immediate consequence of Theorem 2 is that any sparse polynomial $\sum_{j=0}^k c_jz^{n_j}$ with $k+1$ non-zero coefficients and exponents $0\le n_0<n_1<\cdots<n_k<p$, when restricted to the $p$th roots of unity $\{z:z^p=1\}$, can have at most $k$ zeroes. Indeed, such a polynomial is essentially the Fourier transform in $\mathbb{Z}_p$ of a function whose support has cardinality $k+1$, and so the polynomial must be non-zero on at least $p-k$ of the $p$th roots of unity by Theorem 2, and the claim follows.
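
This consequence is easy to test numerically (a sketch; the prime $p = 7$ and the sparse polynomial $z^4 + z + 1$ are arbitrary choices):

```python
import cmath

p = 7

def poly(z):
    # a sparse polynomial with k + 1 = 3 non-zero terms
    return z ** 4 + z + 1

# the p-th roots of unity
roots = [cmath.exp(2j * cmath.pi * j / p) for j in range(p)]
zero_count = sum(1 for z in roots if abs(poly(z)) < 1e-9)
# Theorem 2 guarantees at most k = 2 zeroes among the 7th roots of unity
assert zero_count <= 2
```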


## Math 202C: Lecture 19

Author: Qihao Ye

Andrew Dennehy has already introduced the concept of the discrete Fourier transform (DFT) in Lecture 18, but I would like to retrace the path with more details, because there are some other concepts (which help with full understanding) I would like to talk about.

We mainly talk about how the fast Fourier transform (FFT) and the inverse fast Fourier transform (IFFT) work, and we briefly talk about the calculation error.

We will see how FFT and IFFT work from both the mathematical and the computational perspective, along with their applications and some specific problems. Some problems may involve the number theoretic transform (NTT), which can be viewed as the integer DFT over a finite field.

We use Python 3 for code examples; since we do not use version-specific features, the exact version does not matter (for instance, Python 3.5.2 and Python 3.8.1 are both fine). We will say Python instead of Python 3 in the following content. (If you do not have Python installed, there are many online interpreters that can run Python code, for instance, Try It Online.) It is recommended to view this page on a computer for the best experience, and to try the code and solve some problems yourself, which should be challenging and interesting.

There are only 2 exercises in this lecture; the other problems are interest-based algorithm problems (not required as homework), which might be difficult to solve if you are not very familiar with the FFT and NTT in this form, but I will give hints to guide you.

Example 1: We consider two polynomials

$P_{1}(x) = 1 + 2 x + 3 x^{2} + 4 x^{3}, \qquad P_{2}(x) = 2 + 3 x + 4 x^{2} + 5 x^{3}.$

Multiplying them together (using the distributive property), we get

$Q(x) = P_{1}(x) \cdot P_{2}(x) = 2 + 7 x + 16 x^{2} + 30 x^{3} + 34 x^{4} + 31 x^{5} + 20 x^{6}.$

Definition 1 (coefficient representation): For a polynomial $P(x) = a_{0} + a_{1} x + \cdots + a_{n} x^{n}$, the list $[a_{0}, a_{1}, \ldots, a_{n}]$ is its coefficient representation. We denote $P[k]$ as the coefficient of $x^{k}$ in $P(x)$.

Using Definition 1, we can write $P_{1}(x)$ as $[1, 2, 3, 4]$, $P_{2}(x)$ as $[2, 3, 4, 5]$ and $P_{1}(x) \cdot P_{2}(x) = Q(x)$ as $[2, 7, 16, 30, 34, 31, 20]$.

The naïve polynomial multiplication function is not hard to get:

def naive_poly_mul(P1, P2):
    Q = [0] * (len(P1) + len(P2) - 1)
    for i in range(len(P1)):
        for j in range(len(P2)):
            Q[i + j] += P1[i] * P2[j]
    return Q
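
A quick check on Example 1 (the function is repeated here so the snippet runs on its own):

```python
def naive_poly_mul(P1, P2):
    # same function as above, repeated so this snippet is self-contained
    Q = [0] * (len(P1) + len(P2) - 1)
    for i in range(len(P1)):
        for j in range(len(P2)):
            Q[i + j] += P1[i] * P2[j]
    return Q

# Example 1: (1 + 2x + 3x^2 + 4x^3)(2 + 3x + 4x^2 + 5x^3)
print(naive_poly_mul([1, 2, 3, 4], [2, 3, 4, 5]))
# -> [2, 7, 16, 30, 34, 31, 20]
```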


In the general case, i.e.,

$P_{1}(x) = a_{0} + a_{1} x + \cdots + a_{n} x^{n}, \qquad P_{2}(x) = b_{0} + b_{1} x + \cdots + b_{m} x^{m},$

it is easy to see that the complexity of the naïve polynomial multiplication function is $\mathcal{O}(m n)$. Note that $\text{len}(P_{1}) = n + 1$ (counting from $0$ to $n$). If we consider the specific condition $m = n$, then the complexity becomes $\mathcal{O}(n^{2})$.

Definition 2 (degree of polynomial): The degree of a polynomial is the highest of the degrees of the polynomial’s monomials (individual terms) with non-zero coefficients.

In Example 1, $P_{1}, P_{2}$ are both polynomials of degree $3$.

Definition 3 (value representation): Instead of representing a degree-$n$ polynomial with its $n + 1$ coefficients, we can also represent it with $n + 1$ proper (see Exercise 1 below) points on the polynomial, which is called the value representation.

Exercise 1: Prove that $n + 1$ points $\{ (x_{i}, y_{i}) \}_{i = 1}^{n + 1}$ with distinct $\{ x_{i} \}$ determine a unique polynomial of degree at most $n$. Hint: use the fact that a Vandermonde matrix is nonsingular without proof.

Back to Example 1: to get $Q$, we can first compute $\{ (x_{i}, P_{1}(x_{i}), P_{2}(x_{i})) \}_{i = 1}^{7}$ from $P_{1}, P_{2}$ with distinct $\{ x_{i} \}$; then the $7$ points $\{ (x_{i}, P_{1}(x_{i}) P_{2}(x_{i})) \}_{i = 1}^{7}$ represent $Q$.
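
The value-representation multiplication on Example 1 can be checked directly (a sketch; the sample points are arbitrary distinct values):

```python
# a sketch of multiplication in value representation, using Example 1
P1 = [1, 2, 3, 4]              # 1 + 2x + 3x^2 + 4x^3
P2 = [2, 3, 4, 5]              # 2 + 3x + 4x^2 + 5x^3
Q  = [2, 7, 16, 30, 34, 31, 20]

def evaluate(P, x):
    # Horner's rule
    result = 0
    for a in reversed(P):
        result = result * x + a
    return result

xs = [-3, -2, -1, 0, 1, 2, 3]  # 7 distinct points determine Q (degree 6)
for x in xs:
    assert evaluate(P1, x) * evaluate(P2, x) == evaluate(Q, x)
```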

When the degrees of $P_{1}$ and $P_{2}$ are both $n$, we can see that the multiplication step here needs only $\mathcal{O}(n)$ operations.

However, most of the time we ultimately want the coefficient representation, because it is more suitable for evaluating the polynomial anywhere in its domain. It is not hard to see that we can do the multiplication this way: convert both polynomials to the value representation, multiply pointwise, then convert back.

If we only consider the situation where we aim to multiply two degree-$n$ polynomials, the multiplication part only costs $\mathcal{O}(n)$, so the bottleneck of the complexity lies in the algorithm changing the coefficient representation into the value representation and the algorithm changing it back.

The naïve way to get the value representation of a coefficient-represented polynomial costs at least $\mathcal{O}(n^{2})$. (To get each value we need $\mathcal{O}(n)$ operations, and we need $\mathcal{O}(n)$ values in total.)

The essential idea is selecting specific points to reduce the calculation cost.

A natural first thought is to look at the parity of a function, because for any odd function $P_{\text{odd}}$ we have $P_{\text{odd}}(-x) = - P_{\text{odd}}(x)$, while for any even function $P_{\text{even}}$ we have $P_{\text{even}}(-x) = P_{\text{even}}(x)$.

Actually, we can divide any polynomial into one odd function plus one even function. Taking a look back at $Q(x)$ in Example 1, we can write

$Q(x) = \left(2 + 16 x^{2} + 34 x^{4} + 20 x^{6}\right) + x \left(7 + 30 x^{2} + 31 x^{4}\right) = Q_{\text{even}}(x^{2}) + x\, Q_{\text{odd}}(x^{2}),$

where $Q_{\text{even}}(t) = 2 + 16 t + 34 t^{2} + 20 t^{3}$ and $Q_{\text{odd}}(t) = 7 + 30 t + 31 t^{2}$.

Notice that $Q_{\text{odd}}(x^{2})$ is actually an even function, but writing it in this form allows us to take $x^{2}$ as the variable. It follows that

$Q(x) = Q_{\text{even}}(x^{2}) + x\, Q_{\text{odd}}(x^{2}), \qquad Q(-x) = Q_{\text{even}}(x^{2}) - x\, Q_{\text{odd}}(x^{2}).$

Note that $Q_{\text{even}}$ has degree $3$ and $Q_{\text{odd}}$ has degree $2$ while $Q$ has degree $6$. Once we get $Q_{\text{even}}(x^{2})$ and $Q_{\text{odd}}(x^{2})$, we immediately get both $Q(x)$ and $Q(-x)$.

All that remains is to process $Q_{\text{even}}$ and $Q_{\text{odd}}$ recursively, which would lead to an $\mathcal{O}(n \log n)$ recursive algorithm.

However, we encounter the problem that the symmetry cannot be maintained further (we can pair $x$ with $-x$, but how do we pair the squared values $x^{2}$?).

As we already saw in the previous lecture, we can use the roots of unity to solve this problem. Denote $\omega_{n} = \exp(2 \pi i / n)$. Note that $\omega_{n}^{n} = \omega_{n}^{0} = 1$.

To make it easier to understand, we now only consider $n = 2^{k}$ for some positive integer $k$. We would evaluate the polynomial at $[\omega_{n}^{0}, \omega_{n}^{1}, \ldots, \omega_{n}^{n - 1}]$, then $[\omega_{n}^{0}, \omega_{n}^{2}, \ldots, \omega_{n}^{n - 2}]$, next $[\omega_{n}^{0}, \omega_{n}^{4}, \ldots, \omega_{n}^{n - 4}]$, etc., just as in the figure below.

We pair each $\omega_{n}^{j}$ with $-\omega_{n}^{j} = \omega_{n}^{j + n / 2}$. For any polynomial $P$, we can get $[P(\omega_{n}^{0}), P(\omega_{n}^{1}), \ldots, P(\omega_{n}^{n - 1})]$ fast by evaluating $[P_{\text{even}}(\omega_{n}^{0}), P_{\text{even}}(\omega_{n}^{2}), \ldots, P_{\text{even}}(\omega_{n}^{n - 2})]$ and $[P_{\text{odd}}(\omega_{n}^{0}), P_{\text{odd}}(\omega_{n}^{2}), \ldots, P_{\text{odd}}(\omega_{n}^{n - 2})]$ recursively. Recall that we divide $P(x)$ as $P_{\text{even}}(x^{2}) + x P_{\text{odd}}(x^{2})$. The corresponding formula is

$P(\omega_{n}^{j}) = P_{\text{even}}(\omega_{n}^{2j}) + \omega_{n}^{j}\, P_{\text{odd}}(\omega_{n}^{2j}), \qquad P(\omega_{n}^{j + n/2}) = P_{\text{even}}(\omega_{n}^{2j}) - \omega_{n}^{j}\, P_{\text{odd}}(\omega_{n}^{2j}), \qquad j = 0, 1, \ldots, n/2 - 1.$

This leads to the naive FFT code:

from math import pi
from math import sin
from math import cos

def get_omega(n):
    theta = 2 * pi / n
    omega = cos(theta) + sin(theta) * 1j
    return omega

def naive_FFT(P):
    # P is the coefficient representation
    # P = [a_{0}, a_{1}, ..., a_{n-1}]
    # currently we assume n = 2^{k}
    n = len(P)
    half_n = n // 2
    if n == 1:
        return P  # constant function
    omega = get_omega(n)
    P_even, P_odd = P[::2], P[1::2]
    V_even, V_odd = naive_FFT(P_even), naive_FFT(P_odd)
    V = [0] * n  # the value representation
    for j in range(half_n):
        V[j] = V_even[j] + omega ** j * V_odd[j]
        V[j + half_n] = V_even[j] - omega ** j * V_odd[j]
    return V


Feeding $P_{1}$ from Example 1 as input, we would get (up to floating-point error) $[10,\ -2-2i,\ -2,\ -2+2i]$, i.e., $[P_{1}(\omega_{4}^{0}), P_{1}(\omega_{4}^{1}), P_{1}(\omega_{4}^{2}), P_{1}(\omega_{4}^{3})]$.
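
For cross-checking, we can evaluate $P_{1}$ directly at the powers of $\omega_{4} = i$ (a self-contained sketch using cmath instead of the get_omega helper above):

```python
import cmath

P1 = [1, 2, 3, 4]
n = 4
omega = cmath.exp(2j * cmath.pi / n)  # i, a primitive 4th root of unity

# direct evaluation of P1 at omega^0, ..., omega^3
values = [sum(a * omega ** (j * k) for k, a in enumerate(P1)) for j in range(n)]
# values is, up to floating-point error, [10, -2-2j, -2, -2+2j]
```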

Now we can apply the Fourier transform to a coefficient representation to get the corresponding value representation, and the multiplication in value-representation form is easy to implement; what remains is the inverse Fourier transform.

In the matrix-vector form for $P(x) = a_{0} + a_{1} x + \cdots + a_{n - 1} x^{n - 1}$, we have

$\begin{bmatrix} P(\omega_{n}^{0}) \\ P(\omega_{n}^{1}) \\ \vdots \\ P(\omega_{n}^{n - 1}) \end{bmatrix} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & \omega_{n} & \cdots & \omega_{n}^{n - 1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \omega_{n}^{n - 1} & \cdots & \omega_{n}^{(n - 1)^{2}} \end{bmatrix} \begin{bmatrix} a_{0} \\ a_{1} \\ \vdots \\ a_{n - 1} \end{bmatrix}.$

Note that the character table of $C_{n}$ has exactly the form of the DFT matrix.

To get the inverse Fourier transform, we can just use the inverse of the matrix:

$\begin{bmatrix} a_{0} \\ a_{1} \\ \vdots \\ a_{n - 1} \end{bmatrix} = \frac{1}{n} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & \omega_{n}^{-1} & \cdots & \omega_{n}^{-(n - 1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \omega_{n}^{-(n - 1)} & \cdots & \omega_{n}^{-(n - 1)^{2}} \end{bmatrix} \begin{bmatrix} P(\omega_{n}^{0}) \\ P(\omega_{n}^{1}) \\ \vdots \\ P(\omega_{n}^{n - 1}) \end{bmatrix}.$

Exercise 2: Check that the inverse DFT matrix is the inverse of the DFT matrix.

The difference between DFT and FFT is just the way these matrix-vector products are evaluated: DFT uses direct matrix-vector multiplication in $\mathcal{O}(n^{2})$, while FFT uses the tricky recursive way to achieve $\mathcal{O}(n \log n)$.

The inverse DFT matrix leads to the naive IFFT code:

def naive_IFFT(V, is_outest_layer=False):
    # V is the value representation
    # w means omega_{n}
    # V = [P(w^{0}), P(w^{1}), ..., P(w^{n-1})]
    # currently we assume n = 2^{k}
    n = len(V)
    half_n = n // 2
    if n == 1:
        return V  # constant function
    omega = 1.0 / get_omega(n)  # omega_{n}^{-1}
    V_even, V_odd = V[::2], V[1::2]
    P_even, P_odd = naive_IFFT(V_even), naive_IFFT(V_odd)
    P = [0] * n  # the coefficient representation
    for j in range(half_n):
        P[j] = P_even[j] + omega ** j * P_odd[j]
        P[j + half_n] = P_even[j] - omega ** j * P_odd[j]
    if is_outest_layer:
        for j in range(n):
            P[j] /= n
    return P


Applying naive_IFFT (with is_outest_layer=True) to the output of naive_FFT on $P_{1}$ from Example 1, we would get a list of complex numbers very close to $[1, 2, 3, 4]$.

If we ignore the little error, this is just the coefficient representation of $P_{1}$.

The following material might not be that clear or easy to understand, but I will try my best. (Some material cannot be expanded too much; otherwise it would cost too much space and might obscure the main part.)

The lowest layer of Diagram 1 is just the bit-reversal permutation of the indices, and there is a neat piece of code to generate it:

def get_BRI(length):
    # Bit-Reversal Index
    n = 1
    k = -1
    while n < length:
        n <<= 1
        k += 1
    BRI = [0] * n
    for i in range(n):
        BRI[i] = (BRI[i >> 1] >> 1) | ((i & 1) << k)
    return BRI


It is easier to see in a table (an example of an $8$-length BRI):

index    : 0   1   2   3   4   5   6   7
binary   : 000 001 010 011 100 101 110 111
reversed : 000 100 010 110 001 101 011 111
BRI      : 0   4   2   6   1   5   3   7

Using this, we can implement the Cooley–Tukey FFT algorithm, which is the most common FFT algorithm. Further, with a proper manner of coding, we can devise an in-place algorithm that overwrites its input with its output using only $\mathcal{O}(1)$ auxiliary storage, which is called the iterative radix-2 FFT algorithm. Moreover, since the forms of FFT and IFFT are actually very similar, we can integrate them together.

def FFT(X, length, is_inverse=False):
    # X : input, either coefficient representation
    #            or value representation
    # length : how many values need to be evaluated
    # is_inverse : indicates whether this is FFT or IFFT
    inverse_mul = [1, -1][is_inverse]
    BRI = get_BRI(length)
    n = len(BRI)
    X += [0] * (n - len(X))
    for index in range(n):
        if index < BRI[index]:  # only swap once
            X[index], X[BRI[index]] = X[BRI[index]], X[index]
    bits = 1
    while bits < n:
        omega_base = cos(pi / bits) + inverse_mul * sin(pi / bits) * 1j
        j = 0
        while j < n:
            omega = 1
            for k in range(bits):
                even_part = X[j + k]
                odd_part = X[j + k + bits] * omega
                X[j + k] = even_part + odd_part
                X[j + k + bits] = even_part - odd_part
                omega *= omega_base
            j += bits << 1
        bits <<= 1
    if is_inverse:
        for index in range(length):
            X[index] = X[index].real / n
            # only the real part is needed
    return X[:length]


Note that we could ignore the return part, since $X$ is already changed in place. This algorithm extends the input length to the closest power of two $2^{k}$ from above, but in most cases we would pad the length to $2^{k}$ (by appending $0$'s) before using this algorithm.
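
As a usage sketch, polynomial multiplication via this routine looks as follows (get_BRI and FFT are repeated from above so the snippet is self-contained; the helper poly_mul_fft and the final rounding to integers are my own additions):

```python
from math import pi, sin, cos

def get_BRI(length):
    # Bit-Reversal Index, as above
    n = 1
    k = -1
    while n < length:
        n <<= 1
        k += 1
    BRI = [0] * n
    for i in range(n):
        BRI[i] = (BRI[i >> 1] >> 1) | ((i & 1) << k)
    return BRI

def FFT(X, length, is_inverse=False):
    # iterative radix-2 FFT/IFFT, as above
    inverse_mul = [1, -1][is_inverse]
    BRI = get_BRI(length)
    n = len(BRI)
    X += [0] * (n - len(X))
    for index in range(n):
        if index < BRI[index]:
            X[index], X[BRI[index]] = X[BRI[index]], X[index]
    bits = 1
    while bits < n:
        omega_base = cos(pi / bits) + inverse_mul * sin(pi / bits) * 1j
        j = 0
        while j < n:
            omega = 1
            for k in range(bits):
                even_part = X[j + k]
                odd_part = X[j + k + bits] * omega
                X[j + k] = even_part + odd_part
                X[j + k + bits] = even_part - odd_part
                omega *= omega_base
            j += bits << 1
        bits <<= 1
    if is_inverse:
        for index in range(length):
            X[index] = X[index].real / n
    return X[:length]

def poly_mul_fft(P1, P2):
    # multiply two integer polynomials: FFT -> pointwise product -> IFFT
    res_len = len(P1) + len(P2) - 1
    n = 1
    while n < res_len:
        n <<= 1
    V1 = FFT(list(P1), n)
    V2 = FFT(list(P2), n)
    V = [a * b for a, b in zip(V1, V2)]
    C = FFT(V, n, is_inverse=True)
    return [round(c) for c in C[:res_len]]  # round away the float error

print(poly_mul_fft([1, 2, 3, 4], [2, 3, 4, 5]))
# -> [2, 7, 16, 30, 34, 31, 20]
```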

Because we use complex numbers to implement the FFT algorithm, the error is hard to eliminate. Even if the initial polynomial has integer coefficients, applying FFT and then IFFT gives back a list of decimals with some calculation error.

If we do the FFT in the field $\mathbb{Z} / P \mathbb{Z}$, where $P = Q \cdot 2^{k} + 1$ is prime, and denote by $g$ a primitive root modulo $P$, then $\{ 1, g^{Q}, g^{2 Q}, \ldots \}$ is a cyclic group of order $2^{k}$, and we can replace $\omega_{n}$ with the element $g^{(P - 1)/n}$ of order $n$ to do the FFT. This method is called the NTT; since only integers are involved, no errors can appear.
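
A minimal sketch of the NTT (assuming the common NTT-friendly prime $P = 998244353 = 119 \cdot 2^{23} + 1$ with primitive root $g = 3$; the function names are ad hoc):

```python
MOD = 998244353  # = 119 * 2^23 + 1, an NTT-friendly prime
G = 3            # a primitive root modulo MOD

def ntt(P, invert=False):
    # recursive radix-2 NTT; len(P) must be a power of two dividing 2^23
    n = len(P)
    if n == 1:
        return P
    root = pow(G, (MOD - 1) // n, MOD)  # an n-th root of unity mod MOD
    if invert:
        root = pow(root, MOD - 2, MOD)  # its inverse, via Fermat
    even = ntt(P[::2], invert)
    odd = ntt(P[1::2], invert)
    V = [0] * n
    w = 1
    for j in range(n // 2):
        t = w * odd[j] % MOD
        V[j] = (even[j] + t) % MOD
        V[j + n // 2] = (even[j] - t) % MOD
        w = w * root % MOD
    return V

def ntt_convolve(a, b):
    # exact integer convolution (results taken modulo MOD)
    n = 1
    while n < len(a) + len(b) - 1:
        n <<= 1
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    fc = ntt([x * y % MOD for x, y in zip(fa, fb)], invert=True)
    n_inv = pow(n, MOD - 2, MOD)
    return [x * n_inv % MOD for x in fc[:len(a) + len(b) - 1]]
```

On Example 1, ntt_convolve([1, 2, 3, 4], [2, 3, 4, 5]) recovers [2, 7, 16, 30, 34, 31, 20] exactly, with no floating-point error.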

For an arbitrary modulus $m$ and a target NTT length $n$, we can take a set of distinct NTT moduli $\{ p_{i} \}_{i = 1}^{r}$ satisfying

$p_{1} p_{2} \cdots p_{r} > n (m - 1)^{2},$

do the NTT with respect to each $p_{i}$, and then use the Chinese remainder theorem to combine the results into the final result modulo $m$.

Note that during the NTT algorithm, the maximum intermediate value would not exceed $n (m - 1)^{2}$.

We may say the FFT algorithm computes the convolution

$c_{k} = \sum_{i + j = k} a_{i} b_{j}$

in time $\mathcal{O}(n \log n)$.

Back in Example 1, we have $Q[k] = \sum_{i + j = k} P_{1}[i] P_{2}[j]$.

(Not mandatory) Problem 1: Given the formula for Stirling numbers of the second kind:

$S(n, k) = \frac{1}{k!} \sum_{j = 0}^{k} (-1)^{j} \binom{k}{j} (k - j)^{n} = \sum_{j = 0}^{k} \frac{(-1)^{j}}{j!} \cdot \frac{(k - j)^{n}}{(k - j)!},$

use the NTT with some modulus of the form $P = Q \cdot 2^{k} + 1$ to calculate all $S(n, k)$ for $0 \leq k \leq n$ in time complexity $\mathcal{O}(n \log n)$.

(Not mandatory) Problem 2: PE 537. Hint: If we denote by $A$ the list of values of $\pi(x)$, i.e., $A[n] = \sum_{x} [\pi(x) = n]$, then the convolution of $A$ with itself, call it $B$, is the list of values of $\pi(x) + \pi(y)$, i.e., $B[n] = \sum_{x, y} [\pi(x) + \pi(y) = n]$; the convolution of $B$ with itself, call it $C$, is the list of values of $\pi(x) + \pi(y) + \pi(z) + \pi(w)$, i.e., $C[n] = \sum_{x, y, z, w} [\pi(x) + \pi(y) + \pi(z) + \pi(w) = n]$; etc. You can consider $A, B, C$ as generating functions. You might need to learn how to sieve prime numbers and use fast exponentiation.

There are a bunch of similar algorithms, for example, the fast Walsh–Hadamard transform (FWHT) and the fast wavelet transform (FWT). FWHT can solve the general convolution

$c_{k} = \sum_{i \star j = k} a_{i} b_{j}$

in time complexity $\mathcal{O}(n \log n)$, where $\star$ is some binary operation, usually bitwise OR, bitwise AND, or bitwise XOR.

Using FFT, we can do a lot of things with polynomials fast. For instance, for a polynomial $P(x)$, we can find a polynomial $Q(x)$ such that $P(x) Q(x) \equiv 1 \mod{x^{n}}$; this $Q(x)$ is called the inverse of $P(x)$ modulo $x^{n}$. The basic idea is to find the inverse polynomial modulo $x^{n / 2}$, then $x^{n / 4}$, etc. Because the inverse polynomial modulo $x^{1}$ is trivial, we can solve this recursively. A similar idea may be applied to Newton's method modulo $x^{n}$; specifically, we can find the square root of a polynomial modulo $x^{n}$.
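
A sketch of this recursive/Newton idea for the polynomial inverse (using naive multiplication for clarity and assuming the constant term is $1$; a real implementation would do the products with FFT/NTT):

```python
def poly_mul(a, b, keep):
    # naive multiplication, truncated modulo x^keep
    c = [0] * min(keep, len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < keep:
                c[i + j] += ai * bj
    return c

def poly_inverse(P, n):
    # find Q with P * Q = 1 (mod x^n); assumes P[0] == 1
    Q = [1]  # inverse modulo x^1 is trivial
    k = 1
    while k < n:
        k *= 2
        PQ = poly_mul(P[:k], Q, k)
        # Newton step: Q <- Q * (2 - P * Q) (mod x^k)
        two_minus = [2 - PQ[0]] + [-c for c in PQ[1:]]
        Q = poly_mul(Q, two_minus, k)
    return Q[:n] + [0] * (n - len(Q))

# example: inverse of 1 + 2x + 3x^2 + 4x^3 modulo x^4
print(poly_inverse([1, 2, 3, 4], 4))
# -> [1, -2, 1, 0]
```

One can check directly that $(1 + 2x + 3x^{2} + 4x^{3})(1 - 2x + x^{2}) \equiv 1 \mod{x^{4}}$.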

(Not mandatory) Problem 3: PE 258. Hint: Consider the Cayley–Hamilton theorem ($x^{2000} - x - 1 = 0$) and use the polynomial inverse to compute the polynomial quotient and remainder for $x^{10^{18}}$. Or consider solving the homogeneous linear recurrence with constant coefficients via the Berlekamp–Massey algorithm, which involves polynomial multiplication. Note that $20092010 = 8590 \times 2339$.

FFT is also used to transform the time domain into the frequency domain in signal processing, while IFFT is used for the reverse transform.

In the article [Machine Learning from a Continuous Viewpoint I] by Weinan E, the Fourier representation

$f(\boldsymbol{x}; a) = \int_{\mathbb{R}^{d}} a(\boldsymbol{\omega}) \sigma(\boldsymbol{\omega}, \boldsymbol{x}) d \boldsymbol{\omega}$

can be considered as a two-layer neural network model with activation function $\sigma$.

In the above figure, $\sigma(\boldsymbol{\omega}, \boldsymbol{x})$ calculates each hidden layer value using the input layer $\boldsymbol{x}$ with weight $\boldsymbol{\omega}$, and $\int_{\mathbb{R}^{d}} a(\boldsymbol{\omega}) \sigma(\boldsymbol{\omega}, \boldsymbol{x}) d \boldsymbol{\omega}$ sums all the hidden layer values with weight $a(\boldsymbol{\omega})$. Note this is an integral formula: we work in a continuous setting, which means that the hidden layer has infinite width (the hidden layer is considered to have infinitely many nodes).


## Math 202C: Lecture 18

Written by a countably infinite number of monkeys with an uncountably infinite number of typewriters, edited by Andrew Dennehy

In a bold twist on the standard formula, in this lecture post we will be discussing math. Specifically, we will be discussing the Cohn-Uman algorithm and how representation theory can provide us an algorithm to multiply two matrices together. We will then see how this perspective could potentially help us prove an important theorem in complexity theory, namely that there exists an algorithm which computes the product of two $n\times n$ matrices in $O(n^{2+\epsilon})$ time for any $\epsilon>0$. Before we get to that, though, we will first cover an algorithm with many similarities to the one we will be discussing.

Consider the problem of multiplying two polynomials

$p(x)=p_0+p_1x+\cdots +p_{n-1}x^{n-1}$

$q(x)=q_0+q_1x+\cdots +q_{n-1}x^{n-1}$

The naïve approach would be the one that's taught in high school, namely finding all the $p_iq_j$ terms and using them to calculate the coefficients of $p(x)q(x)$. However, we're significantly cooler than high school students, so we're going to use some significantly cooler math. First, we introduce the concept of the discrete Fourier transform. Let $\omega_n=e^{i \frac{2\pi}{n}}$ be a primitive $n$-th root of unity. The DFT of a vector $v$ is given by

$\hat{v} := \begin{bmatrix}1 & 1 & 1 & \cdots & 1 \\ 1 & \omega_n & \omega_n^2 & \cdots & \omega_n^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \cdots & \omega_n^{(n-1)^2} \end{bmatrix} v$

Equivalently, if $v_i$ is the $i$-th component of $v$,

$\hat{v}_k := v_0+v_1 \omega_n^{k} + \cdots + v_{n-1}\omega_n^{(n-1)k}$

So, if we apply this transformation to the vector of coefficients of $p$, we get that

$\hat{p}_k= p_0+p_1 \omega_n^{k} + \cdots + p_{n-1}\omega_n^{(n-1)k} =p(\omega_n^k)$

So, the problem of finding the Fourier coefficients of $p$ is equivalent to the problem of evaluating $p$ at $1,\omega_n,\omega_n^2,...,\omega_n^{n-1}$. From now on, assume $n$ is a power of 2, since if it isn’t we can just add a bunch of 0 coefficients until we have a power of 2. Define the following two polynomials:

$p_0(x)=p_0+p_2x + \cdots + p_{n-2} x^{n/2-1}$

$p_1(x)=p_1+p_3x + \cdots + p_{n-1} x^{n/2-1}$

This lets us decompose $p$ as

$p(x)=p_0(x^2)+xp_1(x^2)$

So, instead of evaluating $p$ at all those powers of $\omega_n$, we can instead evaluate $p_0$ and $p_1$ at $1,\omega_n^2, ..., \omega_n^{2(n-1)}$ and then combine the results. So, we have reduced the problem of plugging $n$ numbers into a degree $n-1$ polynomial to plugging $n/2$ numbers (specifically the $n/2$-th roots of unity) into two degree $n/2-1$ polynomials. Because we've assumed that $n$ is a power of 2, so is $n/2$, meaning we can repeat the same algorithm for $p_0$ and $p_1$. We can apply this process repeatedly, which gives us a recursive algorithm with complexity $O(n \log n)$ for calculating the Fourier coefficients. Similarly, one can show that we can invert the DFT by applying the same process to the polynomial

$\hat p (x)=\hat{p}_0+\hat{p}_1 x + \cdots + \hat{p}_{n-1} x^{n-1}$

with $\omega_n$ replaced with $\omega_n^{-1}=\omega_n^{n-1}$ and then dividing the result by $n$. The upshot of all of this work is that

$\widehat{p q}_k = [pq](\omega_n^k) = p(\omega_n^k) q(\omega_n^k)=\hat p_k \hat q_k$

So, we can calculate the Fourier coefficients of $pq$ by calculating the coefficients of each polynomial independently and multiplying them together. Since this has linear complexity and both the Fourier and Inverse Fourier transforms have $n\log n$ complexity, this method gives us an $O(n\log n)$ complexity algorithm to multiply two degree $n$ polynomials.

Notice that instead of using the roots of unity, we could have instead used a generator for $C_n$ and worked over $\mathbb C [C_n]$ with identical results. So, we can frame multiplication of polynomials as a special case of multiplying elements in $\mathbb C [C_n]$.

Next, we take a slight detour to discuss some group theoretic concepts. For any subset $S$ of a group $G$ (not required to be a subgroup), let the right quotient set $Q(S)$ of $S$ be defined to be

$Q(S)=\left\{s_1s_2^{-1}:s_1,s_2\in S\right\}$

Exercise 1: Assume that $S\neq \emptyset$ and $Q(S)=S$. What does this imply about $S$? Show that the condition that this implies is actually equivalent to $Q(S)=S$, so the implication goes both ways.

Definition 1: A group $G$ realizes $\langle n_1,n_2,n_3 \rangle$ if there are subsets $S_1, S_2, S_3\subseteq G$ such that $|S_i|=n_i$ and for $q_i\in Q(S_i)$, if

$q_1q_2q_3=1$

then $q_1=q_2=q_3=1$. This condition is called the triple product property. Furthermore, this is equivalent to the condition that if $q_1q_2=q_3$, then $q_1=q_2=q_3=1$ .

Although it isn’t entirely trivial, one can show that if $G$ realizes $\langle n_1,n_2,n_3 \rangle$, then it realizes any permutation of those numbers.

Exercise 2: Prove that $C_n\times C_m\times C_p$ realizes $\langle n,m,p \rangle$.

This is the “trivial realization” of $\langle n,m,p \rangle$. We now get our first taste of why these ideas could be useful for the problem of matrix multiplication:

Theorem 1: Let $F$ be any field. If $G$ realizes $\langle n,m,p \rangle$, then the number of field operations required to multiply an $n\times m$ matrix by an $m\times p$ matrix over $F$ is at most the number of operations required to multiply two elements of $F[G]$.

Proof: Let $G$ realize $\langle n,m,p \rangle$ through the subsets $S,T,U$. Then, for matrices $A$ and $B$ of sizes $n\times m$ and $m\times p$ respectively, we can index the rows of $A$ by $S$, the columns of $A$ and the rows of $B$ by $T$, and the columns of $B$ by $U$. Then, consider the product

$\left(\sum\limits_{s\in S,t\in T} A_{st}s ^{-1} t\right) \left(\sum \limits _{t'\in T, u\in U} B_{t'u} t'^{-1}u\right)$

By the triple product property, we have that

$(s^{-1} t)(t'^{-1} u)=s'^{-1} u'\iff (s' s^{-1}) (tt'^{-1} )(uu'^{-1})=1$

which is true iff $s=s'$, $t=t'$, and $u=u'$. So, the coefficient of $s^{-1} u$ in the product is

$\sum\limits_{t\in T}A_{st}B_{tu}=(AB)_{su}$

So, you can just read off the elements of $AB$ from these coefficients. $\square$

The process we just described, of converting matrix multiplication to multiplication in $F[G]$, is called the Cohn Uman algorithm. Also, we can now see the connection to the polynomial algorithm we described earlier; we’ve converted a problem of multiplying matrices to a problem of multiplying elements of $\mathbb C[G]$, which is what we did with polynomials before using $\mathbb C[C_n]$.
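
As an illustration, here is a toy sketch of this reduction in code (all names are ad hoc; the group is the trivial realization $C_2\times C_3\times C_2$ of $\langle 2,3,2 \rangle$ from Exercise 2, written additively):

```python
from collections import defaultdict
from itertools import product

# G = Z_2 x Z_3 x Z_2, written additively; realizes <2, 3, 2>
# via S = {(s,0,0)}, T = {(0,t,0)}, U = {(0,0,u)}
mods = (2, 3, 2)

def g_add(g, h):
    return tuple((a + b) % m for a, b, m in zip(g, h, mods))

def g_neg(g):
    return tuple((-a) % m for a, m in zip(g, mods))

def convolve(x, y):
    # multiplication in the group algebra C[G]
    z = defaultdict(int)
    for g, cg in x.items():
        for h, ch in y.items():
            z[g_add(g, h)] += cg * ch
    return z

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3 matrix
B = [[7, 8], [9, 10], [11, 12]]   # 3 x 2 matrix

# a = sum A[s][t] s^{-1}t,  b = sum B[t'][u] t'^{-1}u
a = defaultdict(int)
for s, t in product(range(2), range(3)):
    a[g_add(g_neg((s, 0, 0)), (0, t, 0))] += A[s][t]
b = defaultdict(int)
for t, u in product(range(3), range(2)):
    b[g_add(g_neg((0, t, 0)), (0, 0, u))] += B[t][u]

c = convolve(a, b)
# read off (AB)[s][u] as the coefficient of s^{-1}u
AB = [[c[g_add(g_neg((s, 0, 0)), (0, 0, u))] for u in range(2)]
      for s in range(2)]
```

By the triple product property, the coefficient of $s^{-1}u$ collects exactly the terms $\sum_t A_{st}B_{tu}$, so AB equals the usual matrix product of A and B.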

We can now define a measurement of the “quality” of embedding that a given group $G$ can provide.

Definition 2: The pseudo-exponent $\alpha(G)$ of a non-trivial finite group $G$ is the minimum of

$\frac{3\log |G|}{\log nmp}$

over all $n,m,p$ (not all 1) such that $G$ realizes $\langle n,m,p \rangle$.

Lemma 1: For any group $G$, $2< \alpha(G)\leq 3$. Further, if $G$ is abelian, then $\alpha(G)=3$.

Proof: $\alpha(G)$ is clearly at most 3, since we can just take $S_1=S_2=\{1\}$ and $S_3=G$, meaning $G$ realizes $\langle 1,1,|G|\rangle$.

To show our lower bound, suppose $G$ realizes $\langle n_1,n_2,n_3\rangle$ (not all equal to 1) through the subsets $S_1,S_2,S_3$. Then, the triple product tells us that the map $(x,y)\mapsto x^{-1}y$ is injective on $S_1\times S_2$ and the image only intersects $Q(S_3)$ in the identity. So, $|G|\geq n_1n_2$ with equality only occurring if $n_3=1$. The same argument shows this is true for $n_2n_3$ and $n_3n_1$, so $|G|^3>(n_1n_2n_3)^2$. Therefore,

$\frac{3 \log |G|}{\log nmp}> \frac{3 \log (nmp)^{2/3}}{\log nmp}=\frac{3 \cdot 2}{3}=2$

So, $\alpha(G)>2$. Finally, if $G$ is abelian, then the product map from $S_1\times S_2 \times S_3$ to $G$ must be injective (because the triple product property implies the kernel of the map is trivial), so $|G|\geq n_1n_2n_3$, meaning $\alpha(G)\geq 3$. $\square$

Next, we want to relate $\alpha(G)$ to $\omega$, the exponent of complexity for matrix multiplication, i.e. the smallest number such that we can perform matrix multiplication in $O(n^{\omega +\epsilon})$ time for any $\epsilon>0$. To do this, we first recall from 202B that

$\mathbb{C}[G]\cong \mathcal{M}_{d_1\times d_1} \oplus \cdots \oplus \mathcal{M}_{d_k\times d_k}$

where the $d_i$‘s are the dimensions of the irreducible representations of $G$. Then the most important theorem we will discuss is as follows:

Theorem 2: Suppose that $G$ is a group with character degrees $d_1,...,d_k$. Then,

$|G|^{\omega/\alpha(G)}\leq \sum\limits_{i=1}^k d_i^{\omega}$

Intuitively, the idea of this theorem is that the problem of multiplying matrices of size $|G|^{1/\alpha(G)}$ reduces to multiplication of two elements in $\mathbb{C}[G]$ together, which itself reduces to a problem of multiplying a collection of matrices of size $d_i\times d_i$, each of which should take approximately $d_i^\omega$ operations. So, multiplying two matrices of size $|G|^{1/\alpha(G)}$ has complexity of order at most $\sum_{i=1}^k d_i^{\omega}$, which means that this sum is an upper bound for $|G|^{\omega/\alpha(G)}$. Actually proving the theorem is more difficult, so we won’t show it here, but this intuition is the most important part.

Importantly, notice that if $\alpha(G)$ were equal to 2, then this would imply that $\omega =2$ (because $\sum_i d_i^2 =|G|$). However, $\alpha(G)>2$, so we need to control the character degrees of $G$ to get non-trivial bounds on $\omega$. Define $\gamma(G)$ to be the number such that $|G|^{1/\gamma}$ is the maximum character degree of $G$. An important corollary of Theorem 2 is:

Corollary 1: Let $G$ be a finite group. If $\alpha(G)<\gamma(G)$, then

$\omega\leq \alpha(G)\left(\frac{\gamma(G)-2}{\gamma(G)-\alpha(G)} \right)$

Proof: Let $\{d_i\}$ be the character degrees of $G$. Then, by Theorem 2,

$|G|^{\omega/\alpha(G)}\leq \sum_i d_i^{\omega-2}d_i^2\leq |G|^{(\omega-2)/\gamma(G)}\sum_i d_i^2\leq |G|^{1+(\omega-2)/\gamma(G)}$

This implies $\omega\left(\frac{1}{\alpha(G)}-\frac{1}{\gamma(G)}\right)\leq 1-\frac{2}{\gamma(G)}$. Dividing by $\frac{1}{\alpha(G)}-\frac{1}{\gamma(G)}$ (which is positive by assumption) gives us our result. $\square$

A special case of this corollary is given by

Corollary 2: If there exists a sequence $G_1,G_2,...$ of finite groups such that $\alpha(G_i)=2+o(1)$ as $i\rightarrow \infty$ and $\alpha(G_i)-2=o(\gamma(G_i)-2)$, then the exponent of matrix multiplication is 2.

While these are weaker versions of our original theorem, they only require knowledge of $\gamma(G)$, rather than all of the character degrees. Finally, we end with an open problem in representation theory whose positive answer could lead to a solution to our original problem:

Open Problem: For integers $n,m,p$ not all 1, does there exist a finite group that realizes $\langle n,m,p\rangle$ and has character degrees $d_1,...,d_k$ such that

$nmp>\sum_{i}d_i^3$

While this is not a necessary condition, a positive answer to this could lead to a proof that $\omega=2$ when combined with our theorem. This is because one limitation of our theorem is that unless we can rule out the case that $|G|^{3/\alpha}\leq \sum_{i}d_i^3$ in Theorem 2, that theorem is satisfied trivially. So, for us to be able to show that a group gives us a non-trivial improvement, we have to show that for any $n,m,p\in \mathbb N$, there exists a group $G$ realizing $\langle n,m,p \rangle$ such that $|G|^{3/\alpha}> \sum\limits_{i}d_i^3$, which the open problem implies.

While it is an open problem whether these groups exist in general, there are a few concrete examples where such a group exists, such as the one that follows:

Let $H=C_n\times C_n \times C_n$ and let $G=H^2 \rtimes C_2$, where the group action of $C_2$ on $H^2$ is given by switching the two factors. If we let $z$ be the generator of $C_2$, we can write the elements of $G$ as $(a,b)z^i$ with $a,b\in H$ and $i\in \{0,1\}$. Let $H_1,H_2,H_3$ be the three copies of $C_n$ in $H$, viewed as subgroups, and let $H_4:=H_1$ for notation. Then, we can define $S_1,S_2,S_3$ as

$S_i=\{(a,b)z^j:a\in H_i\setminus \{1\},b\in H_{i+1},j\in \{0,1\}\}$

One can show, although the proof is long, that these sets satisfy the triple product property. Furthermore, the character degrees of $G$ are all at most 2, since $H^2$ is an abelian subgroup of index 2 in $G$ and the maximum character degree of a group is at most the index of any abelian subgroup in the group. Since the sum of the squares of the character degrees is $|G|$, the sum of the cubes of the degrees is at most $2|G|=4n^6$. However, $|S_i|=2n(n-1)$, meaning that for $n\geq 5$,

$|S_1| |S_2| |S_3| =8n^3(n-1)^3 > 4n^6= 2|G| \geq \sum\limits_{i}d_i^3$

## Lecture 16 – PageRank

Author: Jiyoung Choi

PageRank is a search algorithm based on the linkage structure of the web, introduced by Larry Page and Sergey Brin. They built the search engine Google to test the utility of PageRank for search. (Origin of the name "Google")

The goal of this lecture is to construct the Google matrix and explore how to find the PageRank vector. Let's start by making an adjacency matrix based on web linkage.

Let $u$ be a webpage, and let $F_u$ be the set of pages $u$ points to and $B_u$ the set of pages that point to $u$. Consider each webpage as a vertex and each link as a directed edge. We construct an adjacency matrix by setting $A_{uv} = 1$ if there exists a link from $v$ to $u$, and $A_{uv}=0$ otherwise, where $u, v$ are webpages. For example, if the webpages are linked as in Figure 1, $A$ is defined as follows.

$A= \begin{bmatrix} 0 & 1 & 1 & 1 & 0\\ 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0\\ 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\end{bmatrix}$

We can observe that the last column of A is a zero vector because $v_5$ does not have forward links. We call such a node a dangling node.

Let $N_u = |F_u|$ be the number of links from $u$. Then the column-normalized adjacency matrix $R$ of $A$ is defined by $R(u) = {A(u) \over N_u}$, where $R(u)$ and $A(u)$ denote the columns corresponding to page $u$. In our example,

$R= \begin{bmatrix} 0 & 1 & \frac{1}{2} & \frac{1}{3} & 0\\ \frac{1}{2} & 0 & 0 & \frac{1}{3} & 0\\ 0 & 0 & 0 & \frac{1}{3} & 0\\ \frac{1}{2} & 0 & 0 & 0 & 0\\ 0 & 0 & \frac{1}{2} & 0 & 0\end{bmatrix}.$

This matrix $R$ is not directly used in PageRank. To compute the PageRank vector, we will use the power method, an iterative procedure which has convergence problems: the procedure can cycle, or the limit may depend on the starting vector. To avoid these problems, we need to have a left stochastic matrix (every column sums to $1$).

As in our example, $R$ may fail to be stochastic because of dangling nodes. Therefore, define the stochastic matrix $R' = R + {\mathbf{e}\mathbf{a}^T \over n}$, where $\mathbf{e}$ is the vector of all ones, $n$ is the number of web pages, and $\mathbf{a}(u)=1$ if $R(u) =\mathbf{0}$ and $\mathbf{a}(u)=0$ otherwise.

When surfing the web, even if the current page has no forward links (a dangling node), we may still jump to a new page. The matrix ${\mathbf{e}\mathbf{a}^T \over n}$ models this situation by assigning uniform probability over all web pages to each dangling node.
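The construction of $R$ and $R'$ can be sketched in a few lines of NumPy (a minimal sketch of the construction above; the variable names are mine):

```python
import numpy as np

# A[u, v] = 1 if page v links to page u (the example matrix above).
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=float)

n = A.shape[0]
N = A.sum(axis=0)                    # N_u: number of forward links of page u

# Column-normalize; dangling columns (N_u = 0) stay zero for now.
R = A / np.where(N > 0, N, 1.0)

# R' = R + e a^T / n: replace each dangling column by the uniform distribution.
a = (N == 0).astype(float)
R_prime = R + np.outer(np.ones(n), a) / n

assert np.allclose(R_prime.sum(axis=0), 1.0)   # R' is column-stochastic
```

Note that only the dangling columns are modified: every column of $R'$ now sums to 1, so $R'$ is left stochastic.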

Problem 1: Find $R'$ for our example.

Then by the Gershgorin circle theorem, every eigenvalue of $R'$ has modulus at most 1. Since the columns of $R'$ sum to 1, we have $\mathbf{e}^TR' = \mathbf{e}^T$, so $\det(R'-I)=0$ and the largest eigenvalue is $\lambda=1$. We want to find the dominant eigenvector of $R'$, which is the PageRank vector. To find this eigenvector, PageRank uses the power method.

As some of you may already have realized, finding the PageRank vector amounts to finding a stationary distribution (see Lecture 5, Lecture 6) of the random walk on the web graph. However, if the stationary distribution is not unique, we cannot get a well-defined PageRank vector. This problem is solved by the Perron–Frobenius theorem: if $R'$ is irreducible, the eigenvector corresponding to the eigenvalue 1 is unique.

Definition 1: A matrix $A$ is primitive if it is non-negative and its $k$th power is positive for some natural number $k$.

Definition 2: An $n \times n$ matrix $A$ is a reducible matrix if there exists a permutation matrix $P$ such that $P^TAP$ is block upper triangular. If a square matrix is not reducible, it is said to be an irreducible matrix.

For more background on primitive and irreducible matrices, see [6].

Problem 2: Show that every primitive matrix is irreducible.

We can construct a primitive matrix $G = mR' + (1-m)E$, where $0 \le m < 1$ and $E = {\mathbf{e}\mathbf{e}^T \over n}$. In practice, Google uses $m=0.85$ ([1], [7]). Since $G$ is primitive, hence irreducible, we can obtain the unique PageRank vector by the power method. We call this matrix $G$ the Google matrix. The constant $m$ is called the damping constant; it reflects the fact that a person who is randomly clicking on links will eventually stop clicking. In our example, $G$ is

$G= \begin{bmatrix}\frac{3}{100} & \frac{22}{25} & \frac{91}{200} & \frac{47}{150} & \frac{1}{5}\\ \frac{91}{200} & \frac{3}{100} & \frac{3}{100} & \frac{47}{150} & \frac{1}{5}\\ \frac{3}{100} & \frac{3}{100} & \frac{3}{100} & \frac{47}{150} & \frac{1}{5}\\ \frac{91}{200} & \frac{3}{100} & \frac{3}{100} & \frac{3}{100} & \frac{1}{5}\\ \frac{3}{100} & \frac{3}{100} & \frac{91}{200} & \frac{3}{100} & \frac{1}{5}\end{bmatrix}.$

when $m=0.85$.

Here, the unique largest eigenvalue $\lambda$ of $G$ is called the Perron root (for the Google matrix, $\lambda = 1$), and $\mathbf{x}$ is the Perron vector if $G\mathbf{x}=\lambda \mathbf{x}$ and $||\mathbf{x}||_1 = 1$. The fact that $G$ has a unique dominant eigenvalue implies that the PageRank vector is unique. (If you want to check convergence, see [4], p. 179.)

Finally, the power method on $G$ ([2]) recursively computes $\mathbf{x}_k = G\mathbf{x}_{k-1}$ and finds the limit of $\mathbf{x}_k$, say $\mathbf{x}$. This $\mathbf{x}$ is the PageRank vector. That is, the PageRank of page $u$ is $\mathbf{x}(u)$, where $\mathbf{x}$ is the Perron vector with $||\mathbf{x}||_1=1$. Note that $\mathbf{x}$ is a probability vector reflecting the probability of visiting each webpage.
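The whole pipeline above can be sketched as follows (a minimal implementation of the iteration $\mathbf{x}_k = G\mathbf{x}_{k-1}$ on the example; the function name and tolerance are my choices):

```python
import numpy as np

def pagerank(R_prime, m=0.85, tol=1e-12, max_iter=1000):
    """Power method on the Google matrix G = m R' + (1-m) ee^T / n."""
    n = R_prime.shape[0]
    G = m * R_prime + (1 - m) * np.ones((n, n)) / n
    x = np.full(n, 1.0 / n)          # start from the uniform probability vector
    for _ in range(max_iter):
        x_new = G @ x                # G is column-stochastic, so ||G x||_1 = 1
        if np.linalg.norm(x_new - x, 1) < tol:
            return x_new
        x = x_new
    return x

# R' for the 5-page example (dangling column already patched to uniform).
R_prime = np.array([
    [0,   1, 1/2, 1/3, 1/5],
    [1/2, 0, 0,   1/3, 1/5],
    [0,   0, 0,   1/3, 1/5],
    [1/2, 0, 0,   0,   1/5],
    [0,   0, 1/2, 0,   1/5],
])
x = pagerank(R_prime)
assert np.allclose(x.sum(), 1.0)     # a probability vector, fixed by G
```

Since the second-largest eigenvalue of $G$ has modulus at most $m$ ([7]), the iterates converge at a geometric rate regardless of the starting probability vector.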

References

[1] Amy Langville and Carl Meyer (2003). Fiddling with PageRank.

[2] Amy Langville, Carl Meyer (2004). The Use of the Linear Algebra by Web Search Engines. IMAGE Newsletter. 33.

[3] Amy Langville and Carl Meyer (2004). Deeper Inside PageRank. Internet Mathematics. 1. 10.1080/15427951.2004.10129091.

[4] Amy Langville and Carl Meyer (2006). Google’s PageRank and Beyond: The Science of Search Engine Rankings

[5] Larry Page, Sergey Brin, R. Motwani, and T. Winograd (1998). The PageRank Citation Ranking: Bringing Order to the Web.

[6] Roger Horn and Charles Johnson (1985). Matrix Analysis.

[7] Taher Haveliwala and Sepandar Kamvar (2003). The Second Eigenvalue of the Google Matrix.

## Lecture 15 – Spectral Theory of Hypergraphs

Definition 1. A hypergraph $\Gamma$ is a pair $\Gamma = (V,E)$ where $V$ is a finite set and $E\subset 2^V$ is a nonempty collection of subsets of $V$. $\Gamma$ is called $k$-uniform if
$E\subset{V\choose k}:=\{X\subset V:|X| = k\}$. $\Gamma$ is called a graph if it is 2-uniform.

Our goal for this lecture is to explore the rudiments of the spectral theory of $k$-uniform hypergraphs. The natural starting point, if only for inspiration, is the case of graphs. Letting $\Gamma = (V,E)$ be a graph, and identifying $V = \{1,2,\dotsc,n\}$, we can begin by writing down the adjacency matrix $A = (A_{ij})$ of $\Gamma$ defined by

$A_{ij}=\begin{cases} 1&\text{ if }i\sim j\\ 0&\text{ otherwise} \end{cases}, \hspace{1cm}1\leq i,j\leq n.$

where $i\sim j\iff \{i,j\}\in E$. In this case $i,j$ are said to be adjacent.

The spectral theory of $A$ is rich. For example, it made an appearance in our 202C class for the special case where $\Gamma$ is the Cayley graph of a group. Our main focus however will be the matrix $L = (L_{ij})$, defined by

$L_{ij}=\begin{cases} d_{i}&\text{ if }i=j\\ -1&\text{ if }i\sim j\\ 0&\text{ otherwise} \end{cases}.$

where the degree $d_i$ is the number of edges containing $i\in V$. If $D$ is the diagonal matrix of degrees, then we may also write $L = D-A$. This matrix $L$ is called the Laplacian or Kirchhoff matrix of $\Gamma$. The terminology arises from the fact that as an operator on $\mathbb{C}^V$, $L$ displays various properties canonically associated with the Laplacian on $\mathbb{R}^n$, e.g. the mean value property, the maximum principle, etc. A very interesting paper of Wardetzky, Mathur, Kälberer, and Grinspun [6] explores these connections in detail, but they are outside the scope of this lecture.

A few facts are worth collecting here. First, $L$ is a symmetric, positive semidefinite matrix in $M_n$ and hence admits nonnegative eigenvalues $0=\lambda_0\leq\lambda_1\leq\cdots\leq\lambda_{n-1}$. $L$ always has zero as an eigenvalue, witnessed for example by the vector $\mathbf{1} = (1,\dotsc,1)^T$, but the multiplicity of this eigenvalue can be higher, depending on the number of connected components (Exercise 1). When $\Gamma$ is connected (i.e. contains a single connected component), it follows that $\lambda_1>0$. Much of the literature is concerned with the interplay between geometric and combinatorial properties of $\Gamma$ (things like its size, connectedness, density, or sparseness) and the first eigenvalue $\lambda_1$ (either of $L$ or of one of its many variants). For example…

Exercise 1.
(a) Show that the geometric multiplicity of the null eigenvalue of $L$ is the number of connected components of $\Gamma$.
(b) Show $\lambda_1\leq n,$ with equality if and only if $\Gamma$ is the complete graph. Hint: if $\Gamma$ is not complete, relate its Laplacian and that of its “complement” (suitably defined) to the Laplacian of the complete graph, and use eigenvalue interlacing.

For a more detailed introduction to spectral graph theory, see Fan Chung’s book[1, Ch. 1].
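The statement in Exercise 1(a) is easy to probe numerically: build the Laplacian of a small disconnected graph and count the zero eigenvalues. A minimal sketch (the helper `laplacian` and the five-vertex example are mine):

```python
import numpy as np

def laplacian(n, edges):
    """Graph Laplacian L = D - A from an edge list on vertices 0..n-1."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    return L

# Two components: a triangle {0,1,2} and a single edge {3,4}.
L = laplacian(5, [(0, 1), (1, 2), (0, 2), (3, 4)])
eigs = np.linalg.eigvalsh(L)           # L is symmetric PSD, so eigs >= 0
num_zero = int(np.sum(np.abs(eigs) < 1e-9))
assert num_zero == 2   # multiplicity of 0 = number of connected components
```

Of course this only illustrates the claim on one example; the exercise asks for a proof.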

Our next goal is to try and take these notions and structures and construct extensions, or analogues, for the case where $\Gamma$ is a $k$-uniform hypergraph. As we shall see, this is not so easy. Consider, if you desire, the following perspective from simplicial homology (those unfamiliar or otherwise uninterested may skip to Definition 2).

One approach would be to try and construct $L$ not as a specific matrix, but as a descendant of an invariant of the topology of the underlying graph; which we could then hope admits a generalization to the case where the graph is instead a hypergraph. Specifically, the idea is to look at the homology of $\Gamma$ and build a Laplacian from its boundary operator- as follows.

Let $\Gamma$ be a graph and enumerate $E = \{e_1,\dotsc,e_m\}$. To each unordered pair $e_j=\{v,w\}\in E$ pick an orientation $(v,w)$ or $(w,v)$ and let $\widetilde{E}$ be the set of all such oriented edges. Let $S = (S_{v,e}) \in M_{n\times m}$ be the matrix whose rows are indexed by $v\in V$ and whose columns are indexed by the ordered edges $e\in\widetilde{E}$, defined by

$S_{v,e} = \begin{cases} 1&\text{if }e = (v,\cdot)\\ -1&\text{ if }e=(\cdot,v)\\ 0&\text{ otherwise} \end{cases}.$

It is left as an (unassigned) exercise to verify that $L=SS^T$; and in fact that the choice of orientations/signs in the preceding setup is completely arbitrary. Thinking of $S$ as a homomorphism mapping elements of the module $\mathbb{Z} E$ to the module $\mathbb{Z} V$ by matrix multiplication, we recognize $S$ as the boundary operator that is related to the homology groups of $\Gamma$ as a simplicial complex (see [3, Section 2.1, page 105]) and thus we have constructed $L$ as an invariant of the topology of $\Gamma$ (up to choices in the orientations of the edges).

But as it turns out, the capricious choice of orientation we made earlier to define $S$ is what makes graphs special. As soon as we allow 3-edges, or 1-edges for that matter, it is not at all obvious how to “orient” them, or how to distinguish between orientations (compare to the natural symbolic relationship $(v,w)=-(w,v)$). This, in fact, is a critically important distinction between graphs and hypergraphs. The answer is that we need more than linear operators, and so we look to tensors. The exposition below roughly follows two papers of Qi [4, 5].

Definition 2. A tensor $\mathcal{T}$ of order $m$ and dimension $n$ over a field $\mathbb{F}$ is a multidimensional array

$\mathcal{T} = (t_{i_1i_2\cdots i_m}),\hspace{1cm}1\leq i_1,\dotsc,i_m\leq n.$

where each entry $t_{i_1i_2\cdots i_m}\in\mathbb{F}$. We call $\mathcal{T}$ an $(m,n)$-tensor over $\mathbb{F}$.

We can think of an $(m,n)$-tensor as a hypercubic array of size $n\times n\times\cdots\times n$ ($m$ times). All of the tensors presented here are understood to be over $\mathbb{C}$. The case $(2,n)$ recovers the usual $n\times n$ square matrices. If $x\in\mathbb{C}^n$ then we define the product $\mathcal{T}x^{m-1}\in\mathbb{C}^n$ by

$(\mathcal{T}x^{m-1})_i = \sum_{i_2,\dotsc,i_m=1}^n t_{ii_2\cdots i_m}x_{i_2}\cdots x_{i_m},\hspace{1cm}1\leq i\leq n.$

(The expression $\mathcal{T}x^{m-1}\in\mathbb{C}^n$ is conventional notation.) Also as a matter of notation, for $x\in \mathbb{C}^n$ and $k\geq 0$, let $x^{[k]}\in\mathbb{C}^n$ be given by $(x^{[k]})_i = x_i^k$. There are lots of analogous notions and properties for tensors as compared to usual matrices; for example, we have the following formulation of the spectrum of a tensor.
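In code, the product $\mathcal{T}x^{m-1}$ is just a repeated contraction of the last index with $x$; a minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def tensor_apply(T, x):
    """Compute T x^{m-1} for an (m, n)-tensor T: contract i_2, ..., i_m with x."""
    out = T
    for _ in range(T.ndim - 1):
        out = out @ x               # np.matmul contracts the last axis with x
    return out

# Sanity checks: for m = 2 this is the usual matrix-vector product,
# and for m = 3 it matches the defining double sum.
A = np.arange(4.0).reshape(2, 2)
x = np.array([1.0, 2.0])
assert np.allclose(tensor_apply(A, x), A @ x)

T = np.arange(8.0).reshape(2, 2, 2)
direct = [sum(T[i, j, k] * x[j] * x[k] for j in range(2) for k in range(2))
          for i in range(2)]
assert np.allclose(tensor_apply(T, x), direct)
```

Each contraction removes one axis, so after $m-1$ of them an $(m,n)$-tensor yields a vector in $\mathbb{C}^n$, as in the displayed formula.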

Definition 3. Let $\cal{T}$ be an $(m,n)$-tensor. Then we say $\lambda\in\mathbb{C}$ is an eigenvalue of $\cal{T}$ if there exists a nonzero vector $x\in\mathbb{C}^n$, called an eigenvector of $\mathcal{T}$, which satisfies $\mathcal{T} x^{m-1} = \lambda x^{[m-1]}.$

Notice that for $k\geq 3$, no $(k,n)$-tensor $\mathcal{T}$ defines a true linear operator on $\mathbb{C}^n$, and hence we may not a priori speak of eigenspaces, kernels, etc. and their linear-algebraic structure. (These notions do have meaning, but they require the tools of algebraic geometry to properly formulate, and are outside of our scope.) Nevertheless, we can construct tensors associated to $k$-uniform hypergraphs that give rise to various eigenvalues, and ask: can we find relationships between the geometry of a hypergraph and its eigenvalues?

Definition 4. Let $\Gamma=(V,E)$ be a $k$-uniform hypergraph on $n$ vertices. The adjacency tensor of $\Gamma$ is the $(k,n)$-tensor $\mathcal{A} = (a_{i_1\cdots i_k})$ defined as follows. For each edge $e = \{i_1,\dotsc,i_k\}\in E$,

$a_{i_1i_2\cdots i_k} = \frac{1}{(k-1)!},$

and likewise for any rearrangement $(i_1',\dotsc,i_k')$ of the vertices in $e$. The values of $\mathcal{A}$ are defined to be zero otherwise. The degree tensor of $\Gamma$ is the $(k,n)$-tensor $\mathcal{D}=(d_{i_1\cdots i_k})$ defined by

$d_{i\cdots i} = d_i,$

where $d_i$ is the number of edges containing $i$, and the values of $\mathcal{D}$ are defined to be zero otherwise. The Laplacian tensor is the $(k,n)$ tensor $\mathcal{L}= (l_{i_1\cdots i_k})$ defined by $\mathcal{L} = \mathcal{D}-\mathcal{A}$.

A brief remark: our choice to focus on $k$-uniform hypergraphs is merely to simplify the exposition. There are well-developed versions of Definition 4 for general hypergraphs [2], but they are harder to write down and explore in this setting. With these fundamental notions established, the goal for the remainder will be to write down and explore a few basic propositions regarding the eigenvalues and eigenvectors of the tensors of a hypergraph $\Gamma$.

Proposition 1. Let $\Gamma$ be a $k$-uniform hypergraph, with $k\geq 3$, on $n$ vertices and $\mathcal{L}$ its Laplacian tensor.

• Let $\mathbf{e}_j\in\mathbb{C}^n$ denote the $j$-th standard basis vector. Then $\mathbf{e}_j$ is an eigenvector of $\mathcal{L}$ with eigenvalue $d_j$.
• The vector $\mathbf{1}$ occurs as an eigenvector of $\mathcal{L}$ with eigenvalue 0.

Proof. This is an exercise in calculations with tensors. For the first claim let $1\leq j\leq n$ be fixed, and notice that for any $m\geq 1$, $\mathbf{e}_j^{[m]} = \mathbf{e}_j$ and hence for each $1\leq i\leq n$, it holds

$(\mathcal{L}\mathbf{e}_j^{k-1})_i = \sum_{i_2,\dotsc,i_k=1}^n l_{ii_2\cdots i_k}(\mathbf{e}_j)_{i_2}\cdots(\mathbf{e}_j)_{i_k} = l_{ij\cdots j} = d_{ij\cdots j} - a_{ij\cdots j}.$

Notice that $a_{i_1\cdots i_k} = 0$ if any of the $i_k$‘s are repeated (so $a_{ij\cdots j}=0$, since $k\geq 3$), and that $d_{ij\cdots j} = d_j$ if $i=j$ and 0 otherwise. It follows that $\mathcal{L}\mathbf{e}_j^{k-1} = d_j\mathbf{e}_j^{[k-1]}$. For the second claim, again we have $\mathbf{1}^{[m]} = \mathbf{1}$ for any $m$, and hence

$(\mathcal{L}\mathbf{1}^{k-1})_i = \sum_{i_2,\dotsc,i_k=1}^n l_{ii_2\cdots i_k} = d_i - \sum_{i_2,\dotsc,i_k=1}^n a_{ii_2\cdots i_k} = d_i - \sum_{\substack{e\in E \\ i\in e}}\ \sum_{\substack{(i_2,\dotsc,i_k)\\ e = \{i,i_2,\dotsc,i_k\}}} \frac{1}{(k-1)!} = d_i - d_i = 0,$

where, with some abuse of notation in the second line, we emphasize that the sum contains all $(k-1)!$ permutations of the $k-1$ terminal points in each edge $e=\{i,i_2,\dotsc,i_k\}\in E$, each contributing $\frac{1}{(k-1)!}$. -QED.

Exercise 2. Let $\Gamma$ be a $k$-uniform hypergraph and $\mathcal{A}$ its adjacency tensor. Show that $\mathbf{e}_j$ occurs as an eigenvector of $\mathcal{A}$ with eigenvalue 0 for each $j$. Sanity check: why does this not imply that $\mathcal{A}=0$?
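The claims of Proposition 1 are also easy to check numerically. A sketch, assuming a small 3-uniform example (the helper names are mine):

```python
import numpy as np
from itertools import permutations
from math import factorial

def laplacian_tensor(n, edges):
    """Laplacian tensor L = D - A of a k-uniform hypergraph (Definition 4)."""
    k = len(edges[0])
    L = np.zeros((n,) * k)
    deg = np.zeros(n)
    for e in edges:
        for idx in permutations(e):          # all rearrangements of the edge
            L[idx] -= 1.0 / factorial(k - 1)
        for v in e:
            deg[v] += 1
    for v in range(n):
        L[(v,) * k] += deg[v]                # diagonal entries d_{i...i} = d_i
    return L, deg

def apply_tensor(L, x):
    """Compute L x^{k-1} by contracting the last k-1 indices with x."""
    out = L
    for _ in range(L.ndim - 1):
        out = out @ x
    return out

# A 3-uniform hypergraph on 4 vertices with edges {0,1,2} and {1,2,3}.
L, deg = laplacian_tensor(4, [(0, 1, 2), (1, 2, 3)])
e1 = np.eye(4)[1]
assert np.allclose(apply_tensor(L, e1), deg[1] * e1)   # eigenvalue d_1
assert np.allclose(apply_tensor(L, np.ones(4)), 0.0)   # eigenvalue 0
```

Here $\mathbf{e}_1^{[k-1]} = \mathbf{e}_1$ and $\mathbf{1}^{[k-1]} = \mathbf{1}$, so the two assertions are exactly the eigenvector equations of Proposition 1.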

Our next proposition is a statement about the distribution of eigenvalues.

Proposition 2. Let $\Gamma$ be a $k$-uniform hypergraph on $n$ vertices, and $\mathcal{L}$ its Laplacian tensor. Let $\Delta$ denote the maximum degree of $\Gamma$. Then all of the eigenvalues of $\mathcal{L}$ lie in the disk $\{\lambda\in\mathbb{C}:|\lambda-\Delta|\leq\Delta\}$.

Proof. ([4, Thm. 6]) Let $\lambda\in\mathbb{C}$ be an eigenvalue of $\mathcal{L}$, and let $x\in\mathbb{C}^n$ be a corresponding eigenvector. Pick an index $i$ at which $|x|$ is maximized, so that

$|x_i| = \max_{j=1,\dotsc,n}|x_j| \neq 0.$

Then by definition, it holds $\mathcal{L}x^{k-1} = \lambda x^{[k-1]}$, which in the $i$-th component reads

$\lambda x_i^{k-1} = \sum_{i_2,\dotsc,i_k=1}^n l_{i,i_2,\dotsc, i_k} x_{i_2}\cdots x_{i_k}.$

Removing the single term $i_2=\dotsb=i_k=i$ from the right-hand side, we get

$(\lambda - l_{i\cdots i})x_i^{k-1} = \sum_{(i_2,\dotsc,i_k)\neq(i,\dotsc,i)} l_{ii_2\cdots i_k}x_{i_2}\cdots x_{i_k}.$

But from the definition of $\mathcal{L}$ we recognize $l_{i\cdots i}$ as $d_i$, and hence by the choice of $i$, we can estimate

$|\lambda - d_i|\,|x_i|^{k-1} \leq \sum_{(i_2,\dotsc,i_k)\neq(i,\dotsc,i)} |l_{ii_2\cdots i_k}|\,|x_{i_2}|\cdots|x_{i_k}| \leq |x_i|^{k-1}\sum_{(i_2,\dotsc,i_k)\neq(i,\dotsc,i)} a_{ii_2\cdots i_k} = |x_i|^{k-1}\,d_i,$

where in the final equality we are using the definition of $\mathcal{L}$ and recalling the calculation in the proof of Proposition 1. We have hence proved that all eigenvalues $\lambda$ lie in a closed disk centered at $d_i\in\mathbb{C}$ of radius $d_i$; but this $i\in V$ depends on the unknown eigenvector $x\in\mathbb{C}^n$, so by observing that for any $j\in V$,

$\{\lambda\in\mathbb{C}:|\lambda-d_j|\leq d_j\}\subset \{\lambda\in\mathbb{C}:|\lambda-\Delta|\leq\Delta\}$

the claim follows. -QED

Exercise 3. Let $\Gamma$ be a $k$-uniform hypergraph on $n$ vertices and $\mathcal{A}$ its adjacency tensor. Show that all of the eigenvalues of $\mathcal{A}$ lie in the disk $\{\lambda\in\mathbb{C}:|\lambda|\leq\Delta\}$.

For our final proposition, we strive for an analogue of Exercise 1(a): that is, a relationship between the “size of the kernel” and the connected components of $\Gamma$. However, for $k$-uniform hypergraphs with $k\geq 3$, $\mathcal{L}$ is not a linear operator, and hence a kernel is not well-defined a priori. But as we saw in Proposition 1, the vector $\mathbf{1}$ still shows up as an eigenvector for the null eigenvalue, and hence we suspect that more can be said.

A few small preliminaries are in order. A vector $x\in\mathbb{C}^n$ is called a binary vector if each of its components $x_i$ is either 0 or 1 for $1\leq i\leq n$. The support of a vector $x\in\mathbb{C}^n$ is the set $\text{supp}(x) = \{1\leq i\leq n:x_i\neq 0\}$. If $\mathcal{T}$ is a $(k,n)$-tensor over $\mathbb{C}$, then an eigenvector $x\in\mathbb{C}^n$ associated to eigenvalue $\lambda\in\mathbb{C}$ is called a minimal binary eigenvector associated to $\lambda$ if $x$ is a binary vector, and if there does not exist another binary eigenvector $y\in\mathbb{C}^n$ with eigenvalue $\lambda$ for which $\text{supp}(y)$ is a proper subset of $\text{supp}(x)$.

Proposition 3. Let $\Gamma$ be a $k$-uniform hypergraph on $n$ vertices with Laplacian tensor $\mathcal{L}$. A binary vector $x\in\mathbb{C}^n$ is a minimal binary eigenvector associated to $\lambda = 0$ if and only if $\text{supp}(x)$ is the vertex set of a connected component of $\Gamma$.

Proof. We first observe, using the proof of Proposition 1, that a nonzero vector $x\in\mathbb{C}^n$ is an eigenvector associated to $\lambda=0$ if and only if, for each $1\leq i\leq n$, it holds

$d_i x_i^{k-1} = \sum_{\substack{e\in E \\ i\in e}} \prod_{j\in e\setminus\{i\}} x_j. \hspace{1cm} (1)$

($\implies$). Let $x\in\mathbb{C}^n$ be any minimal binary eigenvector associated to $\lambda=0$ and let $i\in\text{supp}(x)$. It follows from equation (1) that for any edge $\{i,i_2,\dotsc,i_k\}\in E$ it holds $i_2,\dotsc,i_k\in \text{supp}(x)$ (otherwise, there will not be enough terms in the sum to achieve $d_i$). Applying this observation inductively to adjacent vertices in $\text{supp}(x)$, we see that the vertices $i\in\text{supp}(x)$ form either the vertex set of one component of $\Gamma$, or the disjoint union of several. But the latter case is incompatible with minimality, since any binary vector supported on a single component will give rise to an eigenvector with eigenvalue $\lambda=0$ (check!).

($\impliedby$). Now assume simply that $x\in\mathbb{C}^n$ is a binary vector with support equal to a single connected component of $\Gamma$. A short calculation verifies that the components $x_i$ for $i\in\text{supp}(x)$ satisfy equation (1), and hence that $x$ is a binary eigenvector for $\mathcal{L}$ associated to eigenvalue $\lambda=0$. All that remains to be checked is that $x$ is minimal; to this end, suppose $y\in\mathbb{C}^n$ is another binary eigenvector, and that $\text{supp}(y)\subsetneq \text{supp}(x)$. Then there must exist some vertex $i\in\text{supp}(y)$ for which there exists some edge $e= \{i,i_2,\dots,i_k\}\in E$ and an index $j\in e$ which does not belong to $\text{supp}(y)$. But then,

$\sum_{\substack{e'\in E \\ i\in e'}} \prod_{j'\in e'\setminus\{i\}} y_{j'} \leq d_i - 1 < d_i = d_i y_i^{k-1},$

and equation (1) cannot hold for $y$, a contradiction. -QED

Corollary 1. A $k$-uniform hypergraph $\Gamma$ is connected if and only if the vector $\mathbf{1}$ is the unique minimal binary eigenvector with eigenvalue $\lambda=0$.

1. Fan Chung. Spectral Graph Theory. 1997.
2. Cunxiang Duan, Ligong Wang, Xihe Li, et al. Some properties of the signless Laplacian and normalized Laplacian tensors of general hypergraphs. Taiwanese Journal of Mathematics, 24(2). 2020.
3. Allen Hatcher. Algebraic topology. 2005.
4. Liqun Qi. Eigenvalues of a real supersymmetric tensor. Journal of Symbolic Computation, 40(6). 2005.
5. Liqun Qi. H+-eigenvalues of Laplacian and signless Laplacian tensors. Communications in Mathematical Sciences, 12(6). 2013.
6. Max Wardetzky, Saurabh Mathur, Felix Kälberer, and Eitan Grinspun. Discrete Laplace operators: no free lunch. In Symposium on Geometry Processing. 2007.

## Math 202C: Lecture 14

Let us begin with a very brief overview of symmetric function theory. Let $X=\{x_1,x_2,x_3,\dots\}$ be a countable set of commuting indeterminates, referred to as an alphabet. Given $r \in \mathbb{N},$ let $\mathrm{Fun}(r,\mathbb{N})$ denote the set of functions

$i \colon \{1,\dots,r\} \longrightarrow \mathbb{N}.$

Consider the generating functions

$e_r = \sum\limits_{\substack{i \in \mathrm{Fun}(r,\mathbb{N}) \\ i \text{ strictly increasing}}} \prod\limits_{q=1}^r x_{i(q)},$

$h_r = \sum\limits_{\substack{i \in \mathrm{Fun}(r,\mathbb{N}) \\ i \text{ weakly increasing}}} \prod\limits_{q=1}^r x_{i(q)},$

$p_r = \sum\limits_{\substack{i \in \mathrm{Fun}(r,\mathbb{N}) \\ i \text{ constant}}} \prod\limits_{q=1}^r x_{i(q)}.$

These are respectively known as the degree $r$ elementary, complete, and power sum symmetric functions over the alphabet $X$. Note that they are not actually functions, but rather elements of the algebra $\mathbb{C}[[x_1,x_2,x_3,\dots]]$ of formal power series over $X.$ The “fundamental theorem of symmetric function theory” asserts that these series all generate the same subalgebra of $\mathbb{C}[[x_1,x_2,x_3,\dots]]$.

Theorem (Newton): We have

$\mathbb{C}[e_1,e_2,e_3,\dots]=\mathbb{C}[h_1,h_2,h_3,\dots]=\mathbb{C}[p_1,p_2,p_3,\dots].$

The subalgebra of $\mathbb{C}[[x_1,x_2,x_3,\dots]]$ generated by any of the above three fundamental families is called the algebra of symmetric functions, and typically denoted $\Lambda$ or $\Lambda[X]$ if the alphabet needs to be specified. This algebra has a lot of internal structure which expresses itself in combinatorially interesting ways. To see some of this, you might try expressing the elementary symmetric functions in terms of the power sums, as Newton did.

Definition 1: A specialization of $\Lambda$ is an algebra homomorphism from $\Lambda$ to a commutative $\mathbb{C}$-algebra $\mathbf{A}.$

In practice, specializations typically occur as families of substitutions. Here is a very simple example. For any $n \in \mathbb{N},$ we have a specialization

$\Xi_n \colon \Lambda \to \mathbb{C}$

defined by

$\Xi_n(f) = f(\underbrace{1,\dots,1}_n,0,0,0,\dots),$

i.e. by substituting the numerical value $1$ for each of the variables $x_1,\dots,x_n$ (or any other specified set of $n$ variables), and substituting the numerical value $0$ for the rest. It is natural to identify the homomorphism $\Xi_n$ with the numerical multiset

$\{\{\underbrace{1,\dots,1}_n,0,0,0,\dots\}\}$

and write $f(\Xi_n)$ rather than $\Xi_n(f)$ to reflect the substitution.

Problem 1: Explicitly compute $e_r(\Xi_n),h_r(\Xi_n),p_r(\Xi_n).$

Another interesting numerical specialization arises from taking

$\Xi_n = \{\{1,\dots,n,0,0,0,\dots\}\}.$

For this numerical substitution we have

$p_r(\Xi_n) = 1^r + 2^r + \dots +n^r,$

which is a sum people have been thinking about since antiquity. In particular, there are various formulas for this sum, many of which involve the Bernoulli numbers. In order to evaluate $e_r(\Xi_n)$ and $h_r(\Xi_n),$ one may proceed rather differently as follows. Let $t$ be a new variable, and consider the generating functions

$E(t) = 1 + \sum\limits_{r=1}^\infty e_r t^r \quad\text{ and }\quad H(t) = 1 + \sum\limits_{r=1}^\infty h_rt^r,$

which are elements of $\Lambda[[t]],$ the algebra of formal power series in the single variable $t$, with coefficients in $\Lambda$. These generating functions may be explicitly evaluated without too much effort.

Problem 2: Show that

$E(t) = \prod\limits_{i=1}^\infty (1+tx_i) \quad\text{ and }\quad H(t) = \prod\limits_{i=1}^\infty \frac{1}{1-tx_i}.$

From Problem 2, we immediately get the identities

$1+\sum\limits_{r=1}^\infty e_r(\Xi_n)t^r = (1+t)(1+2t) \dots(1+nt)$

and

$1+\sum\limits_{r=1}^\infty h_r(\Xi_n)t^r = \frac{1}{(1-t)(1-2t) \dots(1-nt)},$

from which one can conclude that $e_r(\Xi_n)$ and $h_r(\Xi_n)$ are in fact Stirling numbers.
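This conclusion can be checked computationally. A sketch, assuming the standard identifications $e_r(\Xi_n)=c(n+1,\,n+1-r)$ (unsigned Stirling numbers of the first kind) and $h_r(\Xi_n)=S(n+r,\,n)$ (Stirling numbers of the second kind); the helper functions are mine:

```python
def elementary(vals, r):
    """e_r(vals): process one value at a time, updating e_j <- e_j + v*e_{j-1}."""
    e = [1] + [0] * r
    for v in vals:
        for j in range(r, 0, -1):      # backward: each value used at most once
            e[j] += v * e[j - 1]
    return e[r]

def complete(vals, r):
    """h_r(vals): same recursion, but forward, so each value may repeat."""
    h = [1] + [0] * r
    for v in vals:
        for j in range(1, r + 1):
            h[j] += v * h[j - 1]
    return h[r]

def stirling1(n, k):
    """Unsigned Stirling numbers of the first kind, c(n, k)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return (n - 1) * stirling1(n - 1, k) + stirling1(n - 1, k - 1)

def stirling2(n, k):
    """Stirling numbers of the second kind, S(n, k)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

n = 6
for r in range(1, n + 1):
    assert elementary(range(1, n + 1), r) == stirling1(n + 1, n + 1 - r)
    assert complete(range(1, n + 1), r) == stirling2(n + r, n)
```

The first identification is just the coefficient extraction $\prod_{i=1}^n(1+it) = \sum_r c(n+1,\,n+1-r)\,t^r$ applied to the displayed generating function.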

We can repackage the content of Lecture 13 as the study of a certain family of specializations

$\Lambda \longrightarrow \mathcal{Z}(n),$

where $\mathcal{Z}(n)$ is the center of the group algebra $\mathbb{C}\mathrm{S}(n).$ These specializations are defined using the substitution multiset

$\Xi_n =\{\{J_1,\dots,J_n,0,0,0,\dots\}\},$

where

$J_t = \sum\limits_{1 \leq s < t} (s\ t)$

are the Jucys-Murphy elements in $\mathbb{C}\mathrm{S}(n).$ The natural basis of $\mathcal{Z}(n)$ is the conjugacy class basis,

$\{C_\mu \colon \mu \vdash n\},$

so understanding the JM specialization of $\Lambda$ means being able to compute

$[C_\mu]f(\Xi_n),$

the coe