Math 220A: Lecture 0


Lecture | Date | Topic | Format
------- | ---- | ----- | ------
0 | S 23 | Course description | In person
1 | S 26 | Numbers: complex | In person
2 | S 28 | Numbers: hypercomplex | In person
3 | S 30 | Coda | Online
4 | O 03 | Formal power series: algebra structure | In person
5 | O 05 | Formal power series: group structure | In person
6 | O 07 | Coda | Online
7 | O 10 | Convergent power series: algebra structure | In person
8 | O 12 | Convergent power series: group structure | In person
9 | O 14 | Coda | Online
10 | O 17 | Analytic functions: local inverse function theorem | In person
11 | O 19 | Analytic functions: local maximum modulus principle | In person
12 | O 21 | Coda | Online
13 | O 24 | Convergent power series: partial sums | In person
14 | O 26 | Convergent power series: Jentzsch’s theorem | In person
15 | O 28 | Coda | Online
16 | O 31 | Exponential function: partial sums | In person
17 | N 02 | Exponential function: Szegő’s theorem | In person
18 | N 04 | Coda | Online
19 | N 07 | Analytic continuation: logarithm | In person
20 | N 09 | Analytic continuation: Hadamard gap theorem | In person
 | N 11 | Veteran’s Day (no lecture) | 
21 | N 14 | | 
22 | N 16 | | 
23 | N 18 | | 
24 | N 21 | | 
25 | N 23 | | 
 | N 25 | Thanksgiving (no lecture) | 
26 | N 28 | | 
27 | N 30 | | 
28 | D 02 | | 
Schedule subject to change.

References

  • Buckholtz, A characterization of the exponential function. American Mathematical Monthly 73 (1966), 121-123.
  • Gronwall, On the power series for \log(1+z). Annals of Mathematics 18 (1916), 70-73.
  • Palais, The classification of real division algebras, American Mathematical Monthly 75 (1968), 366-368.

Grading
  • Problem sets: 70%
  • Final exam: 30%

Staff
  • Lecturer: Jonathan Novak, APM 7157. Office hours MW after lecture.
  • Teaching Assistant: Shubham Sinha.

Communication
  • Piazza. Public posts preferred, private messages when necessary.
  • No email.

Math 202C: Lecture 21 – The sensitivity conjecture

Preliminary definitions and notations

First up, a list of preliminary notations and definitions of terms that recur throughout this blog post.

  • Q^n denotes the n-dimensional hypercube graph whose vertex set consists of all strings in \{-1,1\}^n, with two vertices adjacent iff the corresponding strings differ in exactly one coordinate.
  • For an undirected graph G, the maximum degree of a vertex in G is denoted by \Delta(G) and \lambda_1(G) denotes the largest eigenvalue of the adjacency matrix of G.
  • For x \in \{-1,1\}^n and a subset S of indices from [n] = \{1, 2, \cdots, n\}, the string obtained from x by flipping all entries of x corresponding to the indices in S is denoted by x^S.
  • Any function f : \{-1,1\}^n \rightarrow \{-1,1\} is called a boolean function.

Definition (Local sensitivity and sensitivity). Given a boolean function f : \{-1,1\}^n \rightarrow \{-1,1\}, the local sensitivity on the input x, denoted by s(f,x), is defined as the number of indices i, such that f(x) \neq f(x^{\{i\}}), and the sensitivity s(f) of f is \max_x s(f,x).

Definition (Local block sensitivity and block sensitivity). Given a boolean function f : \{-1,1\}^n \rightarrow \{-1,1\}, the local block sensitivity on the input x, denoted by bs(f,x), is defined as the maximum number of disjoint blocks B_1, \cdots, B_k of [n] such that for each B_i, f(x) \neq f(x^{B_i}), and the block sensitivity bs(f) of f is \max_x bs(f,x).
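To make the two definitions concrete, here is a small brute-force sketch in Python. The test function (3-bit OR, with -1 playing the role of "true") is my own illustrative choice, not an example from the lecture.

```python
from itertools import combinations, product

# Brute-force sketch of the two definitions above. Test function: 3-bit OR,
# with -1 playing the role of "true" (an arbitrary illustrative choice).
n = 3
def f(x):
    return -1 if -1 in x else 1

def flip(x, S):
    # x^S: flip the entries of x indexed by the set S (0-based indices here)
    return tuple(-x[i] if i in S else x[i] for i in range(len(x)))

def sensitivity(f, n):
    # s(f) = max_x #{i : f(x) != f(x^{i})}
    return max(sum(f(x) != f(flip(x, {i})) for i in range(n))
               for x in product([-1, 1], repeat=n))

def block_sensitivity(f, n):
    # bs(f,x): max number of disjoint non-empty blocks B_i with f(x) != f(x^{B_i});
    # plain recursion over families of disjoint blocks, fine for tiny n.
    def bs_at(x, remaining):
        best = 0
        rem = sorted(remaining)
        for r in range(1, len(rem) + 1):
            for B in combinations(rem, r):
                if f(x) != f(flip(x, set(B))):
                    best = max(best, 1 + bs_at(x, remaining - set(B)))
        return best
    return max(bs_at(x, set(range(n))) for x in product([-1, 1], repeat=n))

print(sensitivity(f, n), block_sensitivity(f, n))  # → 3 3
```

Note that on this example s(f) = bs(f) = 3, consistent with the general inequality bs(f) \geq s(f) mentioned below.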

The sensitivity conjecture and the tale of three theorems

(Sensitivity conjecture) There exists an absolute constant C >0, such that for every boolean function f, bs(f) \leq s(f)^C.

Sensitivity and block sensitivity are examples of complexity measures for boolean functions. Two complexity measures A and B are said to be polynomially related if there exist polynomials p(t), q(t) such that for any boolean function f, we have A(f) \leq p(B(f)) and B(f) \leq q(A(f)). The block sensitivity is known to be polynomially related to other complexity measures of boolean functions such as the degree (to be defined later). However, the question (a.k.a. the sensitivity conjecture) of whether the block sensitivity is polynomially related to the sensitivity was posed as an open problem by Nisan and Szegedy in 1989. (Note that bs(f) \geq s(f) is a direct consequence of the definitions, so the open problem was to show that bs(f) \leq s(f)^C for some C > 0.) A few years later, in 1992, Gotsman and Linial [3] showed the equivalence of the sensitivity conjecture to another open problem on the hypercube. Finally, in 2019, Hao Huang resolved the sensitivity conjecture by leveraging the equivalence shown by Gotsman and Linial.

Cue harmonic analysis on finite abelian groups

In lecture post 7 by Prof. Novak, we saw that \mathrm{B}(n) is isomorphic to the group \mathbb{Z}_2^n, i.e. the group of bitstrings of length n (recall that \mathrm{B}(n) denotes the subgroup of \mathrm{S}(2n) generated by the n disjoint transpositions \tau_1=(1\ 2),\ \tau_2=(3\ 4),\ \dots, \tau_n = (2n-1\ 2n)). Thus, a boolean function f: \{-1,1\}^n \rightarrow \{-1,1\} can be identified with a member of the Hilbert space \mathbb{C}^{\mathrm{B}(n)}, and the rich tools of harmonic analysis on \mathrm{B}(n) discussed in lecture posts 7-9 apply to boolean functions. Note that the hypercube Q^n is the subgraph of the Cayley graph of \mathrm{S}(2n) induced by the subgroup \mathrm{B}(n). Thus, the sensitivity conjecture can be rephrased in the language of induced subgraphs of the hypercube Q^n, and this is exactly the essence of the work of Gotsman and Linial in 1992.

Before we dive into the details, a few (post)-preliminary notations and remarks.

  • Given an induced subgraph H of Q^n, let Q^n - H denote the subgraph of Q^n induced on the vertex set V(Q^n) \backslash V(H).
  • For S \subseteq [n] := \{1,2, \cdots, n\}, the function \chi_S : \{-1,1\}^n \rightarrow \{+1,-1\} is defined as \chi_S(x) = \Pi_{i \in S}x_i, with the convention that for all x, \: \: \chi_{\phi}(x) = 1 where \phi denotes the empty set.
  • In lecture post 7 (and also in the qual exam) we saw that the functions \{\chi_S\}_{S \subseteq [n]} (which exactly correspond to the characters of B(n) since \mathbb{Z}_2^n is isomorphic to \mathrm{B}(n)) form an orthogonal basis for \mathbb{C}^{ \mathbb{Z}_2^n } under the inner product \langle f,g \rangle = \sum_{x \in \{-1,1\}^n} f(x)g(x).
  • Thus, given a boolean function f, by Fourier expansion, f can be written as f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\chi_S(x), where \hat{f} denotes the Fourier coefficients.
  • The average of a boolean function f on Q^n is denoted as \mathbf{E}(f) : = \frac{1}{2^n}\sum_{x \in Q^n}f(x). Note that since \chi_{\phi}(x) = 1 for all x, we have the identity \hat{f}(\phi) = \mathbf{E}(f) which is quite useful when proving theorem 1 (see below).

Definition. Given a boolean function f, the degree of f is defined as deg(f) := \max _{S \subseteq [n]}\{|S| \: | \: \hat{f}(S) \neq 0\}.
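The Fourier expansion and the degree can be computed by brute force for small n. The following sketch uses 3-bit majority as a test function; that choice is mine, not an example from the lecture.

```python
from itertools import combinations, product

# Brute-force Fourier coefficients and degree of a boolean function.
# Test function: 3-bit majority (an arbitrary illustrative choice).
n = 3
def f(x):
    return 1 if sum(x) > 0 else -1

def fourier_coeff(f, S, n):
    # \hat{f}(S) = 2^{-n} \sum_x f(x) \chi_S(x), with \chi_S(x) = \prod_{i in S} x_i
    total = 0
    for x in product([-1, 1], repeat=n):
        chi = 1
        for i in S:
            chi *= x[i]
        total += f(x) * chi
    return total / 2 ** n

def degree(f, n):
    # deg(f) = max |S| such that \hat{f}(S) != 0
    return max(len(S)
               for r in range(n + 1)
               for S in combinations(range(n), r)
               if abs(fourier_coeff(f, S, n)) > 1e-12)

print(degree(f, n))  # → 3: majority on 3 bits has full degree
```

Note also that fourier_coeff(f, (), n) returns \hat{f}(\phi) = \mathbf{E}(f), which is 0 for majority, matching the identity in the last bullet above.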

Theorem 1 [Gotsman and Linial]. The following two statements are equivalent for any monotone strictly increasing function h: \mathbb{N} \rightarrow \mathbb{R}.
a. For any induced subgraph H of Q^n with |V(H)| \neq 2^{n-1}, we have \max \{ \Delta(H), \Delta(Q^n - H)\} \geq h(n).
b. For any boolean function f, we have that s(f) \geq h(deg(f)).

Gotsman and Linial show that proving the equivalence of statements a and b above is equivalent to proving the equivalence of yet another two statements and then proceed to prove this second equivalence.

Proof of Theorem 1. First, we show the equivalence of statement a to following statement a’:

a’. For any boolean function g, \mathbf{E(g)} \neq 0 implies there exists an input x such that n - s(g,x) \geq h(n).

a’ implies a: Given an induced subgraph H of Q^n, associate with it the boolean function g defined by g(x) = 1 if x \in V(H) and g(x) = -1 otherwise. By a direct consequence of the definitions of adjacency in Q^n and of the local sensitivity of g, \: \: \text{deg}_H(x) = n - s(g,x) for x \in V(H) and \text{deg}_{Q^n -H}(x) = n - s(g,x) for x \in V(Q^n) \backslash V(H) (note that here \text{deg}_H(x) denotes the degree of the vertex x in the subgraph H and is not to be confused with the degree of a boolean function that was defined earlier). Thus, \max \{ \Delta(H), \Delta(Q^n - H)\}  =  \max_x  \{n - s(g,x)\}. Let \mathbf{E}(g) denote the average value of g on Q^n, i.e. \mathbf{E}(g) = \frac{1}{2^n}\left[ \sum_{x \in V(H)}g(x) + \sum_{x  \in V(Q^n)\backslash V(H)}g(x) \right]. Note that |V(H)| \neq 2^{n-1} implies \mathbf{E}(g) \neq 0. Thus, statement a’ implies \max \{ \Delta(H), \Delta(Q^n - H)\}  =   \max_x  \{n - s(g,x)\} \geq h(n).

a implies a’: Given a boolean function g, define H to be the subgraph induced by the set of vertices \{x \in Q^n \: | \: g(x) = 1\}. By an identical argument as in the previous paragraph, \max \{ \Delta(H), \Delta(Q^n - H)\}  =  \max_x  \{n - s(g,x)\}. Also, \mathbf{E}(g) \neq 0 implies |V(H)| \neq 2^{n-1}. Thus, statement a implies \max \{ \Delta(H), \Delta(Q^n - H)\}  =  \max_x  \{n - s(g,x)\} \geq h(n), thereby showing the equivalence of statements a and a’.

Now, we show the equivalence of statement b to statement b’:

b’. For any boolean function f, s(f) < h(n) implies deg(f) < n.

Assuming b, if the premise of b’ holds, then h(deg(f)) \leq s(f) < h(n), and since h is assumed to be monotone strictly increasing this shows that deg(f) < n; thus b implies b’. Now, towards showing the other implication, let f denote a boolean function with degree d \leq n. Assume without loss of generality that for S =\{1, \cdots, d\}, \: \: \hat{f}(S) \neq 0. Now, consider the function g : \{-1,1\}^d \rightarrow \{-1,1\} defined as g(x_1, x_2, \cdots, x_d) := f(x_1, x_2, \cdots, x_d, -1, -1, \cdots, -1). By construction s(f) \geq s(g) and deg(g) = d, i.e. g has full degree as a boolean function on Q^d. Now, the contrapositive of b’ says that deg(g) = d implies s(g) \geq h(d). Thus, s(f) \geq s(g) \geq h(d) = h(deg(f)), and this finishes showing the equivalence of b and b’.

Thus, proving theorem 1 has been reduced to the task of proving the equivalence of statements a’ and b’. Now, given a boolean function f, define another boolean function g(x) = f(x)\chi_{[n]}(x), where \chi_{[n]}(x) = \Pi_{i=1}^nx_i is the parity function of x. Note that \hat{g}(S) =  \frac{1}{2^n}\sum_{x \in Q^n}f(x) \chi_{[n]}(x) \chi_S(x). Thus,

\hat{g}(S)  = \frac{1}{2^n}\sum_{x \in Q^n}f(x)   (\Pi_{i=1}^nx_i)   (\Pi_{i \in S}x_i) = \frac{1}{2^n}\sum_{x \in Q^n}f(x) \Pi_{i \notin S}x_i = \hat{f}([n] \backslash S)

Thus, for all S \subseteq [n], \hat{g}(S) = \hat{f}([n]\backslash S). Also note that for all x \in Q^n, s(g,x) = n - s(f,x): every bit flip changes the sign of \chi_{[n]}, so a flip changes the value of g(x) exactly when it does not change the value of f(x), and vice-versa. Thus, \mathbf{E}(g) = \hat{g}(\phi) = \hat{f}([n]), where \phi denotes the empty set.
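Both identities for g = f\chi_{[n]} can be verified numerically by brute force. The test function below (3-bit majority) is my own arbitrary choice.

```python
from itertools import combinations, product

# Verify \hat{g}(S) = \hat{f}([n]\S) and s(g,x) = n - s(f,x) for g = f*chi_[n].
# Test function: 3-bit majority (an arbitrary illustrative choice).
n = 3
def f(x):
    return 1 if sum(x) > 0 else -1

def g(x):
    parity = 1
    for xi in x:
        parity *= xi
    return f(x) * parity

def coeff(fn, S):
    # \hat{fn}(S) = 2^{-n} sum_x fn(x) chi_S(x)
    total = 0
    for x in product([-1, 1], repeat=n):
        chi = 1
        for i in S:
            chi *= x[i]
        total += fn(x) * chi
    return total / 2 ** n

def local_s(fn, x):
    # local sensitivity: number of single-bit flips that change fn(x)
    return sum(fn(x) != fn(tuple(-x[j] if j == i else x[j] for j in range(n)))
               for i in range(n))

for r in range(n + 1):
    for S in combinations(range(n), r):
        comp = tuple(i for i in range(n) if i not in S)
        assert abs(coeff(g, S) - coeff(f, comp)) < 1e-9   # Fourier duality
for x in product([-1, 1], repeat=n):
    assert local_s(g, x) == n - local_s(f, x)             # sensitivity duality
print("both identities check out")
```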

a’ implies b’: Towards a contradiction, assume that s(f) < h(n) and deg(f) = n. This by definition implies \hat{f}([n]) \neq 0 and thus \mathbf{E}(g) \neq 0. By a’, there exists an x such that s(g,x) \leq n - h(n), and since s(g,x) = n - s(f,x) this implies there exists an x such that s(f,x) \geq h(n), which contradicts the premise that s(f) < h(n).

b’ implies a’: Again towards a contradiction, assume that there is a g with \mathbf{E}(g) \neq 0 such that for all x, \: s(g,x) > n - h(n). Since s(g,x) = n - s(f,x), this implies that s(f) < h(n), which by b’ implies deg(f) < n. Note that deg(f) < n is equivalent to \hat{f}([n]) = 0, and thus \hat{f}([n]) = \hat{g}(\phi) = \mathbf{E}(g) = 0, which contradicts the premise that \mathbf{E}(g) \neq 0. This completes the proof of theorem 1.

Theorem 2 [Huang] . For every integer n \geq 1, let H be an arbitrary (2^{n-1}+1)-vertex induced subgraph of Q^n, then

\Delta(H) \geq \sqrt{n}

The proof of theorem 2 is going to be the main focus for the remainder of this blog post, but before that we discuss two corollaries which show why theorem 2 resolves the sensitivity conjecture.

Corollary 1. For every boolean function f, s(f) \geq \sqrt{deg(f)}.

Proof of Corollary 1. For any induced subgraph H of Q^n with |V(H)| \neq 2^{n-1}, since the total number of vertices in Q^n is 2^n, either H contains at least 2^{n-1}+1 vertices or Q^n - H does. So, without loss of generality, assume H contains at least 2^{n-1}+1 vertices (if not, just replace H with Q^n - H in the remainder of the proof). Now, theorem 2 (along with the fact that the maximum degree \Delta(H) can only grow as vertices are added to an induced subgraph) implies \max \{ \Delta(H), \Delta(Q^n - H)\} \geq \sqrt{n}. Let h(n) := \sqrt{n} and note that it is a monotone strictly increasing function on \mathbb{N}. Now, theorem 1 implies s(f) \geq \sqrt{deg(f)}, and this completes the proof.

Theorem 3 [Tal]. For every boolean function f, bs(f) \leq deg(f)^2.

It should be noted that theorem 3 is an improvement on the work by Nisan and Szegedy in 1992, wherein they showed that bs(f) \leq 2deg(f)^2.

Corollary 2. For every boolean function f, bs(f) \leq s(f)^4.

Note that corollary 2 follows immediately from theorem 3 and corollary 1: bs(f) \leq deg(f)^2 \leq (s(f)^2)^2 = s(f)^4. Thus, corollary 2 confirms the sensitivity conjecture by showing that the block sensitivity and the sensitivity of a boolean function are indeed polynomially related.

Cue matrix analysis – “A proof that came straight from THE Book”

In this section, we will focus on Hao Huang’s proof of theorem 2. The core of the proof involves a creative construction of an iterative sequence of matrices, whose n^{th} instance is closely related to the adjacency matrix of Q^n, and an application of Cauchy’s interlace theorem, which is stated as follows.

(Cauchy’s interlace theorem) Let A be a symmetric n \times n matrix and B an m \times m principal submatrix of A for some m<n. If the eigenvalues of A are \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n, and the eigenvalues of B are \mu_1 \geq \mu_2 \geq \cdots \geq \mu_m, then for all 1 \leq i \leq m,

\lambda_i \geq \mu_i \geq \lambda_{i+n-m}

Recall that we saw in Math 202A that Cauchy’s interlace theorem can be proven via a direct application of the Courant-Fischer theorem.

Now, we move on to the iterative matrix sequence that turns out to be a bite-sized kryptonite for the sensitivity conjecture. Consider the sequence of symmetric square matrices defined iteratively as follows,

B_1 := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad B_n :=\begin{pmatrix} B_{n-1} & I \\ I & -B_{n-1} \end{pmatrix}

The first exercise is to study the spectrum of B_n and the second exercise is to compute an upper bound for the largest eigenvalue \lambda_1 of a specific type of matrix. These two exercises then set us up nicely for a clever application of Cauchy’s interlace theorem to prove theorem 2.

Exercise 1.
a) Show that B_n^2 = nI where I denotes the 2^n \times 2^n identity matrix.
b) Show that the trace of B_n is 0.
c) Conclude that the eigenvalues of B_n are \sqrt{n} with multiplicity 2^{n-1} and -\sqrt{n} with multiplicity 2^{n-1}.
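The claims of Exercise 1 are easy to sanity-check numerically for small n (this is a check of the statement, not a proof):

```python
import numpy as np

# Numerical check of Exercise 1 for small n: B_n^2 = n*I, tr(B_n) = 0,
# and the spectrum is ±sqrt(n), each with multiplicity 2^{n-1}.
def B(n):
    if n == 1:
        return np.array([[0.0, 1.0], [1.0, 0.0]])
    Bp = B(n - 1)
    I = np.eye(2 ** (n - 1))
    return np.block([[Bp, I], [I, -Bp]])

for n in range(1, 6):
    Bn = B(n)
    N = 2 ** n
    assert np.allclose(Bn @ Bn, n * np.eye(N))       # a) B_n^2 = nI
    assert abs(np.trace(Bn)) < 1e-9                  # b) tr B_n = 0
    eig = np.sort(np.linalg.eigvalsh(Bn))
    assert np.allclose(eig[:N // 2], -np.sqrt(n))    # c) -sqrt(n), mult. 2^{n-1}
    assert np.allclose(eig[N // 2:], np.sqrt(n))     # c) +sqrt(n), mult. 2^{n-1}
print("Exercise 1 verified numerically for n = 1..5")
```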

Exercise 2. Suppose H is an m-vertex undirected graph and B is a symmetric matrix whose entries are in \{-1,0,1\} and whose rows and columns are indexed by the vertex set of H. Also assume B_{u,v} = 0 whenever u and v are non-adjacent vertices in H. Then show that

\Delta(H) \geq \lambda_1 := \lambda_1(B)

The definition of this iterative sequence of matrices \{B_j\}_1^n is not without precedent. Note that the adjacency matrix of Q^n can be defined in a similarly iterative manner. Recall that a vertex of Q^n is a string in \{-1,1\}^n. Given an (n-1)-length string of -1s and 1s, it naturally extends to exactly two strings in \{-1,1\}^n, and these two n-strings correspond to the only two vertices of Q^n whose corresponding strings agree with the original string in their first n-1 entries (note that these two vertices of Q^n are adjacent). Expanding on this observation, the hypercube Q^n can be thought of as two Q^{n-1} sub-hypercubes with a perfect matching connecting these two subcubes. In other words, consider the following sequence of iteratively defined matrices,

A_1 := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad A_n :=\begin{pmatrix} A_{n-1} & I \\ I & A_{n-1} \end{pmatrix}

Then A_n represents the adjacency matrix of Q^n and in particular, note that A_n is equal to the matrix obtained by changing every (-1) entry of B_n to 1.
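As a quick sanity check (my own, not part of the original argument), the two recursions can be compared directly: |B_n| = A_n entrywise, and A_n is n-regular, as the adjacency matrix of Q^n must be.

```python
import numpy as np

# Compare the recursions for A_n and B_n: changing every -1 of B_n to 1
# gives A_n, and A_n is n-regular (each vertex of Q^n has degree n).
def rec(n, sign):
    if n == 1:
        return np.array([[0.0, 1.0], [1.0, 0.0]])
    M = rec(n - 1, sign)
    I = np.eye(2 ** (n - 1))
    return np.block([[M, I], [I, sign * M]])

def A(n):
    return rec(n, +1)

def B(n):
    return rec(n, -1)

for n in range(1, 6):
    assert np.array_equal(np.abs(B(n)), A(n))   # flipping -1 -> 1 gives A_n
    assert np.all(A(n).sum(axis=1) == n)        # Q^n is n-regular
print("|B_n| = A_n and A_n is n-regular for n = 1..5")
```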

Now that we have all the necessary ingredients, let’s discuss the proof of Theorem 2.

(Restated) Theorem 2 [Huang] . For every integer n \geq 1, let H be an arbitrary (2^{n-1}+1)-vertex induced subgraph of Q^n, then

\Delta(H) \geq \sqrt{n}

Proof of Theorem 2 [Huang].

Let B_n be the sequence of matrices defined above and let H be a (2^{n-1} + 1)-vertex induced subgraph of Q^n. Note that H naturally induces a principal submatrix B_H of B_n (written with respect to an appropriate permutation of the standard basis). As discussed in the preceding paragraph, changing every (-1) entry of B_n to 1 yields the adjacency matrix of Q^n. So, in particular, B_H and H satisfy the conditions of the statement in Exercise 2, and thus,

\Delta(H) \geq \lambda_1(B_H)

Also, since B_H is a (2^{n-1} + 1) \times (2^{n-1} + 1) principal submatrix of the 2^n \times 2^n matrix B_n, by Cauchy’s interlace theorem,

\lambda_1(B_H) \geq \lambda_{2^{n-1}}(B_n)

From Exercise 1, we know that the eigenvalues of B_n are \sqrt{n} with multiplicity 2^{n-1} and -\sqrt{n} with multiplicity 2^{n-1}. Thus, \lambda_{2^{n-1}}(B_n) = \sqrt{n} and thus,

\Delta(H) \geq \lambda_1(B_H) \geq \lambda_{2^{n-1}}(B_n) = \sqrt{n}

and this completes the proof.
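The bound can also be checked empirically on random induced subgraphs for small n. This is random sampling, purely a sanity check of the theorem, not a proof:

```python
import itertools
import random

import numpy as np

# Empirical check of Huang's theorem for small n: every sampled
# (2^{n-1}+1)-vertex induced subgraph H of Q^n satisfies Delta(H) >= sqrt(n).
def hypercube_adjacency(n):
    V = list(itertools.product([-1, 1], repeat=n))
    idx = {v: i for i, v in enumerate(V)}
    A = np.zeros((2 ** n, 2 ** n), dtype=int)
    for v in V:
        for i in range(n):
            w = v[:i] + (-v[i],) + v[i + 1:]
            A[idx[v], idx[w]] = 1   # adjacent iff strings differ in one coordinate
    return A

rng = random.Random(0)
for n in [2, 3, 4]:
    A = hypercube_adjacency(n)
    for _ in range(200):
        S = rng.sample(range(2 ** n), 2 ** (n - 1) + 1)
        H = A[np.ix_(S, S)]                      # adjacency of the induced subgraph
        assert H.sum(axis=1).max() >= np.sqrt(n)
print("Delta(H) >= sqrt(n) on all sampled induced subgraphs")
```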

Closing Remarks

I will end this blog post for our Math 202C class with a quote from Scott Aaronson’s blog post about Hao Huang’s proof of the sensitivity conjecture.

Paul Erdős famously spoke of a book, maintained by God, in which was written the simplest, most beautiful proof of each theorem. The highest compliment Erdős could give a proof was that it “came straight from the book.” In this case, I find it hard to imagine that even God knows how to prove the Sensitivity Conjecture in any simpler way than this.

-Scott Aaronson

The funny thing is, a few days after Prof. Aaronson’s blog post, his former student Shalev Ben-David posted in the comments section a simplification of Huang’s proof which does not use Cauchy’s interlace theorem.

Anyway, the sensitivity conjecture is indeed quite a fitting end to the Math 202 series, since both Gotsman & Linial’s work and Hao Huang’s work on this 30-year-old ex-open problem use almost exclusively concepts which we have studied over the past three quarters.

Thanks for reading my post and have a great summer everyone!

  • [1] H. Huang, Induced subgraphs of hypercubes and a proof of the sensitivity conjecture, Annals of Mathematics 190 (3) (2019), 949–955.
  • [2] N. Nisan, M. Szegedy, On the degree of Boolean functions as real polynomials, Comput. Complexity, 4 (1992), 462–467.
  • [3] C. Gotsman, N. Linial, The equivalence of two problems on the cube, J. Combin.Theory Ser. A, 61 (1) (1992), 142–146.
  • [4] F. Chung, Z. Füredi, R. Graham, P. Seymour, On induced subgraphs of the cube, J. Comb. Theory, Ser. A, 49 (1) (1988), 180–187.
  • [5] A. Tal, Properties and applications of boolean function composition, Proceedings of the 4th conference on Innovations in Theoretical Computer Science, ITCS ’13, 441–454.
  • [6] R. Karthikeyan, S. Sinha, V. Patil, On the resolution of the sensitivity conjecture, Bulletin of the American Mathematical Society 57 (2020). doi:10.1090/bull/1697.
  • [8] Lecture notes for Math 202A by Prof. Philip Gill.
  • [9] D. Kroes, J. Naranjo, J. Nie, J. O’Neill, N. Seiger, S. Spiro, E. Zhu, Linear Algebra methods in Combinatorics, ABACUS notes Fall 2019, UCSD.

Math 202C: Lecture 20 – Uncertainty Principle

Author: Zhaolong Han

In this lecture, we will introduce the Fourier transform on finite Abelian groups and a discrete version of the “uncertainty principle”, based on Chapter 2 of “Discrete Harmonic Analysis” by Ceccherini-Silberstein [1]. In quantum mechanics, there is a well-known uncertainty principle saying that the more precisely the position of a particle is determined, the less precisely its momentum can be predicted from initial conditions, and vice versa.
For finite Abelian groups, there is a theorem analogous to this uncertainty principle, so people refer to this group-theoretic version as the “uncertainty principle” too. However, our focus today will be Tao’s uncertainty principle for cyclic groups of prime order [2], which refines the discrete uncertainty principle by exploiting the special structure of the group.

First, let us introduce some notation and redefine the Discrete Fourier transform as Andrew did in Lecture 18. In fact, these materials already appeared in Lecture 8, where we studied the concrete example of the rank k hypercube group B(k).

Let X be a finite set and denote by L(X)=\{f:X\to \mathbb{C}\} the vector space of all complex-valued functions defined on X. Then \mathrm{dim}\ L(X)=|X|, where |\cdot| denotes cardinality.

For x\in X we denote by \delta_x the Dirac function centered at x, that is, the element \delta_x\in L(X) defined by \delta_x(y)=1 if y=x and \delta_x(y)=0 if y\neq x, for all y\in X.

The set \{\delta_x:x\in X\} is a natural basis for L(X) and if f\in L(X) then f=\sum_{x\in X}f(x)\delta_x.

The space L(X) is endowed with the scalar product defined by setting

\langle f_1,f_2\rangle=\sum_{x\in X}f_1(x)\overline{f_2(x)}

for f_1,f_2\in L(X), and we denote by ||f||=\sqrt{\langle f,f\rangle} the norm of f\in L(X). Note that \{\delta_x:x\in X\} is an orthonormal basis of L(X) with respect to \langle\cdot,\cdot\rangle.

Let A be a finite Abelian group. A character of A is a map \chi:A\to\mathbb{T} such that

\chi(x+y)=\chi(x)\chi(y)

for all x,y\in A. The set \widehat{A} of all characters of A is an Abelian group with respect to the product (\chi,\psi)\mapsto \chi\cdot\psi defined by (\chi\cdot\psi)(x)=\chi(x)\psi(x) for all x\in A. \widehat{A} is called the dual of A.

Let A=\mathbb{Z}_n, where \mathbb{Z}_n is the cyclic group of order n, n\ge 2, n\in \mathbb{N}. Let \omega:=e^{\frac{2\pi i}{n}}. For x\in \mathbb{Z}_n, define \chi_x\in L(\mathbb{Z}_n) as the function z\mapsto \omega^{zx}.

Exercise 1: Check that for x\in \mathbb{Z}_n, \chi_x is indeed a character of \mathbb{Z}_n, and \widehat{\mathbb{Z}_n}=\{\chi_x:x\in \mathbb{Z}_n\} is isomorphic to \mathbb{Z}_n.

Now we define the Discrete Fourier transform. Let A be a finite Abelian group. The Fourier transform of a function f\in L(A) is the function \hat{f}\in L(\widehat{A}) defined by

\widehat{f}(\chi)=\sum_{y\in A}f(y)\overline{\chi(y)}

for all \chi\in \widehat{A}. Then \widehat{f}(\chi) is called the Fourier coefficient of f with respect to \chi. When A=\mathbb{Z}_n and f\in L(\mathbb{Z}_n), we call \frac{1}{n}\hat{f} the Discrete Fourier transform (DFT) of f. Specifically, for x\in\mathbb{Z}_n and \omega=e^{\frac{2\pi i}{n}}, the DFT of f is

\frac{1}{n}\widehat{f}(\chi_x)=\frac{1}{n}\sum_{y\in\mathbb{Z}_n}f(y)\omega^{-xy}.

The uncertainty principle is the following (see Theorem 2.5.4 of [1]):
Theorem (Uncertainty principle): Let f\in L(A) and suppose that f\neq 0. Then

|\mathrm{supp}(f)|\cdot|\mathrm{supp}\hat{f}|\ge |A|.

Now we come to the main part of this lecture: Tao’s uncertainty principle for cyclic groups of prime order. This improves the inequality in the above discrete uncertainty principle when the finite Abelian group A is cyclic of prime order p.

The main theorem in [2] is the following:
Theorem 2 (Tao’s uncertainty principle): Let p be a prime number and f\in L(\mathbb{Z}_p) non-zero. Then

|\mathrm{supp}(f)|+|\mathrm{supp}(\hat{f})|\ge p+1.

Conversely, if \emptyset\neq A,A'\subset \mathbb{Z}_p are two subsets such that |A|+|A'|=p+1, then there exists f\in L(\mathbb{Z}_p) such that \mathrm{supp}(f)=A and \mathrm{supp}(\hat{f})=A'.
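Before diving into the proof, the bound is easy to sanity-check numerically on random sparse functions (random sampling for p = 7, an illustration rather than a proof; note that np.fft.fft uses an unnormalized DFT, which does not affect supports):

```python
import numpy as np

# Random-input sanity check of Tao's bound for p = 7:
# |supp(f)| + |supp(f^)| >= p + 1 for every non-zero f in L(Z_p).
rng = np.random.default_rng(0)
p = 7
for _ in range(1000):
    k = int(rng.integers(1, p + 1))             # random support size
    f = np.zeros(p, dtype=complex)
    support = rng.choice(p, size=k, replace=False)
    f[support] = rng.normal(size=k) + 1j * rng.normal(size=k)
    fhat = np.fft.fft(f)                        # unnormalized DFT; only supports matter
    total = (np.count_nonzero(np.abs(f) > 1e-9)
             + np.count_nonzero(np.abs(fhat) > 1e-9))
    assert total >= p + 1
print("Tao's bound holds on all sampled functions")
```

Note that the bound is tight at both extremes: a Dirac function has support 1 and full spectrum, while a constant function has full support and spectrum of size 1.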

To prove the theorem, we present some specific tools developed in Tao’s paper [2].

Recall that \mathbb{Z}[x] denotes the ring of polynomials with integer coefficients.

Proposition 1:
Let P(x_1,x_2,\cdots,x_n) be a polynomial in the variables x_1,x_2,\cdots,x_n with integer coefficients. Suppose that for some i\neq j,

P(x_1,x_2,\cdots,x_n)|_{x_i=x_j}\equiv 0.

Then there exists a polynomial Q(x_1,x_2,\cdots,x_n) with integer coefficients such that

P(x_1,x_2,\cdots,x_n)=(x_i-x_j)Q(x_1,x_2,\cdots,x_n).

Remark: The proof of this proposition can be omitted on a first reading. Instead, taking P(x_1,x_2)=x_1^2+x_1x_2-2x_2^2 and finding the corresponding Q(x_1,x_2) promised by this proposition is a helpful exercise for understanding.

Proof: For the sake of simplicity, suppose that i = 1 and j = 2, so that P(x_1,x_1,x_3,\cdots,x_n) \equiv 0. Let us denote by P_1(x_1,x_2,\cdots,x_n) (respectively P_2(x_1,x_2,\cdots,x_n)) the sum of the monomials of P(x_1,x_2,\cdots,x_n) with positive (respectively negative) coefficients, so that

P(x_1,x_2,\cdots,x_n) = P_1(x_1,x_2,\cdots,x_n) + P_2(x_1,x_2,\cdots,x_n).

Note that

P_1(x_1,x_1,x_3,\cdots,x_n) =- P_2(x_1,x_1,x_3,\cdots,x_n),

since P(x_1, x_1, x_3, \cdots, x_n) \equiv 0. This implies that there exists a bijection between the monomials in P_1(x_1,x_1,x_3,\cdots,x_n) and those in P_2(x_1,x_1,x_3,\cdots,x_n). More precisely, let us fix m > 0 and k, k_3, \cdots, k_n \ge 0; then the monomial mx_1^k x_3^{k_3}\cdots x_n^{k_n} appears in P_1(x_1,x_1,x_3,\cdots,x_n) if and only if -mx_1^k x_3^{k_3}\cdots x_n^{k_n} appears in P_2(x_1,x_1,x_3,\cdots,x_n). Suppose this is the case. Then we can find non-negative integers m_0, m_1, \cdots, m_k and n_0, n_1, \cdots, n_k such that the sum of the monomials of P(x_1,x_2,\cdots,x_n) whose variables x_i have degree k_i for i = 1, 2,\cdots, n, with k_1 + k_2 = k, is

\sum_{l=0}^k m_lx_1^{k-l}x_2^lx_3^{k_3}\cdots x_n^{k_n}-\sum_{l=0}^k n_lx_1^{k-l}x_2^lx_3^{k_3}\cdots x_n^{k_n}

with \sum_{l=0}^k m_l = m = \sum_{l=0}^k n_l,

but also such that

m_l\neq 0\implies n_l=0\quad\text{and}\quad n_l\neq 0\implies m_l=0

(because otherwise there would be a cancellation). Thus, with every monomial x_1^{k-l}x_2^lx_3^{k_3}\cdots x_n^{k_n} such that m_l\neq 0 we can (arbitrarily but bijectively) associate a monomial x_1^{k-h}x_2^hx_3^{k_3}\cdots x_n^{k_n} with n_h\neq 0 (necessarily h\neq l). Now for h>l we have the identity

x_1^{k-l}x_2^{l}x_3^{k_3}\cdots x_n^{k_n}-x_1^{k-h}x_2^{h}x_3^{k_3}\cdots x_n^{k_n}=x_1^{k-h}x_2^{l}(x_1-x_2)\left(\sum_{t=0}^{h-l-1}x_1^{t}x_2^{h-l-1-t}\right)x_3^{k_3}\cdots x_n^{k_n}

Exchanging h with l we get the analogous identity for h<l. This shows that

\sum_{l=0}^k m_lx_1^{k-l}x_2^lx_3^{k_3}\cdots x_n^{k_n}-\sum_{l=0}^k n_lx_1^{k-l}x_2^lx_3^{k_3}\cdots x_n^{k_n}

is divisible by x_1-x_2.

Repeating the argument for each monomial mx_1^kx_3^{k_3}\cdots x_n^{k_n} with m>0 and k,k_3,\cdots,k_n\ge 0 appearing in P_1(x_1,\cdots,x_n), we deduce that P(x_1,\cdots,x_n) is divisible by x_1-x_2.

Lemma 1: Let p be a prime, n a positive integer, and P(x_1,x_2,\cdots,x_n) a polynomial with integer coefficients. Suppose that \omega_1,\cdots,\omega_n are (not necessarily distinct) pth roots of unity such that P(\omega_1,\cdots,\omega_n)=0. Then P(1,\cdots,1) is divisible by p.
Proof: Letting \omega=e^{\frac{2\pi i}{p}}, we can find integers 0\le k_j\le p-1 such that \omega_j=\omega^{k_j} for j=1,2,\cdots,n.

Define the polynomials q(x),r(x)\in\mathbb{Z}[x] by setting

P(x^{k_1},x^{k_2},\cdots,x^{k_n})=q(x)(x^p-1)+r(x),

where \mathrm{deg}\,r<p. Then r(\omega)=P(\omega_1,\cdots,\omega_n)=0, and since \mathrm{deg}\,r<p we deduce that r(x) is a multiple of the minimal polynomial of \omega, that is, r(x)=m(1+x+\cdots+x^{p-1}) for some m\in\mathbb{Z}, where we used the fact from undergraduate algebra that the minimal polynomial of \omega is 1+x+\cdots+x^{p-1}. It follows that P(1,\cdots,1)=r(1)=mp.

Theorem 3 (Chebotarev):
Let p be a prime and 1\le n\le p. Let \eta_1,\cdots,\eta_n (respectively \xi_1,\cdots,\xi_n) be distinct elements of \{0,1,\cdots,p-1\}. Then the matrix

A=\left(e^{\frac{2\pi i\eta_h\xi_k}{p}}\right)_{h,k=1}^n

is nonsingular.
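Before the proof, the theorem can be sanity-checked numerically: for prime p every square submatrix of the DFT matrix is nonsingular, whereas for composite n this fails (e.g. for n = 4, rows \{0,2\} and columns \{0,2\} give an all-ones submatrix). This is my own illustration, not part of the proof.

```python
from itertools import combinations

import numpy as np

# Smallest absolute value of a determinant of a square submatrix of the
# n x n DFT matrix (omega^{hk}). Chebotarev: for prime n this is non-zero.
def min_abs_minor(n):
    F = np.exp(2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)
    vals = []
    for m in range(1, n + 1):
        for rows in combinations(range(n), m):
            for cols in combinations(range(n), m):
                vals.append(abs(np.linalg.det(F[np.ix_(rows, cols)])))
    return min(vals)

print(min_abs_minor(5) > 1e-9)  # → True: all minors non-zero for prime p = 5
print(min_abs_minor(4) > 1e-9)  # → False: some minor vanishes for composite n = 4
```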

Proof: Set \omega_h=e^{\frac{2\pi i\eta_h}{p}} for h=1,2,\cdots,n. Note that the \omega_h are distinct pth roots of unity and A=\left(\omega_h^{\xi_k}\right)_{h,k=1}^n. Define the polynomial D(x_1,\cdots,x_n) with integer coefficients by setting

D(x_1,\cdots,x_n)=\mathrm{det}\left(x_h^{\xi_k}\right)_{h,k=1}^n.

As the determinant is an alternating form, we have D(x_1,\cdots,x_n)|_{x_h=x_k}\equiv 0 whenever 1\le h\neq k\le n, so that, by recursively applying Proposition 1, we can find a polynomial Q(x_1,\cdots,x_n) with integer coefficients such that

D(x_1,\cdots,x_n)=Q(x_1,\cdots,x_n)\prod_{1\le h<k\le n}(x_k-x_h).

To prove the theorem, it is equivalent to show that Q(\omega_1,\cdots,\omega_n)\neq 0 (because the \omega_h are all distinct, the product \prod_{1\le h<k\le n}(\omega_k-\omega_h) is non-zero), and by Lemma 1 it suffices to show that p does not divide Q(1,\cdots,1). We need the next three lemmas to complete this proof. Let us first introduce some notation.

Given an n-tuple \mathbf{k}=(k_1,\cdots,k_n) of non-negative integers, we say that the (monomial) differential operator

L=\mathbf{L_k}=\left(x_1\frac{\partial}{\partial x_1}\right)^{k_1}\cdots\left(x_n\frac{\partial}{\partial x_n}\right)^{k_n}

is of type \mathbf{k} and order o(\mathbf{k})=k_1+\cdots+k_n.

Lemma 2:
Let L be a differential operator of type \mathbf{k} and F(x_1,\cdots,x_n) and G(x_1,\cdots,x_n) two polynomials. Then

L(FG)=\sum_{(\mathbf{i},\mathbf{j})}L_{\mathbf{i}}(F)\cdot L_{\mathbf{j}}(G)

where the sum runs over all pairs (\mathbf{i},\mathbf{j}) such that (componentwise) \mathbf{i}+\mathbf{j}=\mathbf{k} (and therefore o(\mathbf{i})+o(\mathbf{j})=o(\mathbf{k})).
Proof: Left as an exercise.

Exercise 2: Prove Lemma 2. (Hint: use induction on the order of L.)

Lemma 3:
For 1\le j\le n and 1\le h\le j-1 we have

\left(x_j\frac{\partial}{\partial x_j}\right)^{h}(x_j-x_1)(x_j-x_2)\cdots (x_j-x_{j-1})=\sum_{t=1}^ha_{h,t}x_j^t\sum_{\mathbf{i}_t}\prod_{1\le i\le j-1,i\neq i_1,\cdots,i_t}(x_j-x_i)

where \sum_{\mathbf{i}_t} runs over all \mathbf{i}_t=(i_1,\cdots,i_t) with 1\le i_1<\cdots<i_t\le j-1 and the a_{h,t}=a_{h,t}(j) are non-negative integers such that a_{h,h}=h!. In particular,

\left(x_j\frac{\partial}{\partial x_j}\right)^{j-1}(x_j-x_1)(x_j-x_2)\cdots (x_j-x_{j-1})=(j-1)!x_j^{j-1}+\text{terms containing at least one factor } (x_j-x_i)

with 1\le i<j.
Proof: By induction on h=1,\cdots,j-1. For the details, see Lemma 2.6.15 in Section 2.6 of [1].

Lemma 4:
Let L=L_{(0,1,\cdots,n-1)}, that is,

L=\left(x_1\frac{\partial}{\partial x_1}\right)^{0}\cdots\left(x_n\frac{\partial}{\partial x_n}\right)^{n-1}.

Then if D(x_1,\cdots,x_n) and Q(x_1,\cdots,x_n) are given by

D(x_1,\cdots,x_n)=Q(x_1,\cdots,x_n)\prod_{1\le h<k\le n}(x_k-x_h),

we have

[LD](1,\cdots,1)=\prod_{j=1}^n (j-1)!Q(1,\cdots,1).

By Lemma 2 and 3, we have

[LD](x_1,\cdots,x_n)=\prod_{j=1}^n(j-1)!\,x_j^{j-1}\,Q(x_1,\cdots,x_n)+\text{terms containing at least one factor }(x_j-x_i)

with 1\le i<j\le n. Here the first term is obtained by applying all of L to \prod_{1\le h<k\le n}(x_k-x_h) (the second factor of D) and using Lemma 3. In particular, taking x_i=1 for i=1,\cdots,n completes the proof.

Now we can finish the proof of Theorem 3 with these lemmas.

Proof of Theorem 3 (continued):
For L as in Lemma 4, we have (where S_n denotes the symmetric group of degree n)

[LD](x_1,\cdots,x_n)=L\sum_{\sigma\in S_n}\epsilon(\sigma)x_1^{\xi_{\sigma(1)}}\cdots x_n^{\xi_{\sigma(n)}}

=\sum_{\sigma\in S_n}\epsilon(\sigma)\xi^0_{\sigma(1)}x_1^{\xi_{\sigma(1)}}\cdots \xi_{\sigma(n)}^{n-1}x_n^{\xi_{\sigma(n)}}

where we used the identity

\left(x_j\frac{\partial}{\partial x_j}\right)^{j-1}x_j^{\xi_{\sigma(j)}}=\xi_{\sigma(j)}^{j-1}x_j^{\xi_{\sigma(j)}}

for all j=1,2,\cdots,n. Thus,

[LD](1,\cdots,1)=\sum_{\sigma\in S_n}\epsilon(\sigma)\xi_{\sigma(1)}^0\cdots\xi_{\sigma(n)}^{n-1}=\prod_{1\le i<j\le n}(\xi_j-\xi_i)

is the Vandermonde determinant. Since the \xi_i are distinct elements of \{0,1,\cdots,p-1\} and n\le p, each factor \xi_j-\xi_i is non-zero modulo p, so [LD](1,\cdots,1) is not divisible by p. Thus, by Lemma 4, Q(1,\cdots,1) is not divisible by p either, and by Lemma 1 this completes the proof of Theorem 3.

Given a non-empty subset A\subset \mathbb{Z}_p and a function f\in L(A), we denote by \overline{f} its extension \overline{f}:\mathbb{Z}_p\to \mathbb{C}, defined by setting \overline{f}(z)=f(z) for z\in A and \overline{f}(z)=0 for all z\in \mathbb{Z}_p\backslash A. For simplicity, we regard the DFT as a map L(\mathbb{Z}_p)\to L(\mathbb{Z}_p). In other words, for f\in L(\mathbb{Z}_p) and x\in \mathbb{Z}_p,

\widehat{f}(x)=\frac{1}{p}\sum_{y\in\mathbb{Z}_p}f(y)\omega^{-xy}, \qquad \omega=e^{\frac{2\pi i}{p}}.
Proposition 2:
Let p be a prime and let A,B\subset\mathbb{Z}_p be such that |A|=|B|. Then the linear map T=T_{A,B}:L(A)\to L(B) defined by Tf=\widehat{\overline{f}}|_B is invertible.


Proof: Set A=\{\xi_1,\cdots,\xi_n\} and B=\{\eta_1,\cdots,\eta_n\}, consider the basis of L(A) (respectively, of L(B)) consisting of the Dirac functions \delta_{\xi_j} with j=1,\cdots,n (respectively, \delta_{\eta_k} with k=1,\cdots,n), and let \omega=e^{\frac{2\pi i}{p}}. Then we have

[T\delta_{\xi_k}](\eta_h)=\widehat{\overline{\delta_{\xi_k}}}(\eta_h)=\frac{1}{p}\omega^{-\eta_h\xi_k}.

By virtue of Theorem 3, we have

\mathrm{det}\left([T\delta_{\xi_k}](\eta_h)\right)_{h,k=1}^n\neq 0,

showing that T is indeed invertible.

Finally, we can prove the main result, Theorem 2.

Proof of Theorem 2:
Suppose by contradiction that there exists f\in L(\mathbb{Z}_p) non-zero such that \mathrm{supp}(f)=A and \mathrm{supp}(\hat{f})=C with |A|+|C|\ge p+1. Then we can find a subset B\subset \mathbb{Z}_p such that |B|=|A| and C\cap B=\emptyset. We deduce that Tf=\widehat{\overline{f}}|_B is identically zero. Since f\not\equiv 0, this contradicts the injectivity of T (cf. Proposition 2).

Conversely, let \emptyset\neq A,A'\subset\mathbb{Z}_p be two subsets such that |A|+|A'|=p+1. Let B\subset\mathbb{Z}_p be such that |B|=|A| and B\cap A'=\{\xi\}. Note that (\mathbb{Z}_p\backslash B)\cup\{\xi\}\supset A'. Taking cardinalities, |A'|=p+1-|A|=p+1-|B|=|(\mathbb{Z}_p\backslash B)\cup\{\xi\}|\ge |A'|, which yields

(\mathbb{Z}_p\backslash B)\cup\{\xi\}=A'.

Consider the map T=T_{A,B}:L(A)\to L(B). By Proposition 2, we can find g\in L(A) such that Tg=\delta_\xi|_B, so that \widehat{\bar{g}} vanishes on B\backslash\{\xi\} but \widehat{\bar{g}}(\xi)\neq 0. Setting f=\bar{g}\in L(\mathbb{Z}_p), we clearly have \mathrm{supp}(f)\subset A and \mathrm{supp}(\hat{f})\subset (\mathbb{Z}_p\backslash B)\cup\{\xi\}. By the first part of the theorem,

p+1\le |\mathrm{supp}(f)|+|\mathrm{supp}(\hat{f})|\le |A|+|\mathbb{Z}_p\backslash B|+1 =|A|+p-|B|+1=p+1

so that all the inequalities are in fact equalities, i.e., \mathrm{supp}(f)=A and \mathrm{supp}(\hat{f})=(\mathbb{Z}_p\backslash B)\cup\{\xi\}=A'. This completes the second part of the theorem.

An immediate consequence of Theorem 2 is that any sparse polynomial \sum_{j=0}^k c_jz^{n_j} with k+1 non-zero coefficients and 0<n_0<\cdots<n_k<p, when restricted to the p-th roots of unity \{z:z^p=1\}, can have at most k zeroes. Indeed, such a polynomial is essentially the Fourier transform in \mathbb{Z}_p of a function whose support has cardinality k+1, so by Theorem 2 the polynomial must be non-zero at no fewer than p-k of the p-th roots of unity, and the claim follows.
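The uncertainty principle of Theorem 2 is easy to check numerically for small p. The sketch below is my own illustration (not part of the proof): it computes the DFT over \mathbb{Z}_p directly and verifies |\mathrm{supp}(f)|+|\mathrm{supp}(\hat{f})|\ge p+1 for two test functions, including the extreme case of a delta function.

```python
import cmath

def supports(f, p):
    # DFT over Z_p: fhat(y) = sum_x f(x) e^{2 pi i x y / p}
    fhat = [sum(f[x] * cmath.exp(2j * cmath.pi * x * y / p) for x in range(p))
            for y in range(p)]
    supp_f = sum(1 for v in f if abs(v) > 1e-9)
    supp_fhat = sum(1 for v in fhat if abs(v) > 1e-9)
    return supp_f, supp_fhat

p = 7
delta = [0.0] * p
delta[3] = 1.0                      # a delta function: supports 1 and 7
a, b = supports(delta, p)
assert a + b == p + 1               # the extreme case p + 1 = 8

two_points = [1.0, 1.0] + [0.0] * (p - 2)
a, b = supports(two_points, p)
assert a + b >= p + 1               # 2 + 7 = 9 here
```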


Math 202C: Lecture 19

Author: Qihao Ye

Andrew Dennehy has already introduced the concept of the discrete Fourier transform (DFT) to us in Lecture 18, but I would like to retrace that path in more detail, because there are some other concepts (which help with a full understanding) I would like to talk about.

We mainly talk about how the fast Fourier transform (FFT) and the inverse fast Fourier transform (IFFT) work, and we briefly discuss the calculation error.

We will see how the FFT and IFFT work from both the mathematical and the computational perspective, along with their applications and some specific problems. Some problems may involve the number theoretic transform (NTT), which can be regarded as an integer analogue of the DFT over a finite field.

We use Python 3 for code examples; since we do not use any version-specific features, the exact version does not matter (for instance, Python 3.5.2 and Python 3.8.1 are both fine). We will simply say Python instead of Python 3 in what follows. (If you do not have Python installed, there are many online interpreters that can run Python code, for instance, Try It Online.) We recommend viewing this page on a computer for the best experience, and trying the code and solving some problems yourself, which should be both challenging and interesting.

There are only 2 exercises in this lecture; the other problems are optional algorithm problems for the interested reader (no need to do them as homework). They might be difficult to solve if you are not very familiar with the FFT and NTT in this form, but I will give hints to guide you.

Now let us start with an easy example.

Example 1: We consider the two polynomials

P_{1}(x) = 1 + 2 x + 3 x^{2} + 4 x^{3}, \qquad P_{2}(x) = 2 + 3 x + 4 x^{2} + 5 x^{3}.

Multiplying them together (using the distributive property), we get

Q(x) = P_{1}(x) \cdot P_{2}(x) = 2 + 7 x + 16 x^{2} + 30 x^{3} + 34 x^{4} + 31 x^{5} + 20 x^{6}.

Definition 1 (coefficient representation): For a polynomial P(x) = a_{0} + a_{1} x + \cdots + a_{n} x^{n}, the list [a_{0}, a_{1}, \ldots, a_{n}] is its coefficient representation. We denote P[k] as the coefficient of x^{k} in P(x).

Using Definition 1, we can write P_{1}(x) as [1, 2, 3, 4], P_{2}(x) as [2, 3, 4, 5] and P_{1}(x) \cdot P_{2}(x) = Q(x) as [2, 7, 16, 30, 34, 31, 20].

The naïve polynomial multiplication function is not hard to get:

def naive_poly_mul(P1, P2):
    # multiply two coefficient representations in O(len(P1) * len(P2)) time
    Q = [0] * (len(P1) + len(P2) - 1)
    for i in range(len(P1)):
        for j in range(len(P2)):
            Q[i + j] += P1[i] * P2[j]
    return Q

In the general case, i.e.,

P_{1}(x) = a_{0} + a_{1} x + \cdots + a_{n} x^{n}, \qquad P_{2}(x) = b_{0} + b_{1} x + \cdots + b_{m} x^{m},

it is easy to see that the complexity of the naïve polynomial multiplication function is \mathcal{O}(m n). Note that \text{len}(P_{1}) = n + 1 (counting from 0 to n). If we consider the special case m = n, then the complexity becomes \mathcal{O}(n^{2}).

Definition 2 (degree of polynomial): The degree of a polynomial is the highest of the degrees of the polynomial’s monomials (individual terms) with non-zero coefficients.

In Example 1, P_{1} and P_{2} are both polynomials of degree 3.

Definition 3 (value representation): Besides representing an n-th degree polynomial with its n + 1 coefficients, we can also represent an n-th degree polynomial by proper (see Exercise 1 below) n + 1 points on the polynomial, which is called the value representation.

Exercise 1: Prove that n + 1 points \{ (x_{i}, y_{i}) \}_{i = 1}^{n + 1} with distinct \{ x_{i} \} determine a unique polynomial of degree at most n. Hint: use, without proof, the fact that a Vandermonde matrix with distinct nodes is nonsingular.

Back to Example 1: to get Q, we can first compute \{ (x_{i}, P_{1}(x_{i}), P_{2}(x_{i})) \}_{i = 1}^{7} from P_{1}, P_{2} with distinct \{ x_{i} \}; then the 7 points \{ (x_{i}, P_{1}(x_{i}) P_{2}(x_{i})) \}_{i = 1}^{7} represent Q.

When the degrees of P_{1} and P_{2} are both n, we can see that the multiplication step only needs complexity \mathcal{O}(n).
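To make the value-representation idea concrete, here is a small sketch of the whole round trip for Example 1, using exact rational arithmetic (the helper names `evaluate` and `interpolate` are my own): evaluate P_{1} and P_{2} at 7 distinct points, multiply pointwise, and interpolate back to recover the coefficients of Q.

```python
from fractions import Fraction

def evaluate(coeffs, x):
    # Horner's rule
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

def interpolate(points):
    # Lagrange interpolation, returning the coefficient representation
    n = len(points)
    coeffs = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        # build the basis polynomial L_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)
        basis = [Fraction(1)]
        denom = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            basis = [Fraction(0)] + basis      # multiply by x
            for k in range(len(basis) - 1):
                basis[k] -= xj * basis[k + 1]  # subtract x_j times the old basis
            denom *= xi - xj
        for k in range(n):
            coeffs[k] += yi * basis[k] / denom
    return coeffs

P1, P2 = [1, 2, 3, 4], [2, 3, 4, 5]
xs = [Fraction(i) for i in range(7)]   # 7 distinct points for a degree-6 product
points = [(x, evaluate(P1, x) * evaluate(P2, x)) for x in xs]
Q = interpolate(points)
assert Q == [2, 7, 16, 30, 34, 31, 20]
```

Of course, this naïve interpolation is itself \mathcal{O}(n^{2}); the point of the rest of the lecture is to do both conversions faster.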

However, most of the time we eventually want the coefficient representation, because it is more suitable for evaluating the polynomial anywhere in its domain. It is not hard to figure out that we can do the multiplication in this way: convert both coefficient representations into value representations, multiply pointwise, then convert back.

If we only consider the situation that we aim to multiply two n-th degree polynomials, the multiplication part only costs \mathcal{O}(n), so the bottleneck of the complexity lies in the algorithm changing the coefficient representation into the value representation and the algorithm changing it back.

The naïve way to get the value representation of a coefficient-represented polynomial costs at least \mathcal{O}(n^{2}). (To get each value we need \mathcal{O}(n), and we need \mathcal{O}(n) values in total.)

The essential idea is selecting specific points to reduce the calculation cost.

A natural first thought is to look at the parity of a function: for any odd function P_{\text{odd}}, we have P_{\text{odd}}(-x) = - P_{\text{odd}}(x), while for any even function P_{\text{even}}, we have P_{\text{even}}(-x) = P_{\text{even}}(x).

Actually, we can divide any polynomial into one odd function plus one even function. Taking another look at Q(x) from Example 1, we can write

Q(x) = (2 + 16 x^{2} + 34 x^{4} + 20 x^{6}) + (7 x + 30 x^{3} + 31 x^{5}) = Q_{\text{even}}(x^{2}) + x \, Q_{\text{odd}}(x^{2}),

where Q_{\text{even}}(t) = 2 + 16 t + 34 t^{2} + 20 t^{3} and Q_{\text{odd}}(t) = 7 + 30 t + 31 t^{2}. Notice that Q_{\text{odd}}(x^{2}) is actually an even function of x, but writing Q in this form allows us to take x^{2} as the variable. It follows that

Q(x) = Q_{\text{even}}(x^{2}) + x \, Q_{\text{odd}}(x^{2}), \qquad Q(-x) = Q_{\text{even}}(x^{2}) - x \, Q_{\text{odd}}(x^{2}).

Note that Q_{\text{even}} has degree 3 and Q_{\text{odd}} has degree 2, while Q has degree 6. Once we evaluate Q_{\text{even}}(x^{2}) and Q_{\text{odd}}(x^{2}), we immediately get both Q(x) and Q(-x).
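The split is easy to verify numerically. In the sketch below (my own check), Q_even and Q_odd are read off from the even- and odd-index coefficients of Q in Example 1:

```python
def evaluate(coeffs, x):
    # Horner's rule
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

Q = [2, 7, 16, 30, 34, 31, 20]
Q_even, Q_odd = Q[::2], Q[1::2]   # [2, 16, 34, 20] and [7, 30, 31]

for x in range(-5, 6):
    assert evaluate(Q, x) == evaluate(Q_even, x * x) + x * evaluate(Q_odd, x * x)
    assert evaluate(Q, -x) == evaluate(Q_even, x * x) - x * evaluate(Q_odd, x * x)
```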

All we need to do now is process Q_{\text{even}} and Q_{\text{odd}} recursively, which leads to an \mathcal{O}(n \log n) recursive algorithm.

However, we would encounter the problem that the symmetry cannot be maintained further: we can pair x and -x, but how do we pair the values x^{2}?

As we already saw in the previous lecture, we can use the roots of unity to solve this problem. Denote \omega_{n} = \exp(2 \pi i / n), so that \omega_{n}^{n} = \omega_{n}^{0} = 1.

To make it easier to understand, we now only consider n = 2^{k} for some positive integer k. We evaluate the polynomial at [\omega_{n}^{0}, \omega_{n}^{1}, \ldots, \omega_{n}^{n - 1}], then at [\omega_{n}^{0}, \omega_{n}^{2}, \ldots, \omega_{n}^{n - 2}], next at [\omega_{n}^{0}, \omega_{n}^{4}, \ldots, \omega_{n}^{n - 4}], etc., just as in the figure below.

Diagram 1

We pair each \omega_{n}^{j} with -\omega_{n}^{j} = \omega_{n}^{j + n / 2}. For any polynomial P, we can get [P(\omega_{n}^{0}), P(\omega_{n}^{1}), \ldots, P(\omega_{n}^{n - 1})] fast by evaluating [P_{\text{even}}(\omega_{n}^{0}), P_{\text{even}}(\omega_{n}^{2}), \ldots, P_{\text{even}}(\omega_{n}^{n - 2})] and [P_{\text{odd}}(\omega_{n}^{0}), P_{\text{odd}}(\omega_{n}^{2}), \ldots, P_{\text{odd}}(\omega_{n}^{n - 2})] recursively. Recall that we divide P(x) as P_{\text{even}}(x^{2}) + x P_{\text{odd}}(x^{2}). The corresponding formula is

P(\omega_{n}^{j}) = P_{\text{even}}(\omega_{n}^{2 j}) + \omega_{n}^{j} P_{\text{odd}}(\omega_{n}^{2 j}), \qquad P(\omega_{n}^{j + n / 2}) = P_{\text{even}}(\omega_{n}^{2 j}) - \omega_{n}^{j} P_{\text{odd}}(\omega_{n}^{2 j}), \qquad 0 \leq j < n / 2.

This leads to the naive FFT code:

from math import pi
from math import sin
from math import cos

def get_omega(n):
    theta = 2 * pi / n
    omega = cos(theta) + sin(theta) * 1j
    return omega

def naive_FFT(P):
    # P is the coefficient representation
    # P = [a_{0}, a_{1}, ..., a_{n-1}]
    # currently we assume n = 2^{k}
    n = len(P)
    half_n = n // 2
    if n == 1:
        return P # constant function
    omega = get_omega(n)
    P_even, P_odd = P[::2], P[1::2]
    V_even, V_odd = naive_FFT(P_even), naive_FFT(P_odd)
    V = [0] * n # the value representation
    for j in range(half_n):
        V[j] = V_even[j] + omega ** j * V_odd[j]
        V[j + half_n] = V_even[j] - omega ** j * V_odd[j]
    return V

Feeding P_{1} from Example 1 as input, we would get approximately [(10+0j), (-2-2j), (-2+0j), (-2+2j)], i.e., the values [P_{1}(1), P_{1}(i), P_{1}(-1), P_{1}(-i)] up to floating-point error.

Now we can apply Fourier transform to a coefficient representation to get the corresponding value representation, and the multiplication in the value representation form is easy to implement, what remains to solve is the inverse Fourier transform.

In the matrix-vector form, for P(x) = a_{0} + a_{1} x + \cdots + a_{n - 1} x^{n - 1}, we have

\begin{pmatrix} P(\omega_{n}^{0}) \\ P(\omega_{n}^{1}) \\ \vdots \\ P(\omega_{n}^{n - 1}) \end{pmatrix} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & \omega_{n} & \cdots & \omega_{n}^{n - 1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \omega_{n}^{n - 1} & \cdots & \omega_{n}^{(n - 1)^{2}} \end{pmatrix} \begin{pmatrix} a_{0} \\ a_{1} \\ \vdots \\ a_{n - 1} \end{pmatrix}.

Note that the character table of C_{n} has exactly the form of the DFT matrix:

\left( \omega_{n}^{j k} \right)_{j, k = 0}^{n - 1}.

To get the inverse Fourier transform, we can just use the inverse of the matrix:

\left[ \left( \omega_{n}^{j k} \right)_{j, k = 0}^{n - 1} \right]^{-1} = \frac{1}{n} \left( \omega_{n}^{- j k} \right)_{j, k = 0}^{n - 1}.

Exercise 2: Check that the inverse DFT matrix is the inverse of the DFT matrix.

The difference between DFT and FFT is just the way of calculating these matrix-vector products: the DFT uses direct matrix-vector multiplication in \mathcal{O}(n^{2}), while the FFT uses the tricky recursive approach to achieve \mathcal{O}(n \log n).
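For comparison, the direct \mathcal{O}(n^{2}) matrix-vector version is only a few lines. The sketch below (the function name is my own) uses the same sign convention as naive_FFT above and reproduces its output on P_{1}:

```python
import cmath

def naive_DFT(P):
    # direct O(n^2) evaluation of the DFT matrix-vector product
    n = len(P)
    omega = cmath.exp(2j * cmath.pi / n)
    return [sum(P[k] * omega ** (j * k) for k in range(n)) for j in range(n)]

V = naive_DFT([1, 2, 3, 4])
expected = [10, -2 - 2j, -2, -2 + 2j]   # P(1), P(i), P(-1), P(-i)
assert all(abs(a - b) < 1e-9 for a, b in zip(V, expected))
```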

The inverse DFT matrix leads to the naive IFFT code:

def naive_IFFT(V, is_outest_layer=False):
    # V is the value representation
    # w means omega_{n}
    # V = [P(w^{0}), P(w^{1}), ..., P(w^{n-1})]
    # currently we assume n = 2^{k}
    n = len(V)
    half_n = n // 2
    if n == 1:
        return V # constant function
    omega = 1.0 / get_omega(n) # omega_{n}^{-1}
    V_even, V_odd = V[::2], V[1::2]
    P_even, P_odd = naive_IFFT(V_even), naive_IFFT(V_odd)
    P = [0] * n # the coefficient representation
    for j in range(half_n):
        P[j] = P_even[j] + omega ** j * P_odd[j]
        P[j + half_n] = P_even[j] - omega ** j * P_odd[j]
    if is_outest_layer:
        for j in range(n):
            P[j] /= n
    return P

Feeding the FFT output for P_{1} from Example 1 into naive_IFFT (with is_outest_layer=True), we would get approximately [(1+0j), (2+0j), (3+0j), (4+0j)].

If we ignore the tiny numerical error, this is just the coefficient representation of P_{1}.
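Putting the pieces together, here is a compact recursive sketch of the whole pipeline (FFT, pointwise multiplication, IFFT, rounding), with my own function names and the same sign convention as the code above; it recovers Q from Example 1 exactly after rounding:

```python
import cmath

def fft(P, invert=False):
    # recursive radix-2 FFT; invert=True uses omega^{-1} (the 1/n scaling is done by the caller)
    n = len(P)
    if n == 1:
        return P[:]
    omega = cmath.exp((-2j if invert else 2j) * cmath.pi / n)
    even, odd = fft(P[::2], invert), fft(P[1::2], invert)
    V = [0] * n
    for j in range(n // 2):
        t = omega ** j * odd[j]
        V[j], V[j + n // 2] = even[j] + t, even[j] - t
    return V

def poly_mul_fft(a, b):
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2                          # pad to a power of two
    fa = fft(a + [0] * (size - len(a)))
    fb = fft(b + [0] * (size - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [round((c / size).real) for c in prod[:len(a) + len(b) - 1]]

assert poly_mul_fft([1, 2, 3, 4], [2, 3, 4, 5]) == [2, 7, 16, 30, 34, 31, 20]
```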

The following material might not be as clear or easy to understand, but I will try my best. (Some of it cannot be expanded too much; otherwise it would take too much space and might obscure the main part.)

The lowest layer of Diagram 1 is just the bit-reversal permutation of the indices, and there is a neat piece of code to generate it:

def get_BRI(length):
    # Bit-Reversal Index
    n = 1
    k = -1
    while n < length:
        n <<= 1
        k += 1
    BRI = [0] * n
    for i in range(n):
        BRI[i] = (BRI[i >> 1] >> 1) | ((i & 1) << k)
    return BRI

It is easier to see in a table (an example of the 8-length BRI):

index:    0    1    2    3    4    5    6    7
binary:   000  001  010  011  100  101  110  111
reversed: 000  100  010  110  001  101  011  111
BRI:      0    4    2    6    1    5    3    7
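A quick self-contained cross-check of the 8-length case: reversing the 3-bit binary representation of each index reproduces the BRI values.

```python
# reverse the 3-bit binary representation of each index 0..7
brev = [int(format(i, '03b')[::-1], 2) for i in range(8)]
assert brev == [0, 4, 2, 6, 1, 5, 3, 7]
```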

Using this we can implement the Cooley–Tukey FFT algorithm, which is the most common FFT algorithm. Further, with careful coding, we can devise an in-place algorithm that overwrites its input with its output using only \mathcal{O}(1) auxiliary storage, which is called the iterative radix-2 FFT algorithm. Moreover, since the forms of the FFT and IFFT are very similar, we can integrate them together.

def FFT(X, length, is_inverse=False):
    # X : input, either coefficient representation
    #            or value representation
    # length : how many values to evaluate
    # is_inverse : indicate whether is FFT or IFFT
    inverse_mul = [1, -1][is_inverse]
    BRI = get_BRI(length)
    n = len(BRI)
    X += [0] * (n - len(X))
    for index in range(n):
        if index < BRI[index]: # only change once
            X[index], X[BRI[index]] = X[BRI[index]], X[index]
    bits = 1
    while bits < n:
        omega_base = cos(pi/bits) + inverse_mul * sin(pi/bits) * 1j
        j = 0
        while j < n:
            omega = 1
            for k in range(bits):
                even_part = X[j + k]
                odd_part = X[j + k + bits] * omega
                X[j + k] = even_part + odd_part
                X[j + k + bits] = even_part - odd_part
                omega *= omega_base
            j += bits << 1
        bits <<= 1
    if is_inverse:
        for index in range(length):
            X[index] = X[index].real / n
            # only the real part is needed
    return X[:length]

Note that we could ignore the returned value, since X is already modified in place. This algorithm extends the input length to the closest larger power of two (of the form 2^{k}), but in most cases we would pad the length to 2^{k} (by appending 0's) before calling it.

Because we use complex numbers to implement the FFT algorithm, the error is hard to eliminate: even if the initial polynomial has integer coefficients, applying FFT and then IFFT gives a list of decimals with some calculation error.

If we do the FFT in the field \mathbb{Z} / P \mathbb{Z}, where P = Q \cdot 2^{k} + 1 is prime, and denote by g a primitive root modulo P, then \{ 1, g^{Q}, g^{2 Q}, \ldots \} is a cyclic group of order 2^{k}, and we can replace \omega_{n} with g^{Q} to do the FFT. This method is called the NTT; since only integers are involved, no rounding errors can appear.
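A minimal recursive NTT sketch, mirroring naive_FFT above (the function names are my own), over the common NTT prime P = 998244353 = 119 \cdot 2^{23} + 1 with primitive root g = 3: multiplying the polynomials of Example 1 this way gives the exact integer result, with no rounding error.

```python
def ntt(P, p, g, invert=False):
    # recursive radix-2 NTT over Z/pZ; len(P) must divide p - 1
    n = len(P)
    if n == 1:
        return P[:]
    omega = pow(g, (p - 1) // n, p)      # a primitive n-th root of unity mod p
    if invert:
        omega = pow(omega, p - 2, p)     # modular inverse via Fermat's little theorem
    even, odd = ntt(P[::2], p, g, invert), ntt(P[1::2], p, g, invert)
    V, w = [0] * n, 1
    for j in range(n // 2):
        t = w * odd[j] % p
        V[j], V[j + n // 2] = (even[j] + t) % p, (even[j] - t) % p
        w = w * omega % p
    return V

p, g = 998244353, 3                      # p = 119 * 2^23 + 1, primitive root 3
a = [1, 2, 3, 4] + [0] * 4               # pad to length 8 >= deg(Q) + 1
b = [2, 3, 4, 5] + [0] * 4
fa, fb = ntt(a, p, g), ntt(b, p, g)
c = ntt([x * y % p for x, y in zip(fa, fb)], p, g, invert=True)
inv_n = pow(8, p - 2, p)                 # divide by n at the end
c = [v * inv_n % p for v in c]
assert c[:7] == [2, 7, 16, 30, 34, 31, 20]
```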

For an arbitrary modulus m and a target NTT length n, we can take a set of distinct NTT moduli \{ p_{i} \}_{i = 1}^{r} (each of the form Q \cdot 2^{k} + 1) satisfying

p_{1} p_{2} \cdots p_{r} > n (m - 1)^{2},

do the NTT modulo each p_{i} respectively, then use the Chinese remainder theorem to combine the results into the final result modulo m.

Note that during the NTT algorithm, the maximum intermediate value would not exceed n (m - 1)^{2}.

We may say the FFT algorithm computes the convolution

c_{k} = \sum_{i + j = k} a_{i} b_{j}

in time \mathcal{O}(n \log n).

Back in Example 1, we have [1, 2, 3, 4] * [2, 3, 4, 5] = [2, 7, 16, 30, 34, 31, 20], where * denotes the convolution of the coefficient lists.

(Not mandatory) Problem 1: Given the formula for the Stirling numbers of the second kind

S(n, k) = \frac{1}{k!} \sum_{j = 0}^{k} (-1)^{j} \binom{k}{j} (k - j)^{n},
Use the NTT with some modulo of the form P = Q \cdot 2^{k} + 1 to calculate all S(n, k) for 0 \leq k \leq n in time complexity \mathcal{O}(n \log n).
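Before reaching for the NTT, it is worth sanity-checking the explicit formula S(n,k) = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{j} \binom{k}{j} (k-j)^{n} against the standard recurrence S(n,k) = k\,S(n-1,k) + S(n-1,k-1). The sketch below (my own check, not the \mathcal{O}(n \log n) algorithm the problem asks for) does exactly that:

```python
from math import factorial

def binom(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))

def stirling2(n, k):
    # explicit formula: S(n, k) = (1/k!) * sum_j (-1)^j C(k, j) (k - j)^n
    return sum((-1) ** j * binom(k, j) * (k - j) ** n for j in range(k + 1)) // factorial(k)

# cross-check against the recurrence S(n, k) = k*S(n-1, k) + S(n-1, k-1)
N = 9
table = [[0] * (N + 1) for _ in range(N + 1)]
table[0][0] = 1
for n in range(1, N + 1):
    for k in range(1, n + 1):
        table[n][k] = k * table[n - 1][k] + table[n - 1][k - 1]
for n in range(N + 1):
    for k in range(n + 1):
        assert stirling2(n, k) == table[n][k]
```

The inner sum is a convolution of the sequences (-1)^{j} / j! and j^{n} / j!, which is what makes the NTT approach work.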

(Not mandatory) Problem 2: PE 537. Hint: If we denote by A the list of values of \pi(x), i.e., A[n] = \sum_{x} [\pi(x) = n], then the convolution of A with itself, named B, is the list of values of \pi(x) + \pi(y), i.e., B[n] = \sum_{x, y} [\pi(x) + \pi(y) = n]; the convolution of B with itself, named C, is the list of values of \pi(x) + \pi(y) + \pi(z) + \pi(w), i.e., C[n] = \sum_{x, y, z, w} [\pi(x) + \pi(y) + \pi(z) + \pi(w) = n]; etc. You can consider A, B, C as generating functions. You might need to learn how to sieve prime numbers and use fast exponentiation.

There are a bunch of similar algorithms, for example, the fast Walsh–Hadamard transform (FWHT) and the fast wavelet transform (FWT). The FWHT can compute the general convolution

c_{k} = \sum_{i \star j = k} a_{i} b_{j}

in time complexity \mathcal{O}(n \log n), where \star is some binary operation, usually bitwise OR, bitwise AND, or bitwise XOR.

Using FFT, we can do a lot of things with polynomials quickly. For instance, for a polynomial P(x), we can find a polynomial Q(x) such that P(x) Q(x) \equiv 1 \mod{x^{n}}; this Q(x) is called the inverse of P(x) modulo x^{n}. The basic idea is to first find the inverse polynomial modulo x^{n / 2}, and before that modulo x^{n / 4}, etc. Since the inverse polynomial modulo x^{1} is trivial, we can solve this recursively. A similar idea applies to Newton's method modulo x^{n}; specifically, we can find the square root of a polynomial modulo x^{n}.
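The doubling step is Q \mapsto Q\,(2 - P Q) \bmod x^{2k}: if P Q \equiv 1 \bmod x^{k}, the new Q works modulo x^{2k}. Here is a sketch with exact rational arithmetic (naïve truncated multiplication for clarity; replacing it with FFT multiplication gives the fast version; the helper names are my own):

```python
from fractions import Fraction

def mul_trunc(a, b, n):
    # multiply two polynomials and keep only the terms below x^n
    res = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[:n - i]):
            res[i + j] += ai * bj
    return res

def poly_inv(P, n):
    # Newton iteration: if P*Q == 1 mod x^k, then Q*(2 - P*Q) works mod x^{2k}
    assert P[0] != 0
    Q = [Fraction(1) / P[0]]
    k = 1
    while k < n:
        k *= 2
        T = [-c for c in mul_trunc(P, Q, k)]
        T[0] += 2
        Q = mul_trunc(Q, T, k)
    return Q[:n]

P = [Fraction(1), Fraction(1)]           # P(x) = 1 + x
Q = poly_inv(P, 4)
assert Q == [1, -1, 1, -1]               # 1/(1+x) = 1 - x + x^2 - x^3 + ...
assert mul_trunc(P, Q, 4) == [1, 0, 0, 0]
```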

(Not mandatory) Problem 3: PE 258. Hint: Consider the Cayley–Hamilton theorem (x^{2000} - x - 1 = 0) and use the polynomial inverse to do polynomial quotient on x^{10^{18}}. Or consider solving the homogeneous linear recurrence with constant coefficients by the Berlekamp–Massey algorithm, which involves polynomial multiplication. Note that 20092010 = 8590 \times 2339.

In signal processing, the FFT is also used to transform the time domain to the frequency domain, while the IFFT transforms back.

In the article [Machine Learning from a Continuous Viewpoint I] by Weinan E, the Fourier representation

f(\boldsymbol{x}) = \int_{\mathbb{R}^{d}} a(\boldsymbol{\omega}) \sigma(\boldsymbol{\omega}, \boldsymbol{x}) \, d \boldsymbol{\omega}
can be considered as a two-layer neural network model with activation function \sigma.

In the above figure, \sigma(\boldsymbol{\omega}, \boldsymbol{x}) computes each hidden-layer value from the input layer \boldsymbol{x} with weight \boldsymbol{\omega}, and \int_{\mathbb{R}^{d}} a(\boldsymbol{\omega}) \sigma(\boldsymbol{\omega}, \boldsymbol{x}) d \boldsymbol{\omega} sums all the hidden-layer values with weight a(\boldsymbol{\omega}). Note that this is an integral formula: we work in a continuous setting, which means the hidden layer has infinite width (the hidden layer is considered to have infinitely many nodes).


Math 202C: Lecture 18

Written by a countably infinite number of monkeys with an uncountably infinite number of typewriters, edited by Andrew Dennehy

In a bold twist on the standard formula, in this lecture post we will be discussing math. Specifically, we will be discussing the Cohn–Uman algorithm and how representation theory can provide us with an algorithm for multiplying two matrices together. We will then see how this perspective could potentially help us prove an important open problem in complexity theory, namely that there exists an algorithm which computes the product of two n\times n matrices in O(n^{2+\epsilon}) time for any \epsilon>0. Before we get to that, though, we will first cover an algorithm with many similarities to the one we will be discussing.

Consider the problem of multiplying two polynomials

p(x)=p_0+p_1x+\cdots +p_{n-1}x^{n-1}

q(x)=q_0+q_1x+\cdots +q_{n-1}x^{n-1}

The naïve approach would be the one that's taught in high school, namely finding all the p_iq_j terms and using them to calculate the coefficients of p(x)q(x). However, we're significantly cooler than high school students, so we're going to use some significantly cooler math. First, we introduce the concept of the discrete Fourier transform. Let \omega_n=e^{i \frac{2\pi}{n}} be a primitive n-th root of unity. The DFT of a vector v is given by

\hat{v} := \begin{bmatrix}1 & 1 & 1 & \cdots & 1 \\  1 & \omega_n &  \omega_n^2 & \cdots &  \omega_n^{n-1} \\ \vdots & \vdots  & \vdots  & \ddots  & \vdots \\  1 & \omega_n^{n-1} &  \omega_n^{2(n-1)} & \cdots &  \omega_n^{(n-1)^2}  \end{bmatrix} v

Equivalently, in terms of the components of v,

\hat{v}_k := v_0+v_1  \omega_n^{k} + \cdots + v_{n-1}\omega_n^{(n-1)k}

So, if we apply this transformation to the vector of coefficients of p, we get that

\hat{p}_k= p_0+p_1  \omega_n^{k} + \cdots + p_{n-1}\omega_n^{(n-1)k} =p(\omega_n^k)

So, the problem of finding the Fourier coefficients of p is equivalent to the problem of evaluating p at 1,\omega_n,\omega_n^2,...,\omega_n^{n-1}. From now on, assume n is a power of 2, since if it isn’t we can just add a bunch of 0 coefficients until we have a power of 2. Define the following two polynomials:

p_0(x)=p_0+p_2x + \cdots + p_{n-2} x^{n/2-1}

p_1(x)=p_1+p_3x + \cdots + p_{n-1} x^{n/2-1}

This lets us decompose p as

p(x) = p_0(x^2) + x\, p_1(x^2).

So, instead of evaluating p at all those powers of \omega_n, we can instead evaluate p_0 and p_1 at 1,\omega_n^2, \ldots, \omega_n^{2(n-1)}, which are just the n/2-th roots of unity (each appearing twice), and then combine the results. So, we have reduced the problem of plugging n numbers into a degree-n polynomial to plugging n/2 numbers into two degree-n/2 polynomials. Because we've assumed that n is a power of 2, so is n/2, meaning we can repeat the same algorithm for p_0 and p_1. Applying this process repeatedly gives us a recursive algorithm with complexity O(n \log n) for calculating the coefficients. Similarly, one can show that we can invert the DFT by applying the same process to the polynomial

\hat p (x)=\hat{p}_0+\hat{p}_1 x + \cdots + \hat{p}_{n-1} x^{n-1}

with \omega_n replaced with \omega_n^{-1}=\omega_n^{n-1} and then dividing the result by n. The upshot of all of this work is that

\widehat{p q}_k = [pq](\omega_n^k) = p(\omega_n^k) q(\omega_n^k)=\hat p_k \hat q_k

So, we can calculate the Fourier coefficients of pq by calculating the coefficients of each polynomial independently and multiplying them together. Since this has linear complexity and both the Fourier and Inverse Fourier transforms have n\log n complexity, this method gives us an O(n\log n) complexity algorithm to multiply two degree n polynomials.

Notice that instead of using the roots of unity, we could have instead used a generator for C_n and worked over \mathbb C [C_n] with identical results. So, we can frame multiplication of polynomials as a special case of multiplying elements in \mathbb C [C_n].

Next, we take a slight detour to discuss some group theoretic concepts. For any subset S of a group G (not required to be a subgroup), let the right quotient set Q(S) of S be defined to be

Q(S)=\left\{s_1s_2^{-1}:s_1,s_2\in S\right\}

Exercise 1: Assume that S\neq \emptyset and Q(S)=S. What does this imply about S? Show that the condition that this implies is actually equivalent to Q(S)=S, so the implication goes both ways.

Definition 1: A group G realizes \langle n_1,n_2,n_3 \rangle if there are subsets S_1, S_2, S_3\subseteq G such that |S_i|=n_i and for q_i\in Q(S_i), if

q_1q_2q_3=1,

then q_1=q_2=q_3=1. This condition is called the triple product property. Furthermore, this is equivalent to the condition that if q_1q_2=q_3, then q_1=q_2=q_3=1.

Although it isn’t entirely trivial, one can show that if G realizes \langle n_1,n_2,n_3 \rangle, then it realizes any permutation of those numbers.

Exercise 2: Prove that C_n\times C_m\times C_p realizes \langle n,m,p \rangle.
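Exercise 2 can at least be sanity-checked by brute force for small n, m, p. The sketch below (a numerical check under additive notation, not a proof) verifies the triple product property for C_2\times C_3\times C_5 with the obvious choice of subsets:

```python
from itertools import product

mods = (2, 3, 5)                      # check C_2 x C_3 x C_5 realizing <2, 3, 5>
S1 = [(i, 0, 0) for i in range(mods[0])]
S2 = [(0, j, 0) for j in range(mods[1])]
S3 = [(0, 0, k) for k in range(mods[2])]

def quotient(S):
    # Q(S) = {s1 - s2} in additive notation
    return {tuple((a - b) % M for a, b, M in zip(s1, s2, mods)) for s1 in S for s2 in S}

identity = (0, 0, 0)
for q1, q2, q3 in product(quotient(S1), quotient(S2), quotient(S3)):
    total = tuple((a + b + c) % M for a, b, c, M in zip(q1, q2, q3, mods))
    if total == identity:
        # triple product property: the only solution is the trivial one
        assert q1 == q2 == q3 == identity
```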

This is the “trivial realization” of \langle n,m,p \rangle. We now get our first taste of why these ideas could be useful for the problem of matrix multiplication:

Theorem 1: Let F be any field. If G realizes \langle n,m,p \rangle, then the number of field operations required to multiply an n\times m matrix by an m\times p matrix over F is at most the number of operations required to multiply two elements of F[G]

Proof: Let G realize \langle n,m,p \rangle through the subsets S,T,U. Then, for matrices A and B of sizes n\times m and m\times p respectively, we can index the rows of A by S, the columns of A and the rows of B by T, and the columns of B by U. Then, consider the product

\left(\sum\limits_{s\in S,t\in T} A_{st}s ^{-1}  t\right) \left(\sum \limits _{t'\in T, u\in U} B_{t'u} t'^{-1}u\right)

By the triple product property, we have that

(s^{-1}  t)(t'^{-1} u)=s'^{-1} u'\iff (s' s^{-1}) (tt'^{-1} )(uu'^{-1})=1

which is true iff s=s', t=t', and u=u'. So, the coefficient of s^{-1} u in the product is

\sum\limits_{t\in T}A_{st}B_{tu}=(AB)_{su}

So, you can just read off the elements of AB from these coefficients. \square
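The proof above can be turned into runnable code for the trivial realization of Exercise 2: embed A and B into the group algebra, do one multiplication there, and read the product matrix off the coefficients of s^{-1}u. This is a sketch with my own names, using the abelian group C_2\times C_3\times C_2 written additively:

```python
from collections import defaultdict

mods = (2, 3, 2)                          # G = C_2 x C_3 x C_2 realizes <2, 3, 2>
n, m, p = mods

def g_add(g, h):
    return tuple((a + b) % M for a, b, M in zip(g, h, mods))

def g_neg(g):
    return tuple(-a % M for a, M in zip(g, mods))

S = [(i, 0, 0) for i in range(n)]
T = [(0, j, 0) for j in range(m)]
U = [(0, 0, k) for k in range(p)]

A = [[1, 2, 3], [4, 5, 6]]                # n x m
B = [[7, 8], [9, 10], [11, 12]]           # m x p

# embed both matrices into the group algebra F[G]
fA, fB = defaultdict(int), defaultdict(int)
for i, s in enumerate(S):
    for j, t in enumerate(T):
        fA[g_add(g_neg(s), t)] += A[i][j]          # coefficient of s^{-1} t
for j, t in enumerate(T):
    for k, u in enumerate(U):
        fB[g_add(g_neg(t), u)] += B[j][k]          # coefficient of t^{-1} u

# one multiplication in F[G]
prod = defaultdict(int)
for g1, c1 in fA.items():
    for g2, c2 in fB.items():
        prod[g_add(g1, g2)] += c1 * c2

# read off AB from the coefficients of s^{-1} u
AB = [[prod[g_add(g_neg(s), u)] for u in U] for s in S]
expected = [[sum(A[i][j] * B[j][k] for j in range(m)) for k in range(p)] for i in range(n)]
assert AB == expected
```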

The process we just described, of converting matrix multiplication into multiplication in F[G], is called the Cohn–Uman algorithm. Also, we can now see the connection to the polynomial algorithm we described earlier: we've converted a problem of multiplying matrices into a problem of multiplying elements of \mathbb C[G], which is what we did with polynomials before using \mathbb C[C_n].

We can now define a measurement of the “quality” of embedding that a given group G can provide.

Definition 2: The pseudo-exponent \alpha(G) of a non-trivial finite group G is the minimum of

\frac{3\log |G|}{\log nmp}

over all n,m,p (not all 1) such that G realizes \langle n,m,p \rangle.

Lemma 1: For any group G, 2< \alpha(G)\leq 3. Further, if G is abelian, then \alpha(G)=3

Proof: \alpha(G) is clearly at most 3, since we can just take S_1=S_2=\{1\} and S_3=G, meaning G realizes \langle 1,1,|G|\rangle.

To show our lower bound, suppose G realizes \langle n_1,n_2,n_3\rangle (not all equal to 1) through the subsets S_1,S_2,S_3. Then, the triple product property tells us that the map (x,y)\mapsto x^{-1}y is injective on S_1\times S_2 and its image intersects Q(S_3) only in the identity. So, |G|\geq n_1n_2 with equality only occurring if n_3=1. The same argument works for n_2n_3 and n_3n_1, so |G|^3>(n_1n_2n_3)^2. Therefore,

\frac{3 \log |G|}{\log n_1n_2n_3}> \frac{3 \log (n_1n_2n_3)^{2/3}}{\log n_1n_2n_3}=\frac{3 \cdot 2}{3}=2

So, \alpha(G)>2. Finally, if G is abelian, then the product map from S_1\times S_2 \times S_3 to G must be injective (because the triple product property implies the kernel of the map is trivial), so |G|\geq n_1n_2n_3, meaning \alpha(G)\geq 3 and hence \alpha(G)=3. \square

Next, we want to relate \alpha(G) to \omega, the exponent of complexity for matrix multiplication, i.e. the smallest number such that we can perform matrix multiplication in O(n^{\omega +\epsilon}) time for any \epsilon>0. To do this, we first recall from 202B that

\mathbb{C}[G]\cong \mathrm{End} \mathbb{C}[G]\cong \mathcal{M}_{d_1\times d_1} \oplus \cdots \oplus \mathcal{M}_{d_k\times d_k}

where the d_i‘s are the dimensions of the irreducible representations of G. Then the most important theorem we will discuss is as follows:

Theorem 2: Suppose that G is a group with character degrees d_1,...,d_k. Then,

|G|^{\omega/\alpha(G)}\leq \sum\limits_{i=1}^k d_i^{\omega}

Intuitively, the idea of this theorem is that the problem of multiplying matrices of size |G|^{1/\alpha(G)} reduces to multiplication of two elements in \mathbb{C}[G] together, which itself reduces to a problem of multiplying a collection of matrices of size d_i\times d_i, each of which should take approximately d_i^\omega operations. So, multiplying two matrices of size |G|^{1/\alpha(G)} has complexity of order at most \sum_{i=1}^k d_i^{\omega}, which means that this sum is an upper bound for |G|^{\omega/\alpha(G)}. Actually proving the theorem is more difficult, so we won’t show it here, but this intuition is the most important part.

Importantly, notice that if \alpha(G) were equal to 2, then this would imply that \omega =2 (because \sum_i d_i^2 =|G|). However, \alpha(G)>2, so we need to control the character degrees of G to get non-trivial bounds on \omega. Define \gamma(G) to be the number such that |G|^{1/\gamma} is the maximum character degree of G. An important corollary of Theorem 2 is:

Corollary 1: Let G be a finite group. If \alpha(G)<\gamma(G), then

\omega\leq \alpha(G)\left(\frac{\gamma(G)-2}{\gamma(G)-\alpha(G)} \right)

Proof: Let \{d_i\} be the character degrees of G. Then, by Theorem 2,

|G|^{\omega/\alpha(G)}\leq \sum_i d_i^{\omega-2}d_i^2\leq |G|^{(\omega-2)/\gamma(G)}\sum_i d_i^2 = |G|^{1+(\omega-2)/\gamma(G)}

This implies \omega\left(\frac{1}{\alpha(G)}-\frac{1}{\gamma(G)}\right)\leq 1-\frac{2}{\gamma(G)}. Dividing by \frac{1}{\alpha(G)}-\frac{1}{\gamma(G)} (which is positive by assumption) gives us our result. \square

A special case of this corollary is given by

Corollary 2: If there exists a sequence G_1,G_2,... of finite groups such that \alpha(G_i)=2+o(1) as i\rightarrow \infty and \alpha(G_i)-2=o(\gamma(G_i)-2), then the exponent of matrix multiplication is 2.

While these are weaker versions of our original theorem, they only require knowledge of \gamma(G), rather than all of the character degrees. Finally, we end with an open problem in representation theory whose positive answer could lead to a solution to our original problem:

Open Problem: For integers n,m,p not all 1, does there exist a finite group that realizes \langle n,m,p\rangle and has character degrees d_1,...,d_k such that

\sum\limits_{i=1}^k d_i^3 < nmp?
While this is not a necessary condition, a positive answer to this could lead to a proof that \omega=2 when combined with our theorem. This is because one limitation of our theorem is that unless we can rule out the case that |G|^{3/\alpha}\leq \sum_{i}d_i^3 in Theorem 2, that theorem is satisfied trivially. So, for us to be able to show that a group gives us a non-trivial improvement, we have to show that for any n,m,p\in \mathbb N, there exists a group G realizing \langle n,m,p \rangle such that |G|^{3/\alpha}> \sum\limits_{i}d_i^3, which the open problem implies.

While it is an open problem whether these groups exist in general, there are a few concrete examples where such a group exists, such as the one that follows:

Let H=C_n\times C_n \times C_n for some n\geq 5, and let G=H^2 \rtimes C_2, where the group action of C_2 on H^2 is given by switching the two factors. If we let z be the generator of C_2, we can write the elements of G as (a,b)z^i with a,b\in H and i\in \{0,1\}. Let H_1,H_2,H_3 be the three copies of C_n in H, viewed as subgroups, and let H_4:=H_1 for notational convenience. Then, we can define S_1,S_2,S_3 as

S_i=\{(a,b)z^j:a\in H_i\setminus \{1\},b\in H_{i+1},j\in \{0,1\}\}

One can show, although the proof is long, that these sets satisfy the triple product property. Furthermore, the character degrees of G are all at most 2, since H^2 is an abelian subgroup of index 2 in G, and the maximum character degree of a group is at most the index of any abelian subgroup. Since the sum of the squares of the character degrees is |G|, the sum of the cubes of the degrees is at most 2|G|=4n^6. However, |S_i|=2n(n-1), meaning that for n\geq 5,

|S_1| |S_2| |S_3| =8n^3(n-1)^3\geq 4n^6= 2|G| \geq \sum\limits_{i}d_i^3

Math202C: Lecture 17 – Expander Graphs

Author: Haixiao Wang

As demonstrated in [1], expander graphs are sparse graphs that mimic the structural properties of complete graphs, quantified by vertex, edge, or spectral expansion. At the same time, expander constructions have spawned research in pure and applied mathematics, with several applications to complexity theory, the design of robust computer networks, and the theory of error-correcting codes.

This lecture is organized as follows. We introduce two different definitions of expanders in the first section, and then explore the relationship between them in the second part.

17.1. Two Definitions

We start our discussion of expansion with two definitions: combinatorial and spectral. In the rest of this post, we consider an undirected graph G = (V, E) with |V| = n, where self-loops and multiple edges are allowed. Given two subsets S,T \subset V, the set of edges between S and T is denoted by E(S, T) = \{(u,v): (u, v) \in E, u\in S, v\in T\}.

17.1.1. Combinatorial expansion

Intuitively, a graph G=(V,E) is an expander if every subset S \subseteq V expands outwards into V \setminus S. In other words, every small subset has a large boundary. Following this intuition, we introduce definitions of the boundary, based on edges and vertices respectively, to quantify the corresponding expansion.

Definition 1(Boundary):

  • The Edge Boundary of a set S, denoted by \partial S, is the set of edges \partial S = E(S, V \setminus S), which is the set of edges between disjoint set S and V \setminus S.
  • The Vertex Boundary of a set S is a subset of vertices in V \setminus S adjacent to vertices in S, denoted as N(S).

Clearly, the size of the vertex boundary |N(S)| is just the number of neighbors of the vertex set S outside S. With these notations, we then define expansion over the entire graph G by minimizing over subsets S.

Definition 2(Expander): For some \varepsilon \in [0, 1], a graph G = (V, E) with |V| = n is an

  • \varepsilon-edge-expander if \min\limits_{ S\subset V, |S| \leq \frac{n}{2} } \frac{|\partial S|}{|E(S, V)|} \geq \varepsilon.
  • \varepsilon-vertex-expander if \min\limits_{ S\subset V, |S| \leq \frac{n}{2} } \frac{|N(S)|}{|S|} \geq \varepsilon.

Remark 3: The reason for the restriction |S| \leq n/2 is that we are examining the relative size of every cut in the graph, i.e., the number of edges crossing between S and V \setminus S, and we examine each cut only once.

Problem 1: Show that the complete graph K_{n} is

  • at least a 1/2-edge-expander.
  • a 1-vertex-expander. (We only consider \varepsilon \in [0, 1] here.)

From the definition above, the maximum \varepsilon for which G = (V, E) is an \varepsilon-edge-expander is an important criterion, leading to the definition of the Cheeger constant.

Definition 4(Cheeger Constant): The Edge Expansion Ratio, or Cheeger constant, of a graph G = (V, E) with |V| = n, denoted by h(G), is defined as

h(G) = \min\limits_{S\subset V, |S| \leq \frac{n}{2}}\frac{|\partial S|}{|E(S, V)|}.

Remark 5: The Cheeger constant is often defined alternatively as \min\limits_{S\subset V, |S| \leq n/2}\frac{|\partial S|}{|S|}, which can be as large as \Theta(n). We do not follow that definition here since we want to keep \varepsilon\in [0, 1]; for a d-regular graph the two definitions differ only by the factor d, since |E(S, V)| = d|S|.
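For intuition, the Cheeger constant of a small graph can be computed by brute force over all subsets with |S| \leq n/2. Below is a minimal pure-Python sketch (our own illustration, not part of the lecture) for the 6-cycle C_6, interpreting |E(S, V)| = d|S| for a d-regular graph as in Problem 1.

```python
from itertools import combinations

# Brute-force h(G) = min over |S| <= n/2 of |boundary(S)| / |E(S,V)|
# on the 6-cycle C_6.  For a d-regular graph, |E(S,V)| = d|S|.
n, d = 6, 2
edges = [(i, (i + 1) % n) for i in range(n)]

def cheeger(edges, n, d):
    best = 1.0
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            S = set(S)
            # edges with exactly one endpoint in S form the edge boundary
            boundary = sum(1 for u, v in edges if (u in S) != (v in S))
            best = min(best, boundary / (d * len(S)))
    return best

print(cheeger(edges, n, d))   # 1/3
```

The minimizing set is an arc of three consecutive vertices, with |\partial S| = 2 and |E(S, V)| = 6, so h(C_6) = 1/3.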

17.1.2. Spectral Expansion

To simplify our discussion, we focus on regular graphs in the rest of this lecture. Let A = A(G)\in \mathbb{R}^{n \times n} denote the adjacency matrix of a d-regular graph G = (V, E) with |V| = n. Since A is real symmetric, it has n real eigenvalues, denoted by \lambda_{1} \geq \lambda_{2} \geq \cdots \geq \lambda_n, with corresponding orthonormal eigenvectors v_1, \dots, v_n satisfying A v_i = \lambda_i v_i. The spectrum of A encodes a lot of information about the graph structure. Recall the following properties, which we have seen in previous lectures.

  • The largest eigenvalue is \lambda_1 = d and the corresponding eigenvector is v_1 = (1/\sqrt{n}, \dots, 1/\sqrt{n}).
  • The graph G is connected if and only if \lambda_1 > \lambda_2.
  • The graph is bipartite if and only if \lambda_1 = -\lambda_n.
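The first property can be checked numerically by power iteration, which converges to \lambda_1 whenever |\lambda_1| is strictly dominant (so we avoid bipartite graphs, where \lambda_n = -d). A minimal pure-Python sketch, using the 5-cycle C_5 as a hypothetical example (d = 2):

```python
# Power iteration to estimate the top eigenvalue of a d-regular graph.
# Example: the 5-cycle C_5, where lambda_1 should equal d = 2.
n = 5
A = [[1 if (i - j) % n in (1, n - 1) else 0 for j in range(n)] for i in range(n)]

x = [float(i + 1) for i in range(n)]      # arbitrary starting vector
for _ in range(200):
    y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    norm = sum(v * v for v in y) ** 0.5
    x = [v / norm for v in y]             # renormalize each step

# Rayleigh quotient x^T A x with ||x|| = 1 approximates lambda_1
lam1 = sum(x[i] * sum(A[i][j] * x[j] for j in range(n)) for i in range(n))
print(round(lam1, 6))   # 2.0
```

The limiting eigenvector is the (normalized) all-ones vector, in agreement with the first bullet above.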

The definition of spectral expansion and spectral gap then follows.

Definition 6(Spectral Expansion): A graph G = (V, E) with |V| = n is a \lambda-spectral-expander if \lambda(G):= \max \{ |\lambda_2|, |\lambda_n| \} \leq \lambda.

In other words, the second largest eigenvalue of A(G), in absolute value, is at most \lambda. This also gives rise to the definition of Ramanujan graph.

Definition 7(Ramanujan graph): A connected d-regular graph G is a Ramanujan graph if \lambda (G)\leq 2\sqrt{d-1}.

We now refer to the spectral gap.

Definition 8(Spectral Gap): The spectral gap of d-regular graph G is \lambda_{1} - \lambda(G).

Remark 9: Sometimes the spectral gap is defined as \lambda_1 - \lambda_2, which is often referred as one-sided.

Problem 2:

  • Show that the complete graph K_{n} has spectral gap n - 2. Actually, we can also show that the one-sided spectral gap is n.
  • Show that the complete bipartite graph K_{m, n} has spectral gap 0. (Hint: use the spectral property above.)

17.2. Relating Two Definitions

It is not so obvious that the combinatorial and spectral versions of expansion are related. In this section, we will introduce two relations between edge and spectral expansion: the Expander-Mixing Lemma and Cheeger’s inequality.

17.2.1. The Expander-Mixing Lemma

Lemma 10(Expander-Mixing Lemma[2]): Let G = (V, E) be a d-regular graph with n vertices and eigenvalues \lambda_1 \geq \dots \geq \lambda_n. Then for all S, T \subset V:

\Bigg| |E(S, T)| - \frac{d|S|\cdot |T|}{n}\Bigg| \leq \lambda(G) \sqrt{|S|\cdot |T|}.

As we can see, the left-hand side measures the deviation between two quantities:

  • |E(S, T)|: the number of edges between the two subsets S and T.
  • |S|\cdot |T|\cdot d/n: the expected number of edges between the two subsets S and T in an Erdős–Rényi random graph G(n, d/n).

The Expander-Mixing Lemma tells us that a small \lambda(G) (i.e., a large spectral gap) implies that this discrepancy is small, so that the graph is nearly random in this sense, as discussed in [1]. We should mention that a converse of expander mixing is true as well, stated as follows.

Lemma 11(Expander-Mixing Lemma Converse [4]): Let G = (V, E) be a d-regular graph with n vertices and eigenvalues \lambda_1 \geq \dots \geq \lambda_n satisfying:

\Bigg| |E(S, T)| - \frac{d|S|\cdot |T|}{n}\Bigg| \leq \rho \sqrt{|S|\cdot |T|}.

for any two disjoint subsets S,T\subset V and some positive \rho. Then \lambda(G) \leq O(\rho \cdot(1 + \log(d/\rho))).

Proof of Expander-Mixing Lemma: Recall that the adjacency matrix A has eigenvalues \lambda_1 \geq \dots \geq \lambda_n with corresponding orthonormal eigenvectors v_1, \dots, v_n. Let 1_S and 1_T be the characteristic vectors of the subsets S and T (i.e., the v-th coordinate of 1_S is 1 if v\in S and 0 otherwise). We then decompose 1_S and 1_T in the orthonormal basis of eigenvectors v_1, \dots, v_n as

1_S = \sum\limits_{i=1}^n \alpha_i v_i, \quad 1_T = \sum\limits_{j=1}^n\beta_j v_j.

Observing that |E(S, T)| = (1_S)^T A 1_T, we expand out the right hand side and apply orthogonality, which yields

(1_S)^T A 1_T = \left (\sum\limits_{i=1}^n \alpha_i v_i \right)^T A \left (\sum\limits_{j=1}^n\beta_j v_j \right) = \sum\limits_{i=1}^n \lambda_i\alpha_i\beta_i = d\frac{|S|\cdot|T|}{n} + \sum\limits_{i=2}^n \lambda_i\alpha_i\beta_i .

where the last step follows from the facts \lambda_1 = d with corresponding v_1 = (1/\sqrt{n},\dots, 1/\sqrt{n}), \alpha_1 = \langle 1_S, v_1\rangle = |S|/\sqrt{n} and \beta_1 = \langle 1_T, v_1\rangle = |T|/\sqrt{n}. Moving the first term on the right-hand side to the left, the desired result follows from the Cauchy-Schwarz inequality:

\left| |E(S, T)| - d\frac{|S||T|}{n} \right | = \left|\sum\limits_{i=2}^n\lambda_i\alpha_i\beta_i\right| \leq \lambda(G) \sum\limits_{i=2}^n |\alpha_i \beta_i| \leq \lambda(G) ||\alpha||_2 ||\beta ||_2 = \lambda(G) ||1_S||_2  ||1_T||_2 = \lambda(G)\sqrt{|S||T|}.
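To see the lemma in action, one can verify the inequality exhaustively on a small example. The sketch below (our own check, not from the lecture) uses K_5, for which d = 4 and \lambda(K_5) = 1 since every nontrivial eigenvalue of K_n equals -1; it tests all pairs of disjoint subsets S, T, counting |E(S, T)| as 1_S^T A 1_T.

```python
from itertools import combinations

# Exhaustive check of the Expander-Mixing Lemma on K_5:
# | 1_S^T A 1_T - d|S||T|/n |  <=  lambda(G) * sqrt(|S||T|)
n, d, lam = 5, 4, 1.0
A = [[0 if i == j else 1 for j in range(n)] for i in range(n)]

ok = True
for s in range(1, n):
    for S in combinations(range(n), s):
        rest = [v for v in range(n) if v not in S]
        for t in range(1, len(rest) + 1):
            for T in combinations(rest, t):
                eST = sum(A[u][v] for u in S for v in T)
                bound = lam * (len(S) * len(T)) ** 0.5
                ok = ok and abs(eST - d * len(S) * len(T) / n) <= bound + 1e-12
print(ok)   # True
```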

Remark 12: If we normalize the adjacency matrix A = A(G) to obtain \bar{A} = (1/d) A, where each entry of A is divided by d, then the corresponding eigenvalues of \bar{A}, namely \bar{\lambda}_{1}\geq \dots \geq \bar{\lambda}_n, lie in the interval [-1, 1]. The Expander-Mixing Lemma then takes the form

\Bigg| |E(S, T)| - \frac{d|S|\cdot |T|}{n}\Bigg| \leq \bar{\lambda}(G)d \sqrt{|S|\cdot |T|}.

It is sometimes convenient to consider the normalized spectrum. For example, in random graph theory it is convenient to look at the normalized spectrum when the expected degree is d = O(\log n), whereas we can work with the spectrum of A = A(G) without normalization when the expected degree is d = O(1).

Regular \lambda-spectral expanders with small \lambda(G) (i.e., large spectral gap) have several significant properties, listed as follows.

Corollary 13: An independent set in a graph G = (V, E) with |V| = n is a set of vertices S no two of which are adjacent, i.e., with |E(S, S)| = 0. An independent set S in a d-regular \alpha-spectral expander G has cardinality at most \alpha n / d, i.e., |S|\leq \alpha n / d, which follows immediately from Lemma 10 (the Expander-Mixing Lemma [2]) applied with T = S.

Corollary 14: Given a graph G = (V, E) with |V| = n, the diameter of G is defined as \mathrm{diam}(G) = \max\limits_{u, v}d_{G}(u, v), where d_{G}(u, v) is the length of the shortest path between u and v. If G is a d-regular \alpha-spectral expander with \alpha < d/2, then we have

\Omega \Big( \frac{\log n}{\log d} \Big) \leq \mathrm{diam}(G) \leq O(\log n).

17.2.2. Cheeger’s Inequality

As we have seen from the previous discussion, spectral expansion implies strong structural properties on the edge structure of graphs. In this section, we introduce another famous result, Cheeger's inequality, indicating that the Cheeger constant (Edge Expansion Ratio) of a graph G can be approximated by its spectral gap [6].

Theorem 15(Cheeger’s inequality, [3]): Let G = (V, E) be a d-regular graph with eigenvalues \lambda_1\geq \cdots \geq \lambda_n, then

\frac{d - \lambda_2}{2} \leq \min\limits_{S\subset V, |S|\leq n/2} \frac{|\partial S|}{|S|} \leq \sqrt{2d(d - \lambda_2)}.

In the normalized scenario \bar{A} = (1/d)A(G) with eigenvalues \bar{\lambda}_{1}, \cdots, \bar{\lambda}_n \in [-1, 1], we have

\frac{1 - \bar{\lambda}_2}{2} \leq h(G):=\min\limits_{S\subset V, |S|\leq n/2} \frac{|\partial S|}{|E(S, V)|} \leq \sqrt{2(1 - \bar{\lambda}_2)}.
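As a sanity check of the first (unnormalized) form of Theorem 15, the quantity \min_{|S|\le n/2}|\partial S|/|S| can be brute-forced on a small graph. The sketch below (our own check) uses the 6-cycle C_6 together with the known cycle spectrum \lambda_j = 2\cos(2\pi j/n), so that \lambda_2 = 2\cos(2\pi/6) = 1 here.

```python
from itertools import combinations
from math import cos, pi, sqrt

# Check (d - lambda_2)/2 <= min |boundary(S)|/|S| <= sqrt(2d(d - lambda_2))
# on the 6-cycle C_6, using the closed-form cycle eigenvalues.
n, d = 6, 2
edges = [(i, (i + 1) % n) for i in range(n)]
lam2 = 2 * cos(2 * pi / n)          # second eigenvalue of C_n

expansion = min(
    sum(1 for u, v in edges if (u in S) != (v in S)) / len(S)
    for size in range(1, n // 2 + 1)
    for S in map(set, combinations(range(n), size))
)
print((d - lam2) / 2 <= expansion <= sqrt(2 * d * (d - lam2)))   # True
```

Here the middle quantity equals 2/3 (an arc of three vertices), squeezed between 1/2 and 2.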

Remark 16: The proof in the normalized scenario differs from the regular scenario in its details: one starts from the desired quantity and carries out the calculation step by step. However, the strategy is the same; in both cases the Rayleigh quotient yields the upper and lower bounds.

In the regime n\gg d, the following estimate on \lambda(G) was given by Nilli [5].

Theorem 17([5]): For every d-regular graph G, we have

\lambda(G) \geq 2\sqrt{d - 1} - o_{n}(1).

The o_{n}(1) term is a quantity that tends to zero for every fixed d as n \to \infty.

[1]. Hoory, S., Linial, N., and Wigderson, A. (2006). Expander graphs and their applications. Bulletin (New Series) of the American Mathematical Society, 43(4), 439–561.

[2]. Alon, N. and Chung, F. R. K. (1988). Explicit construction of linear sized tolerant networks. Discrete Mathematics, 72(1–3), 15–19.

[3]. Cheeger, J. (1970). A lower bound for the smallest eigenvalue of the Laplacian. Problems in Analysis, 195–199.

[4]. Bilu, Y. and Linial, N. (2006). Lifts, discrepancy and nearly optimal spectral gap. Combinatorica, 26(5), 495–519.

[5]. Nilli, A. (1991). On the second eigenvalue of a graph. Discrete Mathematics, 91(2), 207–210.

[6]. Lovett, S. (2021). Lecture notes for expander graphs and high-dimensional expanders.

Lecture 16 – PageRank

Author: Jiyoung Choi

PageRank is a search algorithm based on the linkage structure of the web, introduced by Larry Page and Sergey Brin, who built the search engine Google to test the utility of PageRank for search. (This is the origin of the name "Google.")

The goal of this lecture is to construct the Google matrix and explore how to find the PageRank vector. Let's start by making an adjacency matrix based on web linkage.

Let u be a webpage, and let F_u be the set of pages u points to and B_u be the set of pages that point to u. Consider each webpage as a vertex and each link as a directed edge. We construct an adjacency matrix by setting A_{uv} = 1 if there is a link from v to u, and A_{uv}=0 otherwise, where u, v are webpages. For example, if the webpages are linked as in Figure 1, A is defined as follows.

Figure 1. A simple example consisting of 5 webpages.

A= \begin{bmatrix} 0 & 1 & 1 & 1 & 0\\ 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0\\ 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\end{bmatrix}

We can observe that the last column of A is a zero vector because v_5 does not have forward links. We call such a node a dangling node.

Let N_u = |F_u| be the number of links from u. The column-normalized adjacency matrix R of A is defined by R(u) = {A(u) \over N_u}, where R(u) and A(u) denote the columns corresponding to page u. In our example,

R= \begin{bmatrix} 0 & 1 & \frac{1}{2} & \frac{1}{3} & 0\\ \frac{1}{2} & 0 & 0 & \frac{1}{3} & 0\\ 0 & 0 & 0 & \frac{1}{3} & 0\\ \frac{1}{2} & 0 & 0 & 0 & 0\\ 0 & 0 & \frac{1}{2} & 0 & 0\end{bmatrix}.

This matrix R is not directly used in PageRank. To compute the PageRank vector, we will use the power method, an iterative procedure which can have convergence problems: the iteration can cycle, or the limit may depend on the starting vector. To avoid these problems, we need a left (column) stochastic matrix.

As in our example, R may fail to be stochastic because of dangling nodes. Therefore, define the stochastic matrix R' = R + {\mathbf{e}\mathbf{a}^T \over n}, where \mathbf{e} is the vector of all ones, n is the number of web pages, and \mathbf{a}(u)=1 if R(u) =\mathbf{0} and \mathbf{a}(u)=0 otherwise.

When surfing the web, even from a page with no forward links (a dangling node), we may still visit new web pages. The matrix {\mathbf{e}\mathbf{a}^T \over n} models this by jumping from a dangling node to any web page with uniform probability.

Problem 1: Find R' for our example.

Then by the Gershgorin circle theorem, every eigenvalue of R' is at most 1 in absolute value. Since the columns of R' sum to 1, we have \mathbf{e}^T R' = \mathbf{e}^T, so \det(R'-I)=0 and the largest eigenvalue is \lambda=1. We want to find the dominant eigenvector of R', which is the PageRank vector. To find this eigenvector, PageRank uses the power method.

As some of you may already realize, finding the PageRank vector amounts to finding a stationary distribution (see Lecture 5, Lecture 6) of the random walk on the web graph. However, if the stationary distribution is not unique, we cannot get a well-defined PageRank vector. This problem is solved by the Perron–Frobenius theorem: if R' is irreducible, the eigenvector corresponding to eigenvalue 1 is unique.

Definition 1: A matrix A is primitive if it is non-negative and its kth power is positive for some natural number k.

Definition 2: An n \times n matrix A is reducible if there exists a permutation matrix P such that P^TAP is block upper triangular. If a square matrix is not reducible, it is said to be irreducible.

For more background on primitive and irreducible matrices, see [6].

Problem 2: Show that every primitive matrix is irreducible.

We can construct a primitive matrix G = mR' + (1-m)E, where 0 \le m < 1 and E = {\mathbf{e}\mathbf{e}^T \over n}. In reality, Google uses m=0.85 ([1], [7]). Since G is primitive, hence irreducible, we can get the unique PageRank vector by the power method. We call this matrix G the Google matrix. The constant m is called the damping constant; it reflects the fact that a person who is randomly clicking on links will eventually stop clicking. In our example, G is

G= \begin{bmatrix} \frac{3}{100} & \frac{22}{25} & \frac{91}{200} & \frac{47}{150} & \frac{1}{5}\\ \frac{91}{200} & \frac{3}{100} & \frac{3}{100} & \frac{47}{150} & \frac{1}{5}\\ \frac{3}{100} & \frac{3}{100} & \frac{3}{100} & \frac{47}{150} & \frac{1}{5}\\ \frac{91}{200} & \frac{3}{100} & \frac{3}{100} & \frac{3}{100} & \frac{1}{5}\\ \frac{3}{100} & \frac{3}{100} & \frac{91}{200} & \frac{3}{100} & \frac{1}{5}\end{bmatrix}.

when m=0.85.

Here, we call the unique largest eigenvalue \lambda of G the Perron root (for the Google matrix, \lambda = 1), and \mathbf{x} the Perron vector if G\mathbf{x}=\lambda \mathbf{x} and ||\mathbf{x}||_1 = 1. That G has a unique dominant eigenvalue implies that the PageRank vector is unique. (If you want to check convergence, see [4], p. 179.)

Finally, the power method on G ([2]) recursively computes \mathbf{x}_k = G\mathbf{x}_{k-1} and finds the limit of \mathbf{x}_k, say \mathbf{x}. This \mathbf{x} is the PageRank vector. That is, the PageRank of page u is \mathbf{x}(u), where \mathbf{x} is the Perron vector with ||\mathbf{x}||_1=1. Note that \mathbf{x} is a probability vector reflecting the probability of clicking on each webpage.
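The whole pipeline for the Figure 1 example fits in a few lines of pure Python. The sketch below (our own illustration; the variable names are not from the lecture) builds A, the stochastic fix R', and the Google matrix with m = 0.85, then runs the power method.

```python
# PageRank for the 5-page example of Figure 1, with damping m = 0.85.
n = 5
A = [[0, 1, 1, 1, 0],
     [1, 0, 0, 1, 0],
     [0, 0, 0, 1, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0]]

col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
# R' = column-normalized A, with dangling (zero) columns replaced by 1/n.
Rp = [[(A[i][j] / col_sums[j]) if col_sums[j] else 1.0 / n for j in range(n)]
      for i in range(n)]
m = 0.85
G = [[m * Rp[i][j] + (1 - m) / n for j in range(n)] for i in range(n)]

x = [1.0 / n] * n                 # start from the uniform distribution
for _ in range(200):              # x_k = G x_{k-1}
    x = [sum(G[i][j] * x[j] for j in range(n)) for i in range(n)]

print([round(v, 4) for v in x])   # PageRank vector; entries sum to 1
```

Since G is column stochastic, each iterate remains a probability vector, and the iteration converges to the Perron vector.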


[1] Amy Langville and Carl Meyer (2003). Fiddling with PageRank.

[2] Amy Langville, Carl Meyer (2004). The Use of the Linear Algebra by Web Search Engines. IMAGE Newsletter. 33.

[3] Amy Langville and Carl Meyer (2004). Deeper Inside PageRank. Internet Mathematics. 1. 10.1080/15427951.2004.10129091.

[4] Amy Langville and Carl Meyer (2006). Google’s PageRank and Beyond: The Science of Search Engine Rankings

[5] Larry Page, Sergey Brin, R. Motwani, and T. Winograd (1998). The PageRank Citation Ranking: Bringing Order to the Web.

[6] Roger Horn and Charles Johnson (1985). Matrix Analysis.

[7] Taher Haveliwala and Sepandar Kamvar (2003). The Second Eigenvalue of the Google Matrix.

Lecture 15 – Spectral Theory of Hypergraphs

Definition 1. A hypergraph \Gamma is a pair \Gamma = (V,E) where V is a finite set and E\subset 2^V is a nonempty collection of subsets of V. \Gamma is called k-uniform if
E\subset{V\choose k}:=\{X\subset V:|X| = k\}. \Gamma is called a graph if it is 2-uniform.

Our goal for this lecture is to explore the rudiments of the spectral theory of k-uniform hypergraphs. The natural starting point, if only for inspiration, is the case of graphs. Letting \Gamma = (V,E) be a graph, and identifying V = \{1,2,\dotsc,n\}, we can begin by writing down the adjacency matrix A = (A_{ij}) of \Gamma defined by

A_{ij}=\begin{cases} 1&\text{ if }i\sim j\\ 0&\text{ otherwise} \end{cases}, \hspace{1cm}1\leq i,j\leq n.

where i\sim j\iff \{i,j\}\in E. In this case i,j are said to be adjacent.

The spectral theory of A is rich. For example, it made an appearance in our 202C class for the special case where \Gamma is the Cayley graph of a group. Our main focus however will be the matrix L = (L_{ij}), defined by

L_{ij}=\begin{cases} d_{i}&\text{ if }i=j\\ -1&\text{ if }i\sim j\\ 0&\text{ otherwise} \end{cases}.

where the degree d_i is the number of edges containing i\in V. If D is the diagonal matrix of degrees, then we may also write L = D-A. This matrix L is called the Laplacian or Kirchhoff matrix of \Gamma. The terminology arises from the fact that, as an operator on \mathbb{C}^V, L displays various properties canonically associated with the Laplacian on \mathbb{R}^n, e.g., the mean value property, maximum principle, etc. A very interesting paper of Wardetzky, Mathur, Kälberer, and Grinspun [6] explores these connections in detail, but they are outside of the scope of this lecture.

A few facts are worth collecting here. First, L is a symmetric, positive semidefinite matrix in M_n and hence admits nonnegative eigenvalues 0=\lambda_0\leq\lambda_1\leq\cdots\leq\lambda_{n-1}. L always has zero as an eigenvalue, for example via the vector \mathbf{1} = (1,\dotsc,1)^T, but the multiplicity of this eigenvalue can be higher, depending on the number of connected components (Exercise 1). However, when \Gamma is connected (i.e. has a single connected component), it follows that \lambda_1>0. Much of the literature is concerned with the interplay between geometric and combinatorial properties of \Gamma (things like its size, connectedness, density, or sparseness, for example) and the first eigenvalue \lambda_1 (either of L or one of its many variants). For example…

Exercise 1.
(a) Show that the geometric multiplicity of the null eigenvalue of L is the number of connected components of \Gamma.
(b) Show \lambda_1\leq n, with equality if and only if \Gamma is the complete graph. Hint: if \Gamma is not complete, relate its Laplacian and that of its “complement” (suitably defined) to the Laplacian of the complete graph, and use eigenvalue interlacing.

For a more detailed introduction to spectral graph theory, see Fan Chung’s book[1, Ch. 1].

Our next goal is to take these notions and structures and construct extensions, or analogues, for the case where \Gamma is a k-uniform hypergraph. As we shall see, this is not so easy. Consider, if you desire, the following perspective from simplicial homology (those unfamiliar or otherwise uninterested may skip to Definition 2).

One approach would be to try to construct L not as a specific matrix, but as a descendant of an invariant of the topology of the underlying graph, which we could then hope admits a generalization to the case where the graph is instead a hypergraph. Specifically, the idea is to look at the homology of \Gamma and build a Laplacian from its boundary operator, as follows.

Let \Gamma be a graph and enumerate E = \{e_1,\dotsc,e_m\}. To each unordered pair e_j=\{v,w\}\in E pick an orientation (v,w) or (w,v) and let \widetilde{E} be the set of all such oriented edges. Let S = (S_{v,e}) \in M_{n\times m} be the matrix whose rows are indexed by v\in V and whose columns are indexed by the ordered edges e\in\widetilde{E}, defined by

S_{v,e} = \begin{cases} 1&\text{if }e = (v,\cdot)\\ -1&\text{ if }e=(\cdot,v)\\ 0&\text{ otherwise} \end{cases}.

It is left as an (unassigned) exercise to verify that L=SS^T; and in fact that the choice of orientations/signs in the preceding setup is completely arbitrary. Thinking of S as a homomorphism mapping elements of the module \mathbb{Z} E to the module \mathbb{Z} V by matrix multiplication, we recognize S as the boundary operator that is related to the homology groups of \Gamma as a simplicial complex (see [3, Section 2.1, page 105]) and thus we have constructed L as an invariant of the topology of \Gamma (up to choices in the orientations of the edges).
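A quick numerical check of the identity L=SS^T on a small example (the triangle K_3 with arbitrarily chosen orientations; a hypothetical instance of the setup above, not from the lecture):

```python
# Verify L = S S^T for the triangle graph K_3, with each edge {v,w}
# oriented arbitrarily as (v, w).
V = [0, 1, 2]
E = [(0, 1), (1, 2), (0, 2)]          # oriented edges

n, m = len(V), len(E)
S = [[0] * m for _ in range(n)]       # signed incidence matrix
for j, (v, w) in enumerate(E):
    S[v][j] = 1                        # edge leaves v
    S[w][j] = -1                       # edge enters w

SST = [[sum(S[i][k] * S[j][k] for k in range(m)) for j in range(n)]
       for i in range(n)]

# Laplacian L = D - A built directly from degrees and adjacency
deg = [sum(1 for e in E if i in e) for i in range(n)]
L = [[deg[i] if i == j else (-1 if (i, j) in E or (j, i) in E else 0)
      for j in range(n)] for i in range(n)]
print(SST == L)   # True, regardless of the chosen orientations
```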

But as it turns out, the capricious choice of orientation we made earlier to define S is what makes graphs special. As soon as we allow 3-edges, or 1-edges for that matter, it is not at all obvious how to “orient” them, or how to distinguish between orientations (compare to the natural symbolic relationship (v,w)=-(w,v)). This, in fact, is a critically important distinction between graphs and hypergraphs. The answer is that we need more than linear operators, and so we look to tensors. The exposition below roughly follows two papers of Qi [4,5].

Definition 2. A tensor \mathcal{T} of order m and dimension n over a field \mathbb{F} is a multidimensional array

\mathcal{T} = (t_{i_1i_2\cdots i_m}),\hspace{1cm}1\leq i_1,\dotsc,i_m\leq n.

where each entry t_{i_1i_2\cdots i_m}\in\mathbb{F}. We call \mathcal{T} an (m,n)-tensor over \mathbb{F}.

We can think of an (m,n)-tensor as a hypercubic array of size n\times n\times\cdots\times n (m times). All of the tensors presented here are understood to be over \mathbb{C}. The case (2,n) recovers the usual n\times n square matrices. If x\in\mathbb{C}^n then we define the product \mathcal{T}x^{m-1}\in\mathbb{C}^n by

(\mathcal{T}x^{m-1})_i = \sum_{i_2,\dotsc,i_m=1}^n t_{ii_2\cdots i_m}x_{i_2}\cdots x_{i_m},\hspace{1cm}1\leq i\leq n.

(The expression \mathcal{T}x^{m-1}\in\mathbb{C}^n is conventional notation.) Also as a matter of notation, for x\in \mathbb{C}^n and k\geq 0, let x^{[k]}\in\mathbb{C}^n be given by (x^{[k]})_i = x_i^k. There are lots of analogous notions and properties for tensors as compared to usual matrices; for example, we have the following formulation of the spectrum of a tensor.

Definition 3. Let \cal{T} be an (m,n)-tensor. Then we say \lambda\in\mathbb{C} is an eigenvalue of \cal{T} if there exists a nonzero vector x\in\mathbb{C}^n, called an eigenvector of \mathcal{T}, which satisfies \mathcal{T} x^{m-1} = \lambda x^{[m-1]}.

Notice that no (k,n)-tensor \mathcal{T} for k\geq 3 defines a true linear operator on \mathbb{C}^n, and hence we may not a priori speak of eigenspaces, kernels, etc. and their linear algebraic structure. (These notions do have meaning, but they require the tools of algebraic geometry to properly formulate, and are outside of our scope.) Nevertheless, we can construct tensors associated to k-uniform hypergraphs that give rise to various eigenvalues; can we then find relationships between the geometry of a hypergraph and its eigenvalues?

Definition 4. Let \Gamma=(V,E) be a k-uniform hypergraph on n vertices. The adjacency tensor of \Gamma is the (k,n)-tensor \mathcal{A} = (a_{i_1\cdots i_k}) defined as follows. For each edge e = \{i_1,\dotsc,i_k\}\in E,

a_{i_1i_2\cdots i_k} = \frac{1}{(k-1)!},

and likewise for any rearrangement (i_1',\dotsc,i_k') of the vertices in e. The values of \mathcal{A} are defined to be zero otherwise. The degree tensor of \Gamma is the (k,n)-tensor \mathcal{D}=(d_{i_1\cdots i_k}) defined by

d_{i\cdots i} = d_i,

where d_i is the number of edges containing i, and the values of \mathcal{D} are defined to be zero otherwise. The Laplacian tensor is the (k,n) tensor \mathcal{L}= (l_{i_1\cdots i_k}) defined by \mathcal{L} = \mathcal{D}-\mathcal{A}.

A brief remark: our choice to focus on k-uniform hypergraphs is merely to simplify the exposition. There are well-developed versions of Definition 4 for general hypergraphs [2], but they are harder to write down and explore in this setting. With these fundamental notions established, the goal for the remainder will be to write down and explore a few basic propositions regarding the eigenvalues and eigenvectors of the tensors of a hypergraph \Gamma.

Proposition 1. Let \Gamma be a k-uniform hypergraph, with k\geq 3, on n vertices and \mathcal{L} its Laplacian tensor.

  • Let \mathbf{e}_j\in\mathbb{C}^n denote the j-th standard basis vector. Then \mathbf{e}_j is an eigenvector of \mathcal{L} with eigenvalue d_j.
  • The vector \mathbf{1} occurs as an eigenvector of \mathcal{L} with eigenvalue 0.

Proof. This is an exercise in calculations with tensors. For the first claim let 1\leq j\leq n be fixed, and notice that for any m\geq 1, \mathbf{e}_j^{[m]} = \mathbf{e}_j and hence for each 1\leq i\leq n, it holds

(\mathcal{L}\mathbf{e}_j^{k-1})_i = \sum\limits_{i_2,\dotsc,i_k=1}^n l_{i i_2\cdots i_k}(\mathbf{e}_j)_{i_2}\cdots (\mathbf{e}_j)_{i_k} = l_{ij\cdots j} = d_{ij\cdots j} - a_{ij\cdots j}.

Notice that a_{i_1\cdots i_k} = 0 if any of the i_k's are repeated, and that d_{ij\cdots j} = d_i if i=j and 0 otherwise. It follows that \mathcal{L}\mathbf{e}_j^{k-1} = d_j\mathbf{e}_j^{[k-1]}. For the second claim, again we have \mathbf{1}^{[m]} = \mathbf{1} for any m, and hence

(\mathcal{L}\mathbf{1}^{k-1})_i = \sum\limits_{i_2,\dotsc,i_k=1}^n l_{i i_2\cdots i_k} = d_i - \sum\limits_{i_2,\dotsc,i_k=1}^n a_{i i_2\cdots i_k} = d_i - \sum\limits_{e\in E:\, i\in e} (k-1)!\cdot\frac{1}{(k-1)!} = 0,

where, with some abuse of notation in the second line, we emphasize that the sum contains all permutations of the k-1 terminal points in each edge e=\{i,i_2,\dotsc,i_k\}\in E. -QED.
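The calculations in Proposition 1 can be verified numerically. Below is a minimal pure-Python sketch (our own, hypothetical example) for the 3-uniform hypergraph on 4 vertices with edges \{0,1,2\} and \{1,2,3\}, implementing the product \mathcal{L}x^{k-1} directly from Definitions 2 and 4.

```python
from itertools import permutations

# Laplacian tensor of a small 3-uniform hypergraph: k = 3, n = 4,
# E = {{0,1,2}, {1,2,3}}, so deg = [1, 2, 2, 1].
k, n = 3, 4
E = [(0, 1, 2), (1, 2, 3)]

# adjacency tensor: 1/(k-1)! = 1/2 on every permutation of an edge
A = {}
for e in E:
    for p in permutations(e):
        A[p] = 1.0 / 2
deg = [sum(1 for e in E if i in e) for i in range(n)]

def L_entry(idx):
    d = deg[idx[0]] if len(set(idx)) == 1 else 0.0   # degree tensor part
    return d - A.get(idx, 0.0)

def L_apply(x):
    # (L x^{k-1})_i = sum over i2, i3 of l_{i i2 i3} * x_{i2} * x_{i3}
    return [sum(L_entry((i, i2, i3)) * x[i2] * x[i3]
                for i2 in range(n) for i3 in range(n))
            for i in range(n)]

print(L_apply([0.0, 1.0, 0.0, 0.0]))   # d_1 * e_1^{[2]} = [0, 2, 0, 0]
print(L_apply([1.0, 1.0, 1.0, 1.0]))   # the zero vector
```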

Exercise 2. Let \Gamma be a k-uniform hypergraph and \mathcal{A} its adjacency tensor. Show that \mathbf{e}_j occurs as an eigenvector of \mathcal{A} with eigenvalue 0 for each j. Sanity check: why does this not imply that \mathcal{A}=0?

Our next proposition is a statement about the distribution of eigenvalues.

Proposition 2. Let \Gamma be a k-uniform hypergraph on n vertices, and \mathcal{L} its Laplacian tensor. Let \Delta denote the maximum degree of \Gamma. Then all of the eigenvalues of \mathcal{L} lie in the disk \{\lambda\in\mathbb{C}:|\lambda-\Delta|\leq\Delta\}.

Proof. ([4, Thm. 6]) Let \lambda\in\mathbb{C} be an eigenvalue of \mathcal{L}, and let x\in\mathbb{C}^n be a corresponding eigenvector. Choose an index i so that

|x_i| = \max_{j=1,\dotsc,n}|x_j| \neq 0.

Then by definition, it holds \mathcal{L}x^{k-1} = \lambda x^{[k-1]}, which in the i-th component reads

\lambda x_i^{k-1} = \sum_{i_2,\dotsc,i_k=1}^n l_{i,i_2,\dotsc, i_k} x_{i_2}\cdots x_{i_k}.

Removing the i_2=\dotsb=i_k=i term from the right-hand side, we get

(\lambda - l_{i\cdots i})\, x_i^{k-1} = \sum\limits_{(i_2,\dotsc,i_k)\neq (i,\dotsc,i)} l_{i i_2\cdots i_k}\, x_{i_2}\cdots x_{i_k}.

But from the definition of \mathcal{L} we recognize l_{i\cdots i} as d_i, and hence by the choice of i we can estimate

|\lambda - d_i|\,|x_i|^{k-1} \leq \sum\limits_{(i_2,\dotsc,i_k)\neq (i,\dotsc,i)} |l_{i i_2\cdots i_k}|\, |x_{i_2}|\cdots |x_{i_k}| \leq |x_i|^{k-1}\sum\limits_{(i_2,\dotsc,i_k)\neq (i,\dotsc,i)} a_{i i_2\cdots i_k} = |x_i|^{k-1}\, d_i,

where in the final equality we are using the definition of \mathcal{L} and recalling the calculation in the proof of Proposition 1. Dividing by |x_i|^{k-1}\neq 0, we have proved that every eigenvalue \lambda lies in the closed disk centered at d_i\in\mathbb{C} of radius d_i; but this i\in V depends on the unknown eigenvector x\in\mathbb{C}^n, so by observing that for any j\in V,

\{\lambda\in\mathbb{C}:|\lambda-d_j|\leq d_j\}\subset \{\lambda\in\mathbb{C}:|\lambda-\Delta|\leq\Delta\}

the claim follows. -QED

Exercise 3. Let \Gamma be a k-uniform hypergraph on n vertices and \mathcal{A} its adjacency tensor. Show that all of the eigenvalues of \mathcal{A} lie in the disk \{\lambda\in\mathbb{C}:|\lambda|\leq\Delta\}.

For our final proposition, we strive for an analogue of Exercise 1(a), that is, a relationship between the “size of the kernel” and the connected components of \Gamma. However, for k-uniform hypergraphs, \mathcal{L} will not be a linear operator for any k\geq 3, and hence a kernel is not well-defined a priori. But as we saw in Proposition 1, the same vector \mathbf{1} shows up as an eigenvector for the null eigenvalue, and hence we suspect that more can be said.

A few small preliminaries are in order. A vector x\in\mathbb{C}^n is called a binary vector if each of its components x_i is either 0 or 1 for 1\leq i\leq n. The support of a vector x\in\mathbb{C}^n is the set \text{supp}(x) = \{1\leq i\leq n:x_i\neq 0\}. If \mathcal{T} is a (k,n)-tensor over \mathbb{C}, then an eigenvector x\in\mathbb{C}^n associated to eigenvalue \lambda\in\mathbb{C} is called a minimal binary eigenvector associated to \lambda if x is a binary vector and there does not exist another binary eigenvector y\in\mathbb{C}^n with eigenvalue \lambda for which \text{supp}(y) is a proper subset of \text{supp}(x).

Proposition 3. Let \Gamma be a k-uniform hypergraph on n vertices with Laplacian tensor \mathcal{L}. A binary vector x\in\mathbb{C}^n is a minimal binary eigenvector associated to \lambda = 0 if and only if \text{supp}(x) is the vertex set of a connected component of \Gamma.

Proof. We first observe, using the proof of Proposition 1, that a nonzero vector x\in\mathbb{C}^n is an eigenvector associated to \lambda=0 if and only if, for each 1\leq i\leq n, it holds

d_i\, x_i^{k-1} = \sum\limits_{e\in E:\, i\in e}\ \prod\limits_{j\in e\setminus\{i\}} x_j. \hspace{1cm} (1)

(\implies). Let x\in\mathbb{C}^n be any minimal binary eigenvector associated to \lambda=0 and let i\in\text{supp}(x). It follows from equation (1) that for any (i,i_2,\dotsc,i_k)\in E it holds i_2,\dotsc,i_k\in \text{supp}(x) (otherwise, there will not be enough terms in the sum to achieve d_i). Applying this observation inductively to adjacent vertices in \text{supp}(x), we see that the vertices i\in\text{supp}(x) form either the vertex set of one component of \Gamma, or the disjoint union of several. But the latter case is incompatible with minimality, since any binary vector supported on a single component will give rise to an eigenvector with eigenvalue \lambda=0 (check!).

(\impliedby). Now assume simply that x\in\mathbb{C}^n is a binary vector with support equal to a single connected component of \Gamma. A short calculation verifies that the components x_i for i\in\text{supp}(x) satisfy equation (1), and hence that x is a binary eigenvector for \mathcal{L} associated to eigenvalue \lambda=0. All that remains to be checked is that x is minimal; to this end, suppose y\in\mathbb{C}^n is another binary eigenvector, and that \text{supp}(y)\subsetneq \text{supp}(x). Then there must exist some vertex i\in\text{supp}(y) for which there exists some edge e= \{i,i_2,\dots,i_k\}\in E and an index j\in e which does not belong to \text{supp}(y). But then,

\sum\limits_{e\in E:\, i\in e}\ \prod\limits_{l\in e\setminus\{i\}} y_l \leq d_i - 1 < d_i = d_i\, y_i^{k-1},

and equation (1) cannot hold for y, a contradiction. -QED

Corollary 1. A k-uniform hypergraph \Gamma is connected if and only if the vector \mathbf{1} is the unique minimal binary eigenvector with eigenvalue \lambda=0.

References and Further Reading.

  1. Fan Chung. Spectral Graph Theory. 1997.
  2. Cunxiang Duan, Ligong Wang, Xihe Li, et al. Some properties of the signless Laplacian and normalized Laplacian tensors of general hypergraphs. Taiwanese Journal of Mathematics, 24(2). 2020.
  3. Allen Hatcher. Algebraic Topology. 2005.
  4. Liqun Qi. Eigenvalues of a real supersymmetric tensor. Journal of Symbolic Computation, 40(6). 2005.
  5. Liqun Qi. H+-eigenvalues of Laplacian and signless Laplacian tensors. Communications in Mathematical Sciences, 12(6). 2013.
  6. Max Wardetzky, Saurabh Mathur, Felix Kälberer, and Eitan Grinspun. Discrete Laplace operators: no free lunch. In Symposium on Geometry Processing. 2007.
Thanks for reading!