Let us illustrate the diagrammatic expansion of integrals from Lecture 13 in a specific case: we will bring things full circle and look at “Stirling’s quantum field theory.” Going back to the very beginning, start with the Euler integral

$$N! = \int_0^\infty x^N e^{-x}\,dx = N\int_0^\infty x^{N-1}e^{-x}\,dx.$$

Now make the change of variables $x = Ne^w$ to get

$$N! = N^{N+1}e^{-N}\int_{-\infty}^{\infty} e^{-N(e^w - w - 1)}\,dw,$$
and we’re in position to apply our diagrammatic expansion of the remaining integral.
The nice thing about the diagrammatic expansion is that most of it is “universal,” and it is not until the last step that we have to worry about what the smooth function in the exponent of the integrand actually is. In particular, from Lecture 13, we have that the coefficient of $N^{-1}$ is the sum of the following three numbers

$$\frac{t_3^2}{|\mathrm{Aut}(\Gamma_1)|},\qquad \frac{-t_4}{|\mathrm{Aut}(\Gamma_2)|},\qquad \frac{t_3^2}{|\mathrm{Aut}(\Gamma_3)|},$$

where $\Gamma_1, \Gamma_2, \Gamma_3$ are the three graphs shown at the end of Lecture 13, in the same order listed there. These are precisely the connected graphs of cyclomatic number $2$ in which all vertices have degree at least $3$; the total degree of the monomial in the $t$-variables is the number of vertices, and the monomials encode the sequence of vertex degrees. Now we just have to calculate the order of the automorphism groups of these graphs: we obtain

$$|\mathrm{Aut}(\Gamma_1)| = 8, \qquad |\mathrm{Aut}(\Gamma_2)| = 8, \qquad |\mathrm{Aut}(\Gamma_3)| = 12,$$

so that the coefficient of $N^{-1}$ in the asymptotic expansion of the integral is

$$\frac{t_3^2}{8} - \frac{t_4}{8} + \frac{t_3^2}{12}.$$
Note that this has absolutely no dependence on what the integral actually is, so far: this dependence only enters at the last stage, where we plug in the values of the $t$-coefficients, which for the integral in Stirling’s QFT (where the action is $e^w - w - 1 = \frac{w^2}{2} + \sum_{k \geq 3}\frac{w^k}{k!}$) are all identically equal to $1$. So the first two terms in the sum above cancel, leaving us with the conclusion that

$$N! = \sqrt{2\pi N}\left(\frac{N}{e}\right)^N\left(1 + \frac{1}{12N} + O\!\left(\frac{1}{N^2}\right)\right)$$
as $N \to \infty$. Now to get the next order in the asymptotic expansion, the coefficient of $N^{-2}$, you have to write out all graphs of cyclomatic number $3$ in which all vertices have degree at least $3$, calculate the order of their automorphism groups, and evaluate as above (the classical answer you should recover is $\frac{1}{288}$). I am going to come back later and do this for you, but in the meantime you may want to try it yourself. It’s pretty tedious, but instructive. Actually, the fact that it’s tedious is an important takeaway: you can simplify your life greatly if you are willing to settle for the asymptotic expansion of the logarithm of the integral, since then only connected graphs will enter into the calculation of the coefficients in the asymptotic expansion, and there are far fewer of those.
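Since everything above is concrete, it can be sanity-checked numerically. Here is a minimal Python sketch (helper name and test values my own) comparing $N!$ with the Stirling approximation, with and without the $\frac{1}{12N}$ correction; the correction improves the relative error by roughly a factor of $N$.

```python
import math

def stirling(N, corrected=False):
    """Leading Stirling approximation, optionally with the 1/(12N) correction."""
    base = math.sqrt(2 * math.pi * N) * (N / math.e) ** N
    return base * (1 + 1 / (12 * N)) if corrected else base

for N in (5, 10, 20):
    exact = math.factorial(N)
    for corrected in (False, True):
        rel_err = abs(stirling(N, corrected) - exact) / exact
        print(f"N={N:2d} corrected={corrected!s:5s} relative error={rel_err:.2e}")
```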
For a more involved version of the same procedure, where the “graphs” involved are of a quite different nature, you can check out my lecture yesterday in the UC Berkeley combinatorics seminar. Don’t worry if you don’t understand much; we are going to look at matrix integrals together very soon.
We continue with the material from Lecture 10. Let $S$ be a smooth function on $\mathbb{R}$ of the form

$$S(x) = \frac{x^2}{2} + V(x),$$

where $V$ is itself a smooth function with Taylor expansion of the form

$$V(x) = \sum_{k=3}^{\infty} t_k \frac{x^k}{k},$$

where $t_3, t_4, t_5, \dots$ are real numbers such that the corresponding path integral

$$\mathcal{Z}(N) = \int_{-\infty}^{\infty} e^{-NS(x)}\,dx$$

converges. The function $V$ is called the “potential,” and we view the action $S$ as a deformation of the Gaussian action $\frac{x^2}{2}$, which corresponds to choosing all $t_k = 0$, so that the potential vanishes and we get the exact evaluation

$$\mathcal{Z}(N) = \int_{-\infty}^{\infty} e^{-N\frac{x^2}{2}}\,dx = \sqrt{\frac{2\pi}{N}}.$$
Then, by the Laplace principle, we know that we have an asymptotic expansion

$$\mathcal{Z}(N) \sim \sqrt{\frac{2\pi}{N}}\left(1 + \frac{a_1}{N} + \frac{a_2}{N^2} + \frac{a_3}{N^3} + \cdots\right),$$

and we want to calculate the coefficients $a_1, a_2, a_3, \dots$, which are functions of the parameters $t_3, t_4, t_5, \dots$.
Taking this all back to Lecture 1, if we set $x = N(1+w)$ in the Euler integral and take the action

$$S(w) = w - \log(1+w) = \frac{w^2}{2} + \sum_{k=3}^{\infty} (-1)^k \frac{w^k}{k},$$

then we are looking at “Stirling’s QFT,” in which the potential has coefficients $t_k = (-1)^k$. The result of our computation should then be some sort of understanding of the subleading corrections in Stirling’s approximation,

$$N! \sim \sqrt{2\pi N}\left(\frac{N}{e}\right)^N\left(1 + \frac{a_1}{N} + \frac{a_2}{N^2} + \cdots\right),$$
which are the “quantum corrections” in Stirling’s QFT, whatever that means.
To begin, let’s not yet think about the integral $\mathcal{Z}(N)$, and focus solely on the integrand

$$e^{-NS(x)} = e^{-N\frac{x^2}{2}}\,e^{-NV(x)}.$$

Let us focus even more narrowly on the non-Gaussian part $e^{-NV(x)}$ of the integrand, and apply the Exponential Formula as in Lecture 3. First, we write

$$-NV(x) = \sum_{k=3}^{\infty} (-Nt_k)(k-1)!\,\frac{x^k}{k!},$$

where we have used $\frac{x^k}{k} = (k-1)!\,\frac{x^k}{k!}$.
Then we have that the coefficients in the expansion

$$e^{-NV(x)} = \sum_{n=0}^{\infty} c_n(N)\,\frac{x^n}{n!}$$

are given by

$$c_n(N) = \sum_{\pi \in \mathrm{P}(n)} \prod_{B \in \pi} (-Nt_{|B|})\,(|B|-1)!,$$

where the sum is over all partitions of $\{1, \dots, n\}$, the product is over the blocks of the partition $\pi$, and $t_1 = t_2 = 0$ is understood. Since $(|B|-1)!$ is the number of ways to cyclically order the subset $B$, we have that

$$c_n(N) = \sum_{\varphi \in \mathrm{S}(n)} \prod_{k=1}^{n} (-Nt_k)^{c_k(\varphi)},$$
where $c_k(\varphi)$ denotes the number of $k$-cycles in the disjoint cycle decomposition of the permutation $\varphi$, and $c_1(\varphi) + c_2(\varphi) + \cdots + c_n(\varphi)$ is the total number of cycles. Thus the polynomial

$$\sum_{\varphi \in \mathrm{S}(n)} \prod_{k=1}^{n} t_k^{c_k(\varphi)}$$

is a generating function for the permutations in $\mathrm{S}(n)$ which keeps track of cycle type; this polynomial is called the “augmented cycle index polynomial” of $\mathrm{S}(n)$. The actual (i.e. non-augmented) cycle index polynomial is

$$\frac{1}{n!}\sum_{\varphi \in \mathrm{S}(n)} \prod_{k=1}^{n} t_k^{c_k(\varphi)},$$

and it is the basic tool in Polya’s theory of enumeration under group action. Equivalently, we can write the modified cycle index as

$$\sum_{\lambda \vdash n} |C_\lambda| \prod_{i} t_{\lambda_i},$$

where the sum is over Young diagrams $\lambda$ with $n$ cells and $C_\lambda$ is the conjugacy class of permutations of cycle type $\lambda$.
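As a quick sanity check on this combinatorics, here is a brute-force Python sketch (function names my own) that computes the augmented cycle index of $\mathrm{S}(n)$ by listing all permutations and tallying cycle types; for $n = 3$ it reports the polynomial $t_1^3 + 3t_1t_2 + 2t_3$, as it should.

```python
from itertools import permutations
from collections import Counter

def cycle_type(perm):
    """Sorted tuple of cycle lengths of a permutation of {0,...,n-1}."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        j, length = start, 0
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))

def augmented_cycle_index(n):
    """Counter: cycle type -> number of permutations in S(n) with that type."""
    return Counter(cycle_type(p) for p in permutations(range(n)))

print(augmented_cycle_index(3))  # Counter({(1, 2): 3, (3,): 2, (1, 1, 1): 1})
```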
Let’s integrate. We write

$$\mathcal{Z}(N) = \int_{-\infty}^{\infty} e^{-N\frac{x^2}{2}}\,e^{-NV(x)}\,dx = \sum_{n=0}^{\infty} \frac{c_n(N)}{n!} \int_{-\infty}^{\infty} x^n e^{-N\frac{x^2}{2}}\,dx.$$

We know how to compute moments of the Gaussian distribution: we recall from Lecture 4 that

$$\sqrt{\frac{N}{2\pi}}\int_{-\infty}^{\infty} x^n e^{-N\frac{x^2}{2}}\,dx = \frac{(n-1)!!}{N^{n/2}}$$

for $n$ even, where $(n-1)!!$ is the number of fixed point free involutions in $\mathrm{S}(n)$, i.e. permutations which factor into disjoint cycles of order $2$ (and the moment is zero if $n$ is odd).
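Here is a short numerical check of the moment formula (a sketch of my own, with $N = 1$): the even moments of the standard Gaussian are the double factorials $1, 3, 15, 105, \dots$, i.e. the counts of fixed point free involutions.

```python
import numpy as np

def gaussian_moment(n):
    """Approximate the n-th moment of the standard Gaussian by quadrature."""
    x, dx = np.linspace(-12, 12, 2_000_001, retstep=True)
    density = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    return float(np.sum(x**n * density) * dx)

def double_factorial(m):
    return 1 if m <= 0 else m * double_factorial(m - 2)

for n in range(0, 9):
    predicted = double_factorial(n - 1) if n % 2 == 0 else 0
    print(n, round(gaussian_moment(n), 6), predicted)
```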
We thus have the series

$$\mathcal{Z}(N) \sim \sqrt{\frac{2\pi}{N}}\,\sum_{n=0}^{\infty} \frac{1}{n!} \sum_{\lambda \vdash n} T_\lambda\, \frac{\prod_i (-Nt_{\lambda_i})}{N^{n/2}},$$

where $\lambda = (\lambda_1, \lambda_2, \dots)$ runs over Young diagrams with $n$ cells, all of whose rows have length at least $3$, and the number $T_\lambda$ is equal to the number of pairs $(\alpha, \varphi)$ in which $\alpha$ is a fixed point free involution, and $\varphi$ is a permutation in $\mathrm{S}(n)$ of cycle type $\lambda$. Such pairs of permutations are special objects, and they have a special name.
Definition 1: A pair of permutations $(\alpha, \varphi) \in \mathrm{S}(n) \times \mathrm{S}(n)$ such that $\alpha$ is a fixed point free involution and all cycles of $\varphi$ have length at least $3$ is called a combinatorial map.
The use of the word “map” in Definition 1 may seem strange, since you are probably more accustomed to seeing it appear in the following definition.
Definition 2: A topological map consists of a closed oriented surface $X$, together with a finite set $P$ of distinct points on $X$ and a finite set $C$ of curves on $X$, each of which joins a point of $P$ to a point of $P$, in such a way that the curves in $C$ intersect only at points of $P$, and moreover if $P$ and $C$ are deleted from $X$ then what remains is a disjoint union of open sets, called “faces,” each of which is homeomorphic to a disc.
Topological maps are basic objects in graph theory: for example, the famous four color theorem asserts that in order to color the faces of a map on the sphere in such a way that faces which are bounded by a common curve have different colors, it is necessary and sufficient to have four colors. Indeed, there is a clear connection between graphs and topological maps, in that starting from a graph $G$ we can try to embed it on a surface $X$ by choosing a set of points on $X$ in bijection with the vertices of $G$, and a set of curves on $X$ in bijection with the edges of $G$, such that the requirements to be a topological map are satisfied (this may not be possible if the connectivity of $G$ is high and the genus of $X$ is low, and the minimal genus of $X$ such that this can be done is called the genus of $G$).
The connection between combinatorial maps and topological maps is not quite as obvious, but the construction relating them is simple and natural. First, draw every cycle of $\varphi$ as a polygon, with the elements permuted by the cycle being taken as labeling the edges of the polygon, and the edges being directed counterclockwise around the polygon. Now, identify the edges of these polygons in pairs according to the cycles of the fixed point free involution $\alpha$, ensuring that when we identify two edges they are glued with opposite orientations, i.e. forming a two-way street. This procedure produces from the pair $(\alpha, \varphi)$ a closed oriented surface together with a collection of points on the surface connected by curves meeting only at those points — a topological map. The points are images of the former vertices of the polygons after having been identified, the curves are images of the former edges of the polygons after having been identified, and the faces of the map are the interiors of the polygons.
As a small example of the above construction, let us consider combinatorial maps of degree $4$, i.e. pairs $(\alpha, \varphi) \in \mathrm{S}(4) \times \mathrm{S}(4)$ such that $\alpha$ is a product of two disjoint $2$-cycles and $\varphi$ is a $4$-cycle. There are thus $3$ possibilities for $\alpha$ and $6$ possibilities for $\varphi$, for a total of $18$ combinatorial maps of degree $4$.
We consider those combinatorial maps as above with $\varphi = (1\,2\,3\,4)$, the full forward cycle in $\mathrm{S}(4)$. These are the following:

$$\alpha_1 = (1\,2)(3\,4), \qquad \alpha_2 = (1\,3)(2\,4), \qquad \alpha_3 = (1\,4)(2\,3).$$
Now we depict the corresponding gluing construction as described above:
So, the three different combinatorial maps described above gave us only two different topological maps: the combinatorial maps and produced the same topological map. Thus, the correspondence between combinatorial maps and topological maps is many-to-one (as is the correspondence between graphs and topological maps). But the general picture that is emerging is that the asymptotic expansion of is a generating function for topological maps, with face degrees determined by the Maclaurin series of the potential . That is, topological maps are the Feynman diagrams of zero-dimensional scalar-valued quantum field theory. We will continue to develop this connection next week.
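The gluings can be checked without drawing pictures. Under the usual dictionary for combinatorial maps, faces correspond to cycles of $\varphi$, edges to cycles of $\alpha$, and vertices to cycles of the product $\varphi\alpha$ (the composition convention doesn’t matter here, since $\varphi\alpha$ and $\alpha\varphi$ are conjugate), so the Euler characteristic $V - E + F$ identifies the surface. A small Python sketch of this computation, with labels shifted to $\{0,1,2,3\}$:

```python
def num_cycles(perm):
    """Number of cycles of a permutation of {0,...,n-1} in one-line notation."""
    seen, count = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            count += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return count

phi = [1, 2, 3, 0]  # the full forward cycle (1 2 3 4)
alphas = {"(12)(34)": [1, 0, 3, 2], "(13)(24)": [2, 3, 0, 1], "(14)(23)": [3, 2, 1, 0]}

for name, alpha in alphas.items():
    sigma = [phi[alpha[i]] for i in range(4)]       # the product phi∘alpha: vertices
    V, E, F = num_cycles(sigma), num_cycles(alpha), num_cycles(phi)
    chi = V - E + F
    print(name, "Euler characteristic:", chi, "genus:", (2 - chi) // 2)
# (12)(34) and (14)(23) give chi = 2 (sphere); (13)(24) gives chi = 0 (torus).
```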
In this lecture we will actually start down the path to Feynman diagrams; how far down this path we will progress remains to be seen.
Feynman diagrams are combinatorial objects closely associated with an area of theoretical physics called quantum field theory, and so we will begin with a necessarily brief and sketchy discussion of what QFT is. The truth is that nobody knows what quantum field theory “is” in the sense of a mathematical definition, but I can at least try to give you a sense of the mathematical structures that are involved, and the sorts of computations that physicists perform.
Let $M$ be a manifold, representing a “universe” which we want to understand. We consider functions on $M$ as measurements on this universe, or “observables.” We must say what the outputs of these observations are: they may be numbers, or vectors, or operators, or something else. In other words, we want to specify a “target” $T$ for observations $f \colon M \to T$, and typically $T$ is some other manifold. So the space of observables is the function space $\mathcal{F} = \{f \colon M \to T\}$. For example, one could take $M = \mathbb{R}^4$, a manifold which is supposed to model our physical experience of “spacetime,” and $T = \mathbb{R}$, meaning that our observables are numerical measurements on spacetime, which sounds reasonable and not at all problematic.
The next step is to define a “field,” which is a measure on $\mathcal{F}$, the space of all functions $f \colon M \to T$. Physicists would like to prescribe this measure by giving its density against the background flat Lebesgue measure, and this density is supposed to be of the form

$$e^{-\frac{1}{\hbar}S(f)},$$

where $\hbar > 0$ is called the “semiclassical parameter” and $S \colon \mathcal{F} \to \mathbb{R}$ is a functional called the “action.” The total flatness and uniformity of the Lebesgue measure represents entropy, while the density above represents energy, in the sense that it leads to measure concentrating near the minimizers of the action, much as we saw when we initially discussed the Laplace method, and as is familiar from the setting of statistical mechanics.
Once we have prescribed the spaces $M$ and $T$ and the action $S$, we are looking at something which is more or less what physicists call a QFT, and we start asking questions about it. One of the main things that physicists would like to do is to evaluate the integral

$$\mathcal{Z} = \int_{\mathcal{F}} e^{-\frac{1}{\hbar}S(f)}\,\mathcal{D}f,$$

which is called the “partition function” or “path integral.” It is not expected that this can be done exactly in most cases; when $S$ is such that an exact evaluation of the corresponding $\mathcal{Z}$ is possible, we are looking at what is called a “free theory.” The goal then is to select a new action $S'$ which is a “perturbation” of $S$, i.e. is close to $S$ in some sense. One then tries to estimate the new path integral $\mathcal{Z}'$ corresponding to $S'$ in the limit $\hbar \to 0$. What is supposed to happen is that in this so-called “semiclassical limit,” there emerges an approximation of the form

$$\mathcal{Z}' \approx C(\hbar)\left(1 + a_1\hbar + a_2\hbar^2 + a_3\hbar^3 + \cdots\right),$$

with $C(\hbar)$ an explicitly computable prefactor.
The terms $a_k\hbar^k$ are called “quantum terms,” and the sum of all these terms is typically a divergent series, i.e. it has radius of convergence zero. So what is meant by the above is that $\mathcal{Z}'$ admits a complete asymptotic expansion: we have

$$\mathcal{Z}' = C(\hbar)\left(1 + a_1\hbar + a_2\hbar^2 + \cdots + a_n\hbar^n + O(\hbar^{n+1})\right)$$

as $\hbar \to 0$, for any nonnegative integer $n$. Now the question physicists want to address is how to compute the quantum coefficients $a_k$. The idea is that these coefficients should all be computable in terms of the “correlation functions” of the free theory, meaning integrals of the form

$$\langle F \rangle = \frac{1}{\mathcal{Z}}\int_{\mathcal{F}} F(f)\, e^{-\frac{1}{\hbar}S(f)}\,\mathcal{D}f$$

for suitable functionals $F$ on the space of fields.
The computation of these correlation functions is typically very intricate, and Feynman diagrams are a graphical bookkeeping device introduced by Feynman in order to help organize such computations. The theory is very well developed from a physical perspective, and is in regular, constant use in contemporary theoretical physics.
The problem is that none of the above makes sense mathematically: even when one makes very reasonable, realistic choices like $M = \mathbb{R}^4$ and $T = \mathbb{R}$, the corresponding space of functions $\mathcal{F}$ is infinite-dimensional, and hence does not admit a Lebesgue measure. This means that the definition of the field as a measure which is absolutely continuous with respect to Lebesgue measure makes no sense, and the path integral

$$\int_{\mathcal{F}} e^{-\frac{1}{\hbar}S(f)}\,\mathcal{D}f$$

is devoid of meaning.
This leaves us with two options. One is to try to make the above general construction rigorous by developing some kind of theory of functional integration that makes mathematical sense. This is an extremely interesting undertaking, but not one that we will discuss. The second option is to choose $M$ and $T$ in such a way that $\mathcal{F}$ is finite-dimensional. The only way this can happen is if one of $M$, $T$ is the zero-dimensional manifold consisting of a single point. Choosing $T = \{\mathrm{pt}\}$ is not advisable, since then $\mathcal{F}$ is also zero-dimensional, which leads to a trivial theory. However, if we choose $M = \{\mathrm{pt}\}$ then $\mathcal{F} = T$, and we have something non-trivial, namely a toy model aptly named zero-dimensional quantum field theory.
What we are going to do now is develop the zero-dimensional quantum field theory corresponding to the data

$$M = \{\mathrm{pt}\}, \qquad T = \mathbb{R}, \qquad \hbar = \frac{1}{N}.$$

The path integral corresponding to this quantum field theory is

$$\mathcal{Z}(N) = \int_{-\infty}^{\infty} e^{-NS(x)}\,dx,$$

which is non-problematic provided we choose the action $S$ such that the integral converges. We have even identified previously an action which gives us a free theory: if we choose

$$S(x) = \frac{x^2}{2},$$

the corresponding path integral evaluation is Lord Kelvin’s favorite formula:

$$\int_{-\infty}^{\infty} e^{-N\frac{x^2}{2}}\,dx = \sqrt{\frac{2\pi}{N}}.$$
And the next part of the construction, the semiclassical limit of the path integral corresponding to a more general action $S$, is just the classical Laplace method, which we now state formally.
Theorem 1: Let $I \subseteq \mathbb{R}$ be a (possibly infinite) interval, and let $S$ be a smooth function on $I$ that attains a global minimum at a unique point $x_0 \in I$, with $S''(x_0) > 0$. Then

$$\int_I e^{-NS(x)}\,dx = e^{-NS(x_0)}\sqrt{\frac{2\pi}{N S''(x_0)}}\; F\!\left(\frac{1}{N}\right),$$

where $F$ is a smooth function on a neighborhood of $0$ such that $F(0) = 1$.
Theorem 1 tells us that, for any nonnegative integer $n$, we have

$$\int_I e^{-NS(x)}\,dx = e^{-NS(x_0)}\sqrt{\frac{2\pi}{N S''(x_0)}}\left(1 + \frac{a_1}{N} + \cdots + \frac{a_n}{N^n} + O\!\left(\frac{1}{N^{n+1}}\right)\right),$$

where the sum in the brackets is just the $n$th Maclaurin polynomial of $F$, with $a_k = \frac{F^{(k)}(0)}{k!}$, and the error term follows from Taylor’s theorem (it is important to note that $F$ is smooth, but not necessarily analytic, i.e. its Maclaurin series may be divergent, and even if it is convergent it need not sum to $F$). So we are in a position where we have the perfectly well-defined problem of computing the “quantum corrections” $a_k$. This will be our goal in the next several lectures, and we will see that the Feynman diagrams which accompany this problem are classical combinatorial objects, namely maps on surfaces.
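Theorem 1 is easy to probe numerically. Here is a small Python sketch (the test action is my own choice, not one from the lectures) that compares $\int e^{-NS}$ with the leading Laplace prediction; the quantity $N(\text{ratio} - 1)$ should settle toward the first quantum correction $a_1$, which for this particular action works out to $-\frac{3}{4}$.

```python
import numpy as np

def laplace_ratio(S, x0, d2S, N, half_width=3.0, grid=600_001):
    """Ratio of the integral of e^{-N(S - S(x0))} to the leading Laplace term."""
    x, dx = np.linspace(x0 - half_width, x0 + half_width, grid, retstep=True)
    integral = np.sum(np.exp(-N * (S(x) - S(x0)))) * dx
    return integral / np.sqrt(2 * np.pi / (N * d2S))

# Test action (my choice): S(x) = x^2/2 + x^4/4, minimized at x0 = 0, S''(0) = 1.
S = lambda x: x**2 / 2 + x**4 / 4
for N in (10, 100, 1000):
    r = laplace_ratio(S, x0=0.0, d2S=1.0, N=N)
    print(N, r, N * (r - 1))   # N*(r - 1) tends to a_1 = -3/4 for this action
```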
In this lecture we continue our long march towards Stirling’s formula. Let me begin by saying that, while we seem to be spending an absurd amount of time on this, Stirling’s approximation is being used as a case study in something much more general, Laplace’s method, and we will understand the general thing almost automatically once we understand the specific thing.
We return to the integral representation

$$N! = N^{N+1}e^{-N}\int_{-1}^{\infty} e^{-N(w - \log(1+w))}\,dw.$$

In Lecture 2, we performed some elementary estimates which showed that, for all sufficiently small $\varepsilon > 0$, we have

$$\int_{-1}^{\infty} e^{-N(w - \log(1+w))}\,dw = \int_{-\varepsilon}^{\varepsilon} e^{-N(w - \log(1+w))}\,dw + O(e^{-c(\varepsilon)N}),$$

where $c(\varepsilon)$ is a positive number depending on $\varepsilon$ but not on $N$. This quantifies the intuition that the bulk contribution to the integral comes from a small neighborhood of the minimum of the action.
We now come to the problem of estimating this bulk contribution

$$\int_{-\varepsilon}^{\varepsilon} e^{-NS(w)}\,dw, \qquad S(w) = w - \log(1+w) = \sum_{k=2}^{\infty} (-1)^k \frac{w^k}{k},$$

this series converging absolutely and uniformly on compact subsets of $(-1,1)$ to the correct value $S(w)$. In Lecture 2, we observed that if we throw away all terms in the series expansion of $S(w)$ beyond the quadratic term, our $w$-integral becomes the truncated Gaussian integral

$$\int_{-\varepsilon}^{\varepsilon} e^{-N\frac{w^2}{2}}\,dw,$$

which can then be replaced by a total Gaussian integral at the cost of introducing an exponentially small error,

$$\int_{-\varepsilon}^{\varepsilon} e^{-N\frac{w^2}{2}}\,dw = \int_{-\infty}^{\infty} e^{-N\frac{w^2}{2}}\,dw + O(e^{-\frac{\varepsilon^2}{2}N}),$$

this being beneficial because the total Gaussian integral can be exactly evaluated,

$$\int_{-\infty}^{\infty} e^{-N\frac{w^2}{2}}\,dw = \sqrt{\frac{2\pi}{N}}.$$
Thus, the remaining question is to quantify the error incurred by throwing away all terms of the action beyond its dominant (quadratic) term. For small $\varepsilon$ we naturally expect this error to be a small number which decays as $N$ increases, but this decay turns out to be much slower than exponential, and it represents the main source of error in the approximation scheme we’re building.
To get started, we write

$$e^{-NS(w)} = e^{-N\frac{w^2}{2}}\, e^{-NV(w)}, \qquad V(w) = S(w) - \frac{w^2}{2} = \sum_{k=3}^{\infty} (-1)^k \frac{w^k}{k}.$$

Next, we Maclaurin-expand the non-Gaussian part of the integrand, writing

$$e^{-NV(w)} = \sum_{k=0}^{\infty} c_k(N)\,\frac{w^k}{k!},$$

where $c_k(N)$ is the $k$th derivative of $e^{-NV(w)}$ at $w = 0$. This series converges uniformly on $[-\varepsilon, \varepsilon]$: indeed, it is the Maclaurin series of the function $e^{-N(S(w) - \frac{w^2}{2})}$, whose nearest singularity sits at $w = -1$, and we aren’t changing the location of singularities by subtracting an entire function. We can thus interchange the order of summation and integration, thereby obtaining

$$\int_{-\varepsilon}^{\varepsilon} e^{-NS(w)}\,dw = \sum_{k=0}^{\infty} \frac{c_k(N)}{k!} \int_{-\varepsilon}^{\varepsilon} w^k e^{-N\frac{w^2}{2}}\,dw.$$
We have thus reduced the initial problem to two problems.
Problem 1: Compute the coefficients $c_k(N)$.
Problem 2: Compute the integrals

$$\int_{-\varepsilon}^{\varepsilon} w^k e^{-N\frac{w^2}{2}}\,dw.$$
We will solve Problem 1 today, and Problem 2 on Thursday (in case you’d like to think ahead).
Problem 1 is pure algebraic combinatorics, being a question about composition in the algebra $\mathbb{R}[[w]]$ of formal power series. This question can be phrased in greater generality, as follows. Let

$$A(w) = \sum_{k=0}^{\infty} a_k \frac{w^k}{k!}, \qquad B(w) = \sum_{k=1}^{\infty} b_k \frac{w^k}{k!}$$

be elements of $\mathbb{R}[[w]]$ such that the constant term of $B$ is zero. Notice that we are writing these series as if their coefficients were the derivatives of analytic functions $A$ and $B$ at $w = 0$, but we are not actually imposing any convergence conditions on them. Series of this form are often referred to as exponential generating functions. The composition

$$C(w) = A(B(w))$$

is well-defined, and we expand it as

$$C(w) = \sum_{k=0}^{\infty} c_k \frac{w^k}{k!}.$$
The problem is to understand the coefficients $c_k$ of $C$ in terms of the coefficients of $A$ and $B$. This problem was posed and solved by Hurwitz in the nineteenth century in connection with what is now a classical problem in enumerative geometry, namely the question of counting branched covers of the Riemann sphere with specified ramification data (I hope to touch on this problem more extensively later in the course).
Exercise 1: Compute the first few coefficients of $C$ by hand.
Exercise 2: Does it make sense to do this computation using the chain rule? Why or why not? Do you get correct formulas by this method?
Let’s consider first an easier problem, which will serve as a stepping stone. Consider two series

$$A(w) = \sum_{k=0}^{\infty} a_k \frac{w^k}{k!}, \qquad B(w) = \sum_{k=0}^{\infty} b_k \frac{w^k}{k!},$$

and let

$$C(w) = A(w)B(w)$$

be their product. It is then immediate that

$$c_n = \sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k}.$$

This algebraic expression also has combinatorial meaning, which becomes clear if we sacrifice its elegance and rewrite it in the garish form

$$c_n = \sum_{S \subseteq \{1, \dots, n\}} a_{|S|}\, b_{n-|S|}.$$

The enumerative meaning of this expression is the following. Suppose that $a_k$ is the number of combinatorial structures of Type A which can be built on a $k$-element set, and similarly $b_k$ is the number of combinatorial structures of Type B which can be built on a $k$-element set. Then, $c_n$ is the number of ways to build a hybrid combinatorial structure of Type AB on $\{1, \dots, n\}$ by choosing disjoint subsets $S$ and $S^c$ whose union is $\{1, \dots, n\}$, and building a Type A structure on $S$ and a Type B structure on $S^c$. This can be iterated: the product $A_1(w) \cdots A_r(w)$ of exponential generating functions is an exponential generating function whose coefficients count the number of ways to select a list of disjoint subsets $S_1, \dots, S_r$ of $\{1, \dots, n\}$ whose union is $\{1, \dots, n\}$, and then build a Type $A_j$ structure on $S_j$ for each $1 \leq j \leq r$.
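Here is a tiny Python illustration of the binomial convolution rule (example mine): taking $A(w) = B(w) = e^w$, so that $a_k = b_k = 1$, the product coefficients must be $c_n = 2^n$, matching the count of ways to choose the subset $S$.

```python
from math import comb

def egf_product(a, b):
    """Coefficients of A(w)B(w) from coefficients a_k, b_k via binomial convolution."""
    n = min(len(a), len(b))
    return [sum(comb(m, k) * a[k] * b[m - k] for k in range(m + 1)) for m in range(n)]

a = b = [1] * 8          # A(w) = B(w) = e^w
print(egf_product(a, b)) # [1, 2, 4, 8, 16, 32, 64, 128]
```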
Exercise 3: Read this (easy), this (harder), or this (hardest).
We can now state the main theorem on composition of exponential generating functions. A partition of $\{1, \dots, n\}$ is a set $\pi$ of disjoint nonempty subsets of $\{1, \dots, n\}$ whose union is $\{1, \dots, n\}$. The elements of $\pi$ are called its blocks, and we denote by $b(\pi)$ the number of blocks in $\pi$. Let $\mathrm{P}(n)$ denote the set of all partitions of $\{1, \dots, n\}$.
Theorem (The Compositional Formula): With notation as above, we have $c_0 = a_0$ and

$$c_n = \sum_{\pi \in \mathrm{P}(n)} a_{b(\pi)} \prod_{B \in \pi} b_{|B|}, \qquad n \geq 1.$$
The proof is to use the combinatorial expansion of the powers $B(w)^k$ described above and reorganize the sum.
Corollary (The Exponential Formula): If $A(w) = e^w$, i.e. $a_k = 1$ for all $k$, then

$$c_n = \sum_{\pi \in \mathrm{P}(n)} \prod_{B \in \pi} b_{|B|}.$$

The combinatorial meaning of the Exponential Formula is the following: if $B(w)$ is the exponential generating function for a class of “connected” combinatorial structures, then $C(w) = e^{B(w)}$ is the exponential generating function counting disjoint assemblies of structures from this class. The Exponential Formula gets used all over the place — probabilists call it the moment-cumulant formula, while physicists often call it the linked cluster expansion and invoke it using colorful phrases like “connected vacuum bubbles exponentiate.” A basic example is the following: let $g_n$ be the number of simple graphs with vertex set $\{1, \dots, n\}$, let $c_n$ be the number of connected simple graphs with vertex set $\{1, \dots, n\}$, and form the generating functions

$$G(w) = 1 + \sum_{n=1}^{\infty} g_n \frac{w^n}{n!}, \qquad C(w) = \sum_{n=1}^{\infty} c_n \frac{w^n}{n!}.$$
Then $G(w) = e^{C(w)}$, and the sequences $(g_n)$ and $(c_n)$ are related as in the Exponential Formula.
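The graph example is easy to verify by machine: extracting coefficients from $G(w) = e^{C(w)}$ yields the recursion $g_n = \sum_{k=1}^{n} \binom{n-1}{k-1} c_k\, g_{n-k}$ (classify graphs by the connected component containing the vertex $1$), which can be solved for the $c_n$. A Python sketch of this check, with helper names of my own:

```python
from math import comb

def connected_counts(n_max):
    """c_n = #connected simple graphs on n labeled vertices, from g_n = 2^C(n,2)."""
    g = [2 ** comb(n, 2) for n in range(n_max + 1)]  # g_0 = 1 by convention
    c = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        # classify by the connected component containing vertex 1:
        rest = sum(comb(n - 1, k - 1) * c[k] * g[n - k] for k in range(1, n))
        c[n] = g[n] - rest
    return c[1:]

print(connected_counts(6))  # [1, 1, 4, 38, 728, 26704]
```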
Now we return to Problem 1 above, which is to compute the coefficients $c_k(N)$ in the expansion

$$e^{-NV(w)} = \sum_{k=0}^{\infty} c_k(N) \frac{w^k}{k!}.$$

We do this using the Exponential Formula: we have

$$-NV(w) = \sum_{k=3}^{\infty} (-1)^{k+1} N \frac{w^k}{k} = \sum_{k=3}^{\infty} (-1)^{k+1} N (k-1)!\, \frac{w^k}{k!}.$$

The Exponential Formula now tells us that the coefficients we seek are given by

$$c_k(N) = \sum_{\substack{\pi \in \mathrm{P}(k) \\ |B| \geq 3 \text{ for all } B \in \pi}} \prod_{B \in \pi} (-1)^{|B|+1} N\, (|B|-1)!.$$

Thus, we have solved Problem 1: since $(|B|-1)!$ counts the cyclic orderings of the block $B$, we see that

$$c_k(N) = (-1)^k \sum_{j \geq 0} e(k,j)\,(-N)^j,$$

with $e(k,j)$ the number of permutations of $\{1, \dots, k\}$ consisting of $j$ cycles, each of length at least $3$.
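The numbers $e(k,j)$ are easy to tabulate by brute force, which makes for a nice check of the formula; here is a Python sketch (helper names mine). For example, a permutation of $\{1,\dots,6\}$ with all cycles of length at least $3$ is either a $6$-cycle ($120$ of them) or a product of two $3$-cycles ($40$ of them).

```python
from itertools import permutations
from collections import Counter

def cycle_lengths(perm):
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        j, length = start, 0
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths

def e_counts(k):
    """Dict: number of cycles j -> #permutations of k symbols, all cycles >= 3."""
    counts = Counter()
    for p in permutations(range(k)):
        lengths = cycle_lengths(p)
        if min(lengths) >= 3:
            counts[len(lengths)] += 1
    return dict(counts)

for k in range(3, 7):
    print(k, e_counts(k))  # 3: {1: 2}, 4: {1: 6}, 5: {1: 24}, 6: {1: 120, 2: 40}
```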
Exercise 4: Do the same computation instead using the Compositional Formula with $A(w) = e^{-Nw}$ and $B(w) = V(w)$.
Let $\mathbf{V}$ be an $n$-dimensional vector space equipped with a scalar product $\langle \cdot, \cdot \rangle$. Recall from Lecture 16 that an operator $A$ on $\mathbf{V}$ is said to be selfadjoint (or symmetric) if

$$\langle Av, w \rangle = \langle v, Aw \rangle \qquad \text{for all } v, w \in \mathbf{V}.$$

Also recall from Lecture 18 that $A$ is said to be semisimple if there exists a basis of $\mathbf{V}$ consisting of eigenvectors of $A$. The goal of this lecture is to prove the following cornerstone result in linear algebra.
Theorem 1 (Spectral Theorem for selfadjoint operators): If $A$ is selfadjoint, then it is semisimple.
The proof of this important theorem occupies the remainder of this lecture. It is a constructive argument that builds an eigenbasis for $A$ one vector at a time. A nice feature of the construction is that the eigenbasis it outputs is an orthonormal basis of $\mathbf{V}$.
Let us begin with an important observation on a special subspecies of selfadjoint operators.
Definition 1: A selfadjoint operator $A$ is said to be nonnegative if the associated quadratic form is nonnegative, i.e. if the function $Q_A \colon \mathbf{V} \to \mathbb{R}$ defined by

$$Q_A(v) = \langle v, Av \rangle$$

satisfies $Q_A(v) \geq 0$ for all $v \in \mathbf{V}$.
Any nonnegative selfadjoint operator has the property that membership in its kernel is certified by vanishing of its quadratic form.
Lemma 1: If $A$ is a nonnegative selfadjoint operator, then $Av = 0$ if and only if $Q_A(v) = 0$.
Proof: One direction of this equivalence is obvious: if $Av = 0$, then

$$Q_A(v) = \langle v, Av \rangle = \langle v, 0 \rangle = 0.$$

The proof of the converse statement is similar to the proof of the Cauchy-Schwarz inequality. More precisely, suppose that $Q_A(v) = 0$, let $t$ be any real number, and let $w$ be an arbitrary vector. We have

$$0 \leq Q_A(v + tw) = \langle v + tw, A(v + tw) \rangle.$$

Using the definition of $Q_A$ together with the fact that $A$ is selfadjoint, this simplifies to

$$Q_A(v + tw) = Q_A(v) + 2t\langle w, Av \rangle + t^2 Q_A(w),$$

and since $Q_A(v) = 0$ this further simplifies to

$$Q_A(v + tw) = 2t\langle w, Av \rangle + t^2 Q_A(w).$$
Now, as a function of $t$, the righthand side of this equation is a parabola, and since $Q_A(w) \geq 0$ this parabola is upward-opening. Moreover, since the lefthand side satisfies $Q_A(v + tw) \geq 0$, the lowest point of this parabola cannot lie below the line $y = 0$, and this forces

$$\langle w, Av \rangle = 0.$$

But the vector $w$ was chosen arbitrarily, so the above equation holds for any $w \in \mathbf{V}$, in particular for $w = Av$. We thus have

$$\langle Av, Av \rangle = 0,$$

which means that $Av$ has length zero, i.e. $Av = 0$.
Now, let $A$ be any selfadjoint operator. We are going to use the Lemma just established to prove that $A$ admits an eigenvector $m$; the argument even gives a description of the corresponding eigenvalue $\lambda$.
Consider the unit sphere in the Euclidean space $\mathbf{V}$, i.e. the set

$$S = \{v \in \mathbf{V} \colon \|v\| = 1\}$$

of all vectors of length $1$. The quadratic form $Q_A$ is a continuous function, and hence by the Extreme Value Theorem the minimum value of $Q_A$ on the sphere,

$$\lambda = \min_{v \in S} Q_A(v),$$

does indeed exist, and is moreover achieved: there is a vector $m \in S$ at which the minimum is attained, i.e.

$$Q_A(m) = \lambda.$$
Theorem 2: The minimum $\lambda$ of $Q_A$ on the unit sphere is an eigenvalue of $A$, and the minimizer $m$ lies in the eigenspace of $\lambda$.
Proof: By definition of $\lambda$ as the minimum value of $Q_A$ on $S$, we have that

$$\langle v, Av \rangle \geq \lambda, \qquad v \in S.$$

Since $\langle v, v \rangle = 1$ for any $v \in S$, the above inequality can be rewritten as

$$\langle v, Av \rangle - \lambda\langle v, v \rangle \geq 0, \qquad v \in S.$$

But actually, this implies that

$$\langle v, Av \rangle - \lambda\langle v, v \rangle \geq 0, \qquad v \in \mathbf{V},$$

since every vector in $\mathbf{V}$ is a nonnegative scalar multiple of a vector of unit length (make sure you understand this). We thus have that

$$Q_{A - \lambda I}(v) = \langle v, (A - \lambda I)v \rangle \geq 0, \qquad v \in \mathbf{V}.$$

This says that the selfadjoint operator $A - \lambda I$ is nonnegative. Moreover, we have that

$$Q_{A - \lambda I}(m) = \langle m, Am \rangle - \lambda\langle m, m \rangle = \lambda - \lambda = 0.$$

Thus, by Lemma 1, we have that $(A - \lambda I)m = 0$, meaning that $Am = \lambda m$.
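Theorem 2 is easy to illustrate numerically (a numpy sketch of my own, not part of the lecture): for a random symmetric matrix, the smallest eigenvalue reported by a library routine is attained by $Q_A$ at an eigenvector, and $Q_A$ on the unit sphere never dips below it.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                      # a selfadjoint (symmetric) operator

eigenvalues, eigenvectors = np.linalg.eigh(A)
lam, m = eigenvalues[0], eigenvectors[:, 0]   # smallest eigenvalue and minimizer

Q = lambda v: v @ A @ v
print(np.allclose(A @ m, lam * m))     # m lies in the eigenspace of lam: True
print(np.isclose(Q(m), lam))           # Q_A attains the value lam at m: True

# Q_A(v) >= lam for every unit vector v, as in the proof of Theorem 2:
vs = rng.standard_normal((10_000, 5))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)
print((np.einsum('ij,jk,ik->i', vs, A, vs) >= lam - 1e-12).all())  # True
```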
Theorem 2 has established that an arbitrary selfadjoint operator has an eigenvector. However, this seems to be a long way from Theorem 1, which makes the much stronger assertion that $A$ has $\dim \mathbf{V}$ linearly independent eigenvectors. In fact, the distance from Theorem 2 to Theorem 1 is not so long as it may seem. To see why, we need to introduce one more very important concept.
Definition 2: Let $A$ be a linear operator, and let $\mathbf{W}$ be a subspace of $\mathbf{V}$. We say that $\mathbf{W}$ is invariant under $A$ if $Aw \in \mathbf{W}$ for every $w \in \mathbf{W}$.
The meaning of this definition is that if $\mathbf{W}$ is invariant under $A$, then $A$ may be considered as a linear operator on the smaller space $\mathbf{W}$, i.e. as an element of the algebra of operators on $\mathbf{W}$.
Let us adorn the eigenvalue/eigenvector pair produced by Theorem 2 with a subscript, writing this pair as $(\lambda_1, m_1)$. Consider the orthogonal complement of the line spanned by $m_1$, i.e. the subspace of $\mathbf{V}$ given by

$$\mathbf{W} = \{v \in \mathbf{V} \colon \langle v, m_1 \rangle = 0\}.$$
Proposition 1: The subspace $\mathbf{W}$ is invariant under $A$.
Proof: We have to prove that if $v$ is orthogonal to the eigenvector $m_1$ of $A$, then so is $Av$. This follows easily from the fact that $A$ is selfadjoint:

$$\langle Av, m_1 \rangle = \langle v, Am_1 \rangle = \lambda_1 \langle v, m_1 \rangle = 0.$$
The effect of Proposition 1 is that we may consider $A$ as a selfadjoint operator defined on the $(n-1)$-dimensional subspace $\mathbf{W}$. But this means that we can simply apply Theorem 2 again, with $\mathbf{W}$ replacing $\mathbf{V}$. We will then get a new eigenvalue/eigenvector pair $(\lambda_2, m_2)$, where

$$\lambda_2 = \min_{\substack{v \in \mathbf{W} \\ \|v\| = 1}} Q_A(v)$$

is the minimum value of $Q_A$ on the unit sphere in the Euclidean space $\mathbf{W}$, and $m_2$ is a vector at which the minimum is achieved,

$$Q_A(m_2) = \lambda_2.$$

By construction, $m_2$ is a unit vector orthogonal to $m_1$, so that in particular $\{m_1, m_2\}$ is a linearly independent set in $\mathbf{V}$. Moreover, we have that $\lambda_2 \geq \lambda_1$, since the unit sphere of $\mathbf{W}$ is a subset of the unit sphere of $\mathbf{V}$.
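The induction we are running is effectively an algorithm, and it can be implemented directly: repeatedly minimize $Q_A$ on the unit sphere of the orthogonal complement of the eigenvectors found so far. Below is a numpy sketch of this construction (I use a library eigensolver on the compressed matrix as a stand-in for the minimization step, so this illustrates the structure of the proof rather than an independent algorithm).

```python
import numpy as np

def eigenbasis_by_deflation(A):
    """Constructive spectral theorem: repeatedly minimize Q_A on the unit
    sphere of the orthogonal complement of the eigenvectors found so far."""
    n = A.shape[0]
    comp = np.eye(n)                # orthonormal basis of the current complement W
    values, vectors = [], []
    while comp.shape[1] > 0:
        M = comp.T @ A @ comp       # A compressed to W (still symmetric)
        w, U = np.linalg.eigh(M)    # stand-in for minimizing Q_A on the sphere of W
        values.append(w[0])                 # lambda = min of Q_A on sphere of W
        vectors.append(comp @ U[:, 0])      # the minimizer m, lifted back to V
        comp = comp @ U[:, 1:]              # new complement: orthogonal to m in W
    return np.array(values), np.column_stack(vectors)

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
lam, E = eigenbasis_by_deflation(A)
print(np.allclose(A @ E, E * lam))        # an eigenbasis: A E = E diag(lam)
print(np.allclose(E.T @ E, np.eye(4)))    # and it is orthonormal
print(np.all(np.diff(lam) >= -1e-12))     # eigenvalues come out nondecreasing
```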
Let $\mathbf{V}$ be a vector space, and let us consider the algebra of linear operators on $\mathbf{V}$ as a kind of ecosystem consisting of various life forms of varying complexity. We now move on to the portion of the course which is concerned with the taxonomy of linear operators — their classification and division into various particular classes.
The simplest organisms in the ecosystem are operators which act by scaling every vector by a fixed number $\lambda$; these are the single-celled organisms of the operator ecosystem.
Definition 1: An operator $A$ is said to be simple if there exists a scalar $\lambda$ such that

$$Av = \lambda v \qquad \text{for all } v \in \mathbf{V}.$$
Simple operators really are very simple, in the sense that they are no more complicated than numbers. Indeed, Definition 1 is equivalent to saying that $A = \lambda I$, where $I$ is the identity operator, which plays the role of the number $1$ in the algebra of operators, meaning that it is the multiplicative identity in this algebra. Simple operators are extremely easy to manipulate algebraically: if $A = \lambda I$, then we have

$$A^k = \lambda^k I$$

for any nonnegative integer $k$, and more generally if $p$ is any polynomial in a single variable then we have

$$p(A) = p(\lambda) I.$$
Exercise 1: Prove the above formula.
The formula even works in the case that $k$ is a negative integer, provided that $\lambda \neq 0$; equivalently, the simple operator $A = \lambda I$ is invertible if and only if $\lambda \neq 0$, its inverse being $A^{-1} = \lambda^{-1} I$. If $A_1 = \lambda_1 I$ and $A_2 = \lambda_2 I$ are simple operators, then they commute,

$$A_1 A_2 = \lambda_1\lambda_2 I = A_2 A_1,$$

just like ordinary numbers, and more generally

$$p(A_1, A_2) = p(\lambda_1, \lambda_2) I$$

for any polynomial $p$ in two variables.
Exercise 2: Prove the above formula.
Another way to appreciate how truly simple simple operators are is to look at their matrices. In order to do this, we have to restrict to the case that the vector space $\mathbf{V}$ is finite-dimensional. If $\mathbf{V}$ is $n$-dimensional, and $E = \{e_1, \dots, e_n\}$ is any basis of $\mathbf{V}$, then the matrix of $A = \lambda I$ relative to $E$ is simply

$$\begin{pmatrix} \lambda & & \\ & \ddots & \\ & & \lambda \end{pmatrix},$$

where the off-diagonal matrix elements are all equal to zero. For this reason, simple operators are often called diagonal operators.
Most operators are not simple operators — they are complicated multicellular organisms. So, to understand them we have to dissect them and look at their organs one at a time. Mathematically, this means that, given an operator $A$, we look for special vectors in $\mathbf{V}$ on which $A$ acts as if it was simple.
Definition 2: A nonzero vector $v \in \mathbf{V}$ is said to be an eigenvector of an operator $A$ if

$$Av = \lambda v$$

for some scalar $\lambda$. The scalar $\lambda$ is said to be an eigenvalue of $A$.
The best case scenario is that we can find a basis of $\mathbf{V}$ entirely made up of eigenvectors of $A$.
Definition 3: An operator $A$ is said to be semisimple if there exists a basis of $\mathbf{V}$ consisting of eigenvectors of $A$. Such a basis is called an eigenbasis for $A$.
As the name suggests, semisimple operators are pretty simple, but not quite as simple as simple operators. In particular, every simple operator is semisimple, because if $A$ is simple then every nonzero vector in $\mathbf{V}$ is an eigenvector of $A$, and hence any basis of $\mathbf{V}$ is an eigenbasis for $A$. The converse, however, is not true.
Let $\mathbf{V}$ be an $n$-dimensional vector space, and let $A$ be a semisimple operator. By definition, this means that there exists a basis $E = \{e_1, \dots, e_n\}$ of $\mathbf{V}$ consisting of eigenvectors of $A$. This in turn means that there exist numbers $\lambda_1, \dots, \lambda_n$ such that

$$Ae_i = \lambda_i e_i, \qquad 1 \leq i \leq n.$$

If $\lambda_1 = \dots = \lambda_n$, then $A$ is simple, but if these numbers are not all the same then it is not. However, even if all these numbers are different, the matrix of $A$ relative to $E$ will still be a diagonal matrix, i.e. it will have the form

$$\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}.$$
For this reason, semisimple operators are often called diagonalizable operators. Note the shift in terminology from “diagonal,” for simple, to “diagonalizable,” for semisimple. The former term suggests an immutable characteristic, independent of basis, whereas the latter indicates that some action must be taken, in that a special basis must be found to reveal diagonal form. More precisely, the matrix of a semisimple operator is not diagonal with respect to an arbitrary basis; the definition only says that the matrix of $A$ is diagonal relative to some basis.
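As a concrete illustration (example matrix mine), here is a numpy computation showing a matrix which is not diagonal in the standard basis, but becomes diagonal once conjugated into its eigenbasis:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # not diagonal in the standard basis

lam, E = np.linalg.eig(A)           # eigenvalues [3, 1], eigenbasis in columns
D = np.linalg.inv(E) @ A @ E        # the matrix of the operator in the eigenbasis
print(np.round(D, 10))              # diagonal: diag(3, 1)
```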
Most linear operators are not semisimple — indeed, there are plenty of operators that have no eigenvectors at all. Consider the operator

$$R_\theta \colon \mathbb{R}^2 \to \mathbb{R}^2,$$

which rotates a vector counterclockwise through the angle $\theta$. The matrix of this operator relative to the standard basis is

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

If $\theta = 0$, then $R_\theta = I$, so that $R_\theta$ is a simple operator: all nonzero vectors are eigenvectors, with eigenvalue $1$. If $\theta = \pi$, then $R_\theta = -I$, and again $R_\theta$ is simple, with the same eigenvectors and eigenvalue $-1$. However, taking any other value of $\theta$, for example $\theta = \frac{\pi}{2}$, rotation through a right angle, it is geometrically clear that $R_\theta v$ is never a scalar multiple of $v$, so that $R_\theta$ has no eigenvectors at all. In particular, it is not semisimple.
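A quick numerical confirmation (sketch mine): the eigenvalues of the quarter-turn matrix are $\pm i$, so there are no real eigenvalues, and hence no eigenvectors in $\mathbb{R}^2$.

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])     # rotation through a right angle
print(np.linalg.eigvals(R))     # [0.+1.j, 0.-1.j]: no real eigenvalues
```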
Let us now formulate necessary and sufficient conditions for an operator to be semisimple. In this endeavor it is psychologically helpful to reorganize the eigenvector/eigenvalue definition by thinking of eigenvalues as the primary objects, and eigenvectors as secondary objects associated to them.
Definition 4: The spectrum of an operator $A$ is the set defined by

$$\sigma(A) = \{\lambda \in \mathbb{R} \colon \lambda \text{ is an eigenvalue of } A\}.$$

For each $\lambda \in \sigma(A)$, the set defined by

$$\mathbf{V}_\lambda = \{v \in \mathbf{V} \colon Av = \lambda v\}$$

is called the $\lambda$-eigenspace of $A$. The dimension of $\mathbf{V}_\lambda$ is called the geometric multiplicity of $\lambda$.
In these terms, saying that $A$ is a simple operator means that the spectrum of $A$ consists of a single number,

$$\sigma(A) = \{\lambda\},$$

and that the corresponding eigenspace exhausts $\mathbf{V}$,

$$\mathbf{V}_\lambda = \mathbf{V}.$$

At the other extreme, the rotation operator $R_{\pi/2}$ considered above has empty spectrum,

$$\sigma(R_{\pi/2}) = \emptyset,$$

and thus does not have any eigenspaces.
Proposition 1: For any operator $A$ and each $\lambda \in \sigma(A)$, the eigenspace $\mathbf{V}_\lambda$ is a subspace of $\mathbf{V}$.
Proof: First, observe that $0 \in \mathbf{V}_\lambda$, because $A0 = 0 = \lambda 0$.
Second, $\mathbf{V}_\lambda$ is closed under scalar multiplication: if $v \in \mathbf{V}_\lambda$ and $t$ is a scalar, then

$$A(tv) = tAv = t\lambda v = \lambda(tv).$$

Third, $\mathbf{V}_\lambda$ is closed under vector addition: if $v_1, v_2 \in \mathbf{V}_\lambda$, then

$$A(v_1 + v_2) = Av_1 + Av_2 = \lambda v_1 + \lambda v_2 = \lambda(v_1 + v_2).$$
So, the eigenspaces of an operator $A$ constitute a collection of subspaces of $\mathbf{V}$ indexed by the numbers $\lambda \in \sigma(A)$. A key feature of these subspaces is that they are independent of one another.
Theorem 1: Suppose that $\lambda_1, \dots, \lambda_k$ are distinct eigenvalues of an operator $A$. Let $v_1, \dots, v_k$ be nonzero vectors such that $Av_i = \lambda_i v_i$ for each $1 \leq i \leq k$. Then $\{v_1, \dots, v_k\}$ is a linearly independent set.
Proof: We prove this by induction on $k$. The base case is $k = 1$, and in this case the assertion is simply that the set $\{v_1\}$ consisting of a single eigenvector of $A$ is linearly independent. This is true, since eigenvectors are nonzero by definition.
For the induction step, suppose that $\{v_1, \dots, v_k\}$ is a linearly dependent set. Then, there exist numbers $t_1, \dots, t_k$, not all equal to zero, such that

$$t_1 v_1 + \dots + t_k v_k = 0.$$

Let us suppose that $t_1 \neq 0$ (we can always relabel so that a nonzero coefficient comes first). Applying the operator $A$ to both sides of the above vector equation, we get

$$t_1 \lambda_1 v_1 + \dots + t_k \lambda_k v_k = 0.$$

On the other hand, we can multiply the original vector equation by any scalar and it remains true; in particular, we have

$$t_1 \lambda_k v_1 + \dots + t_k \lambda_k v_k = 0.$$

Now, subtracting this third equation from the second equation, we obtain

$$t_1(\lambda_1 - \lambda_k)v_1 + \dots + t_{k-1}(\lambda_{k-1} - \lambda_k)v_{k-1} = 0.$$
By the induction hypothesis, $\{v_1, \dots, v_{k-1}\}$ is a linearly independent set, and hence all the coefficients in this vector equation are zero. In particular, we have

$$t_1(\lambda_1 - \lambda_k) = 0.$$

But this is impossible, since $t_1 \neq 0$ and $\lambda_1 \neq \lambda_k$. Hence, the set $\{v_1, \dots, v_k\}$ cannot be linearly dependent — it must be linearly independent.
Restricting to the case that $\mathbf{V}$ is finite-dimensional, Theorem 1 has the following crucial consequences.
Corollary 1: $A$ is semisimple if and only if

$$\sum_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V}.$$
Proof: Suppose first that $A$ is semisimple. By definition, this means that the span of the eigenspaces of $A$ is all of $\mathbf{V}$,

$$\operatorname{span}\bigcup_{\lambda \in \sigma(A)} \mathbf{V}_\lambda = \mathbf{V}.$$

By Theorem 1, we have

$$\sum_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V}.$$

Conversely, suppose that the sum of the dimensions of the eigenspaces of $A$ is equal to the dimension of $\mathbf{V}$. For each $\lambda \in \sigma(A)$, let $E_\lambda$ be a basis of the eigenspace $\mathbf{V}_\lambda$. Then, by Theorem 1, the set

$$E = \bigcup_{\lambda \in \sigma(A)} E_\lambda$$

is a linearly independent set, and hence a basis of the subspace of $\mathbf{V}$ that it spans. Thus

$$\dim \operatorname{span} E = \sum_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda.$$
Since by hypothesis we have

$$\sum_{\lambda \in \sigma(A)} \dim \mathbf{V}_\lambda = \dim \mathbf{V},$$

this implies that

$$\dim \operatorname{span} E = \dim \mathbf{V},$$

which in turn implies that

$$\operatorname{span} E = \mathbf{V}.$$

Thus $E$ is a basis of $\mathbf{V}$ consisting of eigenvectors of $A$, whence $A$ is semisimple.
Corollary 3: If $|\sigma(A)| = \dim \mathbf{V}$, then $A$ is semisimple.
Proof: To say that $|\sigma(A)| = \dim \mathbf{V} = n$ is equivalent to saying that the spectrum of $A$ consists of $n$ distinct numbers,

$$\sigma(A) = \{\lambda_1, \dots, \lambda_n\}.$$

Sampling a collection of nonzero vectors from each corresponding eigenspace,

$$v_1 \in \mathbf{V}_{\lambda_1}, \dots, v_n \in \mathbf{V}_{\lambda_n},$$

we get a set $\{v_1, \dots, v_n\}$ of eigenvectors of $A$. By Theorem 1, $\{v_1, \dots, v_n\}$ is a linearly independent set, hence it is a basis of $\mathbf{V}$.
In Lecture 13, we discussed matrix representations of linear transformations between finite-dimensional vector spaces. In this lecture, we consider linear transformations between finite-dimensional Euclidean spaces, and discuss the relationship between the scalar product and the matrix representation of linear transformations. Note that any vector space $\mathbf{V}$ can be promoted to a Euclidean space by choosing a basis $E$ in $\mathbf{V}$ and defining $\langle \cdot, \cdot \rangle$ to be the unique scalar product on $\mathbf{V}$ such that $E$ is orthonormal.
Let $\mathbf{V}$ and $\mathbf{W}$ be Euclidean spaces; by abuse of notation, we will denote the scalar product in each of these spaces by the same symbol $\langle \cdot, \cdot \rangle$. Let $E = \{e_1, \dots, e_n\}$ be an orthonormal basis in $\mathbf{V}$, and let $F = \{f_1, \dots, f_m\}$ be an orthonormal basis in $\mathbf{W}$. Let $T \colon \mathbf{V} \to \mathbf{W}$ be a linear transformation.
Definition 1: The matrix elements of $T$ relative to the bases $E$ and $F$ are the scalar products

$$\langle f_i, Te_j \rangle, \qquad 1 \leq i \leq m,\ 1 \leq j \leq n.$$
The reason the number $\langle f_i, Te_j \rangle$ is called a “matrix element” of $T$ is that this number is exactly the $(i,j)$-element of the matrix of $T$ defined in Lecture 13. Indeed, if

$$Te_j = \sum_{k=1}^{m} t_{kj} f_k,$$

then

$$\langle f_i, Te_j \rangle = \Big\langle f_i, \sum_{k=1}^{m} t_{kj} f_k \Big\rangle = \sum_{k=1}^{m} t_{kj} \langle f_i, f_k \rangle = t_{ij},$$

where the last equality follows from the orthonormality of $F$. Note that it is not actually necessary to assume that $\mathbf{V}$ and $\mathbf{W}$ are finite-dimensional in order for the matrix elements of $T$ to be well-defined; however, we will always make this assumption. Thus, in more visual form, we have that

$$[T] = \begin{pmatrix} \langle f_1, Te_1 \rangle & \cdots & \langle f_1, Te_n \rangle \\ \vdots & \ddots & \vdots \\ \langle f_m, Te_1 \rangle & \cdots & \langle f_m, Te_n \rangle \end{pmatrix}.$$
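Here is a short numpy sketch of Definition 1 (setup and names mine): random orthonormal bases are produced by QR factorization, and the array of scalar products $\langle f_i, Te_j \rangle$ coincides with the coefficient matrix of $T$ obtained by expanding each $Te_j$ in the basis $F$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4
E, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthonormal basis of V (columns)
F, _ = np.linalg.qr(rng.standard_normal((m, m)))   # orthonormal basis of W (columns)
T = rng.standard_normal((m, n))                    # a linear transformation V -> W

# Matrix elements <f_i, T e_j> ...
elements = np.array([[F[:, i] @ (T @ E[:, j]) for j in range(n)] for i in range(m)])

# ... agree with the expansion coefficients t_{ij} in T e_j = sum_i t_{ij} f_i,
# obtained here by solving the linear system F t = T e_j for each j:
coeffs = np.linalg.solve(F, T @ E)
print(np.allclose(elements, coeffs))  # True
```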
The connection between matrices and scalar products is often very useful for performing computations which would be much more annoying without the use of scalar products. A good example is change of basis for linear operators. The setup here is that $\mathbf{W} = \mathbf{V}$, so that $E = \{e_1, \dots, e_n\}$ and $F = \{f_1, \dots, f_n\}$ are two (possibly) different orthonormal bases of the same Euclidean space. Given an operator $A$ on $\mathbf{V}$, we would like to understand the relationship between the two matrices

$$[A]_E = [\langle e_i, Ae_j \rangle], \qquad [A]_F = [\langle f_i, Af_j \rangle],$$

which represent the operator $A$ relative to the bases $E$ and $F$, respectively. In order to do this, let us consider the linear operator $U$ uniquely defined by the equations

$$Ue_j = f_j, \qquad 1 \leq j \leq n.$$

Why do these equations uniquely determine $U$? Because, for any $v = x_1e_1 + \dots + x_ne_n \in \mathbf{V}$, we have

$$Uv = U(x_1e_1 + \dots + x_ne_n) = x_1Ue_1 + \dots + x_nUe_n = x_1f_1 + \dots + x_nf_n.$$
Let us observe that the operator $U$ we have defined is an automorphism of $\mathbf{V}$, i.e. it has an inverse. Indeed, it is clear that the linear operator uniquely determined by the equations

$$f_j \mapsto e_j, \qquad 1 \leq j \leq n,$$

is the inverse of $U$. Operators which transform orthonormal bases into orthonormal bases have a special name.
Definition 2: An operator $U$ is said to be an orthogonal operator if it preserves orthonormal bases: for any orthonormal basis $\{e_1, \dots, e_n\}$ in $\mathbf{V}$, the set $\{Ue_1, \dots, Ue_n\}$ is again an orthonormal basis in $\mathbf{V}$.
Note that every orthogonal operator is invertible, since we can always define the inverse just as we did above. In particular, the operator $U$ we defined above by $Ue_j = f_j$ is an orthogonal operator.
Proposition 1: An operator $U$ is orthogonal if and only if

$$\langle Uv, Uw \rangle = \langle v, w \rangle \qquad \text{for all } v, w \in \mathbf{V}.$$

Proof: Observe that, by linearity of $U$ and bilinearity of $\langle \cdot, \cdot \rangle$, it is sufficient to prove the claim in the case that $v = e_i$ and $w = e_j$, where $\{e_1, \dots, e_n\}$ is an orthonormal basis of $\mathbf{V}$.
Suppose that $U$ is an orthogonal operator. Let $f_i = Ue_i$. Then $\{f_1, \dots, f_n\}$ is an orthonormal basis of $\mathbf{V}$, and consequently we have

$$\langle Ue_i, Ue_j \rangle = \langle f_i, f_j \rangle = \langle e_i, e_j \rangle,$$

both sides being $1$ if $i = j$ and $0$ otherwise.
Conversely, suppose that

$$\langle Uv, Uw \rangle = \langle v, w \rangle \qquad \text{for all } v, w \in \mathbf{V}.$$

We then have that $\langle Ue_i, Ue_j \rangle = \langle e_i, e_j \rangle$, so that $\{Ue_1, \dots, Ue_n\}$ is an orthonormal basis of $\mathbf{V}$, and thus $U$ is an orthogonal operator.
Proposition 2: An operator $U$ is orthogonal if and only if it is invertible and

$$\langle Uv, w \rangle = \langle v, U^{-1}w \rangle \qquad \text{for all } v, w \in \mathbf{V}.$$

Proof: Suppose first that $U$ is orthogonal. Then, $U$ is invertible and $U^{-1}$ is also orthogonal, and hence for any $v, w \in \mathbf{V}$ we have

$$\langle Uv, w \rangle = \langle Uv, UU^{-1}w \rangle = \langle v, U^{-1}w \rangle,$$

where the second equality is Proposition 1.
Conversely, suppose that $U$ is invertible and

$$\langle Uv, w \rangle = \langle v, U^{-1}w \rangle \qquad \text{for all } v, w \in \mathbf{V}.$$

Then, for any $v, w \in \mathbf{V}$, we have

$$\langle Uv, Uw \rangle = \langle v, U^{-1}Uw \rangle = \langle v, w \rangle,$$

whence $U$ is orthogonal by Proposition 1.
Now let us return to the problem that we were working on prior to our digression into the generalities of orthogonal operators, namely that of computing the relationship between the matrices $[A]_E$ and $[A]_F$. We have

$$\langle f_i, Af_j \rangle = \langle Ue_i, AUe_j \rangle = \langle e_i, U^{-1}AUe_j \rangle,$$

where we used Proposition 2 to obtain the second equality. Thus, we have the matrix equation

$$[A]_F = [U^{-1}AU]_E = [U]_E^{-1}\,[A]_E\,[U]_E,$$

where on the right hand side we are using the fact that

$$A \mapsto [A]_E$$

is an algebra isomorphism, as in Lecture 13, which means that the matrix representing a product of operators is the product of the matrices representing each operator individually. This relationship between $[A]_E$ and $[A]_F$ is usually phrased as the statement that the matrix representing the operator $A$ in the “new” basis $F$ is obtained from the matrix representing $A$ in the “old” basis $E$ by “conjugating” it by the matrix $[U]_E$ of the orthogonal operator $U$ that transforms the old basis into the new basis.
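The change of basis formula is easy to test numerically. In the following numpy sketch (mine), $E$ is the standard basis of $\mathbb{R}^n$ and $F = \{Ue_1, \dots, Ue_n\}$ for a random orthogonal matrix $U$, so that $[U]_E = U$ and $[U]_E^{-1} = U^{\mathsf{T}}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))                   # [A]_E in the standard basis E
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal U: f_j = U e_j

# Matrix elements of A relative to the new basis F = {U e_1, ..., U e_n}:
new_matrix = np.array([[U[:, i] @ (A @ U[:, j]) for j in range(n)] for i in range(n)])

# The change of basis formula [A]_F = [U]^{-1} [A]_E [U], with [U]^{-1} = [U]^T:
print(np.allclose(new_matrix, U.T @ A @ U))  # True
```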
We met linear transformations for the first time back in Lecture 2, and have encountered them a handful of times in subsequent lectures. In this lecture, we begin a systematic study of linear transformations that will take up the rest of the course. As with the study of any new species, there will be a lot of classifying: we will divide up linear transformations into various classes defined by certain common features which to a large extent determine their behavior. Before classifying linear transformations into various special species and families, let us consider the ecosystem of all linear transformations as a collective whole.
Let $\mathbf{V}$ and $\mathbf{W}$ be vector spaces, and let $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$ be the set of all linear transformations $T \colon \mathbf{V} \to \mathbf{W}$. This seemingly strange notation stems from the fact that a fancy alternative name for linear transformations is homomorphisms, which roughly means “same shape” in Greek.
Theorem 1: $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$ is a vector space.
Proof: In order to promote $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$ from simply a set to a vector space, we need to give a rule for adding and scaling linear transformations. These rules are simple and natural. If $T_1, T_2 \in \operatorname{Hom}(\mathbf{V}, \mathbf{W})$ are linear transformations and $t_1, t_2$ are scalars, we define a new linear transformation $t_1T_1 + t_2T_2$ by

$$(t_1T_1 + t_2T_2)(v) = t_1T_1(v) + t_2T_2(v), \qquad v \in \mathbf{V}.$$

One must check first that the function from $\mathbf{V}$ to $\mathbf{W}$ so defined satisfies the linear transformation axioms, and then that the set $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$ equipped with these notions of addition and scalar multiplication satisfies the vector space axioms. This is left as an exercise to the reader (it is a long but easy exercise which you should do at least once).
Now let us start asking questions about the vectors in $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$, i.e. considering what properties a linear transformation from $\mathbf{V}$ to $\mathbf{W}$ may or may not have. Given $T \in \operatorname{Hom}(\mathbf{V}, \mathbf{W})$, one of the first things we would like to know is whether it is injective and/or surjective. These questions reduce to questions about certain subspaces of $\mathbf{V}$ and $\mathbf{W}$ associated to $T$.
Definition 1: The kernel of $T$ is the subset of $\mathbf{V}$ defined by

$$\ker T = \{v \in \mathbf{V} \colon Tv = 0_{\mathbf{W}}\},$$

and the image of $T$ is the subset of $\mathbf{W}$ defined by

$$\operatorname{im} T = \{w \in \mathbf{W} \colon w = Tv \text{ for some } v \in \mathbf{V}\}.$$
You can think of the kernel and image of $T$ as the solution sets of two different equations involving $T$. First, the kernel of $T$ is the set of solutions to the equation

$$Tv = 0_{\mathbf{W}},$$

where $v$ is an unknown vector in $\mathbf{V}$. Second, the image of $T$ is the set of all $w \in \mathbf{W}$ such that the equation

$$Tv = w$$

has a solution.
Proposition 1: The kernel of $T$ is a subspace of $\mathbf{V}$, and the image of $T$ is a subspace of $\mathbf{W}$.
Proof: Since $T$ is a linear transformation, it is by definition the case that $T0_{\mathbf{V}} = 0_{\mathbf{W}}$, so that $0_{\mathbf{V}} \in \ker T$. Now let $v_1, v_2$ be any two vectors in the kernel of $T$, and let $t_1, t_2$ be any two scalars. We then have

$$T(t_1v_1 + t_2v_2) = t_1Tv_1 + t_2Tv_2 = t_10_{\mathbf{W}} + t_20_{\mathbf{W}} = 0_{\mathbf{W}},$$

which shows that $\ker T$ is closed under taking linear combinations. Hence, $\ker T$ is a subspace of $\mathbf{V}$.
Now consider the image of $T$. Since $T$ is a linear transformation, it is by definition the case that $T0_{\mathbf{V}} = 0_{\mathbf{W}}$, so that $0_{\mathbf{W}} \in \operatorname{im} T$. Now let $w_1, w_2$ be any two vectors in the image of $T$. Then, by definition of $\operatorname{im} T$, there exist vectors $v_1, v_2 \in \mathbf{V}$ such that $Tv_1 = w_1$ and $Tv_2 = w_2$. So, for any scalars $t_1, t_2$, we have that

$$t_1w_1 + t_2w_2 = t_1Tv_1 + t_2Tv_2 = T(t_1v_1 + t_2v_2),$$

which shows that $\operatorname{im} T$ is closed under taking linear combinations.
Proposition 2: The linear transformation $T$ is injective if and only if $\ker T = \{0_{\mathbf{V}}\}$, and surjective if and only if $\operatorname{im} T = \mathbf{W}$.
Proof: First we consider the relationship between the injectivity of $T$ and its kernel. Suppose that $\ker T = \{0_{\mathbf{V}}\}$. Let $v_1, v_2$ be vectors such that $Tv_1 = Tv_2$. This is equivalent to

$$T(v_1 - v_2) = 0_{\mathbf{W}},$$

which says that $v_1 - v_2 \in \ker T$. Since the only vector in $\ker T$ is $0_{\mathbf{V}}$, this forces $v_1 - v_2 = 0_{\mathbf{V}}$, which means that $v_1 = v_2$, and we conclude that $T$ is injective. Conversely, suppose we know that $T$ is injective. Let $v_1, v_2$ be any two vectors in the kernel of $T$. Then $Tv_1 = 0_{\mathbf{W}} = Tv_2$. Since $T$ is injective, this forces $v_1 = v_2$, which says that any two vectors in $\ker T$ are equal to one another. Since $0_{\mathbf{V}} \in \ker T$, this means that any vector in $\ker T$ is equal to $0_{\mathbf{V}}$, and we conclude that $\ker T = \{0_{\mathbf{V}}\}$.
Now let us consider the relationship between the surjectivity of $T$ and its image. Suppose that $\operatorname{im} T = \mathbf{W}$. This is exactly what it means for the function $T$ to be surjective: every element in the codomain of $T$ is in fact in the range of $T$. Conversely, suppose we know that $T$ is surjective. By definition of surjectivity, this means that $\operatorname{im} T = \mathbf{W}$.
We can use the above propositions to gain some insight into linear equations, meaning equations of the form

$$Tv = w,$$

where $T$ is a given linear transformation from $\mathbf{V}$ to $\mathbf{W}$, $w$ is a given vector in $\mathbf{W}$, and $v$ is an unknown vector in $\mathbf{V}$.
Theorem 2: The solution set of the above linear equation is $v_0 + \ker T$, i.e. the set of all vectors in $\mathbf{V}$ of the form $v_0 + k$, where $v_0$ is any particular vector such that $Tv_0 = w$, and $k \in \ker T$.
Proof: We begin by first considering the homogeneous equation associated to the linear equation we want to solve, which is

$$Tv = 0_{\mathbf{W}}.$$

By definition, the solution set of this homogeneous equation is $\ker T$. Now, if $v_0$ is any vector such that $Tv_0 = w$, then for any $k \in \ker T$ we also have

$$T(v_0 + k) = Tv_0 + Tk = w + 0_{\mathbf{W}} = w,$$

which shows that all vectors in $v_0 + \ker T$ are solutions of the equation $Tv = w$.
Conversely, suppose that $v_1$ is a vector in $\mathbf{V}$ such that $Tv_1 = w$. Then, we have

$$T(v_1 - v_0) = Tv_1 - Tv_0 = w - w = 0_{\mathbf{W}},$$

which shows that $v_1 - v_0$ is a vector in $\ker T$. That is, $v_1 - v_0 = k$ for some $k \in \ker T$, or equivalently $v_1 = v_0 + k$. This shows that all solutions of $Tv = w$ belong to the set $v_0 + \ker T$.
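Here is a numpy illustration of the theorem just proved (matrix and vector chosen by me): for a rank-deficient $T$ and a vector $w$ in its image, a particular solution plus any element of $\ker T$ is again a solution of $Tv = w$.

```python
import numpy as np

T = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])     # rank 1: T has a 2-dimensional kernel
w = np.array([6.0, 12.0])           # w is in the image of T (w = T @ [1, 1, 1])

v0, *_ = np.linalg.lstsq(T, w, rcond=None)   # a particular solution: T v0 = w
print(np.allclose(T @ v0, w))                # True

# An orthonormal basis of ker T from the SVD (rows of Vt beyond the rank):
_, s, Vt = np.linalg.svd(T)
kernel = Vt[(s > 1e-12).sum():]              # here: 2 basis vectors

# v0 + (any linear combination of kernel vectors) is again a solution:
v = v0 + 2.0 * kernel[0] - 1.5 * kernel[1]
print(np.allclose(T @ v, w))                 # True
```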
We can now make the following statements about solutions of the linear equation $Tv = w$:
The equation has a solution if and only if $w \in \operatorname{im} T$.
If the equation has a solution, this solution is unique if and only if $\ker T = \{0_{\mathbf{V}}\}$.
If the equation has a solution but the solution is not unique, then the equation has infinitely many solutions.
Exercise 1: Arrange the above information into a tree diagram which shows the relationship between the various cases.
In Lecture 12, we will think in more detail about the structure of the vector space $\operatorname{Hom}(\mathbf{V}, \mathbf{W})$ in the case that $\mathbf{V}$ and $\mathbf{W}$ are finite-dimensional. In the finite-dimensional case, we shall see that everything we might want to know can be expressed in the language of matrices.