Written by a countably infinite number of monkeys with an uncountably infinite number of typewriters, edited by Andrew Dennehy
In a bold twist on the standard formula, in this lecture post we will be discussing math. Specifically, we will be discussing the Cohn-Uman algorithm and how representation theory can provide us an algorithm to multiply two matrices together. We will then see how this perspective could potentially help us prove an important theorem in complexity theory, namely that there exists an algorithm which computes the product of two matrices in time for any . Before we get to that, though, we will first cover an algorithm with many similarities to the one we will be discussing.
Consider the problem of multiplying two polynomials
The naïve approach would be the one that’s taught in high school, namely finding all the terms and using them to calculate the coefficients of . However, we’re significantly cooler than high school students, so we’re going to use some significantly cooler math. First, we introduce the concept of the discrete Fourier transform. Let be the -th root of unity. The DFT of a vector given by
Equivalently, if is the -th component of ,
So, if we apply this transformation to the vector of coefficients of , we get that
So, the problem of finding the Fourier coefficients of is equivalent to the problem of evaluating at . From now on, assume is a power of 2, since if it isn’t we can just add a bunch of 0 coefficients until we have a power of 2. Define the following two polynomials:
This lets us decompose as
So, instead of evaluating at all those powers of , we can instead evaluate and at and then combine the results. So, we have reduced the problem of plugging numbers into a degree polynomial into plugging numbers (specifically the -th roots of unity) into two degree polynomials. Because we’ve assumed that is a power of 2, so is , meaning we can repeat the same algorithm for and . We can apply this process repeatedly, which gives us a recursive algorithm with complexity for calculating the coefficients. Similarly, one can show that we can invert the DFT by applying the same process to the polynomial
with replaced with and then dividing the result by . The upshot of all of this work is that
So, we can calculate the Fourier coefficients of by calculating the coefficients of each polynomial independently and multiplying them together. Since this has linear complexity and both the Fourier and Inverse Fourier transforms have complexity, this method gives us an complexity algorithm to multiply two degree polynomials.
Notice that instead of using the roots of unity, we could have instead used a generator for and worked over with identical results. So, we can frame multiplication of polynomials as a special case of of multiplying elements in .
Next, we take a slight detour to discuss some group theoretic concepts. For any subset of a group (not required to be a subgroup), let the right quotient set of be defined to be
Exercise 1: Assume that and . What does this imply about ? Show that the condition that this implies is actually equivalent to , so the implication goes both ways.
Definition 1: A group realizes if there are subsets such that and for , if
then . This condition is called the triple product property. Furthermore, this is equivalent to the condition that if , then .
Although it isn’t entirely trivial, one can show that if realizes , then it realizes any permutation of those numbers.
Exercise 2: Prove that realizes .
This is the “trivial realization” of . We now get our first taste of why these ideas could be useful for the problem of matrix multiplication:
Theorem 1: Let be any field. If realizes , then the number of field operations required to multiply an matrix by an matrix over is at most the number of operations required to multiply two elements of
Proof: Let realize through the subsets . Then, for matrices and of sizes and respectively, we can index the rows of by , the columns of and the rows of by , and the columns of by . Then, consider the product
By the triple product property, we have that
which is true iff , , and . So, the coefficient of in the product is
So, you can just read off the elements of from these coefficients.
The process we just described, of converting matrix multiplication to multiplication in , is called the Cohn Uman algorithm. Also, we can now see the connection to the polynomial algorithm we described earlier; we’ve converted a problem of multiplying matrices to a problem of multiplying elements of , which is what we did with polynomials before using .
We can now define a measurement of the “quality” of embedding that a given group can provide.
Definition 2: The pseudo-exponent of a non-trivial finite group is the minimum of
over all (not all 1) such that realizes .
Lemma 1: For any group , . Further, if is abelian, then
Proof: is clearly at most 3, since we can just take and , meaning realizes .
To show our lower bound, suppose realizes (not all equal to 1) through the subsets . Then, the triple product tells us that the map is injective on and the image only intersects in the identity. So, with equality only occurring if . The same argument shows this is true for and , so . Therefore,
So, . Finally, if is abelian, then the product map from to must be injective (because the triple product property implies the kernel of the map is trivial), so , meaning .
Next, we want to relate to , the exponent of complexity for matrix multiplication, i.e. the smallest number such that we can perform matrix multiplication in time for any . To do this, we first recall from 202B that
where the ‘s are the dimensions of the irreducible representations of . Then the most important theorem we will discuss is as follows:
Theorem 2: Suppose that is a group with character degrees . Then,
Intuitively, the idea of this theorem is that the problem of multiplying matrices of size reduces to multiplication of two elements in together, which itself reduces to a problem of multiplying a collection of matrices of size , each of which should take approximately operations. So, multiplying two matrices of size has complexity of order at most , which means that this sum is an upper bound for . Actually proving the theorem is more difficult, so we won’t show it here, but this intuition is the most important part.
Importantly, notice that if were equal to 2, then this would imply that (because ). However, , so we need to control the character degrees of to get non-trivial bounds on . Define to be the number such that is the maximum character degree of . An important corollary of Theorem 2 is:
Corollary 1: Let be a finite group. If , then
Proof: Let be the character degrees of . Then, by theorem 4.1,
This implies . Dividing by (which is positive by assumption) gives us our result.
A special case of this corollary is given by
Corollary 2: If there exists a sequence of finite groups such that as and , then the exponent of matrix multiplication is 2.
While these are weaker versions of our original theorem, they only require knowledge of , rather than all of the character degrees. Finally, we end with an open problem in representation theory whose positive answer could lead to a solution to our original problem:
Open Problem: For integers not all 1, does there exist a finite group that realizes and has character degrees such that
While this is not a necessary condition, a positive answer to this could lead to a proof that when combined with our theorem. This is because one limitation of our theorem is that unless we can rule out the case that in theorem 4.1, that theorem is satisfied trivially. So, for us to be able to show that a group gives us a non-trivial improvement, we have to show that for any , there exists a group realizing such that , which the open problem implies.
While it is an open problem whether these groups exist in general, there are a few concrete examples where such a group exists, such as the one that follows:
Let and let , where the group action of on is given by switching the two factors. If we let be the generator of , we can write the elements of as with and . Let be the three copies of in , viewed as subgroups, and let for notation. Then, we can define as
One can show, although the proof is long, that these sets satisfy the triple product property. Furthermore, the character degrees of are all at most 2, since is an abelian subgroup of index 2 in and the maximum character degree of a group is at most the index of any abelian subgroup in the group. Since the sum of the squares of the character degrees is , the sum of the cubes of the degrees is at most . However, , meaning that for ,