Math 202C: Lecture 18

Written by a countably infinite number of monkeys with an uncountably infinite number of typewriters, edited by Andrew Dennehy

In a bold twist on the standard formula, in this lecture post we will be discussing math. Specifically, we will be discussing the Cohn-Uman algorithm and how representation theory can provide us an algorithm to multiply two matrices together. We will then see how this perspective could potentially help us prove an important theorem in complexity theory, namely that there exists an algorithm which computes the product of two n\times n matrices in O(n^{2+\epsilon}) time for any \epsilon>0. Before we get to that, though, we will first cover an algorithm with many similarities to the one we will be discussing.

Consider the problem of multiplying two polynomials

p(x)=p_0+p_1x+\cdots +p_{n-1}x^{n-1}

q(x)=q_0+q_1x+\cdots +q_{n-1}x^{n-1}

The na├»ve approach would be the one that’s taught in high school, namely finding all the a_ib_j terms and using them to calculate the coefficients of p(x)q(x). However, we’re significantly cooler than high school students, so we’re going to use some significantly cooler math. First, we introduce the concept of the discrete Fourier transform. Let \omega_n=e^{i \frac{2\pi}{n}} be the n-th root of unity. The DFT of a vector v given by

\hat{v} := \begin{bmatrix}1 & 1 & 1 & \cdots & 1 \\  1 & \omega_n &  \omega_n^2 & \cdots &  \omega_n^{n-1} \\ \vdots & \vdots  & \vdots  & \ddots  & \vdots \\  1 & \omega_n^{n-1} &  \omega_n^{2(n-1)} & \cdots &  \omega_n^{(n-1)^2}  \end{bmatrix} v

Equivalently, if v_i is the i-th component of v,

\hat{v}_i := v_0+v_1  \omega_n^{k} + \cdots + v_n\omega_n^{nk}

So, if we apply this transformation to the vector of coefficients of p, we get that

\hat{p}_k= p_0+p_1  \omega_n^{k} + \cdots + p_n\omega_n^{nk} =p(\omega_n^k)

So, the problem of finding the Fourier coefficients of p is equivalent to the problem of evaluating p at 1,\omega_n,\omega_n^2,...,\omega_n^{n-1}. From now on, assume n is a power of 2, since if it isn’t we can just add a bunch of 0 coefficients until we have a power of 2. Define the following two polynomials:

p_0(x)=p_0+p_2x + \cdots + p_{n-2} x^{n/2-1}

p_1(x)=p_1+p_3x + \cdots + p_{n-1} x^{n/2-1}

This lets us decompose p as

p(x)=p_0(x^2)+xp_1(x^2)

So, instead of evaluating p at all those powers of \omega_n, we can instead evaluate p_0 and p_1 at 1,\omega_n^2, ..., \omega_n^{2(n-1)} and then combine the results. So, we have reduced the problem of plugging n numbers into a degree n polynomial into plugging n/2 numbers (specifically the n/2-th roots of unity) into two degree n/2 polynomials. Because we’ve assumed that n is a power of 2, so is n/2, meaning we can repeat the same algorithm for p_0 and p_1. We can apply this process repeatedly, which gives us a recursive algorithm with complexity O(n \log n) for calculating the coefficients. Similarly, one can show that we can invert the DFT by applying the same process to the polynomial

\hat p (x)=\hat{p}_0+\hat{p}_1 x + \cdots + \hat{p}_n x^{n}

with \omega_n replaced with \omega_n^{-1}=\omega_n^{n-1} and then dividing the result by n. The upshot of all of this work is that

\widehat{p q}_k = [pq](\omega_n^k) = p(\omega_n^k) q(\omega_n^k)=\hat p_k \hat q_k

So, we can calculate the Fourier coefficients of pq by calculating the coefficients of each polynomial independently and multiplying them together. Since this has linear complexity and both the Fourier and Inverse Fourier transforms have n\log n complexity, this method gives us an O(n\log n) complexity algorithm to multiply two degree n polynomials.

Notice that instead of using the roots of unity, we could have instead used a generator for C_n and worked over \mathbb C [C_n] with identical results. So, we can frame multiplication of polynomials as a special case of of multiplying elements in \mathbb C [C_n].

Next, we take a slight detour to discuss some group theoretic concepts. For any subset S of a group G (not required to be a subgroup), let the right quotient set Q(S) of S be defined to be

Q(S)=\left\{s_1s_2^{-1}:s_1,s_2\in S\right\}

Exercise 1: Assume that S\neq \emptyset and Q(S)=S. What does this imply about S? Show that the condition that this implies is actually equivalent to Q(S)=S, so the implication goes both ways.

Definition 1: A group G realizes \langle n_1,n_2,n_3 \rangle if there are subsets S_1, S_2, S_3\subseteq G such that |S_i|=n_i and for q_i\in Q(S_i), if

q_1q_2q_3=1

then q_1=q_2=q_3=1. This condition is called the triple product property. Furthermore, this is equivalent to the condition that if q_1q_2=q_3, then q_1=q_2=q_3=1 .

Although it isn’t entirely trivial, one can show that if G realizes \langle n_1,n_2,n_3 \rangle, then it realizes any permutation of those numbers.

Exercise 2: Prove that C_n\times C_m\times C_p realizes \langle n,m,p \rangle.

This is the “trivial realization” of \langle n,m,p \rangle. We now get our first taste of why these ideas could be useful for the problem of matrix multiplication:

Theorem 1: Let F be any field. If G realizes \langle n,m,p \rangle, then the number of field operations required to multiply an n\times m matrix by an m\times p matrix over F is at most the number of operations required to multiply two elements of F[G]

Proof: Let G realize \langle n,m,p \rangle through the subsets S,T,U. Then, for matrices A and B of sizes n\times m and m\times p respectively, we can index the rows of A by S, the columns of A and the rows of B by T, and the columns of B by U. Then, consider the product

\left(\sum\limits_{s\in S,t\in T} A_{st}s ^{-1}  t\right) \left(\sum \limits _{t'\in T, u\in U} B_{t'u} t'^{-1}u\right)

By the triple product property, we have that

(s^{-1}  t)(t'^{-1} u)=s'^{-1} u'\iff (s' s^{-1}) (tt'^{-1} )(uu'^{-1})=1

which is true iff s=s', t=t', and u=u'. So, the coefficient of s^{-1} u in the product is

\sum\limits_{t\in T}A_{st}B_{tu}=(AB)_{su}

So, you can just read off the elements of AB from these coefficients. \square

The process we just described, of converting matrix multiplication to multiplication in F[G], is called the Cohn Uman algorithm. Also, we can now see the connection to the polynomial algorithm we described earlier; we’ve converted a problem of multiplying matrices to a problem of multiplying elements of \mathbb C[G], which is what we did with polynomials before using \mathbb C[C_n].

We can now define a measurement of the “quality” of embedding that a given group G can provide.

Definition 2: The pseudo-exponent \alpha(g) of a non-trivial finite group G is the minimum of

\frac{3\log |G|}{\log nmp}

over all n,m,p (not all 1) such that G realizes \langle n,m,p \rangle.

Lemma 1: For any group G, 2< \alpha(G)\leq 3. Further, if G is abelian, then \alpha(G)=3

Proof: \alpha(G) is clearly at most 3, since we can just take S_1=S_2=\{1\} and S_3=G, meaning G realizes \langle 1,1,|G|\rangle.

To show our lower bound, suppose G realizes \langle n_1,n_2,n_3\rangle (not all equal to 1) through the subsets S_1,S_2,S_3. Then, the triple product tells us that the map (x,y)\mapsto x^{-1}y is injective on S_1\times S_2 and the image only intersects Q(S_3) in the identity. So, |G|\geq n_1n_2 with equality only occurring if n_3=1. The same argument shows this is true for n_2n_3 and n_3n_1, so |G|^3>(n_1n_2n_3)^2. Therefore,

\frac{3 \log |G|}{\log nmp}> \frac{3 \log (nmp)^{2/3}}{\log nmp}=\frac{3 \cdot 2}{3}=2

So, \alpha(G)>2. Finally, if G is abelian, then the product map from S_1\times S_2 \times S_3 to G must be injective (because the triple product property implies the kernel of the map is trivial), so |G|\geq n_1n_2n_3, meaning \alpha(G)\geq 3. \square

Next, we want to relate \alpha(G) to \omega, the exponent of complexity for matrix multiplication, i.e. the smallest number such that we can perform matrix multiplication in O(n^{\omega +\epsilon}) time for any \epsilon>0. To do this, we first recall from 202B that

\mathbb{C}[G]\cong \mathrm{End} \mathbb{C}[G]\cong \mathcal{M}_{d_1\times d_1} \oplus \cdots \oplus \mathcal{M}_{d_k\times d_k}

where the d_i‘s are the dimensions of the irreducible representations of G. Then the most important theorem we will discuss is as follows:

Theorem 2: Suppose that G is a group with character degrees d_1,...,d_k. Then,

|G|^{\omega/\alpha(G)}\leq \sum\limits_{i=1}^k d_i^{\omega}

Intuitively, the idea of this theorem is that the problem of multiplying matrices of size |G|^{1/\alpha(G)} reduces to multiplication of two elements in \mathbb{C}[G] together, which itself reduces to a problem of multiplying a collection of matrices of size d_i\times d_i, each of which should take approximately d_i^\omega operations. So, multiplying two matrices of size |G|^{1/\alpha(G)} has complexity of order at most \sum_{i=1}^k d_i^{\omega}, which means that this sum is an upper bound for |G|^{\omega/\alpha(G)}. Actually proving the theorem is more difficult, so we won’t show it here, but this intuition is the most important part.

Importantly, notice that if \alpha(G) were equal to 2, then this would imply that \omega =2 (because \sum_i d_i^2 =|G|). However, \alpha(G)>2, so we need to control the character degrees of G to get non-trivial bounds on \omega. Define \gamma(G) to be the number such that |G|^{1/\gamma} is the maximum character degree of G. An important corollary of Theorem 2 is:

Corollary 1: Let G be a finite group. If \alpha(G)<\gamma(G), then

\omega\leq \alpha(G)\left(\frac{\gamma(G)-2}{\gamma(G)-\alpha(G)} \right)

Proof: Let \{d_i\} be the character degrees of G. Then, by theorem 4.1,

|G|^{\omega/\alpha(G)}\leq \sum_i d_i^{\omega-2}d_i^2\leq |G|^{(\omega-2)/\gamma(G)}\sum_i d_i^2\leq |G|^{1+(\omega-2)/\gamma(G)}

This implies \omega\left(\frac{1}{\alpha(G)}-\frac{1}{\gamma(G)}\right)\leq 1-\frac{2}{\gamma(G)}. Dividing by \frac{1}{\alpha(G)}-\frac{1}{\gamma(G)} (which is positive by assumption) gives us our result. \square

A special case of this corollary is given by

Corollary 2: If there exists a sequence G_1,G_2,... of finite groups such that \alpha(G_i)=2+o(1) as i\rightarrow \infty and \alpha(G_i)-2=o(\gamma(G_i)-2), then the exponent of matrix multiplication is 2.

While these are weaker versions of our original theorem, they only require knowledge of \gamma(G), rather than all of the character degrees. Finally, we end with an open problem in representation theory whose positive answer could lead to a solution to our original problem:

Open Problem: For integers n,m,p not all 1, does there exist a finite group that realizes \langle n,m,p\rangle and has character degrees d_1,...,d_k such that

nmp>\sum_{i}d_i^3

While this is not a necessary condition, a positive answer to this could lead to a proof that \omega=2 when combined with our theorem. This is because one limitation of our theorem is that unless we can rule out the case that |G|^{3/\alpha}\leq \sum_{i}d_i^3 in theorem 4.1, that theorem is satisfied trivially. So, for us to be able to show that a group gives us a non-trivial improvement, we have to show that for any n,m,p\in \mathbb N, there exists a group G realizing \langle n,m,p \rangle such that |G|^{3/\alpha}> \sum\limits_{i}d_i^3, which the open problem implies.

While it is an open problem whether these groups exist in general, there are a few concrete examples where such a group exists, such as the one that follows:

Let H=C_2\times C_2 \times C_2 and let G=H^2 \rtimes C_2, where the group action of C_2 on H^2 is given by switching the two factors. If we let z be the generator of C_2, we can write the elements of G as (a,b)z^i with a,b\in H and i\in \{0,1\}. Let H_1,H_2,H_3 be the three copies of C_n in H, viewed as subgroups, and let H_4:=H_1 for notation. Then, we can define S_1,S_2,S_3 as

S_i=\{(a,b)z^j:a\in H_i\setminus \{1\},b\in H_{i+1},j\in \{0,1\}\}

One can show, although the proof is long, that these sets satisfy the triple product property. Furthermore, the character degrees of G are all at most 2, since H^2 is an abelian subgroup of index 2 in G and the maximum character degree of a group is at most the index of any abelian subgroup in the group. Since the sum of the squares of the character degrees is |G|, the sum of the cubes of the degrees is at most 2|G|=4n^6. However, |S_i|=2n(n-1), meaning that for n\geq 5,

|S_1| |S_2| |S_3| =8n^3(n-1)^3\geq 4n^6= 2|G| \geq \sum\limits_{i}d_i^3