Author: Qihao Ye

Andrew Dennehy has already introduced the concept of the discrete Fourier transform (DFT) in Lecture 18. I would like to retrace the path in more detail, because there are some other concepts, which help with a full understanding, that I would like to talk about.

We mainly talk about how the fast Fourier transform (FFT) and the inverse fast Fourier transform (IFFT) work, and we briefly touch on the calculation error.

We will see how FFT and IFFT work from both the mathematical and the computational perspective, along with their applications and some specific problems. Some problems may involve the number theoretic transform (NTT), which can be viewed as an integer DFT over a finite field.

We use Python 3 for the code examples. Since we do not use any version-specific features, the exact version does not matter (for instance, Python 3.5.2 and Python 3.8.1 both work). In the following we simply write Python for Python 3. (If you have not installed Python, there are many online interpreters that can run Python code, for instance, Try It Online.) *We recommend viewing this page on a computer for the best experience. We also recommend trying the code and solving some problems yourself. This will certainly be challenging and interesting.*

There are only 2 exercises in this lecture. The other problems are interest-based algorithm problems and are not homework. They might be difficult to solve if you are not very familiar with FFT and NTT in this form, but I will give hints to guide you.

Now let us start with an easy example.

**Example 1:** We consider two polynomials $P_1(x)$ and $P_2(x)$:

Multiplying them together (using the distributive property), we get their product $Q(x) = P_1(x) P_2(x)$:

**Definition 1 (coefficient representation):** For a polynomial $P(x) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$, the list $[a_0, a_1, \ldots, a_{n-1}]$ is its coefficient representation. We denote by $P[k]$ the coefficient of $x^k$ in $P$.

Using Definition 1, we can write $P_1$, $P_2$ and $Q$ as coefficient lists.

The naïve polynomial multiplication function is not hard to get:

```
def naive_poly_mul(P1, P2):
    # P1, P2: coefficient representations
    # Q[k] collects every product P1[i] * P2[j] with i + j = k
    Q = [0] * (len(P1) + len(P2) - 1)
    for i in range(len(P1)):
        for j in range(len(P2)):
            Q[i + j] += P1[i] * P2[j]
    return Q
```

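In the general case, i.e.,

$$P_1(x) = \sum_{i=0}^{n} a_i x^i, \qquad P_2(x) = \sum_{j=0}^{m} b_j x^j, \qquad Q(x) = P_1(x) P_2(x) = \sum_{k=0}^{n+m} \Big( \sum_{i+j=k} a_i b_j \Big) x^k,$$

it is easy to see that the naïve polynomial multiplication function runs in $O((n+1)(m+1)) = O(nm)$ time. Note that `len(P1)` $= n+1$ (counting from $0$ to $n$). In the special case $n = m$, the complexity becomes $O(n^2)$.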

**Definition 2 (degree of polynomial):** The degree of a polynomial is the highest of the degrees of the polynomial’s monomials (individual terms) with non-zero coefficients.

In Example 1, $P_1$ and $P_2$ are polynomials of the same degree.

**Definition 3 (value representation):** Besides representing an $n$th-degree polynomial with $n+1$ coefficients, we can also represent it with $n+1$ suitable points on the polynomial (see Exercise 1 below). This is called the value representation.

**Exercise 1:** Prove that $n+1$ points $(x_0, y_0), \ldots, (x_n, y_n)$ with distinct $x_i$ determine a unique polynomial of degree at most $n$. **Hint:** you may use, without proof, the fact that a Vandermonde matrix with distinct nodes is nonsingular.

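Back to Example 1. To get $Q = P_1 P_2$, we can first compute $P_1(x_i)$ and $P_2(x_i)$ at enough distinct points $x_i$. If $P_1$ and $P_2$ both have degree $n$, then $Q$ has degree $2n$, so $2n+1$ points $x_0, \ldots, x_{2n}$ suffice. The points $(x_i, P_1(x_i) P_2(x_i))$ then represent $Q$.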

When $P_1$ and $P_2$ both have degree $n$, the multiplication here only needs $O(n)$ time: one product for each of the $2n+1$ points.

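However, most of the time we need the coefficient representation, because it lets us evaluate the polynomial at any point of its domain. It is not hard to see that we can do the multiplication this way:

$$\text{coefficients of } P_1, P_2 \xrightarrow{\ \text{evaluate}\ } \text{values of } P_1, P_2 \xrightarrow{\ \text{pointwise product}\ } \text{values of } Q \xrightarrow{\ \text{interpolate}\ } \text{coefficients of } Q$$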

If we only consider multiplying two $n$th-degree polynomials, the multiplication step costs only $O(n)$. The bottleneck therefore lies in the algorithms that convert the coefficient representation into the value representation and back.
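To make the three steps concrete, here is a minimal sketch of the route evaluate → pointwise multiply → interpolate. It uses exact `Fraction` arithmetic, Horner's rule for evaluation, and Lagrange interpolation. The helper names `evaluate`, `lagrange_interpolate` and `mul_by_values`, and the choice of sample points $0, 1, 2, \ldots$, are illustrative assumptions and are not part of the lecture. Written this naively, interpolation takes $O(m^3)$ time for $m$ points, which shows that the conversions, not the pointwise product, dominate the cost.

```
from fractions import Fraction

def evaluate(P, x):
    # Horner's rule: O(len(P)) operations for one value of P = [a_0, ..., a_{n-1}]
    result = 0
    for a in reversed(P):
        result = result * x + a
    return result

def lagrange_interpolate(xs, ys):
    # Coefficients [c_0, ..., c_{m-1}] of the unique polynomial of degree < m
    # through the points (xs[i], ys[i]); the xs must be distinct
    m = len(xs)
    coeffs = [Fraction(0)] * m
    for i in range(m):
        # basis(x) = prod_{j != i} (x - xs[j]), denom = prod_{j != i} (xs[i] - xs[j])
        basis = [Fraction(1)]
        denom = Fraction(1)
        for j in range(m):
            if j == i:
                continue
            shifted = [Fraction(0)] * (len(basis) + 1)
            for k, b in enumerate(basis):
                shifted[k] -= b * xs[j]  # the -xs[j] part of (x - xs[j])
                shifted[k + 1] += b      # the x part of (x - xs[j])
            basis = shifted
            denom *= xs[i] - xs[j]
        for k in range(m):
            coeffs[k] += ys[i] * basis[k] / denom
    return coeffs

def mul_by_values(P1, P2):
    # evaluate -> pointwise multiply -> interpolate
    m = len(P1) + len(P2) - 1  # number of coefficients of Q = P1 * P2
    xs = list(range(m))        # any m distinct points would do
    ys = [evaluate(P1, x) * evaluate(P2, x) for x in xs]
    return lagrange_interpolate(xs, ys)

print([int(c) for c in mul_by_values([1, 2], [3, 4])])  # [3, 10, 8]
```

Here $(1 + 2x)(3 + 4x) = 3 + 10x + 8x^2$, which matches `naive_poly_mul([1, 2], [3, 4])`.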

The naïve way to get the value representation of a polynomial given by its coefficients costs $O(n^2)$. Each value needs $O(n)$ operations (for example, with Horner's rule), and we need $O(n)$ values.

The essential idea is to choose special evaluation points that reduce the calculation cost.

A natural first idea is to look at the parity of a function. For any odd function $f$ we have $f(-x) = -f(x)$, while for any even function $g$ we have $g(-x) = g(x)$.

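In fact, we can split any polynomial into an even function plus an odd function. Take a polynomial $P(x) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$ (for instance, one of the polynomials in Example 1). We can write

$$P(x) = \underbrace{\left(a_0 + a_2 x^2 + a_4 x^4 + \cdots\right)}_{P_e(x^2)} + x \underbrace{\left(a_1 + a_3 x^2 + a_5 x^4 + \cdots\right)}_{P_o(x^2)}.$$

Notice that $P_e(x^2)$ and $P_o(x^2)$ are even functions of $x$. Writing them in this form lets us treat $y = x^2$ as the variable. It follows that

$$P(x) = P_e(x^2) + x P_o(x^2), \qquad P(-x) = P_e(x^2) - x P_o(x^2).$$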

Note that if $P$ has degree $n-1$ (with $n$ even), then $P_e$ and $P_o$ have degree at most $n/2 - 1$. Once we have $P_e(x^2)$ and $P_o(x^2)$, we immediately get both $P(x)$ and $P(-x)$.
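Here is a minimal sketch of this split. It takes $P_e$ as the even-indexed coefficients and $P_o$ as the odd-indexed ones, evaluates each once at $x^2$, and recovers both $P(x)$ and $P(-x)$ from those two values. The helper names `split_even_odd` and `evaluate_pair` are hypothetical.

```
def split_even_odd(P):
    # P = [a_0, a_1, a_2, ...] -> P_e = [a_0, a_2, ...], P_o = [a_1, a_3, ...]
    return P[::2], P[1::2]

def evaluate(P, x):
    # Horner's rule
    result = 0
    for a in reversed(P):
        result = result * x + a
    return result

def evaluate_pair(P, x):
    # Return (P(x), P(-x)) from a single evaluation of P_e and of P_o at x^2
    P_e, P_o = split_even_odd(P)
    e = evaluate(P_e, x * x)
    o = evaluate(P_o, x * x)
    return e + x * o, e - x * o

P = [5, -1, 2, 7]  # 5 - x + 2x^2 + 7x^3
print(evaluate_pair(P, 3))  # (209, -163)
```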

All we need to do is process $P_e$ and $P_o$ recursively, which leads to a recursive algorithm.

However, this symmetry cannot be maintained at the next level. We can pair $x$ with $-x$, but for real $x$ the new points $x^2$ are all nonnegative, so how do we pair them?

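As we already saw in the previous lecture, we can use the roots of unity to solve this problem. Denote $\omega_n = e^{2\pi i/n} = \cos\frac{2\pi}{n} + i \sin\frac{2\pi}{n}$. Note that $\omega_n^n = 1$. Moreover, when $n$ is even, $\omega_n^{n/2} = -1$ and $\omega_n^2 = \omega_{n/2}$.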
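As a quick numerical sanity check (not a proof), the sketch below verifies for $n = 8$ the root-of-unity facts the pairing relies on: $\omega_n^n = 1$, $\omega_n^{n/2} = -1$, $\omega_n^2 = \omega_{n/2}$, and $\omega_n^{j + n/2} = -\omega_n^j$. The helper `root_of_unity` is a hypothetical name. It computes $\omega_n$ with `cmath.exp`, which gives the same value as $\cos\frac{2\pi}{n} + i \sin\frac{2\pi}{n}$.

```
import cmath

def root_of_unity(n):
    # omega_n = e^{2 pi i / n} = cos(2 pi / n) + i sin(2 pi / n)
    return cmath.exp(2j * cmath.pi / n)

n = 8
w = root_of_unity(n)
checks = [
    ("omega_n^n = 1", abs(w ** n - 1)),
    ("omega_n^(n/2) = -1", abs(w ** (n // 2) + 1)),
    ("omega_n^2 = omega_(n/2)", abs(w ** 2 - root_of_unity(n // 2))),
]
for name, err in checks:
    print(name, err < 1e-12)

# Pairing: omega_n^(j + n/2) = -omega_n^j, so the n points square to only n/2 points
pairs_ok = all(abs(w ** (j + n // 2) + w ** j) < 1e-12 for j in range(n // 2))
print(pairs_ok)
```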

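To make this easier to understand, we now only consider $n = 2^k$ for some positive integer $k$. We evaluate the polynomial $P$ at $\omega_n^0, \omega_n^1, \ldots, \omega_n^{n-1}$. Then we evaluate $P_e$ and $P_o$ at $\omega_{n/2}^0, \ldots, \omega_{n/2}^{n/2-1}$, then at the powers of $\omega_{n/4}$, and so on, as in the figure below.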

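We pair each $\omega_n^j$ with $\omega_n^{j+n/2} = -\omega_n^j$. For any polynomial $P$, we can get $P(\omega_n^j)$ and $P(\omega_n^{j+n/2})$ quickly by recursively evaluating $P_e$ and $P_o$ at $(\pm\omega_n^j)^2 = \omega_n^{2j} = \omega_{n/2}^j$. Recall that we split $P$ as $P(x) = P_e(x^2) + x P_o(x^2)$. The corresponding formula, for $0 \le j < n/2$, is

$$P(\omega_n^j) = P_e(\omega_{n/2}^j) + \omega_n^j P_o(\omega_{n/2}^j), \qquad P(\omega_n^{j+n/2}) = P_e(\omega_{n/2}^j) - \omega_n^j P_o(\omega_{n/2}^j).$$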

This leads to the naive FFT code:

```
from math import pi
from math import sin
from math import cos

def get_omega(n):
    # omega_n = e^{2 pi i / n}
    theta = 2 * pi / n
    omega = cos(theta) + sin(theta) * 1j
    return omega

def naive_FFT(P):
    # P is the coefficient representation
    # P = [a_{0}, a_{1}, ..., a_{n-1}]
    # currently we assume n = 2^{k}
    n = len(P)
    half_n = n // 2
    if n == 1:
        return P  # constant function
    omega = get_omega(n)
    P_even, P_odd = P[::2], P[1::2]
    V_even, V_odd = naive_FFT(P_even), naive_FFT(P_odd)
    V = [0] * n  # the value representation
    for j in range(half_n):
        V[j] = V_even[j] + omega ** j * V_odd[j]
        V[j + half_n] = V_even[j] - omega ** j * V_odd[j]
    return V
```

Feeding Example 1 into `naive_FFT` as input, we would get

Now we can apply the Fourier transform to a coefficient representation to get the corresponding value representation, and multiplication in the value representation is easy to implement. What remains is the inverse Fourier transform.

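In matrix-vector form, for $P(x) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$ we have

$$
\begin{pmatrix} P(\omega_n^0) \\ P(\omega_n^1) \\ P(\omega_n^2) \\ \vdots \\ P(\omega_n^{n-1}) \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & \omega_n & \omega_n^2 & \cdots & \omega_n^{n-1} \\
1 & \omega_n^2 & \omega_n^4 & \cdots & \omega_n^{2(n-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \cdots & \omega_n^{(n-1)(n-1)}
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{n-1} \end{pmatrix}.
$$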

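Note that the character table of the cyclic group $\mathbb{Z}/n\mathbb{Z}$, whose characters are $\chi_j(k) = \omega_n^{jk}$, has exactly the form of the DFT matrix:

$$F_n = \left(\omega_n^{jk}\right)_{0 \le j, k \le n-1}.$$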

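To get the inverse Fourier transform, we can simply use the inverse of this matrix:

$$F_n^{-1} = \frac{1}{n} \left(\omega_n^{-jk}\right)_{0 \le j, k \le n-1}, \qquad \begin{pmatrix} a_0 \\ \vdots \\ a_{n-1} \end{pmatrix} = F_n^{-1} \begin{pmatrix} P(\omega_n^0) \\ \vdots \\ P(\omega_n^{n-1}) \end{pmatrix}.$$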

**Exercise 2:** Check that the inverse DFT matrix is the inverse of the DFT matrix.
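This is not a proof, but a numerical check can make the exercise more tangible. The sketch below builds $F_n$ and $F_n^{-1}$ entry by entry and confirms that their product is the identity up to floating-point error. The key fact behind the proof is $\sum_{k=0}^{n-1} \omega_n^{(r-c)k} = 0$ for $r \ne c$. The helper names `dft_matrix` and `mat_mul` are hypothetical. We use $n = 6$ to show that this part does not need $n$ to be a power of 2.

```
import cmath

def dft_matrix(n, inverse=False):
    # F_n[r][c] = omega_n^{rc};  F_n^{-1}[r][c] = omega_n^{-rc} / n
    sign = -1 if inverse else 1
    scale = 1.0 / n if inverse else 1.0
    return [[scale * cmath.exp(sign * 2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]

def mat_mul(A, B):
    # Plain O(n^3) product of two square matrices stored as lists of lists
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

n = 6  # powers of 2 are only needed for the radix-2 FFT, not for this identity
product = mat_mul(dft_matrix(n), dft_matrix(n, inverse=True))
max_err = max(abs(product[i][j] - (1 if i == j else 0)) for i in range(n) for j in range(n))
print(max_err < 1e-12)  # True
```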

The only difference between DFT and FFT is how these matrix-vector products are computed. The DFT uses direct matrix-vector multiplication in $O(n^2)$ time, while the FFT uses the recursive trick to achieve $O(n \log n)$.
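To see that both routes compute the same vector, the sketch below compares a direct $O(n^2)$ matrix-vector DFT with a recursive $O(n \log n)$ FFT that uses the same even/odd split as `naive_FFT`. The code is self-contained, so `dft`, `fft` and `omega_power` are hypothetical names introduced only for this comparison, and the input must have power-of-2 length.

```
import cmath

def omega_power(n, e):
    # omega_n^e with omega_n = e^{2 pi i / n}
    return cmath.exp(2j * cmath.pi * e / n)

def dft(P):
    # Direct matrix-vector product V = F_n P, i.e. V[r] = sum_c omega_n^{rc} P[c]: O(n^2)
    n = len(P)
    return [sum(omega_power(n, r * c) * P[c] for c in range(n)) for r in range(n)]

def fft(P):
    # Recursive radix-2 FFT with the even/odd split of naive_FFT: O(n log n)
    # Assumes len(P) is a power of 2
    n = len(P)
    if n == 1:
        return list(P)
    V_even, V_odd = fft(P[::2]), fft(P[1::2])
    V = [0] * n
    for j in range(n // 2):
        t = omega_power(n, j) * V_odd[j]
        V[j] = V_even[j] + t
        V[j + n // 2] = V_even[j] - t
    return V

P = [(3 * i * i + 1) % 11 - 5 for i in range(16)]
print(max(abs(a - b) for a, b in zip(dft(P), fft(P))) < 1e-9)  # True
```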

The inverse DFT matrix leads to the naive IFFT code:

```
def naive_IFFT(V, is_outest_layer=False):
    # V is the value representation
    # w means omega_{n}
    # V = [P(w^{0}), P(w^{1}), ..., P(w^{n-1})]
    # currently we assume n = 2^{k}
    # call with is_outest_layer=True; the recursive calls leave it False,
    # so the division by n happens only once, at the top level
    n = len(V)
    half_n = n // 2
    if n == 1:
        return V  # constant function
    omega = 1.0 / get_omega(n)  # omega_{n}^{-1}
    V_even, V_odd = V[::2], V[1::2]
    P_even, P_odd = naive_IFFT(V_even), naive_IFFT(V_odd)
    P = [0] * n  # the coefficient representation (scaled by n until the final division)
    for j in range(half_n):
        P[j] = P_even[j] + omega ** j * P_odd[j]
        P[j + half_n] = P_even[j] - omega ** j * P_odd[j]
    if is_outest_layer:
        for j in range(n):
            P[j] /= n
    return P
```

Using the value representation from Example 1 as the input of `naive_IFFT` (with `is_outest_layer=True`), we would get

If we ignore the small floating-point error, this is exactly the coefficient representation of the polynomial we started with.

The following material may be less clear and harder to understand, but I will try my best. (Some of it cannot be expanded much; otherwise it would take too much space and might obscure the main part.)

The lowest layer of Diagram 1 is just the bit-reversal permutation of the indices, and there is a neat piece of code to generate it:

```
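def get_BRI(length):
    # Bit-Reversal Index: BRI[i] is i with its log2(n) binary digits reversed,
    # where n is the smallest power of 2 with n >= length
    # (the variable k ends up as log2(n) - 1, the position of the top bit)
    n = 1
    k = -1
    while n < length:
        n <<= 1
        k += 1
    if n == 1:
        return [0]  # a single element stays in place (also avoids a negative shift)
    BRI = [0] * n
    for i in range(n):
        BRI[i] = (BRI[i >> 1] >> 1) | ((i & 1) << k)
    return BRI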
```

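It is easier to see in a table (an example of a BRI for a small length $n = 2^k$): writing the index $i$ with $k$ binary digits and reversing them gives $\texttt{BRI}[i]$.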
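The sketch below prints such a table for length 8, which is chosen only for illustration. It computes each entry directly by writing $i$ with $k = 3$ binary digits and reversing the string, so the last column should agree with `get_BRI(8)`. The helper name `bit_reversal_table` is hypothetical.

```
def bit_reversal_table(n):
    # Assumes n = 2^k. For each index i, return (i, k-bit binary of i,
    # the reversed bit string, and that reversed string read back as an integer)
    k = n.bit_length() - 1
    rows = []
    for i in range(n):
        bits = bin(i)[2:].zfill(k)
        reversed_bits = bits[::-1]
        rows.append((i, bits, reversed_bits, int(reversed_bits, 2)))
    return rows

for i, bits, rev, bri in bit_reversal_table(8):
    print(i, bits, rev, bri)
# last column: 0, 4, 2, 6, 1, 5, 3, 7
```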

Using this, we can implement the Cooley–Tukey FFT algorithm, which is the most common FFT algorithm. Further, with careful coding, we can devise an in-place algorithm that overwrites its input with its output using only $O(1)$ auxiliary storage; this is called the iterative radix-2 FFT algorithm. Moreover, since FFT and IFFT have very similar forms, we can combine them into one function.

```
from math import cos, pi, sin

def FFT(X, length, is_inverse=False):
    # X : input, either coefficient representation
    #     or value representation (modified in place)
    # length : how many values need to be evaluated
    # is_inverse : indicates whether this is FFT or IFFT
    inverse_mul = [1, -1][is_inverse]  # IFFT uses the conjugate roots of unity
    BRI = get_BRI(length)
    n = len(BRI)
    X += [0] * (n - len(X))  # pad X in place with zeros up to n = 2^k
    for index in range(n):
        if index < BRI[index]:  # swap each pair only once
            X[index], X[BRI[index]] = X[BRI[index]], X[index]
    bits = 1  # half the length of the blocks merged in this round
    while bits < n:
        # omega_base = omega_{2 * bits} (its conjugate for IFFT)
        omega_base = cos(pi / bits) + inverse_mul * sin(pi / bits) * 1j
        j = 0
        while j < n:
            omega = 1
            for k in range(bits):
                even_part = X[j + k]
                odd_part = X[j + k + bits] * omega
                X[j + k] = even_part + odd_part
                X[j + k + bits] = even_part - odd_part
                omega *= omega_base
            j += bits << 1
        bits <<= 1
    if is_inverse:
        for index in range(length):
            X[index] = X[index].real / n
            # only the real part is needed
    return X[:length]
```

Note that we could ignore the return value, since $X$ has already been modified in place. This algorithm extends the input to the smallest power-of-2 length ($2^k$) that is not less than `length`. In most cases, though, we would already pad the input to length $2^k$ (appending $0$'s) before calling it.

Because we implement the FFT with complex floating-point numbers, the error is hard to eliminate. Even if the initial polynomial has integer coefficients, applying FFT and then IFFT gives a list of floats with some calculation error.
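In practice, for integer polynomials we pad to a power of 2 and round the result of the inverse transform back to integers. The sketch below shows this pipeline end to end. It is self-contained, so it uses its own compact iterative transform (`fft_inplace`, a hypothetical name) with an incremental bit-reversal loop instead of `get_BRI`. The sign convention matches the lecture: positive exponent forward, conjugate for the inverse. Rounding is only safe while the true coefficients and the accumulated float error stay well within double precision. For very large coefficients, the NTT below is the safer choice.

```
import cmath

def fft_inplace(a, invert=False):
    # Iterative radix-2 FFT on a list whose length is a power of 2 (modifies a)
    n = len(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = cmath.exp((-2j if invert else 2j) * cmath.pi / length)
        for start in range(0, n, length):
            w = 1
            for k in range(length // 2):
                u = a[start + k]
                v = a[start + k + length // 2] * w
                a[start + k] = u + v
                a[start + k + length // 2] = u - v
                w *= w_len
        length <<= 1
    if invert:
        for i in range(n):
            a[i] /= n

def poly_mul_fft(P1, P2):
    # Integer polynomial product: pad, transform, multiply pointwise,
    # transform back, then round away the floating-point error
    m = len(P1) + len(P2) - 1
    n = 1
    while n < m:
        n <<= 1
    A = [complex(x) for x in P1] + [0j] * (n - len(P1))
    B = [complex(x) for x in P2] + [0j] * (n - len(P2))
    fft_inplace(A)
    fft_inplace(B)
    C = [x * y for x, y in zip(A, B)]
    fft_inplace(C, invert=True)
    return [round(c.real) for c in C[:m]]

print(poly_mul_fft([1, 2, 3], [4, 5, 6]))  # [4, 13, 28, 27, 18]
```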

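We can instead do the FFT in the field $\mathbb{Z}/p\mathbb{Z}$, where $p = c \cdot 2^k + 1$ is a prime. Let $g$ be a primitive root modulo $p$. Then $(\mathbb{Z}/p\mathbb{Z})^\times = \langle g \rangle$ is a cyclic group of order $p - 1 = c \cdot 2^k$, so for any length $n = 2^s$ with $s \le k$ we can replace $\omega_n$ with $g^{(p-1)/n}$ to do the FFT. This method is called NTT. Since only integers are involved, no rounding errors can appear.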
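Here is a minimal NTT sketch. It assumes the commonly used prime $p = 998244353 = 119 \cdot 2^{23} + 1$ with primitive root $g = 3$, so lengths up to $2^{23}$ are supported. Apart from the modular arithmetic, it mirrors the iterative FFT: $\omega_n$ becomes $g^{(p-1)/n}$, and the inverse transform uses modular inverses from Fermat's little theorem. The names `ntt` and `poly_mul_ntt` are illustrative.

```
MOD = 998244353  # = 119 * 2^23 + 1, prime
G = 3            # a primitive root modulo MOD

def ntt(a, invert=False):
    # In-place iterative NTT over Z/MOD; len(a) must be a power of 2 dividing 2^23
    n = len(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # w_len plays the role of omega_length: an element of order exactly `length`
        w_len = pow(G, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # modular inverse via Fermat
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % MOD
                a[start + k] = (u + v) % MOD
                a[start + k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD

def poly_mul_ntt(P1, P2):
    # Exact product of P1 and P2 with coefficients reduced modulo MOD
    m = len(P1) + len(P2) - 1
    n = 1
    while n < m:
        n <<= 1
    A = [x % MOD for x in P1] + [0] * (n - len(P1))
    B = [x % MOD for x in P2] + [0] * (n - len(P2))
    ntt(A)
    ntt(B)
    C = [x * y % MOD for x, y in zip(A, B)]
    ntt(C, invert=True)
    return C[:m]

print(poly_mul_ntt([1, 2, 3], [4, 5, 6]))  # [4, 13, 28, 27, 18]
```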

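For an arbitrary modulus $m$ and a target NTT length $n$, we can take a set of distinct NTT moduli $p_1, \ldots, p_r$ (each of the form $c_i \cdot 2^{k_i} + 1$ with $2^{k_i} \ge n$) satisfying

$$\prod_{i=1}^{r} p_i > n (m-1)^2,$$

since every coefficient of the exact product of two polynomials with coefficients in $[0, m-1]$ is at most $n(m-1)^2$. We do the NTT separately modulo each $p_i$ and then use the Chinese remainder theorem to combine the results, which gives the final result modulo $m$.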
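Here is a minimal sketch of this approach. It assumes the three commonly used NTT primes $998244353 = 119 \cdot 2^{23} + 1$, $167772161 = 5 \cdot 2^{25} + 1$ and $469762049 = 7 \cdot 2^{26} + 1$, each with primitive root $3$. Their product is about $7.9 \cdot 10^{25}$. That exceeds $n(m-1)^2$ for moduli $m$ near $10^9$ as long as the length is at most $2^{23}$, which is the limit set by the first prime. The CRT step relies on Python's arbitrary-precision integers, and all function names are illustrative.

```
# NTT-friendly primes p = c * 2^k + 1, each paired with its primitive root 3
NTT_PRIMES = [(998244353, 3), (167772161, 3), (469762049, 3)]

def ntt(a, mod, g, invert=False):
    # In-place iterative NTT modulo the prime `mod` with primitive root `g`;
    # len(a) must be a power of 2 dividing mod - 1
    n = len(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(g, (mod - 1) // length, mod)  # element of order `length`
        if invert:
            w_len = pow(w_len, mod - 2, mod)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % mod
                a[start + k] = (u + v) % mod
                a[start + k + half] = (u - v) % mod
                w = w * w_len % mod
        length <<= 1
    if invert:
        n_inv = pow(n, mod - 2, mod)
        for i in range(n):
            a[i] = a[i] * n_inv % mod

def convolve_mod(P1, P2, mod, g):
    # Coefficients of P1 * P2 reduced modulo the NTT prime `mod`
    m = len(P1) + len(P2) - 1
    n = 1
    while n < m:
        n <<= 1
    A = [x % mod for x in P1] + [0] * (n - len(P1))
    B = [x % mod for x in P2] + [0] * (n - len(P2))
    ntt(A, mod, g)
    ntt(B, mod, g)
    C = [x * y % mod for x, y in zip(A, B)]
    ntt(C, mod, g, invert=True)
    return C[:m]

def crt(residues, moduli):
    # Smallest x >= 0 with x = residues[i] (mod moduli[i]); moduli are distinct primes
    x, M = 0, 1
    for r, p in zip(residues, moduli):
        t = (r - x) * pow(M % p, p - 2, p) % p  # solve x + M t = r (mod p)
        x += M * t
        M *= p
    return x

def poly_mul_any_mod(P1, P2, m):
    # Assumes coefficients in [0, m) and length * (m - 1)^2 < product of the primes,
    # so each CRT value equals the exact coefficient before the final reduction mod m
    moduli = [p for p, _ in NTT_PRIMES]
    per_prime = [convolve_mod(P1, P2, p, g) for p, g in NTT_PRIMES]
    return [crt(rs, moduli) % m for rs in zip(*per_prime)]

print(poly_mul_any_mod([10**9 + 6] * 4, [10**9 + 6] * 4, 10**9 + 7))  # [1, 2, 3, 4, 3, 2, 1]
```

In the usage line the exact coefficients are $k(10^9 + 6)^2$, which is far larger than any single prime. Because $(m-1)^2 \equiv 1 \pmod m$, they reduce to $k$ modulo $m = 10^9 + 7$.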

Note that during the NTT modulo $p_i$, no intermediate value exceeds $(p_i - 1)^2$, the largest product of two residues.

We may say that the FFT algorithm computes convolutions of the form

$$c_k = \sum_{i + j = k} a_i b_j$$

in $O(n \log n)$ time.

Back in Example 1, we have

$$Q[k] = \sum_{i + j = k} P_1[i] \, P_2[j].$$

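**(Not mandatory) Problem 1:** Given the formula for the Stirling numbers of the second kind,

$$S(n,k) = \frac{1}{k!} \sum_{i=0}^{k} (-1)^i \binom{k}{i} (k-i)^n = \sum_{i=0}^{k} \frac{(-1)^i}{i!} \cdot \frac{(k-i)^n}{(k-i)!},$$

use the NTT with some modulus of the form $c \cdot 2^t + 1$ to calculate all $S(n,k)$ for $0 \le k \le n$ in $O(n \log n)$ time.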
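Before reaching for the NTT, it may help to confirm the formula in exact arithmetic. The sketch below (function names hypothetical) evaluates the explicit sum with `Fraction` and compares it with the standard recurrence $S(n,k) = k \, S(n-1,k) + S(n-1,k-1)$. Note that the sum has the shape $\sum_{i+j=k} a_i b_j$ with $a_i = (-1)^i / i!$ and $b_j = j^n / j!$. This is what makes a single convolution enough for a whole row; that part is left to the problem.

```
from fractions import Fraction
from math import factorial

def stirling2_formula(n, k):
    # S(n, k) = sum_{i=0}^{k} [(-1)^i / i!] * [(k - i)^n / (k - i)!]
    return sum(Fraction((-1) ** i, factorial(i)) * Fraction((k - i) ** n, factorial(k - i))
               for i in range(k + 1))

def stirling2_table(N):
    # Recurrence S(n, k) = k S(n-1, k) + S(n-1, k-1) with S(0, 0) = 1
    S = [[0] * (N + 1) for _ in range(N + 1)]
    S[0][0] = 1
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]
    return S

N = 10
S = stirling2_table(N)
print(all(stirling2_formula(n, k) == S[n][k] for n in range(N + 1) for k in range(n + 1)))  # True
```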

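**(Not mandatory) Problem 2:** PE 537. **Hint:** Let $f(i)$ be the number of positive integers $x$ with $\pi(x) = i$, and let $A$ be the list of values of $f$, i.e., $A = [f(0), f(1), f(2), \ldots]$. Then the convolution of $A$ and $A$, named $A^2$, is the list of values of $f_2(i) = \#\{(x_1, x_2) : \pi(x_1) + \pi(x_2) = i\}$. The convolution of $A^2$ and $A$, named $A^3$, is the list of values of $f_3(i) = \#\{(x_1, x_2, x_3) : \pi(x_1) + \pi(x_2) + \pi(x_3) = i\}$, and so on. You can think of $\sum_i f(i) x^i$ as the generating function. You might need to learn how to sieve prime numbers and how to use fast exponentiation.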

There are many similar algorithms, for example, the fast Walsh–Hadamard transform (FWHT) and the fast wavelet transform (FWT). FWHT can compute the general convolution

$$c_k = \sum_{i \circ j = k} a_i b_j$$

in $O(n \log n)$ time, where $\circ$ is some binary operation, usually bitwise OR, bitwise AND, or bitwise XOR.
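Here is a minimal sketch of the XOR case only; the OR and AND variants use different butterflies. It relies on the Walsh–Hadamard matrix $H$ satisfying $H \cdot H = nI$, so the inverse is the same transform divided by $n$. With integer inputs, that division is exact. The names `fwht_xor` and `xor_convolution` are illustrative, and the inputs must have the same power-of-2 length.

```
def fwht_xor(a, invert=False):
    # In-place Walsh-Hadamard transform for XOR convolution; len(a) = 2^k
    n = len(a)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    if invert:
        for i in range(n):
            a[i] //= n  # exact for integer data, since H * H = n * I

def xor_convolution(A, B):
    # C[k] = sum of A[i] * B[j] over all i, j with i XOR j == k
    A, B = list(A), list(B)
    fwht_xor(A)
    fwht_xor(B)
    C = [x * y for x, y in zip(A, B)]
    fwht_xor(C, invert=True)
    return C

print(xor_convolution([1, 2, 3, 4], [5, 6, 7, 8]))  # [70, 68, 62, 60]
```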

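Using FFT, we can perform many polynomial operations quickly. For instance, for a polynomial $A(x)$ with $A(0) \ne 0$, we can find a polynomial $B(x)$ such that $A(x) B(x) \equiv 1 \pmod{x^n}$; this $B$ is called the inverse of $A$ modulo $x^n$. The basic idea is to first find the inverse modulo $x^{\lceil n/2 \rceil}$ and then lift it to an inverse modulo $x^n$. If $A B_0 \equiv 1 \pmod{x^{\lceil n/2 \rceil}}$, then $B = B_0 (2 - A B_0)$ satisfies $1 - AB = (1 - AB_0)^2 \equiv 0 \pmod{x^n}$. Because the inverse modulo $x$ is trivial ($1/A(0)$), we can solve this recursively in $O(n \log n)$ time. A similar idea applies to Newton's method modulo $x^n$; in particular, we can find the square root of a polynomial modulo $x^n$.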
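Here is a minimal sketch of the doubling (Newton) iteration $B \leftarrow B(2 - AB) \bmod x^m$. It assumes coefficients modulo the prime $998244353$. For brevity, the truncated products are computed naively, which makes this version $O(n^2)$; replacing `poly_mul_trunc` with an NTT-based product gives the $O(n \log n)$ bound. The function names are hypothetical.

```
MOD = 998244353  # an NTT-friendly prime, assumed for this sketch

def poly_mul_trunc(A, B, n):
    # First n coefficients of A(x) * B(x) modulo MOD (naive O(n^2) for brevity)
    C = [0] * n
    for i, a in enumerate(A[:n]):
        if a == 0:
            continue
        for j, b in enumerate(B[:n - i]):
            C[i + j] = (C[i + j] + a * b) % MOD
    return C

def poly_inverse(A, n):
    # Return B with A(x) B(x) = 1 (mod x^n), coefficients modulo MOD;
    # requires A[0] != 0 (mod MOD)
    B = [pow(A[0], MOD - 2, MOD)]  # the inverse modulo x is just 1 / A(0)
    m = 1
    while m < n:
        m *= 2
        # Newton step: if A B = 1 (mod x^(m/2)), then B (2 - A B) inverts A mod x^m
        AB = poly_mul_trunc(A, B, m)
        correction = [(-c) % MOD for c in AB]
        correction[0] = (correction[0] + 2) % MOD
        B = poly_mul_trunc(B, correction, m)
    return B[:n]

A = [3, 1, 4, 1, 5]
B = poly_inverse(A, 8)
print(poly_mul_trunc(A, B, 8))  # [1, 0, 0, 0, 0, 0, 0, 0]
```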

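**(Not mandatory) Problem 3:** PE 258. **Hint:** Consider the Cayley–Hamilton theorem ($p_M(M) = 0$, where $p_M$ is the characteristic polynomial of the companion matrix $M$ of the recurrence). Then use the polynomial inverse to do polynomial division, reducing $x^{10^{18}}$ modulo the characteristic polynomial $x^{2000} - x - 1$. Alternatively, consider solving the homogeneous linear recurrence with constant coefficients with the Berlekamp–Massey algorithm, which involves polynomial multiplication.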

FFT is also used in signal processing to transform a signal from the time domain to the frequency domain, while IFFT transforms it back.

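In the article [Machine Learning from a Continuous Viewpoint I] by Weinan E et al., the Fourier representation

$$f(\mathbf{x}) = \int_{\mathbb{R}^d} a(\boldsymbol{\omega}) \, e^{i (\boldsymbol{\omega}, \mathbf{x})} \, d\boldsymbol{\omega}$$

can be considered as a two-layer neural network model with activation function $\sigma(z) = e^{iz}$.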

In the above figure, $e^{i(\boldsymbol{\omega}, \mathbf{x})}$ computes each hidden-layer value from the input layer with weight $\boldsymbol{\omega}$, and the integral sums all hidden-layer values with weight $a(\boldsymbol{\omega})$. Note that this is an integral formula and we work in a continuous setting. This means the hidden layer has infinite width, i.e., it is considered to have infinitely many nodes.

**References**

- https://www.nayuki.io/page/number-theoretic-transform-integer-dft
- https://www.geeksforgeeks.org/fast-fourier-transformation-poynomial-multiplication/
- Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. *Introduction to Algorithms*, Chapter 32: Polynomials and the FFT.
- https://jakevdp.github.io/blog/2013/08/28/understanding-the-fft/
- Yan-Bin Jia. Polynomial Multiplication and Fast Fourier Transform.
- https://www.youtube.com/watch?v=h7apO7q16V0
- E, W., Ma, C., & Wu, L. (2020). Machine learning from a continuous viewpoint, I. *Science China Mathematics*, *63*(11), 2233-2266.
- Ceccherini-Silberstein. Discrete Harmonic Analysis, Chapter 5.
- https://github.com/wcysai/FFT-and-Polynomial-Operations-slides-/blob/master/FFT%20and%20Polynomial%20Operations.pdf
- https://alan20210202.github.io/2020/08/07/FWHT/
- Pan, H., Dabawi, D., & Cetin, A. E. (2021). Fast Walsh-Hadamard Transform and Smooth-Thresholding Based Binary Layers in Deep Neural Networks. *arXiv preprint arXiv:2104.07085*.