Math 202C: Lecture 19

Author: Qihao Ye

Andrew Dennehy have already introduced the concept of the discrete Fourier transform (DFT) to us in Lecture 18, but I would like to retake the path with more details, because there are some other concepts (which help for fully understanding) I would like to talk about.

We mainly talk about how fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) work, we sightly talk about the calculation error.

We will see how FFT and IFFT work from both the perspective of math and computer, along with their applications and some specific problems. Some problem may involve the number theoretic transform (NTT), which is recognized as the integer DFT over finite field.

We use Python3 for code examples, since we would not use version related feature, the specific version does not matter. (For instance, Python3.5.2 and Python3.8.1 are both ok.) We would call Python instead of Python3 in the following content. (If you do not install Python, there are many online interpreters that can run Python codes, for instance, Try It Online.) Recommend to view this page with computer for getting the best experience. Recommend to try the codes and solve some problems by yourself, which definitely would be challenging and interesting.

There are only 2 exercises in this lecture, other problems are interest based algorithm problems (no need to do as homework), which might be difficult to solve if you are not very familiar with the FFT and NTT in this form, but I will give hints to guide you.

Now let us start with an easy example.

Example 1: We consider two polynomials

Multiply them together (using distributive property), we would get

Definition 1 (coefficient representation): For a polynomial P(x) = a_{0} + a_{1} x + \cdots + a_{n} x^{n}, the list [a_{0}, a_{1}, \ldots, a_{n}] is its coefficient representation. We denote P[k] as the coefficient of x^{k} in P(x).

Use definition 1, we can write P_{1}(x) as [1, 2, 3, 4], P_{2}(x) as [2, 3, 4, 5] and P_{1}(x) \cdot P_{2}(x) = Q(x) as [2, 7, 16, 30, 34, 31, 20].

The naïve polynomial multiplication function is not hard to get:

def naive_poly_mul(P1, P2):
    Q = [0] * (len(P1) + len(P2) - 1)
    for i in range(len(P1)):
        for j in range(len(P2)):
            Q[i + j] += P1[i] * P2[j]
    return Q

In the general case, i.e.,

it is easy to see that the complexity of the naïve polynomial multiplication function is \mathcal{O}(m n). Note that \text{len}(P_{1}) = n + 1 (count from 0 to n). If we consider the specific condition m = n, then the complexity becomes \mathcal{O}(n^{2}).

Definition 2 (degree of polynomial): The degree of a polynomial is the highest of the degrees of the polynomial’s monomials (individual terms) with non-zero coefficients.

In Example 1, P_{1}, P_{2} are both 3-degree polynomials.

Definition 3 (value representation): Except representing a n-th degree polynomial with n + 1 coefficients, we can also represent a n-th degree polynomial with proper (see below Exercise) n + 1 points on the polynomial, which is called the value representation.

Exercise 1 : Prove that n + 1 points \{ (x_{i}, y_{i}) \}_{i = 1}^{n + 1} with distinct \{ x_{i} \} determine a unique polynomial of degree n. Hint: use the fact that a Vandermonde matrix is nonsingular without proof.

Back to Example 1, to get Q, we can first get \{ (x_{i}, P_{1}(x_{i}), P_{2}(x_{i})) \}_{i = 1}^{7} from P_{1}, P_{2} with distinct \{ x_{i} \}, then the 7 points \{ (x_{i}, P_{1}(x_{i}) P_{2}(x_{i})) \}_{i = 1}^{7} just represent Q.

When the degree of P_{1}, P_{2} are both n, we can see that the multiplication here just needs complexity \mathcal{O}(n).

However, at most time, we only need the coefficient representation, because it is more suitable to calculate the values in its domain. It is not hard to figure out we can do the multiplication in this way:

If we only consider the situation that we aim to multiply two n-th degree polynomials, the multiplication part only costs \mathcal{O}(n), so the bottleneck of the complexity lies in the algorithm changing the coefficient representation into the value representation and the algorithm changing it back.

The naïve way to get the value representation of a coefficient represented polynomial cost at least \mathcal{O}(n^{2}). (To get each value we need \mathcal{O}(n), and we totally need \mathcal{O}(n) values.)

The essential idea is selecting specific points to reduce the calculation cost.

The straight thought would be looking at the parity of a function. Because for any odd function P_{\text{odd}}, we have P_{\text{odd}}(-x) = - P_{\text{odd}}(x) while for any even function P_{\text{even}}, we have P_{\text{even}}(-x) = P_{\text{even}}(x).

Actually, we can divide any polynomial into one odd function plus one even function. Take a look back to Q(x) in Example 1, we can write

Notice that Q_{\text{odd}}(x^{2}) is actually an even function, but write in this form allows us to take x^{2} as the variable. It follows that

Note that Q_{\text{even}}(x^{2}) has degree 3 and Q_{\text{odd}}(x^{2}) has degree 2 while Q has degree 6. Once we get Q_{\text{even}}(x^{2}) and Q_{\text{odd}}(x^{2}), we would immediately get Q(x) and Q(-x).

What we only need to do is recursive process Q_{\text{even}}(x^{2}) and Q_{\text{odd}}(x^{2}). It would lead to an \mathcal{O}(n \log n) recursive algorithm.

However, we would encounter the problem that the symmetric property could not be maintained further (we can pair x and -x, but how to pair x^{2}).

As we already see in the previous Lecture, we can use the roots of unity to solve this problem. Denote \omega_{n} = \exp(2 \pi i / n). Note that \omega_{n}^{n} = \omega_{n}^{0} = 1.

To make it easier to understand, we now only consider n = 2^{k} for some positive integer k, we would evaluate polynomial at [\omega_{n}^{0}, \omega_{n}^{1}, \ldots, \omega_{n}^{n - 1}], then [\omega_{n}^{0}, \omega_{n}^{2}, \ldots, \omega_{n}^{n - 2}], next [\omega_{n}^{0}, \omega_{n}^{4}, \ldots, \omega_{n}^{n - 4}], etc. Just as the figure below.

Diagram 1

We pair each \omega_{n}^{j} with \omega_{n}^{-j} = \omega_{n}^{j + n / 2}. For any polynomial P, we can get [P(\omega_{n}^{0}), P(\omega_{n}^{1}), \ldots, P(\omega_{n}^{n - 1})] fast by evaluating [P_{\text{even}}(\omega_{n}^{0}), P_{\text{even}}(\omega_{n}^{2}), \ldots, P_{\text{even}}(\omega_{n}^{n - 2})] and [P_{\text{odd}}(\omega_{n}^{0}), P_{\text{odd}}(\omega_{n}^{2}), \ldots, P_{\text{odd}}(\omega_{n}^{n - 2})] recursively. Recall that we divide P(x) as P_{\text{even}}(x^{2}) + x P_{\text{odd}}(x^{2}). The corresponding formula is

This leads to the naive FFT code:

from math import pi
from math import sin
from math import cos

def get_omega(n):
    theta = 2 * pi / n
    omega = cos(theta) + sin(theta) * 1j
    return omega

def naive_FFT(P):
    # P is the coefficient representation
    # P = [a_{0}, a_{1}, ..., a_{n-1}]
    # current we assume n = 2^{k}
    n = len(P)
    half_n = n // 2
    if n == 1:
        return P # constant function
    omega = get_omega(n)
    P_even, P_odd = P[::2], P[1::2]
    V_even, V_odd = naive_FFT(P_even), naive_FFT(P_odd)
    V = [0] * n # the value representation
    for j in range(half_n):
        V[j] = V_even[j] + omega ** j * V_odd[j]
        V[j + half_n] = V_even[j] - omega ** j * V_odd[j]
    return V

Feed P_{1} in example 1 as input, we would get

Now we can apply Fourier transform to a coefficient representation to get the corresponding value representation, and the multiplication in the value representation form is easy to implement, what remains to solve is the inverse Fourier transform.

In the matrix-vector form for P(x) = a_{0} + a_{1} x + \cdots + a_{n - 1} x^{n - 1}, we have

Note that the character table of C_{n} is just as the form of the DFT matrix:

To get the inverse Fourier transform, we can just use the inverse of the matrix:

Exercise 2: Check that the inverse DFT matrix is the inverse of the DFT matrix.

The difference between DFT and FFT is just the way of calculating these matrix-vector forms, where DFT uses the direct matrix-vector multiplication way in \mathcal{O}(n^{2}) and FFT uses the tricky recursive way to achieve \mathcal{O}(n \log n).

The inverse DFT matrix leads to the naive IFFT code:

def naive_IFFT(V, is_outest_layer=False):
    # V is the value representation
    # w means omega_{n}
    # V = [P(w^{0}), P(w^{1}), ..., P(w^{n-1})]
    # current we assume n = 2^{k}
    n = len(V)
    half_n = n // 2
    if n == 1:
        return V # constant function
    omega = 1.0 / get_omega(n) # omega_{n}^{-1}
    V_even, V_odd = V[::2], V[1::2]
    P_even, P_odd = naive_IFFT(V_even), naive_IFFT(V_odd)
    P = [0] * n # the value representation
    for j in range(half_n):
        P[j] = P_even[j] + omega ** j * P_odd[j]
        P[j + half_n] = P_even[j] - omega ** j * P_odd[j]
    if is_outest_layer:
        for j in range(n):
            P[j] /= n
    return P

Use P_{1} in example 1, we would get

If we ignore the little error, this is just the coefficient representation of P_{1}.

The following materials might not be that clear and might not be easy to understand, but I will try my best. (Some material cannot be expanded too much, otherwise that would cost too much space and might confuse the main part.)

The lowest layer of Diagram 1 is just the bit-reversal permutation index and there is a neat code to generate:

def get_BRI(length):
    # Bit-Reversal Index
    n = 1
    k = -1
    while n < length:
        n <<= 1
        k += 1
    BRI = [0] * n
    for i in range(n):
        BRI[i] = (BRI[i >> 1] >> 1) | ((i & 1) << k)
    return BRI

It is more easy to see in a tabular (an example of 8-length BRI)

Use this we can implement the Cooley–Tukey FFT algorithm, which is the most common FFT algorithm. Further, with proper manner of coding, we can devise an in-place algorithm that overwrites its input with its output data using only \mathcal{O}(1) auxiliary storage, which is called the iterative radix-2 FFT algorithm. Moreover, since the form of FFT and IFFT are actually very similar, we can integrate them together.

def FFT(X, length, is_inverse=False):
    # X : input, either coefficient representation
    #            or value representation
    # length : how much values need to evaluate
    # is_inverse : indicate whether is FFT or IFFT
    inverse_mul = [1, -1][is_inverse]
    BRI = get_BRI(length)
    n = len(BRI)
    X += [0] * (n - len(X))
    for index in range(n):
        if index < BRI[index]: # only change once
            X[index], X[BRI[index]] = X[BRI[index]], X[index]
    bits = 1
    while bits < n:
        omega_base = cos(pi/bits) + inverse_mul * sin(pi/bits) * 1j
        j = 0
        while j < n:
            omega = 1
            for k in range(bits):
                even_part = X[j + k]
                odd_part = X[j + k + bits] * omega
                X[j + k] = even_part + odd_part
                X[j + k + bits] = even_part - odd_part
                omega *= omega_base
            j += bits << 1
        bits <<= 1
    if is_inverse:
        for index in range(length):
            X[index] = X[index].real / n
            # only the real part is needed
    return X[:length]

Note that we could ignore the return part, since X is already changed. This algorithm would extend the input length to its closest larger bit number (of form 2^{k}), but under most condition, we would take the length as 2^{k} before we use this algorithm (adding 0‘s).

Because we use the complex number to implement the FFT algorithm, we can see that the error is hard to eliminate. Even though the initial polynomial is integer based, apply FFT to it, then apply IFFT, we would get a decimal list with some calculation error.

If we do the FFT in the field \mathbb{Z} / P \mathbb{Z}, where P = Q \cdot 2^{k} + 1, denote g as a primitive root modulo P, then \{ 1, g^{Q}, g^{2 Q}, \ldots \} is a cyclic group of order 2^{k}, we can replace \omega_{n} with g to do the FFT. This method is called NTT, since only integers are involved, the errors are not possible to appear.

For arbitrary modulo m, the aiming NTT length n, we can take a set of distinct NTT modulo \{ p_{i} \}_{i = 1}^{r} satisfies

do NTT respectively on all p_{i}, then use the Chinese remainder theorem to combine them together getting the final result modulo m.

Note that during the NTT algorithm, the maximum intermediate value would not exceed n (m - 1)^{2}.

We may say the FFT algorithm solves the convolution in the form of

in time \mathcal{O}(n \log n).

Back in Example 1, we have

(Not mandatory) Problem 1: Give the formula of Stirling numbers of the second kind:

Use the NTT with some modulo of the form P = Q \cdot 2^{k} + 1 to calculate all S(n, k) for 0 \leq k \leq n in time complexity \mathcal{O}(n \log n).

(Not mandatory) Problem 2: PE 537. Hint: If denote A as the list of the value of \pi(x), i.e., A[n] = \sum_{x} [\pi(x) = n], then the convolution of A and A, named B, is the list of the value of \pi(x) + \pi(y), i.e., B[n] = \sum_{x, y} [\pi(x) + \pi(y) = n], then the convolution of B and B, named C, is the list of the value of \pi(x) + \pi(y) + \pi(z) + \pi(w), i.e., C[n] = \sum_{x, y, z, w} [\pi(x) + \pi(y) + \pi(z) + \pi(w) = n], etc. You can consider A, B, C as the generating function. You might need to learn how to sieve prime numbers and use fast exponentiation.

There are bunch of similar algorithms, for example, fast Walsh–Hadamard transform (FWHT) and fast wavelet transform (FWT). FWHT can solve the general convolution

in time complexity \mathcal{O}(n \log n), where \star is some binary operation, usually \star is bitwise OR, bitwise AND, and bitwise XOR.

Using FFT, we can do a lot of things for polynomials fast, for instance, for a polynomial P(x), we can find a polynomial Q(x), such that P(x) Q(x) \equiv 1 \mod{x^{n}}, this Q(x) is called the inverse of P(x) under modulo x^{n}. The basic idea is to find the inverse polynomial under modulo x^{n / 2}, then x^{n / 4}, etc. Because the inverse polynomial under modulo x^{1} is trivial, we can solve this recursively. Similar idea may be apply to the Newton’s method under modulo x^{n}, specifically, we can find the square root of a polynomial under modulo x^{n}.

(Not mandatory) Problem 3: PE 258. Hint: Consider the Cayley–Hamilton theorem (x^{2000} - x - 1 = 0) and use the polynomial inverse to do polynomial quotient on x^{10^{18}}. Or consider solving the homogeneous linear recurrence with constant coefficients by the Berlekamp–Massey algorithm, which involves polynomial multiplication. Note that 20092010 = 8590 \times 2339.

FFT is also used to transform the time domain to the frequency domain in the signal area, while IFFT is used to transform reverse.

In the artical [Machine Learning from a Continuous Viewpoint I] by Weinan E, the Fourier representation

can be considered as a two-layer neural network model with activation function \sigma.

In the above figure, \sigma(\boldsymbol{\omega}, \boldsymbol{x}) calculates each hidden layer value using the input layer \boldsymbol{x} with weight \boldsymbol{\omega}, \int_{\mathbb{R}^{d}} a(\boldsymbol{\omega}) \sigma(\boldsymbol{\omega}, \boldsymbol{x}) d \boldsymbol{\omega} sums all the hidden layer with weight a(\boldsymbol{\omega}). Note this is an integral formula, we work on a continuous condition, which means that the hidden layer has infinite width (the hidden layer is considered to have infinite nodes).

References