Computing Discrete Fourier Transform

Nikhil Yewale
29th April 2021

Introduction

This is a general topic dealing with the commonly known ways of computing the 1D Discrete Fourier Transform (DFT). We explain the general $O(N^2)$ and $O(N \log N)$ methods for computing the DFT. We also compare the performance of our custom DFT functions against the Julia bindings of the standard FFTW package.

Let's start by importing the standard FFTW package.

using FFTW

Create a 1D signal: an array whose length is a power of two (here $2^{10}$).

x = rand(2^10);      N = length(x);

The discrete Fourier transform (DFT) converts a sequence of $N$ complex numbers $\{x_1, x_2, \ldots, x_N\}$ into another sequence of $N$ complex numbers $\{X_1, X_2, \ldots, X_N\}$. The transformation is expressed as follows:

\[ X_k = \sum_{n=1}^{N} x_n \, e^{-\frac{2\pi i}{N}(k-1)(n-1)} \]

where $k = 1,2,...,N$
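As a quick worked instance of the formula, the coefficients can be computed directly with a comprehension (a sketch using Julia's built-in `cis`, where `cis(θ) = e^{iθ}`; the variable names here are our own):

```julia
# X[k] = Σₙ x[n]·e^(-2πi(k-1)(n-1)/N), written as a comprehension
x = [1.0, 2.0, 3.0, 4.0]
N = length(x)
X = [sum(x[n] * cis(-2π*(k-1)*(n-1)/N) for n in 1:N) for k in 1:N]
# X[1] is the DC term: the plain sum of the samples
```

For this input, X comes out to approximately [10, -2+2im, -2, -2-2im].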

# DFT_1 is plain definition of DFT... O(N²)
function DFT_1(x::AbstractArray; N::Int = length(x))
    X::Array{ComplexF64} = zeros(ComplexF64, N)
    for k ∈ 1:N
        for n ∈ 1:N
            X[k] += x[n]*ℯ^(-im*2π*(k-1)*(n-1)/N)
        end
    end
    return X
end
DFT_1(x; N = N) ≈ fft(x)
true

True! So our function computes the DFT accurately. Did you notice a little symmetry in the output of the DFT_1 function? No? Take another look!

DFT_1(rand(10))
10-element Vector{ComplexF64}:
   5.897454472122993 + 0.0im
 -0.3823454543891961 + 1.760004999160685im
 -0.6192775437121578 + 0.060679302598261264im
  0.5161852281512058 + 0.7918002641968952im
 0.44467328164178876 - 0.9071126434806949im
 -0.7818396467065358 - 6.53986062552143e-16im
  0.4446732816417883 + 0.9071126434806948im
  0.5161852281512024 - 0.791800264196892im
 -0.6192775437121574 - 0.0606793025982626im
 -0.3823454543891939 - 1.760004999160682im
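Indeed, the mirrored entries are complex-conjugate pairs whenever the input is real. A small sketch checking this numerically from the definition (helper names are our own):

```julia
# For real x, X[k] and X[N-k+2] are complex conjugates (k = 2, …, N)
x = rand(8)
N = length(x)
X = [sum(x[n] * cis(-2π*(k-1)*(n-1)/N) for n in 1:N) for k in 1:N]
for k in 2:N÷2
    @assert X[N-k+2] ≈ conj(X[k])        # mirrored entries match
end
@assert abs(imag(X[N÷2+1])) < 1e-10      # the Nyquist term is real
```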

The DFT values at indices N:-1:(N÷2 + 2) are the complex conjugates of the entries at indices 2:(N÷2). This warrants a change to the outer-loop range along with a conditional assignment. The DFT_2 function employs these improvements as follows:

# DFT_2 with little improvement over DFT_1... O(N²)
function DFT_2(x::AbstractArray; N::Int = length(x))
    X::Array{ComplexF64} = zeros(ComplexF64, N)
    for k ∈ 1:(N÷2 + 1)
        for n ∈ 1:N
            X[k] += x[n]*ℯ^(-im*2π*(k-1)*(n-1)/N)
        end
        if k ≠ 1   # symmetric
            X[end - (k-2)] = conj(X[k])
        end
    end
    return X
end
DFT_2(x; N = N) ≈ fft(x)
true

True! The improvements are accurately accommodated.

DFT_1 and DFT_2, though accurate, both have computing times that scale as $O(N^2)$. Is there a better way?

Cooley-Tukey algorithm - The $O(N \log N)$ way

This algorithm recursively breaks down a DFT of size $N$ into smaller DFTs. The most commonly known recipe of the Cooley-Tukey algorithm recursively halves the sequence into DFTs of size $N/2$ at each step. This is called the radix-2 decimation-in-time (DIT) case. Naturally, this approach restricts $N$ to be a power of $2$.

Radix-2 DIT computes the DFTs of the even-indexed inputs and of the odd-indexed inputs, then combines the results to produce the DFT of the whole sequence. This step is performed recursively, scaling the overall runtime down to $O(N \log N)$. Detailed steps and pseudocode for radix-2 DIT can be found here. DIT_FFT_radix2 is our own implementation of the radix-2 DIT case, and it too computes DFTs accurately!

# Cooley-Tukey FFT radix 2, based on divide and conquer
function DIT_FFT_radix2(x::AbstractArray; N::Int64 = length(x))
    X₁ = Vector{ComplexF64}()
    X₂ = Vector{ComplexF64}()
    if N == 1
        return Array{ComplexF64}(x)
    else
        Xeven = DIT_FFT_radix2(x[1:2:end])  # recursion
        Xodd = DIT_FFT_radix2(x[2:2:end])
        for k ∈ 1:(N÷2)
            push!(X₁, Xeven[k] + ℯ^(-2π*im*(k-1)/N)*Xodd[k])
            push!(X₂, Xeven[k] - ℯ^(-2π*im*(k-1)/N)*Xodd[k])
        end
        return [X₁; X₂]
    end
end
DIT_FFT_radix2(x; N = N) ≈ fft(x)
true
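The combine step of the recursion can also be checked in isolation: take DFTs of the even- and odd-indexed halves, then stitch them together with the twiddle factors $e^{-2\pi i (k-1)/N}$. A sketch using a naive helper `dft` (not part of the code above) as the reference:

```julia
# naive O(N²) DFT, used here only as a reference
dft(v) = [sum(v[n] * cis(-2π*(k-1)*(n-1)/length(v)) for n in 1:length(v))
          for k in 1:length(v)]

x = rand(8)
N = length(x)
E = dft(x[1:2:end])   # DFT of even-indexed samples (Julia is 1-based)
O = dft(x[2:2:end])   # DFT of odd-indexed samples
# one stage of radix-2 butterflies: sums in the first half, differences in the second
X = [[E[k] + cis(-2π*(k-1)/N) * O[k] for k in 1:N÷2];
     [E[k] - cis(-2π*(k-1)/N) * O[k] for k in 1:N÷2]]
```

The stitched result X matches dft(x), which is exactly what one level of the recursion computes.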

Now, let's get greedy! Can we do even better (without parallelisation)? There are multiple ways to optimise this further, but the cost always scales as $O(N \log N)$. We can still speed up these $O(N \log N)$ operations by making the recursion more efficient. One way is memoization, where we cache the results of expensive function calls so they can be reused when the same inputs occur again later.

Memoization

A really good way to understand memoization is to study a memoized factorial function. Let us make the function remember some of its old calls and reuse this past knowledge to speed up future calls.
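For illustration, here is a minimal hand-rolled memoized factorial (a sketch using a `Dict` cache instead of the Memoize package; the names are our own):

```julia
# cache maps n to n!, so each factorial is computed at most once
const fact_cache = Dict{Int,BigInt}()

function fact_mem(n::Int)
    get!(fact_cache, n) do          # look up n; compute and store on a miss
        n ≤ 1 ? big(1) : n * fact_mem(n - 1)
    end
end

fact_mem(20)   # fills the cache for 1:20 along the way
fact_mem(21)   # cache hit at 20 leaves only one new multiplication
```

The second call reuses the cached 20!, which is exactly the trade we want for the recursive FFT: pay once, look up thereafter.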

Luckily (lazily), the Julia package Memoize provides a macro @memoize to memoize the function DIT_FFT_radix2_mem. And this too accurately returns the required DFT.

# Memoization of Cooley-Tukey FFT radix 2, based on divide and conquer
using Memoize
@memoize function DIT_FFT_radix2_mem(x::AbstractArray; N::Int64 = length(x))
    X₁ = Vector{ComplexF64}()
    X₂ = Vector{ComplexF64}()
    if N == 1
        return Array{ComplexF64}(x)
    else
        Xeven = DIT_FFT_radix2(x[1:2:end])  # recursion via the unmemoized function
        Xodd = DIT_FFT_radix2(x[2:2:end])
        for k ∈ 1:(N÷2)
            push!(X₁, Xeven[k] + ℯ^(-2π*im*(k-1)/N)*Xodd[k])
            push!(X₂, Xeven[k] - ℯ^(-2π*im*(k-1)/N)*Xodd[k])
        end
        return [X₁; X₂]
    end
end
DIT_FFT_radix2_mem(x; N = N) ≈ fft(x)
true

Time for some measurements now!

using BenchmarkTools
@btime DFT_1(x);
31.128 ms (1 allocation: 16.12 KiB)

DFT_1 Not so good! 😢

@btime DFT_2(x);
15.402 ms (1 allocation: 16.12 KiB)

DFT_2 Little better than DFT_1! 😅

@btime DIT_FFT_radix2(x);
884.891 μs (10395 allocations: 1.25 MiB)

DIT_FFT_radix2, Wow! 😍

@btime DIT_FFT_radix2_mem(x);
70.551 ns (1 allocation: 32 bytes)

DIT_FFT_radix2_mem, I love memoization 💓

fft and plan_fft are standard functions in the FFTW Julia bindings. Both appear to lag behind the memoized function, but the comparison is misleading: @btime calls the function repeatedly with the same x, so after the first call DIT_FFT_radix2_mem only returns its cached result instead of recomputing the DFT. Note also that plan_fft does not compute a transform at all; it only precomputes a plan for inputs of this size and type.

@btime fft(x);      # computes the DFT directly
23.545 μs (34 allocations: 34.19 KiB)
@btime plan_fft(x); # only builds a plan, without computing a transform
16.641 μs (33 allocations: 18.06 KiB)
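To actually use a plan, multiply it with the array; the planning cost is paid once and amortised over repeated transforms of same-sized inputs (a sketch, assuming FFTW is loaded as above):

```julia
using FFTW

x = rand(2^10)
p = plan_fft(x)          # precompute a plan for this size and element type
@assert p * x ≈ fft(x)   # applying the plan yields the same DFT
```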

Hope you had fun reading about DFT!

Note

  • This page is open to updates with more content related to computing the DFT, and you are welcome to open PRs/issues!

  • The functions are meant for computing the 1D DFT at the moment. Generalizations to higher dimensions will follow soon.

  • The functions are not yet written to run on multiple threads and do not support any sort of parallelism. That too shall be a topic for another day.

  • The functions DIT_FFT_radix2 and DIT_FFT_radix2_mem compute DFTs only for arrays whose length is a power of 2.
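A simple guard one might add before calling them (`ispow2` is in Julia's Base; the helper name is our own):

```julia
# fail fast on lengths the radix-2 routines cannot handle
function check_radix2_length(x)
    ispow2(length(x)) ||
        throw(ArgumentError("radix-2 DIT requires length(x) to be a power of 2"))
    return true
end

check_radix2_length(rand(8))     # fine
# check_radix2_length(rand(12)) would throw an ArgumentError
```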