This article covers the commonly known ways of computing the 1D Discrete Fourier Transform (DFT). We explain the general $O(N^2)$ and $O(N \log N)$ methods for computing the DFT, and we compare the performance of our custom DFT functions against the Julia bindings of the standard FFTW package.
Let's start by importing the standard FFTW package.
```julia
using FFTW
```
Create a 1D signal, an array of size $2^N$:
```julia
x = rand(2^10); 𝑁 = length(x);
```
The discrete Fourier transform (DFT) converts a sequence of $N$ complex numbers $\{x_1, x_2, \dots, x_N\}$ into another sequence of $N$ complex numbers $\{X_1, X_2, \dots, X_N\}$. This transformation is expressed as follows:
\[ \begin{align} X_k = \sum_{n=1}^{N} x_n \, e^{-\frac{2\pi i}{N}(k-1)(n-1)} \end{align} \]
where $k = 1, 2, \dots, N$.
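As a quick sanity check of this definition, take $N = 2$: the sum reduces to

\[ \begin{align} X_1 &= x_1 e^{0} + x_2 e^{0} = x_1 + x_2, \\ X_2 &= x_1 e^{0} + x_2 e^{-\pi i} = x_1 - x_2. \end{align} \]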
```julia
# DFT_1 is the plain definition of DFT... O(𝑁²)
function DFT_1(x::AbstractArray; 𝑁::Int = length(x))
    X::Array{ComplexF64} = zeros(ComplexF64, 𝑁)
    for k ∈ 1:𝑁
        for n ∈ 1:𝑁
            X[k] += x[n]*ℯ^(-im*2π*(k-1)*(n-1)/𝑁)
        end
    end
    return X
end

DFT_1(x; 𝑁 = 𝑁) ≈ fft(x)
```
true
True! So our function computes accurately. Did you notice a little symmetry in the output of `DFT_1`? No? Take a look again!
```julia
DFT_1(rand(10))
```
```
10-element Vector{ComplexF64}:
   5.897454472122993 + 0.0im
 -0.3823454543891961 + 1.760004999160685im
 -0.6192775437121578 + 0.060679302598261264im
  0.5161852281512058 + 0.7918002641968952im
 0.44467328164178876 - 0.9071126434806949im
 -0.7818396467065358 - 6.53986062552143e-16im
  0.4446732816417883 + 0.9071126434806948im
  0.5161852281512024 - 0.791800264196892im
 -0.6192775437121574 - 0.0606793025982626im
 -0.3823454543891939 - 1.760004999160682im
```
The DFT values at indices `𝑁:-1:(𝑁÷2 + 2)` are the complex conjugates of the entries prior to index `𝑁÷2 + 1`.
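This conjugate symmetry can be verified directly; a minimal sketch applying the DFT definition above to a length-8 real signal:

```julia
# verify the conjugate symmetry of the DFT of a real signal (𝑁 = 8 here)
x = rand(8); 𝑁 = 8
X = [sum(x[n]*ℯ^(-im*2π*(k-1)*(n-1)/𝑁) for n ∈ 1:𝑁) for k ∈ 1:𝑁]
X[end:-1:(𝑁÷2 + 2)] ≈ conj.(X[2:𝑁÷2])   # true: upper half mirrors the lower half
```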
This warrants some changes to the outer loop's indices, along with a conditional evaluation. The `DFT_2` function employs these improvements as follows:
```julia
# DFT_2 with a little improvement... O(𝑁²)
function DFT_2(x::AbstractArray; 𝑁::Int = length(x))
    X::Array{ComplexF64} = zeros(ComplexF64, 𝑁)
    for k ∈ 1:(𝑁÷2 + 1)
        for n ∈ 1:𝑁
            X[k] += x[n]*ℯ^(-im*2π*(k-1)*(n-1)/𝑁)
        end
        if k ≠ 1 # symmetric
            X[end - (k-2)] = conj(X[k])
        end
    end
    return X
end

DFT_2(x; 𝑁 = 𝑁) ≈ fft(x)
```
true
True! The improvements are accurately accommodated.
`DFT_1` and `DFT_2`, though accurate, have computing times that scale as $O(N^2)$. Is there a better way?
The Cooley-Tukey algorithm recursively breaks down a DFT of size $N$ into smaller DFTs. Its most commonly known recipe recursively divides the sequence into two halves of size $N/2$ at each step. This is called the radix-2 decimation-in-time (DIT) case. Naturally, this approach restricts the size $N$ to powers of $2$.
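Written out with the same 1-based indexing as the definition above, one radix-2 split step is, for $k = 1, \dots, N/2$:

\[ \begin{align} X_k &= E_k + e^{-\frac{2\pi i}{N}(k-1)}\, O_k \\ X_{k + N/2} &= E_k - e^{-\frac{2\pi i}{N}(k-1)}\, O_k \end{align} \]

where $E_k$ and $O_k$ are the $N/2$-point DFTs of `x[1:2:end]` and `x[2:2:end]` respectively, and the twiddle factor $e^{-\frac{2\pi i}{N}(k-1)}$ is shared between the two halves.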
Radix-2 DIT computes the DFTs of the even-indexed inputs and of the odd-indexed inputs, then combines the results to produce the DFT of the whole sequence. This step is performed recursively, scaling the overall runtime down to $O(N \log N)$. Detailed steps and pseudocode for radix-2 DIT can be found here. `DIT_FFT_radix2` is our own implementation of the radix-2 DIT case, and it too computes DFTs accurately!
```julia
# Cooley-Tukey FFT radix-2, based on divide and conquer
function DIT_FFT_radix2(x::AbstractArray; 𝑁::Int64 = length(x))
    X₁ = Vector{ComplexF64}()
    X₂ = Vector{ComplexF64}()
    if 𝑁 == 1
        return Array{ComplexF64}(x)
    else
        Xeven = DIT_FFT_radix2(x[1:2:end]) # recursion
        Xodd  = DIT_FFT_radix2(x[2:2:end])
        for k ∈ 1:(𝑁÷2)
            push!(X₁, Xeven[k] + ℯ^(-2π*im*(k-1)/𝑁)*Xodd[k])
            push!(X₂, Xeven[k] - ℯ^(-2π*im*(k-1)/𝑁)*Xodd[k])
        end
        return [X₁; X₂]
    end
end

DIT_FFT_radix2(x; 𝑁 = 𝑁) ≈ fft(x)
```
true
Now, let's get greedy! Can we do even better (without parallelisation)? There are multiple methods to optimise this even further, but the scaling remains $O(N \log N)$. We can still speed up these $O(N \log N)$ operations by making the recursion more efficient. One way is memoization, where we cache the results of expensive function calls so that they can be reused when the same inputs occur again later.
A really good example for understanding memoization is a memoised factorial function. We make the function remember some of its old calls and reuse this past knowledge to speed up future calls.
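As an illustration, here is a minimal hand-rolled memoised factorial sketch (the cache name `fact_cache` and the function `fact_mem` are our own, not from any package):

```julia
# a hand-rolled memoised factorial: results are cached in a Dict
const fact_cache = Dict{Int,BigInt}()

function fact_mem(n::Int)
    n <= 1 && return BigInt(1)
    get!(fact_cache, n) do      # compute the value only on a cache miss
        n * fact_mem(n - 1)
    end
end

fact_mem(20)   # first call computes and caches 2! through 20!
fact_mem(20)   # second call is a single Dict lookup
```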
Luckily (lazily), the Julia package `Memoize` provides a macro `@memoize`, which we use to memoize the function `DIT_FFT_radix2_mem`. And this too accurately returns the required DFT.
```julia
# Memoization of Cooley-Tukey FFT radix-2, based on divide and conquer
using Memoize

@memoize function DIT_FFT_radix2_mem(x::AbstractArray; 𝑁::Int64 = length(x))
    X₁ = Vector{ComplexF64}()
    X₂ = Vector{ComplexF64}()
    if 𝑁 == 1
        return Array{ComplexF64}(x)
    else
        Xeven = DIT_FFT_radix2(x[1:2:end]) # recursion
        Xodd  = DIT_FFT_radix2(x[2:2:end])
        for k ∈ 1:(𝑁÷2)
            push!(X₁, Xeven[k] + ℯ^(-2π*im*(k-1)/𝑁)*Xodd[k])
            push!(X₂, Xeven[k] - ℯ^(-2π*im*(k-1)/𝑁)*Xodd[k])
        end
        return [X₁; X₂]
    end
end

DIT_FFT_radix2_mem(x; 𝑁 = 𝑁) ≈ fft(x)
```
true
Time for some measurements now!
```julia
using BenchmarkTools

@btime DFT_1(x);
```
31.128 ms (1 allocation: 16.12 KiB)
`DFT_1`: not so good! 😢
```julia
@btime DFT_2(x);
```
15.402 ms (1 allocation: 16.12 KiB)
`DFT_2`: a little better than `DFT_1`! 😃
```julia
@btime DIT_FFT_radix2(x);
```
884.891 μs (10395 allocations: 1.25 MiB)
`DIT_FFT_radix2`: wow! 😍
```julia
@btime DIT_FFT_radix2_mem(x);
```
70.551 ns (1 allocation: 32 bytes)
`DIT_FFT_radix2_mem`: I love memoization 😍
`fft` and `plan_fft` are standard functions in the `FFTW` Julia bindings. `plan_fft` is a little better than `fft`, but both of them lag behind the memoized function! To know why, I will update the page after I understand the inner workings of the `FFTW` package.
```julia
@btime fft(x); # less efficient built-in
```
23.545 ฮผs (34 allocations: 34.19 KiB)
```julia
@btime plan_fft(x); # more efficient built-in
```
16.641 ฮผs (33 allocations: 18.06 KiB)
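One caveat worth noting: `plan_fft` only precomputes a transform plan for a given array shape; the transform itself runs when the plan is applied to the data with `*`. A minimal sketch:

```julia
using FFTW

x = rand(2^10)
p = plan_fft(x)   # precompute a plan for arrays with x's size and element type
X = p * x         # applying the plan performs the actual transform
X ≈ fft(x)        # true: the planned transform matches fft
```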
Hope you had fun reading about the DFT!
This page is open for updates with more content related to computing the DFT, and you are welcome to open PRs/issues!
The functions are meant for computing the 1D DFT at the moment. Generalizations to higher dimensions will follow soon.
The functions are not yet written to execute on multiple threads and do not support any sort of parallelism. That too shall be a topic for another day.
The functions `DIT_FFT_radix2` and `DIT_FFT_radix2_mem` compute DFTs only for arrays of size $2^N$.
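A hypothetical guard (our own sketch, not part of the functions above) can make that restriction explicit using Base's `ispow2`:

```julia
# sketch: validate the input length before running a radix-2 FFT
function assert_radix2_length(x::AbstractArray)
    ispow2(length(x)) ||
        throw(ArgumentError("radix-2 FFT requires length(x) to be a power of 2"))
    return length(x)
end

assert_radix2_length(rand(2^4))   # returns 16
# assert_radix2_length(rand(10)) would throw an ArgumentError
```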