Skip to content

rlcamp/fft_anywhere

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

fft_anywhere

This is a good-enough FFT implementation which benchmarks within a respectable distance of the fastest known implementations on platforms of interest, and usually beats them on non-SIMD platforms, when compiled with a modern C compiler. This implementation should work where other FFT implementations do not, hence the name. If you are on a SIMD processor and need a permissive-licensed FFT implementation, jpommier/pffft is considerably faster, and you should prefer it to this or fftw.

How it works

The core of the implementation is a set of primitives for the DFTs of sizes 3, 5, and 7, and unrolled FFTs of sizes 4 and 8. A second set of functions implements the Cooley-Tukey decomposition for T / S x S, for S = 2, 3, 4, 5, and 7. These are used recursively to decompose a length-T Fourier transform into S transforms of length T/S, followed by T/S transforms of length S, where the latter are implemented using the relevant primitive DFT or FFT function.

In this way, all FFTs of length T=2^a 3^b 5^c 7^d, for nonnegative integers a-d, may be computed. The real-to-complex and complex-to-real transform variants require that T be a multiple of 4, otherwise there are no further restrictions.

The recursive function call structure, and the twiddle factors for each stage of the recursion, are precomputed for a given FFT length and may be reused indefinitely and across threads, with no per-use requirement for a memory allocator. The transform is out-of-place, with the destination buffer used as scratch space. The inverse complex-to-real transform distorts its input, but the other three transforms do not.

Caveats about compilation

This code (and all high-performance C code using complex arithmetic) must be compiled with at least one of -ffinite-math-only, -fcx-limited-range, or -fcx-fortran-rules, in order to avoid a significant slowdown due to the default semantics of complex multiplication on infs and nans, which should not occur anyway when inputs have been properly sanitized. On some platforms and compilers, it may (counterintuitively) be beneficial to force -ffp-contract=off.

About

Decompose any signal into a weighted sum of squiggly lines

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors