Phase-Based Effects Using Giant FFT

Vesa Välimäki, Roope Salmi, Stefan Bilbao, Sebastian J. Schlecht, David Zicarelli, Joshua Kit Clayton

Companion page for the paper submitted to the Journal of the AES.

Abstract

The Fast Fourier Transform (FFT) enables the processing of extremely long signals very quickly. This has opened up new possibilities in audio processing, generally called giant-FFT methods. This paper presents audio modification techniques based on the scaling of the phase in the frequency domain. In zero-phase conversion, the signal's phase is reset to zero, and the inverse FFT is performed. The result is a modified sound that retains the original frequency content but has a different temporal structure. Zero-phase sound is palindromic and sounds the same when played forward or backward. Zero-phase conversion makes speech unintelligible, thereby serving as a method for synthesizing babble noise. A variation of the zero-phase method can produce a corresponding stereo signal. Scaling the phase by a positive constant also leads to exciting modifications. Phase scaling by an integer produces a time-stretching effect on wideband sounds having their energy concentrated tightly in time, whereas a factor slightly smaller than 1 leads to whisperization. A real-time version of the phase-scaling methods, which processes the input signal in frames and uses overlap-add resynthesis, produces texture-like sounds with the original overall timbre. Phase-scaling effects can be applied in sound design for music, games, and film production.
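The zero-phase conversion described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the whole signal is transformed with one real FFT of size N = P·L, where L is the signal length and P is the zero-padding factor used in the example labels below (P = 1 means no zero padding). The function name `zero_phase` and the final `fftshift`, which moves the symmetric peak from sample 0 to the middle of the output, are illustrative assumptions.

```python
import numpy as np

def zero_phase(x, P=2):
    """Zero-phase conversion of a mono signal x with zero-padding factor P (sketch)."""
    N = P * len(x)                        # FFT size, including zero padding
    X = np.fft.rfft(x, n=N)               # rfft zero-pads x to length N
    y = np.fft.irfft(np.abs(X), n=N)      # keep |X|, set every phase to zero
    # A real, zero-phase spectrum gives a real, even signal: y[n] == y[(N - n) % N].
    # Circularly shift it so that the peak sits at the centre of the output.
    return np.fft.fftshift(y)

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)             # stand-in for a recorded sound
y = zero_phase(x, P=2)
print(len(y))                             # 2000
print(np.allclose(y[1:], y[1:][::-1]))    # True: palindromic about the centre
```

The palindromic check follows directly from the symmetry noted in the comments: a spectrum that is real and even in frequency produces a signal that is even in time, which is why the result sounds the same forward and backward.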

Zero-Phase Examples

String quartet

Original excerpt from contact;vault by Martin Arnold, performed by Quatuor Bozzini.

String quartet, P=2 zero padding

Speech, one speaker

Speech, one speaker, no zero padding (P=1)

Speech, one speaker, P=2 zero padding

Speech, two speakers (babble noise)

Speech, two speakers, no zero padding (P=1)

Singing

Singing, P=2 zero padding

Pseudo Time-Stretching Examples

Gong

Original xiaoluo_02 by ajaysm, CC BY 4.0.

Phase scaling produces successful time stretching in this example. For comparison, we include time-stretched versions of the same file made with SiTraNo*. Some of these signals are shown in Fig. 13 of the paper.
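Phase scaling keeps each FFT magnitude and multiplies its phase by a constant c. A quick sanity check of why an integer c stretches time: an impulse at sample d has the linear phase −2πkd/N, so scaling that phase by c moves the impulse to sample c·d. The sketch below takes the principal-value phase from `np.angle` without unwrapping; for integer c this is harmless, because e^{jc(φ + 2πm)} = e^{jcφ} for any integer m. The helper name `phase_scale` and the impulse test signal are illustrative assumptions, not the paper's implementation. Note that c = 0 reduces to zero-phase conversion.

```python
import numpy as np

def phase_scale(x, c, P=1):
    """Scale the FFT phase of x by c while keeping the magnitude (sketch)."""
    N = P * len(x)
    X = np.fft.rfft(x, n=N)
    Y = np.abs(X) * np.exp(1j * c * np.angle(X))   # principal-value phase, no unwrapping
    return np.fft.irfft(Y, n=N)

N, d = 64, 5
x = np.zeros(N)
x[d] = 1.0                      # impulse at sample d
for c in (2, 3):
    y = phase_scale(x, c)
    print(c, np.argmax(y))      # prints 2 10, then 3 15: the impulse arrives c times later
```

Because the FFT is circular, c·d must stay below N or the moved energy wraps around to the start of the buffer, so a stretched output needs a correspondingly large FFT.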

Bass

Phase scaling fails to produce correct time stretching for the following synthetic tone. Its partials stay at the same frequencies for too long, so the signal is not mono-component in time.
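One way to see why the result depends on the signal being mono-component in time (an interpretation offered here, not a derivation quoted from the paper): write the spectrum as X(ω) = |X(ω)| e^{jφ(ω)} and the phase-scaled spectrum as Y(ω) = |X(ω)| e^{jcφ(ω)}. Treating φ as the unwrapped, differentiable phase, the group delay is simply multiplied by c:

```latex
\tau_X(\omega) = -\frac{d\varphi(\omega)}{d\omega},
\qquad
\tau_Y(\omega) = -\frac{d}{d\omega}\bigl[c\,\varphi(\omega)\bigr] = c\,\tau_X(\omega).
```

If the energy at each frequency arrives at a single time τ_X(ω), that energy is moved to c·τ_X(ω) with its frequency unchanged, which amounts to a time stretch by c. A partial that is sustained at one frequency has its energy spread over a long interval, which a single group-delay value cannot describe, so the mapping breaks down.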

Cello

Original Cello - C2 - other by MTG, CC BY 3.0.

The time stretching results are good in this example, but envelope correction is needed. For comparison, we include a time-stretched version of the same file using SiTraNo*.

Phase Scaling with a Small Factor

See also the animation at the top of the page. Original excerpt from Tom's Diner by Suzanne Vega. Some of these signals are also shown in Fig. 15 of the paper.

Here, the two stereo channels of the whole song were processed separately with c = 0.9 and FFT size N = 11,553,024 (P = 2). Some silence was inserted at the beginning of the song to leave room for temporal smearing. Only the first half of the result is played here.
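A sketch of this offline setup, under stated assumptions: the input is a NumPy array of shape (samples, channels); silence is prepended; each channel is transformed separately with one FFT of size N = P·L over the padded signal; and the phase is scaled by c. For non-integer c the result depends on how the phase is wrapped. This sketch uses the principal value from `np.angle`, and whether the paper unwraps the phase is not stated on this page. The function name, the 44.1 kHz rate, and the `pre_silence` parameter are hypothetical. For scale: with P = 2, N = 11,553,024 corresponds to L = 5,776,512 samples, which at an assumed 44.1 kHz is about 131 s of audio.

```python
import numpy as np

def phase_scale_stereo(x, c=0.9, P=2, fs=44100, pre_silence=1.0):
    """Giant-FFT phase scaling of each channel of x (shape: samples x channels); a sketch."""
    pad = np.zeros((int(round(pre_silence * fs)), x.shape[1]))
    x = np.concatenate([pad, x])          # leading silence leaves room for temporal smearing
    N = P * x.shape[0]                    # one FFT over the whole padded signal
    X = np.fft.rfft(x, n=N, axis=0)       # each channel (column) is transformed separately
    Y = np.abs(X) * np.exp(1j * c * np.angle(X))   # principal-value phase scaled by c
    return np.fft.irfft(Y, n=N, axis=0)

# Short stand-in signal; a whole song would use the same call with a much larger N.
rng = np.random.default_rng(1)
x = rng.standard_normal((4410, 2))        # 0.1 s of stereo noise at 44.1 kHz
y = phase_scale_stereo(x, c=0.9, pre_silence=0.1)
print(y.shape)                            # (17640, 2)
```

With c = 1 the function returns the padded input unchanged, which is a convenient check that the FFT sizes and padding line up before trying c = 0.9.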

Windowed Phase Effects

These examples were produced with the Giant FFT Playground patch for Max. The input audio is played back and processed in real time (not stretched).

Wild Is the Wind by Nina Simone

In these examples, the FFT size is N = 524,288 with 16x overlap.

Zero phase (without full gain compensation)

Phase scaling (c=2)
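The frame-based variant can be sketched offline as follows. The Max patch itself is not reproduced here. The periodic Hann analysis and synthesis windows, the gain normalization, and the function names are assumptions made for this illustration. With N = 524,288 and 16x overlap, the hop is 32,768 samples. At an assumed 44.1 kHz, that is about 0.74 s per hop, and each frame spans about 11.9 s.

```python
import numpy as np

def phase_scale_frame(frame, c):
    """Scale the phase of one frame's spectrum by c (principal-value phase)."""
    X = np.fft.rfft(frame)
    return np.fft.irfft(np.abs(X) * np.exp(1j * c * np.angle(X)), n=len(frame))

def windowed_phase_scale(x, c, N=524_288, overlap=16):
    """Frame-wise phase scaling with overlap-add (sketch; N divisible by overlap, overlap >= 3)."""
    hop = N // overlap
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)    # periodic Hann window
    # Pad so that every input sample is covered by exactly `overlap` frames.
    padded = np.concatenate([np.zeros(N - hop), x, np.zeros(N)])
    y = np.zeros(len(padded))
    for start in range(0, len(padded) - N + 1, hop):
        frame = padded[start:start + N] * w                    # analysis window
        y[start:start + N] += phase_scale_frame(frame, c) * w  # synthesis window, then add
    y /= 3 * overlap / 8     # overlapping squared Hann windows sum to 3*overlap/8
    return y[N - hop:N - hop + len(x)]

# Small sizes for a quick check; with c = 1 the frames pass through unchanged.
rng = np.random.default_rng(2)
x = rng.standard_normal(500)
y = windowed_phase_scale(x, c=1.0, N=64, overlap=16)
print(np.allclose(y, x))     # True
```

The normalization only guarantees unit gain when the frames are left unchanged (c = 1). With c = 0 or c = 2, the overlapping frames no longer add coherently, so the output level changes. This is consistent with the zero-phase example above being labeled "without full gain compensation".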


Last updated: 20 Feb 2026

Contact: rpsalmi@gmail.com, vesa.valimaki@aalto.fi