Noise Morphing for Audio Time Stretching

Eloi Moliner, Leonardo Fierro, Alec Wright, Matti Hämäläinen, and Vesa Välimäki

Companion page for a journal article submitted to IEEE Signal Processing Letters

The article can be accessed here.

Abstract

This letter introduces an innovative method to enhance the quality of audio time stretching by precisely decomposing a sound into sines, transients, and noise and by improving the processing of the noise component. While there are established methods for time-stretching sines and transients with high quality, the manipulation of noise or residual components has lacked robust solutions in prior research. The proposed method combines sound decomposition with previous techniques for audio spectral resynthesis. The time-stretched noise component is obtained by morphing its time-interpolated spectral magnitude with a white-noise excitation signal. This method stands out for its simplicity, efficiency, and audio quality. The results of a subjective experiment affirm the superiority of this approach over current state-of-the-art methods across all evaluated stretch factors. The proposed technique notably excels in extreme stretching scenarios, signifying a substantial elevation in performance. The proposed method holds promise for a wide range of applications in slow-motion media content, such as music or sports video production.

Figure 1: Conceptualization of noise morphing, for $\alpha = 3$. The original noise log-magnitude spectra (yellow) are time-interpolated (red) and used to modulate the white-noise spectra (green) to produce the time-stretched output.
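
The noise-morphing idea in Figure 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`stft`, `istft`, `stretch_noise`), the Hann window, the frame size of 1024, the hop of 256, and the linear interpolation of log-magnitudes between neighboring frames are all choices made for this example. It handles only the noise component and presumes that sines and transients have already been separated.

```python
import numpy as np


def stft(x, n_fft, hop):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * window, axis=1)


def istft(X, n_fft, hop):
    """Weighted overlap-add resynthesis matching stft() above."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    out = np.zeros((X.shape[0] - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    for i in range(X.shape[0]):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)


def stretch_noise(noise, alpha, n_fft=1024, hop=256, seed=0):
    """Stretch a noise signal by `alpha` (hypothetical helper, see lead-in)."""
    rng = np.random.default_rng(seed)
    half = n_fft // 2
    n_out_samples = int(round(alpha * len(noise)))

    # Log-magnitude spectra of the original noise (yellow in Figure 1).
    # Reflect padding keeps the signal edges away from the window tails.
    padded = np.pad(noise, half, mode="reflect")
    log_mag = np.log(np.abs(stft(padded, n_fft, hop)) + 1e-8)
    n_in = log_mag.shape[0]

    # Time-interpolate the log-magnitudes onto the denser output frame grid
    # (red in Figure 1). Linear interpolation is an assumption of this sketch.
    n_out = 1 + int(np.ceil(n_out_samples / hop))
    pos = np.linspace(0.0, n_in - 1, n_out)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_in - 1)
    frac = (pos - lo)[:, None]
    log_mag_interp = (1.0 - frac) * log_mag[lo] + frac * log_mag[hi]

    # White-noise excitation (green in Figure 1), scaled so that its STFT
    # bins have unit mean-square magnitude.
    white = rng.standard_normal((n_out - 1) * hop + n_fft)
    W = stft(white, n_fft, hop) / np.sqrt(np.sum(np.hanning(n_fft) ** 2))

    # Morph: impose the interpolated magnitude envelope on the excitation.
    y = istft(W * np.exp(log_mag_interp), n_fft, hop)
    return y[half:half + n_out_samples]
```

Calling `stretch_noise(x, alpha=3.0)` on a noise signal `x` returns a signal three times as long. Because the excitation is freshly generated noise, the output carries a new random phase rather than a phase-vocoder-style propagation of the original phase.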
Figure 2: Original signals of a can opening (a) and a guitar (d) at normal speed. Subplots (b) and (e) show the effect of stretching by a factor of 3 without transient separation, which leads to undesirable transient smearing. In contrast, subplots (c) and (f) show that the proposed method preserves transients during time stretching and handles the noise component appropriately when transients are separated.

Audio examples: $\alpha = 2$

Listening test examples

Figure 3: Listening test results, showing MOS with 95\% confidence intervals for $\alpha = 2$.

Car: Live recording of a rally car passing by

Soda: Hiss and click sounds from a can opening

Cut: A knife cutting food on a cutting board

EDM: Electronic music example

PP (Ping Pong): Sounds from an amateur ping pong game

Additional examples

Drums

Guitar

Hand Saw

Seagulls

Audio examples: $\alpha = 4$

Listening test examples

Figure 4: Listening test results, showing MOS with 95\% confidence intervals for $\alpha = 4$.

Car: Live recording of a rally car passing by

Soda: Hiss and click sounds from a can opening

Cut: A knife cutting food on a cutting board

EDM: Electronic music example

PP (Ping Pong): Sounds from an amateur ping pong game

Additional examples

Drums

Guitar

Hand Saw

Seagulls

Audio examples: $\alpha = 8$

Listening test examples

Figure 5: Listening test results, showing MOS with 95\% confidence intervals for $\alpha = 8$.

Car: Live recording of a rally car passing by

Soda: Hiss and click sounds from a can opening

Cut: A knife cutting food on a cutting board

EDM: Electronic music example

PP (Ping Pong): Sounds from an amateur ping pong game

Additional examples

Drums

Guitar

Hand Saw

Seagulls

Supplemental listening test results

Non-parametric test results

Figure 6: (left) p-values obtained from a Wilcoxon test. (right) Thresholded p-values, where statistical significance ($p \leq 0.05$) is highlighted by cell coloring.