Noise Morphing for Audio Time Stretching

Abstract

This letter introduces a method that improves the quality of audio time stretching by decomposing a sound into sines, transients, and noise and by improving the processing of the noise component. While established methods time-stretch sines and transients with high quality, prior research has lacked robust solutions for manipulating the noise, or residual, component. The proposed method combines sound decomposition with previous techniques for audio spectral resynthesis. The time-stretched noise component is obtained by morphing its time-interpolated spectral magnitude with a white-noise excitation signal. The method stands out for its simplicity, efficiency, and audio quality. In a subjective listening experiment, it was rated above current state-of-the-art methods at all evaluated stretch factors, and it performs particularly well in extreme stretching scenarios. The method holds promise for a wide range of applications in slow-motion media content, such as music or sports video production.

Figure 1: Conceptualization of noise morphing, for $\alpha = 3$. The original noise log-magnitude spectra (yellow) are time-interpolated (red) and used to modulate the white-noise spectra (green) to produce the time-stretched output.
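The idea in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Hann window, the frame and hop sizes, the linear interpolation of log-magnitudes, and the normalization of the white-noise spectrum are all assumptions chosen for illustration, and the input is assumed to be the noise component already separated from sines and transients.

```python
import numpy as np

def frame_signal(x, n_fft, hop):
    """Slice x into overlapping, Hann-windowed frames (one frame per row)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(n_fft)

def overlap_add(frames, hop):
    """Windowed overlap-add resynthesis, normalized by the summed squared window."""
    n_frames, n_fft = frames.shape
    win = np.hanning(n_fft)
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def stretch_noise(x, alpha, n_fft=1024, hop=256, seed=0):
    """Stretch a noise signal x by factor alpha (illustrative sketch only)."""
    rng = np.random.default_rng(seed)

    # Log-magnitude spectra of the original noise (yellow in Figure 1).
    spec = np.fft.rfft(frame_signal(x, n_fft, hop), axis=1)
    log_mag = np.log(np.abs(spec) + 1e-9)
    n_in = log_mag.shape[0]

    # Time-interpolate the log-magnitudes onto alpha times as many frames (red).
    n_out = int(round(alpha * (n_in - 1))) + 1
    pos = np.linspace(0, n_in - 1, n_out)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_in - 1)
    frac = (pos - lo)[:, None]
    log_mag_out = (1 - frac) * log_mag[lo] + frac * log_mag[hi]

    # White-noise excitation spectra (green), scaled to unit RMS magnitude.
    # This scaling is an assumption made for the sketch.
    white = rng.standard_normal(n_fft + hop * (n_out - 1))
    w_spec = np.fft.rfft(frame_signal(white, n_fft, hop), axis=1)
    w_spec /= np.sqrt(np.mean(np.abs(w_spec) ** 2))

    # Morph: impose the interpolated magnitude on the white-noise spectra.
    y_spec = w_spec * np.exp(log_mag_out)
    return overlap_add(np.fft.irfft(y_spec, n=n_fft, axis=1), hop)
```

Calling `stretch_noise(noise, alpha=3)` returns a signal roughly three times as long as `noise`. Because the phase comes from fresh white noise rather than from the original frames, the output stays noise-like instead of repeating or smearing the input frames.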

Figure 2: Original signals of a can opening (a) and a guitar (d) at normal speed. Subplots (b) and (e) show the effect of stretching by a factor of 3 without transient separation, which leads to undesirable transient smearing. In contrast, subplots (c) and (f) show that the proposed method preserves transients during time stretching and handles the noise component appropriately when transients are separated.

Audio examples: $\alpha = 2$

Listening test examples

Figure 3: Listening test results, showing MOS with 95\% confidence intervals for $\alpha = 2$
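For readers unfamiliar with the plotted quantities, the following sketch shows one common way to compute a mean opinion score (MOS) and a 95% confidence interval from a set of ratings. The page does not state how the intervals in the figures were computed, so the normal approximation (z = 1.96) is an assumption, and the ratings below are hypothetical placeholders, not data from the listening test.

```python
import math
import statistics

def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width) of a normal-approximation 95% confidence interval."""
    mean = statistics.fmean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, z * sem

# Hypothetical placeholder ratings on a 1-5 scale (not from the study).
ratings = [4, 5, 3, 4, 4, 5, 4, 3]
mos, half_width = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

With few listeners, a Student's t quantile would give a slightly wider interval than the normal approximation used here.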

Car: Live recording of a rally car passing by

Soda: Hiss and click sounds from a can opening

Cut: A knife cutting food on a cutting board

EDM: Electronic music example

PP (Ping Pong): Sounds from an amateur ping pong game

Additional examples

Drums

Guitar

Hand Saw

Seagulls

Audio examples: $\alpha = 4$

Listening test examples

Figure 4: Listening test results, showing MOS with 95\% confidence intervals for $\alpha = 4$

Car: Live recording of a rally car passing by

Soda: Hiss and click sounds from a can opening

Cut: A knife cutting food on a cutting board

EDM: Electronic music example

PP (Ping Pong): Sounds from an amateur ping pong game

Additional examples

Drums

Guitar

Hand Saw

Seagulls

Audio examples: $\alpha = 8$

Listening test examples

Figure 5: Listening test results, showing MOS with 95\% confidence intervals for $\alpha = 8$

Car: Live recording of a rally car passing by

Soda: Hiss and click sounds from a can opening

Cut: A knife cutting food on a cutting board

EDM: Electronic music example

PP (Ping Pong): Sounds from an amateur ping pong game

Additional examples

Drums

Guitar

Hand Saw

Seagulls

Supplemental listening test results

Non-parametric test results

Figure 6: (left) p-values obtained from a Wilcoxon test. (right) Thresholded p-values, where statistical significance ($p \leq 0.05$) is highlighted by cell coloring.