Zero-Shot Blind Audio Bandwidth Extension

Eloi Moliner, Filip Elvander and Vesa Välimäki

Companion page for a journal article submitted to IEEE Transactions on Audio, Speech, and Language Processing

The article can be accessed here.

Abstract

Audio Bandwidth Extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem in a zero-shot setting, leveraging the generative priors of a pretrained unconditional diffusion model. During the inference process, BABE utilizes a generalized version of diffusion posterior sampling, where the degradation operator is unknown but parameterized and inferred iteratively. The performance of the proposed method is evaluated using objective and subjective metrics, and the results show that BABE surpasses state-of-the-art blind bandwidth extension baselines and achieves competitive performance compared to non-blind filter-informed methods when tested with synthetic data. Moreover, BABE exhibits robust generalization capabilities when enhancing real historical recordings, effectively reconstructing the missing high-frequency content while maintaining coherence with the original recording. Subjective preference tests confirm that BABE significantly improves audio quality of historical music recordings.
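To make the idea of a degradation operator that is "unknown but parameterized and inferred iteratively" more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy lowpass filter with two hypothetical parameters (a cutoff in kHz and a decay slope in dB per octave), treats a fixed magnitude spectrum `x_hat` as a stand-in for the denoised estimate, and fits the filter by gradient descent, using finite differences in place of automatic differentiation. All names and numeric values are illustrative assumptions.

```python
import math

def lowpass_gain(f_khz, fc_khz, slope_db_oct):
    """Toy lowpass magnitude: flat up to the cutoff, then a constant dB/octave decay."""
    fc_khz = max(fc_khz, 1e-3)  # guard: keep the cutoff positive
    if f_khz <= fc_khz:
        return 1.0
    gain_db = -slope_db_oct * math.log2(f_khz / fc_khz)
    gain_db = max(min(gain_db, 200.0), -200.0)  # guard: avoid float overflow
    return 10.0 ** (gain_db / 20.0)

def loss(params, freqs, x_hat, y):
    """Squared error between the filtered estimate and the observation."""
    fc_khz, slope = params
    return sum((lowpass_gain(f, fc_khz, slope) * x - obs) ** 2
               for f, x, obs in zip(freqs, x_hat, y))

def numeric_grad(fn, params, eps=1e-5):
    """Central finite differences (assumption: stands in for autodiff)."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        dn = list(params)
        up[i] += eps
        dn[i] -= eps
        grad.append((fn(up) - fn(dn)) / (2.0 * eps))
    return grad

def fit_filter(freqs, x_hat, y, init, steps=200, lr=1.0):
    """Gradient descent with backtracking, so the loss never increases."""
    fn = lambda p: loss(p, freqs, x_hat, y)
    params = list(init)
    current = fn(params)
    for _ in range(steps):
        g = numeric_grad(fn, params)
        step = lr
        while step > 1e-12:
            trial = [p - step * gi for p, gi in zip(params, g)]
            trial_loss = fn(trial)
            if trial_loss < current:
                params, current = trial, trial_loss
                break
            step *= 0.5
        else:
            break  # no improving step found: treat as converged
    return params, current

# Hypothetical data: a flat spectrum observed through a 3 kHz, 20 dB/octave filter.
freqs = [k / 10.0 for k in range(1, 101)]  # 0.1 kHz to 10 kHz
x_hat = [1.0] * len(freqs)
y = [lowpass_gain(f, 3.0, 20.0) * x for f, x in zip(freqs, x_hat)]

init = [1.5, 10.0]  # deliberately wrong initial guess
params, final_loss = fit_filter(freqs, x_hat, y, init)
print("initial loss:", loss(init, freqs, x_hat, y))
print("final loss:", final_loss, "estimated (fc_khz, slope):", params)
```

In this sketch the signal estimate stays fixed while the filter is fitted. In the method described above, the filter parameters and the audio estimate are updated together throughout the reverse diffusion process.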

Figure 1: Graphical representation of the inference process. (a) The input observations were produced by applying a lowpass filter (red dotted line) to (f) the Ground Truth (GT) reference signal. The proposed method, BABE, iteratively reconstructs the missing high-frequency spectra through a reverse diffusion process (b), (c), (e), while blindly estimating the lowpass filter degradation (magenta line overlaid in (b), (c), (e)). A sampling step is shown in closer detail in (d), where the denoising Deep Neural Network (DNN) is applied, the filter parameters $\phi_i$ are iteratively optimized, and the audio data $\mathbf{x}_i$ are updated using reconstruction guidance. The dotted lines represent backward computations for evaluating the gradient.
Figure 2: Diagram of the inference process for a real historical recording. The original recording is first denoised and then used as a guiding signal for the generation. Throughout the generation, BABE estimates the (unknown) lowpass degradation of the original recording, depicted here by a magenta line overlaid on the spectrograms.

Listening test examples

Figure 3: Results of the subjective evaluation of lowpass filtered signals.

Lowpass filtered piano recordings (\(f_c\) = 1 kHz)

Example 1

Example 2

Example 3

Example 4

Example 5

Example 6

Example 7

Example 8

Lowpass filtered piano recordings (\(f_c\) = 3 kHz)

Example 1

Example 2

Example 3

Example 4

Example 5

Example 6

Example 7

Example 8

Real historical piano recordings

Figure 4: Results of the subjective evaluation of processed historical recordings.

Example 1

Example 2

Example 3

Example 4

Audio Restoration Examples

Figure 5: Spectrogram representation of different historical recordings denoised and bandwidth-extended with the proposed BABE method. A high-frequency emphasis filter was used for visualization purposes.
Figure 6: Preference test results for (yellow) denoised and BABE-processed and (pink) denoised-only real recordings, showing an advantage for BABE on strings and woodwinds but no effect on brass. The confidence intervals assume a binomial distribution.
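As an illustration of what a binomial confidence interval on a preference rate can look like, here is a small Python sketch. It uses the Wilson score interval, which is one common choice. The page does not state which interval was used for Figure 6, so this is an assumption, and the vote counts below are made up for the example and are not taken from the study.

```python
import math

def wilson_interval(successes, trials, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (z defaults to about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z2 / (4.0 * trials * trials)) / denom
    return center - half, center + half

# Hypothetical counts: 34 of 48 votes prefer the processed version.
low, high = wilson_interval(34, 48)
print("preference rate:", 34 / 48)
print("approximate 95% interval:", (low, high))
```

If the whole interval lies above 0.5, the preference is unlikely to be due to chance at the chosen confidence level. An interval that straddles 0.5 corresponds to "no effect" in a two-alternative preference test.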

Piano

Title | ID | Performer | Composer | Year
Etude in G Sharp Minor | Victrola (917-B) | Ignace Jan Paderewski | Chopin | 1923

Title | ID | Performer | Composer | Year
Etude in F Minor | Victrola (66059) | Sergei Rachmaninoff | E. Dohnányi | 1921

Strings

Title | ID | Performer | Composer | Year
Canzonetta (From String Quartet in E Flat) | Victrola (626-B) | Flonzaley Quartet; Adolfo Betti; Alfred Pochon; Louis Bailly; Iwan d'Archambeau | Mendelssohn | 1920

Title | ID | Performer | Composer | Year
Canzonetta (From String Quartet in E Flat) | Victrola (626-B) | Flonzaley Quartet; Adolfo Betti; Alfred Pochon; Louis Bailly; Iwan d'Archambeau | Mendelssohn | 1920

Woodwind

Title | ID | Performer | Composer | Year
HUMORESQUE | Consolidated (A 1984) | Unknown | ? | 1909?

Title | ID | Performer | Composer | Year
MANZANILLO | Columbia (A1603) | Unknown | Robyn | 1914

Brass

Title | ID | Performer | Composer | Year
INTERNATIONAL MARCH | Columbia (E4534) | Boudble Brass Quartete | ? | 1920

Title | ID | Performer | Composer | Year
"Old Black Joe"; "Massa's In De Cold, Cold Ground" | Pathe Freres (20489) | JULES LEVY, Jr.'s BRASS QUARTET | Foster | 1921

On the importance of guidance

We investigate the significance of reconstruction guidance by contrasting it with an unrestricted refinement approach. This ablation study uses a diffusion model that relies only on warm initialization for conditioning, without any additional guidance during the restoration process.
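The contrast between the two conditioning strategies can be illustrated with a deliberately simplified, deterministic Python toy. It is not a diffusion model and not the ablation setup itself. The "denoiser" is a hypothetical function that pulls every band toward a fixed prior mean, the degradation is a binary mask that keeps only the low bands, and guidance is a plain gradient step on the squared error in the observed bands. A real sampler would add noise scheduling and differentiate through the network. Every name and value is an assumption made for illustration.

```python
def denoise(x, prior_mean, weight=0.2):
    """Toy 'denoiser': pulls every band toward a fixed prior mean."""
    return [xi + weight * (m - xi) for xi, m in zip(x, prior_mean)]

def consistency_error(x, y, mask):
    """Squared error against the observation, in the observed bands only."""
    return sum(m * (xi - yi) ** 2 for xi, yi, m in zip(x, y, mask))

def refine(y, prior_mean, mask, steps=30, guidance=0.0):
    """Iterative refinement starting from the observation (warm initialization).

    guidance = 0.0 gives the unrestricted variant; a positive value adds a
    gradient step that keeps the observed bands close to the observation.
    """
    x = list(y)  # warm initialization: start from the degraded signal
    for _ in range(steps):
        x_hat = denoise(x, prior_mean)
        # Gradient of sum(mask * (x_hat - y)^2) with respect to x_hat.
        grad = [2.0 * m * (xh - yi) for xh, yi, m in zip(x_hat, y, mask)]
        x = [xh - guidance * g for xh, g in zip(x_hat, grad)]
    return x

# Hypothetical four-band example: the two low bands are observed.
mask = [1.0, 1.0, 0.0, 0.0]
y = [2.0, -1.0, 0.0, 0.0]
prior_mean = [1.0, 0.5, 0.8, 0.6]

unrestricted = refine(y, prior_mean, mask, guidance=0.0)
guided = refine(y, prior_mean, mask, guidance=0.4)

print("error without guidance:", consistency_error(unrestricted, y, mask))
print("error with guidance:", consistency_error(guided, y, mask))
```

In this toy, both variants fill in the unobserved bands identically, but the unrestricted one drifts toward the prior in the observed bands and loses consistency with the input. Reconstruction guidance is meant to prevent that kind of drift.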