Francesc Lluís, Nils Meyer-Kahlen
Companion page for the paper "Blind Spatial Impulse Response Generation from Separate Room- and Scene-Specific Information", submitted to ICASSP 2025.
For audio in augmented reality (AR), knowledge of the user's real acoustic environment is crucial for rendering virtual sounds that seamlessly blend into the environment. As acoustic measurements are usually not feasible in practical AR applications, information about the room needs to be inferred from available sound sources. Additional sound sources can then be rendered with the same room acoustic qualities. Crucially, these are placed at positions different from those of the sources available for estimation. Here, we propose an encoder network, trained with a contrastive loss, that maps input sounds to a low-dimensional feature space representing only room-specific information. A diffusion-based spatial room impulse response generator is then trained to take this latent representation and generate a new response for a new source-receiver position. We show that both room- and position-specific parameters are reflected in the final output.
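As a rough illustration of the first stage, the sketch below shows a contrastive objective of the NT-Xent type in PyTorch, assuming that positive pairs are two different sounds recorded in the same room. The exact encoder architecture and loss formulation are not specified on this page, so the function name, batching convention, and temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def room_contrastive_loss(z_a, z_b, temperature=0.1):
    """NT-Xent-style loss (illustrative sketch, not the paper's exact loss).

    z_a[i] and z_b[i] are embeddings of two different sounds recorded in
    the same room (a positive pair); all other row/column combinations
    in the batch act as negatives.
    """
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature                  # (B, B) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetric cross-entropy: match the a->b and b->a pairings
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Minimizing a loss of this kind pulls embeddings of same-room recordings together while pushing different rooms apart, so the latent space encodes room-specific rather than source- or position-specific information, which the diffusion generator can then condition on.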
The examples show three renderings: two use simulated responses from different positions within the same room, while the third uses a generated response. Binaural renderings were produced with the Spatial Decomposition Method (SDM) applied to the four-channel responses.
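As a rough sketch of the direction estimation at the core of SDM, the snippet below computes a per-window direction of arrival (DOA) from a four-channel response via pairwise time differences of arrival. The window length, the integer-sample TDOA estimate, and the absence of overlap or smoothing are simplifying assumptions, not the exact pipeline used for these examples.

```python
import numpy as np

def sdm_doa(rir, mic_pos, fs, c=343.0, win=16):
    """Per-window DOA estimates from a four-channel room impulse
    response, in the spirit of SDM (illustrative sketch).

    rir     : (n_samples, 4) array of microphone signals
    mic_pos : (4, 3) microphone positions in metres
    fs      : sampling rate in Hz
    """
    pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    # Each row of V relates one pairwise TDOA to the propagation direction
    V = np.array([mic_pos[i] - mic_pos[j] for i, j in pairs])    # (6, 3)
    doas = []
    for start in range(0, rir.shape[0] - win, win):
        frame = rir[start:start + win]
        taus = np.empty(len(pairs))
        for p, (i, j) in enumerate(pairs):
            xc = np.correlate(frame[:, i], frame[:, j], mode="full")
            taus[p] = (np.argmax(xc) - (win - 1)) / fs           # TDOA in seconds
        # Plane-wave model: tau_ij = (r_i - r_j) . k / c; solve for k
        k, *_ = np.linalg.lstsq(V, c * taus, rcond=None)
        norm = np.linalg.norm(k)
        # The DOA points from the array toward the source, opposite to k
        doas.append(-k / norm if norm > 0 else np.zeros(3))
    return np.array(doas)
```

For binaural rendering, each pressure sample of a reference channel is then assigned to the head-related transfer function nearest to its estimated direction.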
Please use headphones for the correct binaural experience. Red squares indicate source positions, black dots mark ground-truth positions, and blue dots mark generated positions. The receiver faces the positive x-direction.