Christoph Hold

Higher-Order Ambisonics Codec (HOAC)

for Compression and Upmixing

Companion page

For the articles:

Hold, C., McCormack, L., Politis, A., & Pulkki, V. (2024). Perceptually-Motivated Spatial Audio Codec for Higher-Order Ambisonics Compression. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Hold, C., Pulkki, V., Politis, A., & McCormack, L. (2024). Compression of Higher-Order Ambisonic Signals using Directional Audio Coding. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Hold, C., McCormack, L., Politis, A., & Pulkki, V. (2023). Optimizing Higher-Order Directional Audio Coding with Adaptive Mixing and Energy Matching for Ambisonic Compression and Upmixing. In Proceedings of the 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023 (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics; Vol. 2023-October). IEEE.

A reference implementation can be found on GitHub.


Please reach out for further information!

[HOAC-Encoder]
Figure: Encoder structure of HOAC.
[HOAC-Decoder]
Figure: Decoder structure of HOAC.

Sound examples

All are 5th order (36 channel) HOA files, rendered using MagLS binaural decoding for demonstration.

Condition Music Scene Orchestra
Input
12TCs
6TCs
4TCs

This also works for microphone array recordings, demonstrated here with an em32 capture.

Condition Orchestra EM32
Input
12TCs
6TCs
4TCs

Download all files here: Download ZIP

The archive contains a variety of items. Please consult the publications, or contact us for more details! The filenames describe the number of transport channels (TCs) and the spatial covariance matching technique. EstE does not require any additional metadata and is hence proposed for compression scenarios; the others may be considered for upmixing scenarios. Binaural versions of all items are included.

The listening test items of [ICASSP2024] can be accessed here.

Bitrate HOAC Opus Ambix
Input (uncompressed)
1296 kbit/s
768 kbit/s
512 kbit/s
This poster gives a high-level overview of the codec proposed in [ICASSP2024].
[Poster]
Figure: HOAC overview.

Details

In the publication [WASPAA2023], we explored different adaptive mixing variants of a model-based post-processing step that matches the spatial covariance of the coded output to that of the input. We showed that this technique can reduce coding artefacts, even without requiring side information beyond the standard HO-DirAC parameters.
[Details]
Figure: Input-to-output SHD (5th order) RMSE plot showing coder performance for the 'Orchestra' item. The label 'NO' shows no optimization, and 'OMatch-E' shows the performance without additional side information.
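To illustrate what matching the spatial covariance can mean in its simplest form, the following is a minimal sketch and not the method from [WASPAA2023]. It assumes real-valued time-domain frames and corrects only the diagonal of the covariance matrix (the per-channel energies), whereas a codec would typically operate per time-frequency tile and a full match would also address the off-diagonal terms. The function names `spatial_covariance` and `match_channel_energies` are hypothetical, and the target covariance is computed directly from the input here, which in a real compression scenario would have to be estimated or transmitted.

```python
import math


def spatial_covariance(frame):
    """Sample covariance C[i][j] = mean over time of x_i(t) * x_j(t).

    `frame` is a list of channels, each a list of real-valued samples.
    """
    n_samples = len(frame[0])
    return [
        [sum(a * b for a, b in zip(ch_i, ch_j)) / n_samples for ch_j in frame]
        for ch_i in frame
    ]


def match_channel_energies(decoded, target_cov, eps=1e-12):
    """Scale each decoded channel so its energy equals the target diagonal.

    Assumption: only the diagonal of the covariance is corrected; the
    cross-channel (off-diagonal) terms are left untouched.
    """
    decoded_cov = spatial_covariance(decoded)
    matched = []
    for i, channel in enumerate(decoded):
        # eps guards against division by zero for silent channels.
        gain = math.sqrt(target_cov[i][i] / max(decoded_cov[i][i], eps))
        matched.append([gain * sample for sample in channel])
    return matched


# Toy example: the decoded frame has lost half its amplitude in each channel.
reference = [[1.0, -1.0, 1.0, -1.0], [2.0, 2.0, -2.0, -2.0]]
decoded = [[0.5, -0.5, 0.5, -0.5], [1.0, 1.0, -1.0, -1.0]]
target = spatial_covariance(reference)
restored = match_channel_energies(decoded, target)
```

After the correction, the per-channel energies of `restored` agree with those of `reference`; any remaining mismatch in the cross-channel terms would need a mixing matrix rather than per-channel gains.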
In the publication [ICASSP2024], we present a full codec, including perceptual coders on the audio transport channels and metadata coding. As additional material, a perceptual model based on energy-weighted ViSQOL scores shows a trend comparable to that observed in the perceptual listening test.
[Details-ambivisqol1] [Details-ambivisqol2]
Figure: Perceptual performance prediction of item 'Band', coded at 768 kbit/s.
Further additional material is the RMSE of the codec outputs. Keep in mind that all of the presented systems are perceptual (lossy) audio codecs, so drawing conclusions about perceptual quality from RMSE is not meaningful.
[Details-RMSE]
Figure: Input to output SHD (5th order) RMS Error averaged per order, clipped for visualization.
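As a rough illustration of how such a per-order error could be computed, here is a minimal sketch. It assumes the channels are in ACN ordering, so that channel n belongs to order floor(sqrt(n)) and a signal of order N has (N + 1)^2 channels (36 for 5th order). It also assumes that "averaged per order" means the mean of the per-channel RMSE values within each order; the exact averaging used for the plot may differ. The function name `rmse_per_order` is hypothetical.

```python
import math


def rmse_per_order(reference, decoded):
    """RMSE between two SHD signals, averaged over the channels of each order.

    Both inputs are lists of channels (lists of samples), assumed to be in
    ACN ordering, so that channel index n belongs to order floor(sqrt(n)).
    """
    if not reference or len(reference) != len(decoded):
        raise ValueError("inputs must be non-empty with equal channel counts")
    max_order = math.isqrt(len(reference) - 1)
    if (max_order + 1) ** 2 != len(reference):
        raise ValueError("channel count is not (N + 1)^2")

    sums = [0.0] * (max_order + 1)
    for acn, (ref_ch, dec_ch) in enumerate(zip(reference, decoded)):
        if len(ref_ch) != len(dec_ch):
            raise ValueError("channel lengths differ")
        mse = sum((r - d) ** 2 for r, d in zip(ref_ch, dec_ch)) / len(ref_ch)
        sums[math.isqrt(acn)] += math.sqrt(mse)

    # Order n contains 2n + 1 channels.
    return [total / (2 * n + 1) for n, total in enumerate(sums)]


# Toy first-order example: order 0 is reproduced exactly, order 1 is lost.
ref = [[1.0, 1.0] for _ in range(4)]
dec = [[1.0, 1.0]] + [[0.0, 0.0] for _ in range(3)]
errors = rmse_per_order(ref, dec)
```

In this toy case the order-0 error is zero and the order-1 error equals the amplitude of the missing components.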
The next plots give more insight into the spatial aspects.
[Details-movingin] [Details-movingout] [Details-pars]
Figure: Input-to-output SHD (5th order) RMS of the 'Moving Scene' item at 1296 kbit/s, alongside the visualized metadata of one frame.