Higher-Order Ambisonics Codec

Please reach out for further information!

[HOAC-Encoder] — Figure: Encoder structure of HOAC.

[HOAC-Decoder] — Figure: Decoder structure of HOAC.

Sound examples

All examples are 5th-order (36-channel) HOA files, rendered to binaural with MagLS decoding for demonstration.

Condition	Music	Scene	Orchestra
Input
12TCs
6TCs
4TCs
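As a quick check of the channel count quoted above: a full 3D HOA signal of order N carries (N + 1)^2 spherical-harmonic channels, so 5th order gives 36. The sketch below also maps a channel index to its order, assuming ACN channel numbering (as used by the AmbiX convention); the helper names are illustrative only.

```python
import math

def hoa_channels(order: int) -> int:
    """Number of channels of a full 3D HOA signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

def acn_to_order(acn: int) -> int:
    """Ambisonics order of a channel, assuming ACN numbering (acn = n*(n+1) + m)."""
    if acn < 0:
        raise ValueError("channel index must be non-negative")
    return math.isqrt(acn)

print(hoa_channels(5))   # 36
print(acn_to_order(35))  # 5
```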

The codec also works on microphone-array recordings, demonstrated here with an Eigenmike em32 capture.

Condition	Orchestra EM32
Input
12TCs
6TCs
4TCs

Download all files here: Download ZIP

The archive contains a variety of items. Please consult the publications or contact us for more details! The filenames describe the number of transport channels (TCs) and the spatial covariance matching technique. EstE does not require any additional metadata and is hence proposed for compression scenarios; the other techniques may be considered for upmixing scenarios. Binaural versions of all items are included.

The listening test items of [ICASSP2024] can be accessed here.

Bitrate	HOAC	Opus Ambix
Input (uncompressed)
1296 kbit/s
768 kbit/s
512 kbit/s

This poster gives a high-level overview of the codec proposed in [ICASSP2024].

Details

In the publication [WASPAA2023], we explored different adaptive mixing variants of a model-based post-processing step that matches the spatial covariance of the coded output to that of the input. We showed that this technique can reduce coding artefacts, even without requiring additional side information beyond the standard HO-DirAC parameters.
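To illustrate the general idea of spatial covariance matching, here is a minimal sketch that computes a mixing matrix M so that the mixed signal has a prescribed covariance. It uses the plain Cholesky-based solution, which is only one of many valid solutions and is not claimed to be any of the adaptive mixing variants from [WASPAA2023]. The function name, the regularization constant, and the broadband toy signals are assumptions made for illustration.

```python
import numpy as np

def covariance_matching_matrix(C_in, C_target, reg=1e-9):
    """Return a mixing matrix M with M @ C_in @ M^H approximately equal to C_target.

    Hypothetical helper: plain Cholesky-based solution, not the adaptive
    mixing variants evaluated in the publication.
    """
    C_in = np.asarray(C_in)
    C_target = np.asarray(C_target)
    n = C_in.shape[0]
    eye = np.eye(n)
    # Small diagonal loading keeps both factorizations well defined.
    K_in = np.linalg.cholesky(C_in + reg * np.trace(C_in).real / n * eye)
    K_t = np.linalg.cholesky(C_target + reg * np.trace(C_target).real / n * eye)
    # M = K_t @ inv(K_in), computed with a linear solve instead of an explicit inverse.
    return np.linalg.solve(K_in.conj().T, K_t.conj().T).conj().T

# Toy example with 4 channels: x plays the role of the decoded signal,
# y the role of the reference whose spatial covariance should be matched.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2000))
y = rng.standard_normal((4, 4)) @ rng.standard_normal((4, 2000))
C_x = x @ x.T / x.shape[1]
C_y = y @ y.T / y.shape[1]

M = covariance_matching_matrix(C_x, C_y)
C_matched = (M @ x) @ (M @ x).T / x.shape[1]
print(np.max(np.abs(C_matched - C_y)))  # small residual due to regularization
```

Any M multiplied by a unitary matrix between the two Cholesky factors also satisfies the covariance constraint, which is why practical schemes add a further criterion to pick one solution.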

In the publication [ICASSP2024], we present a full codec, including perceptual coders for the audio transport channels and metadata coding. As additional material, a perceptual model based on energy-weighted ViSQOL scores shows a trend comparable to the one observed in the listening test.

[Details-ambivisqol1] — Figure: Perceptual performance prediction of item 'Band', coded at 768 kbit/s.

[Details-ambivisqol2] — Figure: Perceptual performance prediction of item 'Band', coded at 768 kbit/s.

Further additional material is the RMSE of the codec outputs. Keep in mind that all the presented codecs are perceptual (lossy) audio codecs, so drawing conclusions about perceptual quality from the RMSE is not meaningful.

[Details-RMSE] — Figure: Input to output SHD (5th order) RMS Error averaged per order, clipped for visualization.
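For readers who want to compute a comparable quantity on the downloadable items, the following is a minimal sketch of an RMS error between input and output SHD signals, averaged over the channels of each order. It assumes ACN channel ordering and time-aligned signals of equal length; the exact averaging and normalization behind the figure are not specified here, so treat the function as an illustration rather than the evaluation code.

```python
import numpy as np

def rmse_per_order(ref, out):
    """RMS error between two SHD signals, averaged over the channels of each order.

    ref, out: arrays of shape (channels, samples), channels in ACN order (assumption).
    Returns an array of length order + 1.
    """
    ref = np.asarray(ref, dtype=float)
    out = np.asarray(out, dtype=float)
    if ref.shape != out.shape:
        raise ValueError("ref and out must have the same shape")
    num_ch = ref.shape[0]
    order = int(round(np.sqrt(num_ch))) - 1
    if (order + 1) ** 2 != num_ch:
        raise ValueError("channel count must be (order + 1)^2")
    # RMS error over time for each channel
    rmse_ch = np.sqrt(np.mean((ref - out) ** 2, axis=1))
    # Order of each ACN channel index: n = floor(sqrt(acn))
    orders = np.floor(np.sqrt(np.arange(num_ch))).astype(int)
    return np.array([rmse_ch[orders == n].mean() for n in range(order + 1)])

# Toy example: a 5th-order signal whose 2nd-order channels carry a constant error.
ref = np.zeros((36, 100))
out = ref.copy()
out[4:9] += 0.5
err = rmse_per_order(ref, out)
print(err)
```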

The next plots give a bit more insight into the spatial aspects.

[Details-movingin] — Figure: Input to output SHD (5th order) RMS of the 'Moving Scene' item at 1296 kbit/s, alongside the visualized metadata of one frame.

[Details-movingout] — Figure: Input to output SHD (5th order) RMS of the 'Moving Scene' item at 1296 kbit/s, alongside the visualized metadata of one frame.

Christoph Hold,

Higher-Order Ambisonics Codec (HOAC)

for Compression and Upmixing

Companion page
