Christoph Hold

Higher-Order Ambisonics Codec (HOAC)

for Compression and Upmixing

WORK IN PROGRESS! Companion page.

The article can be accessed here:

[Graphical Abstract]

A reference implementation can be found on GitHub.


Please reach out for further information!

[HOAC-Encoder]
Figure: Encoder structure of HOAC.
[HOAC-Decoder]
Figure: Decoder structure of HOAC.

Sound examples

All examples are 5th-order (36-channel) HOA files, rendered binaurally with MagLS decoding for demonstration.
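An order-N HOA signal has (N+1)^2 spherical harmonic channels, which is where the 36 channels of a 5th-order file come from. The small Python sketch below computes this count and, as an illustration, the ACN channel index n^2 + n + m used by the AmbiX convention; the function names are hypothetical helpers, not part of the HOAC reference implementation.

```python
def num_hoa_channels(order: int) -> int:
    """Number of spherical harmonic channels of a full order-N HOA signal."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def acn_index(n: int, m: int) -> int:
    """ACN channel index for degree n and mode m, with -n <= m <= n."""
    if abs(m) > n:
        raise ValueError("mode m must satisfy |m| <= n")
    return n * n + n + m


if __name__ == "__main__":
    for order in range(6):
        print(order, num_hoa_channels(order))  # order 5 -> 36 channels
    # Last channel of a 5th-order signal (n = 5, m = 5) has ACN index 35
    print(acn_index(5, 5))
```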

Condition Music Scene Orchestra
Input
12TCs
6TCs
4TCs

This also works for microphone array recordings, demonstrated here with an em32 capture.

Condition Orchestra EM32
Input
12TCs
6TCs
4TCs

Download all files here: Download ZIP

The archive contains a variety of items. Please consult the publications or contact us for more details! The filenames describe the number of transport channels (TCs) and the spatial covariance matching technique: EstE does not require any additional metadata and is hence proposed for compression scenarios, while the others may be considered for upmixing scenarios. Binaural versions of all items are included.

The listening test items of [inReview2024] can be accessed here.

Bitrate HOAC Opus Ambix
Input (uncompressed)
1296 kbit/s
768 kbit/s
512 kbit/s
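For orientation, dividing each total bitrate by the 36 HOA channels gives the average budget per channel if all 36 channels were transmitted directly. This is only an average: the page does not state how the Opus AmbiX anchor or HOAC distribute bits across channels, and the names below are hypothetical.

```python
# Total bitrates from the table above, in kbit/s
BITRATES_KBPS = [1296, 768, 512]
NUM_CHANNELS = 36  # 5th-order HOA: (5 + 1) ** 2


def avg_kbps_per_channel(total_kbps: float, channels: int = NUM_CHANNELS) -> float:
    """Average budget per HOA channel; says nothing about the actual allocation."""
    return total_kbps / channels


for rate in BITRATES_KBPS:
    # 1296 -> 36.00, 768 -> 21.33, 512 -> 14.22 kbit/s per channel
    print(f"{rate} kbit/s -> {avg_kbps_per_channel(rate):.2f} kbit/s per channel")
```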

Details

In the publication [WASPAA2023], we explored different adaptive mixing variants of a model-based post-processing stage that matches the spatial covariance of the coded output to that of the input. We showed that this technique can reduce coding artefacts, even without requiring side information beyond the standard HO-DirAC parameters.
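The core idea of covariance matching can be sketched generically: find a mixing matrix M so that the covariance of M·y equals a target covariance. The minimal NumPy sketch below uses plain Cholesky factors (M = K_target K_actual^-1), which is just one of many valid solutions; it is not the adaptive mixing of [WASPAA2023], it works on a single broadband block rather than per frequency band, and all names and the random test signals are hypothetical.

```python
import numpy as np


def covariance(x: np.ndarray) -> np.ndarray:
    """Sample covariance of a (channels, samples) signal block."""
    return x @ x.conj().T / x.shape[1]


def covariance_matching_matrix(c_target: np.ndarray, c_actual: np.ndarray,
                               reg: float = 1e-9) -> np.ndarray:
    """Mixing matrix M with M @ c_actual @ M^H ~= c_target.

    With c = K K^H (Cholesky), M = K_target K_actual^-1 satisfies the
    constraint; reg adds a tiny diagonal load so both factors exist.
    """
    eye = np.eye(c_actual.shape[0])
    k_target = np.linalg.cholesky(c_target + reg * eye)
    k_actual = np.linalg.cholesky(c_actual + reg * eye)
    return k_target @ np.linalg.inv(k_actual)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4096))  # stand-in "input" block, 4 channels
mix = rng.standard_normal((4, 4))
# Stand-in "decoded" block with altered spatial covariance plus noise
y = 0.5 * (mix @ x) + 0.1 * rng.standard_normal((4, 4096))
m = covariance_matching_matrix(covariance(x), covariance(y))
y_matched = m @ y
print(np.allclose(covariance(y_matched), covariance(x)))  # True
```

Matching the covariance alone leaves a lot of freedom in M; practical methods additionally keep the output close to the unprocessed signal, which this sketch does not attempt.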
[Details]
Figure: Input-to-output SHD (5th-order) RMSE plot showing coder performance for the 'Orchestra' item. The label 'NO' denotes no optimization, and 'OMatch-E' the performance without additional side information.
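The exact definition behind the plot (normalization, time and frequency resolution) is not given here. As a sketch under the assumption of a plain time-domain RMSE per spherical harmonic channel, expressed in dB, it could be computed like this; the random signals are placeholders, not HOAC output.

```python
import numpy as np


def shd_rmse_per_channel(x_in: np.ndarray, x_out: np.ndarray) -> np.ndarray:
    """Per-channel RMSE between two (channels, samples) SHD signals."""
    if x_in.shape != x_out.shape:
        raise ValueError("input and output must have the same shape")
    return np.sqrt(np.mean((x_in - x_out) ** 2, axis=1))


def to_db(values: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Amplitude values in dB (20 log10), floored to avoid log(0)."""
    return 20.0 * np.log10(np.maximum(values, eps))


rng = np.random.default_rng(1)
x_in = rng.standard_normal((36, 48000))  # stand-in for a 5th-order signal
x_out = x_in + 0.01 * rng.standard_normal(x_in.shape)  # stand-in "coded" output
err_db = to_db(shd_rmse_per_channel(x_in, x_out))
print(err_db.shape, err_db.mean())  # (36,) and a mean of about -40 dB
```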
Finally, in the publication [inReview2024], we present a full codec, including perceptual coding of the audio transport channels and metadata coding. As additional material, a perceptual model based on energy-weighted ViSQOL scores shows a trend comparable to the one observed in the perceptual listening test.
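How exactly the ViSQOL scores are obtained and weighted is not described on this page. A minimal sketch of one plausible energy weighting, assuming per-channel scores computed beforehand (ViSQOL itself is not called here) and weights proportional to each reference channel's energy; the function name and example values are hypothetical.

```python
import numpy as np


def energy_weighted_score(scores: np.ndarray, signals: np.ndarray) -> float:
    """Combine per-channel quality scores, weighting each by its channel energy.

    scores:  (channels,) per-channel scores, e.g. ViSQOL MOS-LQO values
    signals: (channels, samples) reference signals that define the weights
    """
    energy = np.sum(np.abs(signals) ** 2, axis=1)
    total = energy.sum()
    if total <= 0:
        raise ValueError("reference signals have zero energy")
    return float(np.dot(energy / total, scores))


scores = np.array([4.5, 3.0])
signals = np.array([[1.0, 1.0, 1.0],   # energy 3 -> weight 0.75
                    [1.0, 0.0, 0.0]])  # energy 1 -> weight 0.25
print(energy_weighted_score(scores, signals))  # 4.125
```

Weighting by energy lets high-energy channels (typically the low orders) dominate the overall score, while a plain mean would treat every channel equally.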
[Details-ambivisqol1] [Details-ambivisqol2]
Figure: Perceptual performance prediction of item 'Band', coded at 768 kbit/s.
The following plots give more insight into the spatial aspects.
[Details-movingin] [Details-movingout] [Details-movingout]
Figure: Input-to-output SHD (5th-order) RMS of the 'Moving Scene' item at 1296 kbit/s, shown alongside the visualized metadata of one frame.
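One way to summarize such SHD RMS values is to pool them per order. The sketch assumes ACN channel ordering (as in AmbiX), where the 2n+1 channels of order n occupy indices n^2 to (n+1)^2 - 1; it is an illustrative helper, not the procedure used to produce the figure.

```python
import numpy as np


def rms_per_order(x: np.ndarray) -> np.ndarray:
    """RMS of an ACN-ordered (channels, samples) SHD signal, pooled per order n."""
    num_ch = x.shape[0]
    order = int(round(np.sqrt(num_ch))) - 1
    if (order + 1) ** 2 != num_ch:
        raise ValueError("channel count must be (N + 1) ** 2")
    out = np.empty(order + 1)
    for n in range(order + 1):
        block = x[n * n:(n + 1) ** 2]  # the 2n + 1 channels of order n
        out[n] = np.sqrt(np.mean(block ** 2))
    return out


x = np.ones((36, 100))  # stand-in 5th-order signal
x[1:4] *= 2.0           # boost the three 1st-order channels
print(rms_per_order(x))  # [1. 2. 1. 1. 1. 1.]
```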