Coding and Multidirectional Parameterisation of Ambisonic Sound Scenes (COMPASS)

About

The COMPASS VSTs are a collection of flexible VST audio plug-ins for spatial audio production, manipulation, and reproduction, developed by Dr. Archontis Politis, Leo McCormack, and Dr. Sakari Tervo in the Acoustics Lab at Aalto University.

COMPASS is a framework for parametric spatial audio processing of sound scenes captured in the Ambisonics format. Parametric methods, such as Directional Audio Coding (DirAC) [2, 3] or HARPEX [4], have recently gained popularity for achieving greater sharpness and envelopment than traditional first- or lower-order Ambisonic playback, while using the same lower-order Ambisonic signals. In contrast to the time-invariant linear processing of traditional Ambisonics, which does not consider the sound components that make up the sound scene, parametric methods assume a sound-field model and track its parameters in the Ambisonic recording, in both time and frequency. The parameters are then used to render or upmix the sound scene flexibly to any playback system, without the constraints of lower-order Ambisonics. Furthermore, the spatial parameters allow the sound scene content to be manipulated in ways that are not possible with traditional Ambisonic processing.

The COMPASS framework has been developed by Dr. Archontis Politis, with contributions from Dr. Sakari Tervo and Leo McCormack, and is published in [1]. The method is quite general in its model: it estimates multiple direct sound components in every time-frequency block, along with an ambient component that captures reverberation and other diffuse sounds (for other related parametric and signal-dependent techniques, see [2-9]).

In COMPASS, the ambient component is also spatial and can exhibit directionality, in contrast to previous models that force it to be isotropic. The VST plug-ins apply this framework to different spatial audio production tasks. Note that the plug-ins are still a work in progress and we expect to keep improving them in the future; however, we believe that they can already prove useful to users and creators.

  • Download links can be found here.

  • The plug-ins employ the Spatial_Audio_Framework, which can be found here.

  • A detailed description of each plug-in can be found below, or in this publication.

Comments and feedback to archontis.politis and/or leo.mccormack are very much welcome! : )

The COMPASS Plug-ins

All plug-ins conform to the Ambisonic Channel Number (ACN) ordering convention and offer support for both orthonormalised (N3D) and semi-normalised (SN3D) scalings (note: AmbiX uses ACN/SN3D). The maximum Ambisonic order for these plug-ins is 3.
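
For reference, here is a minimal Python sketch of these conventions (not taken from the plug-in code): the ACN index for order n and degree m, and the SN3D-to-N3D conversion, which amounts to a per-order scaling by sqrt(2n + 1).

    import numpy as np

    def acn_index(n, m):
        """Ambisonic Channel Number (ACN) for order n and degree m in [-n, n]."""
        return n * n + n + m

    def acn_to_order_degree(acn):
        """Recover (n, m) from an ACN channel index."""
        n = int(np.floor(np.sqrt(acn)))
        return n, acn - n * n - n

    def sn3d_to_n3d(x):
        """Convert an ACN-ordered block of Ambisonic signals from SN3D to N3D.

        x: array of shape (num_channels, num_samples). N3D coefficients are
        obtained from SN3D ones by scaling each channel by sqrt(2n + 1).
        """
        gains = np.array([np.sqrt(2 * acn_to_order_degree(q)[0] + 1)
                          for q in range(x.shape[0])])
        return gains[:, None] * x

    # e.g. a third-order AmbiX (ACN/SN3D) buffer of 16 channels -> ACN/N3D
    x_ambix = np.random.randn(16, 1024)
    x_n3d = sn3d_to_n3d(x_ambix)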

COMPASS|Decoder

The COMPASS decoder is a parametric decoder for first, second, and third-order Ambisonics to arbitrary loudspeaker setups. The plugin offers the following functionality:

  • User-specified loudspeaker angles for up to 64 channels, or alternatively, presets for popular 2D and 3D set-ups.
  • Headphone binaural monitoring of the loudspeaker outputs, with support for user-provided personalised binaural filters (HRTFs) in the SOFA format.
  • Balance control between the extracted direct sound components and the ambient component, in frequency bands.
  • Mixing control between fully parametric decoding and linear Ambisonic decoding, in frequency bands.

The "Diffuse-to-Direct" control allows the user to give more prominence to the direct sound components (an effect similar to subtle dereverberation), or to the ambient component (an effect similar to emphasising reverberation in the recording). When set in the middle, the two are balanced. Note that the parametric processing can be quite aggressive, and if one pushes it to fully direct rendering in a complex multi-source sound scene with FOA signals only, artefacts can easily appear. However, with more balanced settings, such artefacts should become imperceptible.

The "Linear-to-Parametric" control allows the user to mix the output between standard linear Ambisonic decoding and the COMPASS parametric decoding. This control can be used in cases where parametric processing sounds too aggressive, or if the user prefers some degree of increased localisation blur, offered by linear Ambisonic decoding.

The authors consider the plug-in a production tool; due to its time-frequency processing, it requires audio buffer sizes of at least 1024 samples. It is therefore not a low-latency plug-in and is not suitable for interactive input. For cases such as interactive binaural rendering for VR with head tracking, see the COMPASS|Binaural variant.

A video showing the plug-in in action and demonstrating its functionality can be found here.

This plug-in was developed by Leo McCormack and Archontis Politis.

COMPASS|Binaural

This is an optimised version of the COMPASS decoder for binaural playback, bypassing loudspeaker rendering and using binaural filters (HRTFs) directly; these can be user-provided and personalised via the SOFA format. For the plug-in parameters, see the description of the COMPASS|Decoder above. Additionally, the plug-in can receive OSC rotation angles from a head-tracker at a user-specified port, in the yaw-pitch-roll convention.
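
SOFA files are netCDF containers (AES69), so a user-provided HRTF set can be inspected with any netCDF reader before loading it into the plug-in. Below is a minimal sketch using the netCDF4 Python package; the variable names follow the standard SimpleFreeFieldHRIR convention, and the file name is just a placeholder.

    from netCDF4 import Dataset

    # placeholder path to a personalised HRTF set in the SOFA format
    sofa = Dataset("my_hrtfs.sofa", mode="r")

    hrirs = sofa.variables["Data.IR"][:]              # (measurements, 2 ears, taps)
    fs = float(sofa.variables["Data.SamplingRate"][:].flatten()[0])
    src_pos = sofa.variables["SourcePosition"][:]     # typically (azimuth, elevation, radius)

    print(f"{hrirs.shape[0]} measurement positions, {hrirs.shape[2]} taps per HRIR, fs = {fs} Hz")
    sofa.close()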

This version is intended mostly for head-tracked binaural playback of Ambisonic content at interactive update rates, usually in conjunction with a head-mounted display (HMD). The plug-in requires an audio buffer size of at least 512 samples (~10 ms at 48 kHz). The averaging parameters make the parametric analysis and synthesis more or less responsive, providing the user with a means to tune them for a particular sound scene.
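
Any device or script that can emit OSC messages can drive the head tracking. The sketch below uses the python-osc package; the port must match the one set in the plug-in, and the OSC address pattern "/ypr" and the degree-valued yaw-pitch-roll ordering are assumptions made here for illustration, so check the plug-in's OSC settings for the exact format it expects.

    # pip install python-osc
    from pythonosc.udp_client import SimpleUDPClient

    client = SimpleUDPClient("127.0.0.1", 9000)   # port must match the plug-in setting

    def send_head_rotation(yaw_deg, pitch_deg, roll_deg):
        """Forward head-tracker angles to the plug-in as one OSC message."""
        # "/ypr" address pattern and degree units are assumed here
        client.send_message("/ypr", [float(yaw_deg), float(pitch_deg), float(roll_deg)])

    send_head_rotation(30.0, 0.0, 0.0)   # e.g. head yawed by 30 degrees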

This plug-in was developed by Leo McCormack and Archontis Politis.

COMPASS|Upmixer

This VST employs COMPASS for the task of upmixing a lower-order Ambisonic recording to a higher-order one. It is intended for users who already work with a preferred linear decoding workflow for higher-order Ambisonic content, and who wish to bring lower-order Ambisonic material into that workflow with increased spatial resolution. First-, second-, or third-order material (4, 9, or 16 channels) can be upmixed to up to seventh-order material (64 channels).
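
The channel counts follow directly from the Ambisonic order: an order-N scene has (N + 1)^2 channels, hence the 4, 9, or 16 input channels and up to 64 output channels quoted above. A trivial check:

    def num_ambisonic_channels(order):
        """Number of Ambisonic channels for a given order: (N + 1)^2."""
        return (order + 1) ** 2

    for n in (1, 2, 3, 7):
        print(f"order {n}: {num_ambisonic_channels(n)} channels")
    # order 1: 4, order 2: 9, order 3: 16, order 7: 64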

This plug-in was developed by Leo McCormack and Archontis Politis.

COMPASS|SpatEdit (Coming soon)

The SpatEdit plug-in is a parametric spatial editor, which allows the user to emphasise or attenuate specific directional regions in the sound scene. It uses the analysis part of the COMPASS framework, namely the multi-source DoA estimates and the ambient component, in order to extract or modify sounds that emanate from user-defined spatial targets. These targets are controlled via markers, which may be placed at arbitrary angles on the azimuth-elevation plane. The number of markers is dictated by the transform order: higher orders allow the placement of more markers and hence finer spatial control. Additionally, the user may specify a gain value for each target, which can amplify, attenuate, or attempt to eliminate sound incident from that direction.

In the linear processing mode, a reasonable degree of spatial enhancement or separation can be attained without distortion, especially at higher orders. However, should the user wish to conduct more aggressive manipulations of the sound scene, the plug-in may also operate in a parametric mode. In this case, the target source signals can become more spatially selective by taking the analysed parameters into account. For each target, an additional enhancement value can be specified, which essentially determines how spatially selective the parametric processing may aspire to be. Higher values can separate and modify the sounds coming from the targets to a much greater degree, but with the undesirable possibility of introducing more artefacts. Since artefacts in parametric processing are always scene-dependent, it is up to the user to ascertain which settings work best for a given input scene.

The plug-in may be configured either to output the individual beams and the residual, or to output the manipulated scene in the spherical harmonic domain (SHD) via suitable re-encoding. This allows the user to process the sounds from individual regions separately (e.g. to apply equalisation), or to receive the modified scene re-packaged conveniently in the input Ambisonic format for immediate rendering or further Ambisonic processing.
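
To illustrate the linear mode of operation and the two output options described above, here is a minimal sketch under the assumption of a simple least-squares beamformer; it is not the plug-in's actual algorithm. Given the real spherical-harmonic vectors of the user-defined target directions, it extracts one beam per target, applies the per-target gains, and returns the beams, the residual, and the re-encoded edited scene.

    import numpy as np

    def edit_scene_linear(x, Y_targets, gains):
        """Linear spatial-editing sketch (least-squares beams, then re-encoding).

        x         : Ambisonic input, shape (num_channels, num_samples)
        Y_targets : real spherical-harmonic vectors of the target directions,
                    shape (num_targets, num_channels), same convention as x
        gains     : per-target linear gains, shape (num_targets,)
        """
        W = np.linalg.pinv(Y_targets.T)            # least-squares beamforming weights
        beams = W @ x                              # one extracted signal per target
        residual = x - Y_targets.T @ beams         # scene minus the re-encoded beams
        edited = Y_targets.T @ (np.asarray(gains)[:, None] * beams) + residual
        return beams, residual, edited             # unit gains return the input scene unchanged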

This plug-in was developed by Leo McCormack and Archontis Politis.

About the authors

  • Archontis Politis: a postdoctoral researcher at Aalto University, specialising in spatial sound recording and reproduction, acoustic scene analysis, and microphone array processing.
  • Leo McCormack: a doctoral candidate at Aalto University.

References

[1] Politis, A., Tervo S., and Pulkki, V. (2018) COMPASS: Coding and Multidirectional Parameterization of Ambisonic Sound Scenes.
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2] Pulkki, V. (2007) Spatial sound reproduction with directional audio coding.
Journal of the Audio Engineering Society 55.6: 503-516.

[3] Pulkki, V., Politis, A., Laitinen, M.-V., Vilkamo, J., Ahonen, J. (2017). First-order directional audio coding (DirAC).
in Parametric Time-Frequency Domain Spatial Audio, Wiley, p.89-138.

[4] Berge, S. and Barrett, N. (2010). High angular resolution planewave expansion.
2nd International Symposium on Ambisonics and Spherical Acoustics.

[5] Politis, A., Vilkamo, J., and Pulkki, V. (2015). Sector-based parametric sound field reproduction in the spherical harmonic domain.
IEEE Journal of Selected Topics in Signal Processing, 9(5), 852-866.

[6] Politis, A. and Pulkki, V. (2017). Higher-Order Directional Audio Coding.
in Parametric Time-Frequency Domain Spatial Audio, Wiley, p.141.

[7] Wabnitz, A., Epain, N., McEwan, A., Jin, C. (2011). Upscaling ambisonic sound scenes using compressed sensing techniques.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[8] Kolundzija, M., and Faller, C. (2018). Advanced B-Format Analysis.
Audio Engineering Society Convention 144.

[9] Schörkhuber, C. and Höldrich, R. (2019). Linearly and Quadratically Constrained Least-Squares Decoder for Signal-Dependent Binaural Rendering of Ambisonic Signals.
AES International Conference on Immersive and Interactive Audio.


Updated on Monday 26th of August, 2019