## Spatial Audio Real-time Applications (SPARTA)

## About

SPARTA is a collection of flexible VST audio plug-ins for spatial audio production, reproduction and visualisation, developed by members of the Acoustics Lab at Aalto University. These plug-ins have been previously used internally within the Acoustics Lab for educational and research purposes; however, they have now been released as an open source project. Our hope is that they may be useful to those interested in the world of real-time spatial audio processing.
The SPARTA installer now also includes the parametric COMPASS suite.

## The SPARTA Plug-ins

All plug-ins are tested using REAPER (64-bit), which is a very affordable and flexible DAW and is currently the only recommended host for these plug-ins. Currently, the plug-ins support sampling rates of 44.1/48 kHz, and block sizes that are a multiple of 64 or 128 samples, unless otherwise stated. All spherical harmonic-related plug-ins conform to the Ambisonic Channel Number (ACN) ordering convention and offer support for both orthonormalised (N3D) and semi-normalised (SN3D) scalings (note: AmbiX uses ACN/SN3D). The maximum transform order for these plug-ins is 7.

Also, thanks to help from Daniel Rudrich, the relevant plug-ins now support importing and exporting of loudspeaker, source, and sensor directions via .json configuration files, allowing for cross-compatibility between SPARTA and the IEM Ambisonics plug-in suite. More information regarding the structure of these files can be found

## SPARTA | AmbiBIN

A binaural Ambisonic decoder for headphone playback of spherical harmonic signals (aka Ambisonic signals), with a built-in rotator and head-tracking support via OSC messages. The rotation angles are updated after the time-frequency transform, which allows for reduced latency compared to its loudspeaker counterpart 'AmbiDEC' when paired with 'Rotator'. The plug-in also allows the user to import their own HRIRs via the SOFA standard. The plug-in offers a variety of different decoding methods, including: Least-Squares (LS), Spatial re-sampling (SPR), Time-Alignment (TA) [11], and Magnitude Least-Squares (MagLS) [12]. It can also impose a diffuse-coherence constraint/correction on the current decoder, as described in [11]. This plug-in was developed by Leo McCormack and Archontis Politis.

## SPARTA | AmbiDEC

A frequency-dependent Ambisonic decoder for loudspeakers. The loudspeaker directions can be user-specified for up to 64 channels, or alternatively, presets for popular 2D and 3D set-ups can be selected.
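
The input to a decoder such as this one follows the ACN ordering and N3D/SN3D scalings noted in the plug-in overview. A minimal Python sketch of those conventions is given here; the function names are hypothetical and are not part of SPARTA. It shows why a 7th-order stream occupies 64 channels, how a degree/index pair maps to an ACN channel, and how SN3D channels are rescaled to N3D.

```python
import math

def num_sh_channels(order):
    """Number of spherical harmonic channels for a given order: (N + 1)^2."""
    return (order + 1) ** 2

def acn_index(n, m):
    """ACN channel index for degree n and index m, with -n <= m <= n."""
    if abs(m) > n:
        raise ValueError("m must satisfy -n <= m <= n")
    return n * n + n + m

def acn_to_degree(acn):
    """Degree n of the spherical harmonic stored at a given ACN index."""
    return math.isqrt(acn)

def sn3d_to_n3d_gain(n):
    """Gain that converts a degree-n channel from SN3D to N3D: sqrt(2n + 1)."""
    return math.sqrt(2 * n + 1)

def sn3d_to_n3d(frame):
    """Rescale one frame of ACN-ordered samples from SN3D to N3D."""
    return [x * sn3d_to_n3d_gain(acn_to_degree(i)) for i, x in enumerate(frame)]

print(num_sh_channels(7))                                  # 64
print(acn_index(1, -1), acn_index(1, 0), acn_index(1, 1))  # 1 2 3
```

Going the other way (N3D to SN3D) simply divides each channel by the same per-degree gain.
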
For headphone reproduction, the loudspeaker audio is convolved with interpolated HRTFs for each loudspeaker direction (the virtual loudspeaker approach). The plug-in also permits importing custom HRIRs via the SOFA standard. The plug-in employs a dual decoding approach, whereby different decoder settings may be selected for the low and high frequencies; the cross-over frequency can be dictated by the user. Several Ambisonic decoders have been integrated, including more perceptually motivated methods such as the All-Round Ambisonic Decoder (AllRAD) [1] and Energy-Preserving Ambisonic Decoder (EPAD) [2]. The max-rE weighting [1] may also be enabled for either decoder. Furthermore, in the case of non-ideal spherical harmonic signals as input (i.e. those that are derived from physical/simulated microphone arrays), the decoding order may be specified for the appropriate frequency ranges, and energy-preserving (EP) or amplitude-preserving (AP) normalisation can be selected to keep the loudness consistent between decoding orders. This ability to change the decoding order for different frequency bands can also make for an insightful demonstration of the limitations of lower-order Ambisonics decoding.

Note that when the loudspeakers are uniformly distributed, all of the decoding approaches that are implemented in the plug-in are equivalent. This can be effectively demonstrated by selecting a T-design loudspeaker set-up (a nearly-uniform distribution of points on a sphere). The benefits of the Mode-Matching decoding (MMD), AllRAD and EPAD approaches can then be observed for non-uniform arrangements (22.x for example). This plug-in was developed by Leo McCormack and Archontis Politis.

## SPARTA | AmbiDRC

A frequency-dependent spherical harmonic domain dynamic range compressor (DRC).
The gain factors are derived by analysing the omnidirectional component for each frequency band, and are then also applied to the higher-order components; the spatial properties of the original signals therefore remain unchanged. The implementation also keeps track of the frequency-dependent gain factors for the omnidirectional component over time, which are then plotted on the user interface for visual feedback. This plug-in was developed by Leo McCormack.

## SPARTA | AmbiENC

A bare-bones Ambisonic encoder (also referred to as an Ambisonic panner), which takes input signals (up to 64 channels) and encodes them into spherical harmonic signals at specified directions. Essentially, these spherical harmonic signals describe a synthesised sound-field, where the spatial resolution of this encoding is determined by the transform order. Several presets have been included for convenience (which allow, for example, 22.x audio to be encoded into 1st to 7th order Ambisonics). The panning window is also fully mouse driven, and uses an equirectangular representation of the sphere to depict the azimuth and elevation angles of each source. This plug-in was developed by Leo McCormack.

## SPARTA | Array2SH

The Array2SH plug-in spatially encodes spherical/cylindrical array signals into spherical harmonic signals (aka Ambisonic or B-Format signals). The plug-in utilises analytical solutions, which ascertain the frequency and order-dependent influence that the array has on the initial estimate. The plug-in allows the user to specify: the array type (spherical or cylindrical), whether the array has an open or rigid enclosure, the radius of the array, the radius of the sensors (in cases where they protrude out from the array), the sensor coordinates (up to 64 channels), sensor directivity (omni-dipole-cardioid), the speed of sound, and the acoustical admittance of the array material (in the case of rigid arrays).
The plug-in then determines the order-dependent equalisation curves which need to be imposed onto the initial spherical harmonic signals estimate, in order to remove the influence of the array itself. However, especially for higher orders, this generally results in a large amplification of the low frequencies (including the sensor noise at these frequencies); therefore, two popular regularisation approaches have been integrated into the plug-in, which allow the user to make a compromise between noise amplification and transform accuracy. These target and regularised equalisation curves are depicted on the user interface to provide visual feedback.

The plug-in also allows the user to 'Analyse' the spatial encoding performance using objective measures described in [8,10], namely: the spatial correlation and the level difference. Here, the encoding matrices are applied to a simulated array, which is described by multichannel transfer functions of plane waves for 812 points on the surface of the spherical/cylindrical array. The resulting encoded array responses should ideally resemble spherical harmonic functions at the grid points. The spatial correlation is then derived by comparing the patterns of these responses with the patterns of ideal spherical harmonics, where '1' means they are perfectly correlated and '0' means they are completely uncorrelated; the spatial aliasing frequency can therefore be observed for each order, as the point where the spatial correlation tends towards 0. The level difference is then the mean level difference over all directions (diffuse level difference) between the ideal and simulated components. One can observe that higher permitted amplification limits [Max Gain (dB)] will result in noisier signals; however, this will also result in a wider frequency range of useful spherical harmonic components at each order.
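
The trade-off between the permitted maximum gain and the usable bandwidth can be illustrated with a Tikhonov-style regularised inverse, which is one common way of limiting the gain of such equalisation filters. This is a minimal sketch and is not taken from the plug-in's source: the function names are hypothetical, the regularisation method is an assumption rather than a statement of which approaches the plug-in uses, and the modal coefficient magnitudes are made-up numbers chosen only to show the behaviour.

```python
import math

def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

def tikhonov_inverse(b, max_gain_db):
    """Regularised inverse of a modal coefficient b.

    H = conj(b) / (|b|^2 + lam^2), with lam = 1 / (2 * g_max), so that
    |H| never exceeds the linear gain limit g_max.
    """
    b = complex(b)
    lam = 1.0 / (2.0 * db_to_linear(max_gain_db))
    return b.conjugate() / (abs(b) ** 2 + lam ** 2)

# Made-up modal magnitudes, shrinking as they would towards low frequencies
# for a higher-order component of a small array.
example_magnitudes = [1.0, 0.3, 0.05, 0.001]

for max_gain_db in (15.0, 30.0):
    for b in example_magnitudes:
        h = tikhonov_inverse(b, max_gain_db)
        applied_db = 20.0 * math.log10(abs(h))   # gain actually applied
        residual = abs(h * b)                    # 1.0 would be a perfect inversion
        print(f"limit {max_gain_db:4.1f} dB  |b|={b:<6}  "
              f"gain {applied_db:6.1f} dB  |H*b|={residual:.3f}")
```

With the higher limit, the product `|H*b|` stays close to 1 for smaller values of `|b|`, meaning the component is equalised correctly over a wider range, at the cost of a larger gain applied to whatever sensor noise is present.
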
This analysis is primarily based on code written for publication [10], which compared the performance of various regularisation approaches of encoding filters, based on both theoretical and measured array responses. Note that this ability to balance the noise amplification with the accuracy of the spatial encoding (to better suit a given application) is very important. For example, the perceived fidelity of Ambisonic decoded audio can be rather poor if the noise amplification is set too high; therefore, a much lower amplification limit is typically used in Ambisonics reproduction when compared to sound-field visualisation algorithms, or beamformers that employ appropriate post-filtering.

For convenience, the specifications for several commercially available microphone arrays have been integrated as presets, including: MH Acoustics' Eigenmike, the Zylia array, and various A-format microphone arrays. Additionally, with the release of this plug-in, one now has the ability to build/3-D print their own spherical or cylindrical array, while having a convenient means of obtaining the corresponding spherical harmonic signals; for example, a four capsule open-body hydrophone array was presented in [9], which utilised this Array2SH plug-in as the first step in visualising and auralising an underwater sound scene in real-time. This plug-in was developed by Leo McCormack, Symeon Delikaris-Manias and Archontis Politis.

## SPARTA | Beamformer

A simple beamforming plug-in. It currently includes static beam patterns only (cardioid, hyper-cardioid or max_rE weighted hyper-cardioid). More pattern options are to follow in future. This plug-in was developed by Leo McCormack.

## SPARTA | Binauraliser

A plug-in which convolves input audio (up to 64 channels) with interpolated HRTFs in the time-frequency domain.
The HRTFs are interpolated by applying amplitude-normalised VBAP gains [4] to the HRTF magnitude responses and inter-aural time differences (ITDs) individually, before being re-combined. The plug-in also allows the user to specify an external SOFA file for the convolution. Presets for popular 2D and 3D formats are included for convenience; however, the directions for up to 64 channels can be independently controlled. Head-tracking is also supported via OSC messages in the same manner as with the Rotator plug-in. Please note that this plug-in is only suitable for HRTF-based convolution. This plug-in was developed by Leo McCormack and Archontis Politis.

## SPARTA | DirASS

A sound-field visualiser, which is based on the directional re-assignment of beamformer energy. This energy re-assignment is based on local DoA estimates for each scanning direction, and may be quantised to the nearest direction or upscaled to a higher order than the input, resulting in sharper activity-maps. For example, a second-order input may be displayed with (up to) 20th order output resolution. The plug-in also allows the user to place real-time video footage behind the activity-map, in order to create a make-shift acoustic camera. This plug-in was developed by Leo McCormack and Archontis Politis.

## SPARTA | Panner

A frequency-dependent 3D panner based on the Vector-base Amplitude Panning (VBAP) method [4]. Presets for popular 2D and 3D formats are included for convenience; however, the directions for up to 64 channels can be independently controlled for both inputs and outputs, allowing, for example, 9.x input audio to be panned for a 22.2 setup. The panning is frequency-dependent to accommodate the method described in [5], which allows for more consistent loudness when sources are panned in-between the loudspeaker directions. Set the "DTT" parameter to 0 for standard power-normalisation, 0.5 for a listening room, and 1 for an anechoic chamber.
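
For reference, the basic VBAP gain calculation [4] for a single loudspeaker pair in 2D can be sketched as follows, using standard power normalisation (the DTT = 0 case). This is a simplified, frequency-independent illustration with hypothetical function names; it does not implement the frequency-dependent normalisation of [5] and is not the plug-in's own code.

```python
import math

def unit_vector(azimuth_deg):
    """2D unit vector pointing towards the given azimuth (degrees)."""
    a = math.radians(azimuth_deg)
    return (math.cos(a), math.sin(a))

def vbap_2d_pair(source_deg, spk1_deg, spk2_deg):
    """Power-normalised VBAP gains for one loudspeaker pair.

    Solves p = g1 * l1 + g2 * l2 for the gains, then scales them so that
    g1^2 + g2^2 = 1.
    """
    p = unit_vector(source_deg)
    l1 = unit_vector(spk1_deg)
    l2 = unit_vector(spk2_deg)

    # Cramer's rule for the 2x2 system [l1 l2] [g1 g2]^T = p.
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det

    # Negative gains mean the source is outside the arc spanned by the pair.
    if g1 < -1e-9 or g2 < -1e-9:
        raise ValueError("source lies outside the arc spanned by the pair")

    norm = math.hypot(g1, g2)
    return (g1 / norm, g2 / norm)

# A source midway between loudspeakers at +30 and -30 degrees gets equal gains.
print(vbap_2d_pair(0.0, 30.0, -30.0))
```

In a full 3D panner the same idea is applied to loudspeaker triplets, with a 3x3 system solved per triangle of the loudspeaker layout.
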
This plug-in was developed by Leo McCormack, Archontis Politis and Ville Pulkki.

## SPARTA | PowerMap

PowerMap is a plug-in that represents the relative sound energy, or the statistical likelihood of a source, arriving at the listening position from a particular direction, using a colour gradient, where yellow indicates high sound energy/likelihood and blue indicates low sound energy/likelihood. The plug-in integrates a variety of different approaches, including: standard Plane-Wave Decomposition (PWD) beamformer-based, Minimum-Variance Distortionless Response (MVDR) beamformer-based, Multiple Signal Classification (MUSIC) pseudo-spectrum-based, and the Cross-Pattern Coherence (CroPaC) algorithm [3]; all of which are written to operate on spherical harmonic signals up to 7th order. Note that the analysis order per frequency band is entirely user definable, and presets for higher order microphone arrays have been included for convenience (which provide some rough yet appropriate starting values). The plug-in utilises an 812-point uniformly-distributed spherical grid, which is then interpolated into a 2D powermap using amplitude-normalised VBAP gains (i.e. triangular interpolation). The plug-in also allows the user to place real-time video footage behind the activity-map, in order to create a make-shift acoustic camera. Note that this plug-in supports frame sizes of 1024 or 2048 only. Also, the 'CroPaC LCMV' option is very experimental, so you may see the devil. This plug-in was developed by Leo McCormack.

## SPARTA | Rotator

This plug-in applies a spherical harmonic rotation matrix [6] to the input spherical harmonic signals. The rotation angles can be controlled using a head tracker via OSC messages. Simply configure the headtracker to send a vector: '\ypr[3]' to OSC port 9000 (default); where \ypr[0], \ypr[1], \ypr[2] are the yaw-pitch-roll angles, respectively.
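
As an illustration of what a head tracker (or a test script standing in for one) would send, the following Python sketch builds an OSC 1.0 message carrying three float arguments and sends it over UDP. It assumes the address pattern is `/ypr` (OSC address patterns begin with a forward slash), that the angles are given in degrees, and that the plug-in is listening on localhost at the default port 9000. These are assumptions to check against the plug-in's own settings, and the helper names are hypothetical.

```python
import socket
import struct

def osc_string(text):
    """Encode an OSC string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_message(address, floats):
    """Build an OSC message whose arguments are 32-bit big-endian floats."""
    type_tags = "," + "f" * len(floats)
    payload = struct.pack(">%df" % len(floats), *floats)
    return osc_string(address) + osc_string(type_tags) + payload

def send_ypr(yaw, pitch, roll, host="127.0.0.1", port=9000):
    """Send yaw, pitch and roll as one OSC message over UDP.

    The "/ypr" address and the default host/port are assumptions.
    """
    packet = osc_message("/ypr", [yaw, pitch, roll])
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

Calling `send_ypr(30.0, 0.0, 0.0)` would then send a single message with a yaw of 30 and zero pitch and roll.
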
The angles can also be flipped +/- in order to support a wider range of devices. The rotation order (yaw-pitch-roll (default) or roll-pitch-yaw) can also be specified. This plug-in was developed by Leo McCormack.

## SPARTA | SLDoA

A spatially localised direction-of-arrival (DoA) estimator. The plug-in first uses VBAP beam patterns (for directions that are uniformly distributed on the surface of a sphere) to obtain spatially-biased zeroth and first-order signals, which are subsequently used for the active-intensity vector estimation, thereby allowing for DoA estimation in several spatially-constrained sectors for each sub-band. The low-frequency estimates are then depicted with blue icons, mid-frequencies with green, and high-frequencies with red. The size of the icon and its opacity correspond to the energy of the sector, which are normalised and scaled in ascending order for each frequency band. The plug-in employs twice as many sectors as the analysis order, with the exception of the first-order analysis, which uses the traditional active-intensity approach. The analysis order per frequency band is user definable, as is the frequency range to analyse. This approach to sound-field visualisation/DoA estimation represents a much more computationally efficient option when compared to the algorithms that are integrated into the 'PowerMap' plug-in, for instance. The plug-in also allows the user to place real-time video footage behind the activity-map, in order to create a make-shift acoustic camera. This plug-in was developed by Leo McCormack and Symeon Delikaris-Manias.

## Experimental plug-ins

The plug-ins described in this section are in the early stages of development; therefore, expect that they may undergo drastic changes over time, and could be unstable.

## CroPaC | Binaural

A parametric first-order Ambisonic decoder for headphones, based on segregating the sound-field into directional and diffuse components using the Cross-Pattern Coherence (CroPaC) [3] spatial filter. This plug-in was developed by Leo McCormack.

## About the authors

- Leo McCormack: a doctoral candidate at Aalto University.
- Symeon Delikaris-Manias: postdoctoral researcher at Aalto University, specialising in compact microphone array processing for DoA estimation and sound-field reproduction. His doctoral research included work on the Cross-Pattern Coherence (CroPaC) algorithm, which is a spatial post-filter optimised for high noise/reverberant environments.
- Archontis Politis: postdoctoral researcher at Aalto University, specialising in spatial sound recording and reproduction, acoustic scene analysis and microphone array processing.
- Ville Pulkki: Professor at Aalto University, known for VBAP, SIRR, DirAC and eccentric behaviour.
