Olli Rummukainen and Catarina Mendonça

Reproducing Reality: Multimodal Contributions in Natural Scene Discrimination

Companion page for a paper in review


Most research on multisensory processing focuses on impoverished stimuli and simple tasks. Consequently, very little is known about the sensory contributions to the perception of real environments. Here, we presented 23 participants with paired comparison tasks in which natural scenes were discriminated on three perceptually meaningful attributes: movement, openness, and noisiness. The goal was to assess the contributions of the auditory and visual modalities to scene discrimination with short (≤500 ms) natural scene exposures. The scenes were reproduced in an immersive audiovisual environment with 3D sound and surrounding visuals. Movement and openness were found to be mainly visual attributes with some input from auditory information. In some scenes, the auditory system was able to derive information about movement and openness that was comparable with the audiovisual condition already after 500 ms of stimulation. Noisiness was predominantly auditory, but visual information was found to have a facilitatory role in a few scenes. The sensory weights were highly imbalanced in favor of the stronger modality, but the weaker modality was able to affect the bimodal estimate in some scenes.

Perceptual data

Data from the perceptual experiment can be downloaded here in CSV format.

Columns in the data frame:

  • Winner: 0 = reference scene was chosen, 1 = stimulus scene was chosen, 3 = no answer
  • Question: Attribute being tested
  • Anchor: Reference scene
  • Stimulus: Stimulus scene
  • Duration: Stimulus duration in ms
  • Modality: Modality being tested (Audio, Video, or Audiovisual)
  • Participant: Individual participant number (1-22)
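
As a minimal sketch of how these columns might be used, the Python snippet below computes, for each attribute and modality, the proportion of answered trials in which the stimulus scene was chosen. Trials coded 3 (no answer) are skipped. The sample rows are made up for illustration and are not real data. The scene labels, the attribute labels, and the assumption that Winner is stored as a plain integer code have not been checked against the actual file.

```python
import csv
import io
from collections import defaultdict

NO_ANSWER = "3"  # per the column description: 3 = no answer


def stimulus_choice_rate(rows):
    """Return {(question, modality): proportion of answered trials
    in which the stimulus scene was chosen}."""
    chosen = defaultdict(int)
    answered = defaultdict(int)
    for row in rows:
        winner = row["Winner"].strip()
        if winner == NO_ANSWER:
            continue  # skip trials without a response
        key = (row["Question"].strip(), row["Modality"].strip())
        answered[key] += 1
        chosen[key] += int(winner)  # 1 = stimulus chosen, 0 = reference chosen
    return {key: chosen[key] / answered[key] for key in answered}


# Tiny made-up sample in the documented column layout (NOT real data;
# the scene names and attribute labels are hypothetical).
sample = io.StringIO(
    "Winner,Question,Anchor,Stimulus,Duration,Modality,Participant\n"
    "1,Movement,SceneA,SceneB,500,Audio,1\n"
    "0,Movement,SceneA,SceneB,500,Audio,2\n"
    "3,Movement,SceneA,SceneB,500,Audio,3\n"
    "1,Noisiness,SceneA,SceneB,250,Video,1\n"
)
rates = stimulus_choice_rate(csv.DictReader(sample))
print(rates)
```

To run the same function on the downloaded file, open it with `open(path, newline="")` and pass a `csv.DictReader` over it, where `path` is wherever the CSV was saved locally.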

Spherical video and soundfield recording database

Low-resolution previews of the stimulus scenes are presented here. Note that the video projection used in the experiment displayed only 226° of the horizontal FOV and 57° of the vertical FOV. The previews and the full-resolution video files contain the full spherical video as obtained from the Point Grey Research Ladybug 3 camera, so the scenes contain some unrelated visual objects, such as the researchers or the microphone, that were not visible in the actual experiment.

The preview videos have a resolution of 1920x960 px and are accompanied by a stereo rendering of the original soundfield recording. Full-resolution spherical videos (5400x2700 px, Huffyuv encoding) along with the uncompressed A-format microphone signals can be downloaded here (Username: avscenes, Password: isotVIDEOT).
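
To get a rough sense of which part of a frame corresponds to the 226° x 57° field of view shown in the experiment, the sketch below converts those angles into a pixel window. It rests on assumptions not stated on this page: that the frames are equirectangular (360° x 180° mapped linearly onto the 2:1 frame) and that the visible window was centered in the frame. The actual projection geometry used in the experiment may differ, and `fov_crop_box` is a hypothetical helper written for this illustration.

```python
def fov_crop_box(width, height, h_fov_deg, v_fov_deg):
    """Pixel box (left, top, right, bottom) of a centered field-of-view
    window in a frame ASSUMED to be equirectangular (360 x 180 degrees)."""
    crop_w = round(width * h_fov_deg / 360.0)   # horizontal extent in pixels
    crop_h = round(height * v_fov_deg / 180.0)  # vertical extent in pixels
    left = (width - crop_w) // 2                # ASSUMPTION: window is centered
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# Frame sizes taken from the text above; FOV values from the experiment description.
preview_box = fov_crop_box(1920, 960, 226, 57)
full_box = fov_crop_box(5400, 2700, 226, 57)
print(preview_box)  # (357, 328, 1562, 632)
print(full_box)     # (1005, 922, 4395, 1777)
```

Under these assumptions the visible window covers roughly 1205x304 px of a preview frame and 3390x855 px of a full-resolution frame.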





Market square:

Railway station:


Traffic behind:

Updated on Monday February 22, 2016
Creative Commons License
The spherical videos and soundfield recordings in the database by Olli Rummukainen and Catarina Mendonça
are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License