Creating Audio for Interactive Environments
On Monday, October 10th, AES Melbourne members and guests assembled in the SAE Melbourne Lecture Theatre to hear Stephan Schutze speak on the topic “Creating Audio for Interactive Environments”.
The evening was a most interesting journey through his thoughts and experiences in audio for gaming and virtual/augmented reality (VR/AR).
He recounted his recent trip to the AES Convention in LA, particularly the AR/VR section.
He mentioned that the challenge of gatherings like this is that so many people working in the field are subject to non-disclosure agreements, so many presentations are general in nature and come from people without true in-depth knowledge of the technology. He also quoted a presenter exclaiming that “VR is currently like the Wild West where heroes were forged and legends made”. Stephan’s comment on this was “yes, but there are also a whole bunch of snake-oil salesmen, and people wearing black hats”.
He went on to hypothesise that VR/AR is not a new format – it is a whole new medium (like going from photos to film).
He then described AR as an alternate reality. He commented that he has been very active in this field lately with AR pioneer Magic Leap, but was unable to speak about that work due to a non-disclosure agreement. Instead he used Microsoft’s HoloLens for his example.
In AR, unlike VR, the user can see the real world around them while having information (video, graphics, and text) projected onto their view.
He covered the challenges for audio in AR – with the need to be able to still hear outside audio – so enclosed or ear-canal headphones cannot be used. He noted that the current practice of using tiny speakers in the glasses frames provided sub-optimal audio quality as well as significant bleed. In answering a question about using bone-conduction headphones, he commented that he didn’t believe that this technology was the answer yet. He said that the bone-conduction units he had heard indicated that the quality had a long way to go.
He then spoke about Kickstarters and the YATDAC problem (Yet Another Three D Audio Company): the rapid growth in VR/AR has spawned a lot of start-ups, not all with viable products. He counselled care when evaluating start-ups in the field – so many people are climbing on the bandwagon that it’s difficult to identify the (very) few true innovations.
He then spent some time discussing the production of audio for games, noting that the primary difference between games audio and other audio forms is that with games the final mix does not occur until the person is playing the game. It is object based – each playable element is a 3D model and has a sound source attached to it, and the game engine or sound engine calculates all this information on the fly.
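The per-object, on-the-fly calculation described above can be sketched roughly as follows. This is a hypothetical illustration, not any particular engine’s API: for each sound-emitting object, the engine derives a gain from its distance to the listener and a stereo pan from its bearing relative to the listener’s facing direction, every frame.

```python
import math

def mix_params(listener_pos, listener_yaw, source_pos, ref_dist=1.0):
    """Per-frame mix parameters for one sound object (hypothetical sketch).

    gain:  inverse-distance rolloff from the listener
    pan:   -1 = hard left, +1 = hard right, from the source's bearing
    """
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    dist = math.hypot(dx, dz)
    gain = ref_dist / max(dist, ref_dist)      # inverse-distance attenuation
    angle = math.atan2(dx, dz) - listener_yaw  # bearing relative to facing
    pan = math.sin(angle)
    return gain, pan

# A source 2 m directly to the listener's right: half gain, hard-right pan.
gain, pan = mix_params((0.0, 0.0), 0.0, (2.0, 0.0))
```

Because the listener (the player) moves freely, these parameters cannot be baked into a pre-made mix – hence the final mix only happens at play time.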
He then spent some time discussing localization, outlining a major difficulty with human hearing – we don’t localize effectively without turning our heads, unlike animals such as dogs and wolves, which can manipulate their outer ears to localize sounds. He suggested that when humans localize multiple similar sounds (like successive hand claps), the localization happens with the first clap. This can be exploited to simplify sound design: present the first sound as a binaural/ambisonic sound for localization, and the subsequent sounds in the series as 3D objects.
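The cost-saving trick described above could be structured along these lines – a minimal sketch with hypothetical names, not a real engine interface: only the first instance of a repeating sound gets the expensive binaural render, since that is the one the listener localizes.

```python
class RepeatedSoundSeries:
    """Chooses a render path for each instance of a repeating sound.

    First instance: full binaural/ambisonic render, so the listener
    localizes the source. Repeats: cheaper 3D object panning.
    (Illustrative sketch only; names are hypothetical.)
    """

    def __init__(self):
        self.heard_first = False

    def render_mode(self):
        if not self.heard_first:
            self.heard_first = True
            return "binaural"   # expensive render, done once
        return "panned_3d"      # cheaper path for every repeat

claps = RepeatedSoundSeries()
modes = [claps.render_mode() for _ in range(3)]
```

For a series of hand claps this would render only the first clap binaurally, with the rest handled by ordinary object panning.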
The question of Dolby Atmos was raised, and Stephan suggested that, while it’s great, it simply brings techniques long known to game developers into the theatre environment.
He then outlined the challenges of mixing in a VR environment, where practitioners need to work within the immersive environment (wearing the headset) with no visibility of their mixing tools, predicting that this will have a major impact on audio work practices.
He then went on to emphasize the importance of evaluating a VR/360 video mix in the correct environment – for example, a mix should be evaluated with the same headphones that were used when making it.
In conclusion he quoted film director George Lucas as saying sound is 50% of the experience (and someone in the audience suggested that Danny Boyle upped that figure to 80%).
With this in mind he said that in VR, good audio is not just nice to have, or important, but utterly mission critical. When VR replaces your usual visuals, if the audio isn’t right the experience is shattered – and an important factor in achieving believable audio is convincingly portraying an accurate sense of depth.
The talk was followed by a wide-ranging and spirited Q&A session.
The audio-only recording can be heard or downloaded here.
HoloLens promo video: https://www.youtube.com/watch?v=aThCr0PsyuA
Thanks to Graham Haynes and his trusty Tascam for the audio recording.
A special thanks to the SAE Institute for allowing us the use of their fine facilities.