
= Virtual Audio Environment =

== Introduction ==
Virtual Audio Environment is the simulation of a three-dimensional (3D) soundscape. Sounds are placed inside a virtual environment and audio effects are applied to create the illusion of 3D space around the listener. Computer processing generates and transforms recorded sound waves to mimic the natural sound waves emanating from a location in the 3D space. Virtual audio environments are often integrated with virtual visual environments to create the experience of 3D space, or overlaid on physical environments to provide background noise. A completely virtual audiovisual environment allows interactions and reactions to both audio and visual cues, immersing the user in their virtual surroundings. Applications of virtual audio environments are focused on entertainment (e.g., video games, virtual reality), telecommunication (e.g., background noise, audience applause), and education (e.g., audio simulation).

Current virtual audio environments are generally created as part of an audiovisual environment for video games and virtual reality. The user is immersed in the video game when the audio and visual stimuli overpower the sensory information from the physical environment. These setups primarily rely on headphones or earphones to deliver the sound to the listener. In these environments, the user controls a virtual avatar to look around, move around, and interact with virtual items or features. As the avatar's location and orientation change, the transfer functions alter the sound to match the avatar's movements. This creates the sensation of spatially located sounds coming from set points inside the environment.

== Definition ==
Virtual Audio Environments are audio simulations that give the listener the sense of immersion in 3D space. These simulations are generated via computer programs from recorded audio elements. For programming purposes, the audio elements are generally divided into the categories of direct, indirect, and environmental. These categories depend on differences in how the sound effects are triggered rather than differences in the sounds themselves. For virtual environments, the degree of user interaction determines the coding complexity required for the sound effects.

Direct Audio is an audio object that reacts to actions from the user. For example, an interactive video game interface may respond with a “peep” when the user presses a button. The “peep” cues the user that the button has been pressed and an action is being taken. Usually, these audio objects are spatially close to the user in the virtual environment and fairly loud.

Indirect Audio is an audio object that reacts to actions not taken by the user. For example, a video game will include the sound of a door opening when a Non-Player Character (NPC) enters a room.

Environmental Audio is all of the background noise normally associated with an environment. These sounds set the scene and feeling of an environment. A background of city noises immerses the user inside the virtual city environment. Similarly, the sounds of a storm at sea immerse the user aboard their ship inside that storm.
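
The distinction between the three categories is mainly a question of what triggers each sound. The following is a minimal sketch of how a game engine might tag and trigger them; the class and function names are illustrative and not taken from any particular engine.

<syntaxhighlight lang="python">
from dataclasses import dataclass
from enum import Enum, auto


class AudioCategory(Enum):
    """The three audio element categories described above."""
    DIRECT = auto()         # reacts to the user's own actions
    INDIRECT = auto()       # reacts to events the user did not cause
    ENVIRONMENTAL = auto()  # looping background ambience


@dataclass
class AudioElement:
    name: str
    category: AudioCategory
    clip: str               # path to the recorded audio sample
    loop: bool = False


# Illustrative examples of each category from the text above.
button_peep = AudioElement("button_peep", AudioCategory.DIRECT, "peep.wav")
npc_door = AudioElement("npc_door", AudioCategory.INDIRECT, "door_open.wav")
city_noise = AudioElement("city_noise", AudioCategory.ENVIRONMENTAL,
                          "city_loop.wav", loop=True)


def should_trigger(element: AudioElement, caused_by_user: bool) -> bool:
    """Decide whether a game event should play this element's clip."""
    if element.category is AudioCategory.DIRECT:
        return caused_by_user
    if element.category is AudioCategory.INDIRECT:
        return not caused_by_user
    return True  # environmental audio plays regardless of events
</syntaxhighlight>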

== Listener's Audio Setup ==
For this article, the sound system is assumed to be a pair of headphones using an industry-standard head-related transfer function. For video games, the headphones are paired with a computer setup (keyboard, mouse, and monitor) or a console setup (TV screen and controller). For VR, the headphones are paired with a VR headset and handheld controllers.

== Applications ==
The development of computers has led to the development of multidimensional modeling, sound synthesis, and digital signal processing software. Increased availability and ease of use have made realistic 3D audio effects practical. These have many applications for virtual audio environments in the entertainment, telecommunication, and educational fields. Currently, virtual audio environments are routinely used in video games to increase the player's immersion in their virtual surroundings. This application uses all three types of audio elements: direct, indirect, and environmental. In telecommunication, a virtual audio environment can be overlaid on a physical audio environment and mixed in to provide background sounds. Generally, this uses environmental audio for a looped background and indirect audio for audience sound effects. For education, the development of VR technology has allowed virtual audio environments to be integrated into training programs for doctors, nurses, soldiers, and others.

== Head Related Transfer Function Mismatch ==
A Head-Related Transfer Function (HRTF) describes the filtering of sound between a sound source and a person's ears. This filtering is defined as the set of direction-dependent audio cues used to infer the angular direction of a sound source relative to the listener. HRTFs are based on sounds recorded at the ears of subjects and compared to the actual sounds at the point of origin. From this, the HRTF is implemented using finite impulse response (FIR) filters for the left and right ears. These transfer functions depend on the size and shape of the pinna (outer ear), the inner ear, and the head, and are unique to each person.
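
As a rough illustration of the FIR filtering described above, the sketch below renders a mono signal binaurally by convolving it with a left and a right head-related impulse response (the time-domain form of the HRTF). The impulse responses here are random placeholders; a real system would look them up from a measured HRTF database for the source's direction.

<syntaxhighlight lang="python">
import numpy as np


def apply_hrtf(mono: np.ndarray,
               hrir_left: np.ndarray,
               hrir_right: np.ndarray) -> np.ndarray:
    """Render a mono source as binaural audio with a pair of FIR filters.

    hrir_left / hrir_right are head-related impulse responses measured
    for one source direction; convolution applies them to the signal.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)  # shape: (2, n_samples)


# Placeholder impulse responses for demonstration only.
rng = np.random.default_rng(0)
hrir_l = rng.standard_normal(128) * np.hanning(128)
hrir_r = np.roll(hrir_l, 8)  # crude interaural time difference
binaural = apply_hrtf(rng.standard_normal(48000), hrir_l, hrir_r)
</syntaxhighlight>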

This uniqueness of the HRTF leads to mismatches between the listener's HRTF and the assumed HRTF of the audio system. The most common problems are misperceptions of sound sources with regard to front/back localization and up/down localization. Front/back confusion occurs when sounds virtually placed directly behind a listener are heard directly in front, and vice versa. This problem is solved by offsetting the sound source from the front/back line or adding a small echo. Either of these methods allows the listener to correctly localize the sound source. Elevated sound sources also suffer from mislocalization. In this case, the cause is the standard HRTF used in the majority of audio systems. Research on spatially distributed sound sources, on which HRTFs are based, focuses mainly on the horizontal plane. The result is that the standard HRTFs used in headphones, video games, and VR are excellent for sounds on or around the horizontal plane. However, sounds far above or below the horizontal plane are not realistically represented by the standard HRTF. There is also an up/down problem where a sound source exactly on the front/back plane will sound higher or lower than it actually is.
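
The two mitigation strategies mentioned above can be sketched in a few lines, assuming the audio engine exposes a source's azimuth and raw samples; the parameter values below are illustrative, not taken from any specific system.

<syntaxhighlight lang="python">
import numpy as np


def add_disambiguating_echo(signal: np.ndarray, sample_rate: int = 48000,
                            delay_ms: float = 15.0,
                            gain: float = 0.25) -> np.ndarray:
    """Mix in a short, quiet echo to help break front/back confusion."""
    delay = int(sample_rate * delay_ms / 1000.0)
    out = np.concatenate([signal, np.zeros(delay)])
    out[delay:] += gain * signal
    return out


def offset_azimuth(azimuth_deg: float,
                   min_offset_deg: float = 5.0) -> float:
    """Nudge a source off the exact front/back (median) plane."""
    folded = azimuth_deg % 180.0
    dist_to_plane = min(folded, 180.0 - folded)
    if dist_to_plane < min_offset_deg:
        return azimuth_deg + min_offset_deg
    return azimuth_deg
</syntaxhighlight>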

== Sample-based Audio ==
Sample-based Audio is the standard technique used to generate sound sources in a virtual audio environment. This technique takes a recorded audio clip, positions it inside the virtual environment, and sets the volume and triggering action. The advantages of this technique are that multiple audio clips from different sources can be easily integrated into the virtual environment and that the clips are easily manipulated with digital signal processing. The VR, gaming, and film industries rely heavily on sample-based audio. These audio clips suffer from two problems. The first is their repetitive nature. To save processing time, the same audio clip and transfer function are reused, and only the geometry transfer function is changed. In a video game that does this, all of the birds sound alike because they are the same bird; this is particularly apparent to the listener when a flock of birds is calling. The generated audio environment is also limited by the audio clips available in the database: physical audio environments are complex, with noise from multiple overlapping sources, while virtual audio environments built from samples are simplistic, with fewer sources. The second problem is the linearity of the sound. In a physical environment, sound is nonlinear due to distortion as it travels through the environment. Even high-fidelity audio effects that mirror the real world sound slightly off because the audio processing is linear while the physical effects are not. Most video games and VR games accept this tradeoff because audio clips and linear transfer functions are computationally inexpensive.
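
A minimal sketch of the sample-based approach is shown below, assuming a simple 2D scene: the same recorded clip is reused for every source, and only the gain and pan (the "geometry transfer function") change with the source's position relative to the listener. The function names and rolloff model are illustrative.

<syntaxhighlight lang="python">
import numpy as np


def spatialize_clip(clip: np.ndarray,
                    source_xy: tuple[float, float],
                    listener_xy: tuple[float, float],
                    base_volume: float = 1.0) -> np.ndarray:
    """Position a recorded clip in a 2D scene with gain and pan."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = max(np.hypot(dx, dy), 1.0)    # clamp to avoid blow-up
    gain = base_volume / distance            # inverse-distance rolloff
    azimuth = np.arctan2(dx, dy)             # 0 rad = straight ahead
    pan = 0.5 * (1.0 + np.sin(azimuth))      # 0 = hard left, 1 = hard right
    left = clip * gain * np.sqrt(1.0 - pan)  # equal-power panning
    right = clip * gain * np.sqrt(pan)
    return np.stack([left, right], axis=0)


# The same bird clip reused for a whole flock -- hence the repetition
# problem described above.
bird = np.sin(2 * np.pi * 2000 * np.arange(24000) / 48000)
flock = [spatialize_clip(bird, pos, (0.0, 0.0))
         for pos in [(3.0, 4.0), (-2.0, 6.0), (0.5, 10.0)]]
</syntaxhighlight>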

== Head Related Transfer Function ==
VR and video games use standardized HRTFs that roughly fit the average listener. The difference between the game's HRTF and the person's HRTF causes shifts in the perceived auditory environment and the locations of sound sources. A focus of current research is the difference between standard HRTFs, best-match HRTFs, and individualized HRTFs for subjects in VR games that utilize audio cues. David Poirier-Quinot and Brian F.G. Katz of Sorbonne Université assessed the impact of HRTFs on task performance. The experiment utilized a VR shooter game that depended on rapid sound source localization inside a 3D environment. Participants were tested with both worst-match and best-match standard HRTFs. There was no significant difference in performance between worst-match and best-match HRTFs in the majority of cases. However, there was a slight improvement in sound source localization (54% vs 44%) for elevated sources. The authors concluded that changing between different standard HRTFs has no effect on task performance in VR. However, more research is needed on alternate HRTFs to improve front/back and up/down auditory localization.

== Procedural Audio ==
Procedural Audio is real-time, nonlinear sound synthesis for a virtual environment. It generates a more realistic sound than standard sample-based audio. Procedural audio models the surrounding environment and alters the sound source in real time to create a realistic nonlinear sound effect.

A paper by Jake Ryan Rajjayabun Lee and Joshua D. Reiss of Queen Mary University of London demonstrated the feasibility of procedural audio by simulating audience applause. Commonly, audience applause is used in broadcast programs to increase the enthusiasm of a distant audience. However, this applause is a recording of an audience clapping, mixed into the broadcast as a sound effect. The recording suffers from repetition, a fixed recording position, fixed reverberation, and a fixed audience size. Real-time synthesis allows the audience applause to be created at the same time as the broadcast while simulating whatever physical environment is required. Different types of clapping can be simulated, and the size and acoustic properties of the room and the location of the listener can be changed in real time. Possible future applications include video games and VR, both of which seek to increase immersion by producing extremely realistic audio environments. Further research is needed to lower the computation requirements and to expand the range of available sound effects.
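
The sketch below is not the authors' model, only a minimal illustration of the idea: each clap is synthesized as a short decaying noise burst, and an audience is the sum of many independent clappers with randomized timing, so audience size and clapping rate become free parameters instead of properties baked into a recording.

<syntaxhighlight lang="python">
import numpy as np


def synth_clap(sample_rate: int = 48000, rng=None) -> np.ndarray:
    """One clap: a short burst of noise with an exponential decay."""
    rng = rng or np.random.default_rng()
    n = int(0.06 * sample_rate)  # ~60 ms per clap
    decay = np.exp(-np.arange(n) / (0.008 * sample_rate))
    return rng.standard_normal(n) * decay


def synth_applause(seconds: float = 3.0, clappers: int = 40,
                   rate_hz: float = 4.0, sample_rate: int = 48000,
                   seed: int = 0) -> np.ndarray:
    """Applause as many independent clappers with randomized timing.

    Because each clap is synthesized, the audience size (clappers)
    and enthusiasm (rate_hz) can change in real time instead of
    replaying a fixed recording.
    """
    rng = np.random.default_rng(seed)
    out = np.zeros(int(seconds * sample_rate))
    for _ in range(clappers):
        t = rng.exponential(1.0 / rate_hz)  # this clapper's first clap
        while t < seconds:
            clap = synth_clap(sample_rate, rng) * rng.uniform(0.3, 1.0)
            start = int(t * sample_rate)
            end = min(start + clap.size, out.size)
            out[start:end] += clap[: end - start]
            t += rng.exponential(1.0 / rate_hz)  # time of next clap
    return out / np.max(np.abs(out))  # normalize to avoid clipping
</syntaxhighlight>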

== Conclusions and Summary ==
Virtual Audio Environments have become common throughout the world. Current research is focused on creating a more immersive audio experience for the listener, transitioning from the “beeps” of the first video games to a realistic audio environment. Virtual audio is frequently combined with virtual visuals to completely immerse the user in a simulated world. The line between virtual audio and physical audio is also blurring as better HRTFs are applied to open-ear headphone designs and augmented reality is developed to include both audio and visual components.