The First Dawn

Practical investigation of spatio-compositional techniques for cine-VR scores

An exegesis submitted in partial fulfilment of the requirements for the degree of Master of Arts Screen: Music

Australian Film Television & Radio School

2022


Statement of Original Authorship

The work contained in this exegesis is my own original work, or the original work of my group, except where original sources have been appropriately cited using the AFTRS Citation and Referencing Guidelines. This exegesis has not previously been submitted for assessment elsewhere.

Elena Matienko
02.02.2022


Introduction

Research Aims

The intention of my research project is to investigate the potential of a spatial score for 360° immersive videos and to use it to advance the viewer’s experience by creating a direct correlation between the visual and aural elements of the surrounding space.

I developed the following core hypotheses:

  • Spatialised music will create a more engaging and impactful immersive experience since it can be perceived from all directions simultaneously.

  • It can draw the viewer’s attention to certain aspects of the scene in the absence of the framing provided in traditional 2D cinema.

From these hypotheses I established the following research question:

How can I compose music for a 360° immersive experience that, without the framing of a 2D or 180° presentation, allows me to highlight various aspects of the environment for the viewer?
Can I tie specific instrumental lines within the composition to certain elements of the environment, thus creating an aural motif to complement the visual imagery?

In my research I aim to test:

  • What instruments are appropriate for certain elements of the environment.

  • What instrumental lines and melodies can be written for those elements.

  • How audible those spatialised elements will be in the mixed composition.

  • Whether the spatial score will be more impactful than the stereo mix.

The purpose of this project is not to test and discuss different spatial mixing tools or the technical challenges associated with creating a 3D mix, but rather to investigate which conventional screen composer’s tools can be applied to scoring cine-VR.


Methodology and Methods

This creative practice research at various points involves all three kinds of research: research into practice, research through practice, and research for practice, but it is mainly research through art: it “centres on a ‘studio/creative project’ which results in the production and presentation of a body of ‘finished’ creative work, where, additionally, the documentation of what is done in the process of creating these works is taken as a significant component of the research.” (Dallow, 2003)

This exegesis is intended to critically reflect on the creation of my project; it is to be viewed in conjunction with my work The First Dawn.

The methods I used to complete this research are as follows:

First, I identified the most significant VR films and analysed how music and sound are used in them and what role they play in creating an immersive experience. I also identified other research projects that investigate spatio-musical compositional techniques and evaluated how they informed my own work. The films and my findings will be discussed in Chapter 2: Source Review.

Second, I interviewed people who created similar works and used the materials they published in relation to their projects.

Third, I diarised my own process and evaluated the work’s effectiveness through ongoing testing and experimentation and personal reflection on the project development.

The final stage was to show the work to several individual testers and document their feedback, thus allowing for further reflection and development.


The Final Project

The First Dawn is a 12-minute cine-VR experience based on the Māori legend of Ranginui and Papatūānuku and featuring the landscapes of New Zealand. This meditative experience incorporates five different shots of landscapes and is tied together with a voiceover telling the story, which guides the viewer to immerse themselves in the environment.

The aim of this project is to determine how different elements of the environment can be scored; whether placing them in a 3D-mix can be an effective tool to draw the viewer’s attention and substitute the framing of a single focal point; and whether applying spatial compositional techniques to a 360°-video can create a more meaningful audio-visual experience.
‘Spatial audio not only serves as a mechanism to complete the immersion but is also very effective as a UI element: audio cues can call attention to various focal points in a narrative or draw the user into watching specific parts of a 360° video.’ (Thakur, 2022)

Given that The First Dawn was deliberately created without specific focal points, such as action, the viewer will instead be encouraged to focus on the elemental aspects of the environment, such as the earth, mountains, and sky. This creates an additional challenge, as there is no visual cue for the viewer to fall back on; instead, the aural elements must be the sole driver of the experience.


Research Relevance

Cine-VR has been around since 2015. One of the first VR films, Henry (2015), was created by Oculus for display in its VR headset. With the arrival of affordable head-mounted displays, not only independent filmmakers but also large Hollywood studios are exploring the possibilities of this medium and investing in cine-VR productions. (“Best Virtual Reality Movies to Watch In 2021,” 2021)
During the global pandemic, cine-VR became ‘the ultimate panacea to sheltering in place, where people can gather without inhaling the same air in immersive spaces.’ (Kohn, 2020)
My personal favourite Australian works include Collisions (Wallworth, 2016), Awavena (Wallworth, 2018), Ecosphere (Purdam, 2020), and an augmented reality project, Rewild (Purdam, 2019).
While spatialized sound design has been used extensively and plays a significant role in 360° films, the music, whether bespoke or sourced, does not yet utilise all the capabilities of immersive sound. Some attempts at spatial mixing have been made; however, the results only begin to touch on the potential of this emerging medium.

The intention of this project is to explore the compositional techniques tailored specifically for cine-VR.



Key Terms

Below are the terms used in this paper.

Emotional impact
In relation to a musical experience, I define emotional impact as a feeling of awe and excitement from being immersed in a new aural space, whether listening to a stereo mix or a spatial composition.

Immersive experience
The goal of an immersive experience is to make the viewer feel that they are present within a believable environment.
‘Immersion is a psychological state characterized by perceiving oneself to be enveloped by, included in, and interacting with an environment that provides a continuous stream of stimuli and experiences.’ (Witmer & Singer, 1998)

Conventional score in the context of my research is a score for traditional 2D cinema, where the main purpose of the music is to create a dramatic effect, define the qualities of a character, or set a scene in a specific time or location.

Conventional scoring techniques are the screen composer’s tools used in traditional 2D cinema. They include, but are not limited to, melodies, themes, motifs, leitmotifs, and arrangements, as well as the fundamental decisions that are made prior to writing: the style of music, the instrument palette, and the purpose of the score in a particular scene or in the whole film.

Spatial music is music where the location and the movement of individual sounds is a primary featured element of the composition.

Musicality
A musical piece is a structured composition combining rhythm, harmony, and melody. I mention musicality because a significant portion of this work involved creating a musical experience by placing different sounds into a structured composition.

Atonal
Atonal sound, in this paper, is sound that is not limited to the Western 12-tone equal temperament scale. It may be microtonal, use non-Western scales, or have broad harmonic content with no definite pitch.


Exegesis Structure

This exegesis is divided into three parts.

In the source review, I will situate my project within the following overlapping contexts:

  • Discussion of significant VR films and how music was used in them.

  • Investigation into some earlier spatio-musical compositional techniques and how the approaches the composers used in their research projects informed my work.

In the main section, Project and Process, I will share my composition process for 360° videos and compare it to conventional scoring for 2D screen cinema. I will focus on the sound quality and some of the production techniques as well as the challenges I faced during the compositional process.
I will share my own reflection as well as the feedback from individual testers who watched the experience for the first time.

I will present my results and their evaluation in the Conclusion.

Source Review

Overview

In the development of The First Dawn, a 3D spatialized score to a 360° video, there were no established audience expectations or conventions that could serve as a basis for my workflow.

In developing my project and my research, I will use the guidelines of:

  • Spatial music composition techniques presented by other composers;

  • The sound design principles for immersive experiences;

  • The conventional scoring techniques;

  • The findings of AFTRS alumnus Mark Williams.


VR Experiences and Use of Sound and Music in Them

Awavena (Wallworth, 2018) and Collisions (Wallworth, 2016) are the two works that inspired my area of creative inquiry.

Captured with a first-order ambisonics microphone (Wallworth, 2016), the sound of Collisions takes the viewer on the journey of Indigenous elder Nyarri Morgan and the Martu people. The film utilises a stunning immersive soundscape to transport the viewer to the remote Western Australian desert.

Awavena takes the viewer into the depths of the Amazon jungle and the struggles of the Yawanawa tribe’s first female shaman. The incredible sound design accompanying the visual effects creates an unforgettable experience, one that made me want to watch the film many times.

Another immersive experience by Wallworth, Coral: Rekindling Venus (2012), was created for full-dome digital planetariums. It is a visual experience accompanied by a voluminous soundtrack composed mainly by Max Richter. I took the opportunity to watch this film at Wonderdome, Sydney; however, the venue’s echo-heavy acoustics made it impossible to hear whether any sound or music spatialization was used. This highlights the importance of delivering sound via a dedicated headset that can provide a unique soundscape to the viewer, without interference.

A later VR experience is Notes on Blindness (Colinart, La Burthe, Middleton, & Spinney, 2019). It is a linear 360° animation with some interactive elements, built around audio recordings by John Hull, who lost his vision. Driven by sound design, it creates striking visual effects and immerses the viewer in a world of vision through sound. The sounds are easy to locate, and even multiple voices sounding at the same time are clearly distinguishable.

There is very little score, and it is not spatialised but rather used as ambience.


Agency and Control

The notions of control and agency defined by Eric Williams, Carrie Love and Matt Love are the most significant aspects of cine-VR that provide a theoretical framework for my research.

In literature, theatre and film, the audience’s experience is fully controlled by staging, framing, lighting, editing and many other tools, whereas video games provide more agency to the player by giving them control over their actions, their gaze, and physical choices within a story. (Williams, Love, & Love, 2021)

A linear cine-VR experience does not provide the same level of agency as a game, nor does the outcome of the experience depend on the viewer’s actions. However, the authors argue that within those linear scenes the viewer still chooses to observe different things and therefore perceives different information, so every experience is unique.

The authors identify three forms of agency and control:

  • Directorial, whether the filmmakers chose to encourage the viewers to look in a certain direction or let them move freely and observe the surroundings;

  • Emotional, whether the storytellers guide the viewer to connect with a specific character, to make them feel a certain way;

  • Intellectual, whether the viewers can choose what details they pay attention to and in what order.

All three are interconnected. Using their directorial agency, for example, ‘the viewers can choose to spend their intellectual energies on certain details’.

In Chapter 2: Narrative Storytelling in Cine-VR, they break down two opposite approaches to how much of the 360° space the directors should leverage. 
The Newton style keeps all the relevant action within a 120° window, directly in front of the viewer. This configuration restricts the viewer’s directorial agency completely: the audience is not encouraged to turn 360°. Such work is designed to be viewed from a stationary chair and hence gives the director more control over the scene.
The Copernicus approach gives the viewer full directorial agency, inviting them to spin around and observe the surroundings freely. Such work is viewed from a swivel chair or standing up.

Working with landscape footage that has no action or characters present, and hence falls into the Copernicus category, I aim to test whether writing a spatial score to it can provide enough directorial control. 

Having said that, I can now rephrase my research question as follows:
In a cine-VR experience where the footage was obtained using the Copernicus approach, can spatialized music seize directorial agency from the viewer?

In the authors’ terms, the footage I work with was shot in the first person. This is not to be confused with the three POVs of literature, where first-person POV means the story is told by the main character.
In cine-VR the same concepts apply; however, the classification depends on whether the audience/camera is an observer or an active character interacting with others, and whether they can see their own body.
In the case of The First Dawn, the camera is the main character, Tane, the god of the forests and the birds. His body is not shown, and he is being addressed by the narrator, an invisible character, so it falls into the Griffin 1 category, the first-person perspective.


Immersive Sound: Practical Guidelines

Roginska and Geluso suggested the same principles for creating immersive audio as for more common stereo or surround mixes: ensuring an appropriate dynamic range, the right use of effects such as reverb, as well as the appropriate frequency range. They also emphasised the importance of composing around the dialogue: “First and foremost, dialogue is king. In most films, video games, or even in music, the human voice is the central story-telling element.” (Roginska & Geluso, 2017)

They noted the challenges associated with the lack of tools for immersive audio production, which was probably true when the book was first published. Today there are multiple different plugins for spatial mixing, many of which are available for free. Even so, I faced some challenges with exporting multi-channel media, which I will discuss in my mixing process.

Tom Smurdon, Audio Content Lead at Oculus, shared some techniques for creating a believable 3D soundscape and explained how sound design for VR differs from 2D films and games. He covered technical details of mixing, such as attenuation curves (which describe how a sound’s loudness falls off with distance from the listener), and many other tips for building a realistic experience. (Oculus, 2015)

The fundamental practical points that informed my workflow include using mono content for HRTF spatial mixing and reverb settings that recreate the ambience of open spaces. He recommended using two to four emitters around the viewer, each with a different loop (track), to create a sound that does not have a visible source, for example wind. Smurdon reiterated that high-frequency sounds are easier to localise than low-frequency content, and emphasised the importance of avoiding busy mixes to retain directorial control. For the voiceover, Smurdon recommended attaching an emitter to a visible sound source, such as a radio, or placing it in two sources above the listener, to avoid breaking the immersion.
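
To make the attenuation-curve idea concrete, the sketch below (in Python, with placeholder min_distance and rolloff values that are my own assumptions rather than Smurdon’s settings) shows how a simple inverse-distance gain curve behaves:

    import numpy as np

    def attenuation_gain(distance_m, min_distance=1.0, rolloff=1.0):
        """Inverse-distance attenuation: full level inside min_distance,
        then gain falls off as 1 / distance^rolloff (rolloff=1 is the
        physical inverse-distance law; higher values fall off faster)."""
        d = np.maximum(distance_m, min_distance)
        return (min_distance / d) ** rolloff

    # Example: how an emitter's level drops as it moves away from the listener
    for d in [0.5, 1, 2, 4, 8, 16]:
        gain = attenuation_gain(d)
        print(f"{d:5.1f} m -> {20 * np.log10(gain):6.1f} dB")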


Spatio-Compositional Works

In 2014 the composer Martin Jaroszewicz created a series of avant-garde works named Labyrinth (Jaroszewicz, 2021), using loudspeaker spatialization technology.

He argued that spatialization can be used as a compositional tool, incorporated into the score and notated alongside rhythm, harmony, and melody.

His experiments with different speaker layouts and placement of the speakers according to the shape of the room demonstrated that circular panning techniques provide a good perception of circular motion regardless of the system used, but more complex trajectories are difficult to follow. He showed that the orbital paths of four objects sounding at the same time can be followed. He also concluded that sound timbre and the appropriate frequency range are important for clear localization. This reinforces Smurdon’s claims that higher frequency sounds are easier for the human ear to locate spatially than low frequency.

Natasha Barrett, a composer specialising in acousmatic and live electroacoustic works and multi-media installations, identified some ‘spatial laws’ for creating an aural image that would be perceived as real:

  • The effect of sound transmission involves many parameters; however, the air absorption coefficient (a low-pass filter applied to the sound source) is the only practical spatio-compositional tool;

  • The properties of the reverberant field, i.e., the space where the sound is located that creates multiple reflections;

  • Sound object size and relationships between multiple objects.

Barrett, however, pointed out that an average listener can distinguish only five out of eight simultaneous static sounds; if the sounds are moved slowly, the number the listener can perceive increases. (Barrett, 2002)
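
As an illustration of the first of these ‘laws’, the sketch below models air absorption as a distance-dependent low-pass filter. It is my own simplification, and the cutoff anchor frequencies are assumed values chosen only to demonstrate the idea:

    import numpy as np
    from scipy.signal import butter, lfilter

    def air_absorption(signal, sample_rate, distance_m,
                       cutoff_at_1m=18000.0, cutoff_at_100m=3000.0):
        """Approximate air absorption as a low-pass filter whose cutoff
        slides down as the virtual source moves further away.
        The cutoff values are illustrative assumptions, not measured data."""
        # Interpolate the cutoff geometrically between the two anchor distances.
        d = np.clip(distance_m, 1.0, 100.0)
        t = np.log10(d) / np.log10(100.0)             # 0 at 1 m, 1 at 100 m
        cutoff = cutoff_at_1m * (cutoff_at_100m / cutoff_at_1m) ** t
        wn = min(cutoff / (sample_rate / 2), 0.99)    # keep below Nyquist
        b, a = butter(2, wn, btype="low")
        return lfilter(b, a, signal)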


Spatial Scores

Mark Williams is my direct precursor in spatialised scores, and my research responds to his final project, Heartscapes.
In his exegesis, he challenges traditional screen music conventions.
Unlike a conventional, immutable score, his music is presented as individual parts that move in 3D space.

Heartscapes is an interactive 3D experience developed using Unity 3D game engine; it is currently viewed on a 2D screen and controlled with a mouse and a keyboard. With active player engagement, Williams wanted to explore:

  • Whether explorable virtual spaces, i.e. those offering a high level of agency, create a more engaging experience;

  • Whether instrumental lines with visual objects attached to them heighten the feeling of immersion, or whether the visual orbs attached to the instrumental lines draw the viewer’s attention away from the music. (Williams, 2019)

He created an experience where the audience is not just a passive viewer of screen media with a finished music piece added to it. He created a space where the music and its parts are the main elements of that space: it is primarily a musical experience, with the visual component aiding it.
Williams concluded that active player engagement was a major contributor to the experience: ‘The impact is further reinforced, not diminished by active player control’ (Williams, 2019). By being able to explore the space, the audience could recompose the piece.
Heartscapes makes use of visual orbs attached to the instrumental lines, which aid the perception of the spatialized and trajectorized music without dominating the experience.

Williams also described a new way of composing music: ‘It requires new ways of thinking about musical structure, in composing works that, (a) are specifically designed to be pulled apart and explored, and (b) will work with direct visual counterparts’. He also emphasised that the direct impact of a sound ‘is dependent on how clearly it can be distinguished from other sounds, and from the reverberant ‘noise’ in the space’. This is a key tip for my own process that I did not take seriously enough during my first mixing attempt.

Whereas Williams’ work is a non-linear, interactive experience with visual objects attached to sound, I created a linear experience attempting to link musical lines to visual objects.

Another work by Mark Williams is Ascendance, a collaboration with the director Louise Harvey.
It is an abstract 360° animation with a spatialized score. It can be viewed through a VR headset or on a smartphone, which also applies HRTF rendering. (Harvey, 2021)

This project required an entirely different workflow. The piece of music was pre-written for sextet: string quartet, guitar, and clarinet. Williams created a very dynamic 3D mix. The animation was then created in response to the music, with Williams and Harvey constantly refining the work, to ensure clear correspondence between aural and visual elements. Williams then remixed the music to follow the main animated objects once the final visual sequence had been set. 
The composition was limited to only six instruments which reduced the complexity, allowing for more attention to detail. 

With spatial mixing, Williams created a close correspondence between musical events and their visual counterparts. 

The resulting piece used distinct voices with different articulations that could be presented individually, creating the effect desired by both creators. 

This work provided some guidelines for creating my own spatial score:

  • Not every instrument needs to be attached to a separate object representing it;

  • The audio becomes distracting if too many voices come from different directions, as their locations are hard to perceive.

To create a spatial score, Williams used a range of tools, including Reaper in conjunction with spatializing plugins such as Blue Ripple O3A for encoding/panning, reverb and EQ; the Audio Ease 360 pan suite; and decoder plugins O3A, FB and IEM (M. D. Williams, personal communication, 2021).


Project and Process

Early Project Development

The First Dawn was inspired by an immersive guided meditation: 360° videos of a forest and a river where a voice encourages the viewer to do simple mindfulness exercises such as focusing on their own body sensations, their breathing, and the surroundings.
This experience is to be viewed through a VR headset, with the viewer seated on a swivel chair, allowing them the freedom to rotate within their virtual environment and perceive the full 360° view.
Cameron Patrick and I spoke about the main points of difference between such an experience and traditional 2D cinema, chiefly the absence of framing.
When a viewer can move around freely and observe the environment, can the music aid the narration and gently guide their focus to certain visual elements?

For the development of my own project, I was fortunate to be offered an opportunity to work with footage provided by Phoria. This footage consisted of five different 360° shots of stunning New Zealand landscapes, featuring mountains, rivers, and trees.

Two authors from New Zealand, Courtney Harland and Gabriela Visini, constructed the narrative that would lead the viewer’s experience. It was inspired by the Māori legend of Ranginui and Papatūānuku and the landscapes provided. The narrative uses highly descriptive language, focusing the audience’s attention on the beautiful nature of New Zealand.
The team was joined by two Māori community representatives: Stacey Bird, who reviewed and refined the script, provided Te Reo translations, and recorded the voiceover; and Noe Soames, a musician, who recorded Taonga pūoro.

The First Dawn is a linear immersive experience designed for the Oculus headset. It combines a series of five stereoscopic 360° videos under one narrative arc. 

The script is written in a second-person narrative POV, guiding the viewer to experience the landscape through the eyes of the god Tane, moments after he forcibly separated the earth and the sky, thus letting in the first sunlight. A traditional karanga (call out) opens the piece, evoking a sense of groundedness in the viewer. Te Reo translations intersperse the narrative, immersing the viewer in Māori culture and mythology.


Recording

Due to the global pandemic, I had to hand over the crucial recording sessions to a studio in Wellington, where narrator Stacey Bird was located. 
It was important to me that the voiceover edit and the rough cut were completed before I attempted to write music for this experience, as the guiding narrative would be the only element of change or ‘action’ within the piece and would therefore set the overall pacing for the entire score.

Noe Soames, a Māori musician based in Wellington, came onboard at the narrative script development stage. He was not expected to read Western notation, nor do I have the ability or the privilege to write Māori music. Instead, we agreed that I would write a basic bed for him to improvise over. For this relaxing, meditative experience, the quality of the sound is far more important than technique, so we agreed he would focus on tone and play by ear.

I started writing the music using Pro Tools, by identifying the key hit points and mapping the tempo, just as I would do for traditional screen music.
There are not many hit points, since there is no action in the footage, but I chose to treat the image transitions as hit points, as they follow the narrative, to add dramatic effect. I also ensured every hit point fell on a downbeat for easy session management.
The project tempo was not crucial, as I was using long, sustained synths for the draft. I did not ask the talent to follow the tempo but rather to play freely; however, I chose to use a tempo map in case I needed tempo-based effects, such as delay, during mixing.
I drafted a synth layer for Soames to improvise to, keeping the music simple, allowing the viewer to focus on the narration, occasionally introducing key changes where appropriate. 

The music recording sessions were done remotely, in Sandbox Studio, Wellington. I could not visit and be a part of the process, due to travel restrictions, so instead, I wrote a brief to Lorenzo Buhne, the recording engineer, and Soames. 
Rather than telling them what instruments to play and how to play them, I described the elements of nature and the desired qualities of the sounds: I created multiple markers in the project identifying key moments where I would like Soames to improvise, allowing the musician to interpret them and use his authentic voice. The elemental guides are listed below:

  • Wind – gentle, sometimes heavy, howling

  • Water – rippling, calm, some movement

  • Waves – dangerous

  • Rocks – sustained, low, heavy

The main elements of nature are already in the script; however, there are others that I chose to add to create a more interesting soundscape and a more engaging immersive experience, for instance the sound of the wind.
There are elements both visible and audible in the footage, such as waves; visible only, such as sky, clouds, sun, rocks, trees, mountains; and audible only, such as wind.
Since my aim was not to create realistic sound design, which would only allow for the sounds of the wind and the waves, but rather to underscore the natural environment, I assigned sounds to all the elements above, following the narration. For silent elements such as the rocks, this creates a non-sounding spatial analogy, ‘a musical-spatial implication without a real-world sounding counterpart.’ (Barrett, 2002)

It proved challenging to communicate the variety of sounds required prior to the recording session: I explained to Soames that I did not expect him to play Western 12-tone music, but to be authentic and follow his intuition when imitating the sounds of nature. However, at the first recording session he soon realised that he could do a lot more with percussive instruments he did not have with him, so we immediately blocked another four hours the next day for him to experiment further.

Taonga pūoro used in the work

The sounds recorded by Soames and Buhne were surprisingly close to sound design: the atonal voice of the kōauau imitating wind; kōauau of different sizes and an ocarina played with a palm, sounding like water droplets.

Sounds of the wind

The sound of water

The amount of material recorded during the two sessions was tremendous. I consulted with Soames whether some of the sounds he created could be modified or repurposed, and he gave me full freedom to use them the way I see fit for my research. 

Ocarina: could be used as the sound of birds


Workflow

This compositional process is very different to anything I have written before, requiring both new techniques and thought processes.

For my traditional cinema scores, I use Logic Pro: I mark the hit points, map out the tempo, write a melody with the main instrument and then create the arrangement around it. With the rough demo, I have a clear idea of what the score will sound like and get approval from the director. I then export the MIDI tracks to Sibelius to prepare the music sheet for the musicians, create a recording session in Pro Tools and export all the digital instrument tracks to the session for the mixing.

I adopted a similar workflow when writing a score for my 360° visual poem, a collaboration with the cinematographer Hamish Gregory:
It is a short story told from the point of view of a main character who ends their life, unable to cope with the loss of their daughter.

Similarly to my research project, I wanted to test how different instruments could be grouped and assigned to certain elements of the environment to guide the viewer’s attention to them in the absence of framing.
The visual poem consists mostly of Newton footage; however, some of the landscapes could be treated as Copernicus shots: they are wide, and even though there is only one character drawing the viewer’s attention, I wanted to accompany her with an instrumental line to make her easier for the viewer to follow.

At first, I found it challenging to prioritise the different inputs from different collaborators, including myself: not only did I have to create a sound-alike composition, using a reference track as inspiration, but also write and arrange for a small orchestra, while keeping in mind that this is a 360° video where different instruments or groups of instruments would be placed in various locations, with their respective visuals.
After writing a few minutes of music, I realised I was writing a conventional 2D cinema score, focusing on the emotional journey, compositional tools and the sound-alike, without thinking about how the instruments would be allocated.
I had to start again. I first decided what elements were present in the scene, what characteristics they had, and what instruments and techniques would represent them best. By putting them into groups, I made the process of arranging a lot simpler, and the arrangement became a primary factor in the compositional process.

In the case of The First Dawn, this whole comfortable workflow had to be discarded.

I wrote a minimalistic synth line with very simple harmonies as a basis to write an arrangement around. I expected the synth would not end up in the final mix, since I could not see what it could represent. But I needed something solid to carry the viewer throughout the experience and hold all other pieces together. Having it as a stereo track ‘glued’ to the viewer’s ears would defeat the purpose of this whole research project. I decided to use it as an ambient sound and place it in two different spots above the listener.

Another important question was where to place the narration.
The narrator represents an invisible character speaking to Tane, the main character who is also the camera/the audience. Having it near or above the listener would be confusing and distracting from the spatialized music. Having it in the listener’s head or in their chest would make more sense: it would make them feel like an imaginary voice, or their own conscience, was speaking to Tane. 

Upon receiving the Pro Tools session after the second recording, I realised Soames had done the job too well: the sound library he created with percussive instruments was so realistic that it might lead this project into a different discipline, sound design. My intention shifted to finding a purpose for as many tracks as possible and seeing how they could be manipulated to create a more musical (to a Western ear) experience while remaining culturally authentic.


Extending the Edit

The initial voiceover edit needed more work before I could proceed to the music edit: Greg Wise came back to me with some suggestions on the dialogue editing before I started placing the voices in the timeline and reorganising them. He noticed that the pacing in the original edit was much denser between 01:00 and 02:00, in the second shot. I also felt that the first dialogue edit was not appropriate for the experience: there was not enough room for the music to evolve and for the viewer to let the words and the information from the audio-visual environment sink in.
After the Māori instruments were recorded, I returned to the dialogue edit. I started by extending the shots to ensure enough room between the lines, allowing enough time to perceive the information coming from the voiceover. I also added more space where I felt the spatial score could take over before bringing the viewer back to the story.

With the edit extended by two minutes, I now had to adjust the synth track that forms the foundation of the piece’s harmonic structure. Adjusting the tempo map and moving the old synth track into place would have been harder and taken much longer than simply rewriting it. So, I deleted all the hit point markers and re-marked the project against the updated video. I then remapped the tempo and re-recorded the same synth track to the newly edited voiceover.

Pro Tools session with the first and the second voiceover edits

The new voiceover edit gave the viewer enough time to immerse themselves in the story and thus better experience the environment. I still had doubts that viewers would have enough time to perceive all the audio and visual information, so I needed to be mindful of that and avoid creating a score or a mix that was too busy.
I also brought the dialogue from the Pro Tools session where it was originally edited into the music recording/editing session. This gave me more control over the pacing, especially in spots where the beginning of a phrase would fall on a downbeat, or where there was a chord change and new instruments were introduced.

Bringing all audio clips into one session


Composing

At the initial stages of composition, I focused on the different sounds that had been recorded and tried to place them into something musical. It was like trying to put together a puzzle of a thousand pieces without a picture to guide me. There were too many variables and too many different sounds recorded, and I struggled to choose what would work best, how to manipulate the sounds, and where exactly to place them. I got caught up in the audio editing process rather than in music composition.

I had a harmonic structure that aided building an arrangement, but the main breakthrough happened when I realised I did not have a theme. I needed a simple motif that I could remind the viewer of throughout the whole 12-minute composition and return to in the credits.
A two-note motif would make the composition solid and recognisable and make it a lot simpler to build an arrangement around.

Due to time limitations, I had to utilise MIDI and digital instruments. I first used them for the main elements of the environment for which I struggled to find ideal sounds; I plan to replace them later with authentic voices, newly designed sounds, or new live recordings.

Even though this composition will be mixed in ambisonics, I tried not to get in the way of the voiceover and instead built the textures around it. To avoid overwhelming the viewer with too much information, I brought in new textures gently, using the mod wheel on every digital instrument.


Sound Production

I struggled to keep my session organised. With so many pre-recorded tracks, the length of the composition, and the different visual elements to assign voices to, I ended up with too many tracks and multiple copies of them.
Colour coding and clearly labelling the tracks simplified the workflow, but because I did not have a clear idea of where and how to use most of the clips, it was still hard to find them and keep things in a logical order.

To create a more musical experience out of the atonal voices, I tried Morph, a piece of software that merges two different audio signals. My intention was to take the sounds of nature recorded with Soames and combine them with musical lines, to see if I could create a blend that would still be melodic. My initial tests were not successful; however, further experimentation with different sound combinations and more processing showed that unique textures appropriate for the work could be created.
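
Morph’s algorithm is proprietary, so the sketch below is only a naive illustration of the general idea of blending two sources in the spectral domain: it interpolates the magnitude spectra of a nature recording and a melodic line while keeping the melodic phase, so the result stays roughly pitched. The mix and nperseg values are placeholder assumptions.

    import numpy as np
    from scipy.signal import stft, istft

    def spectral_blend(nature, melodic, sample_rate, mix=0.5, nperseg=2048):
        """Naive spectral blend: interpolate the magnitude spectra of two
        signals and keep the phase of the melodic source, so the result
        inherits the nature sound's texture but stays roughly pitched."""
        n = min(len(nature), len(melodic))
        _, _, A = stft(nature[:n], fs=sample_rate, nperseg=nperseg)
        _, _, B = stft(melodic[:n], fs=sample_rate, nperseg=nperseg)
        mag = (1.0 - mix) * np.abs(A) + mix * np.abs(B)
        phase = np.angle(B)
        _, blended = istft(mag * np.exp(1j * phase), fs=sample_rate,
                           nperseg=nperseg)
        return blended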

Merging the sound of stones with a cello sample

This led to further success with placing the Māori instruments into the mix, along with some fine-tuning and equalisation.
I used some of the recordings as percussive elements.

Sound of paper being torn before and after processing:

With simple editing and processing, the sound of the paper served as the sound of waves in the composition.

I did not attempt to sample the instruments to speed up the process for two reasons: Firstly, in the interest of cultural sensitivity, and, secondly, to retain the unique texture and a human touch in the sounds, rather than creating multiple copies of them.

The best way I found to process the sustained atonal sounds, so that they became more musical and blended with the composition, was as follows (a rough code sketch of these steps appears after the list):

  • Adjust the pitch of the original atonal sound to match more closely the harmony of the composition;

  • Boost the fundamental and second harmonics and filter out other nearby frequencies that make the voice sound ‘out of tune’.
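
A rough sketch of these two steps is shown below; the Q values, boost amounts, and the choice of which neighbouring partial to notch are hypothetical, used only to illustrate the approach rather than to reproduce the exact settings from the session.

    import numpy as np
    import librosa
    from scipy.signal import iirpeak, iirnotch, lfilter

    def tune_and_emphasise(y, sr, cents_offset, fundamental_hz):
        """Step 1: nudge the clip's pitch by a given number of cents so it
        sits closer to the harmony of the cue.
        Step 2: emphasise the fundamental and second harmonic with narrow
        parallel boosts, and notch out a nearby 'out of tune' partial."""
        # Step 1: pitch adjustment (100 cents = 1 semitone).
        y = librosa.effects.pitch_shift(y, sr=sr, n_steps=cents_offset / 100.0)

        # Step 2: parallel peak boosts at the fundamental and 2nd harmonic.
        out = np.copy(y)
        for freq, boost in [(fundamental_hz, 0.5), (2 * fundamental_hz, 0.3)]:
            b, a = iirpeak(freq, Q=30.0, fs=sr)    # narrow band-pass
            out += boost * lfilter(b, a, y)        # add it back in parallel

        # Notch an assumed clashing partial a semitone above the fundamental.
        b, a = iirnotch(fundamental_hz * 2 ** (1 / 12), Q=30.0, fs=sr)
        return lfilter(b, a, out)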

The sound of Kōauau representing Tāwhirimātea: before and after processing

Another point of difference between this production and the production of a traditional score is that I needed to keep the spatial mixing in mind:
If the position of a sound in the 3D environment changes during a transition, I needed to split its track in two to allow smooth crossfading between the two shots.
A good example of it is the position of the sun in the two final shots:

In the first shot, there is some sunlight on the horizon, but in the second, final shot the sun is above the viewer, so the sound has to disappear at the horizon line and reappear overhead.
I split the MIDI track at the point of transition and bounced (in Pro Tools terms, committed) the two parts separately.
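
The underlying maths of that split is a simple equal-power crossfade between the two bounced positions; the sketch below illustrates it (the fade length is an arbitrary placeholder, and the actual fades were drawn as automation in the DAW rather than in code).

    import numpy as np

    def equal_power_crossfade(horizon_stem, overhead_stem, sr, fade_seconds=4.0):
        """Fade the horizon-positioned bounce out and the overhead-positioned
        bounce in with equal-power (cos/sin) gains, so the combined level
        stays constant through the shot transition."""
        n = min(len(horizon_stem), len(overhead_stem))
        fade_len = min(int(fade_seconds * sr), n)
        t = np.linspace(0.0, np.pi / 2, fade_len)
        out = np.copy(horizon_stem[:n])
        out[:fade_len] = (horizon_stem[:fade_len] * np.cos(t)   # fade out
                          + overhead_stem[:fade_len] * np.sin(t))  # fade in
        out[fade_len:] = overhead_stem[fade_len:n]
        return out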

To create a spatial mix in Reaper, the stems need to be exported dry and in mono. I tried to complete all audio manipulations in Pro Tools before moving to the less familiar interface of Reaper.


Mixing

For mixing, I used the Reaper DAW, which allows 64-channel processing (7th-order Ambisonics), with SSA spatializing plugins and FdnReverb by IEM.
Oculus, however, supports up to 9-channel audio (2nd-order Ambisonics).
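The channel counts follow from the ambisonic order N, which needs (N + 1)² channels, as the short check below confirms:

    # Ambisonic order N needs (N + 1)^2 channels.
    for order in (1, 2, 3, 7):
        print(f"order {order}: {(order + 1) ** 2} channels")
    # order 1: 4, order 2: 9 (the Oculus limit), order 7: 64 (the Reaper mix bus)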

Unfortunately, as of today, small studio setups do not allow mixing while watching 360° content inside the DAW. Some spatializing plugins, such as O3A Core or the SSA Plugins, allow head tracking to monitor the audio spatialization; however, the 360° videos are viewed unwrapped on a 2D screen.

The same image via a headset and on a computer screen

With my current setup, I mixed by listening to a static binaural render through stereo headphones, using the SSA Monitor to decode the ambisonic sound to binaural. This makes it difficult to create a direct correlation between the visual and audio objects in 3D space and requires a lot of testing with a VR headset and manual repositioning. The Facebook VR Video Player allows real-time video playback on a remote computer; however, I have not yet had the chance to explore its functionality.

I started my mixing process by panning the static sounds: the ambience, the ground, and the wind. I decided to keep them in one position regardless of the shot, to ground the viewer in the environment.
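
Under the hood, panning a mono emitter into the ambisonic mix amounts to weighting it by direction-dependent gains. The sketch below shows a generic first-order AmbiX (ACN/SN3D) encode; it is not the internals of the SSA plugins, and the emitter angles in the comment are placeholders:

    import numpy as np

    def encode_foa(mono, azimuth_deg, elevation_deg):
        """Encode a mono signal to first-order ambisonics (AmbiX: W, Y, Z, X
        channel order, SN3D normalisation) at a fixed direction."""
        az = np.radians(azimuth_deg)    # 0 deg = front, positive = left
        el = np.radians(elevation_deg)  # positive = up
        w = mono                        # omnidirectional component
        y = mono * np.sin(az) * np.cos(el)
        z = mono * np.sin(el)
        x = mono * np.cos(az) * np.cos(el)
        return np.stack([w, y, z, x])   # shape: (4, num_samples)

    # e.g. three wind emitters spread above the listener (placeholder angles):
    # mix = sum(encode_foa(loop, az, 60.0) for loop, az in zip(loops, (0, 120, -120)))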

Panning three mono wind emitters above the listener

I then positioned the sounds of water, clouds, sunlight, and the trees.

The audio spatializing plugins do not take into account the distance to the objects. A sense of distance can instead be created by adjusting levels using the attenuation curves explained by Smurdon, by the amount of reverb, and by equalisation, which is outside the scope of this research project.

Finding the right tools to encode the video for playback on Oculus was another challenge. Reaper’s encoding significantly increased the file size, and the video did not play back on the Oculus headset. After trying multiple output settings, I managed to get the Audio 360 Encoder by Facebook to export a file that would play on my headset. The struggle associated with my lack of technical knowledge created more time pressure and shifted my focus from creating an impactful mix to exporting the files in the right format.

The first mix fell short of my expectations. The high-register content was lost completely; I managed to localise some of the low-frequency sounds positioned near the listener, but I could not locate or even hear the sounds of the water, the sun, and the mountains.
I needed to significantly boost the levels of all the elevated sounds as well as reduce the amount of reverb used.


Testing

I presented The First Dawn to individual testers: first I showed them the HD version with a stereo mix, displayed on a widescreen, then the 360° version via an Oculus headset.

Even in the 2D version, they enjoyed the sound and the dramatic build-up. It filled the space between the phrases perfectly and was stylistically and culturally appropriate.

Watching a 360°-video via a VR headset, however, they were more impressed by the visuals.
Overall, they felt immersed in the experience and noted that the music aided the sense of envelopment but did not distract from the story or from the visuals.
They could locate some of the instrumental lines positioned left and right; however, panning up and down was not audible. This could be due to a few reasons. First, the mix is quite busy: throughout the whole piece there are at least seven sound sources present at the same time. Second, the complex textures I used in the composition, with their rich frequency range, could also be hard to locate. Third, my lack of technical knowledge regarding encoding requirements for different platforms. And fourth, the first mix was tested on an Oculus headset through the built-in speakers, which are positioned above the ears; testing with headphones plugged into the headset could provide better sound localisation.

By significantly lowering the levels of the ambient sounds and the reverb, and by applying extreme automation to the individual voices that needed to be distinguished from the texture, I managed to achieve easier localisation of the instrumental lines and some directorial control. Nonetheless, the mix will need more refining and testing once the musical composition is complete.


Conclusion

In order for the score to take directorial agency from the viewer completely, the spatial mix needs more refining, with a focus on clearly localised individual sounds emerging from the overall immersive texture.

The initial testing with the first-time viewers illustrated that The First Dawn is predominantly a visual experience with the score aiding that experience and strengthening the emotional impact of the story.

Regardless of the 3D spatialization, I achieved a meaningful, immersive score for The First Dawn using conventional scoring techniques, including an appropriate choice of instrument palette, harmonic structure, and evolving textures.

The technical challenges associated with different file formats and new software, and the more complicated workflow caused by mixing and viewing on different displays without a real-time connection, make spatial scores less appealing for composers without strong technical knowledge of sound production, such as myself.

The First Dawn is my passion project, one I intend to continue working on once other pressures are over. I will complete the arrangement with live instruments, hone the mix, add special effects and graphics, and colour grade the video.

My aim is to strengthen my knowledge of sound production techniques for cine-VR and to learn tools for creating sound design in non-linear, interactive immersive experiences, such as Wwise, Unreal Engine, and Unity 3D.
I would also like to try applying the same spatio-scoring techniques to more narrative-driven 360° films with active characters and changing scenery, wherein it is even more imperative that the audio follow the visual trajectory of the moving elements.
My long-term ambition is to inspire immersive scores utilising HRTF for cine-VR projects rather than conventional stereo mixes ‘glued’ to the viewer’s head.


Reference List

Barrett, N. (2002). Spatio-musical composition strategies. Organised Sound, 7(3), 313–323.

Colinart, A., La Burthe, A., Middleton, P., & Spinney, J. (2019). Notes on Blindness [Film]. Retrieved from https://www.oculus.com/experiences/quest/1946326588770583/

Dallow, P. (2003). Representing creativeness: Practice-based approaches to research in creative arts. Art, Design & Communication in Higher Education, 2(1), 49–66. https://doi.org/10.1386/adch.2.1.49/0

Harvey, L. (2020). Ascendance [Video]. Retrieved from https://www.youtube.com/watch?v=vzHKmjpGtbA

Harvey, L. (2021, April 3). Ascendance. Retrieved October 14, 2021, from Ascendance website: https://louiseharvey.com.au/welcome/ascendance/

Jaroszewicz, M. (2015). Compositional Strategies in Spectral Spatialization [PhD dissertation, University of California]. University of California, Riverside, California, United States. Retrieved from https://escholarship.org/content/qt7wm025g7/qt7wm025g7.pdf

Jaroszewicz, M. (2021). List of Works. Retrieved February 10, 2022, from Martin Jaroszewicz website: https://www.martinjaroszewicz.com/works.html

Kohn, E. (2020, June 7). VR Can Be the Film Industry’s Future, but the Barriers to Entry Are Surreal. Retrieved February 10, 2022, from IndieWire website: https://www.indiewire.com/2020/06/virtual-reality-could-save-film-industry-1202234384/

Newnham, N. (2020). Awavena Press Kit. Retrieved May 5, 2021, from https://www.awavenavr.com/ website: http://www.awavenavr.com/assets/downloads/AWAVENA%20PRESS%20KIT%20Web%20Version.pdf

Prokhorov, M. (2021, January 25). Best Virtual Reality Movies to Watch In 2021. Retrieved February 10, 2022, from Sensorium website: https://sensoriumxr.com/articles/best-virtual-reality-movies

Purdam, J. (2019). Rewild [Mobile application]. Retrieved from https://play.google.com/store/apps/details?id=com.PHORIA.RewildAtHome&hl=en_AU&gl=US

Purdam, J. (2020). Ecosphere [Documentary]. Phoria. Retrieved from https://www.oculus.com/experiences/quest/2926036530794417

Roginska, A., & Geluso, P. (2017). Immersive Sound: The Art and Science of Binaural and Multi-Channel Audio (1st ed.). Milton, UNITED KINGDOM: Taylor & Francis Group. Retrieved from http://ebookcentral.proquest.com/lib/aftrs/detail.action?docID=4913229

Thakur, A. (2022). Spatial Audio for Cinematic VR and 360 Videos. Retrieved February 10, 2022, from Oculus Creators website: https://creator.oculus.com/learn/spatial-audio/

Wallworth, L. (2012). Coral Rekindling Venus: A major work for fulldome digital planetariums. Retrieved February 10, 2022, from Coral Rekindling Venus website: https://coralrekindlingvenus.com/media/press-kit/

Wallworth, L. (2016a). Collisions [Film].

Wallworth, L. (2016b). The Story Behind Collisions [Video]. Retrieved from https://www.youtube.com/watch?v=-NZHLtmNi_s

Wallworth, L. (2018). Awavena [Film]. Retrieved from https://www.viveport.com/6792ef3d-0775-4ab4-b3d3-3d9c15b64d47

Williams, E. R., Love, C., & Love, M. (2021). Virtual Reality Cinema: Narrative Tips and Techniques. Milton, UNITED KINGDOM: Taylor & Francis Group. Retrieved from http://ebookcentral.proquest.com/lib/aftrs/detail.action?docID=6457825

Williams, M. D. (2019). Heartscapes: A practical investigation of 3D spatialised and trajectorised music in an explorable virtual space [Exegesis, Australian Film Television and Radio School]. Australian Film Television and Radio School, Sydney, Australia. Retrieved from https://libguides.aftrs.edu.au/ld.php?content_id=48515183

Witmer, B. G., & Singer, M. J. (1998). Measuring Presence in Virtual Environments: A Presence Questionnaire. Presence: Teleoperators and Virtual Environments, 7(3), 225–240. https://doi.org/10.1162/105474698565686

Oculus. (2015). Oculus Connect 2: 3D Audio: Designing Sounds for VR [Video]. Retrieved from https://www.youtube.com/watch?v=IAwFN9sFcso