Web Audio: An Introductory Tutorial

The WebAudio API is surprisingly powerful and easy to use. It is useful for everything from music playback and sound effects to heavy real-time audio processing and audio visualizations.

In this introductory tutorial, I am going to show you the basics of how the API works. We will generate sound programmatically, play with some audio processing, and, finally, load sound files using two different methods.

A Word of Warning

When developing any sort of audio software, it is very common for bugs to produce glitches or other ear-piercing tones at unpredictably high volume. I would not use headphones until everything is working solidly, and I recommend keeping your speakers at a low volume.


WebAudio API Fundamentals

The WebAudio API operates as a graph of AudioNodes. Each node is either a source of sound, a processor of sound, or a destination for sound (e.g. the speakers or a file). The output of one node can be connected as an input to any other node, and nodes can be chained together to create elaborate audio effects.

For instance, you could have an OscillatorNode as the source creating a sine wave, feeding into a ConvolverNode that applies a reverb effect, then into a GainNode that controls the volume, and finally into the destination node that outputs to the speakers.
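
For reference, here is a minimal sketch of how that chain could be wired up. Each of these calls is explained over the rest of this tutorial, and the ConvolverNode also needs an impulse response buffer (covered in the audio processing section) before it produces any output.

// Sketch only: wire up Oscillator -> Convolver -> Gain -> Destination.
var ctx = new (window.AudioContext || window.webkitAudioContext)();

var osc = ctx.createOscillator();    // source node (sine wave by default)
var reverb = ctx.createConvolver();  // effect: needs reverb.buffer to be set
var volume = ctx.createGain();       // effect: volume control

osc.connect(reverb);                 // Oscillator -> Convolver
reverb.connect(volume);              // Convolver  -> Gain
volume.connect(ctx.destination);     // Gain       -> speakers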

Source Nodes:

  • OscillatorNode - Generates a periodic waveform (sine, square, sawtooth or triangle) at a given frequency.
  • AudioBufferSourceNode - Plays an in-memory audio buffer, either created programmatically or loaded from a file.
  • MediaElementAudioSourceNode - Uses the sound from an <audio> or <video> HTML element.
  • MediaStreamAudioSourceNode - Uses sound from the microphone or a WebRTC stream.

Audio Effect Nodes:

  • BiquadFilterNode - Filters the sound in many different ways, for example, a lowpass or highpass filter.
  • ConvolverNode - Performs a linear convolution on the audio to create a reverb effect, for instance.
  • DelayNode - Delays the output by a set time.
  • DynamicsCompressorNode - Compresses the volume to limit the dynamic range of the audio.
  • GainNode - Changes the volume of the audio.
  • IIRFilterNode - A general infinite impulse response filter. Could be used as a graphic equalizer, for instance.
  • WaveShaperNode - Uses a curve to distort the sound.
  • StereoPannerNode - Pans the audio between the left and right channels.

Other Nodes:

  • AnalyserNode - Uses an FFT to analyse the sound without altering it.
  • ChannelMergerNode - Combines several mono inputs into a single multi-channel output, for example, two mono channels into one stereo channel.
  • ChannelSplitterNode - Splits a multi-channel signal into individual mono outputs, allowing the gain of each channel to be changed independently, for example (see the sketch after this list).
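
As a small example of the last two, here is a sketch of adjusting the left and right channels of a stereo signal independently. It assumes an AudioContext called audioCtx (we create one in the next section) and an existing stereo source node called source.

// Sketch: split a stereo source, change each channel's gain, and
// merge the channels back together before sending them to the speakers.
var splitter = audioCtx.createChannelSplitter(2);
var merger = audioCtx.createChannelMerger(2);
var leftGain = audioCtx.createGain();
var rightGain = audioCtx.createGain();
leftGain.gain.value = 0.5;        // halve the left channel
rightGain.gain.value = 1.0;       // leave the right channel unchanged

source.connect(splitter);
splitter.connect(leftGain, 0);    // splitter output 0 = left channel
splitter.connect(rightGain, 1);   // splitter output 1 = right channel
leftGain.connect(merger, 0, 0);   // into merger input 0 (left)
rightGain.connect(merger, 0, 1);  // into merger input 1 (right)
merger.connect(audioCtx.destination);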

As you can see, there is plenty to work with for creating and manipulating audio. Let's get started by playing some basic generated sounds.


Getting Started

The WebAudio API is built into the browser, so there are no libraries to download or include in your HTML.

A simple way to start is by playing a generated tone. We will use an OscillatorNode to create a simple square wave and a GainNode to control the volume. Make sure to use that GainNode or your generated sounds will be incredibly loud.

First of all, add this button to an HTML page. We will fill in the script tag with the following code.

<button onclick="playSquare();">Play Square Wave</button>
<script></script>

We need to get the AudioContext from the browser. This is the entry point to the WebAudio API. From this context, we create the AudioNodes that we want to use, in this case an OscillatorNode and a GainNode. The audio graph always has to start with a source node, followed by the processing nodes and ending with a destination node. The most basic graph will look like this: Oscillator -> Gain -> Destination. Note that browsers keep a newly created AudioContext suspended until the user interacts with the page, so we also resume the context inside the button's click handler.

To actually play a sound, we need to tell the source to start making noise. The WebAudio API has a built-in timing system, so we can schedule sounds to start and stop at any time in the future, relative to when we call the function. In this case, we tell the oscillator to start producing sound immediately and to stop one second later.

// Get the AudioContext
var AudioContext = window.AudioContext || window.webkitAudioContext;
var audioCtx = new AudioContext();

function playSquare() {
    // Browsers keep the AudioContext suspended until a user gesture,
    // so resume it from inside this click handler.
    if (audioCtx.state === 'suspended') {
        audioCtx.resume();
    }

    // Create the GainNode to control the volume.
    var gainNode = audioCtx.createGain();
    gainNode.gain.value = 0.2;

    // Create the OscillatorNode setting its waveform and frequency.
    var osc = audioCtx.createOscillator();
    osc.type = 'square';
    osc.frequency.value = 440;

    // Connect the nodes together.
    osc.connect(gainNode).connect(audioCtx.destination);

    // Play a square wave for 1 second.
    osc.start(audioCtx.currentTime);
    osc.stop(audioCtx.currentTime + 1);
}


Altering Parameters

We can do a lot of cool things with the timing feature of WebAudio, far more than just controlling start and stop times. You can control most parameters of any audio node over time using the timing functions.

For instance, we can control the frequency of the oscillator to create a vibrato effect, and the gain of the gain node to fade in and out. Notice that in the example above, we didn't set the frequency directly but through frequency.value. That's because these parameters are actually objects of type AudioParam.

Each AudioParam object contains an event queue. If you set the value directly, as we did above, the event queue is bypassed. To add to the queue, you call setValueAtTime(), linearRampToValueAtTime() or exponentialRampToValueAtTime(). For each of these events, the starting value is the ending value of the previous event in the queue, so you can set up a long list of changes very naturally.
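
For example, here is a minimal sketch of queuing an exponential fade on a hypothetical gain node called fadeGain. Note that exponentialRampToValueAtTime() cannot ramp to or from a value of exactly zero, so a very small value is used instead.

// Sketch: jump to a near-zero value, ramp up quickly, then fade out
// exponentially over two seconds.
var fadeGain = audioCtx.createGain();
fadeGain.gain.setValueAtTime(0.001, audioCtx.currentTime);
fadeGain.gain.exponentialRampToValueAtTime(0.2, audioCtx.currentTime + 0.1);
fadeGain.gain.exponentialRampToValueAtTime(0.001, audioCtx.currentTime + 2);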

We are going to set the gain node to fade in and fade out using the setValueAtTime() and linearRampToValueAtTime() functions. At the same time, we will have the oscillator frequency ramp higher and lower to create the vibrato effect.

function playVibrato() {
    // Set the Gain to fade in and fade out.
    var gainNode = audioCtx.createGain();
    gainNode.gain.setValueAtTime(0, audioCtx.currentTime);
    gainNode.gain.linearRampToValueAtTime(0.2, audioCtx.currentTime+0.75);
    gainNode.gain.setValueAtTime(0.2, audioCtx.currentTime+1.25);
    gainNode.gain.linearRampToValueAtTime(0.0, audioCtx.currentTime+2);

    // Create the OscillatorNode setting its waveform
    var osc = audioCtx.createOscillator();
    osc.type = 'square';

    // Set the frequency to change up and down by this amount.
    var vibratoDelta = 5;

    osc.frequency.setValueAtTime(440, audioCtx.currentTime);
    // Ramp up and down every 1/8th of a second, starting after the
    // initial frequency has been set.
    for(var i=0.125; i<=2; i+= 0.125) {
        osc.frequency.linearRampToValueAtTime(440 + vibratoDelta, audioCtx.currentTime+i);
        vibratoDelta *= -1;
    }

    // Connect the nodes together.
    osc.connect(gainNode).connect(audioCtx.destination);

    // Play for two seconds.
    osc.start(audioCtx.currentTime);
    osc.stop(audioCtx.currentTime + 2);
}


Audio Processing

A slightly more complicated node is the ConvolverNode. Convolver nodes take an impulse response, stored in an audio buffer, and convolve it with the audio that passes through them. This allows the creation of a reverb effect, for example.

We have an impulse response file that stores the impulse data as a base64-encoded string in a variable called "impulseResponse". We will have to decode this and pass it to the ConvolverNode before we can use it in the audio graph.

We will add this impulseResponse.js file into our HTML to get access to the data.

<script src="impulseResponse.js"></script>

We need to convert the base64 string into an ArrayBuffer of raw bytes using this function:

function base64ToArrayBuffer(base64) {
    var binaryString = window.atob(base64);
    var len = binaryString.length;
    var bytes = new Uint8Array(len);
    for (var i = 0; i < len; i++) {
        bytes[i] = binaryString.charCodeAt(i);
    }
    return bytes.buffer;
}

The raw bytes then have to be decoded into an AudioBuffer that the ConvolverNode can use. This is handled by the audio context's decodeAudioData() function. Because decoding is asynchronous, we do it once up front rather than every time the sound is played; on successful decoding, we pass the AudioBuffer to the convolver.

// 1. Create the Convolver.
// 2. Decode the impulse response from base64 into raw bytes.
// 3. Decode the byte stream into an AudioBuffer (asynchronous, so it is
//    done once up front rather than inside playReverb()).
// 4. On success, pass the decoded audio buffer to the reverbNode.
var reverbNode = audioCtx.createConvolver();
var byteStream = base64ToArrayBuffer(impulseResponse);
audioCtx.decodeAudioData(byteStream,
    function(audioBuffer) {
        reverbNode.buffer = audioBuffer;
    },
    function(e) {
        console.log("Error decoding audio data: " + e);
    });

function playReverb() {
    // Set the Gain to fade in and fade out.
    var gainNode = audioCtx.createGain();
    gainNode.gain.setValueAtTime(0, audioCtx.currentTime);
    gainNode.gain.linearRampToValueAtTime(0.2, audioCtx.currentTime+0.75);
    gainNode.gain.setValueAtTime(0.2, audioCtx.currentTime+1.25);
    gainNode.gain.linearRampToValueAtTime(0.0, audioCtx.currentTime+2);

    // Create the OscillatorNode setting its waveform
    var osc = audioCtx.createOscillator();
    osc.type = 'square';

    // 5. Add the reverbNode into the audio graph.
    osc.connect(gainNode).connect(reverbNode).connect(audioCtx.destination);

    // Set the frequency to change up and down by this amount.
    var vibratoDelta = 5;

    osc.frequency.setValueAtTime(440, audioCtx.currentTime);
    // Ramp up and down every 1/8th of a second, starting after the
    // initial frequency has been set.
    for(var i=0.125; i<=2; i+= 0.125) {
        osc.frequency.linearRampToValueAtTime(440 + vibratoDelta, audioCtx.currentTime+i);
        vibratoDelta *= -1;
    }

    // Play for two seconds.
    osc.start(audioCtx.currentTime);
    osc.stop(audioCtx.currentTime + 2);
}

In the introduction to this post, I listed all the audio effect nodes that are available. You can combine these nodes in an infinite number of ways. Nodes are not limited to connecting to a single node; they are free to connect to any number of nodes, including in feedback loops, to create all sorts of effects. Finding combinations that sound good and do what you want takes a lot of experimentation, much like it would in a recording studio.
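
As a small illustration of a feedback loop, here is a sketch of a simple echo effect built from a DelayNode and a GainNode, assuming an existing source node called source:

// Sketch: a feedback echo. The delayed signal is routed back into the
// delay through a gain below 1, so each repeat is quieter than the last.
var delay = audioCtx.createDelay();
delay.delayTime.value = 0.3;          // 300 ms between echoes

var feedback = audioCtx.createGain();
feedback.gain.value = 0.4;            // each echo at 40% of the previous one

source.connect(audioCtx.destination); // dry signal straight to the speakers
source.connect(delay);                // also feed the source into the delay
delay.connect(feedback);
feedback.connect(delay);              // the feedback loop
delay.connect(audioCtx.destination);  // delayed (wet) signal to the speakers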


Loading Audio Files

So far, we have been using a synthesized sound source, the OscillatorNode. More often than not, however, you already have a sound source in the form of an audio file, whether it be an MP3, AAC, WAV or other format. Using the WebAudio API, it is very easy to use these as source nodes.

If you have a long audio file, typically background music of some sort, you will use an HTML <audio> or <video> element as the source of the audio via a MediaElementAudioSourceNode. Short audio files, typically sound effects, can be loaded directly into memory and played with an AudioBufferSourceNode.


HTML Element

Let's start by using an HTML element as the source for the audio. The main difference compared to the OscillatorNode is how playback is controlled. In this case, the HTML element itself controls the playback, and the MediaElementAudioSourceNode simply acts as a conduit to get the audio into the WebAudio API.

As a consequence, you don't get the WebAudio timing features for starting and stopping the sound. On the other hand, you can leave the HTML element's UI on screen, letting the user control playback as they normally would, while the sound still runs through your processing chain.

For this example, we will load crickets.ogg as an HTML <audio> element. Sound by RHumphries (https://freesound.org/people/RHumphries/).

In this case, we will hide the audio controls and implement our own to demonstrate how that works. In the HTML file, we need to add the <audio> element with our audio file.

<button onclick="playAudioElement();">Play / Pause Audio Element</button>

<audio>
    <source src="crickets.ogg" type="audio/ogg">
    Your browser does not support the audio element.
</audio>

In the JavaScript, we get a reference to this element from the DOM and pass it to the audio context to create a MediaElementAudioSourceNode. From the WebAudio API's perspective, it is just another audio signal. You can take any audio that will play in the HTML element and completely alter the sound by passing it through as many audio nodes as you want. This is a very powerful feature.

var audioElem = document.querySelector('audio');

var elemGainNode = audioCtx.createGain();
elemGainNode.gain.setValueAtTime(1, audioCtx.currentTime);

var elemSource = audioCtx.createMediaElementSource(audioElem);
elemSource.connect(elemGainNode).connect(audioCtx.destination);

To start and stop the audio, we simply call play() and pause() on the audio element itself.

function playAudioElement() {
    if( audioElem.paused ) {
        audioElem.play();
    }
    else {
        audioElem.pause();
    }
}


Sound Effects

For shorter audio clips, it's easier to load them directly into memory for playback. You have to download the files using an XMLHttpRequest, so it's a little more complicated to set up in code. After that, the WebAudio API usage is exactly the same.

We will use dog-barks.ogg as the audio to load. Sound by saphe (https://freesound.org/people/saphe/).

We use an XMLHttpRequest to get the sound file from the server. The XHR response byte stream is passed to the audio context's decodeAudioData() function, and the resulting audio buffer is then handed to an AudioBufferSourceNode. This node is one-time use only: you create it, put it in the audio graph and start it, and you have to do that every time you want to play the sound effect.

You only load the audio buffer once, but you create a new AudioBufferSourceNode every time you play the sound effect, which is a trivial operation. There is effectively no limit as to how many sounds you can play at the same time.

// Stored sound effect audio buffer.
var soundEffectBuffer;

// Load the sound effect file using an XHR.
var xhr = new XMLHttpRequest();
xhr.open('GET', encodeURI("dog-barks.ogg"), true);
xhr.responseType = 'arraybuffer';

// Error handling
xhr.onerror = function() {
    console.log('Error loading from server');
};

// On successful loading, decode the xhr response and store
// in the soundEffectBuffer for later use.
xhr.onload = function() {

    audioCtx.decodeAudioData(xhr.response,
        function(audioBuffer) {
            soundEffectBuffer = audioBuffer;
        },
        function(e) {
            console.log("Error decoding audio data: " + e.err);
        });
};

// Execute the request
xhr.send();

// Play the sound effect buffer
function playSoundEffect() {
    if( soundEffectBuffer ) {
        var gainNode = audioCtx.createGain();
        gainNode.gain.value = 1;

        var bufferSource = audioCtx.createBufferSource();
        bufferSource.buffer = soundEffectBuffer;
        bufferSource.connect(gainNode).connect(audioCtx.destination);
        bufferSource.start();
    }
}


Conclusion

As you can see, the WebAudio API is not difficult to use. The documentation on MDN covers the whole API very well and includes many more examples of advanced techniques.

The most difficult part of this kind of audio programming is arguably not the code, but learning the actual audio processing techniques. Even so, simply being able to play back music and sound effects is a great tool to have available for all sorts of applications.

I hope you have found this useful. Please let me know if you have any comments or questions.
