To normalize audio is to change its overall volume by a fixed amount to reach a target level. This differs from compression, which changes volume by varying amounts over time. Normalization does not affect dynamics and, ideally, changes nothing about the sound other than its volume.
Why would we want to do this, what is the best way of doing it and what are the hidden dangers in terms of reducing sound quality? Let’s find out!
Why normalize audio?
There are only two good reasons to normalize:
1. Getting the maximum volume
If you have a quiet audio file you may want to make it as loud as possible (0 dBFS) without changing its dynamic range. This process is illustrated below.
2. Matching volumes
If you have a group of audio files at different volumes you may want to make them all as close as possible to the same volume. They may be individual snare hits or even full mixes.
Normalization can be done automatically without changing the sound as compression does. While this is a huge advantage, it can’t replace compression as it can’t affect the peaks in relation to the bulk of the sound.
This means you have far less control. Often normalizing audio just won’t work for matching volume levels, so mastering engineers need not lose any sleep.
What is the best method to normalize audio?
There are different ways of measuring the volume of audio. We must decide how we are going to measure the volume before we can calculate how to alter it, because the results will be very different depending on the method we use.
Peak volume detection
This only considers how loud the peaks of the waveform are for deciding the overall volume of the file. This is the best method if you want to make the audio as loud as possible.
In digital audio you can’t get any louder than the highest peak at 0 dBFS, so normalizing to this value will create the loudest file you can.
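As a minimal sketch of peak normalization, assuming the audio is held as floating-point samples where 1.0 represents 0 dBFS (the function name and the sample values are made up for illustration):

```python
def peak_normalize(samples, target_dbfs=0.0):
    """Scale float samples (full scale = 1.0) so the highest peak hits target_dbfs."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # pure silence: there is nothing to scale
    target_linear = 10.0 ** (target_dbfs / 20.0)  # convert dBFS to a linear level
    gain = target_linear / peak                   # one fixed gain for the whole file
    return [s * gain for s in samples]

# Hypothetical quiet recording whose loudest sample is at 0.25 of full scale.
quiet = [0.0, 0.1, -0.25, 0.2, -0.05]
loud = peak_normalize(quiet)
```

Every sample is multiplied by the same number, so the relationship between loud and quiet parts stays exactly as it was.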
RMS volume detection
This considers the “overall” loudness of a file. There may be large peaks, but also softer sections. It takes an average (the root mean square of the sample values) and calls that the volume.
This method is closer to how the human ear works and will create more natural results across varying audio files.
We are still limited by the fact that digital audio can’t go above 0 dBFS. This means that to make a group of audio files the same volume we may need to turn them all down so that none of their peaks clip (go over 0 dBFS). This may not be desirable, for example in mastering.
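The headroom problem can be sketched as follows, again assuming float samples where 1.0 is 0 dBFS and that no file is completely silent. The helper names and the two tiny "files" are hypothetical:

```python
import math

def rms(samples):
    """Root mean square: the average level RMS detection reports."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def peak(samples):
    return max(abs(s) for s in samples)

def match_rms(files, ceiling=1.0):
    """Give every file the same RMS, as loud as possible with no peak above ceiling."""
    # Each file could reach at most ceiling * rms / peak before it clips.
    # The file with the biggest peaks relative to its average sets the limit.
    target = min(ceiling * rms(f) / peak(f) for f in files)
    return [[s * (target / rms(f)) for s in f] for f in files]

# Hypothetical data: a peaky snare hit and a steady pad.
snare = [0.0, 0.9, -0.3, 0.1, 0.0, 0.0]
pad = [0.2, -0.2, 0.2, -0.2, 0.2, -0.2]
matched_snare, matched_pad = match_rms([snare, pad])
```

Here the snare’s peak ends up exactly at the ceiling, while the pad, which has the same RMS afterwards, sits well below it. The peakiest file decides how loud the whole group is allowed to be.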
Another problem is that RMS volume detection is not really like human hearing. Humans perceive different frequencies at different volumes. This is shown on the Fletcher-Munson curve below.
If one sound file has many frequencies between 1000 – 6000 Hz as shown in the diagram, it will sound louder.
RMS doesn’t take this into account. Luckily there is a recent solution, the new standard in broadcast audio: the catchily titled EBU R 128.
This measures volume in a similar way to RMS, but can be thought of as emulating a human ear. It listens to the volume intelligently and thinks how we will hear it. It understands that we hear frequencies between 1000 – 6000 Hz as louder and takes that into account.
We still have the same 0 dBFS problem mentioned for RMS, but now the different normalized audio files should sound much more consistent in volume.
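EBU R 128 loudness is built on the measurement defined in ITU-R BS.1770, which filters the audio with a frequency weighting ("K-weighting") before averaging it. The sketch below is a heavily simplified illustration of that idea only: it assumes mono audio at 48 kHz, uses the 48 kHz filter coefficients published in BS.1770, and leaves out the gating and multi-channel summing that a real meter needs. The function names are made up for this example.

```python
import math

# Biquad coefficients (b0, b1, b2, a1, a2) for 48 kHz from ITU-R BS.1770.
SHELF = (1.53512485958697, -2.69169618940638, 1.19839281085285,
         -1.69065929318241, 0.73248077421585)     # boosts higher frequencies
HIGHPASS = (1.0, -2.0, 1.0,
            -1.99004745483398, 0.99007225036621)  # rolls off very low bass

def biquad(samples, coeffs):
    """Run samples through one second-order filter section."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def k_weighted_loudness(samples):
    """Ungated, mono loudness estimate. Fails on pure silence (log of zero)."""
    weighted = biquad(biquad(samples, SHELF), HIGHPASS)
    mean_square = sum(s * s for s in weighted) / len(weighted)
    return -0.691 + 10.0 * math.log10(mean_square)

def tone(freq, seconds=1.0, rate=48000):
    """Full-scale sine wave used as a test signal."""
    return [math.sin(2 * math.pi * freq * n / rate) for n in range(int(seconds * rate))]

low = k_weighted_loudness(tone(100.0))
mid = k_weighted_loudness(tone(3000.0))
```

Both test tones have identical peak and RMS levels, yet the 3000 Hz tone measures louder than the 100 Hz tone, which is the behaviour plain RMS lacks.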
What are the hidden dangers?
Normalization can be performed in a standalone program, usually an audio editor (like Sound Forge), or inside your DAW. For the sake of this section we are assuming you are using an audio editor.
Inside a multi-track DAW project, when you are not exporting the normalized files individually, you probably won’t suffer from the problems we now mention.
- Peak normalization to 0 dBFS is a very bad idea for any parts to be used in a multi-track recording. It may not clip by itself, but as soon as you add any extra processing or play tracks simultaneously your DAW or plugins may overload. This subject comes under “gain staging”, a big subject to cover in the future.
- It is a destructive process. Performing any digital processing on a file is going to change it. Its bad reputation was mainly earned back in the days when digital files were all stored as 16-bit. If you turned the volume down you effectively reduced the bit depth. Your CD-quality 16-bit file could end up 12-bit or less. Even turning it up with peak normalization caused damage.
Nowadays audio editing software works internally at a much higher bit depth (often 32-bit floating point). This means that calculations are done much more accurately, and therefore affect the sound quality far less. This is only the case if we keep the file at the higher resolution once it has been processed!
To take advantage of the higher bit depth inside audio editing software, make sure all your temporary files are stored as 32-bit floating point. Also consider saving them in this format if you are going to do further processing.
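The effect of storing an intermediate file at 16-bit can be illustrated with a rough experiment: turn a signal down by 24 dB, store it, then turn it back up and see how far it has drifted from the original. This is only a sketch. The function names are invented, no dither is applied, and Python’s built-in floats (which are 64-bit) stand in for the 32-bit floating point an audio editor would use.

```python
import math

def to_int16(samples):
    """Quantize float samples (full scale = 1.0) to 16-bit integers."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]

def from_int16(ints):
    return [i / 32767 for i in ints]

def round_trip_error(samples, gain_db, keep_float):
    """Apply gain_db, store the result, undo the gain; return the worst error."""
    gain = 10.0 ** (gain_db / 20.0)
    quiet = [s * gain for s in samples]
    if not keep_float:
        quiet = from_int16(to_int16(quiet))  # intermediate file saved as 16-bit
    restored = [s / gain for s in quiet]
    return max(abs(a - b) for a, b in zip(samples, restored))

# A tenth of a second of a 440 Hz sine at 44.1 kHz, used as test material.
signal = [0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]

err_16 = round_trip_error(signal, -24.0, keep_float=False)
err_float = round_trip_error(signal, -24.0, keep_float=True)

# Each bit is worth about 6.02 dB, so 24 dB of attenuation costs about 4 bits.
bits_lost = 24.0 / (20.0 * math.log10(2.0))
```

The 16-bit round trip leaves a measurable error, while the floating-point round trip is accurate to within rounding noise. The 24 dB figure corresponds to the 16-bit to 12-bit example mentioned above.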
Other important points to consider
- People often peak normalize their audio just so they can see the waveforms more clearly on the screen. This is a bad idea: your software should have an option to make the waveforms bigger without resorting to permanently altering the audio file.
- For matching the volume levels of finished tracks, virtual normalization is possible inside many media players (including foobar2000). The most popular system is called ReplayGain. The aim of ReplayGain is to make all the different tracks of music play back at the same volume level without changing the actual audio. It works by measuring the RMS or EBU R 128 volume of a file and then deciding how much it should be turned down to match other music also using the ReplayGain system. This figure is stored inside the audio file, and when the file is played the software can turn the volume down itself. It’s not perfect, but it’s a very interesting method to hear different songs at the same volume level and make the loudness war a total waste of time.
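The playback side of ReplayGain can be sketched in a few lines. The example assumes the tags have already been read into a plain dictionary (how that happens depends on the file format and tagging library, which is not shown), and that the gain is stored under the conventional REPLAYGAIN_TRACK_GAIN key as text such as "-6.50 dB". The function names and sample values are hypothetical.

```python
def parse_gain_db(tag_value):
    """Turn a tag string such as '-6.50 dB' into a number of decibels."""
    return float(tag_value.strip().split()[0])

def playback_gain(tags, key="REPLAYGAIN_TRACK_GAIN"):
    """Linear multiplier the player applies; 1.0 if the file has no tag."""
    if key not in tags:
        return 1.0
    return 10.0 ** (parse_gain_db(tags[key]) / 20.0)

# Hypothetical tag dictionary and stored samples (1.0 = 0 dBFS).
tags = {"REPLAYGAIN_TRACK_GAIN": "-6.50 dB"}
stored = [0.0, 0.8, -0.8, 0.4]

# The gain is applied on the way to the speakers; the stored audio is untouched.
heard = [s * playback_gain(tags) for s in stored]
```

Because only the tag is written, the process is fully reversible: delete the tag, or ignore it, and the file plays exactly as it did before.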
In summary, normalization is a very useful tool, but also one that can easily be abused and cause an unnecessary loss of sound quality. Understanding the difference between peak and RMS volume is vital. Use with caution.