Psychoacoustics is the study of how the human brain receives, processes and interprets sound. It is a hugely complex field, one on which an entire PhD can be focused, so this short chapter will barely scratch the surface of the subject. However, some of the more accessible aspects of the science will have a positive bearing on your production skills and mixes once you have considered them and studied them further. Think of this article as the entrance to the rabbit hole!
Firstly, there are some software tools that will help you control and enhance your music’s psychoacoustic potential by giving accurate readings of frequency content, stereo spread and phase correlation. Let’s look further.
Spectrum Analysers
Analysers are devices that provide a visual reading of the frequency content of a piece of audio, in real time or otherwise. They are laid out on the same principle as an equaliser: frequency increases from left to right (x axis) and the level of each frequency is measured in decibels (y axis). By running a sound or part into a spectrum (or frequency) analyser, we can see which frequencies it is made up of. This is very useful in modern production, as it lets us get inside the tune and see an accurate visual display of what a given effect or process is doing to the sound.
A big part of getting a good mix is making sure the tune sounds full but not crowded. Using a combination of analysers and equalisers, you can sculpt every sound so that it has its own space in the mix without leaving frequency holes in the overall mix, which can give a cheap, lacking sound. Spectrum analysers are generally found in plug-in format, and several freeware versions are available online.
When using a sequencing program, it’s advisable to place a real-time analyser on the main output of the mixer. Whenever a sound is played on its own, you can take a spectral reading and gain a more accurate understanding of that sound. Conversely, you can play all the sounds together as the finished track and see the overall frequency response of the tune, perhaps making adjustments so that Fletcher-Munson theory can be applied for a more aurally pleasing result (see the later section on F-M curves). Some frequency analysers (like the Waves™ PAZ seen above) also have a stereo readout built in, which makes it possible to see how much stereo variance there is within a musical part or recording, i.e. how much the signal naturally fluctuates in volume between left and right.
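To make the x axis / y axis idea concrete, here is a minimal sketch of what an analyser does under the hood. It is an illustration only, not how any particular plug-in works: it uses a plain (slow) discrete Fourier transform from the Python standard library, and the function names `magnitude_spectrum_db` and `peak_frequency` are made up for this example.

```python
import cmath
import math


def magnitude_spectrum_db(samples, sample_rate):
    """Return a list of (frequency in Hz, level in dB) pairs for one block of audio."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        # Correlate the block with a complex sinusoid at bin k (a naive DFT).
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # Scale so a full-scale sine reads about 0 dB
        # (the DC and Nyquist bins are not corrected separately here).
        magnitude = abs(acc) / (n / 2)
        level_db = 20 * math.log10(max(magnitude, 1e-12))  # floor avoids log(0)
        spectrum.append((k * sample_rate / n, level_db))
    return spectrum


def peak_frequency(spectrum):
    """Frequency of the loudest bin, i.e. the tallest point on the display."""
    return max(spectrum, key=lambda point: point[1])[0]


# A 1,000Hz test tone: 256 samples at an assumed 8,000Hz sample rate.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spectrum = magnitude_spectrum_db(tone, sample_rate)
print(peak_frequency(spectrum))
```

Each pair in `spectrum` is one point on the analyser display: the first value is where it sits on the x axis, the second how high it reaches on the y axis. Real analysers use a fast Fourier transform plus windowing and smoothing, which this sketch leaves out.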
Stereo Imagers
Once the stereo content of a signal has been analysed, it can be further shaped and controlled so that it does not take up too much space in the mix of the finished track. This is done with imagers. Many are available in plug-in format, ranging from multiband imagers (which break the signal down into several frequency bands whose stereo content can be adjusted independently) to simplified imagers that can easily turn a stereo signal into a mono one for a fuller mixdown. The opposite process is often applied if a sound is too central and flat: the sound can be ‘imaged’, or spread further across the stereo field, to give a more acoustically pleasing effect by using more of the distance between the left and right speakers.
Ideally you are looking for a combination of mono (central) sounds and imaged (wider) sounds within your mix. Items such as kick drums, the bottoms of snares and basslines tend to work better in mono, as this means they can be pushed louder to give a punchier sound that comes across better on nightclub systems. As you move further up the frequency spectrum within your song, sounds can be allowed to spread wider. Various stereo effects, such as delay, can also be applied to add interest to the mix.
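One common way to narrow or widen a stereo signal is mid/side processing. The sketch below assumes that technique purely for illustration (imager plug-ins may work quite differently), and the function name `adjust_width` and its `width` parameter are hypothetical.

```python
def adjust_width(left, right, width):
    """Scale the stereo width of a signal.

    width = 0.0 collapses to mono, 1.0 leaves the signal unchanged,
    and values above 1.0 spread it wider.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2           # what both speakers share (the centre)
        side = (l - r) / 2 * width  # what differs between them, scaled
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right


# Three samples of a made-up stereo signal.
left = [1.0, 0.5, -0.25]
right = [0.0, 0.5, 0.75]

mono_left, mono_right = adjust_width(left, right, 0.0)    # e.g. for a kick or bassline
wide_left, wide_right = adjust_width(left, right, 1.5)    # e.g. for a pad or hi-hats
```

With `width` at 0.0 both output channels are identical, which is the "set to mono" case described above. A multiband imager would apply the same idea separately to each frequency band.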
Fletcher-Munson / Equal Loudness Curves
Ears do the Maths
As humans, we do not perceive all frequencies of sound at the same level. Our ears are more sensitive to some frequencies and less sensitive to others, and this sensitivity also changes with the sound pressure level (SPL), or volume. Looking at the chart above, you'll notice it is marked horizontally with a scale denoting the frequency of the sound and vertically with the level in decibels. On the chart are a number of curved lines, each marked with a number (the loudness level). Let's begin with the lowest solid line, marked with a loudness level of 10 phons. (The loudness level in phons is a subjective sensation: it is the level we perceive the sound to be at.) From about 500Hz to roughly 1,500Hz the line is flat at 10dB on the scale. This means that for us to perceive a loudness level (LL) of 10 phons, frequencies from 500Hz to 1,500Hz must be at 10dB. Make sense so far?
OK, now look further into the higher frequencies, say 5,000Hz. Notice the line dips here: we perceive 5,000Hz to be at 10 phons when the source is actually only at 6dB. To perceive 10,000Hz at the same level (10 phons), the source would need to be at about 20dB. From this we can clearly see that the ear is more sensitive in the 2,000Hz to 5,000Hz range, yet not nearly as sensitive from 6,000Hz upwards.
Boost The Bass...
Let’s take a look down at the lower frequencies now, say 100Hz. For us to perceive 100Hz as being as loud as 1,000Hz (when the 1,000Hz source is at 10dB), the 100Hz source must be at 30dB. That's 20dB higher than the 1,000Hz signal! Looking even further down, a 20Hz signal must be at nearly 75dB (65dB higher than the 1,000Hz signal)! We can clearly see that our ears are not very sensitive to the lower frequencies, and even less so at lower SPLs.
Why is this? A simple physical explanation is that resonance in the ear and ear canal amplifies frequencies typically between 2,500Hz and 4,000Hz. Why didn't nature design our ears to hear every frequency at the same level? One reason could be that most speech intelligibility is found in the 2,000Hz to 5,000Hz range, so our ears are more sensitive here. While our ears are capable of hearing the lower frequencies, our bodies feel them more than we actually hear them. This is the reason why many people who are nearly or completely deaf can still enjoy music: they can still feel the low-frequency content in their bodies. (This assumes the level is high enough for them to feel it. Often such people will actually sit on a speaker so that they're in direct contact with it and the vibrations of the speaker are conducted right into their body.)
Notice how, as the overall loudness level increases, the curved lines flatten out at the low-frequency end. This is because at higher SPLs we're more sensitive to those lower frequencies. Also notice that as the SPL increases we're less and less sensitive to the frequencies above 6,000Hz. This explains why soft music seems to sound less rich and full than louder music: the louder the music is, the more we perceive the lower frequencies, and so the fuller and richer it sounds. This is why many stereo systems have a loudness switch. When you're listening at low volumes, you activate this switch, which boosts the low and some of the high frequencies of the sound.
Typically, people become uncomfortable with levels above 100dB. You'll notice that 100dB is needed to perceive a loudness level of 100 phons at 1,000Hz, whereas only 90dB is required to give a perceived loudness level of 100 phons at 4,000Hz. By contrast, about 104dB is required to produce a perceived loudness level of 100 phons at 100Hz.
Why is all of this so important? Simply put, it helps us understand why so many subwoofers are required to produce a loudness level equal to that attained at higher frequencies. It also shows us how much more sensitive our ears are to the higher frequencies, which can become very piercing if too loud.
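The arithmetic behind those readings can be laid out in a few lines. The sketch below uses only the approximate chart values quoted in the text (it is not the full set of curves), and the names `CONTOURS` and `offset_from_1khz` are made up for this example.

```python
# SPL in dB needed at each frequency (Hz) to reach the stated loudness level,
# using only the approximate chart readings quoted in the text.
CONTOURS = {
    10: {20: 75, 100: 30, 1000: 10, 5000: 6, 10000: 20},
    100: {100: 104, 1000: 100, 4000: 90},
}


def offset_from_1khz(phons, frequency):
    """Extra dB needed at `frequency` to sound as loud as 1,000Hz on this curve."""
    contour = CONTOURS[phons]
    return contour[frequency] - contour[1000]


for phons, contour in CONTOURS.items():
    for frequency in sorted(contour):
        offset = offset_from_1khz(phons, frequency)
        print(f"{phons:>3} phons, {frequency:>5} Hz: {offset:+d} dB relative to 1,000Hz")
```

Comparing the two curves at 100Hz shows the flattening effect: the bass needs 20dB more than 1,000Hz on the 10 phon curve, but only 4dB more on the 100 phon curve.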
Equalization Tip
It often helps to use an equalizer to cut the frequencies around 2,000Hz to 5,000Hz a little if music is being played loudly. This keeps the sound crisp, but not distorted and piercing, at higher SPLs (volume levels). With careful use of analyser readings, you can follow the general shape of a Fletcher-Munson curve over the output of your main mix for improved psychoacoustic effect. As this matches the natural response curve of your ear, the overall effect is a balanced-sounding mixdown.
Simultaneous Frequency Masking
Frequency masking is a psychoacoustic effect that occurs when two or more sounds or instruments compete for space within your mix because they occur at the same time. You have probably already run into this problem in your music without being aware of its nature.
Basically, a sound is normally made up of several harmonic ‘sub sounds’ that contribute to its overall timbre. If two parts share similar frequencies, some of these harmonics will normally be masked in the mix, meaning that the instruments sound different in the mix from how they sound in solo (often appearing thinner).
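As a rough way to picture where two parts might clash, the sketch below lists the harmonics of two notes and flags the ones that land close together. It is a deliberate simplification: real masking also depends on level and on the ear's critical bands, and the 3% `tolerance` is an arbitrary figure chosen for this example, as are the function names.

```python
def harmonics(fundamental, count):
    """The first `count` harmonics of a note: 1x, 2x, 3x... the fundamental."""
    return [fundamental * k for k in range(1, count + 1)]


def clashing_partials(fundamental_a, fundamental_b, count=8, tolerance=0.03):
    """Pairs of harmonics from two notes that sit within `tolerance` of each other.

    `tolerance` is a fraction of the lower frequency (0.03 = 3%) and is an
    arbitrary assumption, not a measured masking threshold.
    """
    clashes = []
    for fa in harmonics(fundamental_a, count):
        for fb in harmonics(fundamental_b, count):
            if abs(fa - fb) / min(fa, fb) <= tolerance:
                clashes.append((fa, fb))
    return clashes


# A bass note at 55Hz against a synth note an octave above it, at 110Hz.
shared = clashing_partials(55, 110, count=4)
print(shared)
```

Two notes an octave apart share every second harmonic of the lower note, which is one reason stacked octaves can thin each other out and why shifting a part to a different note sometimes frees up space.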
Testing for Frequency Masking
- Listen to your mix in mono. This sums everything together and pressure tests the overall track. It is particularly important if you are looking to play your music on a nightclub sound system, as these often output stereo recordings in mono.
- Level all your tracks to unity (0 dB) and then pan each track to the left speaker followed by the right. You're listening to see if each sound sweeps audibly from the left, through the centre, and off to the right. If you can hear it crossing over, then you're more than likely free of any heavy masking.
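The mono test can also be put into numbers. The sketch below is a simplified illustration of what a phase correlation meter reports and of what happens when a stereo signal is summed to mono; the function names are made up for this example.

```python
import math


def correlation(left, right):
    """Phase correlation of two channels: +1 = identical, 0 = unrelated, -1 = inverted."""
    numerator = sum(l * r for l, r in zip(left, right))
    denominator = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return numerator / denominator if denominator else 0.0


def mono_sum(left, right):
    """What a mono system plays: the average of the two channels."""
    return [(l + r) / 2 for l, r in zip(left, right)]


# Two cycles of a sine wave, and the same wave with its polarity flipped.
left = [math.sin(2 * math.pi * t / 32) for t in range(64)]
inverted = [-sample for sample in left]

print(correlation(left, left))      # in phase: survives the mono sum
print(correlation(left, inverted))  # out of phase: cancels in mono
print(max(abs(sample) for sample in mono_sum(left, inverted)))
```

A part whose correlation sits near -1 will largely disappear when the track is played in mono, which is exactly the kind of problem the mono listening test is meant to catch.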
Sort it out...
- Try using imagers set to mono on offending sounds and then re-panning from left to right.
- Establish the most important sound and make it the priority over the other accompanying parts. In other words, filter the secondary sound with EQ where possible.
- Use the problematic part elsewhere in the song, when it is not clashing with others.
- Sometimes an entirely new sound (particularly when working with synthesisers) needs to be made / used as the current one is naturally sitting on the wrong note (too high or low) and masking other instruments.
Temporal masking or "non-simultaneous masking"
This is when the signal and masker are not presented at the same time. It can be split into backward masking and forward masking. Masking that obscures a sound immediately preceding the masker is called backward masking or pre-masking, and masking that obscures a sound immediately following the masker (the masker is presented first and the signal follows it) is called forward masking or post-masking. The effectiveness of temporal masking attenuates exponentially from the onset and offset of the masker, with the onset attenuation lasting approximately 20 ms and the offset attenuation lasting approximately 100 ms. As with simultaneous masking, temporal masking reveals the frequency analysis performed by the auditory system: forward masking thresholds for complex harmonic tones (e.g., a sawtooth wave with a fundamental frequency of 500 Hz) exhibit threshold peaks (i.e., high masking levels) for frequency bands centred on the first several harmonics. In fact, auditory bandwidths measured from forward masking thresholds are narrower and more accurate than those measured using simultaneous masking.
Auditory Streams
Our auditory system has evolved to become remarkably effective at separating streams of sound from different sources. In fact, it is difficult for us, and requires considerable conscious effort, to group even very similar ‘sound streams’ (in terms of pitch and timbre) into a single perceived stream. Our ears perceive melodies much more easily if all the tones belong to the same audio stream: you don’t hear too many catchy riffs that are made up of several different instruments! This concept is a useful one to remember when you are building a track. Think of complex evolving pads and acid leads that ‘move’ gradually over time. These are often perceived as single streams even though the composite sounds change gradually (timbres shifting, ADSR settings adjusting, etc.), and our minds are guided into analysing them as single streams, despite the fact that the start and end of a stream might be perceived as separate if heard together.
Experimenting with this idea can give very interesting sonic perceptions. Try making sounds jump out suddenly from a stream by introducing a sudden distortion, or create a riff from a series of sounds that are sonically similar, but not identical, to give a fast dynamic edge to a track (Noisia-style Reese progressions, for example). Or try gradually modulating multiple parameters on a looped sample to give an evolving deep atmosphere or psychedelic house lead.