Before we talk about center channels, it’s probably helpful to understand a bit about how recordings are made and why and under what conditions a stereo recording works with a stereo system.
2-channel recordings are made up of three kinds of “sounds”:
- Pure stereo information: These are sounds that are ONLY recorded in the left or the right channel. Sometimes these are called “differential.”
- Mono information: These are sounds that are recorded exactly the same in the left and right channels. These are sometimes called “common.”
- Off-center information: These are sounds that are recorded in both left and right, but at different levels or different phase.
The top half of this display is the left channel and the bottom half is the right channel.
It’s pretty simple to see that the left drum beat (the first one) is only recorded in the left channel and the right drum beat is only recorded in the right channel. The drum beat in the center is recorded in both channels.
If you look carefully, you’ll see that in the left channel, the beats gradually decrease in level and in the right channel, the beats gradually increase in level.
The fourth one (the one in the center) is the same level in both channels.
In order for the drum beats to sound like they are the same volume, the total energy of the two channels combined has to be the same. For the center drum beat to be exactly as loud as the left or the right, the sound has to be recorded 6dB lower in both the right and the left. If it’s recorded lower than -6dB, then the center will be quieter than the left and the right. If it’s recorded higher than -6dB, then it will be louder.
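If you want to check the 6dB arithmetic, here is a minimal Python sketch. It assumes the two speakers add perfectly in phase at the listener, which is an idealization (in a real, reflective space the sum lands somewhere between this and a simple power sum). The function names are just for illustration.

```python
import math

def db_to_gain(db):
    """Convert a level in dB to a linear amplitude gain."""
    return 10 ** (db / 20)

def gain_to_db(gain):
    """Convert a linear amplitude gain to a level in dB."""
    return 20 * math.log10(gain)

def coherent_sum_db(left_db, right_db):
    """Level of two identical, in-phase signals added together.
    Assumption: perfect in-phase summation at the listening position."""
    return gain_to_db(db_to_gain(left_db) + db_to_gain(right_db))

# Center beat recorded 6dB down in both channels: sums back to about 0dB.
center = coherent_sum_db(-6.0, -6.0)
# Recorded only 3dB down in both channels: the center comes out louder.
hot_center = coherent_sum_db(-3.0, -3.0)
print(round(center, 2), round(hot_center, 2))
```

Under that assumption, -6dB in each channel comes back to roughly the level of a single channel at 0dB, and anything hotter than -6dB makes the center louder than the sides.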
The second and third drum beats and the fifth and sixth, are recorded so that they appear in between the center and the left and the center and the right. A 6dB difference in level moves the image HALFWAY.
If the sum of the energy in the left and the right channels is the same for each of the drum beats, then each of the beats will sound like it is at the same level.
In the picture below, I’ve mixed the stereo track to mono to see if this is the case for this track.
It isn’t. The second, third, fourth, fifth and sixth are all louder than the first (left) and the seventh (right). The fourth (center) is slightly quieter than the second, third, fifth and sixth.
Nevertheless, in a properly set up stereo system, we’ll hear the beats move pretty evenly from left to right.
To make the definitions above clear, the first and last beats are “pure stereo”. The fourth is “mono”. The second and third and fifth and sixth are “off-center”.
This is called “panning” and it’s how the recording engineer who does the final mix places instruments and voices across the stage in the track.
Raising the level of the mono information compared to the stereo or off-center information helps to move images forward and rearward. Some additional realism can be added using reverb.
OK. A stereo system that consists of two speakers is ideally designed to pass these conditions through the system unchanged. If that’s the case, then the panning and reverb work and we hear the instruments and vocalists in their proper arrangement across the stage in between the two speakers.
I’m not going to discuss ambient sounds or sounds that seem to come from outside the bounds of the speakers in this tech tip–this is about center channels and the processing required to make them work.
The conditions for a properly set up STEREO system are that the levels of the two speakers and the frequency response must be precisely matched and the sound from the left and right speakers should arrive at the listener at the same time.
In a room, that looks something like this:
Traditionally, for a stereo system at home, we simply place the chair in between the two speakers we assume are already precisely matched from the factory and that’s it. Simple. If the speakers have reasonably flat frequency response, then what’s on the CD passes through the system mostly unchanged and we hear something pretty close to what’s on the CD.
What if we want to share the experience with someone else?
Or, what if we provide a venue where lots of people experience something that’s been recorded, all at the same time? Like here:
What if we’re charging everyone in every seat $25? Obviously, anyone who cares about the whole experience will book their seat early and choose one as close to the center as possible. Everyone in the center in each row will get a pretty good representation of placements during the show, but what about the people on the left and on the right?
The audio track that accompanies video in movies can add a sense of realism. Watching Transformers on an iPad with the iPad speakers, for example, doesn’t provide the realism that we expect when we fork over a bunch of cash to go to the movies.
This is what multichannel was originally designed for: Movies.
In order to provide a sense of realism, it’s helpful for dialogue to appear to come from the location of the screen or the location of the image of the person speaking on the screen. When something flies over, it’s helpful for the sound to also fly over.
This is what multichannel does. It provides better realism.
This is what the ENCODED audio track on a DVD does. It includes six discrete streams that can be DECODED by a decoder into six channels of audio. Newer formats will encode many more tracks–all in the interest of adding realism.
Years ago, movies weren’t distributed on digital media and only two tracks of audio were available–on film and also on VHS tapes.
So, upmixing was designed to “encode” all of the surround information into the two audio tracks that were available.
Multichannel systems that relied on upmixing to place sounds in discrete channels relied on “decoding” the panning between left and right channels.
Early analog “Dolby” versions decoded left, right and a rear channel. To encode a sound into the rear, the mixer put the information in the left and right channels 180 degrees out of phase. The decoder would send that to the rear speakers.
That decoder was included in receivers for use at home when watching movies.
Later, the technology was adapted to add an output for a center speaker, and that was designed to place the vocal track at the screen in the center speaker.
The most popular format was Dolby Pro Logic. It worked well for movies, but music listeners were often too lazy to turn off the decoder when they stopped watching movies and listened to music. The original Pro Logic didn’t sound so good with music because it often placed TOO MUCH information in the center and too much information in the surrounds.
For years, multichannel was maligned by audiophiles for this very reason.
Finally, some of the algorithm was changed and a “music” setting was added. The damage was done, however.
Because the original objective for multichannel audio was enhancement of realism for video, and because both the encode/decode format (discrete surround) and upmixing (turning 2-channel program material into multichannel material) were called “surround”, confusion ensued.
There’s a poor understanding of the difference in the audio industry and among enthusiasts and the two are often conflated.
OK, back to the topic: center channels.
In a stereo system, there’s ONE good seat. That means only ONE listener can enjoy high fidelity playback at a time. That listener has to be placed in between the speakers.
What about a car?
In a car, no one sits in the center.
So, we have to optimize this somehow. We can EQ the left and the right channels independently, so that the sound from both speakers is precisely matched in level and in frequency response.
Then, in order for stereo to work, we have to fix the arrival time problem in order for what’s on the disc to pass through the acoustic system as unchanged as possible.
So, we delay the left channel so sound from the left speaker arrives at the same time as sound from the right speaker.
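The delay itself is just the extra path length divided by the speed of sound. Here is a small sketch; the distances are made up and the 343 m/s figure assumes air at roughly room temperature.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 C

def alignment_delay_ms(near_distance_m, far_distance_m):
    """Delay to apply to the NEARER speaker so both arrive together."""
    extra_path_m = far_distance_m - near_distance_m
    return 1000.0 * extra_path_m / SPEED_OF_SOUND_M_PER_S

# Hypothetical driver's seat: left speaker 1.0 m away, right speaker 1.5 m away.
left_delay = alignment_delay_ms(1.0, 1.5)
print(f"Delay the left channel by about {left_delay:.2f} ms")
```

With those example distances the left channel gets delayed by about a millisecond and a half.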
I see diagrams like this one all the time used to illustrate this. It’s a complete misrepresentation.
The left speaker isn’t “moved outward”. It just plays a little later than the right speaker.
It’s a stupid graphic. It’s similar to saying that if you and I are going to meet at the Blue Note at 9 PM and I live next door to the club and you live on Long Island, if I leave an hour later than you so we arrive at the same time, my house is farther away. That’s ridiculous.
This is NOT what happens:
This is what happens:
We simply hear stuff that’s recorded the same in both channels in the center, as we would if we sat in the center. The center image appears directly in between the two speakers.
When we adjust the levels and the delays to compensate for one listener in the car, we make the problem worse for someone sitting in the other seat.
The sound from the right speaker arrives much earlier and the level of the right speaker is higher–we turned it up a little bit to compensate for the attenuation that happens because it’s farther away.
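To put a number on "worse", here is a sketch that assumes a perfectly symmetrical car with the driver on the left, so the passenger's distances are just the driver's swapped. The distances and the 343 m/s speed of sound are assumptions for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 C

def arrival_gap_ms(left_distance_m, right_distance_m, left_delay_ms=0.0):
    """How much later (in ms) the left speaker arrives than the right.
    A negative result means the left speaker arrives first."""
    left_ms = 1000.0 * left_distance_m / SPEED_OF_SOUND_M_PER_S + left_delay_ms
    right_ms = 1000.0 * right_distance_m / SPEED_OF_SOUND_M_PER_S
    return left_ms - right_ms

near, far = 1.0, 1.5  # hypothetical distances in meters
left_delay = 1000.0 * (far - near) / SPEED_OF_SOUND_M_PER_S

driver_gap = arrival_gap_ms(near, far, left_delay)      # tuned seat
passenger_untuned = arrival_gap_ms(far, near)           # no delay at all
passenger_tuned = arrival_gap_ms(far, near, left_delay)  # driver's delay applied
print(f"driver: {driver_gap:.2f} ms")
print(f"passenger, no delay: {passenger_untuned:.2f} ms")
print(f"passenger, driver's delay: {passenger_tuned:.2f} ms")
```

The driver's gap goes to zero, but the passenger's gap doubles compared with having no delay at all.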
When we tune a stereo system, we often use delays to set the image.
When it’s correct, the center is in the center.
No matter what we do, we can only optimize this for one seat.
So, what if we put a speaker in the center?
If we combine a center speaker with some software that can send the mono information to the center and remove it from the left and the right, then the center information appears in the center no matter where we are seated.
This is what an upmixer does. It EXTRACTS the mono information and sends it to the center speaker. It also removes some (or all) of the center information from the left and the right.
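To make the idea concrete, here is a toy sketch of "extract what's common and remove it from left and right." This is NOT how PLII or any commercial upmixer works (those operate on frequency bands with a lot more smarts); it's a per-sample illustration with made-up names.

```python
def extract_center(left, right):
    """Toy per-sample center extraction (not a commercial algorithm).

    The part the two channels share (same polarity, smaller magnitude)
    goes to the center and is subtracted from left and right.
    """
    out_left, center, out_right = [], [], []
    for l, r in zip(left, right):
        if l * r > 0:  # same polarity: some content is common
            common = min(abs(l), abs(r)) * (1 if l > 0 else -1)
        else:          # opposite polarity, or nothing in one channel
            common = 0.0
        out_left.append(l - common)
        center.append(common)
        out_right.append(r - common)
    return out_left, center, out_right

# One sample each of: left-only, mono, right-only, off-center left.
left = [1.0, 0.5, 0.0, 0.8]
right = [0.0, 0.5, 1.0, 0.4]
new_left, center, new_right = extract_center(left, right)
print(new_left, center, new_right)
```

In this toy version the left-only and right-only samples stay put, the mono sample moves entirely to the center, and the off-center sample ends up shared between the left and the center.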
Since the car is symmetrical and so are the listening positions, this works in both seats.
Now, the only images that rely on a correct acoustic sum (like our stereo system) are the ones that appear in between the center speaker and the right or left speaker, like this:
The same rule applies here. The center speaker should have the same frequency response as left and right and its sound should arrive at the same time.
Check out the distance between the left speaker and the listener and the center speaker and the listener. Much more similar than the left and right. This dramatically improves the time problem and dramatically minimizes our need for delay–which is the tool that fixes one side and destroys the other.
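Here is a quick sketch of that comparison using a made-up layout. The coordinates are hypothetical (x across the car, y toward the windshield, in meters), so only the relative sizes matter.

```python
import math

def distance(a, b):
    """Straight-line distance between two (x, y) points in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Hypothetical layout: driver sits left of the car's centerline.
driver = (-0.4, 0.0)
speakers = {"left": (-0.7, 1.0), "center": (0.0, 1.0), "right": (0.7, 1.0)}

d = {name: distance(driver, pos) for name, pos in speakers.items()}
left_right_gap = abs(d["right"] - d["left"])
left_center_gap = abs(d["center"] - d["left"])
print(f"left/right path difference:  {left_right_gap:.2f} m")
print(f"left/center path difference: {left_center_gap:.2f} m")
```

In this example the left/center difference is a small fraction of the left/right difference, so there's far less arrival-time error to correct on that side.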
So, now we can use simple level adjustments to optimize the image. Easy peasy.
So, what happens if instead of using an upmixer to extract the mono information, we just use a summed signal (L+R) for the center?
Think about the rules.
And this is why center channels get a bad rap.
If one considers all the mystery surrounding how to determine exactly what the arrival time of a speaker actually is (do I use a tape measure, do I measure impulses, do I use clicks and pops and screw around for a week), this is a much simpler and more straightforward way to tune the front image.
You should hear this:
What if we use a simple sum (L+R) for a center signal?
Now, if we have a signal that’s recorded in only the left channel or the right and we’ve set our center channel level to be as loud as the left and the right, what happens?
The left image appears in between the left speaker and the center speaker and the right image appears between the right speaker and the center channel. We’ve reduced the stage width by half.
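The arithmetic behind that is short enough to sketch. The function name is made up, and a center gain of 1.0 stands in for "as loud as the left and the right."

```python
def simple_sum_center(left, right, center_gain=1.0):
    """Feed left + right to the center and leave left and right untouched."""
    center = [center_gain * (l + r) for l, r in zip(left, right)]
    return left, center, right

# A sound recorded ONLY in the left channel.
left, center, right = simple_sum_center([1.0], [0.0])
print(left, center, right)  # [1.0] [1.0] [0.0]
```

The left-only sound plays at the same level from the left speaker and the center speaker, which is exactly how you pan something halfway between two speakers.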
So, you’ll hear all kinds of crap about upmixing:
1. It’s bullshit, and it doesn’t work
2. It isn’t “real”
3. I don’t watch movies in the car.
4. It sounds over processed
5. It narrows the stage
6. There isn’t enough encoded material
None of that is true or relevant.
It’s just easier. It sounds at least as good as a properly set up stereo system. And, it works in both seats.
Ken Ward: “I’ve been doing this over 30 years. I’ve seen many multichannel approaches. I hadn’t seen one sound good until this past year, using PLII and Andy’s upmixer tuning recommendations.
The people who have sold these products haven’t done a good job of communicating the purpose of their upmixing. I read the white papers on Logic 7. I read a LOT of stuff. And until this year, I wouldn’t have sold any of my clients multichannel sound because I hadn’t succeeded in making any of it sound any good. I can make stereo sound great, and my reasons for avoiding multichannel sound in cars for 30 years were good and reasonable and based on experience, not ignorance.”
Andy Wehmeyer: Of course, I agree with that. To Ken’s point, I will agree that all of these “advancements” have fallen short in one way or another. Some of them have been terrible, but most of them haven’t been well understood.
So, here is an explanation of why some of these processors have really sucked, and why most of them have included limitations. Many of them, used carefully and set up properly have provided better performance in both seats, but rarely matched the performance of simple stereo in ONE seat.
OK…for “Surround Sound” there are two basic objectives.
Originally, the objective was to increase the spaciousness or sense of envelopment in a MOVIE soundtrack, and in audio playback at home too.
As I mentioned in the tip yesterday, the medium for distribution of video to homes and to theaters was film or tape of some kind which contained only 2 audio channels. Whatever audio track came along with the video had to be included in these two channels.
Watching a movie in stereo, provided the speakers were large enough and the amplifier powerful enough, could adequately provide the full range of audio, including the visceral impact of loud events, but it couldn’t provide a 3-dimensional experience (including sounds behind and to the side of the viewers). Falling rain, for example, is an event that should appear to happen all around the viewer, even as dialogue should be presented from the location of the screen, in front.
So, rear speakers were added. Sending simple stereo to the rear speakers could make the rain sound like it came from all around, but that would also screw up the placement of the dialogue. That’s the same as rear speakers in a car, which, if they play regular stereo, screw up the image in the front.
Yuck.
OK, so how do you get a separate rear speaker channel into a 2-channel medium?
You encode them. That means you include them in a way that they can be extracted later. Ideally, the encoding would be done in a way that doesn’t compromise the performance if the track is played back in simple stereo.
So, in a stereo system we have two inputs and two outputs. Those are left and right.
In an original “surround sound” processor, there would be two additional outputs, left rear and right rear. If we’re just using a stereo system, we would simply hook up to the left and right outputs.
Remember, our film or tape only has two channels, so this has to work even WITHOUT an additional processor: if you play the movie WITHOUT the processor, it has to sound like stereo.
I think the original encode/decode may have been Dynaquad, which was used on some record albums. It didn’t perfectly preserve the stereo when played back over two channels, but it was pretty close.
Here’s how that works:
Below are the encode and decode matrices (thanks Wikipedia).
Don’t worry, this isn’t as hard as it may seem.
On the left, you have Left Total and Right Total. That’s what’s recorded on the left and right channels. On the top, you have the components of that signal. Ideally, for playback over 2-channels, the LT (left total) and RT (right total) would equal L and R. If the only change was additional output level, that would be OK.
In the matrix at the top, you have ENCODE. That’s what goes into the recording. On the bottom, you have what the processor does to extract the channels.
So, for ENCODE, on the left recorded channel, we have Left information (1 indicates no attenuation or amplification). We also have some right information: .25 indicates that the right channel is included in the left but attenuated by 12dB (20log(1/.25)).
Stick with me…
Also in the left channel, we have another left. That’s Left Back. OK. Now we have two lefts (so the left signal is twice as loud). Finally, the right channel is mixed in out of phase and attenuated by 6dB. That’s what -0.5 means.
The right channel is the opposite of that.
So, if I play this back on a regular stereo system without the decoder, in the left, I get left x 2 and right out of phase and attenuated by 6dB. However, because I have 2 lefts, the relative difference between the left and right signal in the left is 12dB. So, in stereo, I get a little more separation and a little less mono (phantom center) because some of the common information is cancelled (because a little right is inverted and mixed into left).
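If the dB numbers are hard to follow, here is a sketch that converts the Left Total coefficients to dB. The four values are the ones described in the text for the encode matrix, as I read them; the dictionary keys and function name are just labels for this example.

```python
import math

def coeff_to_db(coeff):
    """Level of a matrix coefficient in dB relative to 1.0 (sign ignored)."""
    return 20 * math.log10(abs(coeff))

# Left Total row of the encode matrix as described in the text:
# left front, right front, left back, right back.
left_total_row = {"LF": 1.0, "RF": 0.25, "LB": 1.0, "RB": -0.5}

for name, coeff in left_total_row.items():
    polarity = "inverted" if coeff < 0 else "in phase"
    print(f"{name}: {coeff_to_db(coeff):6.2f} dB, {polarity}")

# Two lefts (1.0 + 1.0) against the inverted right at 0.5:
separation_db = coeff_to_db(
    (left_total_row["LF"] + left_total_row["LB"]) / left_total_row["RB"]
)
print(f"left vs. inverted right: {separation_db:.2f} dB")
```

A coefficient of .25 is about 12dB down, .5 is about 6dB down, and two lefts against half a right works out to the 12dB difference mentioned above.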
The opposite happens in the Right channel.
So, when we hook up a stereo system, we don’t get EXACTLY what we expect, but it probably doesn’t suck.
When we play this back through the processor that includes a DECODER, look what happens. The left front gets just Left Front. That’s the 1. The right front gets the Right Front.
The FRONTS play back in stereo.
The left back gets attenuated left and some attenuated right out of phase. The right back gets attenuated right and some attenuated left out of phase. That means that MONO information is attenuated in the rear channels. This adds some sense of spaciousness.
So, the result of listening to a Dynaquad system, so long as you sit in the sweet spot, just like a regular stereo system, is stereo with enhanced spaciousness.
If you don’t use the decoder, you get stereo, basically.
If you don’t sit in the center, you don’t get stereo, just like when you don’t sit in the center with a regular stereo.
OK. So the idea with Dynaquad was to add some spaciousness to the sound for music playback.
OK, for the sake of brevity, let’s skip ahead to Dolby–there were a bunch of intermediate schemes. No need to go into that.
Remember, this is a way to get surround (rears) onto a 2-channel medium and not screw up stereo playback too badly.
Below is the Dolby Surround Matrix:
So, the original Dolby Surround included a center and one surround channel. The CENTER channel is Left + Right. One divided by the square root of two is .707. That’s equal to 3dB of attenuation. So, the center channel plays left plus right and plays that 3dB louder than the left or the right.
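Here is that arithmetic as a sketch, for a mono sound (the same signal at the same level in left and right). The level of 1.0 is arbitrary.

```python
import math

center_coeff = 1 / math.sqrt(2)            # about 0.707
coeff_db = 20 * math.log10(center_coeff)   # about -3dB

# A mono sound: the same signal, at level 1.0, in both left and right.
left = right = 1.0
center = center_coeff * (left + right)
center_vs_left_db = 20 * math.log10(center / left)
print(f"{center_coeff:.3f} is {coeff_db:.2f} dB; center is "
      f"{center_vs_left_db:+.2f} dB relative to left")
```

The sum of two equal signals is 6dB up, the .707 takes 3dB back off, and the center ends up about 3dB above either side.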
So, if we sit in the center, like we do in a stereo system, the phantom center works OK. Some narrowing of the stage happens. But…hang on…
The +j is a positive 90 degree phase shift and the -j is a negative 90 degree phase shift.
Now, the rear surround speakers are out of phase, they’re both attenuated by 3dB, and they help to restore the spaciousness to the front. Basically, the signal to BOTH rears is L-R (the difference signal).
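Here is a sketch of the whole round trip using complex numbers to stand in for the phase shifts. The coefficients follow the commonly published form of this kind of matrix; treat them, and the function names, as assumptions for illustration rather than a spec.

```python
import math

K = 1 / math.sqrt(2)  # the 0.707 coefficient

def encode(left, center, right, surround):
    """Fold four inputs into Left Total / Right Total (complex phasors).
    Multiplying by 1j or -1j models the +90 / -90 degree phase shift."""
    lt = left + K * center + 1j * K * surround
    rt = right + K * center - 1j * K * surround
    return lt, rt

def decode(lt, rt):
    """Passive decode: center from the sum, surround from the difference."""
    return {"left": lt, "right": rt,
            "center": K * (lt + rt), "surround": K * (lt - rt)}

dialogue = decode(*encode(0, 1, 0, 0))  # center-only input
rain = decode(*encode(0, 0, 0, 1))      # surround-only input
print(abs(dialogue["center"]), abs(dialogue["surround"]))
print(abs(rain["center"]), abs(rain["surround"]))
```

Center-only material decodes to the center and cancels in the surround, and surround-only material does the opposite. Notice that the dialogue still plays from left and right at .707, though, because a passive decode doesn't remove anything from the fronts.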
The ESP-3 did something similar.
Now, when played back over stereo, this works pretty well. When decoded, we have good performance for movies–mono information which is mostly dialogue and stuff that happens in front of us is firmly anchored in front and the out of phase stuff comes from the back, too.
More spaciousness.
OK…that worked pretty well, but the next improvement was to be able to locate sounds to the sides and to be able to differentiate between left and right in the surround speakers.
Next came the first Dolby PL2:
In PL2, the center is still 1.4(L+R). That’s Left plus Right playing at +3dB.
But now we have sides and rears, both phase shifted by + or – 90 degrees and the Left total and Right total are similar enough to stereo that when we listen to an ENCODED soundtrack in stereo, this sounds OK.
Remember, this DOESN’T enhance the front stereo very much for an offset listener.
OK…so, we need an additional improvement. Movie theaters are full of people and with all of these, there’s no improvement to left, center and right localization for people who aren’t seated in the center.
So, what to do?
PL2 does a pretty good job of resolving left and right in the sides and surrounds, but it narrows the stage in the front a little bit. But the front ONLY works for one seat.
Now, at the same time that all of this is going on, there are some other people working on other ways to add spaciousness for AUDIO playback. Remember, these matrices are designed for VIDEO.
The other guys are trying to find ways to add spaciousness by adding reverb and delay and applying that to the rear speakers and sometimes to the front speakers, too. Early reflections (or short reverb) added to the front can provide a sense of a space around the instruments. Late reflections can provide a sense of a listening space.
This was the thinking behind processors like the Yamaha DSP1, which provided lots of SOUND FIELDS, like Church, Hall, Stadium, Jazz Club. These were AUDIO processors.
All of these were ways to enhance the sense of space, but none of them addressed the localization problem in the FRONT.
Now, the problem in the front is crosstalk, or channel leakage. If we want to enhance the localization of a center vocal or center dialogue, which is also present in the left or the right speaker, we can make the decoder determine if the dialogue is, in fact, in the center and then we can force the decoder to turn down the left and the right speakers during the dialogue.
Simple, when something is in the center, the left and right speakers don’t play.
That works fine when only one thing is happening, but if someone is speaking and something else is going on in the left or the right channel, then it’s attenuated along with the center information. That sucks and no one likes it.
Another way to do it is to use phase to cancel the center signal in the left and right, but that doesn’t work either because cancelling one channel’s content adds inverted signal to the other channel.
Ugh. None of this works.
So, we’re still stuck. We need a way to preserve ambient sounds but shift the focus of the algorithm to the sound that’s DOMINANT.
What we need is a vector. We can do front to rear OK and left to right OK, but the combination is a problem.
So we need a way to determine the ANGLE of the dominant sound, so we can use an ACTIVE MATRIX to turn speakers up and down quickly so that we minimize annoying loss of events in the sound field and provide a better representation of placement in front.
So, things that are recorded out of phase are REAR DOMINANT and in phase are FRONT DOMINANT. Level between right and left determines LEFT and RIGHT DOMINANCE.
Now we can calculate a vector and use the resulting math to turn speakers up and down quickly. In the diagram above, the resulting vector would cause the left, the rear left and side left speakers to be turned down a little and the front right, and side right to be turned up a little bit.
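Here is a rough sketch of the dominance idea. It is NOT the actual Dolby or Harman math; the normalization, the sign conventions and the names are all my own assumptions, chosen so that in-phase reads as front, out-of-phase reads as rear, and the louder channel sets left or right.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def dominance(left, right):
    """Return (left_right, front_back), each from -1 to +1.

    left_right: +1 is fully right, -1 is fully left (level comparison).
    front_back: +1 is in phase (front), -1 is out of phase (rear).
    """
    l, r = rms(left), rms(right)
    front = rms([a + b for a, b in zip(left, right)])
    rear = rms([a - b for a, b in zip(left, right)])
    left_right = (r - l) / (r + l) if (r + l) else 0.0
    front_back = (front - rear) / (front + rear) if (front + rear) else 0.0
    return left_right, front_back

def dominance_angle_deg(left, right):
    """Angle of the dominance vector: 0 = front, 90 = right, 180 = rear."""
    left_right, front_back = dominance(left, right)
    return math.degrees(math.atan2(left_right, front_back))

tone = [math.sin(2 * math.pi * n / 16) for n in range(64)]
silence = [0.0] * len(tone)
inverted = [-s for s in tone]

print(dominance_angle_deg(tone, tone))      # same in both channels
print(dominance_angle_deg(silence, tone))   # right channel only
print(dominance_angle_deg(tone, inverted))  # out of phase
```

A real active matrix would compute something like this on short blocks of audio, many times a second, and use the result to nudge the speaker gains up and down.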
This happens really quickly so we don’t hear it–kind of like frame rate in a movie.
This is how matrix processing works. Dolby PL2 and Harman’s Logic 7 processors are matrix processors.
Now, all of this was done for MOVIES. What happens when the signal isn’t encoded?
To Ken’s point, none of this was very well communicated and all of it was intended for movies. People are lazy and when they put in music, they didn’t turn off the decoder. Plus, they wanted all the speakers they had paid for to play.
Audiophiles hated the way this sounded with music for a couple of reasons:
1. The math was designed to firmly place dialogue in the center, so the turning down of the left and right speakers was too aggressive. It resulted in a narrow sound.
2. Out of phase information was steered to the rear. At the time, guys mixing music would sometimes attempt to add spaciousness to stereo tracks by putting some sounds that appeared in one channel out of phase in the other channel. This breaks the algorithm and those sounds would steer rear. Later, when compressed files became available, some of them didn’t preserve the phase relationship between left and right and that would also cause steering artifacts.
These sometimes sounded like the rear and side speakers would turn on and off. With MS-8, moving the fader to the back made this problem apparent and I used to get calls all the time asking what was broken.
Keep in mind that all of this was designed to get a movie soundtrack onto a 2 channel medium and that performance for MUSIC was a secondary consideration for the people who designed this stuff. Improvements for music were all about tweaking around the margins for what improvement was possible for when people forgot to turn off the decoder.
Then, came DVD. Now it was possible for the audio track to contain lots of additional information. In addition to a basic stereo track, there were also five additional digital streams for the additional channels in a surround system. If you wanted stereo, you could choose to play back only the unaltered stereo track–your receiver would do this automatically. If you had a surround decoder, then the surround streams would be played.
The benefit of this is that now the guys mixing the soundtrack for the movie could place events in discrete channels. The surround didn’t have to be included in the stereo track, but it often was for BACKWARDS COMPATIBILITY.
That meant that if you had an old decoder, you could still hear surround when you watched a movie. That separate track was often available in the DVD menu, and it was called 2-channel downmixed.
And this is where the confusion came from. In an effort to automate all of this, marketers failed to explain any of it. They just figured that the receiver would take care of it and no one would have any trouble.
The license for the software often included the discrete decoder (Think Dolby Digital), the adaptive matrix (the new Pro Logic 2) and even the original Dolby Surround. All of this was too complicated for casual users and all of it was lumped in together, despite being designed for different systems. PL2 was later updated with a Music setting that worked really well for music too, but it was so late that audiophiles never caught on.
This confusion is similar to the confusion over Bluetooth audio quality. At the baseline, it has to work, so it defaults to the best configuration supported by both the transmitter and the receiver. Some configurations sound good and some not so good, but no one except for the enthusiasts knows the difference.
DTS, which is the Dolby competitor, had a similar collection of software sold in the same way.
Another upmixer to watch out for: Harman QLS 3-D (Quantum Logic Surround).
I can only find three car brands that use this: Hyundai/Kia, Maserati and Lincoln, and I can’t find any lists or any additional information about it on the Harman site.
Anyway, here’s how it works.
In a stereo system, images are placed across the soundstage in a process called “panning”. Sounds on the left are recorded in the left channel. Sounds on the right are recorded in the right channel. Sounds in between are recorded in both channels at different levels. Sounds that should appear in the center are recorded the same in both channels. Sounds that are left of center are LOUDER in the left channel than in the right channel. A 6dB difference between the channels places the sound halfway between the center and the left or the right.
The sense of space in the recording may be real or synthesized. That sense of space is generated by reflections in the room in which the recording was made, or it may be generated using some kind of reverb (old school) or a convolution process (combining the impulse response of a real or a synthesized room with the recorded signal). These reflections happen LATER in the audio signal than the initial sound of the instrument.
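If "convolution" sounds mysterious, here is a bare-bones sketch. The impulse response is made up (a direct sound and two quieter reflections), and real convolution reverbs use much longer responses and faster math, but the principle is the same.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

# Hypothetical room: the direct sound, then two quieter reflections
# arriving 3 and 5 samples later.
room = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]
dry_click = [1.0]
wet = convolve(dry_click, room)
print(wet)  # [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]
```

The click comes out first at full level and the room's reflections follow it, later and quieter.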
QLS separates these signals in two ways, and it’s quite ingenious. To separate the panning across the front stage, it uses a series of filters that extract the common signals for the center, the left and right signals for left and right and intermediate signals for sounds in between left and right.
Basically, these filters compare the left and right channels to determine whether a sound is the same in the left and right channels and for level differences. I think there are seven filters. Once those sounds are extracted, they can be steered around the speakers in a car like a horseshoe. There are basically two “modes”: In the audience or on the stage. On the stage mode wraps the sides all the way around to the back speakers. In the audience does less wrapping.
In addition, there is a filter which separates the initial sound from the room response included in the recording. The room ambience is extracted from the recording and it can also be steered to the output channels. The level of the room can be increased or decreased in each of the channels to place the listener in the front of the room or in the back of the room. The ambience can also be eliminated and the listener is left with a completely “dry” recording.
Depending on the settings, this may or may not sum through a summing processor acceptably. If it cannot be defeated entirely, then grabbing the left channel and grabbing the right channel is NOT going to be sufficient for any system that has to sound great.
If you run into this thing, you should use it or remove it entirely, if possible. In the Hyundai Genesis, you may not be able to remove it unless there’s a MOST adapter that makes it possible.
So, Ken Ward, you’re right. In audio, Surround Sound may be the most poorly explained product ever.
And none of it was designed for cars. It was included in head units back in the day because many of them played DVDs and the upmixer was included in the license. Hey, who doesn’t want to add a million additional features to the packaging?
No one explained it. Few people understood it. Lots of products included it. Lots of customers and installers were confused by it.
Now that super fast DSPs are available, there are lots of possibilities. Instead of the adaptive matrix that turns channels up and down to direct the sound in a particular vector, we can just extract information and send it to the right speaker. We don’t have to rely on phase cancellation and level to remove sounds.
That means we can upmix differently and design the performance for different spaces and different experiences.
In the car, we don’t care about resolving a plane flying overhead and we don’t watch movies very often, so maybe we don’t care about the ability to place rear events in the rear.
What we care about in the car is expanding the sense of space so it doesn’t sound like we’re in a car. We want a wider and deeper stage.
Maybe we don’t care about a stage and we want to distribute sound to all the speakers in a way that sounds kind of like what we hear in a club.
Most upmixers in the past were designed mainly for movies, but not any longer.
So, try to keep an open mind. Some of these things can simply improve the stereo we are used to in a way that’s in keeping with traditional goals for stereo systems: better believability, a wider and deeper stage and acceptable performance in more than one seat.