Pro Engineer School Vol.1
The Use of Microphones
Loudspeaker Drive Units
Digital Audio Tape Recording
Appendix 1 – Sound System Parameters
Copyright Notice This work is copyright © Record-Producer.com You are licensed to make as many copies as you reasonably require for your own personal use.
Chapter 1: Microphone Technology

The microphone is the front-end of almost all sound engineering activities and, as the interface between real acoustic sound travelling in air and the sound engineering medium of electronics, receives an immense amount of attention. Sometimes one could think that the status of the microphone has been raised to almost mythological proportions. It is useful therefore to put things in their proper perspective: there are a great many microphones available that are of professional quality. Almost any of them can be used in a wide variety of situations to record or broadcast sound to a professional standard. Of course different makes and types of microphones sound different to each other, but the differences don't make or break the end product, at least as far as the listener is concerned.

Now, if you want to talk about something that really will make or break the end product, that is how microphones are used. Two sound engineers using the same microphones will instinctively position and direct them differently and there can be a massive difference in sound quality. Give these two engineers other mics, whose characteristics they are familiar with, and the two sounds achieved will be identifiable according to engineer, and not so much according to microphone type.

There are two ways we can consider microphones: by construction and by directional properties. Let's look at the different ways a microphone can be made, to start off with.

Microphone Construction

There are basically three types of microphone in common use: piezoelectric, dynamic and capacitor. The piezoelectric mic, it has to be said, has evolved into a very specialized animal, but it is still commonly found under the bridge of an electro-acoustic guitar so it is worth knowing about.

Piezoelectric

The piezoelectric effect is where certain crystalline and ceramic materials have the property of generating an electric voltage when pressure or a bending force is applied.
This makes them sensitive to acoustic vibrations and they can produce a voltage in response to sound. Piezo mics (or
transducers as they may be called - a transducer is any device that converts one form of energy to another) are high impedance. This means that they can produce voltage but very little current. To compensate for this, a preamplifier has to be placed very close to the transducer. This will usually be inside the body of the electro-acoustic guitar. The preamp will run for ages on a 9 volt alkaline battery, but it is worth remembering that if an electro-acoustic guitar, or other instrument with a piezo transducer, sounds distorted, it is almost certainly the battery that needs replacing, perhaps after a year or more of service. Dynamic This is ‘dynamic’ as in ‘dynamo’. The dynamo is a device for converting rotational motion into an electric current and consists of a coil of wire that rotates inside the field of a magnet. Re-configure these components and you have a coil of wire attached to a thin, lightweight diaphragm that vibrates in response to sound. The coil in turn vibrates within the field of the magnet and a signal is generated in proportion to the acoustic vibration the mic receives. The dynamic mic is also sometimes known as the moving coil mic, since it is always the coil that moves, not the magnet - even though that would be possible. The dynamic mic produces a signal that is healthy in both voltage and current. Remember that it is possible to exchange voltage for current, and vice versa, using a transformer. All professional dynamic mics incorporate a transformer that gives them an output impedance of somewhere around 200 ohms. This is a fairly low output impedance that can drive a cable of 100 meters or perhaps even more with little loss of high frequency signal (the resistance of a cable attenuates all frequencies equally, the capacitance of a cable provides a path between signal conductor and earth conductor through which high frequencies can ‘leak’). 
It is not necessary therefore to have a preamplifier close to the microphone, neither does the mic need any power to operate. Examples of dynamic mics are the famous Shure SM58 and the Electrovoice RE20. The characteristics of the dynamic mic are primarily determined by the weight of the coil slowing down the response of the diaphragm. The sound can be good, particularly on drums, but it is not as crisp and clear as it would have to be to capture delicate sounds with complete accuracy. Dynamic microphones have always been noted for providing good value
for money, but other types are now starting to challenge them on these grounds. Ribbon Mic There is a variation of the dynamic mic known as the ribbon microphone. In place of the diaphragm and coil there is a thin corrugated metal ribbon. The ribbon is located in the field of a magnet. When the ribbon vibrates in response to sound it acts as a coil, albeit a coil with only one turn. Since the ribbon is very light, it has a much clearer sound than the conventional dynamic, and it is reasonable to say that many engineers could identify the sound of a ribbon mic without hesitation. If the ribbon has a problem, it is that the output of the single-turn ‘coil’ is very low. The ribbon does however also have a low impedance and provides a current which the integral transformer can step up so that the voltage output of a modern ribbon mic can be comparable with a conventional dynamic. Examples of ribbon mics are the Coles 4038 and Beyerdynamic M130.
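The earlier remark that a 200 ohm output can drive 100 meters of cable with little high frequency loss can be checked with a rough calculation. The sketch below treats the output impedance and the cable capacitance as a simple first-order RC low-pass filter. The figure of 100 pF per meter is an assumed, typical-order value and is not taken from this text, and the 50 kilohm source is a hypothetical stand-in for a high impedance device.

```python
import math

def cable_cutoff_hz(source_ohms, cable_m, pf_per_m=100.0):
    """-3 dB frequency of the RC low-pass formed by source impedance and cable capacitance.

    pf_per_m is an assumed cable capacitance, not a value from the text.
    """
    capacitance_f = cable_m * pf_per_m * 1e-12
    return 1.0 / (2.0 * math.pi * source_ohms * capacitance_f)

low_z = cable_cutoff_hz(200, 100)      # professional dynamic mic, 100 m of cable
high_z = cable_cutoff_hz(50_000, 100)  # hypothetical high impedance source, same cable

print(f"200 ohm source: {low_z / 1000:.1f} kHz")
print(f"50 kohm source: {high_z:.0f} Hz")
```

Under these assumptions the 200 ohm source rolls off at roughly 80 kHz, well above the audio band, while the high impedance source would start losing treble at a few hundred hertz over the same cable. This is the 'leak' through cable capacitance described above.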
Capacitor The capacitor mic, formerly known as the ‘condenser mic’, works in a completely different way to the dynamic. Here, the diaphragm is paralleled by a ‘backplate’. Together they form the plates of a capacitor. A capacitor, of any type, works by storing electrical charge. Electrical charge can be thought of as quantity of electrons (or the quantity of electrons that normally would be present, but aren't). The greater the disparity in number of electrons present – i.e. the amount of charge – the higher will be the voltage across the terminals of the capacitor. There is the equation:
Q = C × V
or: charge = capacitance × voltage

Note that charge is abbreviated as ‘Q’, because ‘C’ is already taken by capacitance. Putting this another way round:

V = Q / C
or: voltage = charge / capacitance

Now the tricky part: capacitance varies inversely with the distance between the plates of the capacitor. The charge, as long as it is either continuously topped up or not allowed to leak away, stays constant. Therefore as the distance between the plates is changed by the action of acoustic vibration, the capacitance will change and so must the voltage between the plates. Tap off this voltage and you have a signal that represents the sound hitting the diaphragm of the mic.
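The relationship above can be tried out numerically. This is a minimal sketch that assumes an ideal parallel-plate capacitor, for which capacitance = permittivity × area / distance. The plate area, resting gap and 48 V resting voltage are illustrative assumptions, not figures for any real capsule.

```python
EPSILON_0 = 8.854e-12  # permittivity of free space, F/m

def capsule_voltage(charge_c, area_m2, gap_m):
    """V = Q / C with C = epsilon_0 * A / d for an ideal parallel-plate capacitor."""
    capacitance = EPSILON_0 * area_m2 / gap_m
    return charge_c / capacitance

AREA = 1.0e-4      # assumed plate area: 1 square centimeter
REST_GAP = 25e-6   # assumed resting gap: 25 micrometers
REST_VOLTS = 48.0  # assumed voltage across the plates at rest

# Fixed charge that gives REST_VOLTS at the resting gap (Q = C x V).
charge = REST_VOLTS * EPSILON_0 * AREA / REST_GAP

for gap in (24e-6, 25e-6, 26e-6):
    print(f"gap {gap * 1e6:.0f} um -> {capsule_voltage(charge, AREA, gap):.2f} V")
```

With the charge held constant, the voltage comes out directly proportional to the gap, so a diaphragm moving back and forth produces a voltage that follows its motion.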
Sennheiser MKH 40
The great advantage of the capacitor mic is that the diaphragm is unburdened by a coil of any sort. It is light and very responsive to the most delicate sound. The capacitor mic is therefore much more accurate and faithful to the original sound than the dynamic. Of course there is a
downside too. This is that the impedance of the capsule (the part of any mic that collects the sound) is very high. Not just high - very high. It also requires continually topping up with charge to replace that which naturally leaks away to the atmosphere. A capacitor mic therefore needs power for these two reasons: firstly to power an integral amplifier, and secondly to charge the diaphragm and backplate.

Old capacitor mics used to have bulky and inconvenient power supplies. These mics are still in widespread use so you can expect to come across them from time to time. Modern capacitor mics use phantom power. Phantom power, supplied from within the mixing console or remote preamplifier, places +48 V on both of the signal-carrying conductors of the microphone cable and 0 V on the earth conductor. So, simply by connecting a normal mic cable, phantom power is connected automatically. That's why it is called ‘phantom’ – because you don't see it! In practice this is no inconvenience at all. You have to remember to switch it on at the mixing console but that's pretty much all there is to it. Dynamic mics of professional quality are not bothered by the presence of phantom power in any way. One operational point that is important, however, is that the fader must be all the way down when a mic is connected to an input providing phantom power, or when phantom power is switched on. Otherwise a sharp crack of speaker-blowing proportions is produced.

A capacitor microphone often incorporates a switched -10 dB or -20 dB pad, which is an attenuator placed between the capsule and the amplifier to prevent clipping on loud signals.

Electret

The electret mic is a form of capacitor microphone. However the charge is permanently locked into the diaphragm and backplate, just as magnetic energy is locked into a magnet. Not all materials are suited to forming electrets, so it is usually considered that the compromises involved in manufacture degrade sound quality.
However, it has to be said that there are some very good electret mics available, most of which are back-electrets, meaning that only the backplate of the capacitor is an electret, so the diaphragm can be made of any suitable material. Electret mics do still need power for the internal amplifier. However, this can take the form of a small internal battery, which is sometimes convenient.
Electret mics that have the facility for battery power can also usually be phantom powered, in case the battery runs down or isn’t fitted.
Directional Characteristics The directional characteristics of microphones can be described in terms of a family of polar patterns. The polar pattern is a graph showing the sensitivity in a full 360 degree circle around the mic. I say a family of polar patterns but it really is a spectrum with omnidirectional at one extreme and figure-of-eight at the other. Cardioid and hypercardioid are simply convenient way points.
To explain these patterns further, fairly obviously an omnidirectional mic is equally sensitive all round. A cardioid is slightly less obvious. The cardioid is most sensitive at the front, but is only 6 dB down in response at an angle of 90 degrees. In fact it is only insensitive right at the back. It is not at all correct, as commonly happens, to call this a unidirectional microphone. The hypercardioid is a more tightly focussed pattern than the cardioid, at the expense of a slight rear sensitivity, known as a lobe in the response. The figure-of-eight is equally sensitive at front and back,
the only difference being that the rear produces an inverted signal, 180 degrees out of phase with the signal from the front.

All of this is nice in theory, but is almost never borne out in practice. Take a nominally cardioid mic for example. It may be an almost perfect cardioid at mid frequencies, but at low frequencies the pattern will spread out into omni. At high frequencies the pattern will tighten into hypercardioid. The significant knock-on effect of this is that the frequency response off-axis – in other words any direction but head on – is never flat. In fact the off-axis response of most microphones is nothing short of terrible and the best you can hope for is a smooth roll-off of response from LF to HF. Often though it is very lumpy indeed. We will see how this affects the use of microphones at another time.

Omnidirectional

Looking at directional characteristics from a more academic standpoint, the omnidirectional microphone is sensitive to the pressure of the sound wave. The diaphragm is completely enclosed, apart from a tiny slow-acting air-pressure equalizing vent, and the mic effectively compares the changing pressure of the outside air under the influence of the sound signal with the constant pressure within. Pressure acts equally in all directions, therefore the mic is equally sensitive in all directions, in theory as we said. In practice, at higher frequencies where the size of the mic starts to become significant in comparison with the wavelength, the diaphragm will be shielded from sound approaching from the rear and rearward HF response will drop.

Figure-of-Eight

At the other end of the spectrum of polar patterns the figure-of-eight microphone is sensitive to the pressure gradient of the sound wave. The diaphragm is completely open to the air at both sides. Even though it is very light and thin, there is a difference in pressure at the front and rear of the diaphragm, and the microphone is sensitive to this difference.
The pressure gradient is greatest for sound arriving directly from the front or rear, and lessens as the sound source moves round to the side. When the sound source is exactly at the side of the diaphragm it produces equal pressure at front and back, therefore there is no pressure gradient and the microphone produces no output. Therefore the figure-of-eight microphone is not sensitive at the sides. (You could also imagine that a
sound wave would find it hard to push the diaphragm sideways – sometimes the intuitive explanation is as meaningful as the scientific one). All directional microphones exhibit a phenomenon known as the proximity effect or bass tip-up. The explanation for this is sufficiently complicated to fall outside of the required knowledge of the working sound engineer. The practical consequences are that close miking results in enhanced low frequency. This produces a signal that is not accurate, but it is often thought of as being ‘warmer’ than the more objectively accurate sound of an omnidirectional microphone. Cardioid and Hypercardioid To produce the in-between polar patterns one could consider the omnidirectional microphone where the diaphragm is open on one side only, and the figure-of-eight microphone where the diaphragm is completely open on both sides. Allowing partial access only to one side of the diaphragm would therefore seem to be a viable means of producing the in-between patterns, and indeed it is. A cardioid or hypercardioid mic therefore provides access to the rear of the diaphragm through a carefully designed acoustic labyrinth. Unfortunately the effect of the acoustic labyrinth is difficult to equalize for all frequencies, therefore one would expect the polar response of cardioid and hypercardioid microphones to be inferior to that of omnidirectional and figure-of-eight mics.
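The idea of a spectrum running from omnidirectional to figure-of-eight can be sketched with the standard idealized first-order pattern, sensitivity = a + (1 - a) × cos(angle), where a is the proportion of pressure (omni) response and 1 - a the proportion of pressure-gradient (figure-of-eight) response. The value a = 0.25 used for the hypercardioid is a commonly quoted idealized figure and is an assumption here, not something stated in this text. Real microphones, as noted above, depart from these ideal curves at low and high frequencies.

```python
import math

# Omni fraction 'a' in s(angle) = a + (1 - a) * cos(angle).
# The hypercardioid value is a commonly quoted idealized figure, not taken from the text.
PATTERNS = {"omni": 1.0, "cardioid": 0.5, "hypercardioid": 0.25, "figure-of-eight": 0.0}

def sensitivity(a, angle_deg):
    """Relative sensitivity (1.0 on axis); a negative value means inverted polarity."""
    return a + (1.0 - a) * math.cos(math.radians(angle_deg))

def level_db(a, angle_deg):
    """Level relative to on-axis in dB; -inf at a perfect null."""
    s = abs(sensitivity(a, angle_deg))
    return 20.0 * math.log10(s) if s > 1e-12 else float("-inf")

for name, a in PATTERNS.items():
    row = ", ".join(f"{angle} deg: {level_db(a, angle):6.1f} dB" for angle in (0, 90, 180))
    print(f"{name:16s} {row}")
```

In this model the cardioid comes out about 6 dB down at 90 degrees with a null only at the rear, matching the description given earlier, and the figure-of-eight has its nulls at the sides with an inverted signal from the rear.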
Multipattern Microphones

There are many microphones available that can produce a selection of polar patterns. This is achieved by mounting two diaphragms back-to-back with a single central backplate. By varying the relative polarization of the diaphragms and backplate, any of the four main polar patterns can be created. It is often thought that the best and most accurate microphones are the true omnidirectional and the true figure-of-eight, and that mimicking these patterns with a multipattern mic is less than optimal. Nevertheless, in practice multipattern mics are so versatile that they are commonly the mic of first choice for many engineers.
Special Microphone Types Stereo Microphone Two capsules may be combined into a single housing so that one mic can capture both left and right sides of the sound field. This is much more convenient than setting two mics on a stereo bar, but obviously less flexible. Some stereo mics use the MS principle where one cardioid capsule (M) captures the full width of the sound stage while the other figure-of-eight capsule (S) captures the side-to-side differences. The MS output can be processed to give conventional left and right signals.
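The processing that turns MS into left and right is usually a simple sum and difference. The sketch below assumes the common convention that a positive side signal corresponds to the left; the function names, the optional `width` parameter and the sample values are hypothetical and for illustration only.

```python
def ms_to_lr(mid, side, width=1.0):
    """Sum-and-difference decode: L = M + width * S, R = M - width * S."""
    left = [m + width * s for m, s in zip(mid, side)]
    right = [m - width * s for m, s in zip(mid, side)]
    return left, right

def lr_to_ms(left, right):
    """Inverse encode: M = (L + R) / 2, S = (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

mid = [0.5, 0.25, -0.5]    # made-up sample values from the M capsule
side = [0.25, -0.25, 0.0]  # made-up sample values from the S capsule

left, right = ms_to_lr(mid, side)
print("left: ", left)
print("right:", right)
```

Scaling the side signal before decoding, as the hypothetical `width` parameter does, changes the apparent stereo width; setting it to zero leaves only the mid signal in both channels, which is plain mono.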
Neumann stereo microphones

Interference Tube Microphone

This is usually known as a shotgun or rifle mic because of its similarity in appearance to a gun barrel. The slots in the barrel allow off-axis sound to cancel giving a highly directional response. The longer the mic, the more directional it is. The sound quality of these microphones is inferior to normal mics so they are only used out of necessity.
Sennheiser interference tube microphone
A close relation of the interference tube microphone is the parabolic reflector mic. This looks like a satellite dish antenna and is used for recording wildlife noises, and at sports events to capture comments from the pitch. Boundary Effect Microphone The original boundary effect microphone was the Crown PZM (Pressure Zone Microphone) so the boundary effect microphone is often referred to generically as the PZM. In this mic, the capsule is mounted close to a flat metal plate, or inset into a wooden or metal plate. Instead of mounting it on a stand, it is taped to a flat surface. One of the main problems in the use of microphones is reflections from nearby flat surfaces entering the mic. By mounting the capsule within around 7 mm from the surface, these reflections add to the signal in phase rather than interfering with it. The characteristic sound of the boundary effect microphone is therefore very clear (as long as there are no other nearby reflecting surfaces). It can be used for many types of recording, and can also be seen in police interview rooms where obviously a clear sound has to be captured for the interview recording. The polar response is hemispherical.
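A rough calculation shows why the distance between capsule and surface matters. A reflection arrives later than the direct sound, and the two cancel at frequencies where the extra path length is half a wavelength. The sketch below is heavily simplified: it assumes sound arriving perpendicular to a perfectly reflecting surface and a speed of sound of 343 m/s, and the 1 meter stand height is an illustrative assumption.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def first_notch_hz(height_m):
    """Lowest cancellation frequency for a capsule height_m above a reflecting surface.

    Assumes sound arriving perpendicular to the surface, so the reflection
    travels an extra 2 * height_m; cancellation starts where that extra
    path equals half a wavelength.
    """
    extra_path = 2.0 * height_m
    return SPEED_OF_SOUND / (2.0 * extra_path)

for label, height in (("mic on a stand, 1 m above the floor", 1.0),
                      ("capsule 7 mm from the plate", 0.007)):
    print(f"{label}: first notch near {first_notch_hz(height):.0f} Hz")
```

Under these assumptions, moving the capsule from 1 meter to 7 mm from the surface shifts the first cancellation from the bass region up to around 12 kHz, and a smaller spacing pushes it higher still, leaving most of the audio band reinforced rather than coloured.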
Crown PZM microphone
Miniature Microphone This is sometimes known as a ‘tie-clip’ mic, although it is rarely ever clipped to the tie these days. This type of mic is usually of the electret design, which lends itself to very compact dimensions, and is almost always omnidirectional. Miniature microphones are used in television and in theater, where there is a requirement for microphones to be unobtrusive. Since the diaphragm is small and not in contact with many air molecules, the random vibration of the molecules does not cancel out
as effectively as it does in a microphone with a larger diaphragm. Miniature microphones therefore have to be used close to the sound source; otherwise noise will be evident.
Vocal Microphone For popular music vocals it is common to use a large-diaphragm mic, often an old tube model. A large diaphragm mic generally has a less accurate sound than a mic with a diaphragm 10-12 mm or so in diameter. The off-axis response will tend to be poor. Despite this, models such as the Neumann U87 are virtually standard in this application due to their enhanced subjective ‘warmth’ and ‘presence’. Microphone Accessories First in the catalogue of microphone accessories is the mic support. These can range from table stands, short floor stands, normal boom stands, tall stands up to 4 meters for orchestral recording, fishpoles as used by video and film sound recordists, and long booms with cable operated mic positioning used in television studios. Attaching the mic to the stand is a mount that can range from a basic plastic clip, to an elastic suspension or cradle that will isolate the microphone from floor noise. The other major accessory is the windshield or pop-shield. A windshield may be made out of foam and slipped over the mic capsule, or it may look like a miniature airship covered with wind-energy dissipating material. For blizzard conditions windshield covers are available that
look as though they are made out of yeti fur. The pop-shield, on the other hand, is a fine mesh material stretched over a metal or plastic hoop, used to filter out the blast of air caused by a voice artist's or singer's ‘P’ and ‘B’ sounds.
Check Questions

• What is the piezoelectric effect?
• Where would you find a piezoelectric transducer?
• What is attached to the diaphragm of a dynamic microphone?
• What passive circuit component is incorporated in the output stage of all professional microphones? (Note that some microphones use an active circuit to imitate the action of this component).
• Describe the sound of a dynamic microphone.
• How does a ribbon microphone differ from an ordinary dynamic microphone?
• What is the old term for 'capacitor microphone'?
• Why does the capacitor microphone have a more accurate sound than a dynamic microphone?
• Why does a capacitor microphone need to be powered (two reasons)?
• What precaution should you take when switching on phantom power?
• Can dynamic microphones of professional quality be used with phantom power switched on?
• What is a pad?
• Why does an electret microphone need to be powered?
• Describe the actual polar response of a typical nominally omnidirectional microphone.
• Describe the proximity effect.
• What is an 'acoustic labyrinth', as applied to microphones?
• Why does a boundary effect microphone give a clear sound?
• Why are large-diaphragm microphones used for popular music vocals?
• Describe the differences between wind shields and pop shields.
Chapter 2: The Use of Microphones

Use of Microphones for Speech

In sound engineering, as opposed to communications which will not be considered here, there are commonly considered to be three classes of sound: speech or dialogue, music and effects. Each has its own considerations and requirements regarding the use of microphones. There are a number of scenarios where speech may be recorded, broadcast or amplified:

• Audio book
• Radio presentation, interview or discussion
• Television presentation, interview or discussion
• News reporting
• Sports commentary
• Film and television drama
• Theatre
• Conference
In some of these, the requirement is for speech that is as natural as possible. In an ideal world perhaps it should even sound as though a real person were in the same room. The audio book is in this category, as are many radio programs. There is a qualification however on the term ‘natural’. Sometimes what we regard as a natural sound is the sound that we expect to hear via a loudspeaker, not the real acoustic sound of the human voice. We have all been conditioned to expect a certain quality of sound from our stereos, hifis, radio and television receivers, and when we get it, it sounds natural, even if it isn’t in objective terms.

In the recording and most types of broadcasting of speech there are some definite requirements:

• No pops on ‘P’ or ‘B’ sounds.
• No breath noise or ‘blasting’
• Little room ambience or reverberation
• A pleasing tone of voice
Popping and blasting can be prevented in two ways. One is to position the microphone so that it points at the mouth, but is out of the direct line of fire of the breath. So often we see microphones used actually in the
line of fire of the breath that it seems as though it is simply the ‘correct’ way to use a microphone. It can be for public address, but it isn’t for broadcasting or recording. The other way is to use a pop shield. Ideally this is an open mesh stocking-type material stretched over a metal or plastic hoop. This can be positioned between the mouth and the microphone and is surprisingly effective in absorbing potential pops and blasts. Sometimes a foam windshield of the type that slips over the end of the microphone is used for this purpose. A windshield is really what it says, and is not 100% effective for pops, although its unobtrusiveness visually has value, for example, for a radio discussion where hoop-type pop shields would mar face-to-face visual communication among the participants. The requirement for little room ambience or reverberation is handled by placing the microphone quite close to the mouth – around 30 to 40 cm. If the studio is acoustically treated, this will work fine. Special acoustic tables are also available which absorb rather than reflect sound from their surface. ‘A pleasing tone of voice’? Well, first choose your voice talent. Second, it is a fact that some microphones flatter the voice. Some work particularly well for speech, and there are some classic models such as the Electrovoice RE20 that are commonly seen in this application. Generally, one would be looking for a large-diaphragm capacitor microphone, or a quality dynamic microphone for natural or pleasing speech for audio books or radio broadcasting. In television broadcasting, one essential requirement is the microphone should be out of shot or unobtrusive. The usual combination for a news anchor, for example, is to have a miniature microphone attached to the clothing in the chest area, backed up by a conventional mic on a desk stand. Often the conventional mic is held on stand-by to be brought on quickly if the miniature mic fails, as they are prone to through constant handling. 
Oddly enough, the use of microphones on television varies according to geography. In France for example, it is quite common for a television presenter to hand hold a microphone very close to the mouth. Even a discussion can take place with three or four people each holding a microphone. The resultant sound quality is in accordance with French subjective requirements. Radio microphones are commonly used in television to give freedom of movement and also freedom from cables on
the floor, leaving plenty of free space for the cameras to roll around smoothly. News Reporting For news reporting, a robust microphone – perhaps a short shotgun – can be used with a general-purpose foam windshield for both the reporter and interviewee, should there be one. Such a microphone is easily pointable (the reporter isn’t a sound engineer) and brings home good results without any trouble. The sound quality of a news report may not be all that could be imagined, but a little bit of harshness or degradation sometimes, oddly, makes the report more ‘authentic’. Sports Commentary Sports commentary is a very particular requirement. This often takes place in a noisy environment so the microphone must be adapted to cope with this. The result is a mic that has a heavily compromised sound quality, but this has come to be accepted as the sound of sports commentary so it is now a requirement. The Coles 4104 is an example of a 1950s design that is still widely used. It is a noise-cancelling microphone that almost completely suppresses background noise, and the positioning bar on the top of the mic ensures that the commentator always holds it in the correct position (as, indeed it is always held - sports commentators often like to move around in their commentary box as they work). Film and Television Drama For film and television drama, a fishpole (or boom as it is sometimes known) topped by a shotgun or rifle mic with a cylindrical windshield is the norm. The operator can position and angle the mic to get the best quality dialogue (while monitoring on headphones), while keeping the mic – and the shadow of the mic – out of shot. Miniature microphones are also used in this context, often with radio transmitters. Obviously they must not be visible at all. However, concealing the mic in the costume can affect sound quality so care must be taken. 
Sometimes in the studio a microphone might be mounted on a large floor mounted boom that can extend over several meters (we’re not in fishing
country anymore). In this case the boom operator has winches to point and angle the microphone.

Theatre

In theatre the choice is between personal miniature microphones with radio transmitters, or area miking from the front and sides of the stage. Personal microphones allow a higher sound level before feedback since they are close to the actor’s mouth. For straight drama, it isn’t necessary to have a high sound level in the auditorium. In fact in most theatres it is perfectly acceptable for the sound of the actors’ voices to be completely unamplified. However if amplification, or reinforcement, is to be used then area miking is usually sufficient. Shotgun or rifle mics are positioned at the front of the stage (an area sometimes known for traditional reasons as ‘the floats’, therefore the mics are sometimes called ‘float mics’) to create sensitive spots on stage from which the actors can easily be heard. The drawback is that there will be positions on the stage from which the actors cannot be heard. The movements of the actors have to be planned to take account of this.

Conference

I use this term loosely to cover everything from company boardrooms to political party conferences. You will see that there can be a vast difference in scale. In the boardroom it has become common to use gooseneck microphones or boundary effect microphones that are specifically designed for that purpose. This lies beyond what we normally consider to be sound engineering and is categorized in the specialist field of sound installation. The party conference is another matter. To achieve reasonably high sound levels the microphone has to be close to the mouth, yet the candidate – for obvious reasons – does not want to look like a microphone-swallowing rock star.
Therefore the microphone has to be unobtrusive so that it can be placed fairly close to the mouth without drawing undue attention to itself (the cluster of broadcasters’ microphones in front of the lectern is another matter, but they don’t have to be so close). The AKG C747 is very suitable for this application. You will have noticed that in this context microphones are often used in pairs. There are two schools of thought on this issue. One is that the microphones should point inwards from the front corners of the lectern.
This allows the speaker to turn his or her head and still receive adequate pickup. Unfortunately, as the head moves, both microphones can pick up the sound while the sound source – the mouth – is moving towards one mic and away from the other. The Doppler effect comes into play and two slightly pitch shifted signals are momentarily mixed together. It sounds neither pleasant nor natural. The alternative approach is to mount both microphones centrally and use one as a backup. The speaker will learn, through not hearing their voice coming back through the PA system, that they can only turn so far before useful pickup is lost. It is worth saying that in this situation, the person speaking must be able to hear their amplified voice at the right level. If their voice seems too loud, to them, they will instinctively back away from the mic. If they can’t hear their amplified voice they will assume the system isn’t working. I once saw the chairman of a large and prestigious organisation stand away from his microphone because he thought it wasn’t working. It had been, and at the right level for the audience. But unfortunately, apart from the front few rows, they were unable to hear a single unamplified word he said.
Use of Microphones for Music

The way in which microphones are used for music varies much more according to the instrument than it possibly could for speech where the source of sound is of course always the human mouth. First, some scenarios:

• Recording
• Broadcast
• Public address
• Recording studio
• Location recording
• Concert hall
• Amplified music venue
• Theatre
The requirements of recording and broadcasting are very similar, except that broadcasting often works to a more stringent timescale, and in television broadcasting microphones must be invisible or at least unobtrusive. There are two golden rules:

• Point the microphone at the sound source from the direction of the best natural listening position.
• The microphone will always be closer than a natural comfortable listening distance.

So, wherever you would normally choose to listen from is the right position for the microphone, except that the microphone has to be closer because it can’t discriminate direct sound from reflected sound in the way the human ear/brain can. It is always a good starting point to follow these two rules, but of course it may not always be possible, practical, or a natural sound may not be wanted for whatever reason. Broadcasters, by the way, tend to place the microphone closer than recording engineers. They need to get a quick, reliable result, and a close mic position is simply safer for this purpose. Ultimate sound quality is not of such importance.

The recording studio is a very comfortable environment for microphones. The engineer is able to use any microphone he or she desires and has
available. The mic may be old, large and ugly, cumbersome to use perhaps with an external power supply (not phantom) and pattern selector, prone to faults etc., but if it gets the right sound, then it will be used. Location recording is not quite so comfortable and you need to be sure that the microphones are reliable and easy to use, preferably without external power supplies and with a simple stand mount rather than a complicated elastic suspension. As far as comfort goes, the concert hall is a reasonably good place to record in as at least they are used to the requirements of music (the owners of many good recording venues often have higher priorities – religious worship being a prominent example). There are however restrictions on the placement of microphones during a concert. Usually it is against fire regulations to have microphones among the audience, unless the mics are positioned in such a way that they don’t impede egress and cables are very securely fixed. Generally therefore there will be a stereo pair of mics slung from the ceiling, supplemented by a number of mics on stage, which are closer than the engineer would probably prefer them to be under ideal circumstances. For amplified music, the problem is always in getting sufficient level without feedback. This necessitates that microphones are very much closer than the natural listening position, to the point that natural direction has very little meaning. The ultimate example would be a microphone clipped to the bridge or sound hole of a violin. It wouldn’t even be possible to listen from this position. In rock music PA, microphones are used as close to the singer’s lips as possible, right against the grille cloth of a guitarist’s speaker cabinet and within millimetres of the heads of the drums. Primarily this is to achieve level without risk of feedback. However this has also come to be understood as the ‘rock music sound’ because it is what the audience expects. 
In this context, the most distant mics would be the drum overhead mics, which don’t need much gain anyway. For string and wind instruments there are a variety of clip-on mics available. There are also contact mics that pick up vibrations directly from the body of the instrument, although even these are not entirely immune to feedback. In theatre musicals, the best option for the lead performers is to use miniature microphones with radio transmitters. The placement of the mic is significant. The original ‘lavalier’ placement, named for Mme Lavalier
who reportedly wore a large ruby from her neck, has long gone. The chest position is great for newsreaders but it suffers from the shadow of the chin and boominess caused by chest resonance. The best place for a miniature microphone is on a short boom extending from behind the ear. Mics and booms are available in a variety of flesh colours so they are not visible to the audience beyond the second or third row. If a boom is not considered acceptable, then the mic may protrude a short distance from above the ear, or descending from the hairline. This actually captures a very good vocal sound. It has to be tried to be believed. One of the biggest problems with miniature microphones in the theatre is that they become ‘sweated out’ after a number of performances and have to be replaced. Still, no-one said that it was easy going on stage. For the orchestra in a theatre musical, clip on mics are good for string instruments. Wind instruments are generally loud enough for conventional stand mics, closely placed. So-called ‘booth singers’ can use conventional mics.
Stereo Microphone Techniques
Firstly, what is stereo? The word ‘stereophonic’ in its original meaning suggests a ‘solid’ sound image and does not specify how many microphones, channels or loudspeakers are to be used. However, it has come to mean two channels and two loudspeakers using as few or as many microphones as are necessary to get a good result. When it works, you should be able to sit in an equilateral triangle with the speakers, listen to a recording of an orchestra and pinpoint where every instrument is in the sound image. (By the way, some people complain that ‘stereophonic’, as a word, combines both Greek and Latin roots. Just as well perhaps, because if it had been exclusively Latin it would have been ‘crassophonic’!) When recording a group of instruments or singers, it is possible to use just two or three microphones to pick up the entire ensemble in stereo, and the results can be very satisfying. There are a number of techniques:
• Coincident crossed pair
• Near-coincident crossed pair
• ORTF
• Mercury Living Presence
• Decca Tree
• Spaced omni
• MS
• Binaural
The coincident crossed pair technique traditionally uses two figure-of-eight microphones angled at 90 degrees pointing to the left and right of the sound stage (and, due to the rear pickup of the figure-of-eight mic, to the left and right of the area where the audience would be also). More practically, two cardioid microphones can be used. They would be angled at 120 degrees were it not for the drop off in high frequency response at this angle in most mics. A 110-degree angle of separation is a reasonable compromise. This system was originally proposed in the 1930s and mathematically inclined audio engineers will claim that this gives perfect reproduction of the original sound field from a standard pair of stereo loudspeakers. However perfect the mathematics look on paper, the results do not bear out the theory. The sound can be good, and you can with
effort tell where the instruments are supposed to be in the sound image. The problem is that you just don’t feel like you are in the concert hall, or wherever the recording was made. The fact that human beings do not have coincident ears might have something to do with it.
Coincident crossed pair
Separating the mics by around 10 cm tears the theory into shreds, but it sounds a whole lot better.
Near-coincident crossed pair
The ORTF system, named for the Office de Radiodiffusion Television Francaise, uses two cardioid microphones spaced at 17 cm angled outwards at 110 degrees, and is simply an extended near-coincident crossed pair.
The redeeming feature of the coincident crossed pair is that you can mix the left and right signals into mono and it still sounds fine. Mono, but fine. We call this mono compatibility and it is important in many situations – the majority of radio and television listeners still only have one speaker. The further apart the microphones are spaced, the worse the mono compatibility, although near-coincident and ORTF systems are still usable.
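One way to see why wider spacing hurts mono compatibility is to estimate where the first cancellation falls when the two channels are summed. The sketch below is a simplified model rather than anything taken from the techniques themselves: it assumes a distant source, ignores the microphones' directional patterns, takes the speed of sound as 343 m/s, and uses an arbitrary 30 degree source angle. The function name is purely illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def first_mono_null_hz(spacing_m, source_angle_deg):
    """Lowest frequency cancelled when two spaced mics are summed to mono.

    Simplified model (an assumption, not from the text): a distant source at
    source_angle_deg off the centre line reaches one mic later than the other
    by spacing * sin(angle) / c. Summing the two channels cancels the
    frequency whose half period equals that delay.
    """
    path_difference = spacing_m * abs(math.sin(math.radians(source_angle_deg)))
    if path_difference == 0:
        return math.inf  # coincident mics or a centred source: no cancellation
    delay = path_difference / SPEED_OF_SOUND
    return 1.0 / (2.0 * delay)


if __name__ == "__main__":
    # Spacings follow the text: coincident, about 10 cm, ORTF 17 cm, 2 m spaced pair.
    setups = [("coincident", 0.0), ("near-coincident", 0.10),
              ("ORTF", 0.17), ("spaced pair", 2.0)]
    for name, spacing in setups:
        null = first_mono_null_hz(spacing, 30.0)
        print(f"{name:16s} {spacing:5.2f} m  first null: {null:8.0f} Hz")
```

In this model the first null for the closely spaced pairs sits in the kilohertz range, while for a pair two meters apart it drops below 200 Hz and the nulls above it crowd together. With real cardioids the two signals also differ in level, so the cancellation is shallower than this sketch suggests.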
Mercury Living Presence was one of the early stereo techniques of the 1950s, used for classical music recordings on the Mercury label. If you imagine trying to figure out how to make a stereo recording when there was no-one around to tell you how to do it, you might work out that one microphone pointing left, another pointing center and a third pointing right might be the way to do it. Record each to its own track on 35mm magnetic film, as used in cinema audio, and there you have it! Nominally omnidirectional microphones were used, but of course the early omni mics did become directional at higher frequencies. Later recordings were made to two-track stereo. These recordings stand up remarkably well today. They may have a little noise and distortion, but the sound is wonderfully clear and alive. The same can be said of the Decca tree, used by the Decca record company. This is not dissimilar from the Mercury Living Presence system but baffles were used between the microphones in some instances to create separation, and additional microphones might be used where necessary, positioned towards the sides of the orchestra.
Another obvious means of deploying microphones in the early days of stereo was to place three microphones spaced apart at the front of the orchestra, much more distant from each other than in the above systems. If only two microphones are used spaced apart by perhaps as much as two meters or more, what happens on playback is that the sound seems to cluster around the loudspeakers and there is a hole in the middle of the sound image. To prevent this, a centre microphone can be mixed in at a lower level so that the ‘hole’ is filled. There is no theory on earth to explain why this works - being so dissimilar to the human hearing system - but it can work very well. The main drawback is that a recording made in such a way sounds terrible when played in mono. The MS system, as explained previously, uses a cardioid microphone to pick up an all-round mono signal, and a figure-of-eight mic to pick up the difference between left and right in the sound field. The M and S signals can be combined without too much difficulty to provide conventional left and right signals. This is of practical benefit when it is necessary to record a single performer in stereo. With a coincident crossed pair, one microphone would be pointing to the left of the performer, the other would be pointing to the right. It just seems wrong not to point a microphone directly at the performer, and with the MS system you do, getting the best possible sound quality from the mic. It is sometimes proposed as an advantage of MS that it is possible to control the width of the stereo image by adjusting the level of the S signal. This is exactly the same as adjusting the width by turning the mixing console’s panpots for
the left and right signals closer to the centre. Therefore it is in reality no advantage at all. Binaural stereo attempts to mimic the human hearing system with a dummy head (sometimes face, shoulders and chest too) with two omnidirectional microphones placed in artificial ears just like a real human head. It works well, but only on headphones. A binaural recording played on speakers doesn’t work because the two channels mix on their way to the listener, spoiling the effect. There have been a number of systems attempting to make binaural recordings work on loudspeakers but none has become popular. In addition to the stereo miking system, it is common to mic up every section of an orchestra, whether it is a classical orchestra, film music, or the backing for a popular music track. Normally the stereo mic system, crossed pair or whatever, is considered the main source of signal, with the other microphones used to compensate for the distance to the rear of the orchestra, and to add just a little presence to instruments where appropriate. Sectional mics shouldn’t be used to compensate for poor balance due to the conductor or arranger. Sometimes however classical composers don’t get the balance quite right and it is not acceptable to change the orchestration. A little technical help is therefore called for. Instruments We come back to the two golden rules of microphone placement, as above. It is worth looking at some specific examples: Saxophone There are two fairly obvious ways a saxophone can be close miked. One is close to the mouthpiece, another is close to the bell. The difference in sound quality is tremendous. The same applies to all close miking. Small changes in microphone position can affect the sound quality enormously. There are many books and texts that claim to tell you how and where to position microphones for all manner of instruments, but the key is to experiment and find the best position for the instrument – and player – you have in front of you. 
Experience, not book learning, leads to success. Of the two saxophone close miking positions, neither will capture the natural sound of the instrument, if that’s what you want. Close mic positions almost never do. If you move the mic further away, up to
around a meter, you will be able to capture the sound of the whole of the instrument, mouthpiece, bell, the metal of the instrument, and the holes that are covered and uncovered during the normal course of playing. Also as you move away you will capture more room ambience, and that is a compromise that has to be struck. Natural sound against room ambience. It’s subjective. Piano Specifically the grand piano – it is common to place the microphone (or microphones) pointing directly at the strings. Oddly enough no-one ever listens from this position and it doesn’t really capture a natural sound, but it might be the sound you want. The closer the microphones are to the higher strings, the brighter the sound will be. You can position the microphones all the way at the bass end of the instrument, spaced apart by maybe 30 cm, and a rich full sound will be captured. Move the microphones below the edge of the case and angle them so that they pick up reflected sound from the lid and a more natural sound will be discovered. You can even place a microphone under a grand piano to capture the vibration of the soundboard. It can even sound quite good, but listen out for noise from the foot pedals. Drums The conventional setup is one mic per drum, a mic for the hihat perhaps, and two overhead mics for the cymbals. Recording drums is an art form and experience is by far the best guide. There are some points to bear in mind: You can’t get a good recording of a poor kit, particularly cymbals, or a kit that isn’t well set up. It is often necessary to damp the drums by taping material to the edge of the drum head to get a shorter, more controlled sound. The mics have to be placed where the drummer won’t hit them, or the stands. Dynamic mics generally sound better for drums, capacitor mics for cymbals.
The kick drum should have its front head removed, or there should be a large hole cut out so that a damping blanket can be placed inside. Otherwise it will sound more like a military bass drum than the dull thud that we are used to. The choice of beater – hard or soft - is important, as is the position of the kick drum mic either just outside, or some distance inside the drum. The snares on the underside of the snare drum may rattle when other drums are being played. Careful adjustment of the tension of the snares is necessary, and perhaps even a little damping. Microphones should be spaced as far apart from each other as possible and directed away from other drums. Every little bit helps as the combination of two mics picking up the same drum from different distances leads to cancellation of groups of frequencies. The brute force technique is to use a noise gate on every microphone channel, and this is commonly done. Noise gates will be covered later. Perhaps this is a brief introduction to the use of microphones, but it’s a start. And to round off I’ll give away the secret of getting good sound from your microphones: Listen!
Check Questions
• What problem is commonly found in live sports commentary?
• What does a fishpole operator concentrate on while working?
• In theater, what is 'area miking'?
• How is feedback avoided in live sound (the simplest technique)?
• Why must the speaker at a conference hear his or her own amplified voice at the right level?
• Write down, copy if you wish, the two golden rules for microphone positioning
• Why do microphones have to be placed closer than a natural listening position?
• Where are personal mics worn in the theater?
• What is stereo?
• Describe the coincident crossed pair.
• What is the benefit of separating the microphones (relate this to the human hearing system)?
• What is the value of mono compatibility?
• Why is it desirable to mic up every section of an orchestra independently?
• Pick an instrument other than those mentioned in the text. Describe the effect of two alternative close miking positions.
• When you look at a grand piano, performed solo, on stage, does the pianist sit on the left or the right? Why?
• Why do drums often need to be damped?
Chapter 3: Loudspeaker Drive Units
Loudspeakers are without doubt the most inadequate component of the audio signal chain. Everything else, even the microphone, is as close to the capabilities of human hearing as makes hardly any difference at all. However, amplify the signal and convert it back into sound and you will know without any hesitation whatsoever that you are listening to a loudspeaker, not a natural sound source. Loudspeakers can be categorized by method of operation and by function:
• Method of operation:
  • Moving coil
  • Electrostatic
  • Direct radiator
  • Horn
• Function:
  • Domestic
  • Hi-fi
  • Studio
  • PA
In this context we will use ‘PA’ to mean concert public address rather than announcement systems that are beyond the scope of this text. The moving coil loudspeaker, or I should say ‘drive unit’ as this is only one component of the complete system, is the original and still most widely used method of converting an electric signal to sound. The components consist of a magnet, a coil of wire (sometimes called the ‘voice coil’) positioned within the field of the magnet and a diaphragm that pushes against the air. When a signal is passed through the coil, it creates a magnetic field that interacts with the field of the permanent magnet causing motion in the coil and in turn the diaphragm. It is probably fair to say that 99.999% of the loudspeakers you will ever come across use moving coil drive units. The electrostatic loudspeaker (and this time it is a loudspeaker rather than just a drive unit) uses electrostatic attraction rather than magnetism. The electrostatic loudspeaker has the most natural sound quality, but is not
capable of high sound levels. Hence it is rarely used in professional audio outside of, occasionally, classical music recording. A moving coil drive unit can be constructed as either a direct radiator or a horn. In a direct radiator drive unit, the diaphragm pushes directly against the air. This is not very efficient as the diaphragm and the air have differing acoustic impedance, which creates a barrier for the sound to cross. A horn makes the transition from vibration in the diaphragm to vibration in the open air more gradual, therefore it is more efficient, and for a given input power the horn will be louder. Let's look at these in more detail:
Moving Coil Drive Unit
Perhaps the best place to start is a 200 mm drive unit intended for low and mid frequency reproduction. This isn't the biggest drive unit available, so why are larger drive units ever necessary? The answer is to achieve a higher sound level. A 200 mm drive unit only pushes against so much air. Increase the diameter to 300 mm or 375 mm and many more air molecules feel the impact. The next question would be, why are 300
mm or 375 mm drive units not used more often, when space is available? The answer to that is in the behavior of the diaphragm: The diaphragm must not bend in operation otherwise it will produce distortion. It is sometimes said that the diaphragm should operate as a ‘rigid piston’. The diaphragm could be flat and still produce sound. However, since the motor is at the center and vibrations are transmitted to the edges, the diaphragm needs to be stiff. The cone shape is the best compromise between stiffness and large diameter. High frequencies will tend to bend the diaphragm more than low frequencies. It takes a certain time for movement of the coil to propagate to the edge of the diaphragm. Fairly obviously, at high frequencies there isn't so much time and at some frequency the diaphragm will start to deviate from the ideal rigid piston. 200 mm is a good compromise. It will produce enough level at low frequency for the average living room, and it will produce reasonably distortion-free sound up to around 4 kHz or so. When the diaphragm bends, it is called break up, due to the vibration ‘breaking up’ into a number of different modes. ‘Break up’, in this context, doesn't mean severe distortion or anything like that. In fact most low frequency drive units are operated well into the break up region. It is up to the designer to ensure that the distortion created doesn't sound too unpleasant. By the way, it is often thought that a larger drive unit will operate down to lower frequencies. This isn't quite the right way to look at it. Any size of drive unit will operate down to as low a frequency as you like, but you need a big drive unit to shift large quantities of air at low frequency. At high frequency, the drive unit vibrates backwards and forwards rapidly, moving air on each vibration. At low frequencies there are fewer opportunities to move air, therefore the area of the drive unit needs to be greater to achieve the desired level. 
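The link between diaphragm area, frequency and level can be put into rough numbers. The sketch below is not a design method from this text. It assumes an idealised rigid piston in a very large baffle, working at frequencies where the wavelength is much larger than the piston, and it treats the nominal drive unit diameter as the piston diameter, which flatters a real cone. The air density and the 90 dB at 1 m target are likewise assumed values.

```python
import math

AIR_DENSITY = 1.2           # kg/m^3, assumed
REFERENCE_PRESSURE = 20e-6  # Pa rms, the usual 0 dB SPL reference


def peak_excursion_m(diameter_m, frequency_hz, spl_db, distance_m=1.0):
    """Peak cone travel needed to reach spl_db on axis at distance_m.

    Idealised baffled-piston model (an assumption, not from the text):
    pressure amplitude = rho * (2*pi*f)^2 * area * excursion / (2*pi*distance).
    """
    area = math.pi * (diameter_m / 2.0) ** 2
    pressure_peak = math.sqrt(2.0) * REFERENCE_PRESSURE * 10 ** (spl_db / 20.0)
    omega = 2.0 * math.pi * frequency_hz
    return (2.0 * math.pi * distance_m * pressure_peak
            / (AIR_DENSITY * omega ** 2 * area))


if __name__ == "__main__":
    for diameter in (0.200, 0.300, 0.375):
        for frequency in (50.0, 100.0):
            travel_mm = 1000.0 * peak_excursion_m(diameter, frequency, 90.0)
            print(f"{diameter * 1000:4.0f} mm unit at {frequency:5.0f} Hz: "
                  f"{travel_mm:6.3f} mm peak travel")
```

Under these assumptions, halving the frequency quadruples the travel needed for the same level, and a 375 mm unit needs about three and a half times less travel than a 200 mm unit. That is the sense in which a big drive unit is needed to shift large quantities of air at low frequency.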
The material of the diaphragm has a significant effect on its stiffness. Early moving coil drive units used paper pulp diaphragms, which were not particularly stiff. Modern drive units use plastic diaphragms, or pulp diaphragms that have been doped to stiffen them adequately. Of course, the ultimate in stiffness would be a metal diaphragm. Unfortunately, it would be heavy and the drive unit would be less efficient. Carbon fiber
diaphragms have also been used with some success. (It is worth noting that in drive units used for electric guitars, the diaphragm is designed to bend and distort. It is part of the sound of the instrument and a distortion-free sound would not meet a guitarist's requirements). Moving up the frequency range: as we have said, the diaphragm will bend and produce distortion. Even if it didn't, there would still be the problem that a large sound source will tend to focus sound over a narrow area, becoming narrower as the frequency increases. In fact, this is the characteristic of direct radiator loudspeakers: that their angle of coverage decreases as the frequency gets higher. This is significant in PA, where a single loudspeaker has to cover a large number of people. (It is perhaps counter-intuitive that a large sound source will focus the sound, but it is certainly so. A good acoustics text will supply the explanation). Because of these two factors, higher frequencies are handled by a smaller drive unit. A smaller diaphragm is more rigid at higher frequencies, and because it is smaller it spreads sound more widely. Often the diaphragm is dome shaped rather than conical. This is part of the designer's art and isn't of direct relevance to the sound engineer, as long as it sounds good. It might be stating the obvious at this stage, but a low frequency drive unit is commonly known as a woofer, and a high frequency drive unit as a tweeter. In loudspeakers where a low frequency drive unit greater than 200 mm is used, it will not be possible to use the woofer up to a sufficiently high frequency to hand over directly to the tweeter. Therefore a mid frequency drive unit has to be used (sometimes known as a squawker!). The function of dividing the frequency band among the various drive units is handled by a crossover, more on which later. Damage There are two ways in which a moving coil drive unit may be damaged. One is to drive it at too high a level for too long. 
The coil will get hotter and hotter and eventually will melt at one point, breaking the circuit (‘thermal damage’). The drive unit will entirely cease to function. The other is to ‘shock’ the drive unit with a loud impulse. This can happen if a microphone is dropped, or placed too close to a theatrical pyrotechnic effect. The impulse won't contain enough energy to melt the coil, but it
may break apart the turns of the coil, or shift it from its central position with respect to the magnet (‘mechanical damage’). The drive unit will still function, but the coil will scrape against the magnet producing a very harsh distorted sound. Many drive units can be repaired, but of course damage is best avoided in the first place. The trick is to listen to the loudspeaker. It will tell you when it is under stress if you listen carefully enough. One common question regarding damage to loudspeakers is this: What should the power of the amplifier be in relation to the rated power of the loudspeaker? In fact, although the power of an amplifier can be measured very accurately, the capacity of a loudspeaker to soak up this power is only an intelligent guess, at best. During the design process, the manufacturer will test drive units to destruction and arrive at a balance between a high rating (in watts) that will impress potential buyers, and a low number of complaints from people who have pushed their purchases too hard. The rating on the cabinet is therefore only a guide. To get the best performance from a loudspeaker, the amplifier should be rated higher in terms of watts. It wouldn't be unreasonable to connect a 200 W amplifier to a 100 W speaker, and it won't blow the drive units unless you push the level too high. It is up to the sound engineer to control the level. Suppose, on the other hand, that a 100 W amplifier was connected to a 200 W loudspeaker (two-way, with woofer and tweeter). The sound engineer might push the level so high that the amplifier started to clip. Clipping produces high levels of high frequency distortion. In a 200 W loudspeaker, the tweeter could be rated at as little as 20-30 W, as under normal circumstances that is all it would be expected to handle. But under clipping conditions the level supplied to the tweeter could be massively higher, and it will blow. 
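To get a feel for how much extra high frequency energy clipping creates, the following sketch overdrives a sine wave into a hard limit and measures how much of the resulting power sits in harmonics above the fundamental. It is only an illustration: real amplifiers do not clip as abruptly as this, and the gain values and function name are hypothetical.

```python
import math


def harmonic_power_fraction(gain, samples=2048):
    """Fraction of a hard-clipped sine's power that is not at the fundamental.

    One cycle of a sine with amplitude `gain` is clipped at +/-1, a crude
    stand-in for an amplifier driven past its limit (an assumption for
    illustration only).
    """
    wave = [max(-1.0, min(1.0, gain * math.sin(2.0 * math.pi * n / samples)))
            for n in range(samples)]
    total_power = sum(v * v for v in wave) / samples

    # Correlate with sine and cosine at the fundamental to find its amplitude.
    s = sum(v * math.sin(2.0 * math.pi * n / samples) for n, v in enumerate(wave))
    c = sum(v * math.cos(2.0 * math.pi * n / samples) for n, v in enumerate(wave))
    fundamental_amplitude = 2.0 * math.hypot(s, c) / samples
    fundamental_power = fundamental_amplitude ** 2 / 2.0

    return (total_power - fundamental_power) / total_power


if __name__ == "__main__":
    for gain in (1.0, 1.5, 2.0, 4.0):
        share = 100.0 * harmonic_power_fraction(gain)
        print(f"driven to {gain:3.1f}x the clip level: {share:5.1f}% of power in harmonics")
```

An unclipped sine has essentially nothing above the fundamental. Driven to four times the clip level, roughly a tenth of the output power is in harmonics, and for a mid frequency signal much of that lands in the tweeter's range rather than the woofer's.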
Impedance Drive units and complete loudspeaker systems are also rated in terms of their impedance. This is the load presented to the amplifier, where a low impedance means the amplifier will have to deliver more current, and hence ‘work harder’. A common nominal impedance is 8 ohms. ‘Nominal’ means that this is averaged over the frequency range of the drive unit or loudspeaker, and you will find that the actual impedance departs significantly from nominal according to frequency. Normally this isn't particularly significant, except in two situations:
• At some frequency the impedance drops well below the nominal impedance. The power amplifier will be called upon to deliver perhaps more power than it is capable of, causing clipping, or perhaps the amplifier might even go into protection mode to avoid damage to itself.
• The output impedance of a power amplifier is very low – just a small fraction of an ohm. You could think of the output impedance of the amplifier in series with the impedance of the loudspeaker as a potential divider. Work out the potential divider equation with R1 equal to zero and you will see that the output voltage is equal to the input voltage. However, give R1 some significant impedance, as would happen with a long run of loudspeaker cable, and you will see a voltage loss. Make R2 the loudspeaker impedance, variable with frequency, and you will now see a rather less than flat frequency response.
To be honest, the above points are not always at the forefront of the working sound engineer's mind, but they are significant and worth knowing about.
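The potential divider described above can be worked through in a few lines. In this sketch R1 stands for the amplifier output impedance plus the cable resistance and R2 for the loudspeaker impedance. Both are treated as plain resistances, which is a simplification, and the impedance figures for the nominal 8 ohm loudspeaker are invented purely for illustration.

```python
import math


def level_change_db(series_ohms, load_ohms):
    """Level at the loudspeaker terminals relative to the amplifier output.

    Potential divider: V_out / V_in = R2 / (R1 + R2), expressed in dB.
    """
    return 20.0 * math.log10(load_ohms / (series_ohms + load_ohms))


if __name__ == "__main__":
    # Hypothetical impedance of a nominal 8 ohm loudspeaker at a few frequencies.
    impedance_by_frequency = {50: 30.0, 200: 6.0, 1000: 8.0, 10000: 12.0}
    # R1 values in ohms: ideal, a short run, and a long thin cable (assumed figures).
    for series in (0.0, 0.05, 1.0):
        print(f"R1 = {series} ohm")
        for frequency, load in impedance_by_frequency.items():
            change = level_change_db(series, load)
            print(f"  {frequency:>6} Hz  {load:5.1f} ohm  {change:6.2f} dB")
```

With R1 at zero every frequency comes out at 0 dB. As R1 grows, the frequencies where the loudspeaker impedance dips lose more level than those where it peaks, which is the uneven frequency response the text warns about.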
Check Questions
• What is the difference between the terms 'loudspeaker' and 'drive unit'?
• How does a moving coil drive unit work?
• Comment on the two qualities of an electrostatic loudspeaker.
• What is a direct radiator drive unit?
• What is the function of a horn?
• Why are drive units larger than 200 mm sometimes used?
• What is meant by the phrase 'rigid piston'?
• Why is the diaphragm of a moving coil loudspeaker normally cone shaped?
• Why does the diaphragm bend more at higher frequencies?
• What is 'break up'?
• Does breakup occur in a woofer in normal operation?
• Why should a guitar drive unit distort intentionally?
• Comment on the 'beaming' effect of a large drive unit.
• When is a separate midrange drive unit necessary?
• Comment on the two damage modes of moving coil drive units.
• If a loudspeaker is rated at 100 W, what should be the power of the amplifier, according to the text?
Chapter 4: Loudspeaker Systems
Cabinet (Enclosure)
The moving coil drive unit is as open to the air at the rear as it is to the front, hence it emits sound forwards and backwards. The backward-radiated sound causes a problem. Sound diffracts readily, particularly at low frequencies, and much of the energy will 'bend' around to the front. Since the movement of the diaphragm to the rear is in the opposite direction to the movement to the front, this leaked sound is inverted (or we can say 180 degrees out of phase) and the combination of the two will tend to cancel each other out. This occurs at frequencies where the wavelength is larger than the diameter of the drive unit. For a 200 mm drive unit the frequency at which cancellation would start to become significant is 1700 Hz, the cancellation getting worse at lower frequencies. The simple solution to this is to mount the drive unit on a baffle. A baffle is simply a flat sheet of wood with a hole cut out for the drive unit. Amazingly, it works. But to work well down to sufficiently low frequencies it has to be extremely large. The wavelength at 50 Hz, for example, is almost 7 meters. The baffle can be folded around the drive unit to create an open back cabinet, which you will still find in use for electric guitar loudspeakers. The drawback is that the partially enclosed space creates a resonance that colors the sound. The logical extension of the baffle and open back cabinet is to enclose the rear of the drive unit completely, creating an infinite baffle. It would now seem that the rear radiation is completely controlled. However, there are problems: The diaphragm now has to push against the air 'spring' that is trapped inside the cabinet. This presents significant opposition to the motion of the diaphragm. Sound will leak through the cabinet walls anyway. The cabinet will itself vibrate and is highly unlikely to operate anything like a rigid piston or have a flat frequency response. 
(Of course, this happens with the open back cabinet too).
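The two figures quoted for the bare drive unit and the baffle, 1700 Hz and almost 7 meters, both come from the relationship between frequency and wavelength. A minimal sketch, assuming a round 340 m/s for the speed of sound (the value that reproduces the 1700 Hz figure) and using illustrative function names:

```python
SPEED_OF_SOUND = 340.0  # m/s, assumed round figure consistent with the text's numbers


def wavelength_m(frequency_hz):
    """Wavelength in air for a given frequency."""
    return SPEED_OF_SOUND / frequency_hz


def cancellation_onset_hz(driver_diameter_m):
    """Frequency whose wavelength equals the drive unit diameter.

    Below this, front and rear radiation from an unbaffled drive unit
    start to cancel significantly, according to the text.
    """
    return SPEED_OF_SOUND / driver_diameter_m


if __name__ == "__main__":
    print(f"200 mm drive unit: cancellation from about "
          f"{cancellation_onset_hz(0.200):.0f} Hz downwards")
    print(f"Wavelength at 50 Hz: {wavelength_m(50.0):.1f} m")
```

The same two functions show why a flat baffle is impractical for bass: pushing the onset of cancellation down to 50 Hz calls for dimensions comparable to that 6.8 m wavelength.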
At this point it is worth saying that the bare drive unit is often used in theater sound systems where there is a need for extreme clarity in the human vocal range. Low frequencies can be bolstered with conventional cabinet loudspeakers. Despite these problems, careful design of the drive unit to balance the springiness of the trapped air inside the cabinet against the springiness of the suspension can work wonders. The infinite baffle, properly designed, is widely regarded as the most natural sounding type of loudspeaker (electrostatics excepted). The only real problem is that the compromises that have to be made to make this design work result in poor low frequency response. Points of order: 'Springiness' is more properly known as compliance. Another term for 'infinite baffle' is acoustic suspension. You would need a very deep understanding of loudspeakers (starting with the Thiele-Small parameters of drive units) to be able to design a loudspeaker that would work well for studio or PA use. Electric guitar loudspeakers are not so critical. The next step in cabinet design is the bass reflex enclosure. You will occasionally hear of this as a ported or vented cabinet. The bass reflex cabinet borrows the theory of the Helmholtz resonator. A Helmholtz resonator is nothing more than an enclosed volume of air connected to the outside world by a narrow tube, called the port. The port can stick out of the enclosure as in a beer bottle - a perfect example of the principle - or inwards. The small plug of air in the port bounces against the compliance of the larger volume of air inside and resonates readily. Try blowing across the top of the beer bottle (when empty) and you will see. The Helmholtz resonator can be designed via a relatively simple formula to have any resonant frequency you choose. In the case of the bass reflex enclosure, the resonant frequency is set just at the point where an equivalently sized infinite baffle would be losing low end response. 
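The 'relatively simple formula' is not spelled out here, but the textbook expression for an idealised Helmholtz resonator is f = (c / 2π) × √(A / (V × L)), where V is the enclosed volume, A the cross-sectional area of the port and L its effective length. The sketch below applies it to a hypothetical 40 litre box. The dimensions, the 343 m/s speed of sound and the end correction are all assumptions, and a real bass reflex design would also take the drive unit's own parameters into account.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def helmholtz_frequency_hz(volume_m3, port_area_m2, port_length_m,
                           end_correction_m=0.0):
    """Resonant frequency of an idealised Helmholtz resonator.

    f = (c / (2*pi)) * sqrt(A / (V * L_eff)), where L_eff is the physical
    port length plus an end correction for the air moving just outside
    the port openings.
    """
    effective_length = port_length_m + end_correction_m
    return (SPEED_OF_SOUND / (2.0 * math.pi)) * math.sqrt(
        port_area_m2 / (volume_m3 * effective_length))


if __name__ == "__main__":
    # Hypothetical enclosure: 40 litres, port 70 mm in diameter and 150 mm long.
    radius = 0.035
    area = math.pi * radius ** 2
    # End correction is often quoted as roughly 0.6 to 0.85 port radii per
    # open end; 1.7 radii in total is used here as an assumed value.
    tuned = helmholtz_frequency_hz(0.040, area, 0.150, end_correction_m=1.7 * radius)
    print(f"Tuning frequency: {tuned:.1f} Hz")
```

Making the enclosure larger or the port longer lowers the resonant frequency, while widening the port raises it. That is the handle the designer uses to place the resonance where the low end would otherwise fall away.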
Thus, the resonance of the enclosure can assist the drive unit just at the
point where its output is weakening, thus extending the low frequency response usefully. There is of course a cost to this. Whereas an infinite baffle loudspeaker can be designed with a low-Q resonance, meaning essentially that when the input ceases the diaphragm returns straight away to its rest position, in a bass reflex loudspeaker the drive unit will overshoot the rest position and then return. Depending on the quality of the design, it may do this more than once creating an audible resonance. This can result in so-called 'boomy' bass, which is generally undesirable. Additionally, a loudspeaker with boomy bass will tend to translate any low frequency energy into output at the resonant frequency. Thus a carefully tuned and recorded kick drum will come out as a boom at the loudspeaker's resonant frequency. The competent loudspeaker designer is in control of this and a degree of boominess will be balanced against a subjectively 'good' - if not accurate - bass response. There are other cabinet designs, notably the transmission line, but these are not generally within the scope of the professional sound engineer so they will be excluded from this text. Horns We have covered horns to some degree already. There is a whole theory to horns that deserves consideration, but here we will simply list some of the basics: Whereas a direct radiator drive unit may be only 1% efficient (i.e. 100 W of electrical power converts to just 1 W of sound power), a horn drive unit may be up to 5% efficient. The air in the throat of the horn becomes so compressed at high levels that significant distortion is produced. However, some people - including the writer of this text! - can on occasion find the distortion quite pleasant. To make any significant difference to the efficiency of a loudspeaker at low frequencies, the length and area of the horn have to be very large. However, folded horn cabinets can be constructed that make enough of a difference to be worthwhile. 
These are sometimes known as 'bass bins'.
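To put the efficiency figures quoted above into perspective, here is a minimal sketch of the arithmetic. The function names are purely illustrative, and the conversion to decibels uses the standard power ratio formula rather than anything specific to a particular drive unit.

```python
import math

def acoustic_power_watts(electrical_watts, efficiency_percent):
    """Sound power produced for a given electrical input and efficiency."""
    return electrical_watts * efficiency_percent / 100.0

def level_gain_db(efficiency_a_percent, efficiency_b_percent):
    """Level difference in dB between two efficiencies at the same input power."""
    return 10.0 * math.log10(efficiency_b_percent / efficiency_a_percent)

# Figures taken from the text: 1% for a direct radiator, up to 5% for a horn.
direct = acoustic_power_watts(100.0, 1.0)
horn = acoustic_power_watts(100.0, 5.0)
gain = level_gain_db(1.0, 5.0)

print(f"Direct radiator: {direct:.1f} W of sound power")
print(f"Horn:            {horn:.1f} W of sound power")
print(f"Difference:      {gain:.1f} dB")
```

On these figures the horn gives roughly 7 dB more level for the same amplifier power, which is a very worthwhile gain in a PA system.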
The most important application of the horn is in high quality PA systems such as those used for theater musicals. The problem in theater musicals is that the sound has to be intelligible, otherwise the story won't be understood by the audience (many of whom in a London West End theater would be European tourists who wouldn't have English as their first language). Also, the whole of the auditorium has to be covered with high quality sound. If direct radiator loudspeakers were used in the theater, then people who were on-axis would receive good quality sound. Those members of the audience who were further from the 'straight ahead' position would receive lower levels at high frequency and therefore a duller sound. The solution is the constant directivity horn. The shape of the curvature of the horn can be one of any number of mathematical functions, or even just an arbitrary shape. With careful calculation and design it is possible to produce a constant directivity horn which has an even frequency response over an angle of up to 60 degrees. This means that one loudspeaker can cover a sizable section of the audience, all with pretty much the same quality of sound. This leads to the concept of the center cluster loudspeaker system that is widely used wherever intelligibility is a prime requirement in a PA system. A number of constant directivity horn loudspeakers are arrayed so that where the coverage of one is just starting to fall off, the adjacent loudspeaker takes over. Next time you are in a theater, or large place of worship, with a quality sound system, take a look at the loudspeakers. Apart from any loudspeakers that are dedicated to bass, where directionality isn't significant, there should be one cabinet pointing almost directly at you, plus or minus 30 degrees or so, and there should be no other loudspeaker pointing at you from any other location in the building, other than for special theatrical effects.
There will be more on this when we cover PA systems specifically.
Crossover
The function of the crossover is to separate low, mid and high frequencies according to the number of drive units in the loudspeaker. A crossover can be passive or active. A passive crossover is generally internal to the cabinet and consists of a network of capacitors, inductors and resistors. Having no active components, it doesn't need to be powered. An active crossover on the other hand does contain transistors or ICs and
requires mains power. It sits between the output of the mixing console and a number of power amplifiers - one for each division of the frequency band. A system with a three-band active crossover would require three power amplifiers. Crossovers have two principal parameter sets: the cutoff frequencies of the bands, and the slopes of the filters. It is impractical, and actually undesirable, to have a filter that allows frequencies up to, say, 4 kHz to pass and then cuts off everything above that completely. So frequencies beyond the cutoff frequency (where the response has dropped by 3 dB from normal) are rolled off at a rate of 6, 12, 18 or 24 dB per octave. In other words, in the band of frequencies where the slope has kicked in, as the frequency doubles the response drops by that number of decibels. The slopes mentioned are actually the easy ones to design. A filter with a slope of, say, 9 dB per octave would be much more complex. As it happens, a slope of 6 dB per octave is useless. High frequencies would be sent to the woofer at sufficient level that there would be audible distortion due to break up. Low frequencies would be sent to the tweeter at a level that could damage it. 12 dB/octave is workable, but most systems these days use 18 dB/octave or 24 dB/octave. There are issues with the phase response of crossover filters that vary according to slope, but this is an advanced topic that few working sound engineers would contemplate to any great extent.
Passive crossovers have a number of advantages:
• Inexpensive
• Convenient
• Usually matched by the loudspeaker manufacturer to the requirements of the drive units
And the disadvantages:
• Not practical to produce a 24 dB/octave slope
• Can waste power
• Not always accurate & component values can change over time
Likewise, active crossovers have advantages:
• Accurate
• Cutoff frequency and slope can be varied
• Power amplifier connects directly to drive unit - no wastage of power & better control over diaphragm motion
• Limiters can be built into each band to help avoid blowing drive units
And the disadvantages:
• Expensive
• It is possible to connect the crossover incorrectly and send LF to the HF driver and vice versa.
• A third-party unit would not compensate for any deficiencies in the drive units.
Some loudspeaker systems come as a package with a dedicated loudspeaker control unit. The control unit consists of three components:
• Crossover
• Equalizer to correct the response of each drive unit
• Sensing of voltage (and sometimes current) to ensure that each drive unit is maximally protected
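As a rough illustration of the slope figures quoted for crossover filters, the following sketch estimates how far a low-pass section has rolled off at a given frequency. It assumes a Butterworth-style response (an assumption made for this example, not something the text specifies), where each 6 dB/octave of slope corresponds to one filter order and the response is 3 dB down at the cutoff frequency. The function name is hypothetical.

```python
import math

def lowpass_attenuation_db(freq_hz, cutoff_hz, slope_db_per_octave):
    """Attenuation of an assumed Butterworth low-pass filter, in dB."""
    if slope_db_per_octave % 6 != 0 or slope_db_per_octave <= 0:
        raise ValueError("slope must be a positive multiple of 6 dB/octave")
    order = slope_db_per_octave // 6      # 6 dB/octave per filter order
    ratio = freq_hz / cutoff_hz
    return 10.0 * math.log10(1.0 + ratio ** (2 * order))

# How much 8 kHz (one octave above a 4 kHz cutoff) still reaches the woofer.
for slope in (6, 12, 18, 24):
    loss = lowpass_attenuation_db(8000.0, 4000.0, slope)
    print(f"{slope:2d} dB/octave: 8 kHz is down by {loss:.1f} dB")
```

At the cutoff frequency itself every slope gives the same 3 dB drop; the differences only open up as you move further into the stop band, which is why the gentle 6 dB/octave slope lets so much unwanted energy through to the wrong drive unit.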
Use of Loudspeakers As mentioned earlier, there are four main usage areas of loudspeakers: domestic, hi-fi, studio and PA. We will skip non-critical domestic usage and move directly on to hi-fi. The hi-fi market is significant in that this is where we will find the very best sounding loudspeakers. The living room environment is generally fairly small, and listening levels are generally well below what we call 'rock and roll'. This means that the loudspeaker can be optimized for sound quality, and the best examples can be very satisfying to listen to with few objectionable features, although it still has to be said that moving coil loudspeakers always sound like loudspeakers and never exactly like the original sound source. Recording studio main monitors have to be capable of higher sound levels. For one thing, the producer, engineer and musicians might just like to monitor at high level, although for the sake of their hearing they should not do this too often. Another consideration is that the acoustically treated control room will absorb a lot of the loudspeaker's energy, so that any given loudspeaker would seem quieter than it would in a typical living room. It is generally true that a loudspeaker that is optimized for high levels won't be as accurate as one that has been optimized for sound quality. PA speakers are the ultimate example of this. There has been a trend over the last couple of decades for PA speakers to be smaller and hence more cost effective to set up. This has resulted in an intense design effort to make smaller loudspeakers louder. Obviously the quality suffers. If you put an expensive PA loudspeaker next to a decent hi-fi loudspeaker in a head-to-head comparison at a moderate listening level, the hi-fi loudspeaker will win easily. The most fascinating use of loudspeakers is the near field monitor. Near field monitors are now almost universally used in the recording studio for general monitoring purposes and for mixing. 
This would seem odd because twenty-five years ago anyone in the recording industry would have said that studio monitors have to be as good as possible so that the engineer can hear the mix better than anyone else ever will. That way, all the detail in the sound can be assessed properly and any faults or deficiencies picked up. Mixes were also assessed on tiny Auratone loudspeakers just to make sure they would sound good on cheap domestic systems, radios or portables.
That was until the arrival of the Yamaha NS10 - a small domestic loudspeaker with a dreadful sound. It must have found its way into the studio as a cheap domestic reference. A slightly upmarket Auratone if you like. However, someone must have used it as a primary reference for a mix, and found that by some magical and indefinable means, the NS10 made it easier to get a great mix - and not only that but a mix that would 'travel well' and sound good on any system. The NS10 and later NS10M are now no longer in production, but every manufacturer has a nearfield monitor in their range. Some actually now sound very good, although their bass response is lacking due to their small size. The success of nearfield monitoring is something of a mystery. It shouldn't work, but the fact is that it does. And since so little is quantifiable, the best recommendation for a nearfield monitor is that it has been used by many engineers to mix lots of big-selling records. That would be the Yamaha NS10 then!
Check Questions
• What problem is caused by sound coming from the rear of the drive unit?
• What is a baffle?
• How large does a baffle have to be to work well at low frequencies?
• What is an 'open back' cabinet?
• What is an 'infinite baffle' cabinet?
• What problem in an infinite baffle cabinet is caused by the trapped air inside?
• What is 'compliance'?
• What is a 'bass reflex' enclosure?
• What is the advantage of a bass reflex loudspeaker compared to an infinite baffle?
• What is the disadvantage of a bass reflex loudspeaker compared to an infinite baffle?
• Briefly describe a horn drive unit in comparison with a direct radiator drive unit.
• What is the advantage of the horn regarding efficiency?
• What is the (greater) advantage of the constant directivity horn?
• What is a 'center cluster'?
• What is meant by the 'slope' of a crossover?
• Contrast some of the principal features of active and passive crossovers.
• Comment on the use of nearfield monitors.
Chapter 5: Analog Recording
Contrary to what you might read in home recording magazines, analog recording is not dead. Top professional studios still have analog recorders because they have a sound quality that digital just can't match. This isn't really to say that they sound better; in fact their faults are easily quantifiable, but their sound is often said to be 'warm', and it is often true to say that it is easier to mix a recording made on analog than it is to mix a digital multitrack recording. The other useful feature of analog recorders is that they are universal. You can take a tape anywhere and find a machine to play it on. As digital formats become increasingly diverse, individual studios become more and more isolated with audio being subject to an often complex export process to transfer it from one studio's system to another. With tape, you just mount the reel on the recorder and press play.
History
Magnetic tape recording was invented in the early years of the Twentieth Century and became useful as a device for recording speech, but simply for the information content, as in a dictation machine - the sound quality was too poor. In essence, a tape recorder converts an electrical signal to a magnetic record of that signal. Electricity is an easy medium to work in, compared to magnetism. It is straightforward to build an electrical device that responds linearly to an input. As we saw earlier, 'linear' means without distortion - like a flat mirror (linear) compared to a funfair mirror (non-linear). Magnetic material does not respond linearly to a magnetizing force. When a small magnetizing force is applied, the material hardly responds at all. When a greater magnetizing force is applied and the initial lack of enthusiasm to become magnetized has been overcome, then it does respond fairly linearly, right up to the point where it is magnetized as much as it can be, when we say that it is 'saturated'.
Unfortunately, no-one has devised a way of applying negative feedback to analog recording, which in an electrical amplifier reduces distortion tremendously. Early tape recorders (and wire recorders) had no means of compensating for the inherent non-linearity of magnetic material, and it was left up to scientists in Germany during World War II to come up with a solution. The tape recorder was apparently used to broadcast orchestral concerts at
all hours of day and night, to the consternation of opposing countries who wondered how Germany could spare the resources to have orchestras playing in the middle of the night. (Obviously, recording onto disc was possible, but the characteristic crackle always gave the game away). After hostilities had ceased, US forces brought some captured machines back home and development continued from that point. There is a lot of history to the analog recorder, which we don't need here, but it is certainly interesting as the development of the tape recorder coincides with the development of recording as we know it now.
The Sound of Analog
There are three characteristic ingredients of the analog sound:
• Distortion
• Noise
• Modulation noise
Distortion
The invention that transformed the analog tape recorder from a dictation machine to a music recording device, during the 1940s, was AC bias. Since the response of tape to a small magnetizing force is very small, and the linear region of the response only starts at higher magnetic force levels, a constant supporting magnetic force, or bias, is used to overcome this initial resistance. Prior to AC bias, DC bias was used courtesy of a simple permanent magnet. However, considerable distortion remained. AC bias uses a high frequency (~100 kHz) sine wave signal mixed in with the audio signal to 'help' the audio signal get into the linear region which is relatively distortion-free. This happens inside the recorder and no intervention is required on the part of the user. However the level of the bias signal has to be set correctly for optimum results. In traditional recording, this is the job of the recording engineer before the session starts. It has to be said that line up is an exacting procedure and many modern recording engineers have so much else to think about (their digital transfers!) that line-up is better left to specialists. Despite AC bias, analog recording produces a significant amount of distortion. The higher the level you attempt to record on the tape, the more the distortion. It isn't like an amplifier or digital recorder where the signal is clean right up to 0 dBFS, then harsh clipping takes place. The
distortion increases gradually from barely perceptible to downright unpleasant. Most analog recordings peak at a level that will produce around 1% distortion, which is very high compared to any other type of equipment. At 3%, most engineers will be thinking about backing off. More is unacceptable. It may not sound promising to use a medium that produces so much distortion, but the fact is that it actually sounds quite pleasant! It is also different in character from vacuum tube (valve) distortion so it is an additional tool in the recording engineer's toolkit.
Noise
As well as producing more distortion than any other type of audio equipment, the analog tape recorder produces more noise too - a signal to noise ratio of around 65 dB is about the best you can hope for and represents the state of the art since tape recorders matured around the early 1970s. It is debatable whether noise is a desirable component of analog recording, but it is certainly a feature. Noise isn't really the ogre it is made out to be. If levels are set correctly to maximize the use of the available dynamic range up to the 1% or 3% distortion point, then there is no reason why it should be troublesome in the final mix, although some 'noise management' will be necessary on the part of the mix engineer.
Modulation Noise
There have been digital 'analog simulators', but to my ears, unless this aspect of the character of analog recorders is simulated, they just don't sound the same. Modulation noise is noise that changes as the signal changes, and has two causes. One is Barkhausen noise which is produced by quantization of the magnetic domains (a gross over-simplification of a phenomenon that would take too much understanding for the working sound engineer to bother with). The other - more significant - cause of modulation noise is irregularities in the speed of tape travel.
These irregularities are themselves caused by eccentricity and roughness in the bearings and other rotating parts, and by the tape scraping against the static parts. We sometimes hear the term 'scrape flutter', which creates modulation noise, and the 'flutter damper roller', which is a component used to minimize the problem. If a 1 kHz sine wave tone is recorded onto analog tape, the output will consist of 1 kHz plus two ranges of other frequencies, some strong and
consistent, others weaker and ever-changing due to random variations. These are known in radio as 'sidebands' and the concept has exactly the same meaning here. Modulation noise, subjectively, causes a 'thickening' of the signal which accounts for the fat sound of analog, compared to the more accurate, but thin sound of digital. It has even been known for engineers to artificially increase the amount of modulation noise by unbalancing one of the rollers, thus creating stronger sidebands containing a greater range of frequencies. Don't try it with your hard disk!
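The way a speed irregularity turns into sidebands around a 1 kHz tone can be demonstrated with a small experiment. This is only a sketch under stated assumptions: the flutter is modeled as a single steady 50 Hz wobble in the phase of the tone (real flutter is far messier and partly random), and the sample rate, flutter rate and modulation depth are values chosen for the example rather than measurements from any machine.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed for this example)
TONE_HZ = 1000       # the recorded test tone
FLUTTER_HZ = 50      # assumed rate of the periodic speed wobble
MOD_INDEX = 0.1      # assumed depth of the wobble, in radians of phase

def fluttered_tone(n_samples=SAMPLE_RATE):
    """One second of a 1 kHz tone whose phase wobbles at the flutter rate."""
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        wobble = MOD_INDEX * math.sin(2 * math.pi * FLUTTER_HZ * t)
        samples.append(math.sin(2 * math.pi * TONE_HZ * t + wobble))
    return samples

def amplitude_at(samples, freq_hz):
    """Amplitude of one frequency component, found by direct correlation."""
    acc = 0j
    for n, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
    return 2 * abs(acc) / len(samples)

signal = fluttered_tone()
levels = {f: amplitude_at(signal, f) for f in (900, 950, 1000, 1050, 1100)}
for freq, level in levels.items():
    print(f"{freq:5d} Hz: amplitude {level:.4f}")
```

The tone itself stays close to full amplitude, while new components appear at 950 Hz and 1050 Hz, spaced from the tone by the flutter rate. A real machine has many such wobbles at once, some steady and some random, which is where the two 'ranges' of sideband frequencies come from.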
The Anatomy of the Analog Tape Recorder
The Studer A807 pictured here is typical of a workhorse stereo analog recorder, sold mainly into the broadcast market. Let's run through the major components starting from the ones you can't see:
• Three motors, one each for the supply reel, take-up reel and capstan. The take-up reel motor provides sufficient tension to collect the tape as it comes through. It does not itself pull the tape through. The supply reel motor is energized in the reverse direction to maintain the tension of the tape against the heads.
• The capstan provides the motive force that drives the tape at the correct speed.
• The pinch wheel holds the tape against the capstan.
• The tach (short for tachometer) roller contains a device to measure the speed of the tape in play and fast wind.
• The tension arm smooths out any irregularities in tape flow.
• The flutter damper roller reduces vibrations in the tape, lessening modulation noise.
• The erase head wipes the tape clean of any previous recording.
• The record head writes the magnetic signal to the tape. It can also function as a playback head, usually with reduced high frequency response.
• The playback head plays back the recording.
Magnetic Tape
Magnetic tape comprises a base film, upon which is coated a layer of iron oxide. Oxide of iron is sometimes, in other contexts, known as 'rust'. The oxide is bonded to the base film by a 'binder', which also lubricates the tape as it passes through the recorder. Other magnetic materials have been tried, but none suits analog audio recording better than iron, or more properly 'ferric' oxide. There are two major manufacturers of analog tape (there used to be several): Quantegy (formerly known as Ampex) and Emtec (formerly known as BASF).
Tape is manufactured in a variety of widths. (It is also manufactured in two thicknesses - so-called 'long play' tape can fit a longer duration of recording on the same spool, at the expense of certain compromises.) The widths in common use today are two-inch and half-inch. Oddly enough, metrication doesn't seem to have reached analog tape and we tend to avoid talking about 50 mm and 12.5 mm. Other widths are still available, but they are only used in conjunction with 'legacy' equipment which is being used until it wears out and is scrapped, and for replay or remix of archive material. Quarter-inch tape was in the past very widely used as the standard stereo medium, but there is now little point in using it as it has no advantages over other options that are available. Two-inch tape is used on twenty-four track recorders. A twenty-four track recorder can record - obviously - twenty-four separate tracks across the width of the tape, thus keeping instruments separate until final mixdown to stereo. Half-inch tape is used on stereo recorders for the final master. The speed at which the tape travels is significant. Higher speeds are better for capturing high frequencies as the recorded wavelength is physically longer on the tape. However, at higher speeds there are also irregularities (sometimes known as 'head bumps', or as 'woodles') in the bass end. The most common tape speed in professional use used to be 15 inches per second (38 cm/s), but these days it is more common to use 30 ips (76 cm/s), and not care about the massive cost in tape consumption! At 30 ips, a standard reel of tape costing up to $150 lasts about sixteen minutes.
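The tape speed figures above are easy to check with a little arithmetic. The sketch below works out how long a reel lasts and how long a recorded wavelength is on the tape. The 2500 ft reel length is an assumption made for this example (the text only says 'a standard reel'), and the function names are hypothetical.

```python
INCHES_PER_FOOT = 12
MM_PER_INCH = 25.4
ASSUMED_REEL_LENGTH_FT = 2500   # assumption: not a figure given in the text

def reel_duration_minutes(reel_length_ft, speed_ips):
    """Playing time of a reel at a given tape speed in inches per second."""
    seconds = reel_length_ft * INCHES_PER_FOOT / speed_ips
    return seconds / 60.0

def recorded_wavelength_mm(freq_hz, speed_ips):
    """Physical length on tape of one cycle of the recorded signal."""
    return speed_ips * MM_PER_INCH / freq_hz

for speed in (15, 30):
    minutes = reel_duration_minutes(ASSUMED_REEL_LENGTH_FT, speed)
    wavelength = recorded_wavelength_mm(10000, speed)
    print(f"{speed} ips: reel lasts {minutes:.1f} min, "
          f"10 kHz wavelength is {wavelength:.4f} mm")
```

With the assumed reel length, 30 ips gives a little under seventeen minutes, in line with the 'about sixteen minutes' quoted, and doubling the speed doubles the length of each recorded cycle, which is what makes the high frequencies easier to capture.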
Analog Recorders in Common Use
Otari MTR90 Mk III
There have been many manufacturers of analog tape recorders, but the top three historically have been Ampex, Otari and Studer. In the US, you will commonly find the Ampex MM1200 and occasionally the Ampex ATR124, which is often regarded as the best analog multitrack ever made, but Ampex only made fifty of them. All over the world you will find the Otari MTR90 (illustrated with autolocator) which is considered
to be a good quality workhorse machine, and is still available to buy. The Studer range is also well respected. The Studer A80 represents the coming of age of analog multitrack recording in the 1970s. It has a sound quality which is as good as the best within a very fine margin, but operational facilities are not totally up to modern standards. For example, it will not drop out of record mode without stopping the tape. The Studer A800 is still a prized machine and is fully capable, sonically and operationally, of work to the highest professional standard. The more recent A827 and A820 are also very good, but sadly no longer manufactured.
Multitrack Recording Techniques
How to set about a multitrack recording session is a topic in itself and will be explained later. However, there are certain points of relevance to the equipment itself. The first is the necessity to be able to listen to or monitor previously recorded tracks while performing an overdub. The problem here is that there is a gap between the record head and the playback head. If the singer, for example, sings in time with the output from the playback head, the signal will be recorded on the tape a couple of centimeters away, therefore causing a delay. To get around this problem, while overdubbing, the record head is used as a playback head. In this situation we talk about taking a 'sync output' from the record head. The sync output isn't of such good sound quality since the record head is optimized for recording; nevertheless it is certainly good enough for monitoring. The playback head is used for final mixdown. Also, it is commonplace to 'bounce' several tracks, perhaps vocal harmonies, to one or two tracks (two tracks for stereo), thus freeing up tracks for further use. This has to be done using the sync output of the record head, otherwise the bounce won't be in time with the other tracks. The slight loss of quality has to be tolerated. Another technique worth mentioning at this stage is editing.
As soon as tape was invented, people were cutting it apart and sticking it back together again. In fact, with the old wire recorders, people used to weld the wire together, although the heat killed the magnetism at the join. The most basic form of tape editing is 'top and tailing'. This means cutting the tape to within 10 mm or so of the start of the audio, and splicing in a section of leader tape, usually white (about two meters). Likewise the
tape is cut ten seconds or so after the end of each track and more leader inserted between tracks. At the end of the tape, red leader is joined on. No blank tape is left on the spool once top and tailing is complete. Editing can also be used to improve a performance by cutting out the bad and splicing in the good. Even two-inch tape can be edited; in fact it is normal to record three or four takes of the backing tracks of a song, and splice together the best sections. The tape is placed in a special precision-machined aluminum editing block, and cut with a single-sided razor blade, guided by an angled slot. Splicing tape is available with exactly the right degree of stickiness to join the tape back together. When the edit is done in the right place (usually just before a loud sound), it will be inaudible. It takes courage to cut through a twenty-four track two-inch tape though. Compared to modern disk recorders, the main limitation of tape-based multitrack - analog and digital - is that once they are recorded, all the tracks have a fixed relationship in time. In a disk recorder, it is easy to move one track backwards or forwards in time, or copy it to a new location in the song. The equivalent technique in tape-based multitrack recording is the 'spin in'. In the original sense of the term, a good version of the chorus, or whatever audio was required to be repeated, would be copied onto another tape recorder. The multitrack would be wound to where the audio was to be copied. The two machines would be backed up a little way, then both set into play. At the right moment, the multitrack would be punched into record. Of course, the two machines had to be in sync, and this was the difficult part. If the two machines were identical mechanically, then a wax pencil mark could be made on corresponding rotating tape guides and the tapes backed up by the same number of revolutions. It sounds hit and miss, but it could be made to work amazingly quickly.
When the digital sampler became available, it was used in place of the second recorder.
Maintenance
There are two differences between the maintenance of an analog recorder and a digital recorder. Firstly, you can do a lot of first-line maintenance on an analog machine. You can't do more than run a cleaning tape on a digital recorder. Secondly, you have to do the maintenance, otherwise performance will suffer. These are the elements of maintenance:
Cleaning: the heads and all metallic parts that the tape contacts are cleaned gently with a cotton bud dipped in isopropyl alcohol. Isopropyl alcohol is only one of a number of alcohol variants, and it has good cleaning properties. It is not the same as drinking alcohol, so don't be tempted. Also, drinking alcohol - ethanol - attracts additional taxes in some countries, therefore it would not be cost-effective to use it. The pinch wheel is made of a rubbery plastic. In theory it shouldn't be cleaned with isopropyl alcohol, but it often is. You can buy special rubber cleaner from pro audio dealers but in fact you can use a mild abrasive household liquid cleaner. Just one tiny drop is enough.
Demagnetizing the heads: After a while, the metal parts will collect a residual magnetism that will partially erase any tape that is played on the machine. A special demagnetizer is used for which proper training is necessary, otherwise the condition can be made even worse.
Line-up: Line-up, or alignment, has two functions - one is to get the best out of the machine and the tape; the other is to make sure that a tape played on one recorder will play properly on any other recorder. The following parameters are aligned to specified or optimum values:
• Azimuth - the heads need to be absolutely vertical with respect to the tape otherwise there will be cancellation at HF. The other adjustments of the head - zenith, wrap and height - are not so critical and therefore do not need to be checked so often.
• Bias level - optimizes distortion, maximum output level and noise.
• Playback level - the 1 kHz tone on a special calibration tape is played and the output aligned to the studio's electrical standard level.
• High frequency playback EQ - the 10 kHz tone on the calibration tape is played and the HF EQ adjusted.
• Record level - a 1 kHz tone at the studio's standard electrical level is recorded onto a blank tape and the record level adjusted for unity gain.
• HF record EQ - adjusted for flat HF response.
• LF record EQ - adjusted for flat LF response.
The line-up procedure used to be considered part of the engineer's day-to-day routine, but is now often left to a specialist technician. To conclude, this is certainly far from a complete treatise on analog tape recording, but it is enough for a starting point considering that analog recorders are now quite rare. Even so, analog recording has a long history and will almost certainly have a long future ahead. In fact the machines are simple and infinitely maintainable - a fifteen year old Studer A800 will still be working for its living in fifteen years' time. You can't say that for digital recorders. Also, the sound of analog is very much the sound of recording, as we understand it. Does it make sense therefore to use digital emulation to achieve a pale shadow of the analog sound, or would it be better to use the real thing?
Check Questions
• Give two reasons why analog recorders are still in use in top professional studios.
• Comment on distortion in analog recording.
• Comment on noise in analog recording.
• Comment on modulation noise in analog recording.
• What is the function of AC bias?
• What is the distortion level of peaks in an analog recording?
• Why is the concept of clipping not relevant in analog recording?
• Why is the supply reel motor driven in the opposite direction to the actual rotation of the reel?
• What is the capstan?
• What is the pinch wheel?
• What is the tach roller?
• What two tape widths are in common top-level professional use?
• Name three twenty-four track analog tape recorders, make and model.
• What is 'bouncing'?
• Comment on cut and splice tape editing.
• What are the two functions of line-up?
Chapter 6: Digital Audio
Why digital? Why wasn't analog good enough? The answer starts with the analog tape recorder which plainly isn't good enough in respect of signal to noise ratio and distortion performance. Many recording engineers and producers like the sound of analog now, because it is a choice. In the days before digital, analog recording wasn't a choice - it was a necessity. You couldn't get away from the problems. Actually you could. With Dolby A and subsequently SR noise reduction, noise performance was vastly improved, to the point where it wasn't a problem at all. And if you don't have a problem with noise, you can lower the recording level to improve the distortion performance of analog tape. A recording well made with Dolby SR noise reduction can sound very good indeed. Some would say better than 16-bit digital audio, although this is from a subjective, not a scientific, point of view. Analog recording also had the problem that when a tape was copied, the quality would deteriorate significantly. And often there were several generations of copies between original master and final product. Digital audio can be copied identically as many times as necessary (although this doesn't always work as well as you might expect. More on this in another module). In the domestic domain, before CD there was only the vinyl record. Well there was the compact cassette too, but that never sounded good, even with Dolby B noise reduction. (Some people say that they don't like Dolby B noise reduction. The problem is that they are usually comparing an encoded recording with decoding switched on and off. The extra brightness of the Dolby B encoded - but not decoded - sound compensates for dirty and worn heads and the decoded version sounds dull in comparison!). People with long memories will know that they used to yearn for a format that wasn't plagued with the clicks, pops and crackles of vinyl.
The release of the CD format was eagerly anticipated, and of course the CD has become a great success. Done properly, digital audio recorders can greatly outperform analog in both signal to noise ratio and distortion performance. That is why they are used in both the professional and domestic domains. When the question arises of why the other parts of the signal chain have mostly been changed over to digital, any possible improvement in sound quality is hardly relevant. Everything else performs as well as anyone could possibly want. Well almost anyone, the only exceptions being the
microphone and the loudspeaker, but we are still some way off truly digital transducers becoming available. By the time digital recording and reproduction had become properly established, digital audio in general was showing that it could offer advantages over analog in terms of price and facilities offered. Digital effects were first, as it became possible to achieve, for instance, digital reverberation for a tiny fraction of the cost of an electromechanical system. Digital mixing consoles came rather later because they require an enormous amount of processing power. Digital mixing consoles don't sound better than analog. They do however offer more facilities for the price, and have the advantage that settings can easily be stored and recalled. This is an important feature that we shall discuss more when we discuss mixing consoles. Having established the reasons we have digital audio, let's see how it works... Digital Theory Firstly, what do we mean by analog? Analog comes from the word analogy. If I say that electrical voltage is a similar concept to the pressure of water behind a tap (excuse me, faucet), then I am making an analogy. If I convert an acoustic sound to an electrical signal where the rise and fall in sound pressure is imitated by a similar rise and fall in voltage, then the electrical signal is an analog of the original. An analog signal is continuous. It follows the changes of the original without any kind of subdivision. It might not be able to track the changes fast enough for complete accuracy, in which case the high frequency response will be worse than it could be. Its useful dynamic range lies between a maximum value which the analog signal cannot exceed (generally the positive and negative voltage limits of the power supply - the signal can never exceed these and will be clipped if it tries) and random variations at a very low level that we hear as noise.
Digital systems analyze the original in two ways: firstly by 'sampling' the signal a number of times every second. Any changes that happen completely between sampling periods are ignored, but if the sampling periods are close enough together, the ear won't notice. Secondly, by 'quantizing' the signal into a number of discrete - separately identifiable - levels. The smoothly changing analog signal is therefore turned into a stair-step approximation, since digital audio knows no 'in-between' states.
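The two steps can be tried out in a few lines of Python. This is only a sketch under simple assumptions: the input is a full-scale sine wave between -1.0 and +1.0, the levels are evenly spaced, and the function name is made up for illustration. It is not how any particular convertor is built.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, bits, n_samples):
    """Sample a full-scale sine wave and round each sample to the nearest level."""
    levels = 2 ** bits                 # e.g. 16 bits -> 65,536 levels
    step = 2.0 / (levels - 1)          # spacing between levels across -1.0..+1.0
    pairs = []
    for n in range(n_samples):
        t = n / sample_rate_hz         # sampling: one measurement per sampling period
        x = math.sin(2 * math.pi * freq_hz * t)
        q = round((x + 1.0) / step) * step - 1.0   # quantizing: snap to nearest level
        pairs.append((x, q))
    return pairs

# A deliberately crude system: 3 bits (8 levels), 8 samples per cycle of a 1 kHz tone.
for x, q in sample_and_quantize(1000, 8000, 3, 8):
    print(f"analog {x:+.4f} -> quantized {q:+.4f} (error {q - x:+.4f})")
```

The error can never be more than half a step, so every extra bit halves the worst-case error.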
With coarse settings the digital signal is only a crude approximation of the original, but it can be made better by increasing the sampling frequency (sampling rate), and by increasing the number of quantization levels. Let's go deep... To reproduce any given frequency, the sampling frequency has to be at least twice that frequency. So to convert the full range of human hearing to digital, a sampling frequency of at least 40 kHz (twice 20 kHz) is necessary. In practice, a 'safety margin' has to be added, so we get the standard compact disc sampling frequency of 44.1 kHz (exactly this to coincide with the requirements of early digital equipment), and 48 kHz which is used in broadcasting (since in the early days of digital it was easier to convert to the standard satellite sampling frequency of 32 kHz). To reduce the quantization error between the digital signal and the original analog, more quantization levels must be used. Compact disc and DAT both use 65,536 levels. This, in digital terms, is a nice round number corresponding to 16 bits. Without going into binary arithmetic, each bit provides roughly 6 dB of signal to noise ratio. Therefore a digital
audio system with 16-bit resolution has a signal to noise ratio (at least in theory) of 96 dB. The question will arise, what happens if a digital system is presented with a frequency higher than half the sampling frequency? The answer is that a phenomenon known as aliasing will occur. What happens is that these higher frequencies are not properly encoded and are translated into spurious frequencies in the audio band. These are only distantly related to the input frequencies and absolutely unmusical (unlike harmonic distortion, which can be quite pleasant in moderation). The solution is not to allow frequencies higher than half the sampling rate (in fact less, to give a margin of safety) into the system. Therefore an 'anti-aliasing' filter is used just after the input. Filter design is complex, particularly filters with the steep slopes necessary to maximize frequency response, but not be too wasteful on storage or bandwidth by having a sampling rate that is unnecessarily high. The design of the filters is one of the distinguishing points that make different digital systems actually sound different. Once the signal has been filtered, sampled and quantized, it must be coded. It might be possible to record the binary digits directly but that wouldn't offer the best advantage, and indeed might not work. In the compact disc system, the tiny pits in the aluminized audio layer themselves form the spiral that the laser follows from the start of the recording to the end. A binary '1' is coded by a transition from 'land' - the level surface - to a pit or vice-versa. A binary '0' is coded by no transition. But what if the signal was stuck on '0' for a period of time - the spiral would disappear! Hence a system of coding is used that rearranges the binary digits in such a way that they are forced to change every so often, simply to make a workable system. There are other such constraints that we need not go into here. Additionally there is the need for error correction. 
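Two of the numbers above can be checked with a short Python sketch. It assumes an ideal sampler with no anti-aliasing filter and a single pure tone at the input; the function names are invented for illustration. The first function shows where an out-of-band tone lands, the other two compare the '6 dB per bit' rule with the ratio between full scale and one quantization step.

```python
import math

def alias_frequency(input_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling without a filter."""
    folded = input_hz % sample_rate_hz            # the spectrum repeats every sample_rate_hz
    return min(folded, sample_rate_hz - folded)   # and mirrors about half the sampling rate

def rule_of_thumb_snr_db(bits):
    """Roughly 6 dB per bit, as quoted in the text."""
    return 6 * bits

def exact_ratio_db(bits):
    """Ratio of full scale to one quantization step, expressed in dB."""
    return 20 * math.log10(2 ** bits)

print(alias_frequency(10000, 44100))   # in band: reproduced at 10000 Hz
print(alias_frequency(30000, 44100))   # out of band: folds down to 14100 Hz
print(rule_of_thumb_snr_db(16))        # 96
print(round(exact_ratio_db(16), 2))    # 96.33
```

Note that the 14.1 kHz alias has no harmonic relationship to the 30 kHz input, which is why aliasing sounds so unmusical.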
In any storage medium there are physical defects that would damage the data if nothing were done to prevent such damage. So additional data is added to the raw digital signal, firstly to check on replay whether the data is valid or erroneous, secondly to add a backup data stream so that if a section of data is corrupted, it can be reconstituted from other data nearby. Adding error correction involves a compromise between preserving the integrity of the digital signal, and not adding any more extra data than necessary. It is fair to say that the error correction system on CD, and on DAT, is
very good. But as in all things, more modern digital systems are cleverer, and better. All of the above is known as analog to digital encoding, or A to D. The reverse process is known, fairly obviously, as decoding. To spare the details that only electronics experts need to know, the digital signal goes through a D to A convertor and out comes an analog signal. The only problem is that it now contains a strong component at the sampling frequency. Obviously this is above audibility, but it could cause severely audible distortion if allowed into any other equipment that couldn't properly handle it. To obviate this, the output is filtered with what is known as a 'brickwall' filter, because of its steep slope. Once again the design of the filter does affect the sound quality, but digital tricks have now been developed to make the filter's job easier, so design is more straightforward.

Analog to Digital Conversion
Filtering: removing frequencies, in the analog domain, that are higher than half the sampling rate.
Sampling: measuring the signal level once per sampling period.
Quantization: deciding which of the 65,536 levels (in a 16-bit system) is closest to the input signal level, for each sampling period.
Coding: converting the result to a binary number according to a scheme that a) incorporates error detection, b) makes provision for error correction, c) is recordable or transmissible in the chosen medium.

The D to A decoder incorporates three levels of protection against damaged data:
Error correction; an error is detected in the data and completely corrected by using the additional error-correction data specifically put there for the purpose.
Error concealment; an error is detected but it is too severe to be corrected. Missing data is therefore 'interpolated' - just one of the many scientific words for 'guess' - from surrounding data and the result hopefully will be inaudible. However, if you ever get the chance to see a CD player that has correction and concealment indicator lights, you will notice that an awful lot of concealment goes on just to play an average disc. How well concealment is done is one of the factors that make different digital systems sound different.
Muting; in this case the error is so bad that the system shuts down momentarily rather than output what could be an exceedingly loud glitch.

Bandwidth
Bandwidth, in this context, is the rate of flow of data measured in kilobits per second. For data rates, 1 kilobit is normally taken to be 1000 bits, although 1024 bits is sometimes meant. Often, the term byte is used where 1 byte = 8 bits. The abbreviation for bit is 'b' and for byte is 'B', but these are often confused, as are the multiplier prefixes 'k' meaning x1000, and 'K' meaning x1024. The bandwidth of a single channel of 16-bit 44.1 or 48 kHz digital audio is roughly 750 Kbps. Compare this with the bandwidth of a modem (56 Kbps), ISDN2 (128 Kbps) and common ADSL Internet connections (512 Kbps). None of these systems is capable of transmitting even a single channel of digital audio, hence the need for MP3 and similar data-reduction systems.

24/96
The quest for ever better sound quality leads us to want to increase both the sampling rate and the resolution. 24-bit resolution will in theory give a signal to noise ratio of 144 dB. This will never happen in practice, but the real achievable signal to noise ratio is probably as good as anyone could reasonably ask for. Of course, some of the available dynamic range may be used as additional headroom, to play safe while recording, but even so the resulting recording will be remarkably quiet. Also, even though most of us cannot even hear up to 20 kHz, a frequency which is perfectly well catered for these days by a 44.1 or 48 kHz sampling rate, there is always a nagging doubt that this is only just good enough, and it would be worthwhile to have a really high sampling rate to put all doubt at an end. This, of course, affects storage requirements.
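The bandwidth and storage figures in this section are simple multiplication, and can be checked with a short Python sketch. It counts raw audio data only (no error correction or file overhead) and takes a Megabyte as one million bytes; both are assumptions made here for simplicity, and the function names are made up.

```python
def bit_rate_bps(sample_rate_hz, bits, channels=1):
    """Raw audio data rate in bits per second."""
    return sample_rate_hz * bits * channels

def megabytes_per_minute(sample_rate_hz, bits, channels=2):
    """Raw storage needed for one minute, in Megabytes of 1,000,000 bytes."""
    bytes_per_minute = bit_rate_bps(sample_rate_hz, bits, channels) / 8 * 60
    return bytes_per_minute / 1_000_000

# One channel of 16-bit audio, compared with an ADSL line of 512,000 bits per second
for rate in (44100, 48000):
    print(rate, "Hz:", bit_rate_bps(rate, 16), "bits per second")

# Stereo storage per minute
print("16-bit/44.1 kHz:", megabytes_per_minute(44100, 16), "MB per minute")
print("24-bit/96 kHz:  ", megabytes_per_minute(96000, 24), "MB per minute")
```

The raw figures come out at about 10.6 and 34.6 Megabytes per stereo minute, so the higher standard needs a little over three times the data.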
It is a reasonable rule of thumb that CD-quality stereo audio requires about 10 Megabytes per minute of storage. 24-bit, 96 kHz digital audio will therefore, by simple multiplication, require a little over 30 Megabytes per stereo minute. Of course, Megabytes are getting cheaper all the time. There is another problem however - data bandwidth. When recording onto a hard disk system, there is a certain data throughput rate beyond which the system will struggle and possibly fail to record or play back properly. A standard modern hard drive should be easily capable of achieving 24 tracks of playback under normal circumstances (the track count is affected, for one thing, by the 'edit density' - the more short segments you cut the audio into, and the more widely the data is physically separated on the disk, the harder it will be to play back). Try this at roughly three times the data rate and the track count, or the reliability, is bound to suffer. However, disks are getting ever faster and most of the problems of this nature are in the past. Before long it will be possible to get virtually any number of tracks quite easily. It's worth a quick look at Digidesign's comments on hard disk specifications to maximize track count.

Digital Interconnection
Digital interconnection comes in a number of standards, which are summarized here:

AES/EBU
• Also known as AES3 1985 (the year it was implemented)
• Standard for professional digital audio
• Supports up to 24-bit at any sampling rate
• Transmits 2 channels on a single cable
• Uses 110 ohm balanced twisted wire pair cables usually terminated with XLR connectors
• Can use cables of length up to 100 meters
• Electrical signal level 5 volts
• Standard audio cables can be used for short distances but are not recommended as their impedance may not be the standard 110 ohm and reflections may occur at the ends of the cable
• Data transmission at 48 kHz sampling rate is 3.072 Megabit/s (64x the sampling rate)
• Self clocking but master clocking is possible
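The '64x the sampling rate' figure for AES/EBU can be reproduced with a small Python sketch. It rests on one assumption that is not stated in the text: that each of the two channels occupies a 32-bit slot in every sampling period, so that 2 x 32 = 64 bits are sent per sample. The constant and function names are made up for illustration.

```python
CHANNELS = 2          # AES/EBU carries two channels on one cable
SLOT_BITS = 32        # assumption: 32 bits per channel per sampling period

def aes_ebu_bit_rate(sample_rate_hz):
    """Interface data rate in bits per second under the assumption above."""
    return sample_rate_hz * CHANNELS * SLOT_BITS

print(aes_ebu_bit_rate(48000) / 1_000_000, "Megabit/s at 48 kHz")     # 3.072
print(aes_ebu_bit_rate(44100) / 1_000_000, "Megabit/s at 44.1 kHz")   # 2.8224
```

The interface rate is well above the raw audio rate because the slots are wider than the audio word and carry housekeeping bits as well.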
S/PDIF
• Two types:
  • Electrical
    • Uses 75 Ohm unbalanced coaxial cable with RCA phono connectors
    • Cable lengths limited to 6 meters
  • Optical
    • TOSLINK - Uses plastic fiber optic cable and same connectors as Lightpipe (below). TOSLINK is an optical data transmission technology developed by Toshiba. TOSLINK does not specify the protocol to be used
    • ST-type - Glass fiber can be used for longer lengths (1 kilometer)
• Meant for consumer products but may be seen on professional equipment
• Supports up to 24-bit/48 kHz sampling rate
• Self-clocking
• It ought to be necessary to use a format converter when connecting with AES/EBU since the electrical level is different (0.5 V) and the format of the data is different also. However, some AES/EBU inputs can recognise an S/PDIF signal
• Some of the bits within the Channel Status blocks are used for SCMS (Serial Copy Management System), to prevent consumer machines from making digital copies of digital copies

MADI
• An extension of the AES3 format (AES/EBU)
• Supports up to 24-bit/48 kHz sampling rate (higher rates are possible)
• Transmits 56 channels on a 75 Ohm video coaxial cable with BNC connectors
• Length limited to 50 meters. Fiber-optic cable can be used for longer lengths
• Data transmission rate is 100 Megabit/s
• Requires a master clock - a dedicated master synchronization signal must be applied to all transmitters and receivers

ADAT Optical
• Sometimes known as 'Lightpipe'
• Implemented on the Alesis ADAT MDM and digital devices such as mixing consoles, synthesizers and effects units
• Supports up to 24-bit/48 kHz sampling rate
• Transmits 8 channels serially on fiber-optic cable
• Distance limited to 10 meters, or up to 30 meters with glass fiber cable
• Data transmission at 48 kHz is 12 Megabit/s
• Self clocking
• Channels can be reassigned (digital patchbay function)

TDIF (Tascam Digital Interface Format)
• Implemented on Tascam's family of DA-88 recorders and other digital devices such as mixing consoles
• Supports up to 24-bit at multiple sampling rates
• Transmits 8 channels on multicore, unbalanced cables with 25-pin D-sub connectors
• Bidirectional interface: a single cable carries data in both directions
• Cable length limited to 5 meters
• Data transmission at 48 kHz sampling rate is 3 Megabit/s (like AES/EBU)
• Intended for a master clock system, although self-clocking is possible
Check Questions
• To which type of sound engineering equipment was digital audio first applied?
• In relation to the question above, why was this the most pressing need?
• What types of equipment are currently not available in digital form?
• Describe 'sampling rate'.
• What is the minimum sampling rate for a digital system capable of reproduction up to 20 kHz (ignoring any 'safety margin')?
• What is 'aliasing'?
• What two sampling rates are most commonly used in digital audio?
• Describe quantization.
• What is the signal to noise ratio, in theory, of a digital system with 20-bit resolution?
• Why is coding necessary? Give two reasons.
• Why does a digital to analog convertor need a filter?
• What is error correction?
• What is error concealment?
• What happens (or at least should happen) if an error is neither corrected nor concealed?
• How many Megabytes of data, approximately, are occupied by one minute of CD-quality stereo digital audio?
• Why, in a hard disk recording system, is it likely that fewer tracks can be replayed simultaneously at the 24-bit/96 kHz standard, than at the CD-
Chapter 7: Digital Audio Tape Recording The original purpose of DAT (Digital Audio Tape) was to be a replacement for the Compact Cassette (or simply 'cassette', as we now know it). Since DAT was intended to be a consumer product right from the start, the cassette housing is very small, 73 x 54 mm and just 10.5 mm thick. For professional users, this is rather too small, not just because it makes the cassette easier to lose, but because there will always be a feeling that DAT could have been a better system if there had been a bit more space for the data. This would allow for error concealment to be minimized, and tracking tolerances could be such that a tape recorded on one recorder could be absolutely guaranteed to play properly on any other. This is generally the case for professional machines, but not necessarily so for semi-pro 'domestic' recorders.
Sony professional DAT
Having said that DAT’s size is a disadvantage for professional users, it really is amazing how it achieves what it does working at microscopic dimensions. DAT’s full title, R-DAT, indicates that the system uses a rotary head like a video recorder. Unlike analog tape which records the signal along a track parallel to the edge of the tape, a rotary head recorder lays tracks diagonally across the width of the tape. So even though the tape speed is just 8.15 millimeters/second, the actual writing speed is a massive 3.133 meters/second. The width of each track is 13.591 millionths of a meter. Unlike an analog tape, the tracks are recorded without any guard band between them. In fact, the tracks are recorded by heads which are around 50% wider than the final track width and each new track partially overlaps the one before, erasing that section. Since the same heads are used for recording and playback, this may seem to
present a problem because if the head is centred on the track it is meant to be reading, then it will also see part of the preceding track and part of the next track. Won't this result in utter confusion? Of course it doesn't, because a system originally developed for video recording is used, known as azimuth recording. The ‘azimuth’ of a tape head refers to the angle between the head gap, where recording takes place, and the tape track itself. In an analog recorder the azimuth is always adjusted to 90 degrees, so that the head gap is at right angles to the track. In DAT, which uses two heads, one head is set at -20 degrees and the other to +20 degrees, and they lay down tracks alternately. So on playback, each head receives a strong signal from the tracks that it recorded, and the adjacent tracks, which are misaligned by 40 degrees, give such a weak signal that it can be rejected totally. Mechanically, there is a strong similarity between a DAT recorder and a video cassette recorder. Both use a rotary head drum on which are mounted the record/playback heads. But there are differences. A video recorder uses a large head drum with the tape wrapped nearly all the way around. This is necessary so that there can always be a head in contact with the tape during the time that each video frame is built up on the screen. With digital audio, data can be read off the tape at any rate that is convenient and stored up in a buffer before being read out at a constant speed and converted to a conventional audio signal. The head drum in a DAT machine is a mere 30mm in diameter (and spins at 2000 revolutions per minute). The tape is wrapped only a quarter of the way around, which means that at times neither of the two heads is in contact with the tape, but as I said, this can be compensated for. This 90 degree wrap has its advantages: • There is only a short length of tape in contact with the drum so high speed search can be performed with the tape still wrapped. 
• Tape tension is low, giving long head and tape life • If an extra pair of heads is mounted on the drum, simultaneous offtape monitoring can be performed during recording just like a three-head analogue tape recorder. The signal that is recorded on the tape is of course digital, and very dissimilar to either analogue audio or video signals. As you know, the standard DAT format uses 16 bit sampling at a sampling frequency of 48
kHz. This converts the original analog audio signal to a stream of binary numbers representing the changing level of the signal. But since the dimensions of the actual recording on the tape are so small, there is a lot of scope for errors to be made during the record/replay process, and if the wrong digit comes back from the tape it is likely to be very much more audible than a drop-out would be on analog tape. Fortunately DAT, like the Compact Disc, uses a technique called Double Reed-Solomon Encoding which duplicates much of the audio data, in fact 37.5%, in such a way that errors can be detected, then either corrected completely or concealed so that they are not obvious to the ear. If there is a really huge drop-out on the tape, then the DAT machine will simply mute the output rather than replay digital gibberish. As an extra precaution against dropouts, another technique called interleaving is employed which scatters the data so that if one section of data is lost, then there will be enough data beyond the site of the damage which can be used to reconstruct the signal. The pulse code modulated audio data is recorded in the centre section of each diagonal track across the tape. There is other data too: • 'ATF' signals allow for Automatic Track Finding which makes sure that the heads are always precisely positioned over the centre of the track, even if the tape is slightly distorted and the track curved. • Sub Code areas allow extra data to be recorded alongside the audio information. Not all of the capacity of the Sub Code areas is in use as yet, allowing for extra expansion of the DAT system. Those at present in use include: • A-time, which logs the time taken since the beginning of the tape • P-time, which logs the time taken since the last Start ID. • Start ID marks the beginning of each item; • Skip ID tells the machine to go directly to the next Start ID, thus performing an ‘instant edit’. • End ID marks the end of the recording on the tape. 
• There is also provision for SMPTE/EBU timecode
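The benefit of interleaving can be shown with a toy example in Python. This is not the actual DAT scheme, just a simple block interleaver invented for illustration, together with the kind of 'interpolation' concealment described in the chapter on digital audio. A drop-out that wipes three neighbouring values on tape becomes three isolated gaps after de-interleaving, and isolated gaps are easy to conceal.

```python
def interleave(data, rows, cols):
    """Write row by row into a rows x cols grid, read out column by column."""
    if len(data) != rows * cols:
        raise ValueError("data must fill the grid exactly")
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(data, rows, cols):
    """Inverse of interleave(): put every value back in its original place."""
    if len(data) != rows * cols:
        raise ValueError("data must fill the grid exactly")
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = data[i]
            i += 1
    return out

def conceal(data):
    """Replace each isolated missing sample with the mean of its two neighbours."""
    out = list(data)
    for i, value in enumerate(data):
        if value is None and 0 < i < len(data) - 1:
            left, right = data[i - 1], data[i + 1]
            if left is not None and right is not None:
                out[i] = (left + right) / 2
    return out

samples = list(range(12))                     # a simple rising ramp as test audio
on_tape = interleave(samples, rows=3, cols=4)
on_tape[3:6] = [None, None, None]             # a drop-out destroys three adjacent values
off_tape = deinterleave(on_tape, rows=3, cols=4)
print(off_tape)                               # the three gaps are now far apart
print(conceal(off_tape))                      # each gap is filled from its neighbours
```

Without interleaving, the same drop-out would remove three samples in a row and the simple concealment above would have nothing adjacent to work from.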
DASH DASH stands for Digital Audio Stationary Head. The DASH specifications include matters such as the size of the tape, the tape speed and the layout of the tracks on the tape; also the modulation method and error correction strategy, among other things. The format is based on two tape widths: 1/4” (6.3 mm) and 1/2” (12.55 mm). For each tape width there are two track geometries, Normal Density and Double Density, and there are also three tape speeds, nominally Slow, Medium and Fast (a further variation is caused by each of the three speeds being slightly different according to whether 44.1 kHz or 48 kHz sampling is used). According to the above, there must be twelve combinations, all of which conform to the DASH format. This could make life confusing, but just because a particular combination of parameters is possible, it doesn't necessarily mean that a machine will be built to accommodate it.
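The count of twelve is just 2 widths x 2 geometries x 3 speeds. A few lines of Python list the combinations; the labels are taken from the paragraph above and nothing here comes from the DASH specification itself.

```python
from itertools import product

widths = ["1/4 inch", "1/2 inch"]
geometries = ["Normal Density", "Double Density"]
speeds = ["Slow", "Medium", "Fast"]

combinations = list(product(widths, geometries, speeds))
for width, geometry, speed in combinations:
    print(width, "/", geometry, "/", speed)
print(len(combinations), "combinations")   # 12
```

Counting the 44.1 kHz and 48 kHz variants of each speed separately would double the total.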
Sony PCM 3348
The original Sony 3324, and recent 24-track machines, use the normal density geometry on 1/2” tape which allows twenty-four digital audio tracks, two analog cue tracks, a control track and a timecode track. (The cue tracks are there so that audio can be made available in other than normal play speed +/- normal varispeed). The tape speed at 44.1 kHz is 70.01cm/s. The 3324 is totally two-way compatible with the larger 3348 which can record forty-eight digital tracks on the same tape. To give an example, you may start a project on a 3324, of any vintage, and then the producer decides as the tracks fill up that he or she really needs more elbow room for overdubs. So you hire a 3348, put the twenty-four track tape on this and record another twenty-four tracks in the guard bands left by the other machine. Continuing my (hypothetical) example, when it is decided that the project is costing too much and going nowhere, the producer is sacked and another one brought in who decides that the extra twenty-four tracks are unnecessary embellishments and the original tracks, with a little touching up, are all that are required. Off goes the 3348 back to the hire company, the tape - now recorded with forty-eight tracks - is placed back on the 3324 and the original twenty-four tracks are successfully sweetened and mixed with not a murmur from the tracks that are now not wanted. We are now accustomed to new products and systems which offer new features yet are compatible with material produced on earlier versions. This must be audio history's only example of forward as well as reverse compatibility. It shows what thinking ahead can achieve. DASH Operation The first thing you are likely to want to do with your new DASH machine is of course to make a recording with it, but it would be advisable to read the manual before pressing record and play. Some of the differences between digital and analog recording stem from the fact that the heads are not in the same order. 
On an analog recorder we are used to having three heads: erase, record and play. DASH doesn't need an erase head because the tape is always recorded to a set level of magnetism which overwrites any previous recordings without further
intervention. So the first head that the tape should come across should be the record head. Right? Wrong. The first head is a playback head, which on a basic DASH machine is followed by a record head only. If this seems incorrect, you have to remember that while analog processes take place virtually instantaneously, digital operations take a little time. So if you imagine analog overdubbing where the sync playback signal comes from the record head itself, you can see why this won't work in the digital domain. There will be a slight delay while the playback signal is processed, and another delay while the record signal is processed and put onto tape. 105 milliseconds in fact, which corresponds to about 75 mm of tape. To perform synchronous overdubs there has to be a playback head upstream of the record head, otherwise the multitrack recording process as we know it just won’t work. For most purposes two heads are enough, and a third head is available as an option if you need it, and you'll need it if you want to have confidence monitoring. (There are no combined record/playback heads, by the way; all are fixed function.) On any digital recording medium the tape has to be formatted to be used. On DAT the formatting is carried out during recording, but on DASH it is often better to do it in advance. The machine can format while recording - in Advance Record mode - but this is best done in situations where you will be recording the whole of the tape without stopping. If you wish, you can ‘pre format’ a tape but this obviously takes time. You can take comfort from the fact that it can be done in one quarter of real time, and the machine will lay down timecode simultaneously. Since there are different ways to format a tape and make recordings, the 3324S has three different recording modes: Advance, Insert and Assemble. Advance mode is as explained above.
Insert is for when you have recorded or formatted the full duration of the material and you want to go back and re-record some sections. Assemble is when you want to put the tape on, record a bit, play it back, record a bit more etc, as would typically happen in classical sessions. Converter Delay The main text deals with some of the implications of delays caused by the process of recording digital signals onto tape and playing them back again. There is another problem caused by delays in the
A/D conversion itself. The convertors used in the Sony 3324S, for example, while being very high quality, have an inherent delay of about 1.7 milliseconds. Imagine the situation where you are punching into a track on an analog recording to correct a mistake. You will probably set up the monitoring so that you and the performer can hear both the output from the recorder and the signal to be recorded. The performer will play along with his part until the drop in, when the recorder will switch over to monitor the input signal. This will be returned to the console and you will hear the level go up by approximately 3dB because you are now monitoring the same signal via two paths. On the 3324S you can make a cross fade punch in of up to about 370 milliseconds. This is a good feature, but when you have made the punch in - using the monitoring arrangement described above - you will hear the input signal added to the same signal returned from the recorder but delayed by about 1.7ms. This will cause phase cancellation and an odd sound. Fortunately, Sony have included an analog cross fade circuit which will imitate what is happening in the digital domain, but without the delay. Editing DASH was designed to be a cut-and-splice editing format. Briefly, this is possible, but it was found in practice that edits were often unreliable. Editing of DASH tapes is now done by copying between two machines synchronized together with an offset. Two synchronized 24-track machines are obviously more versatile in this respect than one 48-track. Maintenance Although an analog recorder can be, and should be, cleaned by the recording engineer in the normal course of studio activities, a DASH machine should only be cleaned by an expert, or thousands of dollars worth of damage can be caused. The heads can be cleaned with a special chamois-leather cleaning tool, wiping in a horizontal motion only. Cotton buds, as used for analog recorders, will clog a DASH head with their fibers.
Likewise, an analog recorder can be aligned by a knowledgeable engineer, but alignment of a DASH machine is something that is done every six
months or so by a suitably qualified engineer carrying a portable PC and a special test jig in his tool box. The PC runs special service software which can interrogate just about every aspect of the DASH machine checking head hours, error rates, remote ports, sampler card etc etc. With the aid of its human assistant it can even align the heads and tape tension. Current significance The current significance of DASH is as a machine that can record onto a relatively cheap archivable medium, with confidence that tapes will be replayable after many years. Also, when an analog project is recorded on twin 24-track recorders, it is often considered more convenient for editing to copy the tapes to a Sony 3348. The single 3348 is far faster and more responsive than synchronized analog machines, making the mixing process faster and smoother.
MDM The original modular digital multitrack was the Alesis ADAT (below left). On its introduction it was considered a triumph of engineering to an affordable price point. The ADAT (Alesis Digital Audio Tape) was closely followed by the Tascam DTRS (Digital Tape Recording System) format (below right).
There are certain similarities:
• Both formats are capable of 8 tracks.
• Multiple machines can be easily synchronized to give more tracks.
• Recordings are made on commonly available video tapes: ADAT takes S-VHS tapes, DTRS takes Hi-8.
• Tapes need to be formatted before use. Formatting can take place during recording, but this is only appropriate when a continuous recording is to be made for the entire duration of the tape.
• Very maintenance-intensive. For a 24-track system, four machines (4 x 8 = 32) are necessary to account for the one that will always be on the repair bench.
• High resolution versions available (ADAT 20-bit, DTRS 24-bit, 96 kHz, 192 kHz, with reduced track count).

The differences are these:
• Maximum record time: ADAT - 60 minutes, DTRS - 108 minutes
• ADAT popular in budget music recording studios
• DTRS popular in broadcast and film post-production

One further difference is that it is probably fair to say that the ADAT has reached the end of its product life-cycle, although there are undoubtedly still plenty of them around and in use. DTRS however is still useful as a tape-based system offering a standard format and cheap storage.
Check Questions
• Was DAT originally intended as a professional or a domestic recording medium?
• What is the sampling rate of standard DAT?
• What is the resolution of standard DAT?
• What is 'azimuth recording'?
• Describe the head wheel in a DAT recorder.
• What is SCMS?
• What is the distinguishing feature of a DAT machine capable of near-simultaneous off-tape monitoring?
• What is the sub-code area of the DAT tape used for?
• What is 'interleaving'?
• What is the width of the tape used for 24-track DASH?
• What is the width of the tape used for 48-track DASH?
• Describe how 24-track and 48-track DASH machines are compatible.
• How are DASH tapes edited?
• In DASH, why does a playback head come before the record head in the tape path?
• Comment on the cleaning requirements of DASH.
• How many tracks does a modular digital multitrack (MDM) have?
• How can more tracks be obtained?
• Comment on the types of usage of ADAT and DTRS machines.
Appendix 1: Sound System Parameters Level A large part of sound engineering involves adjusting signal level: finding the right level or finding the right blend of levels. The level of a real sound traveling in air can be measured in µN/m² (micronewtons per square meter, or µPa, micropascals, if you prefer), or more practically in dB SPL, with reference to 0 dB SPL or 20 µN/m². The level of a signal in electrical form can be measured in volts, naturally, or it can be measured in dB. The problem is that decibels are always a comparison between two levels. For acoustic sounds, the dB SPL works by comparing a sound level with the reference level 20 µN/m² (the threshold of hearing). Therefore we need a reference level that works for voltage. Going back in history, early telecommunication engineers were interested in the power that they could transmit over a telephone line. They decided upon a standard reference level for power, which was 1 mW (1 milliwatt, or one thousandth of a watt). This was subsequently called 0 dBm. The ‘m’ doesn't stand for anything, it just means that any measurement in dBm is referenced to 1 mW. Today in audio circuitry, we are not too concerned about power except at the final end product – the output of the power amplifier into the loudspeaker. For the rest of the time we can happily measure signal level in voltage. Going back into history, standard telephone lines had a characteristic impedance of 600 ohms. (‘Characteristic impedance’ is a term hardly ever used in audio so explanation here will be omitted). The relationship between power, voltage and impedance is: P = V²/R. Working out the math we find that a power of 1 mW delivered via a 600 ohm line develops a voltage of 0.775 V. This became the standard reference level of electrical voltage, and it is still in use today. There is a slight problem here. Over the years it became customary to refer to a voltage of 0.775 V as 0 dBm. This is not wholly correct.
It is only true when the impedance is 600 ohms, which is not necessarily the case in audio circuitry. Despite this, any reference you find to 0 dBm, in practice, means 0.775 V regardless of what the impedance is.
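To see why 0 dBm only equals 0.775 V on a 600 ohm line, the arithmetic can be worked through in a few lines of Python. This is a minimal sketch rather than anything from the original text: the function names `reference_voltage` and `power_dbm` are made up for illustration, and the 10 kilohm load is simply an assumed example of a high-impedance audio input.

```python
import math

REF_POWER_W = 0.001         # 0 dBm reference power: 1 mW
LINE_IMPEDANCE_OHM = 600.0  # impedance of a standard telephone line

def reference_voltage(power_w=REF_POWER_W, impedance_ohm=LINE_IMPEDANCE_OHM):
    """Voltage that delivers power_w into impedance_ohm, from P = V^2 / R."""
    return math.sqrt(power_w * impedance_ohm)

def power_dbm(volts, impedance_ohm):
    """Power in dBm developed by volts across impedance_ohm."""
    power_w = volts ** 2 / impedance_ohm
    return 10.0 * math.log10(power_w / REF_POWER_W)

v_ref = reference_voltage()
print(f"Reference voltage: {v_ref:.3f} V")  # 0.775 V

# The same 0.775 V is only 1 mW (0 dBm) when the load is 600 ohms.
# Into an assumed 10 kilohm input it is roughly 12 dB less power.
for load_ohm in (600.0, 10_000.0):
    print(load_ohm, "ohms:", power_dbm(v_ref, load_ohm), "dBm")
```

The voltage is identical in both cases; only the power changes with the load, which is exactly the inconsistency described above.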
Technical sound engineers abhor inconsistencies like this, so a new unit was invented: dBu, where 0 dBu is 0.775 V, without any reference to impedance. Once again, the ‘u’ doesn't stand for anything. ‘dBu’ is sometimes written ‘dBv’ (note lower case ‘v’). Confusingly there is also another reference: dBV (note upper case ‘V’), where 0 dBV is 1 volt. In summary:
0 dBm = 1 mW
0 dBu = 0.775 V
0 dBv = 0.775 V
0 dBV = 1 V
There are more:
dBr is a measurement in decibels with an arbitrary quoted reference level.
dBFS is a measurement in decibels where the reference level is the full level possible in a specific item of digital audio equipment. 0 dBFS is the maximum level and any measurement must necessarily be negative, for example –20 dBFS.
All of the above (with the exception of dBFS) refer to electrical levels. We also need levels for magnetic tape and other media. Analog recording on magnetic media is still commonplace in top level music recording, and outside of the developed countries of the world. Magnetic level is measured in nWb/m (nanowebers per meter). ‘Nano’ is the prefix meaning ‘one thousandth of a millionth’. The weber (Wb) is the unit of magnetic flux. Wb/m expresses the flux per meter of track width, and is often loosely called ‘flux density’ (strictly speaking, magnetic flux density is measured in Wb/m²). Wilhelm Weber the person (pronounced with a ‘v’ sound in Europe, with a ‘w’ sound in North America), by the way, is to magnetism what Alessandro Volta is to electricity. There are a number of magnetic reference levels in common use. Ampex level, named for the company that developed the tape recorder from German prototypes after World War II, is 185 nWb/m. NAB (National Association of Broadcasters, in the USA) level is 200 nWb/m. DIN (Deutsche Industrie Normen, in Europe) level is 320 nWb/m. In summary:
Ampex level: 185 nWb/m
NAB level: 200 nWb/m
DIN level: 320 nWb/m
It’s worth noting that none of these reference levels is better than any other, but NAB and DIN are the most used in North America and Europe respectively. Operating Level An extension of the concept of level is operating level. This is the level around which you would expect your material to peak. Much of the time the actual level of your signal will be lower, sometimes higher. It’s just a figure to keep in mind as the roughly correct level for your signal. In electrical terms, the standard operating level of professional equipment is +4 dBu. There is also a semi-professional operating level of –10 dBV. This does cause some difficulty when fully professional and semi-professional equipment is combined within the same system. Either you keep a close eye on level and resign yourself to making corrections often, according to what combination of equipment you happen to be using, or you buy a converter unit that will bring semi-pro level up to pro level. Magnetic tape also has a standard ‘operating level’ - several of them in fact. To simplify a little, since analog magnetic tape is now a minority medium, albeit an important minority: in a studio where VU meters are used, it is common to align the VU meters so that 0 VU equals +4 dBu. Tape recorders would be aligned so that a tone at 200 nWb/m gives a reading of 0 VU. In short: 200 nWb/m on tape normally equates to +4 dBu and 0 VU. Most brands of tape can give good clean sound up to 8 dB above 200 nWb/m and even beyond, although distortion increases considerably beyond that. Digital equipment also has an ‘operating level’, of sorts. In some studios - mainly broadcast - digital recorders such as DAT are aligned so that –18 dBFS (18 dB below maximum level) is equivalent to +4 dBu and 0
VU. This certainly allows plenty of headroom (see later), but it doesn’t fully exploit the dynamic range of DAT. Most people who record digitally record right up to the highest level they think they can get away with without risk of red lights or ‘overs’. Gain Gain refers to an increase or decrease in level and is measured in dB. Since gain refers to both the signal level before gain was applied, and signal level after gain is applied, then the function of the decibel as a comparison between two levels holds good. The signal level from a microphone could be around 1 mV, for instance. Apply a gain of 60 dB and it is multiplied by a thousand giving around 1 V – enough for the mixing console to munch on. Suppose the signal then needed to be made smaller, or attenuated, then a gain of –20 dB would bring it down to around 100 mV. Some engineers find it fun to play around with these numbers. Your degree of fluency in the numbers part of decibels depends on whether you want to be a technical expert, or just concentrate on the audio. There is work available for both types of engineer. The need to make a signal bigger or smaller is fairly easy to understand, but what about making it stay the same level? What kind of gain is this? The answer is ‘unity gain’ and it is a surprisingly useful concept. Unity gain implies a change in level of 0 dB. In the analog era it was important to align a recorder so that whatever level you put in on record, you got that same level out on replay. Then, apart from being spared changes in level between record and playback, you could do things like copy tapes, edit bits and pieces together and the level wouldn’t jump. If you hadn't aligned your machines to unity gain then the levels would be all over the place. With digital equipment, it is actually the norm for digital input and output to be of the same level, so unity gain – in the digital domain at least – tends to happen automatically.
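For anyone who does enjoy playing with the numbers, the gain arithmetic above can be sketched in Python. This is only an illustration: the helper names `gain_db_to_ratio` and `apply_gain` are invented here, and the formula used is the standard voltage relationship, ratio = 10^(dB/20).

```python
def gain_db_to_ratio(gain_db):
    """Convert a gain in dB to a voltage multiplier: ratio = 10^(dB / 20)."""
    return 10.0 ** (gain_db / 20.0)

def apply_gain(volts, gain_db):
    """Return the signal voltage after gain_db of gain (negative = attenuation)."""
    return volts * gain_db_to_ratio(gain_db)

mic_level_v = 0.001                              # around 1 mV from a microphone
after_preamp_v = apply_gain(mic_level_v, 60.0)   # +60 dB is a factor of 1000
attenuated_v = apply_gain(after_preamp_v, -20.0) # -20 dB is a factor of 1/10
same_level_v = apply_gain(after_preamp_v, 0.0)   # unity gain: no change

print(after_preamp_v, attenuated_v, same_level_v)
```

A gain of 0 dB gives a multiplier of exactly 1, which is the unity gain case described in the text.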
RMS and Peak Levels How do you measure the level of an AC (alternating current) waveform? Or to put it another way, how do you measure the level of an AC waveform meaningfully? A simple peak-to-peak measurement, or peak measurement, shows the height (or amplitude) of the waveform, but it doesn't necessarily tell you how much subjective loudness potential the waveform contains. A very ‘peaky’ waveform (or a waveform with a high crest factor, as we say) might have strong peaks, but it will not tend to sound very loud. A waveform with lower peaks, but greater area between the line and the x-axis of the graph will tend to sound louder on delivery to the listener. The most meaningful measurement of level is the root-mean-square technique. Cutting out all the math, the RMS measurement tells you the equivalent ‘heating’ capability of a signal. A waveform of level 100 Vrms would bring an electric fire element to the same temperature as a direct (DC) voltage of 100 V. A waveform of level 100 Vpeak-to-peak would be significantly less warm. Frequency Response It is generally accepted that the range of human hearing, taking into account a selection of real live humans of various ages, is 20 Hz to 20 kHz, and sound equipment must be able to accommodate this. It is not however sufficient to quote a frequency range. It is necessary to quote a frequency response, which is rather different. In addition, we are not looking for any old frequency response, we are looking for a ‘flat frequency response’ which means that the equipment in question responds to all frequencies, within its limits, equally and any deviations from an equal response are defined. The correct way to describe the frequency response of a piece of equipment is this: 20 Hz to 20 kHz +0 dB/-2 dB or this: 20 Hz to 20 kHz ±1 dB Of course the actual numbers are just examples, but the concept of defining the allowable bounds of deviation from ruler-flatness is the key.
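Returning to the RMS and peak measurements described above, the difference can be made concrete with a short Python sketch. The math that the text cuts out amounts to: square every sample, take the mean, then take the square root. The function names here are illustrative, and the test signals (one cycle each of a sine wave and a square wave) are assumptions chosen for simplicity, since the text does not specify a waveform shape.

```python
import math

def rms(samples):
    """Root-mean-square level: square, average, then square-root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

def crest_factor_db(samples):
    """Ratio of peak to RMS level, expressed in dB."""
    return 20.0 * math.log10(peak(samples) / rms(samples))

N = 1000  # samples in one cycle
sine = [math.sin(2.0 * math.pi * n / N) for n in range(N)]
square = [1.0 if n < N // 2 else -1.0 for n in range(N)]

# A sine wave measuring 100 V peak-to-peak has a peak of 50 V,
# so its RMS value is only about 35.4 V.
sine_100vpp = [50.0 * s for s in sine]

print("sine crest factor (dB):", crest_factor_db(sine))      # about 3 dB
print("square crest factor (dB):", crest_factor_db(square))  # about 0 dB
print("RMS of 100 V peak-to-peak sine:", rms(sine_100vpp))
```

The square wave has the same peak as the sine but a higher RMS value, which is the 'greater area under the line' idea in numerical form.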
Q Q is used in a variety of ways in electronics and audio but probably the most significant is as a measure of the ‘sharpness’ of a filter or equalizer. For example, an equalizer could be set to boost a range of frequencies around 1 kHz. A high Q would mean that only a narrow band of frequencies around the center frequency is affected. A low Q would mean that a wide range of frequencies is affected. Q is calculated thus: Q = f0/(f2 - f1), where f0 is the center frequency of the band, and f2 and f1 are the upper and lower frequencies at which the response has dropped by 3 dB with respect to f0. It may be evident from this that Q is a ratio and has no units. Q doesn't stand for anything either, it’s just a letter. Whether you need to use a low Q setting or a high Q setting depends on the nature of the problem you want to solve. If there is a troublesome frequency, for example acoustic guitars sometimes have an irritating resonance somewhere around 150 Hz to 200 Hz, then a high Q setting of 4 or 5 will allow you to home in on the exact frequency and deal with it without affecting surrounding frequencies too much. If it is more a matter of shaping the spectrum of a sound to improve it or allow it to blend better with other signals, then a low Q of perhaps 0.3 would be more appropriate. The range of Q in common use in audio is from 0.1 up to around 10, although specialist devices such as feedback suppressors can vastly exceed this.
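The Q formula is easy to try out with real numbers. The following Python sketch uses hypothetical helper names, and the example frequencies (a 1 kHz band with 3 dB points at 900 Hz and 1100 Hz, and a 175 Hz guitar resonance) are assumed purely for illustration.

```python
def q_factor(f_center_hz, f_lower_hz, f_upper_hz):
    """Q = f0 / (f2 - f1), where f1 and f2 are the 3 dB-down frequencies."""
    return f_center_hz / (f_upper_hz - f_lower_hz)

def bandwidth_hz(f_center_hz, q):
    """Rearranged form: width of the affected band for a given Q."""
    return f_center_hz / q

# A boost centered on 1 kHz that is 3 dB down at 900 Hz and 1100 Hz
print(q_factor(1000.0, 900.0, 1100.0))  # Q of 5

# Notching a resonance at an assumed 175 Hz with a Q of 5
# affects a band only 35 Hz wide
print(bandwidth_hz(175.0, 5.0))

# A gentle shaping EQ at 1 kHz with a Q of 0.3 spans over 3 kHz
print(bandwidth_hz(1000.0, 0.3))
```

The contrast between a 35 Hz band and a band several kilohertz wide shows why a high Q suits corrective work and a low Q suits broad tonal shaping.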
Noise Noise can be described as unwanted sound, or alternatively as a non-meaningful component of a sound. Noise occurs naturally in acoustics, even in the quietest settings. Air molecules are in constant motion at any temperature above absolute zero and since sound is nothing more than the motion of air molecules, then the random intrinsic motion must produce sound - sound of a very low level, but sound nonetheless. We are not generally aware of this source of noise, but some microphones are. A microphone with a large diaphragm will have many molecules impinging on its surface, and the random motion of the molecules will tend to average out and be insignificant in comparison with the wanted signal. A microphone with a small diaphragm however (such as a clip-on mic) will only be in contact with comparatively few air molecules so the averaging effect will be less and the noise higher in level in comparison with the wanted signal. When sound is converted to an electrical signal, the signal is carried by electrons. Once again, electrons are in constant random motion causing what is called Johnson noise. If the signal is carried by a large current (in a low impedance circuit), then Johnson noise can be insignificant. If the signal is carried by only a small current with relatively few electrons (in a high impedance circuit), then the noise level can be much higher. We can extend this concept to any medium that can carry or store a sound signal.
Noise is caused by variations in the consistency of the medium. One more example would be a vinyl record groove. The signal is stored as undulations in the groove, but any irregularities such as dust or scratches translate into noise on playback. Digital audio systems are not immune to noise. When a signal is converted to digital form, it is analyzed into a certain number of levels, 65,536 in the compact disc format for example. Of course, most of the time the original signal will fall between levels, therefore the analysis is only an approximation. The inaccuracies necessarily produced are termed quantization noise. Signal to Noise Ratio Signal to noise ratio is one measure of how noisy a piece of equipment is. We said earlier that a common operating level is +4 dBu. If all signal were removed and the noise level at the output of the console measured, we might obtain a reading somewhere around –80 dBu. This would mean that the signal to noise ratio is 84 dB. In analog equipment, a signal to noise ratio of 80 dB or more is considered good. The worst piece of equipment as far as noise is concerned is the analog tape recorder, which can only turn in a signal to noise ratio of around 65 dB. The noise is quite audible behind low-level signals. Outside of the professional domain, a compact cassette recorder without noise reduction can only manage around 45 dB. This is only adequate when used for information content only, for instance in a dictation machine, or for music which is loud all the time and therefore masks the noise. As we said, digital equipment suffers from noise too. Quantization noise is more grainy in comparison to analog noise and therefore subjectively more annoying. Digital equipment requires a better signal to noise ratio. In basic terms, the signal to noise ratio of any digital system can be calculated by multiplying the number of bits by six.
So the compact disc format with a resolution of 16 bits has a signal to noise ratio of 16 x 6 = 96 dB, if all other parts of the system are optimized. Currently the professional standard is moving to 24-bit resolution, therefore the theoretical signal to noise ratio would be 24 x 6 = 144 dB. This is actually greater than the useful dynamic range of the human ear, but in practice this idealized figure is never attained.
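The signal to noise figures quoted above are simple subtractions and multiplications, which the following Python sketch reproduces. The function names are illustrative only. The more exact figure of about 6.02 dB per bit, which is 20 x log10(2), is an addition here and not part of the rule of thumb in the text.

```python
import math

def signal_to_noise_db(operating_level_dbu, noise_floor_dbu):
    """Distance in dB between the operating level and the noise floor."""
    return operating_level_dbu - noise_floor_dbu

def quantization_levels(bits):
    """Number of discrete levels available at a given resolution."""
    return 2 ** bits

def digital_snr_db(bits, db_per_bit=6.0):
    """Rule-of-thumb theoretical signal to noise ratio: bits x 6 dB."""
    return bits * db_per_bit

print(signal_to_noise_db(4.0, -80.0))  # 84 dB for the console example
print(quantization_levels(16))         # 65536 levels for compact disc
print(digital_snr_db(16))              # 96 dB
print(digital_snr_db(24))              # 144 dB

# Each extra bit doubles the number of levels, and doubling a voltage
# ratio is 20 * log10(2) dB, slightly more than the rounded figure of 6.
print(20.0 * math.log10(2.0))
```

These are theoretical best-case numbers; as the text notes, real equipment does not attain them.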
Another way of measuring the noise performance of equipment is EIN or Equivalent Input Noise, and this is mainly of relevance to microphone preamplifiers. An example spec might be 'EIN at 70 dB gain: -125 dBu (200 ohm source)'. This means that the gain control was set to 70 dB and the noise measured at the output of the mic preamp - in this case the measurement would be –55 dBu. When the set amount of gain is subtracted from this we get the amount of noise that would have to be present at the input of a noiseless mic amp to give the same result. The '200 ohm source' bit is necessary to make the measurement meaningful. If the EIN figure does not give the source impedance, then I am afraid the measurement is useless. Perhaps it is giving the game away to say that the reason a gain of 70 dB is quoted is because mic preamps normally give their optimum EIN figures at a fairly high gain. The lower the gain at which a manufacturer dare quote the EIN, the better the mic input circuit. Modulation Noise Noise as discussed above is a steady-state phenomenon. It is annoying, but the ear has a way of tuning out sounds that don’t change. However, there is another type of noise that constantly changes in level, and that is modulation noise. One source of modulation noise is that which occurs in analog tape recorders. The effect is that as the signal level changes, the noise level changes. This can be irritating when the signal is such that it doesn't adequately mask the noise. A low frequency signal with few higher harmonics is probably the worst case and will demonstrate modulation noise quite clearly. Noise reduction systems, as mainly used in analog recording, also have the effect of creating modulation noise. Noise reduction systems work by bringing up the level of low-level signals before they are recorded, and reducing the level again on playback – at the same time reducing the level of tape noise. 
Unfortunately, the noise level is now in a state of constant change and thereby draws attention to itself. Some noise reduction systems have means of minimizing this effect. All of the various Dolby systems, for example, work well when properly aligned. Quantization noise in digital systems is also a form of modulation noise. At very low signal levels it is sometimes possible to hear the noise level going up and down with the signal.
Where you are most likely to hear modulation noise is on a so-called Hi-Fi VHS video recorder. The discontinuous nature of the audio track causes a low frequency fluttering noise which requires noise reduction to minimize. On some machines, this noise reduction is not wholly effective and the modulation noise created can be very irritating. It is worth saying that signal to noise ratio should be measured with any noise reduction switched out, otherwise the comparison between peak or operating level and the artificially lowered noise floor when signal is absent gives an unfairly advantageous figure unrepresentative of the subjective sound quality of the equipment in question. Distortion Unfortunately, any item of sound equipment 'bends' or distorts the sound waveform to a greater or lesser extent. This produces, from any given input frequency, additional unwanted frequencies. Usually, distortion is measured as a percentage. For a mixing console or an amplifier, anything less than 0.1% is normally considered quite adequate, although once again it's the analog tape recorder that lets us down with distortion figures of anything up to 1% and above. Distortion normally comes in two varieties: harmonic distortion and intermodulation distortion. Looking at the harmonic kind first, suppose you input a 1 kHz tone into a system. From the output you will get not only that 1 kHz tone but also a measure of 2 kHz, 3 kHz, 4 kHz etc. In fact, harmonic distortion always comes in integral multiples of the incoming frequency, rather like musical harmonics. This is why distortion is sometimes desirable as an effect - it enhances musical qualities, used with taste and control of course.
Sine wave - the simplest possible sound with no harmonics
The effect of even-order harmonic distortion on a sine wave
The effect of odd-order harmonic distortion on a sine wave
Intermodulation distortion is not so musical in its effect. This is where two frequencies combine together in such a way as to create extra frequencies that are not musically related. For instance, if you input two frequencies, 1000 Hz and 1100 Hz, then intermodulation will produce sum and difference frequencies – 2100 Hz and 100 Hz. A third form of distortion is clipping. This is where a signal ‘attempts’ to exceed the level boundaries imposed by the voltage limits of a piece of equipment. In modern circuit designs the peaks of the waveform are flattened off causing a rather unpleasant sound. In vintage equipment the peaks can be rounded off, or strange things can happen such as the signal completely disappearing for a second or two.
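The frequencies generated by harmonic and intermodulation distortion, as described above, can be listed with a few lines of Python. This is a minimal sketch with made-up function names. It only covers the simple sum and difference products mentioned in the text, not the higher-order products that real equipment also generates.

```python
def harmonic_frequencies(fundamental_hz, count):
    """First `count` harmonics above the fundamental (2x, 3x, 4x ...)."""
    return [fundamental_hz * n for n in range(2, count + 2)]

def sum_and_difference(f1_hz, f2_hz):
    """Simple intermodulation products of two input frequencies."""
    return (f1_hz + f2_hz, abs(f1_hz - f2_hz))

# Harmonic distortion of a 1 kHz tone: whole-number multiples
print(harmonic_frequencies(1000, 3))   # [2000, 3000, 4000]

# Intermodulation between 1000 Hz and 1100 Hz
print(sum_and_difference(1000, 1100))  # (2100, 100)
```

The harmonic products are all whole-number multiples of the input, whereas 2100 Hz and 100 Hz are not whole-number multiples of either input tone, which is why intermodulation sounds less musical.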
Crosstalk Crosstalk is defined as a leakage of signal from one signal path to another. For instance, if you have cymbals or hi-hat on one channel of your mixing console and you find they are leaking through to the adjacent channel, then you have a crosstalk problem. Crosstalk can consist of the full range of audio frequencies, in which case there is a resistive path causing the leakage. More often crosstalk is predominantly higher frequencies, which jump from one circuit track to another through capacitance. In analog tape recorders, an effect known as fringing allows low frequencies to leak into adjacent tracks on replay. The worst problem caused by crosstalk is when timecode leaks from its allocated track or channel into another signal path. Timecode – used to synchronize audio and video machines – is an audio signal which to the ear sounds like a very unpleasant screech. It only takes a little crosstalk to allow timecode to become audible. Headroom I have already mentioned the concept of operating level which is the 'round about' preferred level in a studio. This would typically be +4 dBu in a professional studio. But above operating level there needs to be a certain amount of headroom before the onset of clipping. This is most important in a mixing console where the level of each individual signal can vary considerably due to: 1) less than optimal setting of the gain
control, 2) gain due to EQ, or perhaps 3) unexpected enthusiasm on the part of a musician. Also, when signals are mixed together, the resulting level isn't always predictable. Professional equipment can handle levels up to +20 dBu or +26 dBu, therefore there is always plenty of headroom to play with. Of course, the more headroom you allow, the worse the signal to noise ratio, so it is always something of a compromise. In recording systems, it is common to reduce headroom to little or zero. The recording system is at the end of the signal chain and there are fewer variables. Nevertheless, it does depend on the nature of the signal source. If it is a stereo mix from a multitrack recording, then the levels are known and easily controllable therefore hardly any headroom is required. If it is a recording of live musicians in a concert setting, then much more headroom must be allowed because of the more unpredictable level of the signal, and also because there isn't likely to be a second chance if clipping occurs. Wow and Flutter The era of wow and flutter is probably coming to an end, but it hasn't quite got there yet so we need some explanation. Wow and flutter are both caused by irregularities in mechanical components of analog equipment such as tape recorders and record players. Wow causes a long-term cyclic variation in pitch that is audible as such. Flutter is a faster cyclic variation in pitch that is too fast to be perceived as a rise and fall in pitch. Wow is just plain unpleasant. You will hear it most often, and at its worst, on old-style jukeboxes that still use vinyl records. Flutter causes a ‘dirtying’ of the sound, which used to be thought of as wholly unwelcome. Now, when we can have flutter-free digital equipment any time we want it, old-style analog tape recorders that inevitably suffer from flutter to some extent have a characteristic sound quality that is often thought to be desirable.
Wow and flutter are measured as a percentage, where less than 0.1% is considered good.
Check Questions
• What is meant by '0 dBm'?
• What is meant by '0 dBu'?
• What operating level is commonly used by semi-professional equipment?
• What does the term 'dBFS' mean?
• What level is commonly used as the reference level for analog magnetic tape in North America?
• Which has the greater heating effect: 100 V RMS or 100 V DC?
• What is meant by 'unity gain'?
• Why is it not acceptable to quote the frequency response of a piece of equipment as '20 Hz to 20 kHz'?
• What is meant by 'signal to noise ratio'?
• What is meant by 'EIN'?
• What is modulation noise?
• What is harmonic distortion?
• What is intermodulation distortion?
• What is clipping?
• What is headroom?