Harmonic Mixing: Key & Beat Detection Algorithms


M.Eng Individual Project - Final Report

Christopher Roebuck

Project Supervisor: Iain Phillips
Project Directory: http://www.doc.ic.ac.uk/~cjr03/Project

1. Abstract

Harmonic mixing is the art of mixing two songs together based on their key. For a DJ to mix two songs harmonically, their key and tempo must be known in advance. The aim of this project is to automate the process of detecting the key and tempo of a song, so that a DJ can select two songs which will ‘sound good’ when mixed together. The result is a program which, given a song, detects its key and tempo automatically and enables the user to mix two songs together based on these features. This document outlines the research into various key and beat detection algorithms and the design, implementation and evaluation of such a program.


2. Acknowledgements

I would like to thank my supervisor, Iain Phillips, for proposing the project in the first place and for taking the time to meet me regularly throughout the course of the project. I would also like to thank Christopher Harte for sending me his paper on a Quantised Chromagram, and Kyogu Lee for responding to my emails about the Harmonic Product Spectrum. Thanks also to Peter Littlewood, Rachel Lau and Tiana Kordbacheh for creating chord samples on which to test my algorithm.


3. Contents

1. Abstract
2. Acknowledgements
3. Contents
4. Table of Figures
5. Introduction
   5.1 Motivation for this project
   5.2 Major Objectives
   5.3 Deeper Objectives
   5.4 Report Layout
6. Background
   6.1 History of DJ Mixing
   6.2 Illustration of Beat Mixing
   6.3 Key Detection Algorithms
      6.3.1 Musical key extraction from audio
      6.3.2 Chord Segmentation and Recognition using EM-Trained Hidden Markov Models
      6.3.3 Automatic Chord Recognition from Audio Using Enhanced Pitch Class Profile
      6.3.4 A Robust Predominant-F0 Estimation Method for Real-Time Detection of Melody and Bass Lines in CD Recordings
      6.3.5 A computational model of harmonic chord recognition
   6.4 Beat Detection Algorithms
      6.4.1 Tempo and Beat Analysis of Acoustic Musical Signals
      6.4.2 Analysis of the Meter of Acoustic Musical Signals
      6.4.3 Audio Analysis using the Discrete Wavelet Transform
      6.4.4 Statistical streaming beat detection
   6.5 Similar Projects / Software
      6.5.1 Traktor DJ Studio by Native Instruments
      6.5.2 Rapid Evolution 2
      6.5.3 Mixed in Key
      6.5.4 MixMeister
7. Design
   7.1 System Architecture
   7.2 Key Detection Algorithm Design Rationale
   7.3 Beat Detection Algorithm Design Rationale
8. Implementation
   8.1 System Implementation
   8.2 Detecting the Key
   8.3 Detecting the Beats
   8.4 Calculating BPM Value
   8.5 Automatic Beat Matching
   8.6 Generating and animating the waveforms
9. Testing
   9.1 Parameter Testing – Key Detection Algorithm
      9.1.1 Bass threshold frequency
      9.1.2 Choice of FFT window length
      9.1.3 Harmonic Product Spectrum
      9.1.4 Weighting System
      9.1.5 Time in between overlapping frames
      9.1.6 Downsampling
   9.2 Parameter Evaluation – Beat Detection Algorithm
      9.2.1 Size of Instant Energy
      9.2.2 Size of Average Energy
      9.2.3 Beat Interval
      9.2.4 Low Pass Filtering
10. Evaluation
   10.1 Quantitative Evaluation
      10.1.1 Key Detection Accuracy Test with Dance Music
      10.1.2 Key Detection Accuracy Test with Classical Music
      10.1.3 Beat Detection Accuracy Test
      10.1.4 Performance Evaluation
   10.2 Qualitative Evaluation
      10.2.1 Automatic Beat Matching
      10.2.2 Graphical User Interface
      10.2.3 Pitch Shift and Time Stretching Functions
      10.2.4 Overall Evaluation
11. Conclusion
   11.1 Appraisal
   11.2 Further Work
12. Bibliography
13. Appendix
   Appendix A: Introduction to Digital Signal Processing
   Appendix B: Specification
      Aims of the project
      Core Specification
      Extended Specification
   Appendix C: User Guide
      Loading a track into a deck
      Detecting the Key of a track
      Mixing two tracks


4. Table of Figures

Figure 1: Crossfader in the left position
Figure 2: Beats, Bars and Loops
Figure 3: Tracks in sync but not in phase
Figure 4: Train wreck mix
Figure 5: Tracks in sync and in phase
Figure 6: Crossfader in central position
Figure 7: Crossfader in right hand position
Figure 8: Circle of Fifths and Camelot Easymix System
Figure 9: Flow diagram of the algorithm from Sheh et al (9)
Figure 10: PCP vector of a C major triad
Figure 11: Pitch Class Profile of A minor triad
Figure 12: Harmonic Product Spectrum
Figure 13: Comparison of PCP and EPCP vectors from Lee (11)
Figure 14: Flow diagram of Goto’s algorithm (12)
Figure 15: Overview of Scheirer's Algorithm (14)
Figure 16: Waveform showing Tatum, Tactus and Measure
Figure 17: Overview of algorithm from Klapuri et al (15)
Figure 18: Block diagram of algorithm from Tzanetakis et al (16)
Figure 19: Traktor DJ Studio
Figure 20: Rapid Evolution 2
Figure 21: Mixed in Key
Figure 22: MixMeister
Figure 23: Overview of System Architecture
Figure 24: System Overview
Figure 25: Key Detection Algorithm Flow Chart
Figure 26: Output from the STFT
Figure 27: Chroma Vector of C Major chord and its correlation with key templates
Figure 28: Overlapping of waveform images
Figure 29: Illustration of the Harmonic Product Spectrum taken from (30)
Figure 30: Chroma Vector showing close correlation between many different key templates
Figure 31: F minor is detected correctly with the weighting system enabled
Figure 32: C minor is detected without the weighting system enabled
Figure 33: Too many beats detected with 50ms beat interval
Figure 34: Beats being detected correctly with beat interval of 350ms
Figure 35: Sound energy variations detected as beats in silent areas of Quivver – Space Manoeuvres
Figure 36: The spacing between these detected beats is closer, leading to higher BPM calculation
Figure 37: Sampling of a signal for 4-bit PCM
Figure 38: How FMOD stores audio data
Figure 39: The Main Screen
Figure 40: Loading Sasha - Magnetic North into Deck A
Figure 41: The Deck Control
Figure 42: Key Detection progress/results
Figure 43: Crossfader in left hand position
Figure 44: Crossfader in central position
Figure 45: Crossfader in right hand position


5. Introduction

This section sets out the aims and motivation for the project and introduces some of the concepts which will be discussed in greater detail later in the report.

5.1 Motivation for this project

Beat mixing (or beat-matching) is a process employed by DJs to transition between two songs: the tempo of the new track is changed to match that of the currently playing track, the beats of one track are perfectly aligned with the beats of the other, and the two are then mixed or cross-faded so that there is no pause between songs. This keeps the flow of the music constant for the pleasure of the listener, both through appreciation of the quality of the mix between records and because playing tracks back to back with no gap provides a continuous variety of melody and rhythm to dance to. Today's DJ software has simplified the task of beat mixing greatly; however, very few notable forays have addressed the idea of harmonic mixing. Two tracks can be beat-mixed together perfectly and still sound ‘off’. This is likely to be because the two tracks are out of tune with each other: their harmonic elements are in incompatible keys, causing the melodies to clash. Harmonic mixing sets out to address this problem. Harmonic mixing is the natural evolution of beat mixing: mixing in compatible keys. The idea is that the currently playing song should only be beat-mixed with another song in a compatible key, which makes the transition between the two songs sound pleasurable to the listener. This gives the DJ more creative freedom when performing a mix, as they no longer have to rely on long segments of regular beats to make a transition between two songs; they can instead overlay melody sections which are harmonically compatible with each other. People with perfect pitch find it easy to identify the key of a song (through years and years of practice), but there is no automatic process, analogous to beat detection algorithms, which would save DJs from manually finding the key of every song in their 1000+ track collection. Even when that is done, two songs with compatible keys will still not necessarily match, since changing the tempo of a song to match the other's speed also changes its key. For example, a 6% increase or decrease in tempo, measured in beats per minute (BPM), causes a change of one semitone in key, say from C minor to D-flat minor. Time-stretching algorithms are therefore essential: they lock the key of the track and allow its BPM to be altered independently of pitch/key. Pitch-shifting algorithms do the opposite, changing the pitch/key of the track without affecting the BPM.
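The 6% figure follows from the equal-tempered semitone ratio of 2^(1/12) ≈ 1.059: playing a recording about 6% faster raises every pitch by roughly one semitone. A minimal Python sketch of the relationship (illustrative only, not part of the project code):

    # Relation between a tempo change applied by resampling (playing the record
    # faster or slower) and the resulting pitch shift in semitones.
    import math

    def semitone_shift(tempo_ratio: float) -> float:
        """Pitch shift in semitones caused by playing audio at `tempo_ratio` speed."""
        return 12.0 * math.log2(tempo_ratio)

    print(semitone_shift(1.06))        # ~ +1.0 semitone for a 6% speed-up
    print(semitone_shift(130 / 120))   # ~ +1.4 semitones when taking 120 BPM up to 130 BPM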

5.2 Major Objectives

The primary aim of this project is to design, implement and optimise a key detection algorithm which works on polyphonic real-world audio. No key detection algorithm thus far can claim to be 100% accurate, and as such there have been many different attempts at solving the problem with greater accuracy, each with its own strengths and weaknesses. As the finished program is aimed mainly at DJs, the key detection algorithm should be able to accurately extract the key from various types of dance music. The main problem with this genre of music is its heavy emphasis on the bass line and bass drum, which may make it difficult for the key detection to give an accurate result. The key detection algorithm will also be tested on classical music, on which it should be able to give a more accurate result.


The other two major problems, the detection of beats and the calculation of an appropriate BPM value, can be considered solved. There has been much research into the various ways of detecting the BPM of a piece of music, and the main challenge is to find a suitable algorithm which can detect beats and calculate a BPM value in the shortest amount of time while maintaining acceptable accuracy. As this project aims to help DJs perform harmonic mixing, I will also attempt to implement an automatic real-time beat mixing algorithm, which will enable the DJ to beat-mix two tracks together based on their tempo and the position of their beats. Obviously the success of this feature will rely heavily on the accuracy of the beat detection algorithm described above.

5.3 Deeper Objectives

There are deeper objectives to this project than simply providing a DJ with an automatic key and tempo detection tool. This project aims to show that academic and state-of-the-art music analysis techniques can be applied to real-world problems in an efficient and reliable manner. Part of this is to show that disparate areas of research can be combined successfully. Finally, the project aims to be more than just a feasibility study. The result of successful completion will be an application of sufficient reliability and quality that it can be released to, and used by, untrained computer users. This report only lightly touches on this facet of the project, as creating usable, polished applications is a reasonably well solved problem and the least interesting area of this project.

5.4 Report Layout

• The remainder of the report begins with a brief history of DJ mixing and illustrates the concepts of beat mixing and harmonic mixing in Chapter 6 (Background). We then discuss the main literature on beat detection and BPM calculation, along with selected literature on key extraction from music. Finally we compare the strengths and weaknesses of the state of the art to the aims of this project.

• Chapter 7 (Design) gives a brief overview of the overall system design and the rationale behind the design of the algorithms.

• A detailed description of the interesting or problematic areas of project implementation is given in Chapter 8 (Implementation). Trivial and/or uninteresting areas of the project are not mentioned and can be considered to have been implemented successfully.

• The tests performed to determine the optimal values for the parameters of the algorithms are stated in Chapter 9 (Testing).

• A quantitative and qualitative evaluation of the final product is made in Chapter 10 (Evaluation). Analysis of any anomalous results is given.

• The report concludes with Chapter 11 (Conclusion), which covers the strengths and weaknesses of the project and details possible future work.


6. Background

Fully understanding this project requires a basic understanding of the process a DJ performs behind the turntables. First of all, a brief history of DJ mixing will explain the evolution of DJ mixing and the advancements in technology and music culture which brought us to where we are today. Then a more in-depth look at the ‘science’ of beat and harmonic mixing will follow, explaining in detail the concept of beat mixing and the extra constraints that harmonic mixing implies. A discussion of the most applicable literature for detecting the beats and extracting the key from a song is given, followed by an overview of software projects of a similar nature. An Introduction to Digital Signal Processing explaining some of the techniques used in the literature is given in the Appendix.

6.1 History of DJ Mixing

The art of DJ mixing has come a long way since its early appearances. In general, its journey can be traced through four basic stages.

Before there was any mixing or blending together of songs, there was the Non-Mixing Jukebox DJ(1). Working with just one deck (or turntable), this DJ's primary skill was to entertain an audience whilst playing requested music, usually at a wedding or some other celebration.

The first dimension of mixing (Basic Fades)(1) occurred as DJs replaced bands as the primary form of club entertainment. The DJ, now working with two decks and a mixer, would fade a new song over the end of the currently playing song, usually with calamitous results. As neither the beats nor the keys were in sync, the overlays would sound like train-wrecks. A train-wreck describes two tracks playing at the same time with their beats not synchronised, i.e. when your tracks cross, your train will crash(2). To the audience this sounds like incoherent beats occurring at odd times, making no musical sense.

The second dimension of mixing (Cutting and Scratching)(1) coincided with the appearance of rap as a distinct vocal form. High-torque turntables now allowed DJs to ‘cut’ by inserting short musical sections from a second source and ‘scratch’ by rapidly and rhythmically repeating single beats from a second source, usually by manipulating a vinyl record with their hand as it played.

Technological improvements brought about the third dimension of mixing (Beat Mixing)(1). By now turntables had accurate speed stability thanks to the arrival of the direct-drive turntable motor, as opposed to older belt-driven motors which would wear out over time, causing records to turn unevenly and affecting the tempo. Technics(3) introduced the SL-1200 turntable in 1972 and by 1984 had added features suited to the needs of DJs wanting to beat-mix(4). These features included pitch control, which allowed the DJ to adjust the speed of tracks to match one another. The fact that pressing the start button immediately started the turntable at the desired speed gave the DJ more confidence in starting a new track at exactly the right point and at the correct speed. The Technics SL-1200 also allowed the vinyl to be spun backwards for the first time, so that a DJ could carefully and precisely cue the starting position of the track to fall exactly on the onset of some desired beat. Separate from advancements in the equipment DJs used to play their sets (live performances) was the advancement in dance music production technology. Most dance music used electronic drums, which locked in a consistent tempo indefinitely. The speed stability of both the music and the turntables allowed DJs to overlay long segments of different records, as long as they could be synchronised, and beat mixing was born.


However, the limitation of beat mixing was that if the desired segments of both songs contained melodies, the result was usually unpleasant because of clashing keys. Thus the DJ either used trial and error to find songs with compatible keys or had to rely on there being a beat-only intro and outro section on each song. This is why most dance music has two to three minutes of continual beats at the beginning and end of the song: to give the DJ enough time to beat-match tracks together. Harmonic mixing brings the fourth dimension, harmony, to DJ mixing technique, permitting different melodies to be played simultaneously only if they have compatible keys. The gradual shift away from analogue vinyl records to digital audio formats such as CDs and MP3s, combined with the development of time-stretching algorithms, made harmonic mixing possible. A time-stretching algorithm locks a track to a certain key whilst allowing its tempo to be altered independently. More and more DJs nowadays are letting computers do the beat mixing for them and focusing their attention on being more artistic and creative with their mixes.

6.2 Illustration of Beat Mixing

1. The DJ first starts their set with a song, which we shall call Song A. Song A has a tempo of 130 BPM. Whilst Song A is playing, the DJ decides which song to play next; we will call this Song B. Song B has a lower tempo than Song A, at 120 BPM. The crossfader (part of the mixer) allows multiple audio outputs to be blended together into one output. At the start the crossfader is in the left position, as shown in Figure 1 below, so that only the output from Deck A is heard by the audience.

[Diagram: Deck A playing Song A, audible to the audience; Deck B with Song B loaded, audible only to the DJ through headphones; crossfader in the left position so only Deck A's output is heard.]

Figure 1: Crossfader in the left position
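The crossfader itself is simply a weighted blend of the two deck outputs. A minimal Python sketch, assuming an equal-power fade law (the exact curve varies between mixers and is not specified in this report):

    import numpy as np

    def crossfade(deck_a: np.ndarray, deck_b: np.ndarray, position: float) -> np.ndarray:
        """Blend two equal-length audio buffers.

        position = 0.0 -> only Deck A audible, 1.0 -> only Deck B audible.
        Uses an equal-power (constant energy) fade law."""
        gain_a = np.cos(position * np.pi / 2)
        gain_b = np.sin(position * np.pi / 2)
        return gain_a * deck_a + gain_b * deck_b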

2. As the DJ listens to Song B through their headphones, they detect that its tempo is slower than Song A's. The DJ increases the tempo of Song B to 130 BPM to match that of Song A. Nearly all modern dance music is written in the 4/4 common time signature, i.e. 4 beats to every bar. A typical dance track contains a series of loops of n bars where n is a power of 2, usually 4, 8, 16 or 32. Assume that both Song A and Song B contain a series of 4-bar loops with 4 beats to every bar, as illustrated in Figure 2.

[Diagram: a timeline of beats numbered 1 to 16, grouped into bars of 4 beats, with 4 bars making one loop; the downbeat marks the first beat of each loop.]

Figure 2: Beats, Bars and Loops


The downbeat is the first beat of a loop and is usually signified by an extra sound or accent, such as a cymbal crash. The DJ finds a downbeat towards the beginning of Song B (the first beat of Song B is normally used) and pauses Song B just before the onset of that downbeat. Song B is now cued and ready.

3. The DJ now waits for a downbeat to occur in Song A after the main melody has played out. Song B is started at the exact same time as the onset of the downbeat in Song A. Song B is still only audible through the DJ's headphones. The DJ ensures that the two tracks are in time and in phase: to be in time, the beats of the two tracks must occur at the same time; to be in phase, the downbeats of each track must occur at the same time. Below is an example where the two tracks are in time but not in phase.

[Diagram: the beat grids of Song A and Song B lined up beat-for-beat, but with the downbeat of Song B falling partway through Song A's loop.]

Figure 3: Tracks in sync but not in phase

4. If the two tracks have different BPMs they will eventually go out of time and out of phase as the duration between the beats of each track will drift further and further apart. The following diagram shows this scenario, with Song B being the slower of the two tracks. This is a trainwreck mix:

[Diagram: the beat grids of Song A and Song B gradually drifting apart over time as the slower Song B falls behind.]

Figure 4: Train wreck mix

When the two tracks are in phase and have the same BPM they should be aligned like this:

[Diagram: the beat grids of Song A and Song B perfectly aligned, with their downbeats coinciding.]

Figure 5: Tracks in sync and in phase


5. Once the tracks are in time and in phase, the DJ fades in the output from Deck B by sliding the crossfader to the middle. Both tracks are now audible and are being mixed.

[Diagram: Deck A playing Song A and Deck B playing Song B, with the crossfader in the central position so the output from both decks is equally audible.]

Figure 6: Crossfader in central position

6. Finally, after an arbitrary amount of time (or when Song A ends), the crossfader is moved all the way to the right so that only Song B is audible. Song A is taken off the deck and the DJ chooses another song, Song C, to replace Song A on Deck A. Song B will then be mixed into Song C. Thus we are now back at stage 1, and the cycle continues.

[Diagram: Deck A with Song C loaded and Deck B playing Song B, with the crossfader in the right-hand position so only the output from Deck B is audible.]

Figure 7: Crossfader in right hand position

Harmonic mixing adds a constraint to the selection of the next track in stage one of the above cycle: the next track must be in a key compatible with the currently playing track. The circle of fifths illustrates the relationships between compatible keys and is used by composers for correct-sounding chord progressions(5). Any song is compatible with another song of the same key, its perfect fourth, its perfect fifth or its relative major/minor. Using the circle of fifths, a song in C Major is compatible with another song in C Major, a song in F Major, a song in G Major or a song in A Minor. To make this easier to use, Camelot Sound came up with the ‘Easymix’ system, where each key is assigned a keycode: 1A-12A for minor keys and 1B-12B for major keys(6). Using the Easymix chart, a song with keycode 1A (A-Flat Minor) can be mixed with another song of keycode 1A, 2A (E-Flat Minor), 12A (D-Flat Minor) or 1B (B Major). However, altering the tempo of a track by +/- 6% alters its key by a semitone (shifting its keycode by 7 steps). A song in E-Flat Minor (keycode 2A) becomes an E Minor (keycode 9A) song with a 6% increase.


Figure 8: Circle of Fifths and Camelot Easymix System

Assume Song A in step 1 has a key of C Major (8B) at 130 BPM and Song B has a key of F Major (7B) at 120 BPM. The songs are compatible if played at their original tempos, but Song B has to be sped up to 130 BPM to match the tempo of Song A. This is an 8.33% increase in tempo, so Song B's key changes up a semitone to F-Sharp Major (2B), which is now incompatible with Song A. Time-stretching algorithms solve this problem: the tempo of Song B can be increased to match Song A while Song B's key remains at F Major, which is harmonically compatible with Song A.
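The Easymix rules above are simple enough to express directly. A minimal Python sketch of the compatibility check and the 7-step keycode shift caused by a semitone change (the helper functions are hypothetical, not taken from the report's implementation):

    def compatible(key1, key2):
        """Camelot Easymix rule: same keycode, one step either way around the
        wheel (perfect fourth/fifth), or the relative major/minor."""
        n1, letter1 = key1
        n2, letter2 = key2
        if letter1 == letter2:
            return (n1 - n2) % 12 in (0, 1, 11)
        return n1 == n2

    def shift_semitones(key, semitones):
        """A one-semitone pitch change moves a key 7 steps around the Camelot wheel."""
        n, letter = key
        return ((n - 1 + 7 * semitones) % 12 + 1, letter)

    print(compatible((8, 'B'), (7, 'B')))   # True: C Major (8B) and F Major (7B)
    print(shift_semitones((2, 'A'), +1))    # (9, 'A'): E-Flat Minor shifted up to E Minor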


6.3 Key Detection Algorithms

The extraction of key from audio is not new, but it is not often reported in the literature. Many algorithms found in the literature only work on symbolic data (e.g. MIDI or notated music) where the notes of the incoming signal are already known. For this project, the algorithm needs to work directly on incoming audio data with no prior knowledge of the notes which make up the song. A variety of methods are used, ranging from heavy use of spectral analysis, to statistical modelling, to modelling of inner hair cell dynamics. The algorithms presented below give a flavour of the research currently ongoing into this challenging problem.

6.3.1 Musical key extraction from audio

A key extraction algorithm that works directly on raw audio data is presented by Pauws(7). Its implementation is based on models of human auditory perception and music cognition. It is relatively straightforward and has minimal computing requirements. For each 100 millisecond section of the signal, it first down-samples the audio content to around 10 kHz, which significantly reduces the computing cost and also cuts off any frequencies above 5 kHz; it is assumed that these high frequencies do not contribute to the pitches in the lower frequency ranges. The remaining samples in a frame are multiplied by a Hamming window, zero-padded, and the amplitude spectrum is calculated from a 1024-point FFT. A 12-dimensional chroma vector (chromagram) is then calculated from the frequency spectrum, mapping the frequencies in the spectrum onto the 12 musical pitch classes; for pitch class C, for example, this comes down to the six spectral regions centred around the pitch frequencies of C1 (32.7 Hz), C2 (65.4 Hz), C3 (130.8 Hz), C4 (261.6 Hz), C5 (523.3 Hz) and C6 (1046.5 Hz). The chroma vector is normalised to show the relative ratios of each musical note in the frequency spectrum. Eventually there is a chroma vector for each 100 millisecond section of the song. These are correlated with Krumhansl's key profiles(8), and the key profile that has maximum correlation over all the computed chroma vectors is taken as the most likely key. An evaluation with 237 CD recordings of classical piano sonatas indicated a classification accuracy of 75.1%. By considering the exact, relative, dominant, sub-dominant and parallel keys as similar keys, the accuracy rises to 94.1%.

The algorithm is quite basic and, whilst it has the benefit of being fast, it suffers from its reliance on the FFT: although useful for measuring the frequency spectrum of a stationary signal, such as a chord held constantly, the FFT is not well suited to extracting the frequencies of a non-stationary signal whose dominant frequencies change rapidly, as in any normal song. The method of scoring the most likely key could also be improved by weighting the winning key according to how close it was to the next most likely key. This way, if two or more keys correlate highly for a single chromagram, the resulting winner is penalised with a low weighting, as it only just correlated higher than some other key. If one key dominates the correlation it is rewarded with a larger weighting and is therefore more likely to be the maximum key overall.
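A minimal Python sketch of the chroma-plus-correlation idea described above. It processes a single frame only; windowing over the whole song, the exact spectral-region weighting and Pauws' perceptual modelling are omitted, and the Krumhansl profile values are the commonly quoted published figures:

    # Sketch: fold one FFT magnitude spectrum into a 12-bin chroma vector and
    # correlate it with 24 rotated Krumhansl key profiles.
    import numpy as np

    SR = 11025      # sample rate after down-sampling (~10 kHz)
    N_FFT = 1024

    MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
    MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

    def chroma(frame):
        """Normalised 12-bin chroma vector of one audio frame."""
        spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), N_FFT))
        freqs = np.fft.rfftfreq(N_FFT, d=1.0 / SR)
        bins = np.zeros(12)
        for f, magnitude in zip(freqs[1:], spectrum[1:]):    # skip the DC bin
            midi_note = 69 + 12 * np.log2(f / 440.0)         # frequency -> MIDI note number
            bins[int(round(midi_note)) % 12] += magnitude    # fold octaves into pitch classes
        return bins / (bins.sum() + 1e-12)

    def most_likely_key(chroma_vector):
        names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
        scores = {}
        for i in range(12):
            scores[names[i] + ' major'] = np.corrcoef(chroma_vector, np.roll(MAJOR, i))[0, 1]
            scores[names[i] + ' minor'] = np.corrcoef(chroma_vector, np.roll(MINOR, i))[0, 1]
        return max(scores, key=scores.get)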


6.3.2 Chord Segmentation and Recognition using EM-Trained Hidden Markov Models

Sheh et al.(9) describe a method of recognising the major chords in a piece of music using pitch class profiles and Hidden Markov Models (HMMs) trained using the Expectation Maximisation (EM) algorithm. The pitch class profile (PCP) was first proposed by Fujishima(10) and is the same idea as the previous algorithm's ‘chroma vector’: the Fourier transform intensities are mapped to the twelve semitone pitch classes corresponding to musical notes. First the input signal is transformed to the frequency domain using the short-time Fourier transform (STFT). The STFT has the advantage over a single FFT of being able to track frequency changes over time rather than simply taking a snapshot of the frequencies in one time span, so it is better suited to frequency analysis of non-stationary signals. The STFT is mapped to the Pitch Class Profile (PCP) features, which traditionally consist of 12-dimensional vectors, with each dimension corresponding to the intensity of a semitone class (chroma). The procedure collapses pure tones of the same pitch class, independent of octave, into the same PCP bin. The PCP vectors are normalised to show the intensities of each pitch class relative to one another. Pre-determined PCP vectors are used as features to train an HMM with one state for each chord distinguished by the system. The EM algorithm calculates the mean and variance vector values and the transition probabilities for each chord HMM. The Viterbi algorithm is then used to either forcibly align or recognise these labels. The PCP vector corresponding to the chord which aligned itself most closely with the PCP vectors computed from the song is chosen as the most likely key.

This algorithm performs well, but attempting to code a hidden Markov model and the algorithms required to train it would be too time-consuming for this project. Comparable results can be achieved using much simpler template matching techniques. The algorithm is also computationally expensive, and as such only a short segment of a song is used to detect the key. One major advantage of this work is its use of the STFT to analyse the frequencies and map them to the PCP/chroma vector. This is much more accurate than a single FFT, and this part of the algorithm can be reused as part of a different key detection algorithm.

Figure 9: Flow diagram of the algorithm from Sheh et al(9)


6.3.3 Automatic Chord Recognition from Audio Using Enhanced Pitch Class Profile

This algorithm(11) sets out to improve on other key detection algorithms which use a chromagram/PCP as the feature vector to identify chords. Some use a template matching algorithm to correlate the PCP with pre-determined PCP vectors for the 24 chords; others use a probabilistic model such as HMMs. The problem with the PCP in the template matching approach is that the templates the PCP is matched against are binary: since a C major triad comprises three notes at C (root), E (third) and G (fifth), the template for a C major triad is [1,0,0,0,1,0,0,1,0,0,0,0], where the chord labelling is [C, C#, D, D#, E, F, F#, G, G#, A, A#, B]. However, the PCP from real-world recordings will never be exactly binary, because acoustic instruments produce overtones as well as fundamental tones. The PCP/chroma vector of a C major triad played on a piano is shown in Figure 10.

Figure 10: PCP vector of a C major triad

In Figure 10, even though the strongest peaks are found at C, E, and G, we can see that the chroma vector has nonzero intensities at all 12 pitch classes due to the overtones generated by the chord tones. This noisy chroma vector may confuse recognition systems with binary-type templates, especially if two chords share one or more notes, such as a major triad and its parallel or relative minor: a C major triad and a C minor triad share two notes, C and G, and a C major triad and an A minor triad have the notes C and E in common. Figure 11 shows an A minor triad and its correlation with the 24 keys. The A minor triad correlates highest with a C major chord and is identified incorrectly as C major. This is because the intensity of the G in the A minor triad, which is not a chord tone, is greater than that of the A, which is a chord tone.


Figure 11: Pitch Class Profile of A minor triad

To overcome this problem, Lee suggests taking the harmonic product spectrum (HPS) of the frequency spectrum before computing the Enhanced Pitch Class Profile (EPCP) from the HPS. The algorithm for computing the HPS is very simple and is based on the harmonicity of the signal. Since most acoustic instruments and the human voice produce sounds with harmonics at integer multiples of the fundamental frequency, decimating the original magnitude spectrum by an integer factor still yields a peak at the fundamental frequency. This should in theory eliminate the overtones and amplify the pure tones, leading to an EPCP that is much closer to binary. The following figure demonstrates the HPS and how it amplifies the main peak of the FFT whilst reducing the number of overtone frequencies.

Figure 12: Harmonic Product Spectrum
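A minimal Python sketch of the harmonic product spectrum: the magnitude spectrum is multiplied element-wise by decimated copies of itself, so only frequencies whose integer multiples also carry energy survive (the number of harmonics used here is an assumption, not Lee's exact choice):

    import numpy as np

    def harmonic_product_spectrum(magnitude, harmonics=4):
        """Multiply a one-sided FFT magnitude spectrum by decimated copies of itself.

        Peaks at overtone frequencies are suppressed while fundamentals are reinforced."""
        hps = magnitude.copy()
        for r in range(2, harmonics + 1):
            decimated = magnitude[::r]            # every r-th bin = spectrum compressed by factor r
            hps[:len(decimated)] *= decimated
        return hps[:len(magnitude) // harmonics]  # keep only bins that received every factor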


In Figure 13 below, the EPCP vector from the above example (A minor) and its correlation with the 24 major/minor triad templates are shown. The conventional PCP vector and its correlation are overlaid as dotted lines for comparison. We can clearly see from the figure that non-chord tones are suppressed enough to emphasise only the chord tones, which are A, C, and E in this example. This removes the ambiguity with the relative major triad, and the resulting correlation identifies the chord correctly as A minor.

Figure 13: Comparison of PCP and EPCP vectors from Lee(11)

This technique seems useful and could be used to optimise any key detection algorithm which uses PCP/chroma vectors. I am a little concerned that, when used with dance music, which has a lot of intense low-frequency content such as the bass line and bass drum that may not be in key, these frequencies will be amplified instead of the melodic frequencies I intend to emphasise, skewing the result towards the key of the bass line rather than the key of the melody.

6.3.4 A Robust Predominant-F0 Estimation Method for Real-Time Detection of Melody and Bass Lines in CD Recordings

Goto(12) describes a method, called PreFEst (Predominant-F0 Estimation Method), which can detect the melody and bass lines in complex real-world audio signals. F0 is shorthand notation for the fundamental frequency. PreFEst obtains traces of the fundamental melody and bass lines under the following assumptions:

• The melody and bass sounds have a harmonic structure. We do not care about the existence of the F0's own frequency component.

• The melody line has the most predominant harmonic structure in the middle and high frequency regions, and the bass line has the most predominant harmonic structure in a low frequency region.

• The melody and bass lines tend to have temporally continuous traces.

The diagram below shows an overview of PreFEst. It first calculates instantaneous frequencies using multi-rate signal processing techniques and extracts candidate frequency components on the basis of an instantaneous-frequency-related measure. PreFEst essentially estimates the F0 which is supported by predominant harmonic frequency components within an intentionally limited frequency range; using two band-pass filters, it limits the frequency range to the middle and high regions for the melody line and the low region for the bass line. It then forms a probability density function (PDF) of the F0, which represents the relative dominance of every possible harmonic structure. To form this F0 PDF, it regards each set of filtered frequency components as a weighted mixture of all possible harmonic-structure tone models and estimates their weights, which can be interpreted as the F0's PDF: the maximum-weight model corresponds to the most predominant harmonic structure. This estimation is carried out using the Expectation Maximisation algorithm, an iterative technique for computing maximum likelihood estimates from incomplete data. Finally, multiple agents track the temporal trajectories of salient peaks in the F0's PDF, and the output F0 is determined on the basis of the most dominant and stable trajectory.

Figure 14: Flow diagram of Goto’s algorithm(12)
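PreFEst itself is well beyond what this project needs, but the band-limiting step is easy to illustrate. A minimal Python sketch of splitting a signal into a bass range and a melody range before estimating the predominant F0 in each (the cut-off frequencies are my own illustrative choices, not Goto's):

    import numpy as np
    from scipy import signal

    def split_bands(samples: np.ndarray, sample_rate: int):
        """Band-limit the signal before predominant-F0 estimation: a low band for
        the bass line and a middle/high band for the melody line."""
        bass_sos = signal.butter(4, [40, 250], btype='bandpass', fs=sample_rate, output='sos')
        melody_sos = signal.butter(4, [250, 4000], btype='bandpass', fs=sample_rate, output='sos')
        return signal.sosfiltfilt(bass_sos, samples), signal.sosfiltfilt(melody_sos, samples)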


6.3.5 A computational model of harmonic chord recognition

Walsh et al.(13) investigate the perception of harmonic chords by peripheral auditory processes and auditory grouping. The frequency selectivity of the auditory system is modelled using a bank of overlapping band-pass filters and a model of inner hair cell dynamics. By computing intervals between different classes of pitch, the model achieves considerable success in recognising major, minor, dominant seventh, diminished and augmented chords. Part of the algorithm relies on an existing computational model of mechanical-to-neural transduction based on the hair cell / auditory-nerve fibre synapse. The output excitation function in response to an acoustic stimulus is a stream of spike events precisely located in time. The model describes the production, movement and dissipation of a transmitter substance in the region of the hair cell / auditory-nerve fibre synapse. It is probably not feasible to implement the algorithm described in this paper; the aim here is just to demonstrate the wide variety of methods and theories that researchers are trying to apply to the problem of extracting the key from polyphonic audio signals.

6.4 Beat Detection Algorithms

There are many different approaches to detecting the beats in a song. An overview of each algorithm is given below, along with a brief discussion of its accuracy and applicability. The reader is encouraged to read each of the papers in full for more detail; the aim here is to give a brief introduction to the many ways in which beat detection can be performed. All diagrams in this section come from their respective papers.

6.4.1 Tempo and Beat Analysis of Acoustic Musical Signals

Scheirer's paper(14) is one of the most frequently referenced papers on beat detection. The paper details the implementation of a fast, close-to-real-time beat detection system for music of many genres. The algorithm works by first dividing the music into six different frequency bands using a filterbank. This filterbank can be constructed by combining a low-pass and a high-pass filter with many band-pass filters in between. The envelope of each frequency band is then calculated; the envelope is a highly smoothed representation of the positive values in a waveform. The differentials of each of the six envelopes are calculated: they are highest where the slopes in the envelope are steepest. The peaks of the differentials would give a good estimate of the beats in the music, but the algorithm in the paper uses a different method. Each differential is passed to a bank of comb filter resonators. In each bank of resonators, one of the comb filters will phase-lock with the signal, where the resonant frequency of the filter matches the periodic modulation of the differential. The outputs of all of the comb filters are examined to see which ones have phase-locked, and this information is tabulated for each frequency band.


Summing this data across the frequency bands gives a tempo (BPM) estimate for the music. Referring back to the peak points in the comb filters allows the exact occurrence of each beat to be determined. The beat detection strategy used in this paper has demonstrated high accuracy and has been implemented many times by different parties. It can cope with a wide variety of music genres and fits the requirements of this project, and the speed of the algorithm may also be beneficial. However, the algorithm is very complex and may be time-consuming to implement. By working with music as a stream, it fails to take advantage of the ability to analyse all of the music as one element. This means that while the accuracy may be good enough to tap along with users in real time, it may not be able to determine the BPM to a sufficient accuracy for this project.

Figure 15: Overview of Scheirer's Algorithm(14)
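A minimal Python sketch of the comb-filter resonance idea, applied to a single onset/differential signal rather than six separate bands: each candidate tempo corresponds to a feedback comb filter delayed by one beat period, and the filter whose output carries the most energy is the one that has 'phase-locked' with the signal. This is a heavily simplified illustration, not Scheirer's full algorithm:

    import numpy as np

    def comb_filter_tempo(onset_envelope, frame_rate, bpm_range=(60, 180), alpha=0.9):
        """Estimate tempo by driving one feedback comb filter per candidate BPM.

        onset_envelope: rectified differential of the signal envelope (numpy array).
        frame_rate: envelope samples per second."""
        best_bpm, best_energy = 0.0, -np.inf
        for bpm in range(bpm_range[0], bpm_range[1] + 1):
            delay = int(round(frame_rate * 60.0 / bpm))   # beat period in envelope samples
            y = np.zeros(len(onset_envelope))
            for n in range(len(onset_envelope)):
                y[n] = onset_envelope[n] + (alpha * y[n - delay] if n >= delay else 0.0)
            energy = float(np.sum(y ** 2))
            if energy > best_energy:
                best_bpm, best_energy = float(bpm), energy
        return best_bpm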


6.4.2 Analysis of the Meter of Acoustic Musical Signals

Klapuri et al.(15) describe a method which analyses the meter of acoustic musical signals at the tactus, tatum, and measure pulse levels illustrated in Figure 16. The target signals are not limited to any particular music type; all the main Western genres, including classical music, are represented in the validation database.

Figure 16: Waveform showing Tatum, Tactus and Measure

An overview of the method is shown below. For the time-frequency analysis part, a technique is proposed which aims at measuring the degree of accentuation in a music signal.

Figure 17: Overview of algorithm from Klapuri et al (15)

Feature extraction for estimating the pulse periods and phases is performed using comb filter resonators very similar to those used by Scheirer in the above paper. This is followed by a probabilistic model in which the period-lengths of the tactus, tatum, and measure pulses are jointly estimated and the temporal continuity of the estimates is modelled. At each time instant, the periods of the pulses are estimated first and act as inputs to the phase model. The probabilistic models encode prior musical knowledge and lead to a more reliable and temporally stable meter tracking. An important aspect of this algorithm lies in the feature list creation block: the differentials of the loudness in 36 frequency sub-bands are combined into 4 ‘accent bands’, measuring the ‘degree of musical accentuation as a function of time’. The goal of this procedure is to account for subtle energy changes that might occur in narrow frequency sub-bands (e.g. harmonic or melodic changes) as well as wide-band energy changes (e.g. drum occurrences). The algorithm presented in this paper seems to produce good results across a wide variety of musical genres. However, due to the complexity of the many different parts which make up the algorithm, it is somewhat beyond the scope of the simple beat detection which this project aims to achieve. Given that the system is designed only to work with music containing a prominent, distinguishable beat, implementing this algorithm would be over-engineering and would use up valuable time ensuring that everything was working properly.


6.4.3 Audio Analysis using the Discrete Wavelet Transform

Tzanetakis et al(16) describe an algorithm based on the DWT that is capable of automatically extracting beat information from real-world musical signals with arbitrary timbral and polyphonic complexity. The beat detection algorithm is based on detecting the most salient periodicities of the signal. The signal is first decomposed into a number of octave frequency bands using the DWT. After that, the time-domain amplitude envelope of each band is extracted separately. This is achieved by low-pass filtering each band, applying full-wave rectification and down-sampling. The envelopes of each band are then summed together and an autocorrelation function is computed. The peaks of the autocorrelation function correspond to the various periodicities of the signal's envelope. The first five peaks of the autocorrelation function are detected and their corresponding periodicities in BPM are calculated and added to a histogram. This process is repeated by iterating over the signal. The periodicity corresponding to the most prominent peak of the final histogram is the estimated tempo in BPM of the audio file. A block diagram of the beat detection algorithm is shown below.

Figure 18: Block diagram of algorithm from Tzanetakis et al (16) Key: WT: Wavelet Transform, LPF: Low Pass Filter, FWR: Full wave rectification, ↓: Downsampling, Norm: Normalisation, ACR: Autocorrelation, PKP: Peak Picking, Hist: Histogram

To evaluate the algorithm's performance it was compared to the BPM detected manually by tapping the mouse along with the music; the average time difference between the taps was used as the manual beat estimate. Twenty files containing a variety of music styles were used to evaluate the algorithm (5 HipHop, 3 Rock, 6 Jazz, 1 Blues, 3 Classical, 2 Ethnic). For most of the files (13/20) the prominent beat was detected clearly, i.e. the beat corresponded to the highest peak of the histogram. For 5/20 files the beat was detected as a histogram peak but not the highest, and for 2/20 no peak corresponding to the beat was found. In the pieces where the beat was not detected there was no dominant periodicity (these pieces were either classical music or jazz); in such cases humans rely on higher-level information like grouping, melody and harmonic progression to perceive the primary beat from the interplay of multiple periodicities. This algorithm differs from the others in that it uses the wavelet transform, rather than a Fourier-based filterbank, to decompose the incoming signal into separate frequency bands. Whether this improves the beat detection is debatable, and the DWT is still a relatively new technique in this area. The test results show that the algorithm performs well on music containing a constant beat, which is fine for this project; however, the algorithm may also be too time consuming to implement.
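The periodicity analysis at the heart of this method can be illustrated with a much-simplified C# sketch that autocorrelates a summed amplitude envelope and converts the strongest lag in a 60-200 BPM range into a tempo (the envelope input, its sample rate and the BPM bounds are assumptions made for the example, not values from the paper):

    static class EnvelopeAutocorrelation
    {
        // 'envelope' is a smoothed, downsampled amplitude envelope sampled at
        // 'envelopeRate' Hz; the lag with the strongest autocorrelation is
        // interpreted as the beat period.
        static double EstimateBpm(double[] envelope, double envelopeRate)
        {
            int minLag = (int)(envelopeRate * 60.0 / 200.0); // fastest tempo considered: 200 BPM
            int maxLag = (int)(envelopeRate * 60.0 / 60.0);  // slowest tempo considered: 60 BPM
            if (minLag < 1) minLag = 1;
            int bestLag = minLag;
            double bestScore = double.MinValue;
            for (int lag = minLag; lag <= maxLag && lag < envelope.Length; lag++)
            {
                double score = 0.0;
                for (int i = 0; i + lag < envelope.Length; i++)
                    score += envelope[i] * envelope[i + lag];
                if (score > bestScore) { bestScore = score; bestLag = lag; }
            }
            return 60.0 * envelopeRate / bestLag; // lag (in envelope samples) -> BPM
        }
    }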

6.4.4 Statistical streaming beat detection

The human listening system determines the rhythm of music by detecting a pseudo-periodical succession of beats. The signal intercepted by the ear carries a certain amount of energy, which is converted into an electrical signal that the brain interprets. Obviously, the more energy the sound transports, the louder the sound will seem, but a sound will be heard as a beat only if its energy is largely superior to the sound's energy history. If the ear intercepts a monotonous sound with occasional large energy peaks it will detect beats; if it hears a continuous loud sound it will not perceive any beats. This algorithm therefore assumes that beats are large variations in sound energy. Patin(17) presents a model in which beats are detected by computing the average sound energy of the signal and comparing it to the instant sound energy. The instant energy is the energy contained in 1024 samples; at 44.1kHz this is roughly two hundredths of a second, which is pretty much 'instant'. The average energy should not be computed on the entire song, as some songs have both intense passages and calmer parts; the instant energy must be compared to the nearby average energy. For example, if a song has an intense ending, the energy contained in that ending should not influence the beat detection at the beginning. We detect a beat only when the instant energy is superior to a local energy average. Thus the average energy is computed over, say, 44032 samples, which is about 1 second; in other words, we assume that the hearing system only remembers about 1 second of the song when deciding whether it has heard a beat. This 1 second window (44032 samples) is what we could call the human ear's energy persistence model; it is a compromise between being too long, and taking into account energies too far away, and being too short, and becoming too close to the instant energy to make a valuable comparison.


6.5 Similar Projects / Software

6.5.1 Traktor DJ Studio by Native Instruments

Traktor DJ Studio(18) is state-of-the-art proprietary software enabling DJs to mix together up to four different tracks at the same time. Traktor's beat detection system enables two tracks to be automatically beat-synchronised and manages to detect the beats well in most tracks with a prominent, regular beat. However, it does not produce good results when used with music of other genres such as classical and rock. Traktor offers a visualisation of the playing track and highlights the detected beats with visual beat markers. It has support for time-stretching of tracks and also basic tempo adjustment. Extra features of Traktor include a whole host of real-time effects, such as reverb, delay and flange, plus a selection of low-, mid- and high-pass filters. A file browser displays information about files so that they can easily be dragged and dropped onto the decks, and the program allows you to record and save your own mix as it happens, capturing any effects applied. Traktor is commercial software aimed at the professional DJ; however, it is missing a couple of features which this project aims to include. Traktor does not have any key detection algorithm capable of extracting the key from a digital audio file. Pitch shifting, which allows the pitch of a track to be adjusted without altering its tempo, is an aim of this project but is also not present in Traktor.

Figure 19: Traktor DJ Studio


6.5.2 Rapid Evolution 2

Rapid Evolution 2(19) is free software which allows the user to import their music files and have them analysed in order to detect the BPM and the key of each track. Based on the BPM and key extracted from an audio file, the system indicates which other songs would go well with the analysed song to produce a good harmonic mix. A unique element of Rapid Evolution is a virtual piano which can play the chord of the key detected in a song; this can be used to judge qualitatively how accurate the key detection was, and would be a valuable feature in any program aimed at harmonic mixing. The program allows simultaneous playback of two files and has time-stretching functionality. Although this product strives to generate and display a lot of useful information for the harmonic mixing DJ, the graphical user interface is not the most intuitive. For example, it is not obvious what the difference is between some of the buttons, such as 'import' and 'add song(s)', and many of the same controls and pieces of information are displayed in more than one area, making inefficient use of screen real estate and confusing the user. The program does not have an automatic beat matching algorithm, although this is planned for a future release.

Figure 20: Rapid Evolution 2


6.5.3 Mixed in Key

Mixed in Key(20) is a small commercial application whose sole purpose is to analyse files, extract the key and BPM from them and store the information in each file's metadata. Mixed in Key uses Camelot's easymix system to display the key as well as the formal musical notation. The software licenses a key-detection algorithm named tONaRT from zplane development(21) to detect the key from the audio file. The application is geared towards batch processing of several files at once. The software does not provide any way of playing the song and as such it does not support features such as pitch-shifting and time-stretching.

Figure 21: Mixed in Key


6.5.4 MixMeister

MixMeister(22) is commercial DJ mixing software which allows users to 'design' a mix rather than create one in real time. Its timeline function allows users to visualise the overlapping of two (or more) songs which they want to mix, enabling them to refine the mix so that, for example, the beats are perfectly aligned. It is much easier to create a perfect mix this way, as you have full control over the tempos of the tracks and over when each one should start and finish. The downside is that you would not be able to use MixMeister in a live situation, as it takes trial and error to align the songs perfectly. MixMeister is therefore aimed at people who want to create mixes for later use, such as their own mix CD. MixMeister has seemingly accurate BPM and key detection, making use of the Camelot notation to display the detected keys as Camelot keycodes. On the whole it is a solid application which offers a unique approach to DJ mixing, one which would not be possible without the advancement of computers in music analysis.

Figure 22: MixMeister


7. Design

This section of the report gives a brief description of the design and architecture of the project, before explaining the reasons behind the design of the algorithms.

7.1 System Architecture

The system was designed with the user in mind, and as such it is based around the need for a responsive, intuitive user interface. This meant keeping the graphical user interface (GUI) separate from the sound processing and from the main algorithms. The result is a modular architecture which can be broken down into three main areas: GUI, Core and Algorithms. The GUI comprises the classes which the user interacts with, and which the system uses to feed back information to the user about the state of the system. The Core contains the functions which process the audio files when called upon by the user interacting with the interface. The algorithms are separated from the core logic as they apply specific routines to an audio file; when running, these routines should not hamper the smooth running of the program and should work in the background independently of the core logic. For a more in-depth discussion of these areas see the implementation section.

[Figure 23 consists of three blocks: GUI, Core Logic, and Algorithms (Beat Detection and Key Detection).]

Figure 23: Overview of System Architecture

The algorithms are separated into the key detection and beat detection algorithm. The rationale behind the design decisions for each algorithm is explained below.

7.2 Key Detection Algorithm Design Rationale

Any key detection algorithm inevitably involves conversion of the signal from the time domain into the frequency domain, using the Fourier, Constant Q or wavelet transforms. Initially, I planned to write the entire algorithm in C# using the FFTW(23) (Fastest Fourier Transform in the West) library, which, as the name suggests, claims to perform the FFT in the shortest amount of time. However, I was getting peculiar frequency spectra which showed high intensities at very high frequencies (i.e. greater than 20kHz). Additionally, I learned that the FFT was not the best transform to use for non-stationary signals and so started looking for an efficient way of performing one of the other transforms better suited to non-stationary signals.

The transform I decided to use to convert the signal from the time to the frequency domain was the short time Fourier transform (STFT), which is essentially the FFT applied to small sections, or windows, of the signal at a time. Eventually I chose to use Matlab to develop the majority of the key detection algorithm. Matlab is a matrix-based programming language and has excellent support for digital signal processing; it uses the FFTW library to perform Fourier transformations. Recent versions of Matlab include the 'Builder for .NET' tool, which conveniently converts Matlab code into a C#-compatible dynamic link library that can be called from the rest of my project in the C# language. To determine the key, the output from the STFT is mapped to a chroma vector. There are then two main techniques for matching the chroma vector to a key. Pattern matching techniques correlate the chroma vector against a series of pre-programmed key templates and record the highest-correlating key. Probabilistic models involve developing and training a hidden Markov model, and recording the template which best aligns itself with the chroma vector. Pattern matching techniques were chosen for the design of the algorithm because they have been shown to give similar results to probabilistic methods without the extra development time needed to program and train a HMM. The speed constraints of the key detection algorithm are not as tight as for the beat detection algorithm: once a song has had its key detected, that information is stored in the song's ID3 tag and can in future be read by the program. Even so, it is still desirable for the process to take the shortest amount of time possible.

7.3 Beat Detection Algorithm Design Rationale

The beat detection algorithm is based on the method set out by Patin(17) in 'Statistical streaming beat detection'. This algorithm iteratively compares the instant energy of a piece of music with the average energy calculated over the past second; a beat is detected if the instant energy is significantly greater than the average energy. The concept is similar to the human hearing system in that when we listen to music, we only remember the past second or so of it. The algorithm is designed primarily to be used with dance music. It is assumed that this type of music will have a consistent tempo throughout, which means the algorithm is unlikely to give good results when applied to music without a consistent tempo. It is also assumed that beats in this type of music are produced by a bass instrument, such as a bass drum, with low frequency. Because the algorithm does not convert the signal to the frequency domain and works entirely in the time domain, the energies are based on the amplitude over the whole frequency spectrum. This means that a significant sound variation in the high frequencies could be detected as a beat just as readily as one in the low frequencies; applying a low pass filter to the signal should reduce the impact that high frequencies have on the detection of beats. The required accuracy for the beat detection in this project is within +/- 1.5% of the actual BPM. Bearing in mind that the majority of time was devoted to developing the key detection algorithm to a high standard, the method chosen was dictated by the time constraints of the project. Nevertheless, the algorithm claims to give good results with songs containing a dominant, consistent beat, so it is well suited to this project, which is intended for use with dance music. Patin's method does not explicitly suggest a way of calculating a BPM value from the beats detected, and finding the BPM is not as simple as counting the number of beats detected in a minute. A comb filter could be used: a special type of filter that resonates at a certain frequency when a signal is passed through it, with that resonant frequency then converted into a BPM value.

Due to time constraints a more basic method was used to calculate the BPM: the average interval between similarly spaced beats is found and converted to a BPM value. The algorithm for detecting beats had to be accurate and fast at the same time, because each time a file is loaded into a deck the program will detect its beats. Increasing the speed of the beat detection algorithm usually implies a trade-off in its accuracy, so it was important to strike the right balance between speed and accuracy.


8. Implementation

This section describes in detail the actual implementation of the system and algorithms, plus other interesting implementation areas of the project. This is not a complete account of all areas of the project; many small details are omitted and can be assumed to have been successfully implemented. This approach was taken in order to increase the readability of this report.

8.1 System Implementation

The system was implemented in the C# language with the FMOD sound processing library(24) in mind. FMOD is an advanced, platform-independent front end to Microsoft's DirectShow and DirectX APIs. It makes it much easier to develop a multimedia-based program than using the APIs directly. It is aimed at the games industry and is used by many high-profile game developers; FMOD is free for non-commercial use. Figure 24 illustrates a simplified overview of the system showing the main classes and their relationships.

Figure 24: System Overview


FMOD defines three main types which are used throughout the program:

• The System object initialises the FMOD engine, handles the creation and playing of Sound objects, and is used to set global parameters for the FMOD engine, such as changing the size and type of buffers used by FMOD. There should only be one System object initialised throughout the whole program, for efficiency reasons, and I decided to keep this object in the Core class and let other objects access it if and when they need it. This is the intuition behind the centralised design.

• The Sound object holds information on the type of audio file loaded, i.e. its length in samples, bytes and milliseconds, the number of channels (mono or stereo), and the bit-rate. It also reads the audio data in the sound file into a byte buffer, enabling custom operations and analysis to be performed on the raw audio data.

• The Channel object handles the parameters with which the sound is played, such as its volume, playback rate (tempo), pitch and current position.

For each deck, the Core class contains a corresponding Sound and Channel object. The GUI classes fire off events when certain actions are performed on them; these events are handled by the Core class, which calls the appropriate FMOD function on the Sound and Channel objects corresponding to that deck. For example, when the Play button is pressed on deck A, an event is fired and sent to the Core class; the Core can tell from the message passed that deck A fired the event, so it knows to call FMOD's play function on the Sound object corresponding to deck A (a minimal sketch of this event flow is given after the list below). The GUI is made up of the following classes:

• Deck – encapsulates the behaviour of a turntable, i.e. loading, playing and pausing of sounds as well as controlling pitch and tempo. Each deck has a unique id which corresponds to the id of the relevant FMOD Sound and Channel objects.

• WaveForm – the Deck class contains a WaveForm class, which presents a zoomed-in, animated visualisation of the currently loaded track. This visualisation contains beat markers which mark the precise location of where the beat detection algorithm detected a beat. The visualisation can be dragged forwards and backwards, mimicking the bi-directional rotation of a vinyl on a turntable. The waveform also displays a representation of the whole track, enabling the user to quickly skip to a certain position in the track.

• Mixer – blends the output from the two currently playing decks using the crossfader. Also adds functionality to filter out high or low frequencies for each track.

• MusicBrowser – displays the music files supported by the program on the user's computer, and their corresponding metadata, such as the BPM and key that was detected by the program.
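The event flow between a deck and the core can be illustrated with a minimal C# sketch (the class and member names below are illustrative only and are not the project's actual types):

    using System;

    class DeckSketch
    {
        public int Id { get; }
        public event Action<int> PlayRequested;
        public DeckSketch(int id) { Id = id; }
        // Called by the GUI when the user presses Play on this deck.
        public void OnPlayButtonPressed() => PlayRequested?.Invoke(Id);
    }

    class CoreSketch
    {
        public CoreSketch(DeckSketch deck) { deck.PlayRequested += HandlePlay; }
        private void HandlePlay(int deckId)
        {
            // Here the real Core would call the sound engine's play function on the
            // Sound/Channel pair that belongs to deckId.
            Console.WriteLine($"Play requested on deck {deckId}");
        }
    }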

Both algorithms run asynchronously, in separate threads from the GUI and Core classes. This means that they run in the background and do not block the GUI thread. This enables the user, for example, to be playing a track in one deck while loading a track into the other deck, whilst also detecting the key of another track. Obviously, the more activities the user decides to perform simultaneously, the slower the performance of the system, as the different threads all compete for CPU time. The structure of both algorithms is the same. The 'worker' class sets off the main routine asynchronously and receives progress updates from the main routine, which allow it to update the relevant progress bars. The worker is notified when the main routine has completed, causing the 'results' class to return the relevant results from the algorithm. For beat detection, this is more than just the estimated BPM result. Since the waveform generation happens in the same loop as the beat detection, the results class also returns arrays containing the values to be drawn onto the waveform, as well as an array of the beat positions so that beat markers can be placed in the waveforms at the appropriate times.

8.2 Detecting the Key

The audio file is broken down into non-overlapping sections of approximately 5.5 seconds, and the flow diagram in Figure 25 shows the process applied to each section of the song before a key for the whole song is chosen.

Figure 25: Key Detection Algorithm Flow Chart

In order to save computation time, my approach starts by converting the audio section to mono and downsampling to 11025Hz. Converting to mono involves taking the average of every two consecutive samples in the interleaved signal, reducing the number of samples by a factor of two. Downsampling further reduces the number of samples in the audio stream whilst still conveying enough information to perform accurate key detection. A side-effect of downsampling is that frequency content above 5512.5Hz is not considered, due to the Nyquist sampling theorem. However, frequencies above this limit contribute little to the harmonic content of a song; the note D# in octave 8, for example, already has a fundamental frequency of 4978.03Hz. After the pre-processing stage, the signal is passed to Matlab, which performs an STFT of the signal using a Hamming window of length 8192 samples. This is approximately 0.74s, which is a relatively long analysis window in terms of musical harmony; thus, to improve time resolution, frames are overlapped by 1/8th of a window length, giving a time resolution of 0.093s per frame. The STFT returns a spectrogram, a time-frequency plot showing the intensities of frequencies at different time slices throughout the section of the song. Figure 26 shows a spectrogram of a C major chord played on the piano; the most intense frequencies lie in roughly the 250 – 1500Hz range, and the intensities gradually decay as time increases.
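A minimal C# sketch of this pre-processing step, assuming interleaved stereo input and plain decimation (a real implementation would low-pass filter before decimating to avoid aliasing):

    static class PreProcessSketch
    {
        // Average each left/right pair of an interleaved stereo buffer into one mono sample.
        static float[] ToMono(float[] interleavedStereo)
        {
            float[] mono = new float[interleavedStereo.Length / 2];
            for (int i = 0; i < mono.Length; i++)
                mono[i] = 0.5f * (interleavedStereo[2 * i] + interleavedStereo[2 * i + 1]);
            return mono;
        }

        // Keep every 'factor'-th sample, e.g. factor = 4 takes 44100 Hz down to 11025 Hz.
        static float[] Downsample(float[] samples, int factor)
        {
            float[] result = new float[samples.Length / factor];
            for (int i = 0; i < result.Length; i++)
                result[i] = samples[i * factor];
            return result;
        }
    }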


Figure 26: Output from the STFT

The next stage is to scan through the output from the STFT and map the frequencies in Hz to pitch classes, or musical notes. The result is a chroma vector, also called a Pitch Class Profile (PCP) or chromagram, which traditionally consists of a 12-dimensional vector with each dimension corresponding to the intensity of a semitone class (chroma). The procedure collapses pure tones of the same pitch class, independent of octave, into the same chroma vector bin; for complex tones, the harmonics also fall into particular, related bins. Frequency-to-pitch mapping is achieved using the logarithmic characteristic of the equal temperament scale. STFT bins k are mapped to chroma vector bins p according to:

Equation 1:  p(k) = \mathrm{round}( 12 \log_2( (k f_s / N) / f_{ref} ) ) \bmod 12

where f_{ref} is the reference frequency corresponding to the first index of the chroma vector (p = 0); I chose f_{ref} = 440Hz, the frequency of pitch class A. f_s is the sampling rate (11025Hz) and N is the size of the FFT in samples (8192). For each time slice, we calculate the value of each chroma vector element by summing the magnitudes of all frequency bins that map to that pitch class, i.e. for p = 0, 1, ..., 11:

Equation 2:  CV(p) = \sum_{k : p(k) = p} |X(k)|
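Equations 1 and 2 could be implemented along the following lines (a sketch only; the bass cut-off parameter anticipates section 9.1.1, and skipping the DC bin is an assumption of the sketch):

    using System;

    static class ChromaSketch
    {
        // magnitudes[k] holds |X(k)| for one STFT frame; fs = 11025, fftSize = 8192
        // and fRef = 440 follow the values quoted in the text.
        static double[] ChromaVector(double[] magnitudes, double fs, int fftSize,
                                     double fRef, double bassCutoffHz)
        {
            double[] cv = new double[12];
            for (int k = 1; k < magnitudes.Length; k++)
            {
                double freq = k * fs / fftSize;
                if (freq < bassCutoffHz) continue;                          // ignore bass content
                int p = (int)Math.Round(12.0 * Math.Log(freq / fRef, 2.0)); // Equation 1
                p = ((p % 12) + 12) % 12;                                   // wrap into bins 0..11
                cv[p] += magnitudes[k];                                     // Equation 2
            }
            return cv;
        }
    }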

Once we have our normalised chroma vector, we need to match it against pre-defined templates representing the 24 possible keys (12 major, 12 minor). These templates are also 12-dimensional, with each bin representing a pitch class, and they are binary, i.e. each bin is either 1 or 0. A C major chord consists of the notes C (root), E (third) and G (fifth); therefore the template for the key of C major is [0,0,0,1,0,0,0,1,0,0,1,0], where the labelling of the template is [A,A#,B,C,C#,D,D#,E,F,F#,G,G#]. A G major chord consists of the notes G, B and D, and so its template is [0,0,1,0,0,1,0,0,0,0,1,0]. As can be seen from these examples, every major-triad template is just a shifted version of another. The minor key templates are the same as the major keys but with the third shifted one position to the left: the template for a C minor chord (C, D#, G) is therefore [0,0,0,1,0,0,1,0,0,0,1,0], and the other minor keys are shifted versions of this template. Templates for augmented, diminished or 7th chords can be defined in a similar way, but we deal only with major and minor keys here, as the Camelot easymix system does not recognise modes other than these. Figure 27 shows the chroma vector of a C major chord played on the piano and its correlation with the 24 key templates.
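All 24 binary templates can be generated by rotating the two patterns above, as in this sketch (bin order [A, A#, B, C, C#, D, D#, E, F, F#, G, G#], matching the text):

    static class KeyTemplateSketch
    {
        static int[][] BuildTemplates()
        {
            int[] cMajor = { 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0 }; // C, E, G
            int[] cMinor = { 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0 }; // C, D#, G
            int[][] templates = new int[24][];
            for (int shift = 0; shift < 12; shift++)
            {
                templates[shift] = Rotate(cMajor, shift);       // 12 major keys
                templates[12 + shift] = Rotate(cMinor, shift);  // 12 minor keys
            }
            return templates;
        }

        // Shift a 12-bin pattern up by 'shift' semitones (e.g. C major -> G major for shift = 7).
        static int[] Rotate(int[] pattern, int shift)
        {
            int[] rotated = new int[12];
            for (int i = 0; i < 12; i++)
                rotated[(i + shift) % 12] = pattern[i];
            return rotated;
        }
    }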

Figure 27: Chroma Vector of C Major chord and its correlation with key templates

We now perform correlation of the computed chroma vector with each of the 24 key templates and get a correlation coefficient for each of the 24 keys. The correlation coefficient is calculated using:

Equation 3:  r = \sum_{m}\sum_{n}(A_{mn} - \bar{A})(B_{mn} - \bar{B}) \;/\; \sqrt{ \left(\sum_{m}\sum_{n}(A_{mn} - \bar{A})^{2}\right) \left(\sum_{m}\sum_{n}(B_{mn} - \bar{B})^{2}\right) }

where A and B are matrices of size m x n; in our case these are simply vectors of size 12. We assign a weighting to the key that has the highest correlation, equal to the difference between its correlation coefficient and the second-highest correlation coefficient. For the weighting to be fair we need to normalise the correlation coefficients so that the highest value becomes 1. The weighting penalises the highest-correlating key when the chroma vector also correlates closely with other keys and the difference between them is minute, meaning that the key could plausibly have been one of the other highly correlating keys; it rewards the highest key when its correlation coefficient is by far the highest value.
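For two 12-element vectors, Equation 3 reduces to the familiar Pearson correlation coefficient, which could be computed as in the following sketch (it assumes neither vector is constant, otherwise the denominator would be zero):

    using System;

    static class CorrelationSketch
    {
        static double Correlate(double[] a, double[] b)
        {
            double meanA = 0.0, meanB = 0.0;
            for (int i = 0; i < a.Length; i++) { meanA += a[i]; meanB += b[i]; }
            meanA /= a.Length;
            meanB /= b.Length;

            double numerator = 0.0, sumSqA = 0.0, sumSqB = 0.0;
            for (int i = 0; i < a.Length; i++)
            {
                double da = a[i] - meanA;
                double db = b[i] - meanB;
                numerator += da * db;
                sumSqA += da * da;
                sumSqB += db * db;
            }
            return numerator / Math.Sqrt(sumSqA * sumSqB);
        }
    }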


When we have reached the end of the song we will have several weightings, one for each 5.5 second segment of the song. To find the most likely key, we simply sum the weightings for each key and the key with the highest value at the end is selected as the most likely key. The detected key is then stored in the ID3 tag of the song so that in the future, this can just be read straight away without having to go through the whole process described above again. The ID3 Tag is written using the library Ultra ID3 Lib(25).

8.3 Detecting the Beats

The basic intuition behind the beat detection algorithm is to find sections of the music where the instant energy in the signal is greater than some scaling of the average energy of the signal over the previous second or so of music. The assumption made is that the instant energy in a signal will be much greater on the beat than between beats; this assumption is reasonable for songs with heavy down beats and little mid and high frequency 'noise'. The audio file is first split into manageable sections, simply because reading the whole file in as one big chunk of data requires a lot of memory for the large buffer containing the samples; it also causes a bottleneck on the entire system, as reading the entire song takes up the majority of the CPU usage at that moment. The audio data is converted to mono as in the key detection algorithm but is not downsampled. We then iteratively apply the following process to the signal. First we calculate the instant energy, e, which is the energy contained in 512 samples; 512 samples correspond to roughly a hundredth of a second, which is close enough to 'instant'. The instant energy is calculated using the following formula, where X is the signal:

Equation 4:  e = \sum_{i=0}^{511} X[i]^{2}

We then need to calculate the average energy. This is not calculated over the entire song, since a song may have both an intense passage and a calm part; it is calculated over the last 44032 samples, which is just short of a second. 44032 samples are chosen instead of 44100 because it is then convenient to calculate the average energy by simply summing the past 86 instant energy readings (as 86 x 512 = 44032) and taking their average. The calculation of the average energy, E_{avg}, is shown in Equation 5, where E is a history buffer of length 86 containing the past 86 instant energy readings:

Equation 5:  E_{avg} = \frac{1}{86} \sum_{i=0}^{85} E[i]

Next we compare the current instant energy to the average energy over the past second multiplied by some constant C. To get the value of C we first compute the variance of the past 86 instant energies:

Equation 6:  V = \frac{1}{86} \sum_{i=0}^{85} \left( E[i] - E_{avg} \right)^{2}


The C constant is then computed using a linear regression of C against V, giving:

Equation 7:  C = -0.0025714 \, V + 1.5142857

A beat is detected only if the instant energy is greater than the average energy multiplied by C, and only if the chosen beat interval has elapsed since the last detected beat. The beat interval is a minimum time that separates adjacent beats: if a beat is detected but the beat interval has not elapsed since the last beat, the beat is rejected and not recorded. We continue this cycle by shifting the history buffer E one index to the right, making room for a new instant energy value whilst flushing the oldest; the new instant energy reading is placed in the first index of E. E_{avg} is then recalculated, as well as C, and we compare the new instant energy to E_{avg} multiplied by C, and so on until we reach the end of the section of the song. The whole process is repeated for each section of the song until the end is reached. Each time a beat is detected its position is recorded so that a visual beat marker can be added at the correct location in the waveform. FMOD Sound objects can store markers called syncpoints; when the sound is played back and a syncpoint is reached, a call-back is generated. So, each time a beat is detected, a syncpoint is automatically added to the Sound object at the correct position in the song. This comes in useful later for the automatic beat matching function.
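The whole loop can be summarised in a C# sketch (a simplification of the actual implementation; the 512/86/350ms constants follow Equations 4-7 and the beat interval chosen in section 9.2.3, and starting the comparison only once the history buffer is full is an assumption of this sketch):

    using System;
    using System.Collections.Generic;

    static class BeatDetectorSketch
    {
        // 'samples' is the mono signal at 44100 Hz; beat positions are returned
        // as sample offsets.
        static List<long> DetectBeats(float[] samples, int sampleRate = 44100)
        {
            const int instantSize = 512;     // Equation 4 window (~12 ms)
            const int historyLength = 86;    // 86 * 512 = 44032 samples (~1 s)
            const double minBeatGapMs = 350; // minimum beat interval

            double[] history = new double[historyLength];
            int filled = 0;
            long lastBeat = -sampleRate;     // allow a beat near the very start
            var beats = new List<long>();

            for (long start = 0; start + instantSize <= samples.Length; start += instantSize)
            {
                // Equation 4: instant energy of the current 512 samples
                double instant = 0.0;
                for (int i = 0; i < instantSize; i++)
                    instant += samples[start + i] * samples[start + i];

                if (filled == historyLength)
                {
                    // Equations 5-7: average energy, variance and the constant C
                    double avg = 0.0;
                    foreach (double e in history) avg += e;
                    avg /= historyLength;

                    double variance = 0.0;
                    foreach (double e in history) variance += (e - avg) * (e - avg);
                    variance /= historyLength;

                    double c = -0.0025714 * variance + 1.5142857;

                    double gapMs = (start - lastBeat) * 1000.0 / sampleRate;
                    if (instant > c * avg && gapMs >= minBeatGapMs)
                    {
                        beats.Add(start);
                        lastBeat = start;
                    }
                }

                // Shift the history buffer and store the newest instant energy.
                Array.Copy(history, 0, history, 1, historyLength - 1);
                history[0] = instant;
                if (filled < historyLength) filled++;
            }
            return beats;
        }
    }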

8.4 Calculating BPM Value

The BPM value was originally calculated by counting the number of beats detected in a 15 second section of the song and multiplying this number by 4 to get the number of beats in a minute. However, this was very inaccurate and very sensitive to the area of the song chosen: some 15 second sections of a song may have many more beats than others, so a more accurate method was needed. The best way to get an accurate BPM estimate is to make the estimate where beats are consistently detected one after the other; in a dance song this is usually at the beginning and end of the track. To do this, we first keep track of the time span between adjacent detected beats in milliseconds, which I will refer to as the gap value. Each gap value is compared to the previous gap value: if they are equal, meaning that the beats are exactly evenly spaced, a similarity counter is incremented; if not, the similarity counter is reset. Depending on how high the counter has been incremented, the gap values are added to one of several arrays, each corresponding to a range of the similarity counter; the higher the similarity count, the more accurate the BPM estimate should be. At the end of the song, we take the average of the array that corresponds to the highest range of the similarity counter; if this array is empty, we take the average of the next array, and so on until we find a non-empty array. Eventually we have an average gap value for the song in which only similar gap values are taken into account. To convert the gap value into a BPM estimate, we use:

Equation 8:  BPM = (60 \times 1000) / \mathrm{gap}

where gap is the average gap value in milliseconds.
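As a quick sketch of Equation 8 (the helper name is illustrative):

    static class BpmSketch
    {
        // Convert an average beat gap in milliseconds into beats per minute.
        static double GapToBpm(double gapMilliseconds) => 60.0 * 1000.0 / gapMilliseconds;
    }

For example, an average gap of 434 ms gives 60000 / 434 ≈ 138 BPM, which is in the range typical of the dance tracks used for testing.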


8.5 Automatic Beat Matching

The beat synchronisation consists of two elements: first, getting the two tracks to the same tempo; second, starting the incoming track on the correct downbeat and at the same time as a beat occurs in the outgoing track. When both decks are loaded and the sync button is activated on one of the decks, the difference between its BPM and the other track's BPM is found and converted into the appropriate tempo change, which is applied to the track loaded in that deck. Both tracks are then at the same tempo according to the BPM estimates given by the beat detection algorithm. To get the incoming track to start at the same time as a beat in the outgoing track, the incoming track is cued up just before its first beat. With the sync button still enabled, when the play button is pressed the track will start playing from that beat only when the track in the other deck encounters its next beat. We can tell when the next beat is reached thanks to the syncpoints that have previously been added, and the call-backs that are generated when the current position of the playing track is equal to the position of a syncpoint. If the two tracks start to drift out of sync due to slight differences in their 'real' tempo, there is a function which snaps both tracks to their next beat so that they are back in sync. This is implemented by finding the syncpoint nearest to the current play position in each track, then finding the next syncpoint on from the current position in each track, and setting both tracks to the position of that next syncpoint. Since the position of a syncpoint corresponds to the position of a beat, both tracks skip to their next beat and are in sync again.
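The tempo-matching step amounts to a simple ratio, sketched below (assuming the playback rate is expressed as a multiplier of normal speed; the names are illustrative):

    static class BeatSyncSketch
    {
        // Returns the playback-rate multiplier that brings a track detected at
        // 'trackBpm' to the tempo of the track playing at 'targetBpm'.
        static double TempoRatio(double trackBpm, double targetBpm) => targetBpm / trackBpm;
    }

For example, a 136 BPM track synchronised to a 138 BPM track would be played at a rate of roughly 1.015, i.e. about 1.5% faster.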

8.6 Generating and animating the waveforms

Although this area of the project is not directly to do with the beat and key detection algorithms, the waveforms were an interesting and challenging programming problem, and I will document the most interesting aspects here. The generation of the waveforms is done at the same time as the beat detection to avoid scanning through the song twice. For the zoomed-in waveform I chose to display 6 seconds of the song at any instant. The current position of the song is marked by a vertical bar in the centre of the waveform display; this means that half of the waveform shows the 3 seconds of the song that have just played and the other half shows the 3 seconds about to play. Each pixel in the waveform corresponds to a number of samples in the song; the peak amplitude and peak negative amplitude found within these samples are plotted on the waveform. This process is carried out for each pixel until an image of the entire song has been built up. The image is placed into an image holder using the built-in .NET PictureBox control. Animating the waveform involves using a timer to find the current position of the track every 100 milliseconds and then translating the image. The amount to translate the image by is calculated by converting the current position of the track into a pixel offset, using the number of samples that each pixel represents; the translation itself is performed using matrix transformations. The problem with using the PictureBox control is that it has a limit on the size of images it can handle. This meant that loading a song longer than about eight and a half minutes would cause an exception in the PictureBox control, because such a song generates a waveform image too wide for the control to handle. To solve the problem, the image is broken down into smaller segments which are placed in the PictureBox one after the other. But as only one image at a time can be translated, once the end of an image is reached there would be a gap before the next image is displayed. To solve this further problem, the images are overlapped by a certain amount, so that when one image is about to end the next image is swapped in, eliminating the gap. The waveform appears to the user as one continuous image, but it is actually several images overlapped at certain positions.
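The per-pixel peak scan described above could be sketched as follows (the method and parameter names are illustrative):

    using System;

    static class WaveformSketch
    {
        // For each pixel column, find the maximum and minimum sample value in the
        // block of samples that the column represents; the two arrays can then be
        // drawn as vertical lines to form the waveform image.
        static void BuildPeaks(float[] samples, int widthInPixels,
                               out float[] peakMax, out float[] peakMin)
        {
            peakMax = new float[widthInPixels];
            peakMin = new float[widthInPixels];
            int samplesPerPixel = Math.Max(1, samples.Length / widthInPixels);
            for (int x = 0; x < widthInPixels; x++)
            {
                int start = x * samplesPerPixel;
                int end = Math.Min(samples.Length, start + samplesPerPixel);
                float max = 0f, min = 0f;
                for (int i = start; i < end; i++)
                {
                    if (samples[i] > max) max = samples[i];
                    if (samples[i] < min) min = samples[i];
                }
                peakMax[x] = max;
                peakMin[x] = min;
            }
        }
    }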

Figure 28: Overlapping of waveform images


9. Testing

This section covers the tests performed to determine the optimum values for the parameters used in the key and beat detection algorithms.

9.1 Parameter Testing – Key Detection Algorithm

This section explains the tests performed to determine the optimum values for the parameters in the key detection algorithm.

9.1.1 Bass threshold frequency

When we scan through the STFT mapping the frequency components to pitch classes, we start from some lower bound, so that all frequencies below this bound are not taken into account. In effect, we are trying to stop the bass line and bass drum from having an effect on the key detection result. The keys are displayed below with their corresponding Camelot keycode and formal notation, where C = C major, Cm = C minor, # = sharp, b = flat.

Table 1: Effect of frequency cut-off on key detection
Tracks (expected key in brackets): Lucid - I Can't Help Myself (4A); Sasha & Emerson - Scorchio (4A); Quivver - She Does [Quivver Mix] (4A); Skip Raiders - Another Day [Perfecto Remix] (3A); Pulp Victim - The World 99 [Lange Remix] (12A/9A); Warrior - Don't you want me (11A); William Orbit - Ravels Pavane [FC Remix] (11A).

Low-frequency cut-off | Lucid (4A) | Sasha (4A) | Quivver (4A) | Skip Raiders (3A) | Pulp Victim (12A/9A) | Warrior (11A) | William Orbit (11A)
None (1Hz) | 4A/Fm | 7B/F | 4A/Fm | 5B/Eb | 8A/Am | 11A/F#m | 1A/Abm
32Hz | 4A/Fm | 7B/F | 4A/Fm | 7B/F | 8A/Am | 11A/F#m | 1A/Abm
64Hz | 4A/Fm | 3A/Bbm | 4A/Fm | 3A/Bbm | 3B/Db | 3B/Db | 3A/Bbm
96Hz | 4A/Fm | 4A/Fm | 4A/Fm | 3A/Bbm | 12A/Dbm | 11A/F#m | 11A/F#m
98Hz | 4A/Fm | 4A/Fm | 4A/Fm | 3A/Bbm | 12A/Dbm | 11A/F#m | 11A/F#m
100Hz | 4A/Fm | 3B/Db | 4A/Fm | 3A/Bbm | 12A/Dbm | 11A/F#m | 3B/Db
128Hz | 4A/Fm | 4A/Fm | 4A/Fm | 3A/Bbm | 9A/Em | 11A/F#m | 11A/F#m
250Hz | 4A/Fm | 4B/Ab | 4A/Fm | 3A/Bbm | 1A/Abm | 11A/F#m | 3A/Bbm

We can see that the cut-off point has no effect on some tracks, whereas on others it alters the key detection result quite considerably. The most successful cut-off point seems to be around the 100Hz mark. It is interesting to see that such a small change in the value of this cut-off point can swing the key detection result one way or another, as going from 98Hz to 100Hz changes the key detection of the Sasha and William Orbit tracks. 98Hz is used in the final implementation as it was suggested in some literature as being quite reliable and it gives the correct reading for all the tracks tested.

9.1.2 Choice of FFT window length

Choosing the correct size N for the FFT can affect the algorithm in the following ways. First, in order to take advantage of the computational efficiency of the FFT algorithm, we want N to be a power of 2. Secondly, we want to choose a value of N that will not misrepresent the data. The larger we make N, the more data is analysed in one FFT; we do not want to make N so big that it takes in too much of the signal at a time, as this defeats the point of using the STFT. If N is too small, the STFT may not capture enough harmonic data and may lead to misinterpretation of the data. It should be noted that the actual window function is a Hamming window, and this is fixed by the Matlab implementation of the STFT.

Table 2: Effect of FFT length on key detection

FFT length in samples (time) | Lucid (4A) | Sasha (4A) | Quivver (4A) | Skip Raiders (3A) | Pulp Victim (12A/9A) | Warrior (11A) | William Orbit (11A)
1024 (0.023s) | 4A/Fm | 8A/Am | 4B/Ab | 4B/Ab | 12A/Dbm | 11B/A | 2A/Ebm
2048 (0.046s) | 4A/Fm | 4B/Ab | 4A/Fm | 3A/Bbm | 12A/Dbm | 12A/Dbm | 3A/Bbm
4096 (0.093s) | 4A/Fm | 4B/Ab | 4A/Fm | 3A/Bbm | 12B/E | 11A/F#m | 2B/F#
8192 (0.186s) | 4A/Fm | 4A/Fm | 4A/Fm | 3A/Bbm | 12A/Dbm | 11A/F#m | 11A/F#m
16384 (0.372s) | 4A/Fm | 3A/Bbm | 4A/Fm | 7B/F | 12A/Dbm | 11A/F#m | 3B/Db

Once again, the Lucid track is not affected by the length of the FFT. All of the other tracks are affected though. In most cases, the key is mistaken for its relative minor or major. An FFT length of 1024 samples, corresponding to 0.023s of audio data is too short for the analysis to capture enough of the harmonic data. 16384 samples, corresponding to 0.372s of audio data is too long, as the signal is unlikely to remain stationary during this time. The FFT size of 8192 gives the best results, probably because it strikes the right balance between capturing enough harmonic data to get a good analysis, without being long enough to allow the signal to change a lot. Therefore, the FFT in the final implementation is 8192 samples long.

9.1.3 Harmonic Product Spectrum

Using the harmonic product spectrum (HPS) to enhance the chroma vector is a technique suggested by Lee in 'Automatic Chord Recognition from Audio Using Enhanced Pitch Class Profile', detailed in the background section. It aims to remove the non-harmonic overtones which are inevitably produced by certain instruments and sounds, whilst amplifying the harmonic overtones of the signal. Most acoustic instruments and the human voice produce a sound that has harmonics at integer multiples of its fundamental frequency, so decimating the original frequency spectrum by an integer factor will also yield a peak at the fundamental frequency. The HPS is calculated from the signal X using:

Equation 9:  Y(f) = \prod_{m=1}^{M} |X(m f)|


Figure 29: Illustration of the Harmonic Product Spectrum taken from (30)

Figure 29 illustrates what is happening in Equation 9 for M = 5. For the chord recognition application, however, decimating the original spectrum by powers of two turned out to work better than decimating by integer factors, according to Lee. This is because harmonics that are not at powers of two (octave equivalents) of the fundamental frequency may contribute energy to pitch classes other than the chord tones, which works against enhancing the spectrum. For example, the fifth harmonic of A in octave 3 is C# in octave 6, which is not a chord tone in an A minor triad. Therefore, the equation is modified as follows to reflect this:

Equation 10:  Y(f) = \prod_{m=1}^{M} |X(2^{m-1} f)|
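A sketch of the power-of-two decimation in Equation 10, applied to one magnitude spectrum (here 'order' corresponds to M, and skipping decimated bins that fall outside the spectrum is an assumption not specified in the text):

    static class HpsSketch
    {
        static double[] PowerOfTwoHps(double[] magnitudes, int order)
        {
            double[] enhanced = new double[magnitudes.Length];
            for (int k = 0; k < magnitudes.Length; k++)
            {
                double product = 1.0;
                for (int m = 0; m < order; m++)
                {
                    int index = k << m;               // k * 2^m, i.e. the m-th octave of bin k
                    if (index >= magnitudes.Length) break;
                    product *= magnitudes[index];
                }
                enhanced[k] = product;
            }
            return enhanced;
        }
    }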

I tested the Harmonic Product Spectrum with M = 3 for integer decimation and power-of-two decimation. The results are shown below:


Table 3: Effect of the Harmonic Product Spectrum on key detection

HPS | Lucid (4A) | Sasha (4A) | Quivver (4A) | Skip Raiders (3A) | Pulp Victim (12A/9A) | Warrior (11A) | William Orbit (11A)
None | 4A/Fm | 4A/Fm | 4A/Fm | 3A/Bbm | 12A/Dbm | 11A/F#m | 11A/F#m
Integer spacing | 3B/Db | 11A/F#m | 3A/Bbm | 2A/Ebm | 11B/A | 10B/D | 1A/Abm
Power of two spacing | 4A/Fm | 4A/Fm | 4A/Fm | 7B/F | 12A/Dbm | 4B/Ab | 3B/Db

The HPS with integer spacing does not work well with the songs in the test set. The HPS with power-of-two spacing recognises the correct key in four of the tracks, but is still not perfect. After further experimentation it was decided that the HPS should not be used to detect the key of dance music tracks in the final implementation of the algorithm. The reason the HPS is not so successful with this test set may be that few acoustic instruments are used in the tracks: dance music tends to use electronically generated sounds, which may or may not produce harmonics in the same way as acoustic instruments. The HPS may be better suited to detecting the key of classical music or another genre which uses more acoustic instruments than dance music.

9.1.4 Weighting System

The weighting system described in the implementation section assigns a weight to the highest correlating key for every section of the song; this weighting is the difference between the highest correlation coefficient and the second highest. Without the weighting system, the highest correlating key is always given a weighting of 1, no matter how close or remote the other correlating keys are. The results below show the effect of the weighting system on the keys detected.

Table 4: Effect of the weighting system on key detection

Weighting system | Lucid (4A) | Sasha (4A) | Quivver (4A) | Skip Raiders (3A) | Pulp Victim (12A/9A) | Warrior (11A) | William Orbit (11A)
On | 4A/Fm | 4A/Fm | 4A/Fm | 3A/Bbm | 12A/Dbm | 11A/F#m | 11A/F#m
Off | 5A/Cm | 4B/Ab | 4A/Fm | 5A/Cm | 12A/Dbm | 11A/F#m | 11A/F#m

The weighting system does make a difference to the key detected. Without the weighting system, the algorithm only detects 4 out of the 6 tracks correctly. Therefore the weighting system is used in the final implementation.


The reason the weighting system works well is that the chroma vector sometimes correlates highly with more than one key template. In the diagram below, the chroma vector has a strong reading in the A# pitch class and smaller peaks at the C, F and G pitch classes. The resulting correlation with the 24 key templates gives the highest correlation with A# minor, but 5 other key templates also correlate highly with the chroma vector. The weighting system recognises that the A# minor key has correlated the most, but because the result is so close to that of other keys, the weighting assigned is small, i.e. in this case the difference between the correlation coefficients of A#m and Gm.

Figure 30: Chroma Vector showing close correlation between many different key templates

For the Lustral song in the test section, there is competition between C minor and F minor for the highest detected key. Without the weighting system, C minor wins and is chosen as the most likely key; with the weighting system on, F minor wins convincingly. This shows that although C minor may correlate highest the greatest number of times, it must do so when there are other alternatives that correlate almost as highly, and so it receives a smaller weighting overall. On the other hand, when F minor correlates highest, it must do so when there is less doubt about which key template the chroma vector matches, and so it is assigned a higher weight. The bar charts below show the results of the key detection for the Lustral track with and without the weighting system.


Figure 31: F minor is detected correctly with the weighting system enabled

Figure 32: C minor is detected without the weighting system enabled

9.1.5 Time in between overlapping frames

The time in between overlapping FFT frames of the STFT, also called the hopsize or stride, is another important parameter which can have an effect on the overall key result. A small hopsize of, say, 50ms equates to taking 20 FFTs per second, whereas a hopsize of 1000ms means taking one overlapping FFT every second. Generally, the smaller the hopsize, the more accurate the analysis will be; however, performing more FFTs per second will affect performance.


Table 5: Effect of hopsize on key detection

Hopsize | Lucid (4A) | Sasha (4A) | Quivver (4A) | Skip Raiders (3A) | Pulp Victim (12A/9A) | Warrior (11A) | William Orbit (11A)
50ms | 4A/Fm | 4A/Fm | 4A/Fm | 3A/Bbm | 12A/Dbm | 11A/F#m | 11A/F#m
75ms | 4A/Fm | 4A/Fm | 4A/Fm | 3A/Bbm | 12A/Dbm | 11A/F#m | 11A/F#m
100ms | 4A/Fm | 4A/Fm | 4A/Fm | 3A/Bbm | 12A/Dbm | 11A/F#m | 11A/F#m
200ms | 4A/Fm | 4A/Fm | 4A/Fm | 7B/F | 12A/Dbm | 11A/F#m | 11A/F#m
500ms | 4A/Fm | 4A/Fm | 4A/Fm | 7B/F | 12A/Dbm | 11A/F#m | 11A/F#m
1000ms | 4A/Fm | 4A/Fm | 4A/Fm | 7B/F | 12A/Dbm | 11A/F#m | 11A/F#m

The hopsize does not affect the key detection result in any of the tracks except for the Skip Raiders track, at sizes greater than 100ms. 100ms is the final value chosen, which means taking an FFT frame every 10th of a second. This gives good time resolution to the STFT without a big performance hit.

9.1.6 Downsampling

The downsampling of the song is done to speed up the computation of the STFT. I wanted to see what effect it would have, if any, on the actual key detected.

Table 6: Effect of downsampling on key detection

Downsample rate | Lucid (4A) | Sasha (4A) | Quivver (4A) | Skip Raiders (3A) | Pulp Victim (12A/9A) | Warrior (11A) | William Orbit (11A)
4x - 11025Hz | 4A/Fm | 4A/Fm | 4A/Fm | 3A/Bbm | 12A/Dbm | 11A/F#m | 11A/F#m
2x - 22050Hz | 4A/Fm | 4B/Ab | 4A/Fm | 3A/Bbm | 12B/E | 11A/F#m | 3B/Db
1x - 44100Hz | 4A/Fm | 4A/Fm | 4A/Fm | 3A/Bbm | 12A/Dbm | 11A/F#m | 3B/Db

For the Sasha and Pulp Victim tracks, downsampling the signal to 22050Hz results in the relative major of the actual key being detected, whereas for the William Orbit track a Db major is detected instead of F# minor. Leaving the signal unaltered at 44100Hz, the only key detected differently was again the William Orbit track, as Db major. Although the downsampling is used to speed up the computation, I did not notice much difference in the time taken to perform the analysis at the different sample rates. Even so, 11025Hz is used in the final implementation because this seems to be the standard for key detection algorithms.


9.2 Parameter Evaluation – Beat Detection Algorithm

9.2.1 Size of Instant Energy

Varying the size of the instant energy buffer will affect the BPM result. A range of instant energy sizes was tested with six songs.

Table 7: Effect of instant energy size on beat detection
Tracks (actual BPM in brackets): Agnelli and Nelson - Everyday [Lange Mix] (138.15); Chakra - Home [Above & Beyond Mix] (138); Faithless - Why Go (134.73); Sasha & Emerson - Scorchio (134.91); William Orbit - Ravels Pavane [FC Remix] (137.85); Xstasia - Sweetness [Michael Woods Remix] (136).

Instant energy size (samples) | Agnelli & Nelson (138.15) | Chakra (138) | Faithless (134.73) | Sasha & Emerson (134.91) | William Orbit (137.85) | Xstasia (136)
128 | 137.84 | 137.82 | 163.09 | 145.48 | 155.92 | 138.32
256 | 137.82 | 137.82 | 140.81 | 133.72 | 141.71 | 136.01
512 | 138.51 | 134.34 | 136.73 | 134.73 | 138.85 | 136.01
1024 | 136.45 | 136.45 | 137.45 | 136.01 | 139.68 | 136.01
2048 | 143.56 | 127.22 | 118.34 | 107.65 | 113.43 | 67.96
4096 | 75.08 | 46.11 | 71.74 | 118.94 | 46.11 | 129.21

An instant energy size of 512 samples consistently gives the closest BPM result, so this size is used in the final implementation. For the Chakra song, however, an instant energy of 256 samples gave a result closer to the actual BPM value.

9.2.2 Size of Average Energy

The size of the average energy buffer determines how much of the song the instant energy is compared against.

Table 8: Effect of average energy size on beat detection

Average energy size (samples) | Agnelli & Nelson (138.15) | Chakra (138) | Faithless (134.73) | Sasha & Emerson (134.91) | William Orbit (137.85) | Xstasia (136)
11008 | 139.68 | 134.68 | 128.48 | 135.84 | 127.56 | 136.01
22016 | 138.53 | 136.3 | 139.68 | 135.06 | 138.43 | 136.01
44032 | 138.51 | 134.34 | 136.73 | 134.73 | 138.85 | 136.01
88064 | 138.74 | 166.72 | 141.42 | 135.18 | 138.93 | 136.01
176128 | 142.57 | 166.72 | 153.82 | 135.52 | 140.31 | 136.01


44032 samples is chosen as the value for this parameter in the final implementation because it performed well in day-to-day use of the program; this corresponds to just under one second of audio data. A size of 22016 samples does identify the BPM of the Chakra, Sasha and William Orbit songs slightly more closely, but not by much.

9.2.3 Beat Interval

The beat interval is the minimum time that must elapse between adjacent beats. Without this interval, beats are detected far too frequently, because a beat may consist of many peaks in the waveform, all of which are counted as separate beats if we do not separate them using a minimum gap value. The following diagrams illustrate the problem; both are waveform images of the start of Xstasia – Sweetness, with the vertical green lines showing where beats are detected. In the first, with the beat interval set at 50ms, beats are detected too close to each other, especially at the beginning. In the second waveform, with the beat interval set at 350ms, the beats are detected at the correct onsets and at regular intervals.

Figure 33: Too many beats detected with a 50ms beat interval

Figure 34: Beats being detected correctly with beat interval of 350ms

Table 9: Effect of beat interval size on beat detection

Beat interval | Agnelli & Nelson (138.15) | Chakra (138) | Faithless (134.73) | Sasha & Emerson (134.91) | William Orbit (137.85) | Xstasia (136)
50ms | 1,033.61 | 1,033.61 | 1,033.61 | 923.76 | 492.2 | 685.57
100ms | 177.34 | 574.24 | 574.24 | 228.18 | 369.16 | 136.01
200ms | 138.51 | 283.19 | 184.99 | 179.22 | 250.2 | 136.01
300ms | 138.51 | 195.77 | 136.73 | 137.25 | 136.01 | 136.01
350ms | 138.51 | 134.34 | 136.73 | 134.73 | 138.85 | 136.01
400ms | 139.26 | 131.95 | 136.01 | 134.88 | 137.32 | 136.01
500ms | 68.87 | 68.87 | 85.96 | 77.71 | 100.72 | 93.46

350ms is chosen as the most suitable beat interval for separating out adjacent beats. Using a value that is too small leads to a very high BPM, whereas using a value that is too long begins to cancel out actual beats rather than just the extra peaks surrounding the onset of a beat.


9.2.4 Low Pass Filtering

The idea behind low pass filtering the signal is to remove high frequencies and focus on the low frequencies which we assume dictate the beat of a typical dance music track. This also should help to smooth the waveform so that extra peaks surrounding a beat are not detected as separate beats, as illustrated above.

Table 10: Effect of applying a low pass filter on beat detection

Low pass filter cut-off frequency | Agnelli & Nelson (138.15) | Chakra (138) | Faithless (134.73) | Sasha & Emerson (134.91) | William Orbit (137.85) | Xstasia (136)
No low pass filter | 138.51 | 134.34 | 136.73 | 134.73 | 138.85 | 136.01
10000Hz | 138.51 | 134.94 | 137.21 | 134.65 | 139.21 | 136.01
5000Hz | 138.19 | 142.76 | 137.21 | 134.85 | 139.21 | 136.01
2000Hz | 137.12 | 146.26 | 136.52 | 134.88 | 139.21 | 136.01
1000Hz | 137.34 | 146.26 | 136.52 | 134.76 | 140.31 | 136.01

Applying the low pass filter did not affect the results that much. If anything it reduced the quality of the BPM estimate. In the final implementation I decided not to use a low pass filter.


10. Evaluation

This section gives a critical evaluation of the finished project. It is divided into two parts: the first gives a quantitative evaluation of the key and beat detection algorithms, and the second gives a qualitative evaluation of all sections of the project.

10.1 Quantitative Evaluation

10.1.1 Key Detection Accuracy Test with Dance Music

The accuracy of the algorithm was tested using a test set of 30 dance music tracks. The key was found for each of the tracks using three separate programs: MixMeister, Mixed in Key and Rapid Evolution 2. The keys found using these programs are compared with the key detected by the key detection algorithm developed in this project. The algorithm is judged to have detected the correct key if it is compatible with one of the keys detected by the other three programs, according to the Camelot Sound easymix system. The keys are displayed below with their corresponding Camelot keycode and formal notation, where C = C major, Cm = C minor, # = sharp, b = flat.

Table 11: Key detection results for the 30-track test set

Artist / Title | MixMeister | Mixed in Key | Rapid Evolution 2 | Key Detected | Correct Detection
Chris Lake & Rowan Blades - Filth | 12A/C#m | 12A/C#m | 12A/C#m | 5A/Cm | No*
BT - Mercury and solace [Quivver Mix] | 3A/A#m | 3A/A#m | 3A/A#m | 3A/A#m | Yes
Lostep - Burma [Sasha Remix] | 9A/Em | 10A/Bm | 9A/Em | 9A/Em | Yes
Energy 52 - Cafe Del Mar [Three N One Mix] | 8A/Am | 3A/A#m | 8A/Am | 7B/F | No
Dirty Bass - Emotional Soundscape | 10A/Bm | 11A/F#m | 10A/Bm | 11A/F#m | Yes
Dogzilla - Dogzilla | 5A/Cm | 6A/Gm | 5A/Cm | 5A/Cm | Yes
Agnelli and Nelson - Everyday [Lange Mix] | 8A/Am | 7A/Dm | 8A/Am | 11B/A | No**
Evoke - Arms of Loren 2001 [Ferry Corsten Remix] | 11A/F#m | 11A/F#m | 11A/F#m | 11A/F#m | Yes
Eye To Eye - Just Can't Get Enough [Lange Mix] | 4A/Fm | 4A/Fm | 4A/Fm | 4A/Fm | Yes
Floyd - Come Together [Vocal Club Mix] | 8A/Am | 7A/Dm | 8A/Am | 8B/C | Yes
Lucid - I Can't Help Myself | 4A/Fm | 4A/Fm | 4A/Fm | 4A/Fm | Yes
Chicane - Saltwater [Original mix] | 4A/Fm | 4A/Fm | 4A/Fm | 5A/Cm | Yes
Sasha & Emerson - Scorchio | 4A/Fm | 4A/Fm | 4A/Fm | 4A/Fm | Yes
Sasha - Arkham Asylum | 10A/Bm | 10A/Bm | 10A/Bm | 10A/Bm | Yes
Sasha - Magnetic North [Subsky's In your face remix] | 5A/Cm | 3A/A#m | 5A/Cm | 8B/C | No**
Quivver - She Does [Quivver Mix] | 4A/Fm | 4A/Fm | 4A/Fm | 4A/Fm | Yes
Signum - First Strike | 1A/Abm | 2A/Ebm | 1A/Abm | 4B/Ab | No**
Skip Raiders - Another Day [Perfecto Remix] | 2A/D#m | 3A/A#m | 2A/D#m | 3A/A#m | Yes
Freefall - Skydive [John Johnson Vocal Mix] | 8A/Am | 8A/Am | 8A/Am | 8A/Am | Yes
Solid Globe - North Pole | 5A/Cm | 5A/Cm | 5A/Cm | 5A/Cm | Yes
Space Manoeuvres - Stage One [Seperation Mix] | 5A/Cm | 8A/Am | 5A/Cm | 4A/Fm | Yes†
Quivver - Space Manoeuvres part 3 | 9A/Em | 10A/Bm | 9A/Em | 11B/A | No
Starparty - I'm in Love [Fc & Rs Remix] | 7A/Dm | 7A/Dm | 7A/Dm | 5A/Cm | No
Pulp Victim - The World 99 [Lange Remix] | 9A/Em | 12B/E | 9A/Em | 12A/Dbm | Yes†
Travel - Bulgarian | 8A/Am | 8A/Am | 8A/Am | 8A/Am | Yes
Warrior - If you want me [vocal club mix] | 11A/F#m | 11A/F#m | 11A/F#m | 11A/F#m | Yes
Faithless - Why Go | 10A/Bm | 10A/Bm | 10A/Bm | 2B/F# | No
William Orbit - Ravels Pavane [FC Remix] | 11A/F#m | 2B/F# | 11A/F#m | 11A/F#m | Yes
X-Cabs - Neuro 99 [X-Cabs mix] | 11A/F#m | 11A/F#m | 11A/F#m | 11A/F#m | Yes
Xstasia - Sweetness [Michael Woods Remix] | 10A/Bm | 10A/Bm | 8A/Am | 10A/Bm | Yes

The key detection correctly identified a key compatible with one of the other programs in 22 out of the 30 tracks, corresponding to a 73.3% success rate. Of these 22 tracks, 20 were identified as exactly the same key as at least one of the other programs. The 2 others, marked † (Space Manoeuvres and Pulp Victim), were one step away from the key identified by the other programs in the Camelot easymix system. Of the 8 tracks for which the key was not judged to have been detected correctly, the track marked * (Chris Lake) was detected as C minor instead of C# minor, a difference of one semitone. The key was detected with the correct pitch class but the wrong mode (i.e. C major instead of C minor) in the 3 tracks marked ** (Agnelli and Nelson, Sasha, Signum). These can be considered near misses, because the detected key shares very similar chord notes with the key detected by the other programs. Also, the other programs are biased towards minor keys, as most dance music is written in minor keys. If we include these 'near misses' as correct, the success rate would increase to 26/30 = 87%.
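The compatibility rule used to judge a detection as 'correct' can be expressed compactly, as in the sketch below. It follows the Camelot easymix system described in the background and user guide (e.g. 4A mixes with 3A, 4A, 5A or 4B); the string parsing is illustrative only.

```csharp
// Sketch: Camelot easymix compatibility between two keycodes such as "12A" or "4B".
using System;

static class Camelot
{
    public static bool AreCompatible(string codeA, string codeB)
    {
        int numberA = int.Parse(codeA.Substring(0, codeA.Length - 1));
        int numberB = int.Parse(codeB.Substring(0, codeB.Length - 1));
        char letterA = char.ToUpperInvariant(codeA[codeA.Length - 1]);
        char letterB = char.ToUpperInvariant(codeB[codeB.Length - 1]);

        // Same number, either letter (relative major/minor).
        if (numberA == numberB)
            return true;

        // Adjacent number on the wheel (wrapping 12 -> 1), same letter.
        int clockwise = numberA % 12 + 1;
        int anticlockwise = (numberA + 10) % 12 + 1;
        return letterA == letterB && (numberB == clockwise || numberB == anticlockwise);
    }
}

// Example: AreCompatible("4A", "5A") == true, AreCompatible("8A", "7B") == false.
```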

10.1.2 Key Detection Accuracy Test with Classical Music

The key detection algorithm was also tested with 20 pieces of classical music, where the key of the piece is known in advance. The results are shown in the table.

Table 12: Key detection results for classical music

Composer / Title | Actual Key | Key Detected | Correct
Wolfgang Amadeus Mozart - Requiem (K. 626) - Lacrimosa | D Minor | D Minor | Yes
Ludwig van Beethoven - Piano Concerto No. 5 (Op. 73) - Adagio Un Poco Mosso | B Major | B Major | Yes
Wolfgang Amadeus Mozart - Klarinettenkonzert (K. 622) - Adagio | D Major | D Major | Yes
Antonio Lucio Vivaldi - Le Quattro Stagioni (Op. 8, RV 269) - La Primavera | E Major | C# Minor (Db) | Yes†
Johann Pachelbel - Kanon In D | D Major | G Major | Yes†
Antonin Dvorak - New World Symphony (Op. 95) - Largo | Db Major | C# Major (Db) | Yes
Sergej Vassiljevitsj Rachmaninoff - Piano Concerto No. 2 (Op. 18) - Adagio Sostenuto | Begins E Major, ends C Major | E Major | Yes
Tomaso Giovanni Albinoni - Adagio In Sol Minore | G Minor | G Minor | Yes
Ludwig van Beethoven - Symphony No. 7 (Op. 92) | A Major | C# Minor (Db) | No
Edvard Hagerup Grieg - Peer Gynt Suite No. 1 (Op. 46) - Morgenstemning | E Major | E Major | Yes
Gustav Mahler - Symphony No. 5 | F Major | F Major | Yes
Samuel Osborne Barber - Adagio For Strings | F Major | F Major | Yes
Johann Sebastian Bach - Orchestersuite Nr. 3 (BWV 1068) - Air | D Major | D# Major | No*
Johann Sebastian Bach - Toccata E Fuga (BWV 565) | D Minor | D Minor | Yes
Georg Friederich Händel - Wassermusik (HWV 348) | F Major | F Major | Yes
Wolfgang Amadeus Mozart - Requiem (K. 626) - Introitus | D Minor | D Minor | Yes
Ludwig van Beethoven - Symphony No. 6 (Op. 68) | F Major | F Major | Yes
Johann Sebastian Bach - Orchestersuite Nr. 2 (BWV 1067) - Badinerie | D Major | B Minor | Yes†

The key detection algorithm detected a compatible key in 18 of the 20 pieces tested, giving a success rate of 90%. In the pieces marked †, the key was detected as either the relative minor of the actual key or a key a fifth away from it. The Camelot keycode represents the relative relationship as the same number but a different letter: for Le Quattro Stagioni the actual key is 12B whereas the detected key was 12A, and for Orchestersuite Nr. 2 the actual key is 10B whereas the detected key was 10A. For Kanon in D, the key was detected as G Major (9B), which lies a fifth away from D Major (10B). For Orchestersuite Nr. 3 by Bach, the detected key was a semitone away from the actual key. Where a piece of music changes key often, as in Piano Concerto No. 2, which begins in E Major and ends in C Major, the key detection picks up the key which the piece remains in for the longest duration. The key detection performs considerably well on classical music because the acoustic instruments used give off more harmonic tones, making it more obvious which chords are being played. Also, unlike in the dance music tracks, there is no dominant bass line or bass drum to affect the algorithm's accuracy.


10.1.3 Beat Detection Accuracy Test

The accuracy of the beat detection algorithm was tested by finding the BPM using four other programs (MixMeister, Mixed in Key, Rapid Evolution 2 and Traktor DJ Studio). The average of these four BPM estimates is compared against the BPM detected by the algorithm.

Table 13: Beat detection results (BPM)

Artist / Title | MixMeister | Mixed in Key | Rapid Evolution 2 | Traktor DJ Studio | Average | BPM Detected | Difference (%)
Chris Lake & Rowan Blades - Filth | 133.60 | 133.62 | 133.80 | 133.68 | 133.67 | 132.52 | -1.16 (0.87%)
BT - Mercury and Solace [Quivver Mix] | 131.90 | 132.02 | 131.80 | 131.95 | 131.92 | 129.21 | -2.71 (2.06%)
Lostep - Burma [Sasha Remix] | 130.00 | 129.94 | 130.00 | 130.10 | 130.01 | 166.72 | 36.71 (28.23%)
Energy 52 - Cafe Del Mar [Three N One Mix] | 133.00 | 133.00 | 133.20 | 132.96 | 133.04 | 132.52 | -0.52 (0.39%)
Dirty Bass - Emotional Soundscape | 141.10 | 140.90 | 141.00 | 141.16 | 141.04 | 139.68 | -1.36 (0.96%)
Dogzilla - Dogzilla | 135.00 | 135.20 | 135.20 | 134.83 | 135.06 | 135.92 | 0.87 (0.64%)
Agnelli and Nelson - Everyday [Lange Mix] | 138.10 | 138.66 | 137.90 | 137.95 | 138.15 | 138.51 | 0.36 (0.26%)
Evoke - Arms of Loren 2001 [Ferry Corsten Remix] | 137.80 | 137.91 | 137.90 | 137.67 | 137.82 | 137.21 | -0.61 (0.44%)
Eye To Eye - Just Can't Get Enough [Lange Mix] | 137.80 | 137.80 | 137.60 | 137.81 | 137.75 | 139.68 | 1.93 (1.40%)
Floyd - Come Together [Vocal Club Mix] | 140.00 | 140.03 | 140.10 | 139.97 | 140.02 | 139.68 | -0.34 (0.25%)
Free Radical - Surreal [En Motion Remix] | 138.80 | 139.00 | 138.50 | 138.62 | 138.73 | 140.56 | 1.83 (1.32%)
Chakra - Home [Above & Beyond Mix] | 138.00 | 137.82 | 138.30 | 137.88 | 138.00 | 136.30 | -1.70 (1.23%)
Lucid - I Can't Help Myself | 131.00 | 130.94 | 131.10 | 130.85 | 130.97 | 138.02 | 7.05 (5.38%)
Chicane - Saltwater [Original Mix] | 131.00 | 130.84 | 131.20 | 131.15 | 131.05 | 134.37 | 3.33 (2.54%)
Sasha & Emerson - Scorchio | 135.00 | 135.02 | 134.80 | 134.80 | 134.91 | 134.73 | -0.17 (0.13%)
Sasha - Arkham Asylum | 126.00 | 125.80 | 126.00 | 126.07 | 125.97 | 126.05 | 0.09 (0.07%)
Sasha - Magnetic North [Subsky's In Your Face Remix] | 130.00 | 129.99 | 130.20 | 129.99 | 130.04 | 129.31 | -0.73 (0.56%)
Quivver - She Does [Quivver Mix] | 140.80 | 70.38* | 140.70 | 140.52 | 140.67 | 139.35 | -1.32 (0.94%)
Signum - First Strike | 139.70 | 139.60 | 139.80 | 139.43 | 139.63 | 139.68 | 0.05 (0.03%)
Skip Raiders - Another Day [Perfecto Remix] | 138.00 | 137.96 | 137.90 | 138.04 | 137.97 | 139.58 | 1.60 (1.16%)
Freefall - Skydive [John Johnson Vocal Mix] | 135.00 | 135.02 | 134.90 | 135.07 | 135.00 | 139.20 | 4.20 (3.11%)
Solid Globe - North Pole | 134.80 | 134.84 | 135.00 | 134.65 | 134.82 | 135.74 | 0.92 (0.68%)
Space Manoeuvres - Stage One [Separation Mix] | 131.90 | 131.89 | 131.80 | 132.12 | 131.93 | 132.52 | 0.59 (0.45%)
Quivver - Space Manoeuvres Part 3 | 128.00 | 127.97 | 127.80 | 127.98 | 127.94 | 135.43 | 7.49 (5.86%)
Starparty - I'm in Love [Fc & Rs Remix] | 136.00 | 135.86 | 135.90 | 136.08 | 135.96 | 136.01 | 0.05 (0.03%)
Pulp Victim - The World 99 [Lange Remix] | 136.00 | 135.70 | 136.00 | 136.15 | 135.96 | 136.01 | 0.04 (0.03%)
Travel - Bulgarian | 138.20 | 138.21 | 138.10 | 138.32 | 138.21 | 139.68 | 1.47 (1.07%)
Warrior - If You Want Me [Vocal Club Mix] | 134.80 | 134.79 | 134.80 | 134.98 | 134.84 | 136.73 | 1.88 (1.40%)
Faithless - Why Go | 135.00 | 134.80 | 134.80 | 134.34 | 134.73 | 136.73 | 1.99 (1.48%)
William Orbit - Ravels Pavane [FC Remix] | 137.30 | 137.44 | 137.30 | 137.75 | 137.45 | 138.85 | 1.40 (1.02%)
X-Cabs - Neuro 99 [X-Cabs Mix] | 139.90 | 139.85 | 140.10 | 139.87 | 139.93 | 139.68 | -0.25 (0.18%)
Xstasia - Sweetness [Michael Woods Remix] | 136.30 | 136.14 | 136.00 | 135.78 | 136.05 | 136.01 | -0.05 (0.04%)

*Value not used in calculation of the average.

The beat detection algorithm has an 80% success rate at detecting a BPM that is within +/- 1.5% of the average BPM of the other four programs. The highest difference between the detected BPM and the actual BPM was +36.7 for Lostep – Burma. The reason the BPM is not accurate here is that this track does not have any sections of consistent beats. The track is actually a break-beat track, which means the main beat falls at irregular positions, unlike the consistent four-beats-to-the-bar tempo of most of the other tracks in the test set. However, the other programs all seem to agree closely on the BPM for this track, suggesting that the beat detection algorithm could be improved to detect beats in tracks without a consistent tempo. The other big anomaly is for Quivver – Space Manoeuvres part 3, where the BPM seems to have been calculated as 7.49 BPM too high. Closer analysis shows that the beats are being detected correctly; however, beats are also being detected in silent areas, as can be seen towards the left hand side of Figure 35 below.

Figure 35: Sound energy variations detected as beats in silent areas of Quivver – Space Manoeuvres

Figure 36 shows another part of the same track; here beats are again being detected during a relatively silent period of the track. The interval between the beats detected here is much shorter than in the diagram above, which could be why the BPM is mistaken for being higher than it actually is.

Figure 36: The spacing between these detected beats is closer, leading to higher BPM calculation

The problem is that the BPM calculation relies on consistent beats. If there are many silent areas in a track where the algorithm detects beats, and these detected beats are equally spaced, then this will skew the BPM in favour of a different tempo. It is interesting that for Quivver – She Does, the BPM given by Mixed in Key is 70.38, which is approximately half the rate given by every other program, including my algorithm. Detecting the BPM as half or twice the true rate is a common problem, one that plagues humans as much as computers. This also shows that the algorithm can outperform some of the other programs on certain tracks. Overall, the beat detection algorithm is very good at detecting and tracking the actual beats of a song. The calculation of the BPM estimate from these beats is not as accurate compared to the other programs tested. Sometimes the algorithm detects very slight changes in sound energy as a beat when really it is not one, which can lead to an inaccurate BPM, usually one that is too high.
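One possible refinement, sketched below, would be to discard candidate beats that occur in near-silent regions so that quiet breakdowns do not skew the BPM estimate. The 'LocalRms' helper, the threshold and the window size are illustrative assumptions, not part of the project code.

```csharp
// Sketch: gate out candidate beats that fall in near-silent regions.
using System;
using System.Collections.Generic;

static class SilenceGate
{
    // RMS level of a short window of samples around a candidate beat.
    static double LocalRms(float[] samples, int centre, int halfWindow)
    {
        int start = Math.Max(0, centre - halfWindow);
        int end = Math.Min(samples.Length, centre + halfWindow);
        double sum = 0.0;
        for (int i = start; i < end; i++)
            sum += samples[i] * samples[i];
        return Math.Sqrt(sum / Math.Max(1, end - start));
    }

    public static List<int> KeepLoudBeats(float[] samples, IList<int> beatSamplePositions,
                                          double rmsThreshold = 0.05,
                                          int halfWindow = 2205) // ~50ms at 44.1kHz
    {
        var kept = new List<int>();
        foreach (int pos in beatSamplePositions)
            if (LocalRms(samples, pos, halfWindow) >= rmsThreshold)
                kept.Add(pos);
        return kept;
    }
}
```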


10.1.4 Performance Evaluation

The performance of the algorithms was evaluated with three tracks of short, medium and long duration. The test computer has a mobile Intel Pentium 4 processor running at 3.06GHz with 2GB RAM.

Table 14: Running time of the beat and key detection algorithms

Artist / Title | Length | BPM Time (s) | Key Time (s)
Sasha - Arkham Asylum | 13:20 | 24.527 | 69.489
Travel - Bulgarian | 06:32 | 10.324 | 26.788
Nikkfurie - The a la menthe | 02:24 | 3.949 | 9.233

The beat detection is quite fast, even for the Sasha track, which is more than 13 minutes long. The key detection algorithm is slower than the beat detection; this is inevitable as the analysis is much more in-depth. Also, calling MATLAB from C# reduces performance slightly for the key detection algorithm.

10.2 Qualitative Evaluation

10.2.1 Automatic Beat Matching

The automatic beat matching function does work as expected. However, it relies heavily on an accurate BPM estimate for both tracks loaded in the decks. Synchronising a track to the tempo of another adjusts its tempo by a suitable amount, so that it equals the detected BPM of the other track. Unfortunately, not all tracks are guaranteed to have a BPM estimate accurate enough for the beat matching function to work as intended. If the BPM estimate for either track is slightly inaccurate, it will not take long for the two tracks to drift out of sync with each other, and a train-wreck mix to take over. If the two tracks do come out of sync, it is then up to the user to alter the speed of one of the tracks to match the other. I attempted to use accurate timing functions to detect the timings of the beats of each playing track, and then to calculate which track needed speeding up or slowing down, so that the speed could be adjusted automatically. However, with the interval between beats being sensitive to even a slight difference in milliseconds, the timing was never accurate enough; any increase in CPU usage affected the interval being reported, so instead of timing the interval between beats I was also timing the speed at which the code was being processed. Once the tempos of the tracks are manually adjusted, the snap-to-beat functionality comes into use and will reposition both tracks at their next beat, making it easy to tell whether or not the two tracks are at the same tempo. The ability to skip through a track in terms of its beats allows the user to reposition a track to its next or previous beat when it is a beat behind or in front of the other track.
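The tempo adjustment applied by a sync operation is simple once the detected BPMs are known, as the sketch below shows. It matches the worked example in the user guide, where 136.01 BPM is raised by roughly 2.7% to reach 139.68 BPM; the method name is illustrative only.

```csharp
// Sketch: percentage tempo change needed to sync the incoming deck to the playing deck.
static class TempoSync
{
    public static double SyncPercent(double playingBpm, double incomingBpm)
    {
        return (playingBpm / incomingBpm - 1.0) * 100.0;
    }
}

// Example: SyncPercent(139.68, 136.01) is approximately +2.7 (per cent).
```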

10.2.2 Graphical User Interface

The user interface of the program is designed for users who are experienced with either physical DJ equipment or other existing software. These users should find the design and layout easy to use and much less cluttered than existing DJ software. Anyone not used to this environment may find the various buttons, sliders and visual displays overwhelming at first sight; even so, universally recognisable functions such as play, pause, eject, mute and volume controls should be fairly intuitive.


The user interface was evaluated with users split into two categories: those who are familiar with DJ concepts and have had previous experience of DJ software or equipment, and those who are familiar with computer software but unfamiliar with similar DJ software or equipment. Both sets of users could easily carry out simple tasks such as loading and playing tracks in the decks. The advanced users instantly recognised the layout of the software; the decks and the crossfader are fairly standard in any DJ software. The users unfamiliar with DJ software did not understand the concept of beat mixing and therefore did not understand the process to go through in mixing two tracks together; even so, the controls for adjusting tempo and pitch were intuitive and they instantly understood how to use them. Both sets of users understood and could interact with the visual displays, and both could work out how to detect the key of a track. Nielsen developed 10 heuristics by which a user interface can be evaluated. These are listed below with a brief description of how well each heuristic is fulfilled in the system.

• Visibility of system status: The system always keeps users informed about what is going on, through the use of progress indicators, i.e. during beat detection/waveform generation/key detection. A separate progress bar also shows the current CPU usage to the user.

• Match between system and the real world: The system does speak the user's language; terms such as 'Deck', 'Crossfader' and 'Mixer' should be familiar to any budding DJ, at whom this program is primarily aimed. The layout of the system matches that of a real-life DJ setup, and the ability to drag the waveform forwards and backwards mimics the ability to push a vinyl record forwards and backwards on a real turntable.

• User control and freedom: The system does not have support for undo or redo functions. Allowing the user to cancel loading a song, or to cancel detecting the key of a song which they clicked on by accident, would be a welcome addition.

• Consistency and standards: The system uses a consistent design and layout throughout. The user should be familiar with the words and terms used to describe functions or to display information.

• Error prevention: The system provides helpful error messages upon error-prone conditions, such as pressing the play button with no track loaded, or trying to load a track into a deck which is currently loading another track. The system exits gracefully if it cannot process a certain file.

• Recognition rather than recall: Status bars display the track which is currently being loaded into the deck, or which is currently being key detected, so the user does not forget which track they selected.

• Flexibility and efficiency of use: There are a few built-in accelerators which are a side effect of using Windows Forms to produce the GUI. These allow the user to control sliders such as tempo/pitch/volume with the cursor keys rather than the mouse. Apart from this, the support for shortcuts to speed up common functionality is limited.

• Aesthetic and minimalist design: The system has a simplistic yet informative design. Only relevant information is ever displayed, and it is displayed in a clear, concise manner.

• Help users recognize, diagnose, and recover from errors: The error messages are expressed in plain language for the most part, and explain what the problem is and how to prevent it happening again. The only error messages that may confuse the user are those which come from the FMOD system, but these should only occur when a corrupt file is loaded.

• Help and documentation: The system can be used without documentation for the most part, as most of the functions it provides are straightforward. However, when it comes to mixing two tracks together, some help explaining the concepts to the novice user would be useful. The user guide is a brief summary of how to use the system and its various functions, and should provide enough help to enable even novice users to use the system effectively.

10.2.3 Pitch Shift and Time Stretching Functions

The pitch shift and time stretching functions work as they should. Adjusting the pitch slider adjusts the pitch of the track in the way expected, without altering the tempo of the song. Adjusting the tempo of the track does alter the tempo of the song, and with the key lock function disabled, it alters the pitch at the same time. The change in the key of the track is displayed to the user as the tempo increases or decreases in steps of +/- 6%, because this change in tempo corresponds to a semitone change in the key. With the key lock enabled, adjusting the tempo of the track keeps the pitch constant, and so the key stays constant.
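The relationship between tempo change and key shift can be written out explicitly: a semitone is a frequency ratio of 2^(1/12) ≈ 1.0595, so a +/- 6% tempo change (with key lock off) is very close to one semitone. The small sketch below computes the shift for an arbitrary tempo change; it is illustrative and not taken from the project code.

```csharp
// Sketch: semitone shift produced by a given percentage tempo change.
using System;

static class KeyShift
{
    public static double SemitonesForTempoChange(double percentChange)
    {
        double ratio = 1.0 + percentChange / 100.0;
        return 12.0 * Math.Log(ratio, 2.0);   // e.g. +6% -> ~1.01 semitones
    }
}
```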

10.2.4 Overall Evaluation

This project was implemented primarily in C# using the FMOD library. The motivation behind using C# was based on familiarity with the language and the speed and ease with which a user interface can be developed with Windows Forms. Windows Forms lets the developer drag and drop various built-in controls, such as buttons and sliders, onto forms in a WYSIWYG designer, which aids rapid application development. The disadvantage of using C# for such a project is that, compared to C++ or C code, it is not as fast. This is because C# runs as 'managed' code on the .NET runtime rather than directly on the processor. The main libraries used in this project, i.e. MATLAB and FMOD, are developed in C++/C, or 'unmanaged' code, which runs natively on the processor.


The C# interface to these libraries therefore suffers a slight loss of performance in the conversion between managed and unmanaged code; if the application had been developed in C++ or C then this would not have been a problem. Even so, the application still has good performance. FMOD was chosen because of its ease of use and the wide functionality it offers. There is no other library available free of charge that offers the freedom, performance and breadth of functions which FMOD offers; without FMOD, the project would not have been possible. MATLAB was used in the key detection algorithm primarily because of its support for the short time Fourier transform. It offers support for many other digital signal processing techniques, and if the project were to be started again, MATLAB would probably be used to develop the beat detection algorithm as well.

11. Conclusion

This chapter gives an appraisal of the system and discusses further work which could be undertaken to extend the project.

11.1 Appraisal

The project's primary aim was to create a tool which would help DJs to perform harmonic mixing. This aim has been fulfilled to some extent; the project provides accurate key detection and reasonably accurate beat detection, which help the DJ select suitable tracks, and it provides the functionality to enable the DJ to mix two selected tracks together. No other DJ software on the market today combines key detection, beat detection and the ability to mix tracks in real time into one package. The key detection algorithm is the main algorithm, bringing together many different ideas from current music analysis research. The short time Fourier transform used to transform the signal to the frequency domain has proven to be a very worthy alternative to other transforms used in key detection algorithms; the key detection has a 73% success rate on dance music, which could potentially reach 87% with minor improvements, and a 90% success rate on classical music. The method of using a chroma vector with pattern matching techniques to select a key is the basis of many key detection algorithms described in the background section, and one which performs well in this algorithm. The weighting system used to reward the most suitable key has been shown to improve the accuracy of the key detection result on certain tracks where there are many key candidates. Finally, I have experimented with using the harmonic product spectrum to try to remove non-harmonic overtones and improve the accuracy of the computed chroma vector. Unfortunately, by removing some relevant harmonic frequencies, this optimisation has proven too aggressive and does not improve the results on dance music. The beat detection algorithm shows an 80% success rate at detecting the tempo to within +/- 1.5% of the actual BPM. This is good enough to enable a DJ to select tracks knowing that they are within a certain tempo range of one another; however, it is not accurate enough to perform automatic beat matching successfully. It gives very accurate results for songs which have areas of consistent beats, but it does not perform well on tracks which lack these features. The parameter evaluation showed that the algorithm is sensitive to small changes in its parameters.


The parameters chosen in the final implementation do not give optimum results for every track tested, so tuning the parameters to suit a wider variety of tracks would increase the accuracy. The automatic beat matching function works well when the two tracks used have an accurate BPM estimate; the function sets the tempos of the tracks equal to one another and starts the incoming track at the same time as a downbeat occurs in the currently playing track. The only problem is that, most of the time, the BPM estimates on which the tempo changes are based are slightly inaccurate, which causes the mix to go out of sync. The usability of the tool ranked highly both with users who were familiar with the concept of DJ mixing and with users who were not. The interface fulfils most of the ten usability heuristics set out by Nielsen as a way of analysing user interfaces.

11.2 Further Work

There are various extensions to the work outlined in the Extended Specification of the Appendix. Some of these extensions have already been carried out in the final implementation. The following discussion covers opportunities for further work that have arisen as a result of implementing the system. The key detection algorithm can be improved in a number of ways. Different methods of transforming the signal into the frequency domain, such as the discrete wavelet transform or the constant Q transform (26), could be investigated instead of the STFT. These may lead to more accurate chroma vectors being produced. Using a finely grained 36-bin chroma vector and applying a tuning algorithm as described in Harte (26) could lead to further improvements in the quality of the algorithm, as could extending the weighting function, possibly to reward key candidates that came a close second or third in the correlation stage of the algorithm. Using a statistical approach to matching the chroma vector with key templates, such as training a hidden Markov model to detect the key, could also increase the predictive accuracy of the algorithm. The function of the key detection algorithm could be extended beyond identifying the overall key of the track. By splitting the song up into smaller sections, the key of each section of the song could be found and stored. This could then be used to transcribe or 'reverse engineer' a song into basic manuscripts or even a symbolic audio format such as MIDI files. Chroma vectors are currently being used in music analysis research to identify repetitive sections of a song; for example, the chorus and verse sections of a song could be extracted based on how the key changes throughout the song. The beat detection algorithm could be improved by using an algorithm which converts the signal from the time domain to the frequency domain. The song could be split up into certain frequency bands ranging from low to high frequencies. The detection of beats could then be more selective; for example, if we assumed that beats only occur in low frequency bands, we could filter out beats detected in the higher frequency bands. A highly accurate beat detection algorithm capable of detecting the tempo to within +/- 0.1% of the actual BPM would improve the automatic beat matching function, meaning that the beats of each track would be guaranteed to stay in sync.


12. Bibliography

1. Camelot Sound. Harmonic-mixing.com - The History of DJ Mixing. [Online] www.harmonic-mixing.com.
2. T. Beamish. A Taxonomy of DJs - Beatmatching. [Online] August 2001. http://www.cs.ubc.ca/~tbeamish/djtaxonomy/beatmatching.html.
3. Technics. Technics Europe. [Online] 2007. http://www.panasonic-europe.com/technics/.
4. A. Cosper. Art and History of DJ Mixing. [Online] 2007. http://www.tangentsunset.com/djmixing.htm.
5. Number A Productions. Scales and Key Signatures - The Method behind the Music. [Online] 2007. http://numbera.com/musictheory/theory/scalesandkeys.aspx.
6. Camelot Sound. Harmonic-Mixing.com - The Camelot Sound Easymix System. [Online] 2007. http://www.harmonic-mixing.com/overview/easymix.mv.
7. S. Pauws. Musical key extraction from audio. Proceedings of the 5th ISMIR. 2004, pp. 96-99.
8. C. Krumhansl. Cognitive Foundations of Musical Pitch. 1990.
9. A. Sheh and D.P.W. Ellis. Chord Segmentation and Recognition using EM-Trained Hidden Markov Models. Proceedings of the 4th ISMIR. 2003, pp. 183-189.
10. T. Fujishima. Realtime chord recognition of musical sound: A system using Common Lisp Music. 1999.
11. K. Lee. Automatic Chord Recognition from Audio Using Enhanced Pitch Class Profile. International Computer Music Conference. 2006.
12. M. Goto. A Robust Predominant-F0 Estimation Method for Real-Time Detection of Melody and Bass Lines in CD Recordings. June 2000, pp. 757-760.
13. R. Walsh, D. O'Maidin. A computational model of harmonic chord recognition.
14. E.D. Scheirer. Tempo and beat analysis of acoustic musical signals. January 1998, Vol. 103, pp. 588-601.
15. A.P. Klapuri, A.J. Eronen and J.T. Astola. Analysis of the meter of acoustic musical signals. January 2006, Vol. 14, pp. 342-355.
16. G. Tzanetakis, G. Essl and P. Cook. Audio Analysis using the Discrete Wavelet Transform. September 2001.
17. F. Patin. Beat Detection Algorithms. [Online] 2007. http://www.gamedev.net/reference/programming/features/beatdetection/.
18. Native Instruments. Traktor DJ Studio. [Online] www.native-instruments.com.
19. Mix Share. Rapid Evolution. [Online] www.mixshare.com.
20. Y. Vorobyev. Mixed in Key. [Online] www.mixedinkey.com.


21. Z. Plane Development. tONaRT Key detection algorithm. [Online] www.zplane.de.
22. MixMeister Technology, LLC. MixMeister DJ Mixing Software.
23. M. Frigo, S.G. Johnson. The Design and Implementation of FFTW3. Proceedings of the IEEE 93. 2005, Vol. 2, pp. 216-231.
24. Firelight Technologies. FMOD SoundSystem. [Online] www.fmod.org.
25. Hundred Miles Software. UltraID3Lib. [Online] 2007. http://home.fuse.net/honnert/hundred/?UltraID3Lib.
26. C.A. Harte and M.B. Sandler. Automatic Chord Identification Using a Quantised Chromagram. Proceedings of the 118th Convention of the Audio Engineering Society. 2005.
27. C. Bores. Introduction to DSP. [Online] www.bores.com.
28. S.M. Bernsee. DSP Dimension. [Online] www.DSPDimension.com.
29. Camelot Sound. Harmonic-Mixing. [Online] www.harmonic-mixing.com.
30. Mazurka Project. Harmonic Spectrum. [Online] 2007. http://sv.mazurka.org.uk/MzHarmonicSpectrum/.
31. B. Hollis. The Method behind the Music. [Online] www.numbera.com/musictheory/theory/.


13. Appendix

Appendix A: Introduction to Digital Signal Processing

Digital Signal Processing (DSP) is the process of manipulating a signal digitally, either to analyse it or to create various possible effects at the output. Music can be thought of as a real world analogue signal. For music to be processed digitally by a computer it must be converted from a continuous analogue signal to a digital signal made up of a sequence of discrete samples. This conversion is called sampling. The analogue signal that we want to convert to a digital representation is sampled many times per second. Each sample is simply a binary encoding of the amplitude at that sampling instant. The sampling frequency is the number of samples obtained per second, measured in hertz (Hz). If we sample at too low a rate, we could miss changes in amplitude that occur between samples, and we could also mistake higher frequency signals for lower ones. This is called aliasing. To prevent aliasing, and to reconstruct the original signal completely and exactly, we must sample at a rate which is more than double the highest frequency in the original signal that we want to capture; this result was proven in the Nyquist-Shannon sampling theorem. Most digital audio is recorded at 44,100 Hz (44.1kHz), so the Nyquist frequency for this sample rate is 22,050 Hz, which is approximately the highest frequency perceivable by the human ear. PCM, or pulse code modulation, is the most common method of encoding analogue audio signals in digital form. The diagram below shows the sampling of a signal for 4-bit PCM.

Figure 37: Sampling of a signal for 4-bit PCM

The following diagram shows how the FMOD library (24) stores raw PCM audio data in memory buffers and how it differentiates between samples, bytes and milliseconds. In this format it can be seen that a left-right pair is called a sample.


Figure 38: How FMOD stores audio data
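The conversions between these units follow directly from the PCM layout. The sketch below shows the standard relationships for interleaved 16-bit stereo audio at 44.1kHz; these are generic PCM calculations, not calls into the FMOD API.

```csharp
// Sketch: converting between samples, bytes and milliseconds for 16-bit stereo PCM.
static class PcmUnits
{
    const int SampleRate = 44100;   // samples per second, per channel
    const int Channels = 2;         // a left-right pair counts as one sample
    const int BytesPerValue = 2;    // 16-bit PCM

    public static long SamplesToBytes(long samples) =>
        samples * Channels * BytesPerValue;

    public static double SamplesToMilliseconds(long samples) =>
        samples * 1000.0 / SampleRate;

    public static long MillisecondsToSamples(double ms) =>
        (long)(ms / 1000.0 * SampleRate);
}
```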

Once we have access to the audio data in digital format, we can apply many functions to manipulate it. Filtering is a common process used to either select or suppress certain frequency ranges of the signal. A low-pass filter removes all high frequency components from a signal above a certain bound, allowing the low frequencies to pass through it normally. In music, high frequencies equate to the treble and low frequencies equate to the bass. Therefore a low pass filter would return only the bass elements of a song. A high pass filter returns a signal containing only treble frequencies above a certain bound. A band-pass filter returns a signal containing frequencies between a specified lower and upper bound. Filtering is made possible through a technique called convolution. Convolution takes the original signal, and with a shifted reversed version of the same signal, it finds the amount of overlap between the two signals. It is a very general moving average of the two signals. This process is very computationally expensive as every point in the original signal has to be multiplied by the corresponding point in the transformed signal. Correlation is another function which shows how similar two signals are, and for how long they remain similar when one is shifted with respect to the other. The idea is the same as convolution, however the second signal is not reversed, just shifted by a certain factor. Correlating a signal with itself is called autocorrelation and can be used to extract a signal from noise. Full wave rectification takes the absolute values of each sample, so that a signal can be treated on a purely positive amplitude scale. Half wave rectification is an example of a clipper, in which negative valued samples of a signal are blocked, whilst positive valued samples are untouched. One of the most important techniques used in DSP (and in almost every algorithm described below) is called the Fourier Transform. Jean Baptiste Joseph Fourier showed that any signal could be reconstructed by summing together many different sine waves with different frequencies, amplitudes and phase. The discrete Fourier transform (DFT) is a specific type of Fourier transform used in signal processing. A computationally efficient algorithm to calculate the DFT is called the fast Fourier transform (FFT). There are many different algorithms to calculate the FFT, the most popular being the Cooley-Tukey algorithm. The FFT basically takes a signal in the time domain as input and returns a spectrum of the frequency components which make up every sample in the signal. Many techniques in DSP operate in the frequency domain so the FFT is a good way of converting a signal to the frequency domain from the time domain to enable certain operations to be carried out. An inverse FFT is used to convert the data back into the time domain. The short term Fourier transform (STFT) is a specialised version of the FFT more applicable to polyphonic, non-stationary signals such as music. Essentially, it applies the FFT to small sections of the signal at a time, in a process called windowing.
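The windowing idea behind the STFT is illustrated by the sketch below: the signal is split into overlapping frames, each frame is multiplied by a Hann window, and each windowed frame is transformed separately. A naive DFT is used here purely for clarity; a real implementation would use an FFT (such as FFTW or MATLAB's, as in the project), since the naive form is far too slow for full tracks.

```csharp
// Sketch: magnitude spectra of overlapping, Hann-windowed frames (STFT idea).
using System;

static class Stft
{
    public static double[][] MagnitudeSpectra(float[] signal, int frameSize = 4096, int hop = 1024)
    {
        int frames = signal.Length >= frameSize ? (signal.Length - frameSize) / hop + 1 : 0;
        var result = new double[frames][];

        // Hann window reduces spectral leakage at the frame edges.
        var window = new double[frameSize];
        for (int n = 0; n < frameSize; n++)
            window[n] = 0.5 * (1.0 - Math.Cos(2.0 * Math.PI * n / (frameSize - 1)));

        for (int f = 0; f < frames; f++)
        {
            var frame = new double[frameSize];
            for (int n = 0; n < frameSize; n++)
                frame[n] = signal[f * hop + n] * window[n];

            // Naive DFT magnitude for bins 0 .. frameSize/2 (O(N^2), for illustration only).
            var mags = new double[frameSize / 2 + 1];
            for (int k = 0; k < mags.Length; k++)
            {
                double re = 0.0, im = 0.0;
                for (int n = 0; n < frameSize; n++)
                {
                    double angle = 2.0 * Math.PI * k * n / frameSize;
                    re += frame[n] * Math.Cos(angle);
                    im -= frame[n] * Math.Sin(angle);
                }
                mags[k] = Math.Sqrt(re * re + im * im);
            }
            result[f] = mags;
        }
        return result;
    }
}
```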


Wavelets and the discrete wavelet transform (DWT) offer an even better alternative to the STFT. More can be read about the DWT in Audio Analysis using the Discrete Wavelet Transform (16). More on DSP techniques in general can be found in the online introduction to DSP by Bores (27).

Appendix B: Specification

Aims of the project

The aim of this project is to analyse pieces of music to detect their tempo (beat detection) and their key (harmonic detection). The system will work by accepting up to two music files at a time from the user, analysing them, and returning a visualisation of the audio content with beat markers and an indication of the key of the song. There should be intuitive controls to enable the user to play, pause, stop, alter the volume, pitch-shift and time-stretch the song. There should also be a function which can automatically mix together two songs based on the detected beats and keys of the songs. The big idea of the project is to aid a DJ to perform a perfect beat and harmonic mix; to devise a program that will enable a DJ to mix together two songs which have a similar tempo and a similar key, so that they 'sound good' together.

Core Specification

The system must be able to analyse music in the following common file formats: .wav (Microsoft Wave files), .mp3 (MPEG I/II Layer 3), .wma (Windows Media Audio format), .ogg (Ogg Vorbis format). The system is only required to deal with music containing a prominent, distinguishable beat; it is only expected to deal with music of a similar style to that which a DJ would play at a night club. The system must be able to detect the beats from the music files accurately, and provide visualisations of those beats to the user as the music file is played. The system must be able to detect the key of the track accurately, and display the detected key to the user. The system must be able to play two tracks at the same time, enabling the user to alter certain properties of each track independently of the other. The system should treat each track as a separate unit, rather like a physical deck or turntable in a DJ set-up. There should also be a mixer unit, which stands between the two tracks and enables the user to mute or un-mute each track or to cross-fade one track into the other, so that they are both audible at the same time. The properties of each track that the user should be able to adjust are: its volume, its tempo (independent of pitch; time-stretching), its pitch (independent of tempo; pitch-shifting) and both its tempo and pitch together. By varying the tempo, the user is also varying the track's key; for example, a ± 6% adjustment in the BPM rate would cause a change of one semitone in key. This adjustment in key should also be taken into account and updated based on the changes in tempo. The system should be able to return some sort of visualisation of the currently playing tracks to the user. A plot of the waveform of the currently playing track, along with visual beat markers that mark out the start of a new beat, would be very useful to a user looking to beat-match two songs. An oscilloscope or VU meter outlining the amplitude or volume of each track would help the user detect which track needs to be turned up or down, so that when the tracks are mixed together both tracks can be heard and one does not drown out the other.


The BPM and key detected by the program should be stored in a tag within the music file, such as the ID3v2 format (for MP3 files). This then makes it possible for the program to indicate which other tracks would be suitable to mix with a selected track. The system can indicate tracks with a key compatible with the selected one based on Camelot's easymix system, as described in the background section. Additionally, the system should be able to read and display common tags or metadata from loaded files, such as the artist and title of a song. The system should be able to automatically beat-match two tracks. First it detects the beats and key of each track. Then it time stretches or adjusts the speed of each track to the same speed and a compatible key. Finally, by overlaying one track on top of another at the beginning of a beat, the two tracks should be beat-matched (the beats of each track should fall at precisely the same time), making a seamless harmonic mix. The project is to be written in C#.NET, making use of the power of Windows Forms for the graphical user interface. The FMOD sound system (24) will be the main library used. This library contains various functions that will enable the raw sound data to be extracted from the audio files and used in the algorithms. It includes a pitch-shifting algorithm which can also be used to do time stretching; this algorithm is based on code developed by Bernsee (28). The graphical user interface should be clear and intuitive. Using sliders, the user should easily be able to adjust each property of the track. Buttons with intuitive icons should make their function clear to the user.

Extended Specification

If time permits, there are various extensions to the project that could be implemented to improve the overall functionality of the system. Improvements to the beat detection algorithm could be investigated in order to enable the system to detect beats in a wider range of musical styles, such as rock. The addition of cue- and loop-points gives the user more control over the mix. Cue-points enable the user to start playback of a file from a defined point, such as the time of the first beat. Loop-points enable the user to repeat certain regions of the song, possibly to extend the length of a mix. Various effects could be added to the mixer unit, such as reverb, echo and flange. Low and high-pass filters enable the user to filter out the sounds of one track which may not fit well with those of another, e.g. a low pass filter could be used to eliminate an irregular high-pitched hi-hat sound. The addition of real-time recording of mixes, including applied effects, would enable the user to look back over the mix to decide if two songs sound good together and would make pre-recorded mixing a possibility. Finally, extending the application with an integrated file browser displaying information about the user's song library would make it easier for the user to load new songs and generate playlists.


Appendix C: User Guide

Upon loading the program, you are presented with the main screen. This consists of two decks, Deck A and Deck B, a Mixer containing volume controls and crossfader, and the music browser, which will display information on the tracks in your music library.


Figure 39: The Main Screen

Loading a track into a deck

A track can be loaded in one of three different ways. You can load a track into a deck by dragging and dropping an appropriate file from Windows Explorer onto one of the decks. You can also click the eject button on the appropriate deck; this will open a file browser dialog where you can navigate to a music folder and load one of your songs. The alternative method is to use the built-in music browser to navigate your computer for music folders. Once a folder is selected, the music browser will display information about the songs in the folder, such as the artist, title, duration, and the BPM and key if they have previously been detected. You can sort the tracks in ascending or descending order of any of these categories by clicking the column header of the appropriate category.


Figure 40: Loading Sasha - Magnetic North into Deck A

Select a track from the music browser and click the 'Load in Deck A' button to load the track into Deck A; the same applies for Deck B. You can also right click on the track and select the same option from the drop down menu. Once the song is loaded, the deck will display the waveforms of the track: one which covers the whole track, and a zoomed-in view which displays 6 seconds of the audio at a time. The deck displays information on the track such as its BPM and key. The deck now allows you to play and pause the track and to adjust its pitch and tempo.

Figure 41: The Deck Control


Detecting the Key of a track

Follow the same process as described above for loading a track, however instead of selecting 'Load into Deck A/B', choose the 'Detect Key' option. The program will then begin to detect the key of the selected track. Once again, a progress bar indicates the progress of the key detection process. Once the key detection process has finished, a table showing the results will pop up, and the status bar will display the key and keycode which will be written to the ID3 tag of the audio file. In Figure 42 the key detected is clearly a C major.

Figure 42: Key Detection progress/results


Mixing two tracks

First load tracks into Deck A and Deck B. These tracks should be selected based on their detected BPM and key. A track with keycode 4A can be harmonically mixed with any track with keycode 3A, 4A, 5A or 4B. The BPMs of the two tracks should not differ by much (+/- 10 BPM at most); trying to mix tracks that differ largely in their BPMs will not usually sound good. In Figure 43, the two tracks are X-Cabs – Neuro 99 in deck A, with a keycode of 10A and BPM of 139.68, which is going to be mixed into Xstasia – Sweetness, which has a compatible keycode of 11A and BPM of 136.01.

1. Make sure the crossfader is all the way to the left hand side and start the track in deck A.
2. Enable the sync button on the track in deck B. The sync button will sync the non-playing track to the tempo of the playing track. In Figure 41, you can see the tempo of Xstasia in deck B has been automatically increased by 3.67 BPM (2.7%) to match the BPM of 139.68 of X-Cabs in deck A.
3. Now cue up the track in deck B to its first downbeat, as described in the 'Illustration of beat mixing' section of the background. This can be achieved by dragging the waveform until the first beat marker is found, and aligning the beat marker with the centre of the waveform.

Figure 43: Crossfader in left hand position

4. When the playing track reaches a downbeat, press play on deck B. The track will only start playing when the next beat marker of song A passes the centre of the waveform.
5. Now both tracks are playing, but because the crossfader is in the left hand position, only deck A's output is audible. Drag the crossfader to the central position so that both tracks can now be heard, as in Figure 44. If the beats are in sync, the mixed output will still sound like one song. If the beats are not matched, the beats of the two songs will clash at irregular intervals and the overall output will not make any musical sense!



Figure 44: Crossfader in central position

6. To correct an out-of-sync mix, click the 'Sync to Beat' button on either of the decks. This will shift the tracks to their next beat.
7. If the tracks still become out of sync, this means they are not at the correct tempo. To correct this, make minor adjustments to the tempo of the song in deck B by clicking the + / - buttons on the tempo control, depending on whether it is slower or faster than the other playing song.
8. Move the crossfader all the way over to the right once you are finished mixing the two songs, as in Figure 45. Now the song in deck B is playing and you can load another song into deck A.

Figure 45: Crossfader in right hand position

