Melodic Similarity: Concepts, Procedures, and Applications

Computing in Musicology 11

Edited by Walter B. Hewlett and Eleanor Selfridge-Field

The MIT Press
Cambridge, Massachusetts
London, England

CCARH
Stanford University
Stanford, CA
The series Computing in Musicology is a co-publication of the Center for Computer Assisted Research in the Humanities and The MIT Press. Established in 1985, CM treats topics related to the acquisition, representation, and use of musical information in applications related to musical composition, sound, notation, analysis, and pedagogy, and to significant digital collections of textual information supporting the study of music.

Editorial matters and enquiries concerning submissions should be directed to CCARH, Braun #129, Stanford University, Stanford, CA 94305-3076. Prospective contributors should consult the guidelines given on the last page of this book and send a query to [email protected].

Editors: WALTER B. HEWLETT, ELEANOR SELFRIDGE-FIELD
Associate Editor: EDMUND CORREIA, JR.
Assistant Editor: DON ANTHONY
Advisory Board: MARIO BARON, LELIO CAMILLERI, TIM CRAWFORD, EWA DAHLIG, ICHIRO FUJINAGA, DAVID HALPERIN, JOHN WALTER HILL, KEIJI HIRATA, JOHN HOWARD, DAVID HURON, THOMAS J. MATHIESEN, KIA NG, JOHN STINSON, YO TOMITA, ARVID VOLLSNES, LISA WHISTLECROFT, FRANS WIERING

Volume 11 and subsequent issues of Computing in Musicology are distributed by The MIT Press, Massachusetts Institute of Technology, Cambridge, MA, and London, England (http://mitpress.mit.edu). Back issues, highlights of which are listed on the inside back cover, are available from CCARH.

CCARH welcomes queries, offprints, unpublished studies (including theses), notices of work-in-progress, and citations for Web sites of interest to its readers. Links may be found at http://musedata.stanford.edu
© 1998 Center for Computer Assisted Research in the Humanities

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

This issue of Computing in Musicology is dedicated to the memory of Prof. Dr. Helmut Schaffrath (1942-1994).

ISBN 0-262-58175-2
ISSN 1057-9478
Library of Congress Catalog Card Number 98-88104

Printed and bound by Thomson-Shore, Dexter, Michigan.

Java and UltraSPARC are registered trademarks of Sun Microsystems, Inc. Macintosh is a registered trademark of Apple Computer, Inc. MuseData is a registered trademark of the Center for Computer Assisted Research in the Humanities. SCORE is a registered trademark of San Andreas Press. Windows is a registered trademark of Microsoft Corp. Other product names mentioned in this text may also be protected.

The Center for Computer Assisted Research in the Humanities is a non-profit educational research facility located at Braun #129, Stanford University, Stanford, CA 94305-3076.
Tel.: (650) 725-9240; (800) JSB-MUSE
Fax: (650) 725-9290
[email protected]
http://musedata.stanford.edu
TABLE OF CONTENTS

Preface  vii

I. Concepts and Procedures  1

1. Conceptual and Representational Issues in Melodic Comparison  3
   Eleanor Selfridge-Field (CCARH, Stanford University)
      Concepts of Melody  4
      Searchable Representations of Pitch  15
      Searchable Representations of Duration and Accent  31
      Strategies for Multi-dimensional Data Comparison  36
      Prototypes, Reductions, and Similarity Searches  46
      Conclusions  54

2. A Geometrical Algorithm for Melodic Difference  65
   Donncha Ó Maidín (Department of Computer Science, University of Limerick, Ireland)
      General Properties of a Difference Algorithm  66
      Implementation  72

3. String-Matching Techniques for Musical Similarity and Melodic Recognition  73
   Tim Crawford, Costas S. Iliopoulos, and Rajeev Raman (Departments of Music and Computer Science, King's College, London, UK)
      Introduction  74
      String-Matching Problems in Musical Analysis  76
      Exact-Match Algorithms  78
      Inexact-Match Algorithms  87
      Musical Examples  89

4. Sequence-Based Melodic Comparison: A Dynamic-Programming Approach  101
   Lloyd A. Smith, Rodger J. McNab, and Ian H. Witten (Department of Computer Science, University of Waikato, Hamilton, NZ)
      Previous Research in the Field  102
      Sequence-Comparison Using Dynamic Programming  103
      Algorithms for Melodic Comparison  104
      Additional Operations for Comparing Music  108
      Effect of Music Representation and Match Parameters  108
      A Sample Application: Retrieving Tunes from Folksong Databases  110
      Conclusion  115

5. Strategies for Sorting Musical Incipits  119
   John Howard (Digital Librarian, Widener Library, Harvard University)
      Potentials and Practicalities  120
      The Frankfurt Experience  121
      The Harvard Experience  123

6. Signatures and Earmarks: Computer Recognition of Patterns in Music  129
   David Cope (Porter College, University of California, Santa Cruz)
      Signatures  130
      Earmarks  134

II. Tools and Applications  139

7. A Multi-scale Neural-Network Model for Learning and Reproducing Chorale Variations  141
   Dominik Hörnel (Institute for Logic, Complexity, and Deduction Systems, University of Karlsruhe, Germany)
      Background  142
      Task Description  143
      A Multi-scale Neural Network Model  144
      Motivic Classification and Recognition  148
      Network Structure  148
      Intervallic Representation  151
      System Performance  153
      Conclusions  156

8. Melodic Pattern-Detection Using MuSearch in Schubert's Die schöne Müllerin  159
   Nigel Nettheim (University of New South Wales, Australia)
      The MuSearch Program  160
      Musical Examples  163
      Conclusions  166

9. Rhythmic Elements of Melodic Process in Nagauta Shamisen Music  169
   Masato Yako (Kyushu Institute of Design, Fukuoka, Japan)
      Shamisen Music of the Edo Era  171
      Towards a Catalogue of Melodic Types  171
      Classification of Rhythmic Patterns  174
      Observations on Nagauta Shamisen Rhythmic Patterns and Features of Melodic Process  181
      Summary of Results  182

III. Human Melodic Judgments  185

10. Concepts of Melodic Similarity in Music-Copyright Infringement Suits  187
    Charles Cronin (School of Information Management, University of California, Berkeley)
       Fundamentals  189
       Demonstrating Melodic Similarity  192
       Some Case Decisions  193
       Conclusion  207

11. Judgments of Human and Machine Authorship in Real and Artificial Folksongs  211
    Ewa Dahlig (Helmut Schaffrath Laboratory, Polish Academy of Sciences, Warsaw, Poland) and Helmut Schaffrath (Hochschule für Musik, Essen University, Germany)
       Correlations of Judgment with Musical Features  216
       Correlations of Judgment with Social Factors  217

IV. Online Tools for Melodic Searching  221

12. MELDEX: A Web-based Melodic Index Service  223
    David Bainbridge (Department of Computer Science, University of Waikato, Hamilton, New Zealand)
       Databases  224
       Software  226
       Search Algorithm  227
       The New Zealand Digital Library Project  228

13. Themefinder: A Web-based Melodic Search Tool  231
    Andreas Kornstädt (Computer Science Division, Department of Human Interface Design, University of Hamburg)
       Using Themefinder  233
       List of Features  233
       Technical Background  236
       Future Plans  236

Index  237
PREFACE

With this issue, Computing in Musicology makes its debut as a co-publication with MIT Press. We believe that this change will benefit both our readers and our contributors. We welcome your comments.

This special issue is devoted entirely to the subject of melodic processing. Much of what it discusses is more concerned with conceptualization than with technology. Yet it is technology that makes the subject topical. Computers offer the potential of enabling us to perform many tasks related to melodic searching, recognition, comparison, or generation, yet their purveyors sometimes convey a false sense of the ease with which we can produce useful results. Quite apart from the details of technology, we lack an articulate vocabulary for discussing melody. The intellectual apparatus for framing specific questions awaits development. Technology cannot proceed without it, except in superficial ways that may enjoy only temporary value.

The subject of melody cuts across many subdisciplines, including music history, theory, analysis, cognition, perception, and ethnic studies. The subject has many bridges to other disciplines as well. Thus two articles bring perspectives to similarity-searching in general from the domain of computer science. Another contribution recounts legal judgments on claims of melodic plagiarism. One incorporates procedures from the social sciences. In the introductory article some fundamental issues of conceptualization and data representation are considered.

While the contributions included in this issue are representative of the range of approaches that have been proposed, they barely suggest all the possible implementations or adaptations to particular repertories or query types. Much important work is still to come.

Many of the contributions presented here have been given at conferences, but none has been published previously. A conference on computer applications in music (Belfast, 1991) was the original venue for the work of Selfridge-Field. John Howard's work was presented at the study session organized by the International Musicological Society's Study Group on Musical Data and Computer Applications (Madrid, 1992). Ó Maidín's paper was given at a meeting of the same group (London, 1997), while drafts of the papers by the King's and Waikato consortia were given at a special follow-on conference (King's College, London, also 1997). The contributions of Hörnel and Yako have been given at conferences in their respective fields and
countries. Cronin's paper provides the basis for a conference on music and copyright provisionally scheduled to take place in 1999. Kornstädt's work, which builds on software developed by David Huron, was scheduled to debut at the Colloquio sull'Informatica Musicale, held in collaboration with the IEEE Computer Society Task Force on Computer-Generated Music, in Gorizia (Italy) in September 1998.

It is wholly appropriate that we dedicate this issue of Computing in Musicology to our late colleague, Prof. Dr. Helmut Schaffrath, who was prodigiously involved in the development of datasets and tools for melodic searching for twelve years prior to his untimely death in 1994. The value of his example and his direct contribution to the resolution of issues both conceptual and practical has become more apparent with each passing year. Users of the Essen databases now seem to exist on five continents. Helmut is remembered as the capable organizer of the study group on computer applications of the International Council on Traditional Music (ICTM), as a key member of the study group on computer applications of the International Musicological Society (IMS), as an imaginative teacher at the Essen Gesamthochschule für Musik, and as a diligent scholar of the indigenous musics of Europe, South America, and China. Prior to his studies in these areas he had been apprenticed to an organ-builder, which may well have provided some of the intellectual and technical foundations for his prescient grasp of issues and his inventiveness in methodology. A substantial list of his publications accompanies his contribution to Beyond MIDI: The Handbook of Musical Codes (MIT Press, 1997). The article we include here, long dormant in our files, was summarized by Helmut, a few months before his death, from a more extensive study made in collaboration with Ewa Dahlig, who has become the official custodian of his work.

Significant contributions to the production of this issue were made by Don Anthony, who set most of the musical examples; by Akiko Orita, who explored the mysteries of obsolete Hiragana as used in shamisen music (thus enabling us to prepare the illustrations for Yako's article); and by Craig Sapp, who resolved many hardware and software issues at appropriate junctures. Ed Correia prepared camera-ready copy, implementing our new format and indexing the entire volume with characteristic diligence. Doug Sery and many other members of the MIT Press staff have been endlessly helpful in resolving the practical details of our new publication arrangements.
Readers should note that in December 1996 the Center for Computer Assisted Research in the Humanities moved to Stanford University and that address and contact information in earlier publications is now obsolete. Current particulars are given on the obverse of the title page. General information for contributors is found on the inside front cover and details of style are given on the last page of the book. The inside back cover itemizes the contents of previous issues of Computing in Musicology, which continue to be handled by CCARH.
STANFORD UNIVERSITY AUGUST 1998
I. CONCEPTS AND PROCEDURES
1 Conceptual and Representational Issues in Melodic Comparison

Eleanor Selfridge-Field
CCARH, Braun Music Center
Stanford University
Stanford, CA 94305-3076
[email protected]

Abstract

Melodic discrimination is a fundamental requirement for many musical activities. It is essential to psychological recognition and often signifies cultural meaning. In computer applications, melodic searching and matching tasks have been extensively explored in the realm of bibliographical tools and monophonic repertories but have been relatively little used in analyses of polyphonic works. Computer approaches to melodic comparison often produce vague or unacceptably prolific lists of false "matches." In some cases these occur because the underlying representation of the melodic material is too general. In other cases they arise from queries that are poorly adapted to the kind of repertory being examined or are inadequately articulated. Some queries assumed to be simple must operate within contexts that are conceptually complex or purposefully ambiguous. Lessons we have learned from finding-tool projects and monophonic repertories may help to clarify and resolve issues of representation, design, and conceptualization as we approach studies of entire corpora of encoded materials and daunting quests for "melodic similarity."
The ability to recognize melodic similarity lies at the heart of many of the questions most commonly asked about music. It is melody that enables us to distinguish one work from another. It is melody that human beings are innately able to reproduce by singing, humming, and whistling. It is melody that makes music memorable: we are likely to recall a tune long after we have forgotten its text.

It is often the subtlety of the effect that leads us to consider the possibility of engaging a computer in our research. Music bibliographers want to be able to overcome the deceptive impression of difference given by transposition or re-orchestration. They want help in resolving questions of identification and attribution. Music historians want dependable tools for locating similar tunes disguised by modular substitutions (as in the transmission of early chant), retexting (as in masses that parody motets), consolidation of parts (as in lute or keyboard transcriptions of chansons), and elaboration (as in divisions, diminutions, and variations). Folk-music researchers seeking to identify tune families depend on stress constants where rhythmic details, reflecting variations of text, may vary.

1.1 Concepts of Melody

1.1.1 Intellectual Frameworks

For two centuries or more theorists have concentrated their attention on harmony, counterpoint, and "form" in examining the fabric of music of the past. Rightly they proclaim that these aspects of music distinguish European art music from that of other cultures. In consequence, the component elements of harmony and counterpoint are rigorously and systematically described in the literature of music theory. Such terms as "I6" and "second species" are unambiguous in meaning, at least from the perspective of observation: we would not identify a V7 as a I6 or fourth-species counterpoint as second.

The rule-based vocabularies of harmony and counterpoint are moderately supportive of efforts at artificial composition in the styles of the sixteenth and eighteenth centuries. Schottstaedt's implementations of species counterpoint following Fux's teachings (1989) offer an apt example. It has generally been recognized by those engaged in generative applications that even those rule-systems which we regard as extensive are far from exhaustive. Thus the task of deriving "invisible" rules of practice, never expressed formally in music-theoretical works, has attracted welcome
attention. The rules accumulated through Kemal Ebcioglu's artificial harmonizations of chorale melodies set by J. S. Bach (1986; 1992), for example, now number more than 300.

The intellectual framework for discussions of melody in European art and folk music is not yet nearly so well formed. Formal discussions of melody in German and Italian music theory of the eighteenth century are engaging but few, particularly in comparison with the copious literature on harmony, counterpoint, and form of the past three centuries, and even, vis-à-vis recent literature, with expositions of reductionist techniques of musical analysis or principles of music perception and cognition. This is partly the result of the idea, current in earlier eras, that the construction of melody allowed for inspiration and experimentation, for permutation and transformation. The "invention" of a melody was considered to be concept-based, but it was not rule-driven in the same way that harmonizations and realizations of form were. Folk melodies were considered to have arisen "unconsciously."

There is little automatic agreement on definitions for various manifestations of melody. We may all believe we have a common focus in mind when we discuss our notions of a theme, a phrase, a motive, or a fugal subject. When it comes to the design of computer applications, however, we may disagree about thematic hierarchies, about phrase boundaries, about motivic overlaps, or about the termination point of a fugal subject. These conceptual lapses are quite paralytic, because computers cannot make intuitive judgments for themselves and there is too little consensus for them to make "scientific" judgments founded on our beliefs.

The degree to which melody is a dominant part of the musical fabric is another conceptual variable. It goes without saying that folksongs in the Western tradition frequently offer a 1:1 ratio of melodic content to overall length, but the ratio is almost always diminished in art music. The extent to which it is diminished is highly variable. The point of departure for many recent discussions of melodic and structural processes has been the opening theme of Mozart's Piano Sonata K. 331 (Figure 1).
Abstractions (2) and (4) can be collapsed into a single line in which the first operator following a numeral indicates melodic direction and the second indicates stress change, e.g., 1++1+-2-+2+-0++1--1-+1+-4-+. Zaripov's system is noteworthy for its combination of explicit and implicit information, for its effort to coordinate several strings of data at the same time, and for its model of collapsing multiple variables into a single string for easier processing.

Another approach to the coordination of elements representing both the pitch and duration domains is offered by the music psychologist Mari Riess Jones (1993). She differentiates between two kinds of pitch accents, melodic-contour accents and melodic-interval accents, and also between two kinds of implications of duration, which correspond to the literary concepts (traceable to antiquity) of qualitative and quantitative accents. She calls these strong-beat accents and elongational accents. From these she contemplates the possibility of investigating joint-accent structure and temporal phrasing. Joint-accent structure codes give a composite view of the amount of activity collected from earlier profiles, giving one stroke each for a "filled" beat, a "melodic" (i.e., pitch) accent, and a temporal accent (Figure 23). Jones's collective structures are conceptually similar to Lerdahl and Jackendoff's representation of metrical structure (1983: Chapter 4) in "time-span reductions."
2.1.2 Pitch Differences

For this repertory, two pitches under comparison may be converted into either a base-12 note number or a base-7 note number. In the first representation, which employs the MIDI key-number specifications for pitch, Middle C is represented by the number 60 and the note E above it by the number 64. The pitch difference between these two notes is taken as the positive difference between them, that is, 4. If, on the other hand, one were to use a base-7 representation, where Middle C = 1 and the E above it = 3, then a difference of 2 would be obtained. Using either of these pitch representations, an estimate of the melodic distance between two tune-segments can be obtained by summing such (positive) pitch differences. This is very roughly analogous to calculating the sum of the lengths of the heavy vertical lines superimposed in Figure 1. This algorithm has its origin in the idea of representing musical pitch geometrically (Krumhansl, 1990). To date extensive use has been made of the base-12 representation in MIDI-based analysis systems.

2.1.3 Note Durations

Intuitively it makes sense to weight difference measures by note lengths. Thus, all other things being equal, pairs of long notes contribute more to the difference measure than pairs of short notes. Here individual pitch differences are weighted according to the width of the window to which they belong.

2.1.4 Metrical Stress

The incorporation of metrical stress into the difference measure is achieved by assigning differential weights to notes that start at different places in a bar. These can be shown as a weight map which, for a piece in 6/8 time, might be assembled as in Table 1. The choice of the weights is arbitrary.
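To make the two numberings concrete, the following minimal sketch (my own illustration, not part of the original article; the note-name helpers are invented for the example, and accidentals are ignored) computes both representations and the summed positive differences for aligned segments:

```python
# A minimal sketch (not from the original article) of the two pitch
# representations described above: base-12 MIDI key numbers and base-7
# diatonic note numbers (Middle C = 1). Accidentals omitted for brevity.

NOTE_TO_SEMITONE = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}
NOTE_TO_STEP = {'C': 1, 'D': 2, 'E': 3, 'F': 4, 'G': 5, 'A': 6, 'B': 7}

def base12(note, octave=4):
    """MIDI key number: Middle C (C4) = 60."""
    return 12 * (octave + 1) + NOTE_TO_SEMITONE[note]

def base7(note, octave=4):
    """Diatonic note number: Middle C (C4) = 1."""
    return 7 * (octave - 4) + NOTE_TO_STEP[note]

def segment_distance(seg1, seg2, rep=base12):
    """Sum of positive pitch differences between aligned notes."""
    return sum(abs(rep(*a) - rep(*b)) for a, b in zip(seg1, seg2))

# Middle C against the E above it: 4 in base-12, 2 in base-7, as in the text.
assert base12('E') - base12('C') == 4
assert base7('E') - base7('C') == 2
```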
Table 1. Stress weights for 6/8 time.

    Distance into bar    Weight
    0                    4
    1/8                  2
    2/8                  2
    3/8                  3
    4/8                  2
    5/8                  2
    otherwise            1

The one-bar segments shown in Figure 2 are used for illustrating the operation of various difference algorithms.
Figure 2. Sample melodic segments for illustrating difference algorithms.

2.1.5 Transpositions

The difference algorithm can be expressed as

    diff(S1, S2) = Σk wsk · wk · |p1k - p2k|

where

    p1k is the pitch of the note from the first segment at the kth window,
    p2k is the pitch of the note from the second segment at the kth window,
    wk is the width of window k, and
    wsk is the weight derived from metrical stress for window k.
Writing Wk = wsk · wk for the combined weight of window k, the above formula can be expressed as

    diff(S1, S2) = Σk Wk · |p1k - p2k|

A key-independent version of the algorithm can be derived by transposing one of the segments so as to minimize the above difference. Considering various transposed versions of one of these tune-segments, such as a version in which the second tune-segment has been transposed up m semitones, we get

    diff(S1, S2; m) = Σk Wk · |p1k - (p2k + m)|

One possible way in which we can visualize a key-independent comparison is by making multiple estimates of the distance by means of one of the previous algorithms, where we allow one of the tune-segments to be transposed to all possible keys in the pitch vicinity of the other segment. A difference is calculated for each transposition, and the minimum value of this set of differences is taken as a measure of melodic similarity. This is illustrated in Figure 3 and in Figure 4 by considering a comparison between two related segments. We can see that the difference calculation for the original untransposed method gives 88, but that if the second segment is transposed down either a major or a minor second, a smaller value of 63 results. The process of finding this difference is equivalent to finding the value of m which minimizes

    Σk Wk · |p1k - p2k - m|
A well-known theorem in statistics (Aitken, 1939) enables us to find the required value of m which minimizes the sum, without the repeated calculations involved above: the minimizing value is the weighted median of the sequence of pitch differences (p1k - p2k), each difference having the associated weight Wk. In statistical applications, Wk is normally interpreted as a frequency. The use of this theorem gives us a way of arriving at the answer efficiently.
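A compact sketch of the whole procedure (my own illustration; the chapter's actual implementation, in C.P.N. View, is described in the next section) takes one pitch pair and one combined weight Wk per window:

```python
# A sketch (helper names are my own) of the transposition-independent
# difference described above. Each window k contributes a pitch pair
# (p1k, p2k) and a combined weight Wk = window width * stress weight.

def weighted_median(values, weights):
    """Value at which the cumulative weight first reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v

def difference(p1, p2, W, m=0):
    """Weighted difference, with the second segment transposed up m semitones."""
    return sum(w * abs(a - (b + m)) for a, b, w in zip(p1, p2, W))

def key_independent_difference(p1, p2, W):
    """Aitken's weighted-median theorem yields the minimizing transposition
    directly, without trying every m in turn."""
    m = weighted_median([a - b for a, b in zip(p1, p2)], W)
    return difference(p1, p2, W, m)
```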
Figure 3. Two related tune-segments for comparison.
Figure 4. Calculation of a transformationally independent difference. This example illustrates the evaluation of difference between Segment 1 and various transpositions of Segment 2. The difference matrix shown as Table 2 was produced by a transposition-independent difference algorithm using windows and stress weighting.
Table 2. Differences weighted by windows, stresses with transpositions.

         a      b      c      d
    b   217
    c    42    200
    d   133    154    133
    e   188     63    170     92

2.2 Implementation

The algorithm was implemented in C.P.N. View (for further information see the ScoreView user manual in Ó Maidín, 1995: 175-240), in the form of the difference function shown earlier. Settings are available to select the features discussed here, together with some extra features: options control such things as key transpositions, metrical stresses, and window widths; differences may be weighted by note durations at onset points; and either base-7 (diatonic) or base-12 (chromatic) pitch comparison may be selected. Additionally, the adjustment of weights is allowed for in cases where it is desirable to optimize the algorithm, as for example when a corpus is used for training. In practice, the algorithm worked well with the intuitive weights, and it was possible, by experimentation, to pick a critical value of difference for a particular application that partitioned the melodic segments into similar and dissimilar pairs in a reasonably satisfactory way.

References

Aitken, A. C. Statistical Mathematics (Edinburgh: Oliver and Boyd, 1939), I, 32.

Krumhansl, Carol L. Cognitive Foundations of Musical Pitch (Oxford: Oxford University Press, 1990), pp. 112-119.

Ó Maidín, Donncha. "A Programmer's Environment for Music Analysis," Technical Report UL-CSIS-95-1. Copies are available from the Department of Computer Science, University of Limerick, Ireland.
3 String-Matching Techniques for Musical Similarity and Melodic Recognition

Tim Crawford
Music Department
King's College London
London WC2R 2LS, England
[email protected]

Costas S. Iliopoulos
[email protected]

Rajeev Raman
[email protected]

Algorithm Design Group
Dept. of Computer Science
King's College London
London WC2R 2LS, England
www.dcs.kcl.ac.uk

Abstract

The primary goal of this paper is to identify some computational tasks which arise in computer-assisted musical analysis and data retrieval, to formalize them as string-pattern-matching problems, and to survey some possible algorithms for tackling them. The selection of computational tasks includes some foreseen as useful in research into such historical repertories as lute or keyboard music.
3.1 Introduction

We wish to identify some computational approaches used in the biological and technical sciences which may be of value in pursuing certain kinds of musical queries, and to formalize them as string-pattern-matching problems. The approaches are of two general kinds:

(1) Approaches for which computationally efficient procedures are suggested in the computer-science literature. By identifying these solutions we hope to provide a basis by which musicologists and computer scientists may collaborate to develop efficient software for musical analysis problems.

(2) Approaches for which computationally efficient solutions are not known to exist in the computer-science literature. By describing these unresolved problems we hope to stimulate further research in the field of string-algorithm design.

The approaches discussed here are representative rather than inclusive.

3.1.1 Objectives

An important direction of our research is towards a formal definition of musical similarity between such musical entities as "themes" or "motifs." We aim to produce a quantitative measure, or "characteristic signature," of a musical entity. This measure is essential for melodic recognition. It could have multiple uses, including that of data retrieval from musical databases. The ideal characteristic signature would be derived from the pattern of notes (and other musical objects) as they occur in temporal sequence in the musical entity. The note-pattern itself may be derived from an unstructured audio input, from symbolically encoded score data containing a high degree of logical structure, or from some intermediate state, such as a stream of MIDI commands, wherein pitches may be clearly identifiable but their structural relationship is not clear. Note-pattern derivation is not considered in this paper.
Two musical entities which are said to be similar will be expected to have matching characteristic signatures; that is, both entities will satisfy a set of properties (Cambouropoulos and Smaill 1995). A property is said to be satisfied when it achieves a certain score. Each property is assigned a certain weight, and the characteristic signature is the combination of the weighted properties of objects in the musical entity. Intuitively, these properties will encode patterns in the musical entity. It is hoped that the pattern-matching and pattern-discovering problems that we discuss in this paper will provide the basis for obtaining a set of properties that can be used as parameters in musical similarity and as a touchstone for creating the characteristic signature of a piece of music.

3.1.2 Computational Resources for Musicologists

Textbooks on string algorithms, although few, may sometimes be useful. Apostolico and Galil (1985, 1997) provide surveys of the majority of fundamental algorithms and results on strings. The textbook of Crochemore and Rytter (1996) is the only one covering a plethora of problems in this area. Aho (1990) gives an excellent survey focussing on a certain set of problems. In relation to possible connections between computational biology and computer-assisted musicology, Setubal and Meidanis (1997) give an excellent introduction to computational molecular biology.

Some software packages designed for molecular biology may possibly be useful for musical analysis. FAST is a family of programs for sequence database search: for example, FASTP is a subprogram for general sequence comparison, LFASTA is a tool for local similarity comparison, and TFASTA is a sequence query tool. The FAST implementation is described by Lipman and Pearson (1985), Pearson and Lipman (1988), and Pearson (1990). BLAST is another sequence-similarity tool, described in Altschul et al. (1991). A more general and wider-ranging software library for string processing, aimed at a larger range of applications, is also under development (Czumaj et al. 1997).
3.2 String-Matching Problems in Musical Analysis

3.2.1 A Perspective from Computational Science

In computational science, string-matching procedures have their own jargon. A string is a (usually finite) sequence of symbols drawn from a finite set of symbols which is called the alphabet. Patterns and texts are both strings. The prefixes of a string are the strings formed from a concatenation of its initial characters; e.g., the prefixes of the string abcd are the empty string, a, ab, abc, and abcd itself. The suffixes of a string are the strings formed from a concatenation of its final characters; e.g., the suffixes of the string abcd are the empty string, d, cd, bcd, and abcd itself. The text usually corresponds to a score or other musical entity, and the pattern could be a motif, in the form of a sequence of notes provided by a user, or some other sequence of items. Some pattern-matching problems do not involve a user-specified pattern, but rather involve discovering a pattern in a text: for instance, analyzing a score to find repeated passages.

The mapping of musical entities onto texts depends heavily on the way in which the entity is represented in a computer [see the opening article by Selfridge-Field in this issue]. The subject of musical representation for computers is comprehensively covered in Selfridge-Field (1997). The schemes in that book take account to some extent of the multidimensional quality of musical information (parameters may involve pitch [both chromatic and diatonic], duration, loudness, timbre, notational and other features); to these should be added the GPIR representation described in Cambouropoulos 1996A. It is assumed that the values of musical parameters to be compared in a matching task can be mapped to symbols in an alphabet as defined above.1

A polyphonic score may be treated either as a collection of melodic strings laid end-to-end, each of which is labelled as explicitly belonging to a certain voice (double-stopping is not considered in this discussion), or as a sequence of collections of notes lacking voice information, each of which occurs simultaneously within a certain time-slot. Musical data that is derived from a score in conventional musical notation can usually be treated as belonging to the first kind, while that derived from MIDI performance may lack explicit
1 Such values, for example, could be MIDI note-numbers (0-127) for a base-12 representation of pitch. For the present purposes, rests are regarded as a special form of note, and are treated no differently; multiple rests contiguous within a voice are assumed to have been concatenated into single rests separating notes.
voice-information and thus will generally need to be treated as the second kind, while note-data originating from raw audio can at best be expected to conform to an error-ridden form of the second kind.

3.2.2 Symbols Used in Algorithmic Descriptions

The alphabet is denoted by S, the size of the alphabet is denoted by |S|, and the lengths of the text and pattern strings are usually denoted by n and m. The running time of an algorithm is expressed as a function of one or more of |S|, n, and m, and is expressed using the O-notation (Cormen et al. 1990). Roughly, an algorithm with a running time of O(f(n, m, |S|)) is predicted to run in time proportional to f(n, m, |S|) or less. For instance, an algorithm with a running time O(m+n) might require no more than 50(m+n) microseconds of CPU time to execute on a particular computer, for different values of m and n. The constant of proportionality depends upon the computer on which the program is executing, the programming language in which the program is written, the compiler used, and indeed upon the algorithm itself: two algorithms for the same problem with equal running times in the O-notation (e.g., O(m+n)) may, ceteris paribus, have different running times in real life (e.g., 5(m+n) microseconds versus 50(m+n) microseconds). An algorithm with running time O(n) or O(m+n) is said to be linear, since its running time is a linear function of the size of the input. Since in most cases it would take O(m+n) steps simply to read the input, this running time is considered the best possible (neglecting the constant of proportionality).

3.2.3 Pattern-Match Categories

The problems discussed here are based on a two-dimensional model: pitch and duration. At some point in the future it will be necessary to consider the same problems in higher dimensions by adding other parameters, such as loudness or timbre. Here we consider two kinds of matches: exact matches and transposed matches. In the first, specific pitch information is matched. In the second, intervallic information is matched. A special case of a transposed match is an octave-displaced match, where the matchability of a sequence of pitches may be obscured by octave displacement.
The algorithms we describe deal principally with exact matches. Transposed and octave-displaced matches may be found in some cases by suitably transforming the pattern and the score, and applying an exact matching algorithm to the transformed pattern and score.

3.2.4 Problem Typologies

The first series of problems (Types 1-9) deals with polyphonically structured musical entities in which there is explicit formal voice-leading information. The accompanying diagrams are intended as a generalized representation of a musical situation sufficient to clarify the text. In most cases we provide references to the musical examples given in score notation in the appendix to this article. The examples are intended to show some possible uses of these techniques in real musical applications. The horizontal axis refers to the time domain and the vertical one to that of pitch; musical pitch relates to frequency, but it is important to recognize that chromatic and diatonic pitch-standards are not usually interchangeable (see Cambouropoulos 1996A, 1996B, 1997).

The second series of problems (Types 10-13) concerns entities in which the voice-leading information is unspecified.

3.3 Exact-Match Algorithms (Category 1)

3.3.1 Type 1: Exact Matching

PROBLEM DESCRIPTION: Given a set of sequences of notes (one for each voice), find whether an exact subsequence occurs in one of the voices.
This problem can be solved by the Knuth-Morris-Pratt algorithm (1977) in linear time. This algorithm preprocesses the pattern by finding prefixes of the pattern which occur in the pattern itself and by storing the results in a (failure-function) array. The algorithm proceeds by attempting to match the symbols of the pattern with those of the text one by one. When a mismatch occurs, the algorithm attempts to find the pattern in another position of the text using the failure-function value as a yardstick. The algorithm requires O(n+m) operations.

In practice, the best-performing method is a variant of Boyer-Moore (1977; see also Hume and Sunday 1991). The Boyer-Moore algorithm, an extension of the Knuth-Morris-Pratt algorithm, also makes use of the failure function as well as the "mismatch" information. In the Boyer-Moore method, when a mismatch occurs, the algorithm shifts the pattern either until it matches the mismatched symbol or by the failure-function amount, whichever is greater. On some pathological inputs the Boyer-Moore algorithm is slow, requiring O(nm + |S|) operations. However, the expected time complexity of the Boyer-Moore algorithm is linear, and this bound matches the times achieved in practice.

It is also worth considering the Aho-Corasick automaton (1975), which is implemented in the grep command in UNIX systems (Hume 1988). The Aho-Corasick automaton is an extended version of the suffix-tree data structure (Weiner 1973). This automaton is particularly suitable for the case where several patterns are given and one wants to test whether any one of them occurs in the text. It is worth noting that the running time of the Knuth-Morris-Pratt algorithm does not depend on the size of the alphabet, but the Boyer-Moore algorithm and the Aho-Corasick automaton do. This alphabet dependence entails O(|S|) preprocessing operations in the case of the Boyer-Moore algorithm, but since in a musical context the alphabet is small, this cost is negligible in practice. On the other hand, the Aho-Corasick automaton needs O(|S|) outgoing edges for each state, which makes it inefficient in terms of storage space; but again, once the automaton is constructed, it is very efficient to use.

These algorithms can be modified to find transposed matches by matching the intervals between successive notes in the pattern to intervals between successive notes in the score. A survey of algorithms for this problem on parallel computers and systems can be found in Iliopoulos (1993).
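As an illustration of the interval-based adaptation just mentioned, here is a minimal Knuth-Morris-Pratt sketch (my own illustration, not the authors' code) that finds a query melody in a voice part by matching semitone intervals, which makes the search transposition-invariant:

```python
# A sketch (not the authors' code) of Knuth-Morris-Pratt matching applied
# to a melodic query. Matching successive intervals rather than absolute
# pitches makes the search transposition-invariant, as noted above.

def failure_function(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_find(text, pattern):
    """Return start indices of all exact occurrences of pattern in text."""
    fail, k, hits = failure_function(pattern), 0, []
    for i, symbol in enumerate(text):
        while k > 0 and symbol != pattern[k]:
            k = fail[k - 1]
        if symbol == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits

def intervals(pitches):
    """Successive semitone intervals of a sequence of MIDI note numbers."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

# A voice part and a query a major second higher match once transposed:
voice = [60, 62, 64, 62, 60, 62, 64, 65, 67]
query = [62, 64, 66]                    # same contour, up a whole tone
print(kmp_find(intervals(voice), intervals(query)))   # [0, 4]
```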
3.3.2 Type 2: Matching with Deletions

PROBLEM DESCRIPTION: Given a set of sequences of notes (one for each voice), find whether an exact subsequence occurs in one of the voices without preserving the duration times of each pattern.

Figure 2. Searching for a sequence of notes in any one voice without observing onset-times and duration. See Musical Example 2.

Approximate string-matching algorithms (Ukkonen 1985; Galil and Giancarlo 1990; Aho 1990) can be used to solve the above problem. Furthermore, software packages like BLAST and FASTA can be used in identifying such sequences of notes. But this problem appears to be simpler than approximate string-matching (e.g., there are no insertions), and therefore new, faster algorithms could be designed specifically for this type of problem, for both exact and transposed matches.

3.3.3 Type 3: Repetition Identification

PROBLEM DESCRIPTION: Given a set of sequences of notes (one for each voice), identify non-overlapping repeated patterns in different voices or the same voice.
One of the best methods for solving this type of problem was given by Main and Lorentz (1984, 1985). The Main-Lorentz method is based on the failure function, and its running time is O(n log n) operations, where n is the length of the text. Crochemore (1981) also gave an O(n log n) algorithm for repetition identification based on "set partitioning" techniques. All the above algorithms can also find transposed repetitions.

It is not yet conclusively settled whether O(n log n) is the best running time for this problem. Crochemore (also 1981) showed that the Fibonacci strings can have of the order of n log n repetitions, and argued that since it could take of the order of n log n steps simply to write the output, his method was optimal. However, recent results (Iliopoulos, Moore, and Smyth 1996; Iliopoulos and Smyth 1995) showed that one can report all the repetitions in a Fibonacci string in linear time using special encodings. This raises the question whether it is possible to design an optimal (linear) algorithm for this problem whose output is a linear-sized encoded representation of all the repetitions in the string.

3.3.4 Type 4: Overlapping Repetition Identification

PROBLEM DESCRIPTION: Given a set of sequences of notes (one for each voice), identify repeated patterns that may overlap in different voices or the same voice.
Figure 4. Identifying overlapping repeats in a score. See Musical Example 4. For computing possibly overlapping repetitions that occur locally somewhere in the score, Apostolico and Ehrenfeucht (1993) and Iliopoulos and Mouchard (in preparation) provide efficient methods (see also Guibas and Odlyzko, 1981). The Apostolico and Ehrenfeucht method makes use of
the suffix-tree data structure. It identifies all text positions where a repetition starts, as these positions are mapped on the same locations of the suffix-tree data structure. The Iliopoulos and Mouchard method is based on set partitioning similar to that in Crochemore (1981). Both methods proceed by identifying the gaps between the repetitions. Both algorithms are non-linear and it remains open whether these methods are optimal. A generalization of this problem occurs when one wants to cut up the score into segments and to test whether each segment can be tiled by repeated identical substrings. For example, the string abcabcabca is tiled by overlapping occurrences of abca occurring at positions 0, 3 and 6.
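A brute-force check of this tiling property (my own illustration; the methods cited in the next paragraph solve the problem in linear time) can be written directly from the definition:

```python
# Naive O(nm) illustration only: does every position of segment fall
# inside some (possibly overlapping) occurrence of the candidate w?

def occurrences(segment, w):
    return [i for i in range(len(segment) - len(w) + 1)
            if segment[i:i + len(w)] == w]

def is_tiled_by(segment, w):
    """True if overlapping occurrences of w cover segment end to end."""
    covered = 0                      # number of leading positions covered
    for i in occurrences(segment, w):
        if i > covered:              # a gap no occurrence can bridge
            return False
        covered = max(covered, i + len(w))
    return covered == len(segment)

print(is_tiled_by("abcabcabca", "abca"))   # True  (occurrences at 0, 3, 6)
print(is_tiled_by("abcabcabcd", "abca"))   # False (the tail is uncovered)
```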
For this one can use the methods presented in Apostolico, Farach and Iliopoulos (1991), Iliopoulos and Park (1996), Mongeau and Sankoff (1990). The first algorithm is based on the failure function, the second is based on monitoring the gaps between repetitions, and the third is based on the string-border computation.2 All three algorithms are linear. Another variant occurs if the segmentation is not perfect and one wants to test whether a score segment can be tiled by repeated identical substrings, except perhaps the edges of the segment (due to imperfect cut-off). For example, cabcabcabc is tiled by overlapping and complete occurrences of abca occurring at positions 1 and 4, and by incomplete occurrences at position 7 and (non-existent) position -2.
2 A border is the largest prefix of a string that is also its suffix.
For this one can use the methods given by Iliopoulos, Moore, and Park (1996) and Berkman, Iliopoulos, and Park (1996). Note that the methods in Apostolico, Farach, and Iliopoulos (1991), Iliopoulos and Park (1996), and Mongeau and Sankoff (1990) cannot be used in this case, as all three depend on a perfect segment edge, which here might have been cut off erroneously. The Iliopoulos/Moore/Park method is based on partitioning the set of text positions into subsets of positions where identical occurrences start, and then combinatorially evaluating whether tiling is possible. The Berkman method is a complex one, based on the suffix-tree data structure and computational-geometry techniques (so-called outer-envelope computation). Both methods require O(n log n) operations, but the Iliopoulos method is simpler and practical. All the above algorithms can also find transposed repetitions.

3.3.5 Type 5: Transformed Matching

PROBLEM DESCRIPTION: Given a set of sequences of notes (one for each voice) and a pattern, find whether the pattern occurs in one of the given sequences in either the original form, inversion, retrograde, or retrograde inversion.
Figure 5. Identifying transformations: retrograde, inversion, retrograde inversion.

This type of problem can be tackled by three consecutive applications of one of the exact pattern-matching methods given for solving Problem-Type 1 above. The Aho-Corasick algorithm is the most suitable method for this type of problem, since it can handle all four patterns (original, inversion, retrograde, and retrograde inversion) in one pass.
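The four search patterns can be generated mechanically from the interval string of the query before being handed to the matcher; a small sketch of this preprocessing step (mine, not the authors') follows:

```python
# Generating the four classical transformations of an interval pattern
# (illustration only). With intervals i_k = p_{k+1} - p_k:
#   inversion            -> intervals negated
#   retrograde           -> intervals reversed and negated
#   retrograde inversion -> intervals reversed

def transformations(intervals):
    original = list(intervals)
    return {
        "original": original,
        "inversion": [-i for i in original],
        "retrograde": [-i for i in reversed(original)],
        "retrograde inversion": list(reversed(original)),
    }

# B-A-C-H motif (B-flat, A, C, B-natural) as semitone intervals:
print(transformations([-1, 3, -1]))
```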
3.3.6 Type 6: Distributed Matching

PROBLEM DESCRIPTION: Given a set of sequences of notes (one for each voice) and a pattern, find whether the pattern occurs distributed horizontally, either in one voice or across several voices.

Figure 6. Pattern distributed across voices. See Musical Examples 1 and 4.

There is no specific method in the literature for this type of problem. A naive way of solving the problem is as follows: first design an Aho-Corasick automaton that accepts all the prefixes of the given pattern. Now view the input string as a two-dimensional array. Assume that we have computed and stored all the pattern prefixes that occur in the text up to a certain column. At the next unit of time (column), we either extend these prefixes by one symbol that occurs in that column or the prefix is voided. This leads to an O(nm)-operations algorithm, where n is the length of the text and m is the length of the pattern. This algorithm can also be adapted to find transposed matches with a complexity of O(nm), using |S| extra locations. It will be interesting to design a more efficient algorithm for this problem.

However, if the value of m is small enough that an (m-1)-bit value may be stored in one word of the computer (on a 32-bit processor like the Pentium, m should be no more than 31), the above algorithm can be implemented to find exact matches very efficiently, in O(n + |S|) time, in fact.3 This implementation may not be suitable for very large alphabets. A straightforward implementation of the transposed-matching variant of this algorithm seems considerably slower (about 5 times slower for patterns of length 4, and 13 times slower for patterns of length 16). We are investigating heuristics to speed up this algorithm.
3 Our implementation, for example, takes 0.12 seconds of processing time to search in a file of one million characters on a 200 MHz Pentium processor, and about half as long on a 300 MHz Sun UltraSPARC processor.
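The word-parallel prefix-extension idea described above corresponds to what the string-matching literature calls the Shift-And technique; the following minimal rendering (my sketch of the single-voice core, not the authors' implementation) shows how one machine word tracks all pattern prefixes at once:

```python
# A minimal Shift-And sketch of the word-parallel prefix-extension idea
# described above (my rendering, not the authors' implementation).
# Bit j of the state word is set iff pattern[:j+1] ends at the current
# text position; Python integers stand in for machine words here.

def shift_and(text, pattern):
    """Yield start positions of exact occurrences of pattern in text."""
    mask = {}                                  # per-symbol bit masks
    for j, symbol in enumerate(pattern):
        mask[symbol] = mask.get(symbol, 0) | (1 << j)
    state, accept = 0, 1 << (len(pattern) - 1)
    for i, symbol in enumerate(text):
        # extend every stored prefix (and the empty prefix) by one symbol
        state = ((state << 1) | 1) & mask.get(symbol, 0)
        if state & accept:
            yield i - len(pattern) + 1

pitches = [60, 62, 64, 60, 62, 64, 65]
print(list(shift_and(pitches, [60, 62, 64])))   # [0, 3]
```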
3.3.7 Type 7: Chord Recognition

PROBLEM DESCRIPTION: Given a set of sequences of notes (one for each voice) and a pattern, the pattern has to be found with all its elements located in the same time-slot.

Figure 7. Pattern distributed as a chord.

This problem appears to be similar to the distributed-matching one, but it is simpler in the sense that the pattern has to be found in the same time-slot. The exact version of this problem can be solved in time linear in the score length: for each time-slot in the score, if a note of the score is also a chord note, then we mark the chord note. A chord occurs in a time-slot only when all chord notes are marked. This algorithm may also be adapted for transposed matching with some reduction in performance. (This problem will need to be solved, for example, in matching versions of a lute or keyboard piece in which a chord is written in "broken" form in one version only. It has many other applications in harmonic analysis.)

3.3.8 Type 8: Approximate Matching

PROBLEM DESCRIPTION: Given a set of sequences of notes (one for each voice) and a pattern, find whether approximate occurrences (accommodating insertion, deletion, and/or replacement of notes) of the pattern occur in one of the sequences.

Algorithms for solving this type of problem can be found in Crochemore and Rytter (1996), Aho (1990), and Ukkonen (1985). In fact, in addition to considering the approximate string-matching problem, we would like to consider all of the above problems in the presence of errors, such as the identification of substrings that are duplicated to within a certain tolerance k. The tolerance is normally measured using distance metrics such as Hamming distance and edit distance.
The Hamming distance of two strings u and v is defined to be the number of substitutions necessary to get u from v (u and v have the same length). The edit distance is the (weighted) number of edit operations (taken from a set of permissible edit operations) needed to transform u to v. Edit distance has been used by Mongeau and Sankoff to define a notion of musical similarity.

Given two strings A and B, we say that B is k-approximate to A if the distance is k. Given a string x and an integer k, a substring w of x is called a k-approximate period of x if x is partitioned into disjoint blocks such that every block has distance at most k from w. Given a string x and an integer k, a substring w of x is called a k-approximate cover of x if x is divided into disjoint or overlapping blocks such that every block has distance at most k from w. The most efficient algorithm for computing non-overlapping repetitions is that by Schmidt (1994). Computing overlapping repeats in the presence of errors is an open problem.

3.3.9 Type 9: Evolution Detection

PROBLEM DESCRIPTION: Given a sequence of notes and a pattern u, find whether there exists a sequence u1 = u, u2, . . . , uk in the score such that ui+1 begins to the right of ui, and ui and ui+1 have an edit-distance of one.4

There is no specific algorithm for this problem. Landau and Vishkin (1986) gave a simple algorithm for the "1-difference" problem (find all substrings of the text which have an edit-distance of 1 from the pattern) which runs in O(n log n) time. A naive way to solve this problem is to repeatedly apply the Landau/Vishkin algorithm to the text using ui as the pattern, for i = 1, 2, . . . , giving an approach with O(n^2 log n) worst-case running time.
Figure 8. Local approximations in search pattern trace gradual change (analogous to "evolution") in a motif. See Musical Example 6.
4 i.e., one can transform ui to ui+1 by one insertion, deletion, or a replacement of a note.
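Both Type 8 and Type 9 rest on the edit-distance metric defined above; as a concrete reference point, here is the standard dynamic-programming computation (a generic sketch of my own, far less efficient than the specialized algorithms cited in this section):

```python
# Unit-cost edit distance (insertions, deletions, replacements) between
# two symbol sequences, by the standard dynamic program. This is the
# metric referred to above; the cited papers optimize it heavily.

def edit_distance(u, v):
    # prev[j] holds the distance between u[:i-1] and v[:j] as rows advance
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, 1):
        curr = [i]
        for j, b in enumerate(v, 1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # replace a with b
        prev = curr
    return prev[-1]

# Two interval strings differing by one replacement and one insertion:
print(edit_distance((2, 2, -4, 2), (2, 1, -4, 2, 2)))   # 2
```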
A variant of this problem is to find whether the sequence u1, . . . , uk exists in the case that u is not given. Another variant of the same problem is to be considered over a set of sequences of notes (one for each voice), where the ui's are distributed over the voices. Yet another variant is one in which the edit-distance between ui and ui+1 is allowed to be some fixed number larger than one. It is not known whether the Landau/Vishkin algorithm can be generalized to efficiently find occurrences of the pattern at a distance greater than 1, so even the naive solution described above does not work. Further investigation is needed as to whether methods such as Landau and Vishkin (1988) and Galil and Park (1990) can be adapted to solve the above problems.

The three remaining problem types to be discussed concern musical entities that are polyphonically unstructured, with no explicit voice leading.

3.4 Inexact-Match Algorithms (Category 2)

3.4.1 Type 10: Unstructured Exact Matching

PROBLEM DESCRIPTION: Given a sequence of notes (voices unspecified) and a pattern, find whether the pattern occurs in the mixed set of sequences. A variant of this problem is to identify the pattern spread over time with several notes intervening.
Figure 9. Unstructured exact matching. See Musical Example 4.

The O(mn)-time algorithm for Problem-Type 6 can solve this problem as well. However, it will be interesting to design new methods for this type of problem and, in particular, to examine the relationship of this problem with that of distributed matching (Type 6), settling the question whether the lack of structure (voice information) affects the complexity of the problem.
3.4.2 Type 11: Unstructured Repetitions

PROBLEM DESCRIPTION: Given a sequence of notes (voices unspecified), identify repeated patterns (i.e., significant motifs; see Cambouropoulos and Smaill 1995) that may or may not overlap.

Figure 10. Unstructured repetitions. See Musical Examples 4, 5.

Algorithms for two-dimensional string matching (see Crochemore and Rytter 1996) may be useful for this problem. If overlaps are allowed, then methods for identifying repetitions can be found in Crochemore, Iliopoulos, and Korda (1997) and in Iliopoulos and Korda (1996A and 1996B).

3.4.3 Type 12: Unstructured Approximate Matching

PROBLEM DESCRIPTION: Consider the above problems in the presence of errors and find approximate matchings and repetitions.
Figure 11. Unstructured approximate matching.

Very little work has been done in this direction (see Crochemore and Rytter 1996). New algorithms for handling errors in two dimensions need to be designed.

3.5 Musical Examples

The following musical examples illustrate some of the problem types discussed above and identify the most appropriate algorithmic approaches to melodic matching.

In Example 1, the often-cited "stereophonic" theme of the fourth movement of Tchaikovsky's Symphony No. 6 is shown as a likely search query which could be encoded as a string (1a). Algorithms of Type 1 will find a match in the first violin part at Measure 104 (1b). They will not, however, find the theme at the beginning of the movement (1c), since the melodic string is distributed between first and second violins as indicated. To match this, algorithms of Type 6 are required.
a) a reasonable search-query:
b) measure 104:
c) beginning of movement:
Example 1. Tchaikovsky: Symphony 6, fourth movement: (a) the theme as perceived, (b) the theme as rendered in Measure 104, and (c) the "distributed" version of the theme at the opening of the movement.
In Example 2, the opening of the song (2a) is based on a motif (2b) which appears in the piano part (right hand, with partial echoes in the left hand) in imitation with the voice, but with different onset-times and durations. a) Vocal score:
b) opening motif:
Example 2. Brahms: Deutsche Volkslieder, No. 15 ("Schwesterlein").

In Example 3, from Bach's chorale prelude "Kyrie, Gott Vater in Ewigkeit," a fugal subject of 13 pitches, based on and opening with the first three notes of the chorale melody, appears in each of the voices. This non-overlapping repeated pattern will be detected by algorithms of Type 3. Note the partial match of seven pitches only at Measure 5, and the inverted form of the pattern at Measure 8.

"Polyphonic" transcriptions of a lute piece may be searched for a given pattern using algorithms of Type 1, for non-overlapping repetitions using algorithms of Type 3, or for overlapping repetitions with algorithms of Type 4. The apparently trivial, but perhaps structurally significant, two-note pattern e'-d' in Example 4, for example, can be found as non-overlapping repetitions (Type 3).
Example 3. J. S. Bach: organ chorale prelude "Kyrie, Gott Vater in Ewigkeit," BWV 669, from Clavierübung III.

The descending tetrachord, on the other hand, appears in several overlapping repetitions (Type 4), but the number of such occurrences depends on the nature of the transcription. Since voice leading is largely absent from lute tablature notation, transcriptions may thus be unsafe for such analysis, or at least a different approach is needed, such as using algorithms of Type 6. Matching a pattern given a priori in the "raw" pitch-data derived from tablature requires algorithms of Type 10. The detection of repetitions in such data requires algorithms of Type 11 (this is also true of matches in pitch derived from polyphonic audio input). In Example 5a, we first see the separation of rhythmic and pitch data. Then we see the melodic lines of Gaultier's piece revised (with important modifications of pitch and rhythmic strings) in two arrangements (A and B), one for spinet and one for violin or flute. In both cases the treble pitches have been transposed up an octave.
Example 4. Denis Gaultier: the allemande "Le Tombeau de L'Enclos" for lute: computer facsimile of original tablature, pitch and duration data, and excerpts from three divergent transcriptions. Ornament signs have been omitted.
a) Scores:
b) Edit operations on melodic strings extracted from scores:
Example 5. Denis Gaultier: "Le Tombeau de L'Enclos." Here (a) the pitch and duration data derived from the tablature shown in Example 4 are given first. Then two arrangements (A and B) contemporary with the tablature are shown. Next (b) the melodies derived separately from the treble and bass of A and B are shown, with markers to indicate edit operations (insertions, deletions, replacements, and temporal displacements).
c) All three versions reduced to note-data form: Lute
(Vertical lines between notes link those occurring in the same time-slot.)

Example 5, continued. In 5c, time-slice information is added to the recoupled treble and bass lines of the two arrangements shown in 5a, and coincident pitches are indicated by vertical lines.
In Example 6, the five successive entries (A-E) of one ricercare for lute are audibly related. They can be treated as stages in the evolution of a diatonic motif by a series of alterations of edit-distance 2 (where the deletion, insertion, replacement, and time-displacement operations each have a weight of 1). This requires algorithms of Type 9. a) Selected entries (in their original sequence):
b) 'Evolution' of diatonic-pitch pattern
References
Aho, A. V., "Pattern Matching in Strings," in Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, Elsevier, 1990.
Aho, A. V., and M. J. Corasick, "Efficient String Matching: An Aid to Bibliographic Search," Communications of the ACM 18 (1975), 333-340.
Altschul, S., W. Gish, W. Miller, E. Myers, and D. Lipman, "Basic Local Alignment Search Tool," Journal of Molecular Biology 215 (1990), 403-410.
Apostolico, A., and A. Ehrenfeucht, "Efficient Detection of Quasiperiodicities in Strings," Theoretical Computer Science 119 (1993), 247-265.
Apostolico, A., M. Farach, and Costas S. Iliopoulos, "Optimal Superprimitivity Testing for Strings," Information Processing Letters 39 (1991), 17-20.
Apostolico, A., and Z. Galil, eds., Combinatorial Algorithms on Words, Springer-Verlag, NATO ASI Series, 1985.
Apostolico, A., and Z. Galil, eds., Pattern Matching Algorithms, Oxford University Press, 1997.
Berkman, O., Costas S. Iliopoulos, and K. Park, "String Covering," Information and Computation 123 (1996), 127-137.
Boyer, R., and J. Moore, "A Fast String Searching Algorithm," Communications of the ACM 20 (1977), 762-772.
Cambouropoulos, E., "A General Pitch Interval Representation: Theory and Applications," Journal of New Music Research 25 (1996A), 231-251.
Cambouropoulos, E., "A Formal Theory for the Discovery of Local Boundaries in a Melodic Surface," in Proceedings of the III Journées d'Informatique Musicale, Caen, France, 1996B.
Cambouropoulos, E., "The Role of Similarity in Categorisation: Music as a Case Study," in Proceedings of the Third Triennial Conference of the European Society for the Cognitive Sciences of Music (ESCOM), Uppsala, 1997.
Cambouropoulos, E., and A. Smaill, "A Computational Theory for the Discovery of Parallel Melodic Passages," in Proceedings of the XI Colloquio di Informatica Musicale, Bologna, Italy, 1995.
Cormen, T. H., C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, MIT Press, 1990.
Crawford, Tim, "Lute Tablature and Concordance Recognition: Special Problems Needing Special Solutions," read at the 15th Congress of the International Musicological Society, Madrid, 1992. Available at http://www.kcl.ac.uk/kis/schools/hums/music/ttc/madrid.html
Crochemore, M., "An Optimal Algorithm for Computing the Repetitions in a Word," Information Processing Letters 12 (1981), 244-250.
Crochemore, M., Costas S. Iliopoulos, and M. Korda, "An Optimal Algorithm for Prefix String Matching and Covering," to appear in Algorithmica, 1997.
Crochemore, M., and W. Rytter, Text Algorithms, Oxford University Press, 1996.
Czumaj, A., P. Ferragina, L. Gasieniec, S. Muthukrishnan, and J. Traeff, "The Architecture of a Software Library for String Processing," presented at the Workshop on Algorithm Engineering, Venice, September 1997.
Galil, Z., and R. Giancarlo, "Efficient Algorithms for Molecular Biology," in Sequences: Combinatorics, Compression, Security, Transmission, Springer-Verlag, 1990, 59-74.
Galil, Z., and K. Park, "An Improved Algorithm for Approximate String Matching," SIAM Journal on Computing 19 (1990), 989-999.
Guibas, L., and A. Odlyzko, "String Overlaps, Pattern Matching, and Non-transitive Games," Journal of Combinatorial Theory (Series A) 30 (1981), 183-208.
Hume, A., "A Tale of Two Greps," Software Practice & Experience 18 (1988), 1063-1072.
Hume, A., and D. Sunday, "Fast String Searching," Software Practice & Experience 21 (1991), 1221-1248.
Iliopoulos, Costas S., "Parallel Algorithms for String Pattern Matching," in A. Gibbons and P. Spirakis, eds., Lectures on Parallel Computation, Volume 4, Cambridge University Press, 1993, 109-121.
Iliopoulos, Costas S., and M. Korda, "Parallel Two-dimensional Covering," in Proceedings of the Australasian Workshop on Combinatorial Algorithms (AWOCA '96), University of Sydney, 1996A, 62-75.
Iliopoulos, Costas S., and M. Korda, "Optimal Parallel Superprimitivity Testing on Square Arrays," Parallel Processing Letters 6 (1996B), 299-308.
Iliopoulos, Costas S., and L. Mouchard, "Fast Local Covers," in preparation.
Iliopoulos, Costas S., and K. Park, "An O(n log n) PRAM Algorithm for Computing All Seeds of a String," Theoretical Computer Science 164 (1996), 299-310.
Iliopoulos, Costas S., D. W. G. Moore, and K. Park, "Covering a String," Algorithmica 16 (1996), 288-297.
Iliopoulos, Costas S., D. W. G. Moore, and W. F. Smyth, "A Linear Algorithm for Computing the Squares of a Fibonacci String," in P. Eades and M. Moule, eds., Proceedings of CATS '96, "Computing: Australasian Theory Symposium," University of Melbourne, 1996, 55-63.
Iliopoulos, Costas S., and W. F. Smyth, "A Fast Average Case Algorithm for Lyndon Decomposition," International Journal of Computer Mathematics 57 (1995), 15-31.
Iliopoulos, Costas S., and W. F. Smyth, "An On-line Algorithm for Computing a Minimal Set of k-covers of a String," submitted.
Knuth, D. E., J. H. Morris, and V. R. Pratt, "Fast Pattern Matching in Strings," SIAM Journal on Computing 6 (1977), 323-350.
Landau, G. M., and U. Vishkin, "Introducing Efficient Parallelism into Approximate String Matching and a New Serial Algorithm," in Proc. Annual ACM Symposium on Theory of Computing, ACM Press, 1986, 220-230.
Landau, G. M., and U. Vishkin, "Fast String Matching with k Differences," Journal of Computer and System Sciences 37 (1988), 63-78.
Lipman, D., and W. R. Pearson, "Rapid and Sensitive Protein Similarity Searches," Science 227 (1985), 1435-1441.
Main, G., and R. Lorentz, "An O(n log n) Algorithm for Finding All Repetitions in a String," Journal of Algorithms 5 (1984), 422-432.
Main, G., and R. Lorentz, "Linear Time Recognition of Square Free Strings," in A. Apostolico and Z. Galil, eds., Combinatorial Algorithms on Words, Springer-Verlag, 1985, 271-278.
Mongeau, Marcel, and David Sankoff, "Comparison of Musical Sequences," Computers and the Humanities 24 (1990), 161-175.
Pearson, W., "Rapid and Sensitive Sequence Comparison with FASTP and FASTA," in Methods in Enzymology, Academic Press, 1990, 63-98.
Pearson, W., and D. Lipman, "Improved Tools for Biological Sequence Comparison," Proceedings of the National Academy of Sciences of the USA 85 (1988), 2444-2448.
Schmidt, J. P., "All Shortest Paths in Weighted Grid Graphs and its Application to Finding All Approximate Repeats in Strings," in Proc. Fifth Symposium on Combinatorial Pattern Matching, Springer-Verlag Lecture Notes in Computer Science, 1994.
Selfridge-Field, Eleanor, ed., Beyond MIDI: The Handbook of Musical Codes, MIT Press, 1997.
Setubal, J., and J. Meidanis, Introduction to Computational Molecular Biology, PWS Publishing, 1997.
Ukkonen, E., "Algorithms for Approximate String Matching," Information and Control 64 (1985), 100-118.
Weiner, P., "Linear Pattern Matching Algorithms," in Proc. of the 14th IEEE Symposium on Switching and Automata Theory, 1973, 1-11.

Sources for Musical Examples
Buch, 1990: D. Buch, ed., Denis Gaultier, La rhétorique des dieux (A-R Editions: Madison, 1990).
Cavalcanti Lutebook: Brussels, Belgium, Bibliothèque Royale (B-Br), MS II 275.
Darmstadt 18: Darmstadt, Germany, Stadtbibliothek, MS 18.
Gaultier, 1670: Denis Gaultier, Pièces de luth (Paris, 1670; repr. Minkoff: Geneva, 1978).
Perrine, 1680: Pièces de luth en musique . . . par le Sr. Perrine (Paris, 1680; repr. Minkoff: Geneva, 1982).
Rollin, 1996: M. Rollin and F.-P. Goy, eds., Oeuvres de Denis Gaultier (CNRS: Paris, 1996).
Suittes faciles: Suittes faciles (Amsterdam: Roger, 1701).
4 Sequence-Based Melodic Comparison: A Dynamic-Programming Approach
Lloyd A. Smith, Rodger J. McNab, and Ian H. Witten
Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton, New Zealand
{las, rjmcnab, ihw}@cs.waikato.ac.nz

Abstract
Because of the importance of melodic comparison in musical analysis, several methods have been developed for comparing melodies. Most of these methods have been targeted at particular styles of analysis, and thus cannot be used for carrying out the melodic comparison necessary for other kinds of analysis. The observation that music is a sequence of symbols, however, leads to the use of general-purpose sequence-comparison algorithms for melodic comparison. This paper describes one such algorithm, dynamic programming, and discusses experiments in which dynamic programming is used to match input melodic phrases against a database of 9400 folk songs in order to retrieve closely matching tunes from the database.
4.1 Previous Research in the Field
Melodic comparison is a fundamental operation aimed at determining whether two melodies are, in some sense, similar. It has important practical applications whenever any kind of search for melodic patterns is performed. For that reason, several algorithms have been developed for melodic comparison. The earliest of these were focused on a particular application. Stech (1981), for example, developed a method for micro-analysis of melodies. His system searched for similarities within a song using a combination of exact pitch- and rhythm-matching modes. For pitch, the modes were (1) original sequence, (2) inversion, (3) retrograde, and (4) retrograde inversion. For rhythm, they were (1) original and (2) retrograde. Dillon and Hunter (1982) developed a more flexible system using Boolean operators to search for combinations of pitches. Their system required that melodies be encoded in five ways: (1) by measured pitches, (2) by unmeasured pitches, (3) by measured stressed pitches, (4) by unmeasured stressed pitches, and (5) by pitches with phrase information. Using this representation, a user could specify, for example, that a melody's third stressed pitch must be the dominant note of the scale. In an attempt to gain still more flexibility, Logrippo and Stepien (1986) suggested using cluster analysis to determine the similarity of melodies. The difficulty they faced was in specifying a distance measure that makes sense with music. Their metric was based on the percentage of occurrence of notes within melodies, thus ignoring the sequential ordering of notes. Perhaps the most general method of musical pattern matching was developed by Cope (1991). Cope's Experiments in Musical Intelligence (EMI) system looks for similar patterns across works of a given composer, in an attempt to find commonly occurring musical signatures relevant to that
composer. These signatures are used by EMI to compose music in the style of the composer. [See Cope's article, "Signatures and Earmarks," in this issue.] All the above methods were developed specifically for comparing music and were tailored to the user's particular goal in performing the comparison. The observation, however, that a musical score is a sequence of symbols (or, in computer science terminology, a string) suggests that general sequence-comparison algorithms, developed in the fields of information systems and pattern recognition, might be productively applied to melodies and melodic patterns. This has the advantage of divorcing the technical aspects of how the comparison is performed from the features used to determine melodic similarity, thus freeing the researcher to focus on selecting an appropriate feature set and musical representation to use when conducting the comparison. While many sequence-comparison algorithms have been developed, the most general is a technique known as dynamic programming, used, for example, in matching biological DNA sequences (Goad and Kanehisa, 1982). Dynamic programming has been independently adapted for musical applications by Mongeau and Sankoff (1990) and by Orpen and Huron (1992). This paper describes the use of dynamic programming in matching melodies; the particular approach followed is based on Mongeau and Sankoff (McNab et al., 1996).

4.2 Sequence Comparison Using Dynamic Programming
Dynamic programming is based on the concept of edit distance; when matching sequence a against sequence b, the edit distance is the cost of changing sequence a (the source string) into sequence b (the target string). The sequences may consist of virtually any type of symbol, including alphabetic characters (input to spelling checkers), numeric feature vectors (used in pattern recognition), DNA symbols (for gene identification), or musical pitches and rhythms. The cost of changing the source string into the target is calculated in terms of edit operators. The standard edit operators used in dynamic programming are replacement, insertion, and deletion. To change the sequence ABCDE into ABDEF, for example, requires leaving the letters A, B, D, and E (i.e., replacing them with themselves), deleting C, and inserting F.
Each of these operations incurs a cost, or weight. If the cost of replacing a letter is the "alphabetic distance" (where B − A = 1, C − A = 2, and so forth, and the cost of replacing a letter with itself is 0), and the cost of inserting or deleting is 1, then the cost of changing ABCDE into ABDEF is 2; that is, the distance between ABCDE and ABDEF is 2. If the cost of insertion is equal to the cost of deletion, and the cost of replacement is the same regardless of which is source and which is target (i.e., B − A = A − B), then the algorithm is symmetric, meaning that it does not matter which is the source and which is the target: changing ABCDE into ABDEF incurs the same cost as changing ABDEF into ABCDE. In the above example, other sequences of operations are possible. For example, we could replace C with D, replace D with E, and replace E with F. This has the desired effect (ABCDE is transformed into ABDEF), but now the total cost is 3 instead of 2. In the dynamic-programming paradigm, it is assumed that the goal is to find an optimal way of transforming the source into the target, so that the lowest cost is returned. Furthermore, a dynamic programming algorithm, in addition to returning the distance between two sequences, also returns an optimal alignment between the two sequences: it shows how the sequences match in order to calculate the best score. Sometimes more than one alignment may yield the same score. In that case, the algorithm has no way of knowing which alignment is "correct." All alignments returning the same score are considered equally good.

4.3 Algorithms for Melodic Comparison
Equation 1 expresses the dynamic programming algorithm for matching sequence a against sequence b (Sankoff and Kruskal, 1983):

dij = min[di-1,j + w(ai, Ø), di-1,j-1 + w(ai, bj), di,j-1 + w(Ø, bj)]   (1)

where
1 ≤ i ≤ length of sequence a
1 ≤ j ≤ length of sequence b
w(ai, bj) is the cost (or weight) of substituting element ai with bj
w(ai, Ø) is the cost of deleting ai
w(Ø, bj) is the cost of inserting bj
dij is the accumulated distance of the best alignment ending with ai and bj
Initial conditions are:

d00 = 0   (2)
di0 = di-1,0 + w(ai, Ø), i ≥ 1   (3)
d0j = d0,j-1 + w(Ø, bj), j ≥ 1   (4)

While there are many ways of implementing a dynamic programming algorithm, the process may be viewed in three stages. The first stage generates a local score matrix which holds the distances between each element of the source string and each element of the target; in equation 1, the local score matrix provides the values for references to w(ai, bj). The second stage uses the local score matrix to generate a global score matrix which holds the cost of a complete match. In equation 1, d refers to cells in the global score matrix; dij is the cell currently being computed, while di-1,j, di-1,j-1, and di,j-1 are previously computed cells. The third stage traces back from the final score to determine the alignment that generated that score. The process may be illustrated by stepping through a match of the two incipits, or phrase beginnings, shown in Figure 1 (after Leppig, 1987).
Figure 1. A sample pair of melodies for comparison.

Figure 2 illustrates the local score matrix associated with the match. Innsbruck, ich muss dich lassen, the source string, is represented going from top to bottom on the left, and Nun ruhen alle Wälder, the target, from left to right across the top. In the example, pitch is represented by MIDI pitch number (Middle C = 60) and rhythm by duration in sixteenth-note units, so the first note of Nun ruhen is 69-4: A4 is MIDI 69, and a quarter note has a duration of four sixteenth notes. In this example, distances are calculated by taking the absolute value of the difference in MIDI note number, then adding half the absolute value of the difference in duration (this particular relative weighting of pitch against rhythm was chosen simply because it yields an unambiguous alignment for these two melodic phrases). Each cell of the matrix holds the distance
between one note in Innsbruck and one note in Nun ruhen, so the top leftmost cell holds the distance between the first note of each melody. Each row holds the distances between one note of Innsbruck and all notes of Nun ruhen. The top row, for example, holds the distances between the first note of Innsbruck and each note of Nun ruhen. Each column holds the distances between one note of Nun ruhen and all notes of Innsbruck.
Figure 2. Local score matrix.

Each cell in the global score matrix represents the best possible score matching the two strings up to that point. It is filled by sweeping through the local distance matrix, moving to the right and down (toward the end of both melodies) and calculating the distance for each cell based on the cells immediately to the left of, above, and diagonal to (left and above) the target cell. The easiest way to meet this requirement is to proceed row by row, left to right and top to bottom. Each cell is calculated, following equation 1, by adding the cost of an insertion to the cell to the left, adding the cost of a deletion to the cell above, adding the local distance of the target cell to the cell diagonally left and above, then taking the minimum of those three figures. Figure 3 shows the global score matrix for the match of the two incipits from Figure 1. Note that there is one more row and one more column than in the local score matrix. The extra row and column reflect the cost of inserting or deleting all notes of one melody, relative to the other. The bold numbers in Figure 3 show the trace of the best path through the global score matrix; this path gives the alignment between the two melodies, shown in Figure 4. Because initial good paths through the matrix may lead to poorly scoring regions, the trace is done looking back from the bottom right corner, where
the final score occurs, to the top left, where the match began. If the alignment between the two melodies is not needed, then the third stage, tracing the alignment, may be skipped.
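The first two stages can be expressed compactly in Python. The sketch below follows equation (1) with initial conditions (2)-(4) and the local distance just described (absolute pitch difference plus half the absolute duration difference); the note values and the unit insertion/deletion weight are illustrative assumptions, not the exact figures behind Figures 2 and 3.

    def local_matrix(a, b):
        """Stage 1: distance between every note of a and every note of b.
        Notes are (MIDI pitch, duration in sixteenths) pairs."""
        def w(x, y):
            return abs(x[0] - y[0]) + 0.5 * abs(x[1] - y[1])
        return [[w(x, y) for y in b] for x in a]

    def global_matrix(a, b, indel=1.0):
        """Stage 2: fill d[i][j] following equation (1), with the extra
        row and column given by initial conditions (2)-(4)."""
        local = local_matrix(a, b)
        d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            d[i][0] = d[i - 1][0] + indel              # equation (3)
        for j in range(1, len(b) + 1):
            d[0][j] = d[0][j - 1] + indel              # equation (4)
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                d[i][j] = min(d[i - 1][j] + indel,                    # deletion
                              d[i - 1][j - 1] + local[i - 1][j - 1],  # replacement
                              d[i][j - 1] + indel)                    # insertion
        return d

    source = [(65, 8), (65, 4), (67, 4)]   # hypothetical incipit notes
    target = [(69, 4), (67, 4), (65, 4)]   # hypothetical incipit notes
    print(global_matrix(source, target)[-1][-1])   # overall match score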
Figure 3. Global score matrix.

It is often useful, however, for the researcher to view the alignment to make sure that the match "makes sense," even when the alignment is not needed for the application. An alignment that does not make sense may be generated by an inappropriate setting of replacement or insertion/deletion costs, or by an inappropriate data representation; these issues are discussed below.
Figure 4. Alignment generated by Figure 3 score matrix.
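Continuing the sketch above, stage 3 recovers one optimal alignment by walking back from the bottom-right cell of the global score matrix; where several alignments tie, the tie is broken arbitrarily, as the text notes.

    def traceback(d, local, indel=1.0):
        """Stage 3: recover one optimal alignment from the global score
        matrix d and the local score matrix. Returns (i, j) index pairs,
        with None marking a deleted or inserted note."""
        i, j = len(d) - 1, len(d[0]) - 1
        path = []
        while i > 0 or j > 0:
            if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + local[i - 1][j - 1]:
                path.append((i - 1, j - 1)); i -= 1; j -= 1   # replacement
            elif i > 0 and d[i][j] == d[i - 1][j] + indel:
                path.append((i - 1, None)); i -= 1            # deletion
            else:
                path.append((None, j - 1)); j -= 1            # insertion
        return list(reversed(path))

    print(traceback(global_matrix(source, target), local_matrix(source, target)))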
4.4 Additional Operations for Comparing Music
The example discussed above applies standard dynamic programming to matching melodic sequences. It is possible, however, to introduce additional edit operators for a specific application. Mongeau and Sankoff (1990) define fragmentation and consolidation for matching melodies. Fragmentation allows a note to match notes of lesser duration that combine to cover the same time span; a quarter note, for example, may match two eighth notes or a dotted eighth followed by a sixteenth. Consolidation is the inverse of fragmentation: four sixteenth notes, for example, may consolidate to match a quarter note. Orpen and Huron (1992) make a distinction based on whether notes are repeated in one string or the other, or both. They define two versions of insertion and deletion (one for repeated notes and one for nonrepeated notes) and four versions of replacement, based on whether the replaced note repeats in neither string, in both strings, in the source, or in the target string. This allows the deletion of a repeated note to incur a cost of 0.
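A hedged sketch of how fragmentation and consolidation can be grafted onto the recurrence used earlier: besides the three standard moves, a cell may be reached by matching one source note against the last k target notes (fragmentation) or k source notes against one target note (consolidation). The group cost used here, summed pitch differences plus half the difference between the single note's duration and the group's total duration, is an illustrative assumption, not Mongeau and Sankoff's exact weighting.

    def group_cost(note, group):
        """Cost of matching one note against a group of notes."""
        pitch = sum(abs(note[0] - g[0]) for g in group)
        dur = abs(note[1] - sum(g[1] for g in group))
        return pitch + 0.5 * dur

    def distance_with_frag(a, b, indel=1.0, max_group=4):
        """Edit distance with fragmentation and consolidation moves,
        limited to groups of at most max_group notes."""
        d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            d[i][0] = d[i - 1][0] + indel
        for j in range(1, len(b) + 1):
            d[0][j] = d[0][j - 1] + indel
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                best = min(d[i - 1][j] + indel,
                           d[i][j - 1] + indel,
                           d[i - 1][j - 1] + group_cost(a[i - 1], [b[j - 1]]))
                for k in range(2, max_group + 1):      # fragmentation
                    if j - k >= 0:
                        best = min(best, d[i - 1][j - k]
                                   + group_cost(a[i - 1], b[j - k:j]))
                for k in range(2, max_group + 1):      # consolidation
                    if i - k >= 0:
                        best = min(best, d[i - k][j - 1]
                                   + group_cost(b[j - 1], a[i - k:i]))
                d[i][j] = best
        return d[-1][-1]

    # a quarter note (4 sixteenths) matching two eighths on the same pitch
    print(distance_with_frag([(60, 4)], [(60, 2), (60, 2)]))   # 0.0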
4.5 Effect of Music Representation and Match Parameters
While dynamic programming provides a powerful and flexible tool for comparing melodic patterns, the comparison must be guided, and the result interpreted, on the basis of sound musical judgement. In particular, the researcher must be aware of the implications both of edit-operator costs and of the musical data representation. Figure 5 shows one of several equal-scoring alignments of Innsbruck with Nun ruhen if the cost of insertions and deletions is set to 2 and pitch and rhythm differences are weighted equally. In this case, only four notes "match," while all others are deletions (deletions from Innsbruck or insertions into Nun ruhen; either perspective is valid). In setting the parameters, the researcher must consider what the musical question is, and what feature(s) form the basis for similarity measurement. In addition, it should be kept in mind that the value of the score returned by the comparison is meaningless except as it relates to ranking different comparisons; if the cost of all operations is doubled, the score of the corresponding match will double, but that does not make the two melodies less similar.
Mongeau and Sankoff (1990) base their replacement costs on musical consonance: the cost of replacing one note by another at the interval of a fifth, for example, is lower than the cost of replacing the same note at an interval of a second. This assignment of costs indicates the focus of their test application, identifying theme variations. Mongeau and Sankoff arrived at the particular values they used for their parameters (costs of replacement, insertion, and deletion, as well as the relative weighting of pitch and rhythm) through an empirical investigation of theme-variation clustering in Mozart's well-known variations on a theme, K. 300e. While Mongeau and Sankoff's algorithm, like most based on dynamic programming, is symmetric, Orpen and Huron (1992) point out that a nonsymmetric algorithm may be more appropriate for matching a melody with an embellished variant. Attention must also be given to the musical representation. In the above examples (illustrated by Figures 2-5), rhythm was represented by duration in sixteenth-note units. A representation in eighth-note units would be equally valid, but the difference between a half note and a quarter note would now be 2 instead of 4. Other representations may be equally valid, but harder to manage in terms of relative differences. The Humdrum **kern format (Huron, 1994) represents rhythm by the reciprocal of the note value: a whole note is 1, a half note 2, a quarter note 4, and so forth. This method has the advantage of easily handling triplets or other time values that "defeat the meter," but the researcher must carefully consider how to carry out a rhythmic match. Finally, music researchers may encounter matching algorithms in "canned" form. The Humdrum music analysis suite (Huron, 1994), for example, provides a dynamic programming comparison algorithm in the form of simil (Orpen and Huron, 1992). Such utilities can empower the user to ask complex questions concerning musical similarity, but it is the user's responsibility to understand the assumptions made by the programmer and what control, if any, he or she has over parameters and/or data representation.
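To make the representational point concrete, the sketch below converts **kern-style reciprocal note values into the sixteenth-note units used in the earlier examples, so that duration differences sit on a single linear scale. The dot handling follows the usual convention that each dot adds half the preceding value; this convention is an assumption here, since the text does not spell it out.

    from fractions import Fraction

    def recip_to_sixteenths(recip, dots=0):
        """Convert a reciprocal note value (1 = whole, 2 = half,
        4 = quarter, 12 = triplet eighth, ...) to a duration in
        sixteenth-note units."""
        base = Fraction(16, recip)
        return float(base * (2 - Fraction(1, 2 ** dots)))

    print(recip_to_sixteenths(4))          # quarter note   -> 4.0
    print(recip_to_sixteenths(2))          # half note      -> 8.0
    print(recip_to_sixteenths(12))         # triplet eighth -> 1.33...
    print(recip_to_sixteenths(8, dots=1))  # dotted eighth  -> 3.0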
4.6 A Sample Application: Retrieving Tunes from Folk-Song Databases
In order to get a feel for the issues involved in sequence-based melody comparison, we have implemented a comparison program and used it to perform an extensive simulation based on the task of retrieving tunes from a database of folk songs. The database incorporated two folk song corpora: the Digital Tradition (Greenhaus, 1994) and the Essen database (Schaffrath, 1992). [See also David Bainbridge's article on MELDEX in this issue.] At the time we downloaded it, the Digital Tradition contained approximately 1700 tunes, most of North American origin. The Essen database contains approximately 8300 melodies, about 6000 of which are German folk songs and 2200 Chinese; most of the remainder are Irish. Nearly 400 duplicates (the same tune with a different name and, often, in a different key) were removed from the Essen database, and 14 duplicates were removed from the Digital Tradition. Because our music display program does not currently display tuplet note values, the approximately 200 songs containing tuplets were also removed. Combining the two sources and eliminating the three songs common to both gave us a database of 9400 melodies. There are just over half a million notes in the database, with the average length of a melody being 56.8 notes.

4.6.1 Experimental Method
The experiments focused on the number of notes required to identify a melody uniquely under various matching conditions. The dimensions of matching include:
· whether an intervallic or merely a directional (up/down/same) contour is used as the pitch representation;
· whether comparison is based on pitch only or is inclusive of rhythm;
· whether matching is exact or approximate, with the possibility of note deletion, insertion, or substitution; and
· whether note fragmentation and consolidation are allowed.
Based on these dimensions, we have examined exact matching of:
· interval and rhythm;
· contour and rhythm;
· interval regardless of rhythm;
· contour regardless of rhythm;
and approximate matching of:
· interval and rhythm;
· contour and rhythm.
4.6.2 User Input
For each matching scheme we imagine a user singing the beginning of a melody, comprising a certain number of notes, and asking for it to be identified in the database. For this application, the most appropriate distance measure, when musical interval is used as the pitch representation, is the absolute distance in semitones, and that is the metric used for matching pitch. Rhythm is scored using the difference in duration between notes. If the tune sung by the user is in the database, how many other melodies that begin this way might be expected? We examined this question by randomly selecting 1000 songs from the database, then matching patterns ranging from 5 to 20 notes against the entire database. This experiment was carried out both for matching incipits and for matching sequences of notes embedded within songs; in order to match embedded sequences of notes, it is necessary to modify the dynamic programming starting condition so that deletions preceding the match of the pattern receive a score of 0. The only change is to equation 4 (Galil and Park, 1990), which is replaced by:

d0j = 0, j ≥ 1   (5)

For each sequence of notes, we counted the average number cn of "collisions," that is, other melodies that match. Fragmentation and consolidation are relevant only when rhythm is used in the match; in these experiments, fragmentation and consolidation were allowed for approximate matching but not for exact matching.
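In code, the change is one line of the earlier sketch: the first row of the global matrix stays at zero (equation 5), so any prefix of the song may be skipped for free. Taking the minimum over the final row, so that the match may also end anywhere in the song, is a further assumption here; the text itself changes only the starting condition.

    def embedded_match(pattern, song, indel=1.0):
        """Match pattern anywhere inside song, reusing local_matrix
        from the earlier sketch."""
        local = local_matrix(pattern, song)
        m, n = len(pattern), len(song)
        d = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            d[i][0] = d[i - 1][0] + indel
        # d[0][j] remains 0 for all j: equation (5)
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                d[i][j] = min(d[i - 1][j] + indel,
                              d[i - 1][j - 1] + local[i - 1][j - 1],
                              d[i][j - 1] + indel)
        return min(d[m])   # best score over all ending positions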
Figure 6. Number of collisions for different lengths of input sequence when matching incipits. From left to right: exact interval and rhythm; exact contour and rhythm; exact interval; exact contour; approximate interval and rhythm; approximate contour and rhythm.

4.6.3 Results of Retrieval Experiments
Figure 6 shows the expected number of collisions plotted against n, for each of the matching regimes, when queries are matched at the beginnings of songs. The number of notes required to reduce the collisions to any given level increases monotonically as the matching criteria weaken. All exact-matching schemes require fewer notes for a given level of identification than all approximate-matching methods. Within each group the number of notes decreases as more information is used: if rhythm is included, and if interval is used instead of contour. For example, for exact matching with rhythm included, if contour is used instead of interval, two more notes are needed to reduce the average number of items retrieved to one. The contribution of rhythm is also illustrated at the top of Figure 6, which shows that, if rhythm is included, the first note disqualifies a large number of songs. It is interesting that melodic contour with rhythm is a more powerful discriminator than interval without rhythm; removing rhythmic information increases the number of notes needed for unique identification by about three if interval is used and about six if contour is used. A similar picture emerges for approximate matching, except that the note sequences required are considerably longer.
An important consideration is how the sequence lengths required for retrieval scale with the size of the database. Figure 7 shows the results, averaged over 1000 runs, obtained by testing smaller databases extracted at random from the collection. The number of notes required for retrieval seems to scale logarithmically with database size.
Figure 7. Number of notes for unique tune retrieval in databases of different sizes. Lines correspond, from bottom to top, to the matching regimes listed in Figure 6.
Figure 8. Number of collisions for different lengths of input sequence when matching embedded patterns. Lines correspond, from left to right, to those in Figure 3.
4.6.4 Timing Considerations
The computational complexity of dynamic programming is O(n×m) for an n-note search pattern matched against a database of m notes, meaning that the time taken for performing the comparison increases multiplicatively with the size of either the database or the search pattern. For comparing single pairs of melodies, or for searching small databases, dynamic programming is fast enough for interactive applications. For the folk song database used in these experiments, running on a Macintosh PowerPC 8500 with a clock speed of 120 MHz, a search for an embedded theme of 20 notes takes 23.7 seconds if fragmentation and consolidation are allowed, and 16.7 seconds if those operations are not allowed. While this may be reasonable performance, much larger databases (a million folk songs, for example, or a thousand symphonies) might take an unacceptably long time to search. There are approximate string matching algorithms that have the potential to speed up approximate searches, and we are currently investigating one, based on the UNIX agrep text searching utility (Wu and Manber, 1992), that represents the state of the match, at any point, by a bit vector stored as a binary number. This method is not as flexible as dynamic programming; it allows only a predetermined number of errors (insertions, deletions, replacements), and all such errors are weighted equally: a semitone difference is scored the same as any other interval. Furthermore, state matching does not return the alignment between the two melodies. It does, however, possess the benefit of a very low execution time, which is constant, for a given database, for all search patterns up to the length of a machine word in bits. Figure 9 shows timing results for dynamic programming, with and without fragmentation and consolidation, and state matching for search patterns of up to 20 notes on the folk song database. State matching takes half a second to match all the search patterns, making it about 47 times faster, for a 20-note search pattern, than dynamic programming with fragmentation and consolidation.
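The bit-vector idea can be illustrated with the exact-matching core of the method (often called Shift-And); the Wu and Manber (1992) algorithm extends it with one additional state vector per allowed error. The contour-string encoding in the example is an assumption for illustration.

    def shift_and(pattern, text):
        """Bit-parallel exact matching: bit i of the state is set iff
        the last i+1 symbols read match the first i+1 symbols of the
        pattern. One machine-word operation per input symbol."""
        m = len(pattern)
        mask = {}
        for i, c in enumerate(pattern):
            mask[c] = mask.get(c, 0) | (1 << i)
        accept, state, hits = 1 << (m - 1), 0, []
        for j, c in enumerate(text):
            state = ((state << 1) | 1) & mask.get(c, 0)
            if state & accept:
                hits.append(j - m + 1)
        return hits

    # match a melodic contour string (U = up, D = down, S = same)
    print(shift_and("UDS", "SUDSUUDS"))   # [1, 5]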
An alternative way of speeding retrieval based on embedded patterns is to identify themes automatically using an offline matching method, storing those themes in a separate collection indexed to the original database. Because themes are relatively short (in comparison to an entire composition), the theme database can be searched much more quickly. In addition, it may be unnecessary to search for embedded patterns in a database containing only themes.

4.7 Conclusion
The focus of this paper is on the use of general sequence-comparison techniques for comparing melodies. Such methods have a number of applications. We have already discussed the use of sequence-based melody comparison for retrieving tunes from a database of 9400 folk melodies. In other work, we have used dynamic-programming melody comparison as part of a sight-singing tutorial program (Smith and McNab, 1996). Other applications include identifying theme variations (Mongeau and Sankoff, 1990), studying the use of motifs by a given composer or group of composers, analysing and tracing folk song variants (Dillon and Hunter, 1982), performing copyright searches, or any of a number of other musicological studies; indeed, the list is endless. In all cases, however, the researcher must remember that the analysis tool is blind and must be guided by sound musical knowledge and judgement. The primary method discussed here, dynamic programming, operates by finding the cost required to transform one string, the source, into another, the target. The cost is defined in terms of edit operations, the standard ones being insertion of a note, deletion of a note, or replacement of one note by another. Each of these operations carries its own cost, or weight, and other operations, specific to the application, may be defined, such as fragmentation and consolidation (Mongeau and Sankoff, 1990). In order to apply these operations, the algorithm first creates a local score matrix, which reflects the distance of all notes of the source melody from all notes of the target. The local score matrix is then used to generate the global score matrix by applying the dynamic programming operations in such a way that the score at each cell of the matrix is minimized. The score in the final (bottom rightmost) cell is the overall score for the match; an alignment, showing how the melodies
match, is then generated by tracing back from the final score to find the best scoring path through the global score matrix. While dynamic programming is fast enough for comparing single pairs of melodies, or for searching small databases, it can be too slow for performing searches of large databases, or for exhaustively comparing all pairs of melodies even in relatively small databases. For that reason, we are currently investigating state matching, another general sequence-based matching method, which is not as flexible as dynamic programming but is much faster.

References
Cope, David, "Recombinant Music: Using the Computer to Explore Musical Style," Computer 24/7 (1991), 22-28.
Dillon, M., and M. Hunter, "Automated Identification of Melodic Variants in Folk Music," Computers and the Humanities 16 (1982), 107-117.
Galil, Z., and K. Park, "An Improved Algorithm for Approximate String Matching," SIAM Journal on Computing 19/6 (1990), 989-999.
Goad, W. B., and M. I. Kanehisa, "Pattern Recognition in Nucleic Acid Sequences," Nucleic Acids Research 10/1 (1982), 247-263.
Greenhaus, D., "About the Digital Tradition," http://www.deltablues.com (1994).
Huron, David. The Humdrum Toolkit: Reference Manual. Stanford University: Center for Computer Assisted Research in the Humanities, 1994.
Leppig, M., "Musikuntersuchungen im Rechenautomaten," Musica 41/2 (1987), 140-150.
Logrippo, Luigi, and Bernard Stepien, "Cluster Analysis for the Computer-assisted Statistical Analysis of Melodies," Computers and the Humanities 20/1 (1986), 19-33.
McNab, Rodger J., Lloyd A. Smith, Ian H. Witten, C. L. Henderson, and S. J. Cunningham, "Towards the Digital Music Library: Tune Retrieval from Acoustic Input," Proc. ACM Digital Libraries, Bethesda, Maryland (1996), 11-18.
Mongeau, Marcel, and David Sankoff, "Comparison of Musical Sequences," Computers and the Humanities 24 (1990), 161-175.
Orpen, K. S., and David Huron, "Measurement of Similarity in Music: A Quantitative Approach for Nonparametric Representations," Computers in Music Research 4 (Fall 1992), 1-44.
Sankoff, David, and J. B. Kruskal, eds. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Reading, MA: Addison-Wesley, 1983.
Schaffrath, Helmut, "The EsAC Databases and MAPPET Software," Computing in Musicology 8 (1992), 66.
Smith, Lloyd A., and Rodger J. McNab, "A Program to Teach Sight-singing," Proc. Third Int. Conf. on Technological Directions in Music Education, San Antonio, Texas (1996), 43-47.
Stech, David A., "A Computer-assisted Approach to Micro-analysis of Melodic Lines," Computers and the Humanities 15 (1981), 211-221.
Wu, S., and U. Manber, "Fast Text Searching Allowing Errors," Communications of the ACM 35/10 (1992), 83-91.
5 Strategies for Sorting Melodic Incipits
John B. Howard
Widener Library, Room 188, Harvard University, Cambridge, MA 02138
[email protected]

Abstract
Over a period of 30 years, the Répertoire International des Sources Musicales (RISM) has catalogued more than 300,000 works preserved in manuscript in both public and private libraries in more than 60 countries. This article explores some experimental work on the relative effectiveness of different kinds of thematic searching conducted both at the Zentralredaktion in Frankfurt (Germany) and at the U.S. RISM Office at Harvard University over the past decade.
Editors and performers of music are often confronted with questions of identity and attribution. Is the material at hand the first or the final version of a work? Is it in the original key? Is the scoring original? Are performing indications (e.g., dynamics markings, bowing signs, ornaments) authorial or editorial? Scholars who work with original materials and librarians who attempt to catalogue them are confronted with more basic questions of the same general kind. Is the attribution of the work correct? Is the work complete as it stands? Is it a parody of another work? That is, does it incorporate thematic material used in a different context? Among large-scale projects in music bibliography, none has been so centrally concerned with the task of collating thematic information from musical materials that are physically remote from each other as the manuscript-indexing project ("A/II") of the Répertoire International des Sources Musicales (RISM). Over a period of 30 years RISM has catalogued more than 300,000 works preserved in manuscript in both public and private libraries in more than 60 countries. Access to partially congruent portions of these holdings is available in two ways: (1) on CD-ROM, by annual license from K. G. Saur Verlag, and (2) via the World Wide Web site www.rism.harvard.edu/rism/DB.html. Since the original cataloguing was done by hand, retrospective conversion of musical incipits is an ongoing process. Originally the CD-ROM included materials from Europe, and an Internet connection to the Harvard University Library Hollis database contained listings of European manuscripts found in the U.S. Over the past year there has been extensive coalescence: the Web site now includes a total of 230,000 holdings from Europe and the U.S. Of these, 200,000 are found on the third issue of the CD-ROM.

5.1 Potentials and Practicalities
In principle, one value of such a fund of encoded incipits should be in providing answers to vexing questions of attribution. Such questions have arisen frequently in the preparation of work lists to be included in the new collected Mozart and Haydn editions. Ten years ago the two projects between them listed 144 pieces of uncertain attribution. A meeting of editors was convened by the Akademie der Wissenschaften und der Literatur to
establish a common list of criteria on which to assess the validity of uncertain attributions. Ludwig Fischer found it "both astonishing and disappointing that there ha[d] been hardly any basic and method-oriented discussion of the whole problem." Georg Feder of the Haydn Institute maintained that investigations should be based on sources and their routes of transmission. Wolfgang Plath of the Mozart Edition hoped for the eventual establishment of a "logical, methodical, systematic approach" but noted that for the present the discovery of conflicting or clarifying information from diverse sources was largely a matter of chance.1

5.2 The Frankfurt Experience
5.2.1 Search Approach
The search for concordances among the 144 uncertain works, and between them and the holdings of the RISM database in Frankfurt, became a test case for evaluating the efficiency of diverse methods of melodic searching. In the two best-known methods, the pitch index and the melodic-interval index, "insignificant" information is normally excluded. The RISM A/II repertory, which is drawn largely from the seventeenth and eighteenth centuries, is almost entirely tonal. It is assumed in the case of a pitch index that by transposing all the examples to a common key (C Major or A Minor), concordances will be easily spotted. It is assumed in the case of a melodic-interval index that by concentrating on the defining events of a melodic contour, the cognitively "essential" similarities will surface. Long experience of those working with large collections of material had exposed the weaknesses of both approaches, and the decision was made instead to take into account the richness of "insignificant" elements of notation supported by the Plaine and Easie encoding language used by RISM. These elements include staccato dots, slurs, pauses, ties, grace notes, ornaments, redundant accidentals written in the musical text, and so forth.
1 This paragraph and the following text under the heading "The Frankfurt Experience" are based on Joachim Schlichte's article "Der automatische Vergleich von 83,243 Musikincipits aus der RISM-Datenbank: Ergebnisse - Nutzen - Perspektiven," Fontes artis musicae 37 (1990), 35-46, and are used by permission.
5.2.2 Procedure
To facilitate comparative searching, the Plaine and Easie Code is first translated into a meta-code in which every musical parameter is connected to each individual note it affects. These related parameters include octave number, note duration, beaming, position within a beamed group, and so forth. The meta-code (formulated by Norbert Böker-Heil) enables the incipits to be sorted efficiently. The sorted material can be retranslated into Plaine and Easie Code, from which incipits may be printed.
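As a rough illustration of the idea, the sketch below attaches every parameter to the note it affects, so that incipits sort as sequences of uniform records; the field names and toy values are hypothetical, and the actual Böker-Heil meta-code (and the Plaine and Easie parsing that feeds it) is far richer.

    from typing import List, NamedTuple

    class MetaNote(NamedTuple):
        pitch: str        # letter name
        octave: int
        duration: str     # symbolic duration code
        graced: bool      # whether the note carries a grace note
        ornament: str     # ornament sign, '' if none

    def sort_key(incipit: List[MetaNote]):
        """Lexicographic key over the per-note records; with all the
        "insignificant" detail retained, two different incipits diverge
        within the first few records compared."""
        return tuple(incipit)

    incipits = [
        [MetaNote('A', 1, '4', False, ''), MetaNote('B', 1, '8', False, '')],
        [MetaNote('A', 1, '4', True, ''), MetaNote('B', 1, '8', False, '')],
    ]
    for incipit in sorted(incipits, key=sort_key):
        print(incipit)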
Human judgment is still required to interpret search results. For example, in the case of the Mozart mass known by the Köchel number C1.04, a parallel source was determined to exist in manuscript in a collection in Winterthur (Switzerland), and in Veszprém (Hungary) the same work is attributed to someone named "Müller." The Swiss source is considered inconsequential, since it is copied from a print of 1815 or 1821. The Hungarian source was copied in 1806, 15 years after Mozart's death, and its attribution may remain questionable. At least, however, Mozart scholars are now aware of its existence.2

5.2.3 Results
At the time this study was made, the RISM control database in Frankfurt contained slightly under 84,000 incipits. From the test database of 144 incipits of works traditionally attributed to Haydn or Mozart which are actually unattributed in "original" sources, concordances were found for 33. Numerous concordances from within the RISM database were also found. These were of three main types:
2 This mass for four voices and numerous instruments was published in 1815 and was known in the nineteenth century as "Mozart's Twelfth Mass." Its legitimacy was first questioned in 1826.
(1) concordances between works with common attributions;
(2) concordances between sources in which the surname is identical but the forename is missing from one source;
(3) concordances in which the musical incipits match but the composer attributions conflict.

Roughly 5,000 identical works, or 6.17% of the sample, were found to have conflicting attributions (Type 3). At the same time, matches to attributed works were found for 292 works previously indexed locally as "anonymous." This result represented a 2% hit-rate, since within the database the number of anonymous attributions originally stood at 14,000. From these results, we were then able to determine that when all of the "insignificant" details are included in the search data, musical incipits can be sorted effectively, and deviations from the sought match will be evident by the third, fourth, or (at most) fifth item of information processed. This finding stands in marked contrast to the very long letter-name incipits given in such finding tools as Parsons (1975) and Barlow and Morgenstern (1948). In these cases, the string of "significant" information for which a match is sought is much longer.

5.2.4 Future Directions
The meta-code underlying RISM searching is malleable. If it were decided that other weightings (the selective inclusion or exclusion of particular parameters for specific repertories) were appropriate, varied sorts could be supported. If a user needed a melodic-interval index, it could be derived from the meta-code.

5.3 The Harvard Experience
Like its colleagues in Frankfurt, the U.S. RISM office found Plaine and Easie Code to be unwieldy for sorting procedures. Instead of using the meta-code adopted in Frankfurt, it pursued a different path: conversion of Plaine and Easie Code to the DARMS encoding language. DARMS encodes the exact
registral and rhythmic value of each musical event, whereas Plaine and Easie describes them contextually. The fact that DARMS represents pitch contextually, that is, by relative position on a musical staff, has both benefits and limitations. This form of representation facilitates transposition but is easily misled by identical melodies written with divergent clef signs.

5.3.1 Procedures
Using DARMS as the basis for sorting encoded musical excerpts, various methods of generalizing the encoded musical structure have been tried and the results compared to the known "ideal" result. The levels and types of generalization have been as follows (a small sketch of several of these levels follows the footnote below):
(1) the complete encoding, with all parameters;
(2) the complete encoding, transposed to a common pitch register;3
(3) the encoding stripped of such features as beaming, bar lines, and fermatas;
(4) the encoding stripped of the items given in (3), plus grace notes;
(5) the encoding stripped of the items given in (3) and (4), plus rhythmic values, rests, and ties, with transposition to a common register as in (2) but with preservation of repeated notes.
3 The initial DARMS value for pitch register is set to 0 and subsequent values are adjusted accordingly.
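A minimal sketch of generalization levels (2), (4), and (5) on a toy note record of the form (pitch, rhythm, tied, grace); the record layout and the transposition rule (shifting the first pitch value to 0, as described in note 3) are illustrative assumptions rather than the DARMS processing actually used.

    def transpose_to_zero(notes):
        """Level (2): set the first pitch value to 0 and shift the rest
        relative to it (see note 3)."""
        offset = notes[0][0]
        return [(p - offset, r, t, g) for (p, r, t, g) in notes]

    def strip_graces(notes):
        """Level (4): drop grace-note flags."""
        return [(p, r, t, False) for (p, r, t, g) in notes]

    def pitch_only(notes):
        """Level (5): keep only transposed pitch, preserving repeated
        notes."""
        return [p for (p, r, t, g) in transpose_to_zero(notes)]

    melody = [(5, 4, False, False), (7, 2, False, True), (7, 4, True, False)]
    print(pitch_only(melody))   # [0, 2, 2] -- the repeated note survives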
5.3.2 Results
The results of the various sorts performed on the sample data have confirmed in part the experience of the RISM Central Office in Frankfurt, but they also bring to light some particular problems relating to the methodology applied, to the repertory, and to the encoding system employed. Among works in the sample that have been attributed to known composers, a sort of Type 1 grouped together identical works transmitted in versions notated at the same pitch register. A sort of Type 2 brought together the few known transposed versions of pieces. These results are in complete agreement with the Frankfurt experience. The bulk of pieces in the data sample derive from the general repertory of tunes, however, and the two initial types of sorts were far less effective in recognizing the relationships of pieces presented differently on the page.4 Of the 13 known occurrences of a song called "Roslin Castle," for example, only four sorted together in a Type-1 sort, and only six in a Type-2 sort [see Figure 1]. In both cases, musically similar manifestations of the tune were separated from the most common manifestation by such different tunes as "The White Cockade" and "General Washington's March" [12 versions of the first are shown in Figure 2]. Sorts of Types 3 and 4 both failed to bring together additional concordant melodies. Sorts of Type 5 yielded results similar to those of Types 1 and 2.

5.3.3 Future Goals
The ineffectiveness of all strategies in bringing together different manifestations of pieces from the tune repertory suggests that different methodologies must be developed for determining musical identity for the different repertories represented by the project. Some desirable elements of these new approaches can be suggested by examining the data involved in the U.S. experiments to determine which parameters served to separate variants of the same tune in the sort result. At this juncture, the features that seem most valuable in separating pieces known to be based on common thematic material are these:
(1) details that relate to notational conventions: clefs, bar-line placement, two tied notes vs. one dotted note, two eighth rests vs. a quarter rest, etc.;
(2) variations in the rhythm of specific figures, such as a dotted eighth/sixteenth rendered as two eighths in another source;
4 The fact that tunes identified by title were often grouped in the sort result with completely unrelated tunes is ignored in the discussion here. The phenomenon is well known to music bibliographers using a wide array of tools.
Figure 1. Thirteen iterations of the incipit of "Roslin Castle" from listings by RISM. Note in particular the variations in (a) key, (b) placement of dotted notes, and (c) presence or absence of grace notes.
Figure 2. Twelve iterations of the incipit of "The White Cockade" from listings by RISM. Note the differences (a) of key, (b) of high versus low beginnings, (c) of shorter versus longer note values, (d) of grace notes, and (e) of beamings [which are represented in Plaine and Easie Code].
(3) small intervallic differences in the initial pitches of the melodic line;
(4) the use of rests and repeated notes, particularly in separating vocal and instrumental renditions of the same tune.

Although each procedure is conceptually simple, each poses complex problems if the query confronts the broader issue of manipulating particular values to facilitate grouping or separation. For example, if one wanted to match the syncopated version of a tune with a conventionally accented version, one would need to "regularize" durations. While the U.S. RISM office has been successful in its exploration of principles involved in generalizing data for searching and sorting, it has also been mindful that encoding methods can enable or cripple searching strategies, since there is no way to find what is not there.

References
Barlow, Harold, and Sam Morgenstern. A Dictionary of Musical Themes. New York: Crown Publishers, 1948.
Parsons, Denys. The Directory of Tunes and Musical Themes. Cambridge: Spencer Brown, 1975.
6 Signatures and Earmarks: Computer Recognition of Patterns in Music
David Cope
Porter College #88, University of California, Santa Cruz, CA 95064
[email protected]

Abstract
In this article I attempt to distinguish between two types of patterns I find extremely important in understanding and analyzing music: signatures and earmarks. Using these patterns, my computer program Experiments in Musical Intelligence (EMI) has created a number of compositions arguably in the style of various classical composers (Cope 1994, 1997). I briefly describe how, using pattern-matching techniques, EMI finds and then uses signatures and earmarks in its compositional processes.
6.1 Signatures
Musical signatures (a term for motives common to two or more works of a given composer) can aid in the recognition of musical style (Cope 1987, 1991a and b, 1996). For example, signatures can tell us what period of music history a work comes from, the probable composer of that work, and so on. Signatures are typically two to five beats in length and are often composites of melodic, harmonic, and rhythmic elements. Signatures usually occur between four and ten times in any given work. Variations often include transposition, diatonic interval alteration, rhythmic refiguring, and registral and voice shifting. With few exceptions, however, such variations do not deter recognition. Figure 1 shows examples of a musical signature used by Mozart and how signatures can change over time, providing a microcosm of stylistic development. Figure 1a shows a signature in a rudimentary form over a simple harmonic statement. This is followed by a slightly more elaborate version from a sonata composed four years later (Figure 1b). In Figure 1c, the melody has been truncated with a more active version of the accompaniment. Both this and the following version (Figure 1d, which shows a more elegant and developed melody) were composed around the same time: six years after the version shown in Figure 1b. In Figure 1e, the melodic line closely matches the version shown in Figure 1b but is an octave lower, has slight rhythmic differences, and has a more developed accompaniment. The final version shown here (Figure 1f), composed fifteen years after Figure 1a, has a fully developed melody and accompaniment and is by far the most complex of those shown. All of these versions of the signature appear in cadences in fast movements.

6.1.2 Signatures and Stylistic Analysis
Observing a signature change and develop over time can provide valuable insights into how a given style matures and how, to some extent, one can differentiate by ear the various periods in the life of a composer. While such style analysis cannot supplant other forms of harmonic, melodic, or structural analysis, it can augment them in important ways. Interestingly, most forms of standard analysis tend to articulate the ideas and materials composers have in common. Studies of signatures, on the other hand, tend to define what makes each composer unique.
Figure 1. Versions of a Mozart signature from his (a) Piano Sonata K. 280 (1774), mvt. 1, mm. 107-8; (b) Piano Sonata K. 330 (1778), mvt. 3, m. 110; (c) Piano Concerto K. 453 (1784), mvt. 1, mm. 162-3; (d) Piano Concerto K. 459 (1784), mvt. 2, mm. 114-6; (e) Piano Sonata K. 547a (1788), mvt. 1, mm. 63-4; (f) Piano Sonata K. 570 (1789), mvt. 2, m. 4.
Page 132
Figure 2. Versions of a signature found in Mozart's Piano Sonata K. 284, mvt. 2: (a) m. 16; (b) m. 30; (c) m. 46; (d) m. 69; (e) m. 92.

Placement of signatures can also be an extremely important strategy in classical-period structural logic. Figure 2 shows five examples of a Viennese signature used by Mozart which is described generally in my book Computers and Musical Style (Cope 1991, pp. 157-169). It can be analyzed as a premature tonic bass note under a dominant chord or a late-sounding dominant over a tonic pedal point. Each iteration of the signature shown here appears at the end of a period whose first phrase does not cadence with the signature. Note how the spacing and texture (number of dissonant notes) lend differing tensions to the signatures, with the final occurrence carrying the greatest weight of the five shown. The tension and location variances help delineate the rondo form in this movement, with tonic-function signatures weighted more strongly towards dissonance than the dominant-function signatures. Hence signatures may not only be location-dependent at the local phrase level, but may also be structurally dependent according to section endings. Experienced listeners can hear such subtle balance and know when composers or machine-composing programs misplace or omit such important signatures in given styles.

6.1.3 Identifying Inter-work Signatures
Pattern-matching for signatures entails discovering musical patterns that occur in more than one work of a composer. This requires the development of a program that not only recognizes that two patterns are exactly the same, a
Page 133
fairly trivial accomplishment, but also that two patterns are almost the same. EMI pattern-matches by means of controllers that define how closely a pattern must resemble another for it to register as a match. If these controllers are set too narrowly, signatures will not pass. If they are set too broadly, patterns that do not identify a composer's style will be allowed to pass. If these controllers are set with discrimination, only signatures will pass. Looking back at Figure 1 provides a simple example of pattern-matching to find signatures. Imagine that a pattern-matching program is attempting to determine whether the melody in Figures 1a and 1c constitutes a signature. It is improbable that a nonmusical pattern-matcher would find these two melodies very similar. They share only two common pitches (D and E). Also, Figure 1c has fewer notes than Figure 1a. To the ear, however, these are easily identifiable as simple variations of the same pattern. Musical pattern-matchers can discover the similarities in these two patterns. This is initially accomplished by reducing pitches to (base-12) intervals. For Figures 1a and 1c this produces [1 -4 -3 -2 -1] and [1 -3 -4 2] respectively. Notice how using intervals shows more similarity in the two patterns than using pitches. Introducing controllers that determine interval accuracy proves the patterns to be similar enough to qualify as a signature. Allowing either pattern, for example, to match when different by just one step in either direction enables the program to match the first three intervals. Such variations are obviously very common in tonal music, where composers, in order to remain within a diatonic framework when sequencing, often substitute whole steps for half steps and vice versa. Matching Figures 1a and 1c requires a second controller to ignore the direction of one-note motion (i.e., an interval of 2 matching an interval of -2). A third controller will allow the extra note in the first pattern. Thus, an allowance for these variations helps the pattern-matcher find musical similarities. Figure 3 shows a signature from Chopin's mazurkas with more elaborate rhythmic and pitch variations. Here, few of the examples have either the same number of melodic notes or intervals. Matching such subtle differentiations can be quite difficult even for a sophisticated pattern-matcher because the controller settings necessary to recognize these variants may also produce numerous non-variants. To reduce this noise in the output, the program must factor in elements such as the exact placement of the variations. Such precision allows the EMI pattern-matcher to discover signatures which are aurally recognizable but numerically very different.
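The controller idea can be made concrete with a small sketch. The following Python fragment is not EMI's code (which is not reproduced here) but an illustration of interval-based matching with the three controllers just described; all names and default settings are invented for the example.

    def intervals(pitches):
        # Reduce a pitch sequence (e.g., MIDI note numbers) to base-12 intervals.
        return [b - a for a, b in zip(pitches, pitches[1:])]

    def match(a, b, step_tolerance=1, ignore_step_direction=True, allow_extra_note=True):
        # Controller 3: tolerate one extra interval in either pattern by trying
        # to delete each interval of the longer pattern in turn.
        if len(a) != len(b):
            if not allow_extra_note or abs(len(a) - len(b)) > 1:
                return False
            longer, shorter = (a, b) if len(a) > len(b) else (b, a)
            return any(match(longer[:i] + longer[i+1:], shorter,
                             step_tolerance, ignore_step_direction, False)
                       for i in range(len(longer)))
        for x, y in zip(a, b):
            # Controller 2: ignore the direction of stepwise motion (2 matches -2).
            if ignore_step_direction and abs(x) <= 2 and abs(y) <= 2:
                x, y = abs(x), abs(y)
            # Controller 1: allow intervals to differ by one step either way.
            if abs(x - y) > step_tolerance:
                return False
        return True

    print(match([1, -4, -3, -2, -1], [1, -3, -4, 2]))   # the Figure 1a/1c pair: True

With the controllers set narrowly (step_tolerance=0 and both switches off), the same pair fails to match, mirroring the trade-off between too-narrow and too-broad settings described above.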
Page 134
Figure 3. Various forms of a signature in Chopin mazurkas: (a) Op. 6, No. 1, m. 1; (b) Op. 6, No. 4, mm. 9-10; (c) Op. 7, No. 2, m. 3; (d) Op. 17, No. 4, m. 13; (e) Op. 17, No. 4, m. 15; (f) Op. 17, No. 4, m. 29; (g) Op. 50, No. 1, m. 18.

6.2 Earmarks
There are other patterns in music which indicate, at least to the initiated listener, important attributes about their host works besides style. These patterns are more generalized than signatures. I call such patterns earmarks, since they are identified most easily by ear and tend to mark specific structural locations in a work. Earmarks can tell us what movement of a work we are hearing. Earmarks can also foreshadow particularly important structural events. Earmarks may even contribute to our expectations of when a movement or work should climax or end, and therefore enhance our appreciation, or lack of appreciation, of that work. In general, earmarks have significant impact on the analysis of structure beyond thematic repetition and variation.
Page 135
Recognition of such patterns in music is not new. The study of musical signs and symbols, for example, reveals that certain gestures in works can be traced to sources beyond their context and often beyond their composer (Agawu 1991, Gjerdingen 1988). Frequently, however, such analysis takes the form of recognition of quotation or quasi-quotation, so that the semantic understanding of a work is enhanced. Earmarks, while not falling outside the scope of the study of signs and symbols, have little such semantic meaning. Earmarks, like signatures, are integrated seamlessly into their immediate environment and have syntactic rather than semantic value. Earmarks are icons holding little interest in themselves but great interest for what they reveal about structure and what they can foretell about what is to follow.

6.2.1 Earmarks as Gestural Information
In general, variations of earmarks point out their gestural nature. They can typically be described in general terms such as a trill followed by a scale or an upward second followed by a downward third, and so on. Trills and scales, however, abound in the music of many composers, even in combination. The distinguishing characteristic of earmarks is their location: they appear at particular points in compositions, just after some important event and just before another important event. Thus, finding earmarks helps pinpoint important nexus points in music. Figure 4 shows five examples of an earmark found in Mozart's piano concertos Numbers 6 through 27: the tonic 6/4s which precede the trills at the ends of expositions and recapitulations, just prior to cadenzas. Note how the first measure in each example varies yet adheres to simple scalar designs. While these examples are simple, possibly obvious, it is clear that with even limited listening experience the ear can become accustomed to the harmonic and melodic substance of such material and begin to anticipate the eventual culmination of the exposition at their first occurrence and the cadenza at the second.
Page 136
Figure 4. An earmark from the first movements of Mozart's Piano Concertos: (a) K. 238, mm. 86-7; (b) K. 449, mm. 318-9; (c) K. 450, mm. 277-8; (d) K. 482, mm. 196-7; (e) K. 595, mm. 326-7.
Page 137
6.2.3 Earmarks as an Aid to Structural Perception
Misplaced earmarks can cause a disruption in an educated listener's perception of the apparent musical structure. For example, earmarks which do not precede anticipated sections, occur out of sequence, or are ill-timed can cause rifts in the antecedent-consequent motion so important to musical structure. There is, for example, an oboe concerto attributed to Haydn (Hoboken VIIg:C1) which has numerous earmarks scattered about the first movement, earmarks which typically foreshadow or simply precede the cadenza. None of the occurrences of this earmark subsequently moves to the cadenza, which results in a movement that sounds scattered at best and at worst appears to stumble about constantly, unsure of where it should go. Given that Haydn was acutely aware of the use of this earmark, at least if one can judge by the numerous examples of his concertos which use it correctly, it seems unlikely that the attribution is correct and more likely that the work is apocryphal.
Figure 5. An earmark from the fourth movement of EMI's Symphony (mm. 82-85), arguably in the style of Mozart.
Page 138
Figure 5 shows a fourth-movement earmark found in many of Mozart's symphonies and here found in EMI's symphony in the style of Mozart. Three different fourth movements of Mozart's symphonies were used in the analysis which helped produce this music. Each of Mozart's examples possessed a version of this earmark at roughly the same structural point in the movement. Again, as with the earmark found in the concertos, the music here is pedestrian and not particularly distinguished. Upon hearing, however, it stands out from the surrounding material with enough integrity to make its structural importance known to the ear. The orchestration, inherited in the case of the EMI-Mozart from the original scores upon which its composition was based, blends so well with the surrounding material that the earmark seems all but lost in the forest. Its character, here a descending scale rather than the leaps which bookend it on either side, forms the basis on which one can anticipate, in this case, the recapitulation of the first theme in the original key.

References
Agawu, V. Kofi. Playing with Signs. Princeton: Princeton University Press, 1991.
Cope, David, "An Expert System for Computer-Assisted Music Composition," Computer Music Journal 11/4 (Winter, 1987): 30-46.
Cope, David. Computers and Musical Style. Madison, WI: A-R Editions, 1991.
Cope, David, "Recombinant Music," Computer 24/7 (July, 1991): 22-28.
Cope, David. Bach by Design. Sound Recording. Baton Rouge, LA: Centaur Records, 1994.
Cope, David. Experiments in Musical Intelligence. Madison, WI: A-R Editions, 1996.
Cope, David. Classical Music Composed by Computer. Sound Recording. Baton Rouge, LA: Centaur Records, 1997.
Gjerdingen, Robert. A Classic Turn of Phrase. Philadelphia: University of Pennsylvania Press, 1988.
Page 139
II. TOOLS AND APPLICATIONS
Page 141
7 A Multi-scale Neural-Network Model for Learning and Reproducing Chorale Variations
Dominik Hörnel
Institut für Logik, Komplexität und Deduktionssysteme
Universität Karlsruhe (TH)
Am Fasanengarten 5
D-76128 Karlsruhe, Germany
[email protected]
Abstract This article describes a multi-scale neural-network system producing melodic variations in a style directly learned from musical pieces by baroque composers such as Johann Sebastian Bach and Johann Pachelbel. Given a melody, the system invents a four-part chorale harmonization and improvises a variation of any chorale voice. Unlike earlier approaches to the learning of melodic structure, the system is able to learn and reproduce higher-order elements of harmonic, motivic, and phrase structure. Learning is achieved by using mutually interacting neural networks, operating on different time-scales, in combination with an unsupervised learning mechanism to classify and recognize these elements of musical structure. A complementary intervallic encoding allows the neural network to establish relationships between intervals and learned harmony. Musical pieces in the style of chorale partitas written by Pachelbel are the result.
Page 142
7.1 Background
The investigation of neural information structures in music brings together disciplines such as computer science, musicology, mathematics, and cognitive science. One of its objectives is to find out what determines the personal style of a composer. Neural networks constitute one procedure which has been shown to be able to "learn" and reproduce style-dependent features from given examples. Various alternative techniques have been applied to style identification and simulation. These include signature identification (Cope; see preceding article), back-propagation networks (Ebcioglu), and the use of neural networks in conjunction with other procedures (e.g., data-compression measures in the work of Witten, Conklin, et al.). The generation of chorale melodies and/or harmonizations in the style of Bach, for example, has been a central focus of the work of Ebcioglu (1986 and 1992); Conklin and Witten (1990); Witten, Manzara, and Conklin (1994); and Hild, Feulner, and Menzel (1992). When dealing with longer melodic sequences in, for example, folk melodies, models have considerable difficulty in learning structure. Instead they may produce new sequences that lack coherence (Feulner and Hörnel 1994; Mozer 1994). A principal reason for their failure may be that they are unable to capture higher-order structural features such as harmonies, motifs, and phrases simultaneously occurring on multiple time scales (Hörnel and Ragg 1996b).

7.1.1 Melodic Variation as a Musical Technique
The art of melodic variation has a long tradition in Western music. Many European composers, particularly in the eighteenth and nineteenth centuries, have written variations on a given melody, e.g., Mozart's keyboard variations K. 300e on the folk melody "Ah! Vous dirai-je, Maman" (also known as the children's song "Twinkle, Twinkle, Little Star"). Underlying this tradition is the baroque genre of chorale variations. These were written for performance on the organ or harpsichord for use in the Protestant church. Even earlier, the secular keyboard partita had a presence in Italy through the works of Frescobaldi and various other keyboard composers, including his German pupil Johann Jakob Froberger. A prominent representative of this kind of composition in Germany at the end of the seventeenth century was Johann Pachelbel, who wrote seven sets
Page 143
of variations on chorale melodies. Typically each work included from seven to twelve variations. Pachelbel was also praised for his variations on six secular arias published under the title Hexachordum Apollinis (1699). He subjected chorale melodies to a great many other compositional procedures. In his lifetime Pachelbel was known as "a perfect and rare virtuoso" whose works influenced many other composers, such as Bach. Most of Pachelbel's chorale partitas can be seen as improvisations of an organist who invented "real-time" harmonizations and melodic variations on a given chorale melody. This method of composing is very similar to the behavior of the neural-network system presented here. The problem of learning melodic variations with neural networks has been studied by Feulner and Hörnel (1994) and, for jazz improvisation, by Toiviainen (1995). Although these approaches produce some musically convincing local sections, the results in general lack global coherence.

7.1.2 A Neural Model for Variation Technique
The neural-network model we present here is able to learn global structure from musical examples by using two mutually interacting neural networks that operate on different time-scales. The main idea of the model is a combination of unsupervised and supervised learning techniques to perform the given task. Unsupervised learning classifies and recognizes musical structure; supervised learning is used for prediction in time. The model has been tested on simple children's song melodies (Hörnel and Ragg 1996b). In the following we will illustrate its practical application to a complex musical task: the learning of melodic variations in the style of Pachelbel.

7.2 Task Description
Given a chorale melody, the learning task is achieved in two steps: (1) A chorale harmonization of the melody is invented. (2) One of the voices of the resulting chorale is selected and provided with melodic variations. Both subtasks are directly learned from musical examples composed by J. Pachelbel and performed in an interactive composition process which results in a chorale variation of the given melody. The first task is performed by
Page 144
HARMONET, a neural-network system which is able to harmonize melodies in the style of various composers such as J. S. Bach. The second task is performed by the neural-network system described below.

7.2.1 Time Resolution of Variations
For simplicity we have considered only melodic variations consisting of four sixteenth notes for each quarter note of the melody. This is the most common variation type used by baroque composers and presents a good starting point for even more complex variation types, inasmuch as there are enough musical examples for training the networks and because it allows the representation of higher-scale elements in a rather straightforward way.

7.2.2 HARMONET: A Neural System for Harmonization
HARMONET is a system producing four-part chorales in various harmonization styles, given a one-part melody. It solves a musical real-world problem on a performance level appropriate for musical practice. Its power is based on (1) an encoding scheme capturing musically relevant information and (2) the integration of neural networks and symbolic algorithms in a hierarchical system, combining the advantages of both. For a detailed account see Hild, Feulner, and Menzel (1992) or Hörnel and Ragg (1996a). Figure 1 shows the chorale melody "Alle Menschen müssen sterben" and a corresponding harmonization and variation of the soprano voice composed by Pachelbel.

7.3 A Multi-scale Neural-Network Model
The learning goal of this model is two-fold. On the one hand, the results produced by the system should conform to melodic and harmonic constraints such as the correct resolution of dissonances or the appropriate use of successive intervallic leaps. On the other hand, the system should be able to capture unique stylistic features from the learned examples, in this case melodic shapes preferred by Pachelbel. The adherence to musical rules and aesthetic conformance to the learning set can be achieved by a multi-scale neural-network model. The learning task is divided into subtasks. The procedure is illustrated in Figure 2.
Page 145
Figure 1. The German chorale melody "Alle Menschen müssen sterben" (upper staff) and a chorale variation composed on the same melody by Pachelbel (lower staves).
Page 146
Figure 2. The organization of the system for composing new chorale variations. Reading from the lower left-hand corner, each note of the melody (which has been harmonized by HARMONET) is passed to the supernet, which predicts the current motif class (MT) from a local window (see the Motif-Prediction window). By a similar procedure performed on a lower time-scale, the subnet predicts, on the basis of MT and the current harmony, the next note of the motif (see the Note-Prediction window). The result is returned to the supernet through the motivic-recognition component (see the right side of the chart) in order to be considered when the net computes the next motif class (MT+1).
Page 147
(1) A chorale variation is considered on an abstract time-scale as a sequence of note groups (motifs). Each quarter note of the original melody is replaced by a motif (here a motif consisting of four sixteenth notes). Before training the networks, motifs are classified according to their similarity.

(2) One neural network is used to learn the abstract sequence of motivic classes. Motivic classes are represented in a 1-of-n encoding form, where n is a fixed number of classes. The question this step answers is this: What kind of motif fits a particular note with respect to the melodic context and the motifs that have occurred previously? No precise pitches are fixed by this network. It works at a more abstract level and is therefore called a supernet in this commentary.

(3) Another neural network learns how to translate motivic classes into concrete note sequences appropriate to a given harmonic context. It produces the actual pitches. Because it works one level of precision below the supernet, it is here called a subnet.

(4) Although the output of the subnet is mainly influenced by the motivic class computed by the supernet, the subnet has to find a meaningful realization according to the harmonic context. Sometimes the subnet invents a sequence of notes that does not belong to the motivic class determined by the supernet. Since the supernet takes this motif into account when computing the next motif class, the class it sees should match the notes actually formed by the subnet. The motif is therefore reclassified by the motivic-recognition component of the system before the supernet determines the next motif class.

The motivation for this separation into supernet and subnet arose from the following consideration: if each motif had an associated contour (i.e., a sequence of intervallic directions to be produced for each quarter note), the choices for note-generation could be restricted to suit these contours. The procedure is based on human behavior. When a human organist improvises a melodic variation of a given melody in real time, he must make his decisions in a fraction of a second. In order to find a meaningful continuation of the variation, he must therefore have at least some idea about what kind of variation should be applied to the next note.
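The supernet/subnet loop can be outlined in a short Python sketch. This is a structural illustration only (the article gives no code); supernet, subnet, and reclassify are hypothetical stand-ins for the trained networks and the motivic-recognition component.

    def vary_voice(voice, harmony, supernet, subnet, reclassify):
        # voice: quarter notes of the chorale voice chosen for variation;
        # harmony: the harmonic context per quarter note, supplied by HARMONET.
        variation, prev_class = [], None
        for t, (note, chord) in enumerate(zip(voice, harmony)):
            # Supernet: predict an abstract motif class from the melodic
            # context (one preceding and two following notes) and the
            # previously realized motif class.
            context = voice[max(t - 1, 0):t + 3]
            motif_class = supernet(context, prev_class)
            # Subnet: realize the class as four sixteenths fitting the
            # harmony, intervals taken relative to the reference note.
            motif = subnet(motif_class, note, chord)
            # Reclassify what was actually produced before predicting the
            # next class, as described in step (4).
            prev_class = reclassify(motif)
            variation.extend(motif)
        return variation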
Page 148
The validity of the concept was established by several experiments in which motivic classes, previously obtained from Pachelbel originals through classification, were presented to the subnet. After training, the subnet was judged able to reproduce the Pachelbel originals almost perfectly. Since the motivic contour was shown to be an important element of melodic style, another neural network was introduced at a more general time-scale to accommodate contour. The training of this net greatly improved the overall performance of the system rather than merely shifting the learning problem to another time-scale.

7.4 Motivic Classification and Recognition
In order to coordinate learning at different time scales, we needed a procedure to classify motifs for training the supernet and to recognize motifs for reclassification. This was achieved by using unsupervised learning following Kohonen's topological feature maps (1990), which represent agglomerative hierarchical clustering. We implemented a recursive clustering algorithm based on a distance measure which determines the similarity between motifs by comparing their contours and interval sizes. The result of hierarchical clustering is a dendrogram that allows comparison of classified elements on a distance scale. Figure 3a shows the result of classification for eight motifs. Cutting the classification tree at lower levels yields more and more classes. Another approach is to determine appropriate motivic classes through self-organization within a one- or two-dimensional surface. Figure 3b displays the distribution of motif contours over a 10x10 Kohonen feature map. The algorithm is then applied to all motifs contained in the training set.

7.5 Network Structure
An important aspect of this task is to find an appropriate number of classes for the given learning task. Both the supernet and the subnet are implemented as standard feed-forward networks. The task of the note-prediction subnet is to find a variation of a given melodic note according to the motif class proposed by the supernet and the harmony determined by HARMONET. Because the character of a motif depends on the intervallic relationship between its notes rather than on absolute pitches, we have chosen
Page 149
an intervallic representation for this procedure. Each note is represented by the interval to the first motif note, the so-called reference note (indicated by the percent sign [%]).
Figure 3a. A dendrogram for the first eight motifs of the Pachelbel chorale variation shown in Figure 1b; below the staff one can see the corresponding base-7 intervallic representation.
Figure 3b. A Kohonen feature map developed from all motifs of the chorale variation (initial update area 6x6, initial adaptation height 0.95, decrease factor 0.995). Each cell corresponds to one unit in the feature map. One can see the arrangement of regions responding to motifs having different contours.
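The distance measure used for classification is described only in outline. The following Python sketch shows one plausible reading (the weights, normalization, and equal-length assumption are mine, not the authors'): contours and interval sizes are compared position by position, and agglomerative clustering on such distances yields a dendrogram like that of Figure 3a.

    def sign(x):
        return (x > 0) - (x < 0)

    def motif_distance(a, b, contour_weight=1.0, size_weight=0.25):
        # a, b: equal-length motifs given as intervals from the reference note.
        d = 0.0
        for x, y in zip(a, b):
            d += contour_weight * (sign(x) != sign(y))  # contour disagreement
            d += size_weight * abs(x - y)               # interval-size difference
        return d

    print(motif_distance([0, 1, 2, 3], [0, 1, 2, 4]))     # near neighbors: 0.25
    print(motif_distance([0, 1, 2, 3], [0, -1, -2, -3]))  # opposite contours: 6.0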
Page 150
7.5.1 Harmonic Controls
Since the motivic structure also depends on harmonic information, a special complementary intervallic encoding was developed to allow the network to establish a relationship between melodic intervals and harmonic context.

7.5.2 Subnet Input and Output
The following list gives all input and output features of the subnet. The network has three fully connected layers: (1) an input layer with 47 units, (2) a hidden layer with about 25 units, and (3) an output layer with 12 units. A corresponding musical example is displayed in the upper right-hand box of Figure 2. The output feature of the subnet is the note N to be learned at time t. The input features of the subnet are these:
· the motif class MT determined by the supernet
· the harmonic field HT determined by HARMONET
· the next reference note mT+1
· one preceding melody note Nt-1
· the position pt within the motif.
The supernet learns a sequence of motifs which are given as abstract classes developed during motivic classification. The most important element influencing the choice of a motivic class is the interval between the current and next pitches. To produce motivic sequences that are also coherent over larger time frames, information about the position relative to the beginning or end of a musical phrase (2-4 measures) is added. The motivic classes are represented in a simple 1-of-n encoding form (n is the number of classes).

7.5.3 Supernet Input and Output
The following list summarizes the input and output features of the supernet. The network has a 61-35-12 topology for n = 12. A corresponding musical example is displayed in the lower right-hand box of Figure 2.
Page 151
The output of the supernet is the motivic class M to be learned at time T. The input features of the supernet are:
· a melodic context given by one preceding and two following notes mT-1, mT+1, and mT+2
· one preceding motivic class MT-1
· phrasing information phrT
· information about up- and downbeats ZT within a measure.
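Both class features use the 1-of-n form mentioned above; a one-line Python sketch (the function name is mine) makes the encoding concrete. How the 61 input units divide among the listed features is not specified in the article.

    def one_of_n(index, n=12):
        # 1-of-n encoding of a motivic class: a single 1 in a vector of n bits.
        v = [0] * n
        v[index] = 1
        return v

    print(one_of_n(3))   # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]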
7.6 Intervallic Representation
In general one can distinguish two groups of motifs: melodic motifs prefer small intervals, mainly primes and seconds; harmonic motifs prefer leaps and harmonizing notes (chord notes). Both motif groups rely heavily on harmonic information. In melodic motifs dissonances should be correctly resolved; in harmonic motifs notes must fit the given harmony. Small deviations may have a strong effect on the quality of musical results. Thus our idea was to integrate musical knowledge about intervallic and harmonic relationships into an appropriate intervallic representation. Each note was represented by its intervallic relationship to the first note of the motif, the so-called reference note. This is an important element contributing to the success of our system. We have developed and tested various intervallic encodings. The initial encoding of intervals took account of two important relationships:
· neighboring intervals were realized by overlapping bits
· octave invariance was represented using an octave bit.
The activation of the overlapping bit was reduced from 1 to 0.5 in order to allow a better distinction of the intervals. This encoding was then extended to capture harmonic properties as well. The idea was to represent in a similar way ascending and descending intervals leading to the same note. This was achieved by using the complementary intervallic encoding shown in Table 1.
Page 152
It used three bits to distinguish the direction of the interval, one octave bit, and seven bits for the size of the interval.

Table 1. Complementary intervallic encoding allowing the numeral 1 to represent letter-name changes.

DIRECTION   OCTAVE   INTERVAL SIZE           INTERVAL
1 0 0       1        0   0   0   0   0   0.5 1     ninth down
1 0 0       1        1   0   0   0   0   0   0.5   octave down
1 0 0       0        0.5 1   0   0   0   0   0     seventh down
1 0 0       0        0   0.5 1   0   0   0   0     sixth down
1 0 0       0        0   0   0.5 1   0   0   0     fifth down
1 0 0       0        0   0   0   0.5 1   0   0     fourth down
1 0 0       0        0   0   0   0   0.5 1   0     third down
1 0 0       0        0   0   0   0   0   0.5 1     second down
0 1 0       0        1   0   0   0   0   0   0.5   prime
0 0 1       0        0.5 1   0   0   0   0   0     second up
0 0 1       0        0   0.5 1   0   0   0   0     third up
0 0 1       0        0   0   0.5 1   0   0   0     fourth up
0 0 1       0        0   0   0   0.5 1   0   0     fifth up
0 0 1       0        0   0   0   0   0.5 1   0     sixth up
0 0 1       0        0   0   0   0   0   0.5 1     seventh up
0 0 1       1        1   0   0   0   0   0   0.5   octave up
0 0 1       1        0.5 1   0   0   0   0   0     ninth up

Complementary intervals such as ascending thirds and descending sixths [beginning from the same pitch class] have similar representations because they lead to the same new note name and can therefore be regarded as harmonically equivalent. A simple rhythmic element was then introduced by a tenuto bit (not shown in Table 1) which is set when a note is tied to its predecessor. This final (3+1+7+1=) 12-bit interval encoding gave the best results in our simulations. This intervallic encoding requires an appropriate representation for harmony. Harmony can be encoded as a harmonic field, a vector of the chord notes of the diatonic scale. The tonic T in C major, for example, contains three chord notes (C, E, and G) which correspond to the first, third, and fifth degrees of the C major scale (1010100). This representation may be further extended, for we can now encode the harmonic field starting with the first motivic note instead of the first degree
Page 153
of the scale. This is equivalent to rotating the bits of the harmonic field vector. An example is displayed in Figure 4.
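Both representations can be made concrete in a short runnable sketch. The Python code below is my reconstruction from Table 1 and the rotation just described, not the authors' implementation; the function names and the signed-diatonic-step convention are assumptions.

    def encode_interval(steps, tenuto=False):
        # steps: signed diatonic interval (-1 = second down, +4 = fifth up, ...).
        if steps < 0:
            direction = [1, 0, 0]
        elif steps == 0:
            direction = [0, 1, 0]
        else:
            direction = [0, 0, 1]
        octave_bit = 1 if abs(steps) >= 7 else 0
        # The size bits encode the letter-name change, so complementary
        # intervals (e.g., third up and sixth down) receive the same size bits.
        size = steps % 7                  # Python: (-5) % 7 == 2
        size_bits = [0.0] * 7
        size_bits[size] = 1.0             # main bit
        size_bits[(size - 1) % 7] = 0.5   # overlapping bit for the neighbor
        return direction + [octave_bit] + size_bits + [1 if tenuto else 0]

    def rotate_field(field, reference_degree):
        # Rotate a 7-bit harmonic field so that bit 0 corresponds to the
        # scale degree (1-7) of the motif's reference note.
        k = reference_degree - 1
        return field[k:] + field[:k]

    print(encode_interval(2)[4:11])   # third up: size bits 0 0.5 1 0 0 0 0
    print(encode_interval(-5)[4:11])  # sixth down: the same size bits

    d7 = [0, 1, 0, 1, 1, 0, 1]        # dominant seventh in C major (G-B-D-F)
    print(rotate_field(d7, 7))        # reference note B: [1, 0, 1, 0, 1, 1, 0]

The last line reproduces the rotation worked out in Figure 4 below.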
Figure 4. The relationship between complementary intervallic encoding and a rotated harmonic field is shown. Each note is represented by its interval in relation to the first (reference) note [of the motif]. The harmonic field indicates the intervals leading to harmonizing notes (i.e., B, D, F, G for harmony D7).

The chord given to the motif is the dominant D7 [in Riemann functional notation the equivalent of V7]; the first note of the motif is B, which corresponds to the seventh degree of the C major scale. Therefore the harmonic field for harmony D7 (0101101) is rotated by one position to the right, resulting in the bit-string 1010110. Starting with the first note B, the harmonic field indicates the intervals that lead to the harmonizing notes B, D, F, and G. On the right side of Figure 4 one can see the correspondence between bits activated in the harmonic field and bits set to 1 in the three intervallic encodings. This kind of representation helps the neural network to establish directly a relationship between intervals and a given harmony.

7.7 System Performance
We carried out several simulations to evaluate the performance of the system. Many methods of improvement were suggested by listening to the improvisations generated by the system. One important problem was to find an appropriate number of classes for the given learning task. Table 2 lists the classification rates on the learning and validation sets of the supernet and the subnet using 5, 12, and 20 motif classes. The learning set was automatically built from 12 Pachelbel chorale variations, which produced 2,220 patterns for the subnet and 555 for the supernet. The validation set included six Pachelbel variations, which provided 1,396 patterns for the subnet and 349 for the supernet. The supernet
Page 154
and subnet were then trained independently with the RPROP learning algorithm ("Resilient backPROPagation"; see Riedmiller and Braun 1993), using the validation set to avoid over-fitting.

Table 2. Classification performance for the supernet and subnet.

                      supernet              subnet
# of classes       5     12     20       5     12     20
learning set      91%    87%    88%     86%    94%    96%
validation set    50%    40%    38%     79%    83%    87%
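The training procedure is described only in the paragraph above. A hypothetical re-creation (mine, not the authors') of the supernet setup, using PyTorch's Rprop optimizer and the validation set to select the stopping point, might look like this:

    import torch
    from torch import nn

    def train_supernet(x_train, y_train, x_val, y_val, epochs=500):
        # 61-35-12 feed-forward net; the sigmoid hidden layer and the
        # cross-entropy loss over 1-of-n targets are assumptions.
        model = nn.Sequential(nn.Linear(61, 35), nn.Sigmoid(), nn.Linear(35, 12))
        optimizer = torch.optim.Rprop(model.parameters())
        loss_fn = nn.CrossEntropyLoss()   # targets given as class indices
        best_val, best_state = float("inf"), None
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = loss_fn(model(x_train), y_train)
            loss.backward()
            optimizer.step()
            with torch.no_grad():
                val_loss = loss_fn(model(x_val), y_val).item()
            if val_loss < best_val:       # keep the best-validating weights
                best_val = val_loss
                best_state = {k: v.clone() for k, v in model.state_dict().items()}
        model.load_state_dict(best_state)
        return model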
7.7.1 Optimizing the Number of Classes
The classification rate of both networks depends strongly on the number of classes, especially on the validation set of the supernet. The smaller the number of classes, the better the classification of the supernet, because there are fewer alternatives to choose from. The subnet shows the opposite behavior: the larger the number of classes, the more easily it can determine concrete motif notes for a given motif class. One can imagine that the optimal number of classes lies somewhere in the middle (about 12 classes). This was then confirmed by comparing the results produced by different network versions.
Page 155
Figure 6. Melodic variation on "Ah! Vous dirai-je, Maman" composed by the HARMONET neural-network system. The original melody is shown above.
Page 156
The accompanying examples are composed by the neural-network system with twelve motivic classes. Figure 5 shows an extract of a Pachelbel-style harmonization and chorale variation in the tenor voice based on the melody "O Welt, ich muß dich lassen," which did not belong to the learning or validation set. We have also tested our neural organist on melodies that do not belong to the Baroque era. Figure 6 shows a baroque-style harmonization and variation on the melody "Ah! Vous dirai-je, Maman," used by Mozart in his famous piano variations. The result clearly exhibits global structure and is well bound to the harmonic context.

7.8 Conclusions
Preliminary results confirm that the system is able to reproduce style-specific elements of melodic variation. In future work we may explore the question of whether the global coherence of the musical results may be further improved by adding a third neural network working at a still higher level of abstraction, e.g., at the phrase level. We believe that our overall approach represents an important step towards the learning of complete melodies. More information about our research project (goals, demos) is offered on the WWW page: http://illwww.ira.uka.de/~dominik/neuro_music.html

References
Conklin, Darrell, and Ian H. Witten, contribution on "Predictive Theories" in "Software for Theory and Analysis," Directory of Computer-Assisted Research in Musicology 6 (1990), 122.
Ebcioglu, Kemal, "An Expert System for Harmonization of Chorales in the Style of J. S. Bach." Ph.D. thesis, State University of New York at Buffalo (Technical Report #86-09), 1986.
Ebcioglu, Kemal, "An Expert System for Harmonizing Chorales in the Style of J. S. Bach" in Understanding Music with AI: Perspectives on Music Cognition, ed.
Page 157
Mira Balaban, Kemal Ebcioglu, and Otto Laske (Cambridge: AAAI Press/MIT Press, 1992), pp. 294-334.
Feulner, J., and Dominik Hörnel, "MELONET: Neural Networks that Learn Harmony-Based Melodic Variations," Proceedings of the 1994 International Computer Music Conference (Aarhus, Denmark: International Computer Music Association, 1994), 121-124.
Hild, H., J. Feulner, and W. Menzel, "HARMONET: A Neural Net for Harmonizing Chorales in the Style of J. S. Bach," Advances in Neural Information Processing 4 (1991), ed. R. P. Lippmann, J. E. Moody, D. S. Touretzky, 267-274.
Hörnel, Dominik, and T. Ragg, "A Connectionist Model for the Evolution of Styles of Harmonization" in Proceedings of the 1996 International Conference on Music Perception and Cognition (Montreal, Canada: Society for Music Perception and Cognition, 1996a), 213-218.
Hörnel, Dominik, and T. Ragg, "Learning Musical Structure and Style by Recognition, Prediction and Evolution" in Proceedings of the 1996 International Computer Music Conference (Hong Kong: International Computer Music Association, 1996b), 59-62.
Kohonen, T., "The Self-Organizing Map," Proceedings of the IEEE 78/9 (1990), 1464-1480.
Mozer, M. C., "Neural Network Music Composition by Prediction," Connection Science 6/2-3 (1994), 247-280.
Riedmiller, M., and H. Braun, "A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm," Proceedings of the 1993 IEEE International Conference on Neural Networks (San Francisco, 1993), ed. H. Ruspini, 586-591.
Toiviainen, P., "Modeling the Target-Note Technique of Bebop-Style Jazz Improvisation: An Artificial Neural Network Approach," Music Perception 12/4 (1995), 399-413.
Witten, Ian H., Leonard C. Manzara, and Darrell Conklin, "Comparing Human and Computational Models of Music Prediction," Computer Music Journal 18/1 (1994), 70-80. [Calgary, 1990]
Page 159
8 Melodic Pattern-Detection Using MuSearch in Schubert's Die schöne Müllerin
Nigel Nettheim
204A Beecroft Road
Cheltenham, NSW 2119, Australia
[email protected]
Abstract The MuSearch program described here was written in 1989 in C. It can be used with the commercial music-printing program SCORE, whose data it processes, to pursue analytical questions related to pitch, rhythm, meter, and text underlay. In this article I show its application to a repertory of Schubert Lieder.
Page 160
Melodic personality. Pitch alone is rather limited for this purpose: rhythm and meter should preferably also be taken into account. Song texts may provide additional evidence of the significance of the musical patterns. The four variables just mentioned (pitch, rhythm, meter, text) are basic to many investigations in analytical musicology. They are handled by the custom computer program to be described here, as are less basic elements such as dynamics and slurs. Other variables could of course be relevant, and can be handled by fairly straightforward extensions of the method to be described. For instance, although there is apparently no fully successful computer algorithm for the harmonization of music of the period of common practice, the conventional symbols (I, V, ii6, etc.) may be determined by a human, entered as an extra line of "text" underlay, and thus handled similarly.

8.1 The MuSearch Program
I began writing the program MuSearch in 1989 in the "C" language for DOS on IBM PCs, after acquiring the commercial music-printing program SCORE, whose data it processes. The starting point for the program design was the requirement that the musical data constituting the input should be viewed, if desired, as conventional music-score images (not just as alphanumeric codes or as a schematic representation), and that the output should also be viewable as music scores, possibly annotated by indicating on the scores the occurrence of specified patterns; the output could also include statistical tables or other lists or reports. That starting point resulted from my desire to have ready access at any time to the score images of the music being studied. Between the input and output lies the searching (or analytical) part of the program. Thus in the trivial case where no searching is performed, the input scores will pass straight through to be viewed again unchanged as the output scores. This starting point has not always been adopted by others, which is one reason for the difference in character between MuSearch and, for example, Humdrum, whose design did not emphasize music-score images.
Page 161
8.1.1 Input
The input data consists of plain ASCII text files in SCORE's macro input language.[1] They embody the musical data in a form understandable by a human, by contrast with more elaborate parameter or other files representing the graphical image. The macro files may indicate not only staves, note-names, rests, durations, text underlay, dynamics, etc., but also editing procedures, such as the adjustment of slurs, to be applied in putting together the graphical image. These features of the macro files, as well as the fact that they may contain comments, make them easy to use.[2] One file may contain a number of staves, typically covering one song, and each staff normally contains one phrase, as determined by the encoder. MuSearch first reads a master file containing a list of the desired data files for a particular run of the program. The data to be discussed here (as a small sample) consists of the 20 songs from Schubert's cycle Die schöne Müllerin, D. 795 (1823); see Figures 1 and 2.
Figure 1. Input phrase from Schubert's "Der Neugierige," D. 795/6.

8.1.2 Search Routines
Search routines in programs for public use normally require the user to specify the target by using a pre-set mini-language. Instead, I simply program in the "C" language a function for whatever search criteria I may require, and then recompile the program. This has the advantage of complete freedom from restriction, but it means that, in its present form, my program is not suited to public release.[3] The program forms an internal representation

[1] Data not initially in the required format can often be converted to it. For example, I have written a program EsSCORE (freely available) to convert the well-known Essen database files to SCORE macro files.
[2] Comments begin with a "!", an undocumented feature of the SCORE program.
[3] Yet a comparable situation exists in the Microsoft Word word-processing program, which currently allows programming within it in the Visual Basic language. Further, I might have prepared my program for public release but for a problem of establishing communication with the SCORE personnel. SCORE is apparently the only commercial music-notation program allowing ASCII macro input files, and to that extent made the present approach possible; if a future release also allows nested macros, its scope for automatic manipulation of musical scores for musicological purposes will be greatly increased.
Page 162
Figure 2. Input data for the phrase in Figure 1. The syntax is that of the SCORE program; comment lines begin with "!".

of the music, upon which the desired search function acts. The function itself is conceptually simple: proceeding through the data, if the data match the search criteria, prepare the excerpt for output; but its implementation in program code requires considerable detail.

8.1.3 Output
The output files are again text macro files, as were the input files. The user may choose the scope of the output excerpts: either the whole phrase in which the target occurred, or just the motif, defined here to include, along with the target, the immediately preceding and following note and barline, if any. For phrases, the program supplies macro commands in the output file
Page 163
to draw a box around portions of the music found in a search: a vital addition.[4] If the user is very familiar with the data, then motifs may suffice to call to mind the indicated excerpt; otherwise whole phrases may be preferred. The resulting annotated scores may be viewed on the computer screen or printed. Naturally, further editing of those files, or recombination of them, may then be carried out by the user. However, it should be noted that the ideal for this kind of musicological purpose, where many different runs of the program may be made in a given study, is that formatting and other control of the graphical musical image should as far as possible be performed automatically by the program via the macro language. Artistic touching-up, when needed on the basis of visual inspection, is then left for the final stage. The user is also informed of the number of phrases searched and the number of matches found.

8.2 Musical Examples
Any pattern expressible in terms of the data elements may be the target of a search, but here just two illustrations will be sketched: (a) a simple text search, and (b) searching for a text/melody pattern reflecting pitch and meter. Song texts are involved in both illustrations. In the case of strophic songs, the human musician should first determine which verse of the poetic text has primarily been set to music by the composer. Usually this is the first verse, the same music fitting the remaining verses slightly less perfectly because of the varying sense and/or accentuation. Accordingly, only the first verse has generally been used in the present study. Exceptions can occur, however: in the last song, "Des Baches Wiegenlied," it appears that Schubert has set primarily the last verse, which of course ends the cycle, and accordingly only the text of that verse has been used.[5]
[4] Technically, the automatic placement of the box is facilitated by drawing it as a pair of enlarged "first-time endings" without numerals, the lower one inverted.
[5] At other times, one might even wish to study just the secondary verses of strophic songs, in order to look into the degree of inappropriateness of the music to their texts which the composer tolerated.
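The search loop of §8.1.2 can be suggested in outline. MuSearch itself is a C program operating on SCORE data and is not reproduced here; the following Python sketch, with invented data structures, only illustrates the conceptually simple flow described above.

    def run_search(phrases, matches, make_excerpt):
        # phrases: internal representations of the encoded phrases;
        # matches: a predicate standing in for a compiled search function;
        # make_excerpt: formats a matching phrase for the output macro file.
        output, found = [], 0
        for phrase in phrases:
            if matches(phrase):
                output.append(make_excerpt(phrase))
                found += 1
        # The user is informed of the totals, as described above.
        print(f"{len(phrases)} phrases searched, {found} matches found.")
        return output

    # A criterion corresponding to the text search of Section 8.2.1:
    def contains_lieb(phrase):
        return any("lieb" in syllable.lower() for syllable in phrase["text"])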
Page 164
8.2.1 A Simple Text Search
For a first illustration a search was made for all cases where Schubert set a word containing the string "Lieb" or "lieb" (love). This includes all the German words derived from "lieb": Liebe, lieben, Geliebte, Liebchen (love, to love, beloved, diminutive), etc. In a study of this kind one must be aware of the varied contexts and nuances of meaning for the target words: thus a word may have its most straightforward positive sense, but may on other occasions be used negatively ("Du liebst mich nicht," D. 756) or ironically ("Die Liebe hat gelogen," D. 751). The search function for this illustration is in principle simple: if the given text string is present, append a suitable excerpt including it to the output file. The 22 matches found are shown in Figure 3 (motifs), and four of them in Figure 4 (phrases). The output motifs are positioned by the program in columns in order to include as many as possible on each page, minimizing page-turns in the comparative study of large collections of excerpts. Some remaining touch-up editing has deliberately been left undone, so that the reader may see how little is not handled by the automatic score-generating procedure. The present example does not constitute a project, but just an illustration of the modus operandi. The motifs found (imagining one had a larger database) would next be processed by the human musicologist. They could be sorted according to (1) the more casual uses of the given text "lieb" and (2) the more intensely expressive ones, here especially D. 795/19, bar 68. The relative lengths of the notes for "lieb", the extent of melisma, the metrical location, and ideally also the harmony, would next be taken into account. Such a study of the settings of various significant words could lead to quite instructive conclusions concerning the compositional approach taken by the given composer, and to the ways in which musical melody may match speech inflections. Systematic studies of the relationship between music and text/speech in the period of common practice are surprisingly uncommon; see Fecker (1984), who collected many examples without computer assistance.

8.2.2 A Text/Melody Search
For a second example I searched for cases similar to the setting of the words "mein Bächlein" in "Der Neugierige," D. 795/6, bar 17, expressed as in Figure 5 (the appropriate formulation would of course be dictated by the user's idea of what might prove fruitful for a particular purpose).
Page 165
Figure 3. Output motifs from a search for the text "lieb".

The resulting four matches are shown in Figure 6. Motives of this kind appear to express admiration for someone or something that has endeared itself to the speaker; comparison with the inflection of the speaking voice in such circumstances is indicated. Naturally this idea needs to be worked out thoroughly with a much larger database (cf. Fecker, 1984).
Page 166
Figure 4. Output phrases from a search for the text "lieb".

8.2.3 An Application to Pulse Analysis
A very different application of MuSearch was made in Nettheim (1993) to the Essen database of German folksong, producing tables of melodic/rhythmic progressions at the barlines for the study of the musical "pulse". That paper, though not discussing details of the computer program, gave results from a complete project. The same approach could well be taken to the music of Schubert and others, depending on the availability of suitable large databases.

8.3 Conclusions
The melodic searching available with MuSearch is virtually unlimited in its specifications. However, the specifications used so far do not constitute sophisticated analysis, for which considerable complexity would be needed. As with the task of natural-language translation, it is not reasonable to expect a computer program to carry out the whole of a worthwhile humanistic or artistic task; its role is instead that of an assistant. The assistance provided includes the following:
(1) once the data has been entered, the time-consuming consulting of the volumes of printed scores is not needed again;
Page 167
Figure 5. Search function skeleton for a pitch/meter/text search.
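The skeleton of Figure 5 itself does not survive in this text-only rendering. Purely as an invented illustration of what a pitch/meter/text criterion might look like (every field name here is hypothetical, and the actual conditions in Figure 5 may differ), one could write:

    def bachlein_criterion(phrase):
        # Invented example: a syllable of "Bächlein" falling on the downbeat
        # (beat 1) and approached by an ascending interval.
        notes = phrase["notes"]
        for prev, note in zip(notes, notes[1:]):
            if (note["syllable"].startswith("Bäch")
                    and note["beat"] == 1
                    and note["pitch"] > prev["pitch"]):
                return True
        return False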
Page 168
(2) all cases searched for will be found without human error, once the program is sound;
(3) the search criteria can be modified and the search re-run far more easily than in work carried out without computer assistance; this is important because often the best criteria for a given purpose can be found only after considerable experimentation;
(4) output can be manipulated or reordered for the next stage of the project with little further effort.

Thus, once the present shortage of large musical databases is overcome, the assistance obtainable from computer searching with graphical input and output can be expected to be substantial, enabling worthwhile research which would otherwise scarcely have been feasible.

References
Fecker, Adolf. Sprache und Musik: Phänomenologie der Deklamation in Oper und Lied des 19. Jahrhunderts. Hamburg: Karl Dieter Wagner, 1984.
Nettheim, Nigel, "The Pulse in German Folksong: A Statistical Investigation," Musikometrika 5 (1993), 69-89. [The proof corrections were not implemented by the editor; a corrected copy is obtainable from the author.]
Page 169
9 Rhythmic Elements of Melodic Process in Nagauta Shamisen Music
Masato Yako
Kyushu Institute of Design
Acoustic Design Division
4-9-1 Shiobaru, Minami-ku
Fukuoka, 815 Japan
[email protected] Abstract The three-string Japanese shamisen was used in many venues over the preceding three centuries. Techniques for plucking and methods of composing were taught by example and oral tradition. Preliminary research aimed at creating a written catalogue of melodic types reveals that some rhythmic patterns are fixed while others are flexible, that reverse rhythmic patterns may overlap one another, and that some rhythmic patterns can be correlated with particular intervallic profiles.
COMPUTING IN MUSICOLOGY 11 (1997-98), 169-184. Much of the content here is an abridged translation from Musical Acoustics Research Association Material 11/6 (1992), with additions from other writings by the author. All material is used by permission.
Page 170
The shamisen is a three-string lute in use since the seventeenth century. It is often used to accompany singing, particularly of nagauta, or "long songs," performed in a formal manner. The instrument has been associated with a broad variety of social contexts including forerunners of soap opera, puppet theatre, and Buddhist narrative. In the seventeenth and eighteenth centuries the shamisen was most often used in kabuki theatre, an important venue for nagauta. In the nineteenth century, nagauta became independent of kabuki and its composition gained prestige as an independent art.
Figure 1. Japanese shamisen. The shamisen's three possible tunings are all based on the "tenth pitch" (B3) of the Japanese traditional system. Until the twentieth century vocal syllables were used to represent finger positions in shamisen notation. A wide variety of plucking techniques has evolved over time, simulating the effect of varied timbres. Nagauta shamisen melodies do not have much fluctuation in tempo. However, they may be full of latent energy. Some sections of a work may contain elaborate melodic activity, while others may be placid. Preliminary research aimed at cataloguing melodic types suggests that (1) overlapping rhythms can exist; (2) some rhythmic patterns have inflexible configurations while others seem to occur fortuitously; and
(3) some rhythmic patterns can be correlated with particular intervallic movements.
Page 171
9.1 Shamisen Music of the Edo Era
In Japanese music of the Edo era (1603-1868), there are many examples of musical compositions structured such that melodically and rhythmically stereotyped configurations are combined and linked continuously. The melodic patterns which can be taken as samples of these stereotyped configurations are not specific to particular musical compositions; they are common to genres and can be found even transcending genre. In shamisen music, too, much melodic material has been drawn from the group of melodic patterns peculiar to nagauta, the genre closest to the common people, though much has also been borrowed from the melodic patterns of other genres. In either case, melodic patterns generally accumulate predominantly within genres, and which melodic patterns are singled out and how they are combined is what forms the nature of a musical composition. Consequently, a broad range of knowledge regarding melodic patterns is required for the understanding and appreciation of shamisen music. Most melodic patterns do not have names. This absence reflects the fact that neither composers nor performers are conscious of melodic patterns. Since there exists no comprehensive literature covering all the melodic patterns which are latent in the genres, the scope of research regarding melodic patterns up to now has been limited. The cataloguing of melodic types could therefore be an invaluable aid to the understanding of nagauta and shamisen music.

9.2 Towards a Catalogue of Melodic Types
9.2.1 Selection of Works
In this study, I have chosen ten nagauta compositions by way of provisional research into the creation of a catalogue of shamisen melodic types. Pieces from the Kojuro scores (first published in 1918) were used as a basis for the analysis. The titles are these:
(1) Musume Dojoji
(2) Oimatsu
(3) Kokaji
(4) Tsurukame
(5) Akenokane
(6) Yoshiwara Suzume
(7) Yoiwamachi
(8) Tokimune
(9) Suehirogari
(10) Hanami Odori
Page 172
I have sampled and classified the shamisen rhythmic patterns within these ten compositions while at the same time comparing pitch movement in the sample compositions. Direct sampling of melodic patterns is intended as a focus of future research. One advantage of performing musical analysis with computer assistance is that sampling of patterns used unconsciously by human beings becomes possible. The patterns of which players, composers, and listeners are conscious are only a small percentage of those that appear to exist in this repertory. Human interpretation is necessary to make final judgments, however.

9.2.2 Criteria for Judgment
Rhythmic patterns and pitch are interrelated. Therefore, rhythmically configured parts also have melodic configurations and they must form phrases. Thus the sampling of rhythmic patterns requires that one determine whether there are melodic configurations present. In judging the segmentation of phrases, the points listed below need to be considered:
(1) Is the sound directed toward the beginning and end sounds? [Koizumi postulates that a melody progressing with regularity becomes very irregular as it approaches the break in a phrase.] If the structural sounds of shamisen melody are listed in sequence from bottom to top, they will be as shown below (octave relationships are omitted):
Page 173
The core sounds of the sound structure are taken as chi and kyu, which are enharmonic readings in the tonic and dominant respectively. Koizumi maintained that nagauta was core-note dominated, in contrast to interval-dominated Western music. In his view a core note has its own "gravitational sphere." The core sound structure is held together throughout by a perfect fifth. However, there are many instances in which the final sound of each phrase making up the piece overlaps with the core sounds (kyu and chi).
(2) Is the shape of the rhythmic pattern suggested by the same sound being drawn out or reiterated?
(3) Does the shamisen enter during a phrase break? This complicates the identification and classification of patterns.
(4) Is a measure in duple meter, or one filled by a rest, to be taken as a phrase break?
(5) Is there disjunct motion in the sample?
(6) Are there instances of sukui (plucking upwards from underneath the string), hajiki (stopping the string without plucking), or the one-string open sound?
The foregoing questions were first studied in the context of four-measure units. When the patterns were volatile in four measures, they were reexamined in two-measure units.

9.2.3 Phrases and Hierarchy
The beat structure of traditional Japanese music is basically constructed in a hierarchical fashion. According to Koizumi (1984), beat units called before-beat and after-beat form one measure,
Page 174
and then preceding and following measures create a two-measure group (motif). Similarly the preceding and following motives create a four-measure phrase. Two such units may be further combined to form a before-stage and an after-stage. At the juncture there often seems to be a lost beat or a redundant beat. Schematically, the arrangement is shown in Figure 3.
Figure 3. Hierarchical structure of phrases.

In a long melody with a succession of phrases, the tension-producing last note of a phrase appears at a point where the melody shifts to a phrase of a different length or where a different type of phrase is inserted.

9.3 Classification of Rhythmic Patterns
Seven hundred kinds of rhythmic patterns were identified in the total of 5,878 measures of the ten pieces. They were identified in the ways described below.

9.3.1 Classification by Component Configuration
Rhythmic patterns were grouped into 39 classifications in accordance with their composition (Figures 4a-c).

9.3.2 Classification According to Common Properties
Rhythmic patterns were investigated for the common properties of their components, and cases in which overlap, connotation, and crossing occur were retrieved (Figure 5).
Page 175
Figures 4a-c. Classification of (a) four-measure, (b) two-measure, and (on the following page) (c) three-measure patterns. A1 shows the same rhythmic component being repeated four times. New rhythmic components are indicated by the letters b, c, and d.
Page 176
Figure 4c.
Figure 5. These arrangements of components, when taken alone, are volatile; thus the patterns were classified according to their components' similarities.

9.3.3 Study of the Melodic Process
In many passages there was rhythmic overlap, but much room for selection was left in the pitch movement inside these overlapping patterns. Here I compared individual rhythmic patterns in each composition and determined whether any rule-like characteristics or particular tendencies could be found.
Page 177
Page 178