Music alignment



Music can be described and represented in many different ways including sheet music, symbolic representations, and audio recordings. For each of these representations, there may exist different versions that correspond to the same musical work. The general goal of music alignment (sometimes also referred to as music synchronization) is to automatically link the various data streams, thus interrelating the multiple information sets related to a given musical work. More precisely, music alignment is taken to mean a procedure which, for a given position in one representation of a piece of music, determines the corresponding position within another representation. In the figure on the right, such an alignment is visualized by the red bidirectional arrows. Such synchronization results form the basis for novel interfaces that allow users to access, search, and browse musical content in a convenient way.

Basic procedure


Given two different music representations, typical music alignment approaches proceed in two steps. In the first step, the two representations are transformed into sequences of suitable features. In general, such feature representations need to find a compromise between two conflicting goals. On the one hand, features should show a large degree of robustness to variations that are to be left unconsidered for the task at hand. On the other hand, features should capture enough characteristic information to accomplish the given task. For music alignment, one often uses chroma-based features (also called chromagrams or pitch class profiles), which capture harmonic and melodic characteristics of music, while being robust to changes in timbre and instrumentation, are being used.

In the second step, the derived feature sequences have to be brought into (temporal) correspondence. To this end, techniques related to dynamic time warping (DTW) or hidden Markov models (HMMs) are used to compute an optimal alignment between two given feature sequences.

Related tasks
Music alignment and related synchronization tasks have been studied extensively within the field of music information retrieval. In the following, we give some pointers to related tasks. Depending upon the respective types of music representations, one can distinguish between various synchronization scenarios. For example, audio alignment refers to the task of temporally aligning two different audio recordings of a piece of music. Similarly, the goal of score–audio alignment is to coordinate note events given in the score representation with audio data. In the offline scenario, the two data streams to be aligned are known prior to the actual alignment. In this case, one can use global optimization procedures such as dynamic time warping (DTW) to find an optimal alignment. In general, it is harder to deal with scenarios where the data streams are to be processed online. One prominent online scenario is known as score following, where a musician is performing a piece according to a given musical score. The goal is then to identify the currently played musical events depicted in the score with high accuracy and low latency. In this scenario, the score is known as a whole in advance, but the performance is known only up to the current point in time. In this context, alignment techniques such as hidden Markov models or particle filters have been employed, where the current score position and tempo are modeled in a statistical sense. As opposed to classical DTW, such an online synchronization procedure inherently has a running time that is linear in the duration of the performed version. However, as a main disadvantage, an online strategy is very sensitive to local tempo variations and deviations from the score - once the procedure is out of sync, it is very hard to recover and return to the right track. A further online synchronization problem is known as automatic accompaniment. Having a solo part played by a musician, the task of the computer is to accompany the musician according to a given score by adjusting the tempo and other parameters in real time. Such systems were already proposed some decades ago.