5: Music Transcription in the Studio

Sebastian Ewert, Queen Mary University of London

Given a musical audio recording, the goal of music transcription is to determine a score-like representation of the piece underlying the recording. The resulting precise description of which notes are played is a fundamental component of any semantic analysis of music, because high-level musical concepts such as harmony and rhythm are defined on top of pitch and event information.

By making high-level information such as harmony and rhythm available, transcription has become a cornerstone of intelligent music retrieval and processing systems. However, despite significant interest, music transcription remains an unsolved problem even after several decades of research. One of the main difficulties originates from instruments such as the piano, where dozens of notes can sound at the same time, making the analysis of such recordings highly challenging.

The FAST project, with its focus on the music production process, opens up new possibilities for approaching the music transcription problem. In particular, the aim of this demonstrator is to drastically improve the accuracy of transcriptions by exploiting additional sources of information available in controlled recording conditions and integrating this information into novel types of signal models and transcription methods. To this end, we employ, fuse and develop novel approaches to

  • the integration of information into signal models using tailored regularizers
  • highly robust parameter estimation in non-convex problems using methods inspired by simulated annealing
  • the extension of state-of-the-art proximal optimization methods for non-differentiable objective functions using novel sub-problem derivations
  • the identification of certain projections as dynamic programming problems and their integration as proximal operators and regularizers
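
To make the roles of a regularizer and a proximal operator more concrete, the following is a minimal sketch of a textbook proximal gradient scheme. It is not the method of the publication listed below; it only illustrates the general idea under simplifying assumptions. It assumes a fixed matrix of note templates W (one column per note) and a single magnitude spectrum frame v, and estimates non-negative, sparse note activations h by minimizing 0.5 * ||v - W h||^2 + lam * ||h||_1 subject to h >= 0. All names (estimate_activations, prox_nonneg_l1, templates, frame, lam) and the toy data are hypothetical and chosen for illustration only.

```python
"""Sparse, non-negative note activations for one spectrum frame (illustrative sketch)."""


def matvec(W, h):
    """Return W @ h for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def rmatvec(W, r):
    """Return W.T @ r for a matrix W given as a list of rows."""
    return [sum(W[i][j] * r[i] for i in range(len(W))) for j in range(len(W[0]))]


def prox_nonneg_l1(x, threshold):
    """Proximal operator of threshold * ||x||_1 combined with the constraint x >= 0.

    This is a one-sided soft threshold: shrink every entry, then clip at zero.
    """
    return [max(0.0, xi - threshold) for xi in x]


def objective(W, v, h, lam):
    """Value of 0.5 * ||W h - v||^2 + lam * ||h||_1."""
    residual = [a - b for a, b in zip(matvec(W, h), v)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(x) for x in h)


def estimate_activations(W, v, lam=0.1, n_iter=500):
    """Proximal gradient descent for the regularized least-squares problem."""
    # The squared Frobenius norm of W bounds the Lipschitz constant of the
    # gradient of the smooth term, so 1 / L is a safe (conservative) step size.
    lipschitz = sum(w * w for row in W for w in row)
    step = 1.0 / lipschitz
    h = [0.0] * len(W[0])
    for _ in range(n_iter):
        residual = [a - b for a, b in zip(matvec(W, h), v)]
        grad = rmatvec(W, residual)  # gradient of the smooth data term
        gradient_step = [hj - step * gj for hj, gj in zip(h, grad)]
        h = prox_nonneg_l1(gradient_step, step * lam)  # handle the non-smooth part
    return h


# Toy data (assumption): 4 frequency bins, 3 note templates, notes 0 and 2 active.
templates = [
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.5, 1.0],
    [0.0, 0.0, 0.5],
]
frame = matvec(templates, [1.0, 0.0, 2.0])
activations = estimate_activations(templates, frame, lam=0.01)
print([round(a, 3) for a in activations])
```

The non-differentiable part of the objective (the sparsity penalty together with the non-negativity constraint) never needs a gradient here; it enters only through prox_nonneg_l1. This separation is what allows more structured regularizers, such as those solved by dynamic programming, to be plugged into the same iteration in place of the simple threshold.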

Sebastian Ewert, Mark D. Plumbley, and Mark Sandler. A Dynamic Programming Variant of Non-Negative Matrix Deconvolution for the Transcription of Struck String Instruments. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, Australia, 2015.