
Mixing for mundane everyday contexts

by Sean McGrath (The University of Nottingham)

Recent endeavours at the Mixed Reality Lab at Nottingham have explored the use of mixing in consumer audio. As technology becomes more pervasive, consumers now have substantially more options in their choice of listening experiences. We aim to explore the role of volume in multi-track audio and to ascertain to what degree control over audio features can benefit users in their everyday listening experiences.


The research is driven by the following questions:

  • Can non-optimal audio be mixed in an optimal way? How do users reason about this?
  • Do these additional controls afford additional utility and if so, how might they improve the user experience?
  • How does context define the level of control, access and utility in this scenario?

The work utilises a technology probe developed by Dr Sean McGrath at the MRL. The probe is a simple web-based application that pre-loads a series of multitrack components and randomises the order in which they are played. This enables the representation of both optimised and non-optimised mixes, where repetition of tracks is possible. For instance, a listener may be presented with four distinct instrument tracks, four identical drum tracks or some permutation in between. The interface offers no way of knowing which mix is present, other than adjusting the volume controls to try to ‘discover’ the configuration in place.
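
For readers curious how such a probe might randomise its mixes, the sketch below is a minimal Python illustration of the idea only – the actual probe is a web application, and the stem names, slot count and function here are placeholder assumptions rather than its real code.

```python
import random

# Hypothetical pool of pre-loaded multitrack stems (file names are placeholders).
STEM_POOL = ["drums.wav", "bass.wav", "guitar.wav", "vocals.wav"]
NUM_SLOTS = 4  # number of anonymous faders presented to the listener


def build_mix(allow_repetition=True):
    """Assign a stem to each slot.

    With repetition allowed, the listener might face anything from four
    distinct instruments to four copies of the same drum track; without
    it, the slots are simply a shuffled set of distinct stems.
    """
    if allow_repetition:
        return [random.choice(STEM_POOL) for _ in range(NUM_SLOTS)]
    return random.sample(STEM_POOL, NUM_SLOTS)


if __name__ == "__main__":
    # The interface exposes only unlabelled volume sliders, so the listener
    # has to adjust levels to 'discover' this hidden assignment.
    print(build_mix())
```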


Figure 1 – The interface controls: a current volume display, a master volume control and individual track volume controls.

The tool has been placed in a variety of different settings, at home and at work. Here we aim to mimic everyday listening habits, relying on familiar hardware and environments, with the tool as the centrepiece. The hybrid of a new tool embedded in existing setups presents context-driven problems and solutions. The focus is on adding utility and control, and on better understanding how we can build malleable music listening experiences for future audiences and contexts.

User feedback has placed the tool into a number of useful contexts, pointing to places where the implementation of such controls could be beneficial. There has also been feedback about how the tool could be adapted for different contexts, such as in the car or in a shared space. The work has also encompassed the social element of music listening, looking at how groups engage with the tool in socially rich settings and exploring its use in mitigating factors such as control, access and expression. The work is currently being written up as a paper.

FAST in conversation with Meinard Müller, International Audio Laboratories Erlangen

In August 2016, FAST interviewed Professor Meinard Müller, one of the partners on the FAST IMPACt project.

Meinard Müller, International Audio Laboratories Erlangen

1. Could you please introduce yourself?

After studying mathematics and theoretical computer science at the University of Bonn, I moved to more applied research areas such as multimedia retrieval and signal processing. In particular, I have worked in audio processing, music information retrieval, and human motion analysis. These areas allow me to combine technical aspects from computer science and engineering with music – a beautiful and interdisciplinary domain. Since 2012, I have held a professorship at the International Audio Laboratories Erlangen (AudioLabs), which is a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer Institute for Integrated Circuits IIS. At the AudioLabs, I chair the group on Semantic Audio Processing, with a focus on music processing and music information retrieval.

2. What is your role/work within the project?

Within the FAST project, I see my main role as bringing advice and skills from music information retrieval and expertise in semantic audio processing. For example, in collaboration with researchers from the Centre for Digital Music (C4DM), we have developed methods for music content analysis (e.g. structure analysis, source separation, vibrato detection). We have also been working on the development of automated procedures for linking metadata (such as symbolic score or note information) and audio content. As we are conducting joint research, another important role is to support the exchange of young researchers between the project partners. We have had such exchanges over the last few years, sending students and post-docs from the AudioLabs to the Centre for Digital Music, and vice versa.

3. What, in your opinion, makes this research project different to other research projects in the same discipline?

One main goal of the FAST project is to consider the entire chain from music production to music consumption. For example, to support automated methods for content analysis, one may exploit intermediate audio sources such as multitrack recordings or additional metadata that is generated in the production cycle. This often leads to substantial improvements in the results achievable by automated analysis methods. Considering the entire music processing pipeline makes the FAST project very special within our discipline.

4. What are the research questions you find most inspiring within your area of study / field of work?

Personally, I am very interested in processing audio and music signals with regard to semantically or musically relevant patterns. Such patterns may relate to the rhythm, the tempo, or the beat of music. Or one may aim at finding and understanding harmonic or melodic patterns, certain motives, themes, or loops. Other patterns may relate to a certain timbre, instrumentation, or playing style (involving, for example, vibrato or certain ornaments). Obviously, music is extremely versatile and rich. As a result, musical objects (for example, two music recordings), although similar from a structural or semantic point of view, may reveal significant differences. Understanding these differences and identifying semantic relations despite these differences by means of automated methods are what I find inspiring and challenging research issues. These issues can be studied within music processing, but their relevance goes far beyond the music scenario.

5. What can academic research in music bring to society?

Music is a vital part of nearly every person’s life on this planet. Furthermore, musical creations and performances are amongst the most complex cultural artifacts we have as a society. Academic research in music can help us to preserve and to make our musical heritage more accessible. In particular, due to the digital revolution in music distribution and storage, we need the help of automated methods to manage, browse, and understand musical content in all its different facets.

6. Please tell me why you find it valuable / exciting / inspiring to do academic research related to music.

As mentioned before, music is an outstanding example for studying general principles that apply to a wide range of multimedia data. First, music is a content type with many different representations, including audio recordings, symbolic scores, video material as provided by YouTube, and vast amounts of music-related metadata. Furthermore, music is rich in content and form, comprising different genres and styles – from simple, unaccompanied folk songs, to popular and jazz music, to symphonies for full orchestras. There are many different musical aspects to be considered, such as rhythm, melody, harmony and timbre – just to name a few. And finally, there is also the emotional dimension of music. All these different aspects make the processing of music-related data exciting.

7. What are you working on at the moment?

My recent research interests include music processing, music information retrieval, and audio signal processing. In recent years, I have been working on various questions related to computer-assisted audio segmentation, structure analysis, music analysis, and audio source separation. In my research, I am interested in developing general strategies that exploit additional information such as sheet music or genre-specific knowledge. We develop and test the relevance of our strategies within particular case studies in collaboration with music experts. For example, at the moment, we are involved in projects that deal with the harmonic analysis of Wagner’s operas, the retrieval of jazz solos, the decomposition of electronic dance music, and the separation of Georgian chant music. By considering different application scenarios, we study how general signal processing and pattern matching methods can be adapted to cope with the wide range of signal characteristics encountered in music.

8. Which area of your practice do you enjoy the most?

Besides my work in music processing, I very much enjoy the combination of doing research and teaching. I am convinced that music processing serves as a beautiful and instructive application scenario for teaching general concepts on data representations and algorithms. In my experience as a lecturer in computer science and engineering, starting a lecture with music processing applications – in particular, playing music to the students – opens them up and raises their interest. This makes it much easier to get the students engaged with the mathematical theory and technical details. Mixing theory and practice by immediately applying algorithms to concrete music processing tasks helps to develop the necessary intuition behind the abstract concepts and awakens the students’ fascination for the topic. My enthusiasm for research and teaching has also resulted in a recent textbook titled “Fundamentals of Music Processing” (Springer, 2015, www.music-processing.de), which also reflects some of my research interests.

9. What is it that inspires you?

Because of the diversity and richness of music, music processing and music information retrieval are interdisciplinary research areas related to various disciplines, including signal processing, information retrieval, machine learning, multimedia engineering, library science, musicology, and digital humanities. Bringing together researchers and students from a multitude of different fields is what makes our community so special. Working together with colleagues and students who love what they do (in particular, we all love the data we are dealing with) is what inspires me.

Contact Details:
Prof. Dr. Meinard Müller
Lehrstuhl für Semantische Audiosignalverarbeitung
International Audio Laboratories Erlangen
Friedrich-Alexander Universität Erlangen-Nürnberg
Am Wolfsmantel 33
91058 Erlangen, Germany
Email: meinard.mueller@audiolabs-erlangen.de

Reference:
Müller, Meinard, Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications, Springer, 2015. 483 pp., 249 illus. (30 in colour), hardcover. ISBN 978-3-319-21944-8. www.music-processing.de

FAST represented at the Audio Mostly 2016 conference

by Sean McGrath (The University of Nottingham)

This week, FAST was well represented at Audio Mostly, a conference on interaction with sound held in cooperation with the ACM. In its tenth anniversary year, the conference returned to Sweden and was hosted in the beautiful city of Norrköping at the iconic Visualization Center. It ran over three days, with talks on all manner of subjects, from walking and playing to producing and consuming sound in a wide range of settings.

FAST members presented four pieces of work, covering a range of audio-related topics from production to consumption. The work explored the role of dynamic music, performance and composition tools, and the role of social media in production. Paper titles were as follows:

Creating, Visualizing, and Analyzing Dynamic Music Objects in the Browser with the Dymo Designer, Florian Thalmann, György Fazekas, Geraint A. Wiggins, Mark B. Sandler (Centre for Digital Music, Queen Mary University of London)

^muzicode$: Composing and Performing Musical Codes, Chris Greenhalgh, Steve Benford, Adrian Hazzard (The Mixed Reality Lab, The University of Nottingham)

Making Music Together: An Exploration of Amateur and Pro-Am Grime Music Production, Sean McGrath, Alan Chamberlain, Steve Benford (The Mixed Reality Lab, The University of Nottingham)

The Grime Scene: Social Media, Music, Creation and Consumption, Sean McGrath, Alan Chamberlain, Steve Benford (The Mixed Reality Lab, The University of Nottingham)

It was a pleasure to visit such a beautiful city. We would like to take this opportunity to thank those involved in organising the conference. The presentations were informative and the opportunity to network and discuss ongoing work in the area was wonderful.

To blockchain, or not to blockchain?

by Panos Kudumakis, Centre for Digital Music, Queen Mary University of London

The US Digital Millennium Copyright Act and the EU Electronic Commerce Directive aimed to revive the music industry; however, they are currently under revision with respect to: a) what changes are needed to guarantee fair and increased revenues returned to artists and rights holders; and b) how these changes would result in improved standards for multi-territory licensing, timely payments, and overall more transparency.

In the meantime, several key artists and musicians have turned their hopes for resolving these issues to technology and, in particular, towards blockchain. Blockchain emerged in 2008 as the technology that underpins bitcoin. It operates as a shared ledger which continuously records transactions or information. Its database structure, where each entry carries a timestamp and information linking it to previous blocks, makes it not only transparent but exceptionally difficult to tamper with.
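
As a rough illustration of that “timestamped and linked to the previous block” property, here is a toy Python sketch of a hash-linked ledger entry. It is not bitcoin or any production blockchain, and the payload fields are placeholders.

```python
import hashlib
import json
import time


def make_block(payload, previous_hash):
    """Create a toy ledger entry: a timestamped record tied to its predecessor."""
    block = {
        "timestamp": time.time(),
        "payload": payload,              # e.g. a rights transfer or play event
        "previous_hash": previous_hash,  # the link that makes tampering evident
    }
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    return block


# A tiny chain: altering an earlier payload would change its hash and
# break every link that follows it.
genesis = make_block({"event": "genesis"}, previous_hash="0" * 64)
entry = make_block({"track": "placeholder-track-id", "event": "licensed"},
                   genesis["hash"])
```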

Initiatives investigating blockchain have been launched on both sides of the Atlantic. In the USA, the Open Music Initiative (OMI) has been launched by the Berklee Institute for Creative Entrepreneurship, harnessing the MIT Media Lab’s expertise in decentralized platforms. Its mission is to promote and advance the development of open source standards and innovation related to music, and to help assure proper compensation for all creators, performers and rights holders of music. It is worth mentioning that OMI’s focus is, wisely, set on a) new works rather than the vast legacy music catalogue, with the aim that the same principles can later be applied to legacy music retrospectively; and b) achieving interoperability among infrastructures, databases and systems so that they can be accessed, shared and exchanged by all stakeholders.

In Europe, one of blockchain’s evangelists is the Grammy Award-winning UK singer, songwriter and producer Imogen Heap. She has launched a blockchain project, Mycelia. Although still in its foundational stages, she intends it to be an entire eco-system that utilises blockchain as a way to enact a complete shake up in the music industry. Mycelia’s mission is to: a) empower a fair, sustainable and vibrant music industry ecosystem involving all online music interaction services; b) unlock the huge potential for creators and their music related metadata so an entirely new commercial marketplace may flourish; c) ensure all involved are paid and acknowledged fully; d) set commercial, ethical and technical standards in order to exponentially increase innovation for the music services of the future; and, e) connect the dots with all those involved in this shift from our current outdated music industry models, exploring new technological solutions to enliven and positively impact the music ecosystem.

However, blockchain is not quite ready yet and that is its dirty secret. Much as the enthusiasm is growing, it is likely to be several years before we see blockchain rolled out in a wide-scale, mainstream capacity.

In this section, a brief overview of the components needed for a fair trade music ecosystem, beyond blockchain, is given. MixRights: Fair Trade Music Ecosystem was recently presented at the Interactive Music Hack-Fest in London (11 June 2016), Sonar+D in Barcelona (16-18 June 2016) and the Mycelia Weekend in London (8-10 July 2016). It features the following components and is a mature test-bed for blockchain integration and experimentation.

  • Identification is a fundamental component of any music trade system. A song identifier can be random so long as it can also be discovered by alternate IDs such as ISRC and/or ISWC. MPEG-21 Digital Item Identification provides a simple and extensible way of facilitating alternate IDs through the elements: a) Identifier; and b) RelatedIdentifier (see the sketch after this list);
  • IM AF/ISO BMFF/STEMS/HTML5 editor/player for collaborative music creation & remixing, karaoke & chords, tagging & sharing in social nets & counting …. music citations!;
  • MVCO Extensions on Time-Segments and Multi-Track Audio has reached the stage of PDAM at 115th MPEG Meeting, Geneva (CH), 30 May – 3 June 2016. It facilitates transparent IP rights management even when content reuse is involved with respect to permissions, obligations and prohibitions. It enables music navigation based on IP rights and … co-author graphs!;
  •  DASH streaming of IM AF is further enabling radio producers and DJs to schedule playlists for streaming to their radio stations and clubs, respectively, and perform live mixing for their audience. Thanks to MVCO artists could be paid straightaway, while they could even be notified when their tracks are scheduled for streaming, thus, enabling artists/fans interaction;
  • Monetisation via Express Play and/or blockchain.
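
The identification idea in the first bullet can be sketched as a simple record carrying a random primary ID plus alternate IDs. This is only an illustration in the spirit of MPEG-21 DII’s Identifier/RelatedIdentifier elements – the field names are not the normative XML schema, and the ISRC/ISWC values are placeholders.

```python
import uuid

# Illustrative record: a random primary identifier that remains discoverable
# through alternate IDs. Field names and values are placeholders.
song_identifier = {
    "Identifier": f"urn:example:{uuid.uuid4()}",
    "RelatedIdentifier": [
        {"scheme": "ISRC", "value": "GB-XXX-16-00001"},   # placeholder
        {"scheme": "ISWC", "value": "T-000.000.001-X"},   # placeholder
    ],
}


def resolve(records, scheme, value):
    """Find songs by any of their alternate identifiers."""
    return [r for r in records
            if any(a["scheme"] == scheme and a["value"] == value
                   for a in r["RelatedIdentifier"])]
```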

For further info please visit MPEG Developments.

Generating webpages for digital objects based on linked data

by Matthew Wilcoxson, Oxford e-Research Centre, University of Oxford

We are exploring the idea of a Performance Digital Music Object (DMO) – an object of this nature would contain information associated with a particular performance, or a group of performances taking place in a single event. As an example, it could be used as a souvenir gift which you might receive after a live concert, containing (or linking to) the recording, analysis, video and social media reactions from the event.

An initial aim of this work is to have a DMO which contains only a list of linked data sources, not the data itself. When accessed, it would request the data and generate a human-readable view of it. Initial efforts centre around automatically creating webpages based on linked data and a minimal specification of how that data would best be displayed.

Early prototypes of this webpage generation use a SPARQL server. Data is requested by queries which return the main entity’s literal objects together with the entities directly linked to it. By utilising the additional information within these connected entities we can create a practical, enhanced view of the main entity. For example, when viewing an entity representing a music group, we can enhance it with the group’s events, members, and so on.
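
As a rough sketch of the kind of query involved, the snippet below asks a SPARQL endpoint for the main entity’s properties plus one hop out to its directly linked entities. The endpoint URL and entity URI are placeholders; any SPARQL 1.1 endpoint (such as a local Fuseki dataset) should accept this form of request.

```python
import requests

# Placeholder endpoint and entity URI.
ENDPOINT = "http://localhost:3030/performance/query"
ENTITY = "http://example.org/performance/concert-42"

# One query pulls the entity's own properties plus one hop out: the entities
# it links to and their properties, from which the enhanced view is built.
QUERY = f"""
SELECT ?p ?o ?p2 ?o2 WHERE {{
  <{ENTITY}> ?p ?o .
  OPTIONAL {{
    ?o ?p2 ?o2 .
    FILTER(isIRI(?o))
  }}
}}
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=10,
)
for row in response.json()["results"]["bindings"]:
    print(row["p"]["value"], "->", row["o"]["value"])
```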

To represent this data in an engaging way, suitable view templates are selected for different parts of the whole view by matching a combination of the main entity’s type with the predicate and type (or type hierarchies) of the linked entities. By utilising entity type hierarchies we should be able to avoid specifying views for every entity, as parent entity views should be compatible with child entities. For example, a view template designed for an Agent (e.g. https://www.w3.org/ns/prov#Agent) could also be used for a Person (e.g. http://xmlns.com/foaf/0.1/Person). However, in this case unique predicates of the child entities would not be shown.
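
The template selection step might look something like the following sketch, which walks up a type hierarchy until a registered template is found. The registry, hierarchy and template names here are illustrative assumptions using only the two types mentioned above.

```python
# Illustrative template registry keyed by RDF type URI.
TEMPLATES = {
    "https://www.w3.org/ns/prov#Agent": "agent-view.html",
}

# A partial, assumed type hierarchy: child type -> parent type.
PARENT_TYPE = {
    "http://xmlns.com/foaf/0.1/Person": "https://www.w3.org/ns/prov#Agent",
}


def select_template(entity_type):
    """Walk up the type hierarchy until a registered view template is found."""
    current = entity_type
    while current is not None:
        if current in TEMPLATES:
            return TEMPLATES[current]
        current = PARENT_TYPE.get(current)
    return "generic-view.html"  # fallback when no specific template matches


# A foaf:Person falls back to the prov:Agent template, as described above.
assert select_template("http://xmlns.com/foaf/0.1/Person") == "agent-view.html"
```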

A further aim is to receive and display real-time updates, for example reading automatic music transcriptions (see https://www.semanticaudio.ac.uk/blog/can-a-computer-tell-me-what-notes-i-play-music-transcription-in-the-studio/) generated during an ongoing live performance. This work is still to be done, but it is expected to operate in a similar way to how static data is retrieved and displayed, except that the system will poll a SPARQL endpoint for updates. If some processing of the raw data is needed, this is assumed to have already happened before the data passes through the SPARQL endpoint.
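
The polling step might look something like the sketch below, which simply re-runs a query on an interval and flags when the answer changes. Again, the endpoint, entity URI and the crude triple-count change check are assumptions for illustration, not the final design.

```python
import time
import requests

ENDPOINT = "http://localhost:3030/performance/query"  # placeholder endpoint

# Count triples about the entity as a crude change indicator; a real system
# would more likely query by timestamp or sequence number.
QUERY = """
SELECT (COUNT(*) AS ?n) WHERE {
  <http://example.org/performance/concert-42> ?p ?o .
}
"""


def poll(interval=5.0):
    last_count = None
    while True:
        result = requests.get(ENDPOINT, params={"query": QUERY},
                              headers={"Accept": "application/sparql-results+json"},
                              timeout=10).json()
        count = int(result["results"]["bindings"][0]["n"]["value"])
        if last_count is not None and count != last_count:
            print("New data available - regenerate the affected view fragments")
        last_count = count
        time.sleep(interval)
```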

Our data source currently comes from Annalist, a generic data store created by Graham Klyne which produces JSON-LD (see https://www.semanticaudio.ac.uk/blog/linked-data-descriptions-of-live-performances/). The data is indexed into our SPARQL server (currently Apache Jena’s Fuseki) and queried from there via a NodeJS server. Final versions should be generic enough to use any SPARQL endpoint.

With future research, it should be possible to create a new view template ontology and separate view template specifications, meaning the specifications could be located anywhere on the internet and requested and reused in the same way one might request data from any data source.

FAST in conversation with Enrique Perez-Gonzalez, Solid State Logic

Solid State Logic is a high-end manufacturer of audio products for the broadcast, music and live markets. They manufacture everything from modular bus compressors for the music studio industry to huge audio-over-IP networked mixing systems for large-scale broadcast facilities.

FAST interviewed Enrique Perez-Gonzalez, one of FAST project’s participants, in August 2016.


Enrique Perez-Gonzalez, Solid State Logic

1. Could you please introduce yourself?

My name is Enrique Perez-Gonzalez and I am the Chief Technology Officer for Solid State Logic. Music and audio have been my passion since I was a little kid. In my current role at SSL, I am in charge of all the R&D and engineering teams. I am also in charge of the technology vision and strategy which underpins all research and development for SSL. I have a PhD from the Centre for Digital Music, QMUL, and have worked in the audio/music industry since I was very young. I believe the mixture of a solid industrial and academic background can be of use to the FAST project, and I hope SSL can help bridge the gap between industry and academia.

2. What’s your role within the FAST project?

My role is to advise and guide the project, so that the resulting research is useful and deployable to the audio, music and entertainment industry. The type of technology researched in the project has direct application in the creative industry and has the potential to have a very high impact on the way people interact with music and content. I am keen to make sure that the FAST research conducted has solid industrial application and becomes deployable in real-life applications. As a representative of SSL, I am happy to offer guidance on current trends in the industry and give market advice. SSL can also provide the FAST team with access to state-of-the-art recording, broadcasting and live facilities and equipment, as well as access to experts in the fields of music recording, broadcasting and live mixing.

3. Which would you say are the most exciting areas of research in music currently?

Research on workflow is something SSL is always interested in – how and why people do things, and understanding why they do it in order to offer improved workflows. Other areas are also interesting: object-based composition/mixing, multi-format delivery of content, remote production, spatial audio, and audio over IP.

4. What, in your opinion, makes this research project different to other research projects in the same discipline?

The quality and expertise of the research team is impressive in its own right. The fact that this is a truly multidisciplinary research project that brings a holistic approach to music research makes it unique. I find it very interesting how the FAST project approaches the study of music from its creation all the way to its consumption. The research currently done by the FAST team does not operate only from an academic angle; it also brings in engineering and scientific disciplines with the help of social research experts, musicians and industry experts. I believe the multidisciplinary approach currently being used in the FAST project will be able to shed some light on the complexities of music creation and lead to a better understanding of the interdependencies of the artistic, technological and creative tensions which result in the creation of music content.

5. What are the research questions you find most inspiring within your field of work?

I find two areas of research within the FAST project particularly interesting from the point of view of my field of work. The first has to do with understanding workflow and how music is created, produced, delivered and consumed. I think that in order to develop unique, efficient and useful tools for music we need to understand why musicians, composers and creative content creators do things the way they do. This will lead to a better understanding of how and why musicians, content creators and consumers interact with music. The second research question I find particularly fascinating is the idea of solving and understanding the challenges in the current distribution mechanisms and user consumption of music. Currently, music content needs to be produced over and over for multiple formats and media in order to satisfy the varied and sophisticated consumption needs of music users. Also, the current need to distribute music through social media makes it harder to produce, licence and monetise. It is the true understanding of this workflow that I find fascinating. In my view it is the study of this end-to-end workflow that will lead to the development of better and more useful tools that will push the boundaries of the creative and music industry. In that sense, I find the FAST project’s research on semantic technologies one of the strongest contenders for solving the distribution and licensing challenges that the music and entertainment industry currently suffers from. The concept of music objects and the use of semantic web tools are promising technologies that could be used towards a more efficient music workflow.

6. What, in your opinion, is the value of the connections established between the project research and the industry?

In my opinion this connection is invaluable. Where most academic research fails is in delivering results that have a direct application in industry and a direct connection with real-life problems. Currently, significant areas of the music/audio industry are in a state where most resources are deployed to the development of products and little is left for long-term research. Industry desperately needs answers to deeper research questions, such as why music creators and consumers do things the way they do. This will benefit the industry directly, because manufacturers such as us need this information in order to create better tools and workflows for our customers. I am convinced that the links that FAST has with industry will have a direct impact on the future development of tools and products which we will see in industry use over the next few years.

7. What can academic research in music bring to society?

Society needs music as a form of expression and entertainment. Music is a force of social change and can enhance positive development models for society. Academic research is capable of improving accessibility to music. Understanding how society interacts with music, from the start of the creative process all the way to its consumption, is an invaluable contribution that academic research can bring to society. In particular, the FAST project will deliver a better understanding of the music creation process; this will enable the development of better and easier-to-use tools that will have a direct impact for music composers of all levels and will improve general access to music creation for society as a whole. In my mind it is academic research that will produce results rigorous enough to allow industry to take informed decisions which will influence the development of the next generation of tools for music creation.

9. Why did you choose this area or field of industry?

For me, researching and developing technology for the audio industry has the perfect balance of creativity, science and engineering. The tension generated between the creative forces and the limitations of technology is what challenges me and keeps me motivated to develop the best possible tools and products for musicians and audio content creators.

10. What are you working on at the moment?

We are always working on new ways to improve and enhance the way people work. Currently, we are developing a new generation of audio-over-IP large-scale mixing consoles. The system has been designed for the simultaneous delivery of material over multiple formats. The new systems we are developing are metadata ready, so they can deal with object-based workflows. We are also working on a series of products for the recording music industry that aim to improve user workflow while enabling users to work with the latest available technology with ease.

11. Which area of your practice do you enjoy the most?

The design of new products is always rewarding. It usually signifies the cohesion of several years of research and development, combining state-of-the-art electronics and signal processing research with clever workflow streamlining to enhance the way musicians and audio operators work and perform. Developing technology that is a game changer for the creative and entertainment industries is what inspires me most. I always find it amazing when I see that creative individuals (musicians, content creators, mixing engineers…) have the ability to take new technologies and workflows and push them to the extreme.

Music making as a process: exploring the complexities of music making in a digital-physical space

by Sean McGrath, Mixed Reality Lab, Nottingham University

Recent work at the Mixed Reality Lab, Nottingham University has begun to explore and unpack the work of music producers and performers, with a focus on their attitudes, working practices and use of tools and technology. We explore a complex set of interactions between people and technology that facilitate the production and dissemination of audio content in an evolving digital climate. The work extends from early production through to mixing, mastering, distribution and consumption. We explore how individuals in this space curate and collect content with a view to future reuse, their roles, agendas and goals, and the use of technology within this space.


Image 1. The definition of a studio environment changes as bedroom producers now have access to a range of tools

We also explore emerging technology, how technology is affecting practice and ways in which technology might be able to facilitate the work that people do in the future. Finally, we explore technological issues that pertain to music production and dissemination in its current state and implications for design for future applications and contexts. Some of the contexts that our work focuses on include:

  • Amateur producers
  • Pro-Amateur producers
  • Professional “at work” producers
  • Communities of artists (grime, hip-hop)
  • Mobile modular music making
  • The studio space, what this means and how its meaning is quickly changing
  • Distributed music production
  • The role of social media in music making

Much of our work has been about untangling the complexities of music production in a shifting sociotechnical environment. Metadata has emerged as a particularly interesting feature of the engagements with artists and communities of artists. This metadata pertains to the types of things that people might want to know about how a track is composed. This type of metadata ranges from the mundane (bpm, pitch, key) to more contextually rich information about how a track was recorded, locative data and the arrangement of technology within a space. Though many DAWs embed metadata in a number of ways, grouping according to associated themes, in particular spaces or colour coding, there is much work to be done in this space. We must take what we have learned from these engagements about how people work and what people do and try to apply these lessons in future production technologies.
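
A hypothetical record of this span of metadata might look like the sketch below; the field names are illustrative assumptions rather than a schema any DAW currently uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrackMetadata:
    """Illustrative record spanning the kinds of metadata discussed above."""
    # The 'mundane' end: values most DAWs already store.
    bpm: float
    key: str
    pitch_reference_hz: float = 440.0
    # The contextually rich end: rarely captured in a structured way.
    recording_location: Optional[str] = None       # e.g. "bedroom studio"
    equipment_arrangement: Optional[str] = None    # how the gear was laid out
    collaborators: List[str] = field(default_factory=list)
    social_media_links: List[str] = field(default_factory=list)
```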

Image 2. A digital audio workstation containing a range of contextually relevant metadata

Our work focuses on a community of artists with varying levels of technical and technological skill. We explore their roles, working patterns and behaviours, and apply the particular lens of social media to investigate their activities and intentions within this space. This work will be presented as a poster at the 2nd AES Workshop on Intelligent Music Production on 13 September 2016. We will also be presenting a long paper on production practice and a poster on the social media aspect of the work at Audio Mostly 2016 on 6 October in Sweden.

Making MIR usable: How can we trust our computer-generated musical analysis?

by Elio Quinton, Centre for Digital Music, Queen Mary University of London

Research in the field of Music Information Retrieval (MIR) aims at developing methods and computational tools to automatically analyse the musical attributes of an audio recording. This typically consists of extracting the chords, the tempo or the structural segmentation (e.g. verse, chorus, bridge) of a piece. These tools can then be deployed at a scale that would not be achievable by human beings: it takes a human at the very least the duration of a piece to listen to it in its entirety and analyse it, whereas computer systems are fast enough to analyse dozens of tracks in the same time frame. As a result, enormous music collections can be analysed in just a couple of days or weeks.
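
As a minimal example of this kind of automated analysis, the snippet below uses the open-source librosa library to estimate the tempo and beat positions of a recording; the file path is a placeholder.

```python
import librosa

# Placeholder path: any audio file that librosa can decode.
AUDIO_PATH = "track.wav"

y, sr = librosa.load(AUDIO_PATH)                       # decode and resample
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

print("Estimated tempo (BPM):", tempo)
print("Number of beats found:", len(beat_times))
```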

The recorded music industry is currently going through a period of deep change in its organisation and business models. Digital music providers, whether streaming or download, are no longer just providers of audio content, but strive to deliver a compelling experience centred on music to their customers. In order to achieve this goal, digital music providers have deployed MIR-powered musical analysis on their music collections, typically totalling dozens of millions of tracks, as they regard musicological metadata as a useful asset. Say, for instance, that a given user tends to have a preference for songs in a minor key; a discovery playlist themed around songs in minor keys could then be tailored specifically for this user.

However, it is clear that such a system can only be successful if the musicological metadata (i.e. the chords, tempo, segmentation etc.) is correct: building a consumer-facing system based on erroneous MIR data is doomed to failure. Despite very good performance, current state-of-the-art algorithms do not exhibit a 100% success rate, which means that they will inevitably produce erroneous outputs. Given the subjective and ambiguous nature of music, it is very unlikely that 100% accuracy will ever be achievable. Nevertheless, this does not mean that MIR-powered musical analysis is doomed to be unusable because of its (partial) unreliability. As with any automated system, inconsistency is a real handicap, but a certain degree of inaccuracy can be dealt with, provided that it features some form of consistency and that there exist means of assessing the inaccuracy. MIR feature extraction systems do not always provide a confidence value alongside the musical estimate, so one has to rely blindly on the output produced, knowing that it will be wrong in some instances.

In this work we propose a method to predict the reliability of MIR feature extraction independently of the extraction itself, so that potential failures can be handled. As a result, MIR-powered musical analysis is made safely usable in larger systems. For instance, let us consider a hypothetical consumer-facing scenario in which a mobile app requires a tempo estimate to deliver a compelling experience to the user. Having a reliability value attached to tempo estimates enables the app to choose whether or not to use this data. Only tempi with a high reliability value may be used to generate behaviours presented to the user (Fig. 1).


Fig.1 Prediction of the reliability of feature extraction
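
On the application side, the gating described above could be as simple as the sketch below. The threshold and the example (tempo, reliability) pairs are arbitrary illustrative values, not numbers from the study.

```python
# Hypothetical estimates as (tempo_bpm, reliability) pairs; the threshold is
# an application-level choice, not a value from the paper.
RELIABILITY_THRESHOLD = 0.8


def usable_tempo(tempo_bpm, reliability, threshold=RELIABILITY_THRESHOLD):
    """Return the tempo only if its reliability clears the threshold."""
    return tempo_bpm if reliability >= threshold else None


for track, (bpm, rel) in {"A": (120.0, 0.95), "B": (87.0, 0.42)}.items():
    tempo = usable_tempo(bpm, rel)
    if tempo is None:
        print(f"Track {track}: estimate too unreliable, fall back to default behaviour")
    else:
        print(f"Track {track}: drive the app behaviour with {tempo} BPM")
```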

Now, how does such a prediction system work? The body of MIR research carried out since the field’s infancy has allowed researchers to identify properties of the music signal that are challenging for MIR algorithms. In other words, when a track exhibits such properties, it is very likely that the feature estimation will fail. Our method consists of measuring these attributes and using them to produce a reliability estimate. Let us illustrate this process with an analogy with road driving. Assume the task is to drive a car as fast as possible without crashing. The experience of drivers, car manufacturers and, more generally, the laws of physics clearly suggest that a much higher top speed is reachable on a modern motorway than on a muddy track in the woods (Fig. 2). Therefore, an estimate of the maximum top speed achievable can be produced by observing the track or road on which a car is to be driven, without the need for a test drive.


Fig. 2 Check the track: analogy with road type vs. top speed

In short, our method assesses whether the music recording under analysis looks more like a motorway (high reliability), a narrow muddy track (low reliability), or anything in between. This information is then used to produce a reliability estimate for the corresponding MIR feature (e.g. tempo).
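
Schematically, and only as an illustration of the idea rather than the published method (see the paper below), such a predictor maps a handful of signal descriptors to a reliability score, for example via a simple weighted mapping. The descriptors, weights and bias used here are placeholders.

```python
import numpy as np

# Entirely illustrative: assume each recording is summarised by descriptors
# known to correlate with rhythm-extraction difficulty (e.g. how clear the
# onsets are, how stable the local tempo appears). The weights and mapping
# below are placeholders, not the model from the paper.
def reliability_estimate(descriptors, weights, bias=0.0):
    """Map signal descriptors to a 0-1 reliability score via a logistic function."""
    score = float(np.dot(weights, descriptors)) + bias
    return 1.0 / (1.0 + np.exp(-score))


smooth_road = np.array([0.9, 0.8])   # clear onsets, stable tempo
muddy_track = np.array([0.1, 0.2])   # weak onsets, unstable tempo
weights = np.array([3.0, 3.0])

print(reliability_estimate(smooth_road, weights, bias=-3.0))  # high reliability
print(reliability_estimate(muddy_track, weights, bias=-3.0))  # low reliability
```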

Find out more about the technical details in our publication:

E. Quinton, M. Sandler and S. Dixon, “Estimation of the Reliability of Multiple Rhythm Features Extraction from a Single Descriptor”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.

Can a computer tell me what notes I play? – Music Transcription in the Studio

Using a computer to detect which notes have been played in an audio recording of a piece of music is commonly referred to as automatic music transcription. In the context of understanding the music behind a recording, transcription techniques have often been called a key technology, as concepts such as chords, harmony and rhythm are musically defined based on notes. Given the importance of this foundational technology, researchers have been working on automatic music transcription since computers became more widely available, with early approaches dating back to the 1970s. However, despite considerable interest over the following decades, the general transcription problem has withstood all attempts at a final solution, holding back many interesting applications.

Given its musical importance, transcription has been a central component of the FAST project from the start. A major aim of the FAST project is to explore how signal processing and machine learning methods, which include approaches for music transcription, can be improved not only incrementally but substantially by exploiting knowledge of the music production process. In this context, the FAST team has analyzed when and why current transcription methods typically fail, from a musical, acoustical, statistical and numerical point of view, and how structured information about the recording process could be useful in this context.


Figure 1. Time-frequency representation of a single note played on a piano

Figure 1 illustrates one such problem. It shows a so-called time-frequency representation of a single note played on a piano – a representation typically used to describe when a frequency is active in the recording and how strong it is. We can see several lines or harmonics, which represent the strongest frequency components of the note and together specify how the note sounds. However, although this is a single note, some harmonics decay more quickly than others, some vanish in between and reappear a little later, and at the beginning of the note almost all frequencies are more or less active. If the intensity of frequencies changes over time like this, a sound is often called non-stationary. This inner-note non-stationarity has usually not been considered in automatic music transcription because it would require a high level of detail in a computational model of the sound. For mathematical reasons, this level of detail would usually make a robust estimation of the most important parameters extremely difficult – figuratively speaking, with that much detail, it becomes difficult for an algorithm to see the wood for the trees.
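
A representation like the one in Figure 1 can be computed with standard open-source tools; the sketch below uses librosa and matplotlib on a placeholder recording of a single piano note.

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder path: a recording of a single piano note.
y, sr = librosa.load("piano_note.wav")

# Short-time Fourier transform -> magnitude in dB, i.e. a time-frequency
# representation like the one in Figure 1.
S_db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

plt.figure(figsize=(8, 4))
librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="log")
plt.colorbar(format="%+2.0f dB")
plt.title("Time-frequency representation of a single piano note")
plt.tight_layout()
plt.show()
```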

After identifying such issues, we proposed and implemented a novel sound model that can take this level of detail into account to better identify which notes are active in a given recording. The foundation here was that in controlled recording conditions, we know, first, which type of instrument is playing and second, that we can obtain examples of single notes for that instrument. This way, we could focus on a specific instrument class (pitched percussive instruments such as the piano), which enabled us to increase the level of detail such that it not only became possible to model the interaction of the notes but additionally how we expect the notes to change over time. The result was a first sound model for music transcription that was capable of modelling highly non-stationary note sound-objects of variable length.

With this level of detail, a sound model contains many parameters, which we need to set correctly for the model to work as expected. From a mathematical point of view, this is quite difficult – it was not clear how to find correct values for the parameters in our model. However, we developed a new parameter estimation method based on a mathematical framework called the Alternating Direction Method of Multipliers (ADMM). This framework provides a lot of flexibility and enabled us to design various so-called regularizers, which stabilize the parameter estimation process and make sure that we find meaningful values.
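
The specific sound model and regularizers are described in the forthcoming paper; purely to illustrate the split-and-alternate structure of ADMM and where a regularizer enters through its proximal step, here is a generic sketch for a much simpler problem (l1-regularised least squares), not the transcription model itself.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Proximal operator of the l1 norm - where the regularizer enters ADMM."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    """Minimise 0.5*||Ax - b||^2 + lam*||z||_1 subject to x = z (scaled-form ADMM)."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)                      # scaled dual variable
    AtA = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA, Atb + rho * (z - u))   # data-fit step
        z = soft_threshold(x + u, lam / rho)            # regularizer step
        u = u + x - z                                   # dual update
    return z


# Toy usage: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[[2, 7, 11]] = [1.5, -2.0, 0.8]
b = A @ x_true + 0.01 * rng.standard_normal(50)
print(np.round(admm_lasso(A, b), 2))
```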

Overall, with the new sound model and parameter estimation methods for high-detail sound modelling in place, we found our results to exceed the current state of the art by far, considerably reducing error rates on commonly used real acoustic recordings, with previous error rates often being several times higher than ours. This considerable improvement demonstrates that additional information available from the production process can be translated into more detailed models while still being able to robustly find the parameters needed. The new method will enable a variety of new developments within the FAST project in the future.

Our method will be published soon in the IEEE/ACM Transactions on Audio, Speech and Language Processing: https://bit.ly/2asgHDf

Ethnographic Studies of Studio Based Music Production

by Glenn McGarry, PhD student, Mixed Reality Laboratory, University of Nottingham

‘Design Ethnography’ is a feature of the FAST project that is helping to drive technological developments within it. These types of ethnographic studies seek to ‘get inside’ the work of a setting to gain first-hand knowledge of how work is accomplished through its real-world, real-time organisation. Practically, this is done through direct observational fieldwork, which is then analysed to inform the design of novel technical solutions.

Two such ethnographies of ‘traditional’ studio-based music production activities (by University of Nottingham researchers) have recently been used by FAST project developers (from QMUL, Oxford University, and Birmingham University) to inform the design of a software demonstrator (scheduled for demo in December 2016). The demonstrator aims to show how the labelling of audio in DAW (Digital Audio Workstation) software could be supported through intelligent instrument recognition. In this blog post, I give some background on how this concept came about via a series of FAST design workshops that reflected on scenarios derived from the studies.
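
A first cut of that demonstrator logic might look like the sketch below, where a (here entirely hypothetical) instrument classifier proposes a label for each stem and leaves low-confidence cases for the engineer. The function names, paths and threshold are assumptions for illustration, not the actual demonstrator code.

```python
from pathlib import Path


def classify_instrument(audio_path):
    """Hypothetical instrument classifier stub.

    In practice this would be a trained instrument recognition model;
    here it only stands in to show the labelling flow.
    """
    return "unknown", 0.0  # (label, confidence) placeholder


def propose_labels(session_dir, min_confidence=0.7):
    """Suggest a DAW track label for each stem in a recorded session."""
    proposals = {}
    for stem in sorted(Path(session_dir).glob("*.wav")):
        label, confidence = classify_instrument(stem)
        # Low-confidence suggestions are left for the engineer to resolve,
        # rather than silently mislabelling the session.
        proposals[stem.name] = label if confidence >= min_confidence else "unlabelled"
    return proposals
```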

The first of the studies observed a rock band’s recording session. This involved a recording engineer capturing multi-track recordings of real-time band performances in a professionally equipped studio. The aim of the session was not to come away with a finished product, but for the engineer to take away multi-track audio that he could later edit and mix.

The second study observed another engineer creating a ‘pre-mix’. This involved the handling of multi-track audio taken away from a previous recording session, similar to the one in the first study. The engineer’s job was to create a mix that closely resembled the final product, for various stakeholders in the project to evaluate (musicians, record companies, investors etc.).

In keeping with the FAST project aims, the ethnographic analyses of the studies included observations on the creation and use of metadata. This potentially opens up new design prospects for metadata-driven tools to add value to music objects and support production. In our studies the metadata was in the form of labelling applied to the recording equipment and software, to signify the presence of audio (both analogue and digital).

In the recording study, the engineer labelled the recording console channels and the DAW software to indicate each sound source’s signal path (e.g. guitar microphone) and groupings of audio by instrument. This helped to organise the studio space and to aid the location of, and interaction with, recording equipment controls. He also reasoned that the digital labelling in the DAW was adequate as a substitute for notes when transferring session data to the mixing stage of the process.

In the pre-mix study, similar such labelling was transferred and used alongside the DAW session data, but not without issues. The engineer had to significantly reorganise the audio in the DAW before embarking on the new task in hand. For example, legacy audio objects, such as headphone sub-mixes used to aid musicians’ performances, were not needed and so removed. He also regrouped audio tracks by instrument, rearranging them according to his preference, and sought out tracks hidden by the recording engineer that were causing problems.

Issues of coordination between production stages were highlighted by these studies. In particular, the mix engineer’s unpicking of the recording engineer’s work before him, and his reworking of the handed-over resources, was a significant overhead. Nevertheless, in a “what if” scenario where labelling was incomplete, absent, or stripped out by incompatible technologies, the overhead would arguably have been much greater.

The proposed demonstrator aims to be a first step in generating and refining the utility of metadata in support of the production process. Automatic labelling through instrument recognition alone is perhaps not sufficient to completely transform production practice. Nonetheless, it is a start in a direction that promises to at least find some efficiency gains and smooth the hand-over between production stages. The design prospects arising from our ethnographic studies do not stop here, of course, and I will be doing more studies in the area of studio-based production that will contribute to future FAST project impact.