
Music making as a process, exploring the complexities of music making in a digital-physical space

by Sean McGrath, Mixed Reality Lab, University of Nottingham

Recent work at the Mixed Reality Lab at the University of Nottingham has begun to explore and unpack the work of music producers and performers, with a focus on their attitudes, working practices and use of tools and technology. We explore a complex set of interactions between people and technology that facilitate the production and dissemination of audio content in an evolving digital climate. The work extends from early production through to mixing, mastering, distribution and consumption. We explore how individuals in this space curate and collect content with a view to future reuse; their roles, agendas and goals; and their use of technology within this space.


Image 1. The definition of a studio environment changes as bedroom producers now have access to a range of tools

We also explore emerging technology, how technology is affecting practice, and ways in which technology might facilitate the work that people do in the future. Finally, we explore technological issues that pertain to music production and dissemination as they currently stand, and the implications for the design of future applications and contexts. Some of the contexts that our work focuses on include:

  • Amateur producers
  • Pro-Amateur producers
  • Professional “at work” producers
  • Communities of artists (grime, hip-hop)
  • Mobile modular music making
  • The studio space, what this means and how its meaning is quickly changing
  • Distributed music production
  • The role of social media in music making

Much of our work has been about untangling the complexities of music production in a shifting sociotechnical environment. Metadata has emerged as a particularly interesting feature of our engagements with artists and communities of artists. This metadata pertains to the types of things that people might want to know about how a track is composed, ranging from the mundane (bpm, pitch, key) to contextually richer information about how a track was recorded, locative data and the arrangement of technology within a space. Though many DAWs already embed metadata in a number of ways, for instance by grouping tracks according to associated themes, arranging them in particular spaces or colour coding them, there is much work still to be done here. We must take what we have learned from these engagements about how people work and what people do, and try to apply these lessons in future production technologies.

Image 2. A digital audio workstation containing a range of contextually relevant metadata
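By way of illustration, here is a minimal sketch of how the kinds of metadata discussed above might be captured alongside a track and serialised for later reuse. The field names are our own illustrative choices, not a format used by any particular DAW.

```python
# Illustrative sketch only: plausible track metadata, from the "mundane"
# (bpm, key) to richer production context; field names are hypothetical.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class TrackMetadata:
    bpm: float
    key: str
    recording_location: str = ""                   # e.g. "bedroom studio"
    microphones: list = field(default_factory=list)
    room_layout_notes: str = ""
    tags: list = field(default_factory=list)

meta = TrackMetadata(bpm=140.0, key="F minor",
                     recording_location="bedroom studio, Nottingham",
                     microphones=["dynamic mic on guitar amp"],
                     tags=["grime", "demo take"])
print(json.dumps(asdict(meta), indent=2))          # portable, reusable record
```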

Our work focuses on a community of artists with varying levels of technical and technological skill. We explore their roles, working patterns and behaviours, and apply the particular lens of social media to investigate their activities and intentions within this space. This work will be presented as a poster at the 2nd AES Workshop on Intelligent Music Production on 13 September 2016. We will also be presenting both a long paper on production practice and a poster on the social media aspect of the work at Audio Mostly 2016 on 6 October in Sweden.

Making MIR usable: How can we trust our computer-generated musical analysis?

by Elio Quinton, Centre for Digital Music, Queen Mary University of London

Research in the field of Music Information Retrieval (MIR) aims at developing methods and computational tools to automatically analyse the musical attributes of an audio recording. Typical tasks include extracting the chords, tempo or structural segmentation (e.g. verse, chorus, bridge) of a piece. These tools can then be deployed at a scale that would not be achievable by human beings: it takes a human at the very least the duration of a piece to listen to it in its entirety and analyse it, whereas computer systems are fast enough to analyse dozens of tracks in the same time frame. As a result, enormous music collections can be analysed in just a couple of days or weeks.
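As a rough flavour of what such automatic analysis looks like, the sketch below uses the open-source librosa library to estimate a tempo and pitch-class content for every file in a folder. The folder name and feature choices are illustrative examples, not the specific algorithms deployed by any music provider.

```python
# Illustrative batch MIR analysis with librosa; folder name and feature
# choices are examples only.
import glob
import librosa

for path in glob.glob("collection/*.wav"):
    y, sr = librosa.load(path, sr=None)              # load the recording
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)   # global tempo estimate
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)  # pitch-class energy over time
    print(path, tempo, chroma.shape)
```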

The recorded music industry is currently going through a period of deep change in its organisation and business models. Digital music providers, whether streaming or download, are no longer just providers of audio content, but strive to deliver a compelling experience centred on music to their customers. To achieve this goal, digital music providers have deployed MIR-powered musical analysis on their music collections, typically totalling tens of millions of tracks, as they regard musicological metadata as a useful asset. Say, for instance, that a given user tends to prefer songs in a minor key; a discovery playlist themed around songs in minor keys could then be tailored specifically for this user.

However, it is clear that such a system can only be successful if the musicological metadata (i.e. the chords, tempo, segmentation etc.) is correct: building a consumer-facing system on erroneous MIR data is doomed to failure. Despite very good performance, the current state-of-the-art algorithms do not achieve a 100% success rate, which means that they will inevitably produce erroneous outputs. Given the subjective and ambiguous nature of music, it is in fact very unlikely that 100% accuracy will ever be achievable. Nevertheless, this does not mean that MIR-powered musical analysis is doomed to be unusable because of its (partial) unreliability. As with any automated system, inconsistency is a real handicap, but a certain degree of inaccuracy can be dealt with, provided that it is somewhat consistent and that there are means of assessing it. MIR feature extraction systems do not always provide a confidence value alongside the musical estimate, so one often has to rely blindly on the output produced, knowing that it will be wrong in some instances.

In this work we propose a method to predict the reliability of MIR feature extraction independently of the extraction itself, so that potential failures can be handled. As a result, MIR-powered musical analysis can be safely used in larger systems. For instance, let us consider a hypothetical consumer-facing scenario in which a mobile app requires a tempo estimate to deliver a compelling experience to the user. Having a reliability value attached to each tempo estimate enables the app to choose whether or not to use this data. Only tempi with a high reliability value would then be used to generate behaviours presented to the user (Fig. 1).


Fig. 1 Prediction of the reliability of feature extraction
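A minimal sketch of how a client app might act on such a reliability value is shown below. Here `estimate_tempo_with_reliability` is a hypothetical stand-in for the method described in this post, and the threshold is an assumed operating point, not a value from the paper.

```python
# Hedged sketch: gate user-facing behaviour on an attached reliability value.
# `estimate_tempo_with_reliability` is hypothetical, not an existing API.
RELIABILITY_THRESHOLD = 0.8   # assumed operating point, tuned per application

def usable_tempo(audio_path, estimate_tempo_with_reliability):
    tempo, reliability = estimate_tempo_with_reliability(audio_path)
    if reliability >= RELIABILITY_THRESHOLD:
        return tempo          # reliable enough to drive app behaviour
    return None               # otherwise fall back and ignore the estimate
```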

Now, how does such a prediction system work? The body of MIR research carried out since the field's infancy has allowed researchers to identify properties of the music signal that are challenging for MIR algorithms. In other words, when a track exhibits such properties, it is very likely that the feature estimation will fail. Our method consists of measuring these properties and using them to produce a reliability estimate. Let us illustrate this process with a driving analogy: assume the task is to drive a car as fast as possible without crashing. The experience of drivers and car manufacturers, and more generally the laws of physics, clearly suggests that a much higher top speed is reachable on a modern motorway than on a muddy track in the woods (Fig. 2). Therefore, an estimate of the maximum speed achievable can be produced by observing the track or road on which a car is to be driven, without the need for a test drive.


Fig. 2 Check the track: analogy with road type vs. top speed

In short, our method assesses whether the music recording under analysis looks more like a motorway (high reliability), a narrow muddy track (low reliability), or anything in between. This information is then used to produce a reliability estimate for the corresponding MIR feature (e.g. tempo).
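To make the "check the track" idea concrete, here is a deliberately crude sketch: it measures how clearly periodic the onset-strength envelope of a recording is and treats clearer periodicity as more "motorway-like" for tempo tracking. This is only an illustration of the principle; it is not the descriptor used in the publication below.

```python
# Crude illustration of the principle, not the paper's descriptor: recordings
# with a clearly periodic onset envelope are "motorway-like" for tempo tracking.
import numpy as np
import librosa

def crude_tempo_reliability(path):
    y, sr = librosa.load(path, sr=None)
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)    # pulse-like curve
    ac = librosa.autocorrelate(onset_env)                    # self-similarity vs lag
    ac = ac / (ac[0] + 1e-12)                                # normalise: lag 0 == 1
    peak = float(ac[1:].max()) if ac.size > 1 else 0.0       # strongest repeated lag
    return float(np.clip(peak, 0.0, 1.0))                    # rough score in [0, 1]
```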

Find out more about the technical details in our publication:

E. Quinton, M. Sandler and S. Dixon, “Estimation of the Reliability of Multiple Rhythm Features Extraction from a Single Descriptor,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.

Can a computer tell me what notes I play? – Music Transcription in the Studio

Using a computer to detect which notes have been played in an audio recording of a piece of music is commonly referred to as automatic music transcription. In the context of understanding the music behind a recording, transcription techniques are often called a key technology, as concepts such as chords, harmony and rhythm are musically defined in terms of notes. Given the importance of this foundational technology, researchers have been working on automatic music transcription since computers first became widely available, with early approaches dating back to the 1970s. However, despite considerable interest over the following decades, the general transcription problem has withstood all attempts at a definitive solution, holding back many interesting applications.

Given its musical importance, transcription has been a central component of the FAST project from the start. A major aim of the FAST project is to explore how signal processing and machine learning methods, including approaches for music transcription, can be improved not just incrementally but substantially by exploiting knowledge of the music production process. In this context, the FAST team has analysed when and why current transcription methods typically fail, from a musical, acoustical, statistical and numerical point of view, and how structured information about the recording process could help.


Figure 1. Time-frequency representation of a single note played on a piano

Figure 1 illustrates one such problem. It shows a so-called time-frequency representation of a single note played on a piano, a representation typically used to describe which frequencies are active in a recording at each point in time and how strong they are. We can see several lines, or harmonics, which represent the strongest frequency components of the note and together specify how the note sounds. However, although this is a single note, some harmonics decay more quickly than others, some vanish and reappear a little later, and at the beginning of the note almost all frequencies are more or less active. A sound whose frequency content changes over time like this is called non-stationary. This inner-note non-stationarity has usually not been considered in automatic music transcription because it would require a high level of detail in a computational model of the sound. For mathematical reasons, this level of detail would usually make a robust estimation of the most important parameters extremely difficult; figuratively speaking, with that much detail, it becomes difficult for an algorithm to see the wood for the trees.
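For readers who want to reproduce a picture like Figure 1, the sketch below computes a log-frequency magnitude spectrogram with librosa. The file name and STFT parameters are assumptions made for illustration.

```python
# Sketch: a time-frequency representation like Figure 1, for an assumed
# recording "piano_single_note.wav"; STFT parameters are illustrative.
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("piano_single_note.wav", sr=None)
S = np.abs(librosa.stft(y, n_fft=4096, hop_length=512))   # magnitude spectrogram
S_db = librosa.amplitude_to_db(S, ref=np.max)             # convert to decibels

librosa.display.specshow(S_db, sr=sr, hop_length=512, x_axis="time", y_axis="log")
plt.title("Single piano note: harmonics decay at different rates")
plt.colorbar(format="%+2.0f dB")
plt.tight_layout()
plt.show()
```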

After identifying such issues, we proposed and implemented a novel sound model that can take this level of detail into account to better identify which notes are active in a given recording. The starting point was that, in controlled recording conditions, we know which type of instrument is playing and can obtain examples of single notes for that instrument. This way, we could focus on a specific instrument class (pitched percussive instruments such as the piano), which enabled us to increase the level of detail such that we could model not only the interaction between notes but also how we expect each note to change over time. The result was a first sound model for music transcription capable of modelling highly non-stationary note sound-objects of variable length.

With this level of detail, a sound model contains many parameters, which we need to set correctly for the model to work as expected. From a mathematical point of view, this is quite difficult: it was not initially clear how to find correct values for the parameters in our model. However, we developed a new parameter estimation method based on a mathematical framework called the Alternating Direction Method of Multipliers (ADMM). This framework provides a lot of flexibility and enabled us to design various so-called regularizers, which stabilize the parameter estimation process and make sure that we find meaningful values.
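To give a flavour of what ADMM looks like in practice, here is a minimal generic sketch for a much simpler problem, the lasso (a least-squares fit with a sparsity regularizer). It only illustrates the alternating-update pattern and the role of a regularizer; the actual transcription model and its regularizers are considerably richer.

```python
# Generic ADMM sketch for the lasso: minimise 0.5*||Ax - b||^2 + lam*||x||_1.
# Shown only to illustrate the alternating updates and a simple regularizer.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=200):
    n = A.shape[1]
    z, u = np.zeros(n), np.zeros(n)                # u is the scaled dual variable
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(n))  # factor once, reuse every iteration
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # quadratic (data-fit) step
        z = soft_threshold(x + u, lam / rho)               # regularizer (sparsity) step
        u = u + x - z                                      # dual update
    return z
```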

Overall, with the new sound model and parameter estimation methods for highly detailed sound modelling in place, we found our results to exceed the current state of the art by a wide margin, considerably reducing error rates on commonly used real acoustic recordings; previous error rates were often several times higher than ours. This considerable improvement demonstrates that additional information available from the production process can be translated into more detailed models while still being able to robustly find the parameters needed. The new method will enable a variety of new developments within the FAST project in the future.

Our method will be published soon in the IEEE/ACM Transactions on Audio, Speech, and Language Processing: https://bit.ly/2asgHDf

Oxford partners participate in the Digital Humanities Summer School

Digital Humanities at Oxford Summer School, 4 – 8 July 2016

The Digital Humanities at Oxford Summer School is the largest in Europe (and second largest in the world). It aims to encourage, inspire, and provide the skills for innovative research across the humanities using digital technologies, and to foster a community around it.

One of the workshops in the Digital Humanities at Oxford Summer School programme was the Digital Musicology workshop, convened by Dr Kevin Page (Oxford e-Research Centre), one of our FAST project partners from Oxford. The workshop provided an introduction to computational and informatics methods that can be, and have been, successfully applied to musicology; a session on musicology was also included in the Linked Data strand. Many of these techniques have their foundations in computer science, library and information science, mathematics and, most recently, Music Information Retrieval (MIR); sessions were delivered by expert practitioners from these fields, presented in the context of their collaborations with musicologists, and by musicologists relating their experiences of these multidisciplinary investigations. The workshop comprised a series of lectures and hands-on sessions, supplemented with reports from musicology research exemplars. Theoretical lectures were paired with practical sessions in which attendees were guided through their own exploration of the topics and tools covered.

Other FAST partners contributing to the Digital Humanities workshop were Professor Dave De Roure and Dr David Weigl, both from the Oxford e-Research Centre. De Roure’s workshop on Social Humanities also included a session on designing music social machines.

Finally, FAST project member Chris Cannam (Centre for Digital Music, Queen Mary University of London) gave two tutorial sessions at the workshop: ‘Applied computational and informatics methods for enhancing musicology’ and ‘Using computers to analyse recordings: an introduction to signal processing’ (with co-tutor Ben Fields, Oxford). Both sessions introduced the basics of the computational treatment of music recordings, which is based on the concept of ‘features’ derivable from the audio signal by suitable processing. The hands-on session ‘Using computers to analyse recordings’ introduced participants to software for extracting features from recordings and visualising them, and helped them understand how features relate to perceptual and musical concepts.
Relevant links with further information:

http://www.oerc.ox.ac.uk/news/digi-humanities-summer-school
http://digital.humanities.ox.ac.uk/dhoxss/2016/
http://digital.humanities.ox.ac.uk/dhoxss/2016/workshops/digitalmusicology
http://digital.humanities.ox.ac.uk/dhoxss/2016/workshops/sochums


Ethnographic Studies of Studio Based Music Production

by Glenn McGarry, PhD student, Mixed Reality Laboratory, University of Nottingham

‘Design Ethnography’ is a feature of the FAST project that is helping to drive technological developments within it. These types of ethnographic studies seek to ‘get inside’ the work of a setting to gain first-hand knowledge of how work is accomplished through real-world, real-time organisation. Practically, this is done through direct observational fieldwork that is then analysed to inform the design of novel technical solutions.

Two such ethnographies of ‘traditional’ studio-based music production activities (by University of Nottingham researchers) have recently been used by FAST project developers (from QMUL, Oxford University, and Birmingham University) to inform the design of a software demonstrator (scheduled for demo in December 2016). The demonstrator aims to show how the labelling of audio in Digital Audio Workstation (DAW) software could be supported through intelligent instrument recognition. In this blog post, I give some background on how this concept came about via a series of FAST design workshops that reflected on scenarios derived from the studies.

The first of the studies observed a rock band’s recording session. This involved a recording engineer capturing multi-track recordings of real-time band performances in a professionally equipped studio. The aim of the session was not to come away with a finished product, but for the engineer to take away multi-track audio that he could later edit and mix.

The second study observed another engineer creating a ‘pre-mix’. This involved the handling of multi-track audio taken away from a previous recording session, similar to the one in the first study. The engineer’s job was to create a mix that closely resembled the final product, for various stakeholders in the project to evaluate (musicians, record companies, investors etc.).

In keeping with the FAST project aims, the ethnographic analyses of the studies included observations on the creation and use of metadata. This potentially opens up new design prospects for metadata-driven tools to add value to music objects and support production. In our studies the metadata was in the form of labelling applied to the recording equipment and software, to signify the presence of audio (both analogue and digital).

In the recording study, the engineer labelled the recording console channels and the DAW software to indicate each sound source’s signal path (e.g. guitar microphone) and to group audio by instrument. This helped to organise the studio space and aided locating and interacting with the recording equipment’s controls. He also reasoned that the digital labelling in the DAW was an adequate substitute for written notes when transferring session data to the mixing stage of the process.

In the pre-mix study, similar labelling was transferred and used alongside the DAW session data, but not without issues. The engineer had to significantly reorganise the audio in the DAW before embarking on the new task at hand. For example, legacy audio objects, such as headphone sub-mixes used to aid the musicians’ performances, were not needed and so were removed. He also regrouped audio tracks by instrument, rearranging them according to his preference, and sought out tracks hidden by the recording engineer that were causing problems.

These studies highlighted issues of coordination between production stages. In particular, the mix engineer’s unpicking of the recording engineer’s earlier work and reworking of the handed-over resources represented a significant overhead. Nevertheless, in a “what if” scenario where labelling was incomplete, absent, or stripped out by incompatible technologies, the overhead would arguably have been much greater.

The proposed demonstrator then aims to be the first step in generating and refining the utility of metadata in support of the production process. Automatic labelling through instrument recognition alone is perhaps not sufficient to completely transform production practice. Nonetheless it is a start in a direction that promises to at least find some efficiency gains and smooth the hand-over between production stages. The design prospects from the results of our ethnographic studies do not stop here of course and I will be doing more studies in the area of studio based production that will contribute to future FAST project impact.
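As a closing illustration of the demonstrator concept, here is a hypothetical sketch of how instrument recognition might suggest track labels. `classify_instrument` is a stand-in for a recogniser, not an existing FAST or DAW API, and the confidence threshold simply reflects the idea that uncertain cases are left to the engineer.

```python
# Hypothetical sketch of auto-labelling DAW tracks via instrument recognition.
# `classify_instrument` is a stand-in, not an existing API.
def suggest_track_labels(session_tracks, classify_instrument, min_confidence=0.7):
    """session_tracks maps a DAW track id to its audio samples."""
    labels = {}
    for track_id, audio in session_tracks.items():
        instrument, confidence = classify_instrument(audio)
        # Only label automatically when confident; leave doubtful tracks for
        # the engineer to name, as in current practice.
        labels[track_id] = instrument if confidence >= min_confidence else None
    return labels
```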

Collaboration between FAST and Audiolabs Erlangen

By Sebastian Ewert (Centre for Digital Music, Queen Mary University of London)

FAST project member Sebastian Ewert (Queen Mary University of London) visited our project partner, the International Audio Laboratories Erlangen, in July. The AudioLabs are a joint institution of the Friedrich-Alexander-University Erlangen-Nuernberg (FAU) and the Fraunhofer Institut fuer Integrierte Schaltungen (IIS). In a week-long meeting the partners exchanged knowledge and experience regarding various aspects of neural networks, a technology currently dominating developments in signal processing and machine learning, with applications ranging from automatically colouring greyscale images (*1) and repainting photos in the style of famous artists (*2) to translating between languages (*3) and steering cars and robots in the real world (*4). The discussions with Prof. Meinard Mueller and his doctoral students from the Semantic Audio Signal Processing group focused on several new concepts for using or developing variants of neural networks capable of extracting different types of semantic information from music recordings.

(*1) http://tinyclouds.org/colorize/
(*2) https://arxiv.org/abs/1508.06576
(*3) https://www.engadget.com/2016/03/11/google-is-using-neural-networks-to-improve-translate/
(*4) http://spectrum.ieee.org/computing/embedded-systems/bringing-big-neural-networks-to-selfdriving-cars-smartphones-and-drones

The Rough Mile project: a two-part location based audio walk

by Sean McGrath, Mixed Reality Lab, University of Nottingham

The Rough Mile is a two-part, location-based audio walk funded through the EPSRC FAST project (Fusing Audio and Semantic Technologies for Intelligent Music Production and Consumption, EPSRC EP/L019981/1) via the Mixed Reality Lab at the University of Nottingham. It combines principles of immersive theatre with novel audio technologies, giving people the opportunity to give each other performance-based gifts built around digital music.

Listening to recorded music is generally construed as a passive act, yet it can be a passionately felt element of individual identity as well as a powerful mechanism for deepening social bonds. We see an enormous performative potential in the intensely meaningful act of listening to and sharing digital music, understood through the lens of intermedial performance. Our primary research method is exploring the dramaturgical and scenographic potentials of site-specific theatre in relation to music. Specific contexts of listening can make powerful affective connections between the music, the listener, and the multitude of memories and emotions that may be triggered by that music. These site-specific approaches are augmented by performance-based examinations of walking practices, especially Heddon and Turner’s (2012) feminist interrogation of the dérive, as music often shapes a person’s journey as much as it does any experience they arrive at.

For this project, as part of the EPSRC-funded FAST Programme Grant, we are creating a system through which people can devise ‘musical experiences’ based at a particular location or route, which they then share with others. They craft an immersive and intermedial experience drawing on music, narration, imagery, movement, and engagement with the particularities of each location, motivated by and imbued with the personal meanings and memories of both the creator of the experience and its recipient. We believe that this fluid engagement with digital media, technology, identity, and place will provide insights into the relationships between commercial music, the deeply personal significance that such music holds for individuals, the performativity inherent in the shared act of listening, and site-based intermedial performance.

We are developing a two-part, location-based audio walk performance for pairs of friends using Professor Chris Greenhalgh’s Daoplayer. They are drawn into a fictional world that prompts them to consider pieces of music that have personal significance in relation to their friend. Through the performance they choose songs and share narratives behind them, which are then combined to form a new audio walk performance, this time as a gift for their friend to experience. By focusing on the twin roles of memory and imagination in the audience experience, the work turns performance practice into a means of engaging more fully with digital technologies such as personal photos and music that people invest with profound personal significance.

The first part is an immersive and intermedial audio walk drawing on music, narration, imagery, movement, and engagement with the particularities of the location in central Nottingham. The second part has participants retrace their original route, this time listening to the songs chosen for them by their friend, contextualised by snippets of audio from their friend’s verbal contributions. The musical gift is motivated by and imbued with the personal meanings and memories of both the giver and the receiver. We believe that this fluid engagement with digital media, technology, identity, and place will provide insights into the relationships between commercial music, the deeply personal significance that such music holds for individuals, location, movement, and performance.

FAST members present their work at MEC

FAST Report from the Music Encoding Conference, 17-20 May 2016, Montreal, Canada
(by David Weigl)

FAST project members David Weigl and Kevin Page (Oxford e-Research Centre) recently presented their work on Semantic Dynamic Notation at the Music Encoding Conference (MEC) in Montreal, Canada. The MEC is an important annual meeting of academics and industry professionals working on the next generation of digital music notation. The presented work builds on the Music Encoding Initiative (MEI) format for encoding musical documents in a machine-readable structure.

Semantic Dynamic Notation augments MEI using semantic technologies including RDF, JSON-LD, SPARQL, and the Open Annotation data model, enabling the fine-grained incorporation of musical notation within a web of Linked Data. This fusing of music and semantics affords the creation of rich Digital Music Objects supporting contemporary music consumption, performance, and musicological research.

The use case served by the presented demonstrator draws inspiration from an informal, late-night FAST project ‘jam session’ at the Oxford e-Research Centre. It enables musicians to annotate and manipulate musical notation during a performance in real-time, applying ‘call-outs’ to shape the structural elements of the performance, or signalling to the other players a new piece to transition to. Each performer’s rendered digital score synchronises to each shared action, making score adaptations immediately available to everyone in the session to support the collaborative performance. The supported actions transcend the symbolic representation of the music being played, and reference significant semantic context that can be captured or supplemented by metadata from related material (e.g., about the artist, or a particular style, or the music structure).
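To give a flavour of the kind of Linked Data involved, the sketch below builds a single Web Annotation (the W3C successor to Open Annotation) in JSON-LD that attaches a 'call-out' note to a fragment of an MEI score. All URIs here are placeholders, not identifiers used by the actual demonstrator.

```python
# Illustrative only: a Web Annotation in JSON-LD linking a textual note to a
# placeholder MEI score fragment. URIs are examples, not real identifiers.
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "linking",
    "body": {"type": "TextualBody", "value": "Call-out: back to the chorus"},
    "target": "http://example.org/scores/tune.mei#measure-17",
}
print(json.dumps(annotation, indent=2))
```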

As well as augmenting and enabling dynamic manipulation of musical notation in a performance context, the system captures provenance information, providing insight into the temporal evolution of the performance in terms of the interactions with the musical score. This demonstrates how notation can be combined with semantic context within a Digital Music Object to provide rich additional interpretation, available to musicians in real-time during a performance, and to consumers as a performance outcome.


British Art Show 8

by Ben White, Centre for Digital Music, Queen Mary University of London (in collaboration with Eileen Simpson)

British Art Show 8 
Talbot Rice Gallery, University of Edinburgh
13 February – 8 May 2016

Auditory Learning is part of Open Music Archive, a wider project to find, distribute and reanimate out-of-copyright music recordings. Sourcing vinyl 45 rpm records of chart hits from 1962 (the last year from which commercial recordings can be retrieved for public use until 2034, owing to recent copyright revisions) and using emerging information retrieval technologies, the artists have extracted over 50,000 sounds to produce a new public sonic inventory. See the Open Music Archive website: http://www.openmusicarchive.org/auditorylearning

At Talbot Rice Gallery two new works are presented which explore the reassembly of this corpus of 1962 sounds:

Assembled Onsets features eight modified turntables and individually lathe-cut records which play a new sound work assembled from the inventory of individual notes, percussive elements and vocal phonemes. The surrounding graphic fragments recall the printed paper sleeves of 7-inch vinyl records.

Linear Search Through Feature Space is a looped audio work in which the algorithmic playback of acoustically similar sounds assembles an evolving rhythmic soundscape – a sonic journey recalling the sounds of 1962.
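For the curious, here is a rough sketch of what a 'linear search through feature space' could look like, under the assumption of MFCC-based similarity (our illustrative choice, not necessarily the artists'): each sound is followed by its most acoustically similar unused neighbour.

```python
# Rough sketch: order sounds so each is followed by its nearest unused
# neighbour in an MFCC feature space. Feature choice is illustrative only.
import numpy as np
import librosa

def order_by_similarity(paths):
    if not paths:
        return []
    feats = []
    for p in paths:
        y, sr = librosa.load(p, sr=22050)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        feats.append(mfcc.mean(axis=1))                 # one vector per sound
    feats = np.array(feats)

    order, remaining = [0], set(range(1, len(paths)))
    while remaining:                                    # greedy nearest-neighbour chain
        last = feats[order[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(feats[i] - last))
        order.append(nxt)
        remaining.remove(nxt)
    return [paths[i] for i in order]
```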

Auditory Learning will change and develop throughout the exhibition’s tour. Reassembled as part of a live event during British Art Show 8 and Huddersfield Contemporary Music Festival in Leeds, it will form the soundtrack for a new film produced with a group of local teenagers in Southampton.

The British Art Show is widely recognised as the most ambitious and influential exhibition of contemporary British art, with artists chosen for their significant contribution over the past five years. Organised by Hayward Touring at Southbank Centre, London, and taking place every five years, it introduces a broad public to a new generation of artists.

Eileen Simpson and Ben White, Auditory Learning: Linear Search Through Feature Space (2016) excerpt.