This past month, a number of machine-learning-based music projects made noteworthy releases and progress. We discuss three different projects: musical source separation, auto-generation of a drumbeat track from a melody, and an AI-powered keyboard that adds supporting instrumentals. ML in the music space can help musicians create new musical experiences with synthesized sounds, or complement their rehearsals by providing a drumbeat baseline to practice with.
State of the Art Music Source Separation Library
Separating audio sources that have been mixed into a single signal is a notoriously difficult problem for machines, yet a much easier one for humans; this is commonly referred to as the “cocktail party effect”. This research and open-source project from Facebook Research tackles it with “Demucs”. Older methods sought to extract the waveforms of individual instruments by masking spectrograms obtained via the Fourier transform. The previous state-of-the-art method employed deep learning and vastly outperformed those techniques, but left noticeable artifacts in the drum and bass stems. Building on that recent research, and drawing inspiration from recent music-synthesis techniques, Facebook employs a U-Net architecture with convolutional encoders and decoders, coupled with several Bi-LSTMs between them, to generate the individually separated waveforms. Their results are both objectively and subjectively better than the previous state-of-the-art Conv-TasNet approach.
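The older Fourier-masking idea can be sketched in a few lines. This is a toy illustration, not Demucs: two synthetic “instruments” occupying different frequency bands are mixed, then pulled apart by masking the mixture’s spectrogram. All signals, frequencies, and parameters here are invented for the example.

```python
import numpy as np
from scipy.signal import stft, istft

# Two synthetic "instruments": a low tone standing in for a bass line,
# a high tone standing in for a lead instrument.
sr = 8000
t = np.arange(sr * 2) / sr
low = np.sin(2 * np.pi * 220 * t)
high = np.sin(2 * np.pi * 1760 * t)
mix = low + high

# Spectrogram of the mixture.
f, _, Z = stft(mix, fs=sr, nperseg=512)

# Ideal binary mask: keep frequency bins below 1 kHz for the "bass",
# the rest for the "lead". Real instruments overlap far more than this.
mask_low = (f < 1000)[:, None]
_, low_est = istft(Z * mask_low, fs=sr, nperseg=512)
_, high_est = istft(Z * ~mask_low, fs=sr, nperseg=512)
```

Real music overlaps heavily in frequency, which is exactly why hand-built masks gave way to learned ones, and ultimately to waveform-domain models like Demucs.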
They provide a usable library and pretrained models so anyone can separate out individual instruments. If you have your own training data, it’s possible to fine-tune the model on a particular musical domain for better results.
If this domain is of interest, there is another audio source separation project worth being aware of, though it is not focused on musical source separation:
The implications for music are exciting. Many modern soundtracks are based on samples, riffs, or aspects of other songs, so a high-fidelity tool like Demucs might enable artists to create new musical works, especially in cases where the original studio recording files are long gone. It could also be useful for remixing audio to be optimized for the medium it’s being played on, for instance a mobile phone.
Another use case for this is in music copyright law. The sampling described above is sometimes done without permission, and there is a thriving legal industry that protects the rights of musicians whose work has been stolen. Being able to extract the signals of interest with high fidelity could help a judge or jury determine whether two signals are similar enough to constitute a violation.
The Magenta Project Releases New Instruments to its ML-Music Generation Platform
The Google Magenta project launched in 2016 as a music-synthesis toolkit for musicians. One of its featured functions learns the composition of different sounds and instruments, and then generates completely novel sounds that represent new instruments. For instance, one could combine a flute and a trumpet into an embedding that is then decoded back into the sound domain. However, the new feature of interest today is DrumBot. Given a simple melody track, DrumBot generates a drumbeat for it on the fly, and in the browser.
DrumBot is based on GrooVAE, which underpins the “Drumify” app for synthesizing drumbeat tracks based on either a drumming pattern (score) or style (groove). GrooVAE is a variant of a recurrent variational autoencoder with some tweaks, trained on a collected dataset of 13+ hours of recorded drumming MIDI tracks. DrumBot extends Drumify to accept non-drum inputs by removing pitches from the melody; this effectively converts the input to the drum-like “score,” and then adds the style/groove to output a drumline that matches that melody. The implementation details of GrooVAE and Drumify can be found here.
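The pitch-removal step can be sketched as follows. This is a hypothetical illustration of the idea rather than Magenta’s actual code; the note format and helper names are invented for the example.

```python
# A melody as (pitch, onset_in_beats, velocity) triples -- an invented format.
melody = [(60, 0.0, 90), (64, 0.5, 80), (67, 1.0, 100), (72, 1.75, 70)]

def to_score(notes):
    """Strip pitch, keeping only the rhythmic skeleton (onset, velocity)."""
    return [(onset, velocity) for _pitch, onset, velocity in notes]

def quantize(score, steps_per_beat=4):
    """Snap onsets to a step grid, the form a drum 'score' typically takes."""
    return [(round(onset * steps_per_beat), velocity) for onset, velocity in score]

score = quantize(to_score(melody))
# score == [(0, 90), (2, 80), (4, 100), (7, 70)]
```

A GrooVAE-style model then renders timing nuance and dynamics (the “groove”) on top of this rhythmic skeleton.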
Plug a QWERTY or MIDI keyboard into your computer and play with the DrumBot web app. It doesn’t offer knobs to adjust the synthesized drumbeat, but it still creates a satisfying drumline from a starting melody. From there you can jam on top of the melody-plus-beat loop. This feature is also available in the downloadable Magenta software suite.
Amazon Introduces an AI-Keyboard
Amazon makes its debut in the music space with DeepComposer, an AI-powered MIDI keyboard. The keyboard takes a user’s musical input and adds stylized melodies from other instruments to create a new song. Example melody styles that can be added may include jazz, rock, funk, or classical, among others. Part of the experience includes having the user select the model architecture and loss functions, and even tune hyperparameters of the GAN to achieve various results. The keyboard is positioned partly as a way to introduce machine-learning applications gently to creators, but also as an educational tool to aid in learning ML concepts, as evidenced by the “model tuning” aspect.
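To make those “knobs” concrete, here is a deliberately tiny GAN sketch with hand-derived gradients. It bears no relation to DeepComposer’s actual models; the scalar architecture, losses, and hyperparameters are all invented for illustration.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(0)

# Data distribution the generator should imitate.
real_mean = 3.0
# Generator G(z) = a*z + b ; discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr_g, lr_d = 0.02, 0.02   # the kind of hyperparameters a user would tune

for step in range(2000):
    x_real = real_mean + 0.1 * rng.standard_normal()
    z = rng.standard_normal()
    x_fake = a * z + b
    s_real = sigmoid(w * x_real + c)
    s_fake = sigmoid(w * x_fake + c)
    # Discriminator: gradient ascent on log D(real) + log(1 - D(fake)).
    w += lr_d * ((1 - s_real) * x_real - s_fake * x_fake)
    c += lr_d * ((1 - s_real) - s_fake)
    # Generator: non-saturating loss, ascent on log D(fake),
    # chained through x_fake = a*z + b.
    g = (1 - s_fake) * w
    a += lr_g * g * z
    b += lr_g * g
```

Swapping the loss, adjusting `lr_g`/`lr_d`, or changing the step count visibly alters the outcome, which is the kind of experimentation DeepComposer’s tuning interface invites, at full scale and on real audio.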
The DeepComposer fills an interesting niche in the now-expanding ML-music space, alongside players like Google Magenta and OpenAI’s MuseNet: not only is the ML capability built directly into the musical instrument itself, but it offers the user the unique ability to interact with the model in non-trivial ways. This may help bridge the gap between the technical and the artistic for creators, or at least provide a fast-feedback system for an ML engineer to build intuition about GANs. At the same time, it may not abstract away enough technicality to be useful to someone who wouldn’t read an academic paper to understand the knobs.