Music is a fundamental part of people’s lives and fits almost any situation, whether it is listening to a relaxing song while studying, setting the backdrop for a video game, or providing entertainment on a long drive. This work develops automatic music generation: an artificial intelligence algorithm learns common characteristics and patterns in music in order to create similar, but unique, variations on the songs it is trained on. The initial work focuses on study music for its simpler musical structure, large collection of existing material, and utility for university students.
Initially, we thought we could download study music found on YouTube for the training data. Unfortunately, our algorithm only takes in MIDI files, and there isn’t a good way to convert mp3 files to MIDI files. Common mp3-to-MIDI tools create lots of short, small notes in an attempt to replicate complex regions of a song, producing a file that doesn’t accurately represent the original study music. One of these initial study music MIDI files is shown below so you can hear for yourself why they are a poor choice for training our algorithm.
Eventually, we turned to songs from Minecraft, since most of the songs have a peaceful quality to them and could be relaxing to study to. The final training songs can be found by clicking the link below.
The model training starts by going through each note of every song and extracting five characteristics: the pitch, the volume, the duration, the tempo at that note, and the distance between that note and the previous note. Each characteristic is then normalized using the L2 norm.
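A minimal sketch of this extraction step in plain Python (the note fields and helper names here are our own illustration, not the project’s actual code):

```python
import math

def l2_normalize(values):
    """Scale a feature vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)

def extract_features(notes):
    """Pull five per-note characteristics out of a note list.

    Each note is assumed to be a dict with 'pitch', 'volume',
    'duration', 'tempo', and 'start' (onset time) fields.
    """
    pitches   = [n["pitch"] for n in notes]
    volumes   = [n["volume"] for n in notes]
    durations = [n["duration"] for n in notes]
    tempos    = [n["tempo"] for n in notes]
    # Distance from the previous note's onset; 0 for the first note.
    distances = [0.0] + [b["start"] - a["start"]
                         for a, b in zip(notes, notes[1:])]
    # L2-normalize each characteristic vector independently.
    return {name: l2_normalize(vec) for name, vec in [
        ("pitch", pitches), ("volume", volumes),
        ("duration", durations), ("tempo", tempos),
        ("distance", distances)]}
```
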
The training data consists of five vectors, one for each extracted characteristic of a note, each containing a “window” of n notes used as input to an LSTM-based model. There is also a target vector holding the note that follows each window, so the model can learn to predict the next note given a window of notes.
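The windowing step can be sketched as follows (a toy version on a single feature sequence; the real pipeline does this for all five characteristic vectors):

```python
def make_windows(sequence, n):
    """Split a note sequence into overlapping windows of n notes,
    each paired with the note that immediately follows it."""
    windows, targets = [], []
    for i in range(len(sequence) - n):
        windows.append(sequence[i:i + n])
        targets.append(sequence[i + n])
    return windows, targets
```

For example, `make_windows([60, 62, 64, 65, 67], 3)` yields the windows `[[60, 62, 64], [62, 64, 65]]` with targets `[65, 67]`.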
Finally, the model is trained on the prepared data for a fixed number of epochs, e. Throughout our work, we varied the window length (n), the number of epochs (e), and the architecture of the model over several months to refine the generated music.
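As a rough illustration, an LSTM next-note predictor of this kind could be set up in Keras as below. The layer sizes, dropout, loss, and optimizer here are assumptions for the sketch, not the project’s final architecture:

```python
# Representative sketch only -- layer sizes, dropout, and optimizer
# are illustrative assumptions, not the project's final choices.
from tensorflow import keras

n = 32           # window length (notes per training example)
n_features = 5   # pitch, volume, duration, tempo, distance

model = keras.Sequential([
    keras.layers.Input(shape=(n, n_features)),
    keras.layers.LSTM(256, return_sequences=True),
    keras.layers.Dropout(0.3),
    keras.layers.LSTM(256),
    keras.layers.Dense(n_features),  # predict the next note's features
])
model.compile(optimizer="adam", loss="mse")
# Training to a fixed epoch count e:
# model.fit(X, y, epochs=e, batch_size=64)
```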
After training is complete, the final model is used to generate new music. Generation starts with a randomly selected window of notes from the training data. This window is fed to the model, and the model’s output is stored in a list to be converted to a note. The process repeats until the piece reaches the desired number of notes.
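The generation loop itself is simple. Here is a sketch with a stand-in `predict_next` function where the trained model’s prediction would go; we assume the window slides forward by appending each new prediction, the usual autoregressive setup:

```python
def generate_notes(predict_next, seed_window, num_notes):
    """Autoregressively generate notes: feed the current window to the
    model, record its prediction, and slide the window forward."""
    window = list(seed_window)
    generated = []
    for _ in range(num_notes):
        next_note = predict_next(window)
        generated.append(next_note)
        # Slide the window: drop the oldest note, append the new one.
        window = window[1:] + [next_note]
    return generated
```

With a toy predictor such as `lambda w: w[-1] + 1` and seed `[1, 2, 3]`, four steps produce `[4, 5, 6, 7]`.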
We started by training on our initial study music MIDI files, using 10 files similar to the one shown in the data section above. As you likely heard, these songs have many short notes where study music would call for long, drawn-out ones. Despite this short-note problem, we ran the songs through our algorithm anyway, hoping it would generate music resembling the original study music found on YouTube. Unfortunately, that was not the case: the generated songs kept the short-note issue and sounded far too chaotic to be relaxing. Below you can find an example of a song generated from these files.
At this point, we realized that our initial training songs would never produce good-sounding output, so we switched to the Minecraft songs for training. Even so, these new training songs didn’t guarantee that our model would produce good-sounding output. We have provided a couple of examples below to demonstrate this.
In the above example, we trained a model with a small window of notes used for prediction. Although this has the benefit of a shorter training time, the model doesn’t have enough previous notes to make a good prediction.
This second example actually comes from our final model; however, this song was generated using weights from a smaller epoch number (e). You can hear the quick, loud notes near the beginning, after which the model settles into repeating a single pattern of notes. Training this model for many more epochs produced the much better-sounding songs found in the section below.
In October 2020, we released our first composition, “Computer Generated Bells”, for consideration for the 100th anniversary celebration of the Altgeld Chimes. This week, we are releasing our first full album, “nauka”, comprising five tracks -- each around an hour in length -- that showcase the current state of our computer-generated music: