I've always wanted to build something that blends my passion for music and technology. This tool converts textual descriptions into sheet music through an efficient, purpose-built notation system, solving the problem of finding resources for drum rudiment variations and embellishments.
After ideating, this felt like the project where the two best fit together. When I started drumming, I was tasked with learning the "40 rudiments." I like to describe these as the "words" of the drumming "language": most drumming phrases are just these rudiments stitched together in different ways.
When I was learning these rudiments, my teacher pointed me to online resources, and the basic rudiments were easy to find. But as we began adding variations, or "embellishments," to them, those resources were nowhere to be found. That left me relying on my teacher's handwritten sheets, which I'd inevitably lose the next day 😬😅, constantly interrupting my drumming practice.
This brings me to the creation of the product. After experimenting with a variety of models, I concluded that fine-tuning an LLM would be the most effective strategy: it could take a user's textual input and produce output in a format I could shape through fine-tuning. I used MusicXML and the Open Sheet Music Display (OSMD) API to write and display the sheet music. My LLM of choice already had a foundation in MusicXML, and I fine-tuned it on more data points to refine its skills.
At the end of training, the model could reliably produce a measure of music; however, it was EXTREMELY slow and expensive. The outputs ran to 200 lines of code for only 8-16 notes. I knew this wouldn't be scalable.
After giving it more thought and consulting some people with more experience than me, I arrived at the approach the model currently uses. I developed my own custom drum notation, which maps a short sequence of characters to a given note, and that notation is what the LLM is fine-tuned on. From there, the character sequence is tokenized and converted to MusicXML, which the Open Sheet Music Display API processes and displays to the user.
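To make "fine-tuned on the notation" concrete, a training pair might look something like the JSONL below. The field names and prompt wording are illustrative rather than taken from my actual dataset; the notation letters themselves are explained in the next section, and I'm writing tokens space-separated for readability.

```jsonl
{"prompt": "single paradiddle in sixteenth notes", "completion": "RS LS RS RS LS RS LS LS"}
{"prompt": "flam paradiddle in sixteenth notes", "completion": "RSF LS RS RS LSF RS LS LS"}
```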
This nifty approach brought generation times down to 5-10 seconds, a significant reduction in output time. It also became much cheaper, as the outputs went from 200 lines to 200 characters.
The heart of the solution lies in this custom notation system, which maps character sequences directly to musical notes and rhythms. Each token begins with either 'R' or 'L' to indicate which hand plays the note (Right or Left), followed by the note value and any modifiers.
- Right Hand (R): the note is played with the right hand
- Left Hand (L): the note is played with the left hand
- Whole Note: occupies the entire 4/4 measure
- Half Note: half of a 4/4 measure
- Quarter Note: one-quarter of a 4/4 measure
- Eighth Note: one-eighth of a 4/4 measure
- Sixteenth Note (S): one-sixteenth of a 4/4 measure
- Thirty-Second Note: one thirty-second of a 4/4 measure
To modify how a note is played, suffixes are appended to the token; a flam, for example, is written with the suffix F.
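Putting those pieces together, one full 4/4 bar of alternating sixteenth notes with a flam on the downbeat could be written like this (again, the spacing between tokens is just for readability):

```
RSF LS RS LS RS LS RS LS RS LS RS LS RS LS RS LS
```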
Now that you understand the notation components, here's how a complete 16th note with a flam (RSF) translates to MusicXML:
3 characters: Right hand (R) + Sixteenth note (S) + Flam (F)
```xml
<note>
  <grace slash="yes"/>
  <unpitched>
    <display-step>C</display-step>
    <display-octave>5</display-octave>
  </unpitched>
  <type>eighth</type>
  <stem>up</stem>
</note>
<note>
  <unpitched>
    <display-step>C</display-step>
    <display-octave>5</display-octave>
  </unpitched>
  <duration>8</duration>
  <voice>1</voice>
  <type>16th</type>
  <stem>up</stem>
  <lyric number="1">
    <syllabic>single</syllabic>
    <text>R</text>
  </lyric>
</note>
```
20+ lines of XML for the same musical information
The AWS Lambda compiler handles the heavy lifting of converting my custom notation into industry-standard MusicXML. It parses each token to extract note durations, hand assignments, and embellishments, then calculates precise timing within measures and applies proper beaming rules for readability.
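To give a feel for that conversion, here's a minimal sketch of the parsing step in Python. The token grammar follows the RSF example above, but the function names and the stripped-down output are my own simplifications; the real compiler's measure timing and beaming logic is omitted, and the duration table only includes the letter confirmed in this post.

```python
DIVISIONS = 32  # MusicXML divisions per quarter note (a 16th = 8, matching the XML above)

NOTE_VALUES = {
    "S": ("16th", DIVISIONS // 4),  # sixteenth note; other letters omitted here
}


def parse_token(token: str) -> dict:
    """Split a token like 'RSF' into hand, note value, and modifiers."""
    hand, value, modifiers = token[0], token[1], token[2:]
    note_type, duration = NOTE_VALUES[value]
    return {
        "hand": hand,              # 'R' or 'L'
        "type": note_type,         # MusicXML <type> value
        "duration": duration,      # MusicXML <duration> in divisions
        "flam": "F" in modifiers,  # flam embellishment
    }


def to_musicxml(note: dict) -> str:
    """Emit simplified MusicXML for one parsed token (snare shown as unpitched C5)."""
    parts = []
    if note["flam"]:
        # A flam is rendered as a slashed grace note before the main note.
        parts.append(
            '<note><grace slash="yes"/><unpitched><display-step>C</display-step>'
            "<display-octave>5</display-octave></unpitched>"
            "<type>eighth</type><stem>up</stem></note>"
        )
    parts.append(
        "<note><unpitched><display-step>C</display-step>"
        "<display-octave>5</display-octave></unpitched>"
        f"<duration>{note['duration']}</duration><voice>1</voice>"
        f"<type>{note['type']}</type><stem>up</stem>"
        '<lyric number="1"><syllabic>single</syllabic>'
        f"<text>{note['hand']}</text></lyric></note>"
    )
    return "".join(parts)


print(to_musicxml(parse_token("RSF")))
```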
Here's how everything comes together in the deployed system. The workflow follows a streamlined pipeline from user input to rendered sheet music:
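User prompt → fine-tuned LLM → custom notation string → AWS Lambda compiler → MusicXML → Open Sheet Music Display → rendered sheet music in the browser.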
This project taught me the importance of thinking creatively about problem-solving. The first approach, while technically sound, wasn't practical. Beyond the technical skills gained in AI/ML, web development, and music technology, this project demonstrated how personal passions can drive innovative solutions to real-world problems.
The complete source code for this project is available on GitHub, including the custom notation system, Python compiler, and frontend implementation.
While I'd love to share the live tool publicly, I need to protect it from potential misuse that could drain my API credits. As a broke college student, I can't afford unexpected charges! 😅
If you'd like to try the actual AI Drum Beat Generator, feel free to email me and I'll be happy to share the link personally. I'm always excited to demo it for fellow music and tech enthusiasts!