Master's Thesis: A Multimodal Text-to-MIDI Transformer Model with Special Consideration to Artist Usage

    Abstract: Generative AI systems are of great interest in the field of automated music production. Current well-known state-of-the-art music generation systems such as MusicLM, while incredibly versatile and tractable, do not allow direct control over the tone or texture of the generated instruments other than by altering the text prompt, which also alters the structure of the generated composition. Symbolic music generation therefore remains of great use to musicians. Using recent advances in deep learning, an off-the-shelf MIDI dataset, and a pretrained English-language encoder, this work proposes a novel sequence-to-sequence music generation model that converts written descriptions of a song's style and artistic themes into coherent MIDI representations. The architecture and the synthetic dataset used to train the model were constructed with special consideration for the needs of musicians: a native musical format, no association between any particular artist and musical style, and the possibility of live use alongside human accompaniment.

    The following generations were cherry-picked from 3 generations per prompt. Not all of them are perfect, but they generally follow the prompt. All were generated with the same hyperparameters, so exerting more control over generation could produce better results:

    CFG (classifier-free guidance) value: 2.0
    Temperature: 0.6
    No repeat: 64 tokens
    Repetition penalty: 1.05
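    The generation hyperparameters above all act on the decoder's logits at every step. The following is a minimal, hypothetical sketch in plain Python of how they could combine in one sampling step; it is not the thesis's implementation. It assumes the decoder yields one prompt-conditioned and one unconditional logit vector per step, interprets "No repeat: 64 tokens" as an n-gram block of size 64 in the style of Hugging Face's `no_repeat_ngram_size`, and uses a CTRL-style repetition penalty. The function names and the order in which the adjustments are applied are assumptions.

```python
import math
import random
from typing import Optional, Sequence


def apply_cfg(cond: Sequence[float], uncond: Sequence[float], scale: float) -> list[float]:
    """Classifier-free guidance: extrapolate from the unconditional logits
    toward the prompt-conditioned ones by `scale`."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def apply_repetition_penalty(
    logits: Sequence[float], generated: Sequence[int], penalty: float
) -> list[float]:
    """CTRL-style penalty: shrink positive logits and push negative logits
    further down for every token that has already been generated."""
    out = list(logits)
    for tok in set(generated):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def banned_ngram_tokens(generated: Sequence[int], n: int) -> set[int]:
    """Tokens that would complete an n-gram already present in `generated`
    (same idea as Hugging Face's `no_repeat_ngram_size`)."""
    if n <= 0 or len(generated) < n:
        return set()
    prefix = tuple(generated[len(generated) - (n - 1):])  # last n-1 tokens
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


def sample_next_token(
    cond_logits: Sequence[float],
    uncond_logits: Sequence[float],
    generated: Sequence[int],
    cfg_scale: float = 2.0,
    temperature: float = 0.6,
    repetition_penalty: float = 1.05,
    no_repeat_ngram: int = 64,
    rng: Optional[random.Random] = None,
) -> int:
    """One decoding step using the quoted hyperparameters (hypothetical order)."""
    rng = rng if rng is not None else random.Random()
    logits = apply_cfg(cond_logits, uncond_logits, cfg_scale)
    logits = apply_repetition_penalty(logits, generated, repetition_penalty)
    for tok in banned_ngram_tokens(generated, no_repeat_ngram):
        logits[tok] = -math.inf  # forbid exact repeats of an n-token span
    scaled = [x / temperature for x in logits]  # temperature < 1 sharpens the distribution
    peak = max(scaled)
    if peak == -math.inf:
        raise ValueError("every token is banned")
    weights = [math.exp(s - peak) for s in scaled]  # numerically stable softmax numerators
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


if __name__ == "__main__":
    # Toy 4-token vocabulary; a real MIDI token vocabulary is much larger.
    rng = random.Random(0)
    history = [0, 1, 0]
    token = sample_next_token([2.0, 1.0, 0.5, -1.0], [1.0, 1.0, 1.0, 1.0], history, rng=rng)
    print("next token:", token)
```

    Under this interpretation, a 64-token block only forbids exact repeats of 64-token spans, so shorter motifs can still recur several times before it intervenes; this may be relevant to the looping failure case.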

    Prompts:
    "Country song with elements of folk. Instruments are guitar, ethnic, strings, and bass. Key is G major. Tempo is 140BPM."
    "A hip-hop rap beat. Instruments are synth bass, synth lead, synth pad, and drums."
    "An R&B Motown song about the joy of young love. Instruments are piano, drums, strings, brass, bass, and electric guitar. Key is A Major."
    "An R&B Motown song about the sadness of ending a relationship. Instruments are piano, drums, strings, brass, bass, and electric guitar. Key is A Major."
    "A hip-hop rap beat with elements of baroque classical music. Instruments are synth drums, synth bass, and piano."
    "An ambient electronica composition with elements of folk and country. Instruments are ethnic, synth bass, synth lead, and guitar. The song is about the new life of the springtime."
    "An experimental funk song with an interesting rythmic component. Instruments are bass, electric guitar, and drums. The song is about exploring an alternate dimension."

    The following example was cherry-picked and produced into a full song using custom synthesizer patches in REAPER. Apart from setting the initial instrument volumes, the arrangement is unedited.

    "A relaxed, downtempo ambient electronic song with classical influence. The tempo is 150. The song is reminiscient of the springtime. Instruments are synth lead and synth bass."

    The following example is a failure case; the model tends to get stuck in loops:

    "An intense 80s synthwave song. synth bass, synth lead, and percussive."