Artificial intelligence is fundamentally changing music production. With tools like SongBloom from Tencent AI Lab (https://github.com/tencent-ailab/SongBloom), we can generate complex, full-length songs. However, the biggest hurdle for many users is the strict format these models require for the lyrics.

We have developed an AI Assistant that elegantly overcomes this hurdle by combining the creativity of local LLMs (via Ollama) with the precision of SongBloom.

SongBloom AI Assistant Ollama ready to run

🎵 What is the SongBloom AI Assistant?

The SongBloom AI Assistant (Codebase here: https://github.com/custom-build-robots/SongBloom-AI-Assistant-with-OLLAMA) is a Gradio-based web application that offers a seamless workflow for AI music production:

  1. Idea Generation: You describe your song idea (genre, mood, theme) in natural language.
  2. Formatting: A local Ollama model of your choice (e.g., gpt-oss:20b) automatically generates the lyrics in the highly specialized SongBloom Token Format ([intro] [inst] [verse]...).
  3. Audio Generation: The cleaned text, along with a Style Prompt (a 10-second audio file) you upload, is sent directly to SongBloom’s infer.py script to produce the final music.
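The handoff in step 3 can be sketched as follows. This is a minimal illustration, assuming infer.py reads a JSONL file in which each line describes one song; the field names idx, lyrics, and prompt_wav and the helper build_jsonl_line are assumptions made for this example and are not taken from the assistant's code.

```python
import json


def build_jsonl_line(idx: str, lyrics: str, prompt_wav: str) -> str:
    """Serialize one song request as a single JSON line.

    The keys used here (idx, lyrics, prompt_wav) are assumed for
    illustration; check the SongBloom repository for the real schema.
    """
    record = {"idx": idx, "lyrics": lyrics, "prompt_wav": prompt_wav}
    # ensure_ascii=False keeps non-ASCII lyrics readable in the file
    return json.dumps(record, ensure_ascii=False)


line = build_jsonl_line(
    "robot_truck",
    "[intro] [intro] , [verse] Snow falls on chrome. , [outro] [outro]",
    "prompt_48k.wav",
)
print(line)
```

Writing one JSON object per line keeps each request independent, so several songs could be queued in the same file.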

⚙️ The Technology Under the Hood

Our assistant uses a robust architecture to ensure maximum control and transparency:

  • Frontend: Gradio provides a simple, interactive interface.
  • Creativity: Ollama enables the use of powerful, local LLMs to write the song lyrics and force them into the correct, machine-readable format.
  • Audio Engine: The SongBloom framework handles the actual diffusion and generation of the song.
  • Media Tools: FFMPEG, installed system-wide, automatically converts uploaded MP3 and FLAC files to the 48 kHz WAV format required by SongBloom and optionally converts the final FLAC output to WAV or MP3.
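To make the media step concrete, here is a hedged sketch of how a prompt conversion command could be assembled. The function name ffmpeg_prompt_command and the file names are hypothetical; the flags used (-y, -i, -t, -ar) are standard ffmpeg options, but the assistant's real invocation may differ.

```python
from typing import List


def ffmpeg_prompt_command(src: str, dst: str,
                          seconds: int = 10, rate: int = 48000) -> List[str]:
    """Build an ffmpeg call that trims and resamples a style prompt.

    -y   overwrite the output file without asking
    -i   input file
    -t   limit the output to the given duration in seconds
    -ar  set the output audio sample rate in Hz
    """
    return ["ffmpeg", "-y", "-i", src,
            "-t", str(seconds), "-ar", str(rate), dst]


cmd = ffmpeg_prompt_command("style.mp3", "style_48k.wav")
print(" ".join(cmd))
```

Passing the command as a list to subprocess.run(cmd, check=True) avoids shell quoting problems with file names that contain spaces.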

🚀 Installation on Ubuntu

The setup of the workflow has been greatly simplified by a dedicated Bash installation script.

1. System Requirements

Before running the installation script, ensure that FFMPEG is available on your Ubuntu system, as it is essential for all audio conversion:

sudo apt update && sudo apt install -y ffmpeg

2. Using the Installation Script

The installation script we provide clones the SongBloom repository, creates an isolated Python 3.8 environment, and installs all necessary Python dependencies (PyTorch, Gradio, pydub, etc.):

# Adjust path if necessary
cd ~/scripts
./install_songbloom_web.sh

After installation, you will find the application write_me_a_song.py in the directory ~/SongBloom.

📝 Step-by-Step Instructions for Use

The assistant’s interface is divided into two main areas: Text Generation and Audio Generation.

Step 1: Generate Lyrics

  1. Ollama Configuration: Check the Ollama Server URL and select your preferred LLM (default is gpt-oss:20b).
  2. Input: In the field “Your Song Idea”, specify the genre, theme, and mood in as much detail as possible.
  3. Start Generation: Click on “🚀 Generate Lyrics”.
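Behind the button, the request to Ollama might look roughly like the sketch below. It uses Ollama's /api/generate endpoint with streaming disabled; the helper names are hypothetical, and the sketch sends only the song idea, whereas the real assistant presumably adds instructions describing the SongBloom format.

```python
import json
import urllib.request


def build_ollama_request(server_url: str, model: str,
                         idea: str) -> urllib.request.Request:
    """Prepare a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": idea, "stream": False}
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_lyrics(server_url: str, model: str, idea: str,
                    timeout: float = 300.0) -> str:
    """Send the request and return the text in the "response" field."""
    req = build_ollama_request(server_url, model, idea)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Building the request does not contact the server.
req = build_ollama_request("http://localhost:11434", "gpt-oss:20b",
                           "A winter song about a small robot truck")
print(req.full_url)
```

Calling generate_lyrics(...) requires a running Ollama server with the chosen model already pulled.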

After a few seconds, you will see two results under “Generated Lyrics”:

  • 1. Full LLM Output (Debug): Shows the entire raw response from the LLM, including its internal formatting thoughts.
  • 2. Clean SongBloom Text (Editable): This cleaned field contains only the SongBloom tokens. IMPORTANT: You can manually edit and correct this text before proceeding to audio generation.
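One way the clean field could be derived from the raw output is to keep only lines that begin with a structure tag. The sketch below is an assumption about the approach, not the assistant's actual code; the tag list includes [bridge], which does not appear in this article and is assumed here.

```python
import re

# Lines that start with one of the structure tags are kept.
TAG_LINE = re.compile(r"^\s*\[(intro|verse|chorus|bridge|inst|outro)\]")


def extract_songbloom_text(raw: str) -> str:
    """Drop every line of the LLM output that does not start with a tag."""
    kept = [ln.strip() for ln in raw.splitlines() if TAG_LINE.match(ln)]
    return "\n".join(kept)


raw = (
    "Thinking: I should start with an intro.\n"
    "[intro] [intro] ,\n"
    "[verse] Snow falls on chrome. ,\n"
    "Hope this helps!\n"
    "[outro] [outro]"
)
clean = extract_songbloom_text(raw)
print(clean)
```

A filter like this is deliberately conservative: anything the model writes outside tagged lines is discarded, which is why the editable field remains useful for manual fixes.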

Step 2: Generate and Archive Audio

  1. Upload Style Prompt: Under “Style Prompt Audio”, upload a 10-second WAV, MP3, or FLAC file that dictates the desired musical direction. The app automatically adjusts the length.
  2. Select Output Format: Under “Output Format”, choose the desired final format (FLAC, WAV, or MP3).
  3. Start Generation: Click on “▶️ Generate Audio”.

Upon successful generation, your song will be played directly in the player under “🎧 Your Song (Audio)”.

💾 Archiving

All created files (the clean song text, the JSONL input file, the pre-processed prompt WAV file, and the finished song) are stored permanently in the following directory:

~/SongBloom/generated_songs_archive/

This allows you to track and reuse your creations at any time.
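As an illustration of the archiving idea, the following sketch stores a lyrics file under a timestamped name. The naming scheme and the helper archive_lyrics are hypothetical; the demo writes to a temporary directory rather than to ~/SongBloom.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def archive_lyrics(archive_dir: Path, lyrics: str, stamp: str) -> Path:
    """Write the clean lyrics to <archive_dir>/<stamp>_lyrics.txt."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{stamp}_lyrics.txt"
    target.write_text(lyrics, encoding="utf-8")
    return target


# Demo in a throwaway directory so nothing in the home folder is touched.
demo_root = Path(tempfile.mkdtemp())
stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
saved = archive_lyrics(demo_root / "generated_songs_archive",
                       "[intro] , [outro]", stamp)
print(saved)
```

A shared timestamp prefix would let the lyrics, JSONL file, prompt WAV, and final song of one run sort together in the archive.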

▶️ Example Prompt

“Write a song about a small robot-truck which loves his colorful chassis and power to drive around in the snow during x-maz suprising kids. The song should follow the lyris from imagine from John Lennon.”

🚀 Generated Lyrics from gpt-oss:20b

[intro] [intro] [intro] [intro] [intro] ,
[verse] I think of a tiny truck with chrome shining bright. Its wheels spin in a line of color that gleams. It loves to glide through snow so fine. ,
[chorus] And in that bright world the truck is a dream. In a line its power makes everything gleam. It will drive across the snow. ,
[verse] The truck rolls along the snow with bright lines. It lights up the winter path like a shining sign. It keeps the kids laughing in a winter design. ,
[chorus] The truck keeps moving through the winter cold. Its bright chassis shines like a story told. The kids feel the thrill of a bright bright bold. ,
[verse] The truck hums with a happy sound. The snow around it shimmers with a bright round. It stays strong while the wind is profound. ,
[chorus] The truck moves through the winter bright. Its colorful chassis shines like a bright light. The kids cheer while the truck takes flight. ,
[verse] The truck's journey brings a sparkle in the snow. Its bright hull glows like a glowing show. The children sing as the wheels roll with a bright glow. ,
[chorus] The truck keeps the world bright with a gentle sound. Its colorful lines shine where the winter hounds. Kids cheer as the truck goes around and around. ,
[verse] The truck moves ahead like a bright star. Its chassis glows in the winter night afar. The kids shout as it goes with a bright guitar. ,
[chorus] The truck's lights shine like a winter star. Its colorful chassis glows like a bright star. Kids jump as it rolls across the snowy yard. ,
[inst] [inst] [inst] ,
[outro] [outro] [outro] [outro] [outro]
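Judging from the output above, sections are separated by " ," and each one starts with a structure tag. A small checker along these lines can catch malformed text before audio generation. This is a sketch built on that observation, not an official SongBloom parser, and [bridge] is again an assumed tag.

```python
import re
from typing import List, Tuple

SECTION = re.compile(r"^\[(intro|verse|chorus|bridge|inst|outro)\]")


def split_sections(lyrics: str) -> List[Tuple[str, str]]:
    """Split lyrics on ' ,' and return (tag, remaining text) pairs.

    Raises ValueError when a section does not start with a known tag.
    """
    sections = []
    for chunk in lyrics.split(" ,"):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = SECTION.match(chunk)
        if match is None:
            raise ValueError(f"Section without a leading tag: {chunk!r}")
        sections.append((match.group(1), chunk[match.end():].strip()))
    return sections


sample = "[intro] [intro] ,\n[verse] Snow falls. ,\n[outro] [outro]"
sections = split_sections(sample)
print([tag for tag, _ in sections])
```

Splitting on a space followed by a comma leaves ordinary commas inside a sentence untouched, since those are not preceded by a space.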

🎧 Generated Song as MP3

Here is an example song that was generated with the prompt above.

💡 Project Summary

The SongBloom AI Assistant bridges the critical gap between human creativity and the complex input format of AI music generators like SongBloom.

  • The Challenge: SongBloom requires highly specialized, tokenized lyrics.
  • The Solution: A Gradio web application uses local LLMs (via Ollama) to automatically translate intuitive, natural-language song ideas into the correct, machine-readable token format.
  • The Workflow: Users enter an idea → Ollama formats the lyrics → SongBloom generates the audio.
  • The Technology: Robust architecture with Gradio (Frontend), Ollama (Creativity/Formatting), and FFMPEG (Audio Conversion), optimized for a simple Ubuntu installation via Bash script.

This tool streamlines the workflow and keeps the lyrics in the format SongBloom expects, which makes successful and reproducible AI music productions far more likely.