Artificial intelligence is fundamentally changing music production. With tools like SongBloom from Tencent AI Lab (https://github.com/tencent-ailab/SongBloom), we can generate complex, full-length songs. However, the biggest hurdle for many users is the highly specialized, tokenized format these models require for the lyrics.
We have developed an AI Assistant that elegantly overcomes this hurdle by combining the creativity of local LLMs (via Ollama) with the precision of SongBloom.
🎵 What is the SongBloom AI Assistant?
The SongBloom AI Assistant (Codebase here: https://github.com/custom-build-robots/SongBloom-AI-Assistant-with-OLLAMA) is a Gradio-based web application that offers a seamless workflow for AI music production:
- Idea Generation: You describe your song idea (genre, mood, theme) in natural language.
- Formatting: A local Ollama model of your choice (e.g., gpt-oss:20b) automatically generates the lyrics in the highly specialized SongBloom Token Format ([intro] [inst] [verse]...).
- Audio Generation: The cleaned text, along with a Style Prompt (a 10-second audio file) you upload, is sent directly to SongBloom’s infer.py script to produce the final music.
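Under the hood, the lyrics step is a single HTTP request to the Ollama server. The following Python sketch shows one way to make that request with only the standard library; it is not taken from write_me_a_song.py. It assumes Ollama's default address (http://localhost:11434) and its /api/generate endpoint with streaming disabled, which returns the generated text in a "response" field. The instruction text in the hypothetical build_prompt helper only hints at the token rules.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address; adjust to your server


def build_prompt(song_idea: str) -> str:
    # Hypothetical instruction text; the assistant's real prompt may differ.
    return (
        "Write song lyrics in SongBloom format. Use only the section tags "
        "[intro], [verse], [chorus], [bridge], [inst] and [outro], separate "
        "sections with ' , ' and end every sung sentence with a period.\n\n"
        "Song idea: " + song_idea
    )


def build_request(song_idea: str, model: str = "gpt-oss:20b") -> urllib.request.Request:
    # POST /api/generate with stream=False returns a single JSON object.
    payload = {"model": model, "prompt": build_prompt(song_idea), "stream": False}
    return urllib.request.Request(
        OLLAMA_URL + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_lyrics(song_idea: str, model: str = "gpt-oss:20b") -> str:
    # Large local models can take a while, hence the generous timeout.
    with urllib.request.urlopen(build_request(song_idea, model), timeout=600) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling generate_lyrics("A cheerful pop song about a robot truck in the snow") blocks until the model has finished and returns the raw answer, which still has to be cleaned before it goes to SongBloom.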
⚙️ The Technology Under the Hood
Our assistant uses a robust architecture to ensure maximum control and transparency:
- Frontend: Gradio provides a simple, interactive interface.
- Creativity: Ollama enables the use of powerful, local LLMs to write the song lyrics and force them into the correct, machine-readable format.
- Audio Engine: The SongBloom framework handles the actual diffusion and generation of the song.
- Media Tools: FFMPEG is used system-wide to automatically convert uploaded MP3s/FLACs to the 48kHz WAV format required by SongBloom and to optionally convert the final FLAC outputs into WAV or MP3.
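The FFMPEG conversions boil down to plain command lines. The sketch below is an assumption about how such calls might look, not the assistant's actual code: it resamples the style prompt to 48 kHz and combines the apad filter with -t 10, so shorter files are padded with silence and longer ones are cut to exactly 10 seconds. The helper names are hypothetical.

```python
import subprocess
from typing import List


def prompt_to_wav_cmd(src: str, dst: str, seconds: int = 10) -> List[str]:
    return [
        "ffmpeg", "-y",        # overwrite dst if it already exists
        "-i", src,             # MP3, FLAC or WAV input
        "-af", "apad",         # append silence so short prompts reach full length
        "-t", str(seconds),    # stop writing after `seconds` seconds
        "-ar", "48000",        # resample to 48 kHz
        dst,                   # a .wav extension selects PCM WAV output
    ]


def convert_output_cmd(src_flac: str, dst: str) -> List[str]:
    # ffmpeg picks the encoder from the extension of dst (.wav or .mp3).
    return ["ffmpeg", "-y", "-i", src_flac, dst]


def run(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(cmd, check=True, capture_output=True)
```

For example, run(prompt_to_wav_cmd("idea.mp3", "prompt.wav")) prepares the style prompt, and run(convert_output_cmd("song.flac", "song.mp3")) produces an MP3 copy of a finished song.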
🚀 Installation on Ubuntu
The setup of the workflow has been greatly simplified by a dedicated Bash installation script.
1. System Requirements
Before running the installation script, ensure that FFMPEG is available on your Ubuntu system, as it is essential for all audio conversion:
sudo apt update && sudo apt install -y ffmpeg
2. Using the Installation Script
The installation script we provide clones the SongBloom repository, creates an isolated Python 3.8 environment, and installs all necessary Python dependencies (PyTorch, Gradio, pydub, etc.):
# Adjust path if necessary
cd ~/scripts
./install_songbloom_web.sh
After installation, you will find the application write_me_a_song.py in the directory ~/SongBloom.
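Before the first start, a quick sanity check can save time. The following sketch is a hypothetical helper, not part of the installation script. It only checks that ffmpeg is on the PATH, that the active interpreter is the Python 3.8 created by the script, and that write_me_a_song.py exists in ~/SongBloom; it does not test PyTorch or the GPU.

```python
import shutil
import sys
from pathlib import Path
from typing import Dict


def check_setup(base: Path = Path.home() / "SongBloom") -> Dict[str, bool]:
    # Simple yes/no checks; nothing here imports torch or loads a model.
    return {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "python_3_8": sys.version_info[:2] == (3, 8),
        "app_present": (base / "write_me_a_song.py").is_file(),
    }


if __name__ == "__main__":
    for name, ok in check_setup().items():
        print(name + ": " + ("OK" if ok else "CHECK"))
```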
📝 Step-by-Step Instructions for Use
The assistant’s interface is divided into two main areas: Text Generation and Audio Generation.
Step 1: Generate Lyrics
- Ollama Configuration: Check the Ollama Server URL and select your preferred LLM (default is gpt-oss:20b).
- Input: In the field “Your Song Idea”, describe the genre, theme, and mood in as much detail as possible.
- Start Generation: Click on “🚀 Generate Lyrics”.
After a few seconds, you will see two results under “Generated Lyrics”:
- 1. Full LLM Output (Debug): Shows the entire raw response from the LLM, including its internal formatting thoughts.
- 2. Clean SongBloom Text (Editable): This cleaned field contains only the SongBloom tokens. IMPORTANT: You can manually edit and correct this text before proceeding to audio generation.
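How the clean text is extracted from the raw answer is not described here, so the following is only a sketch of one possible approach. It assumes the model writes one section per line, each line starting with a SongBloom tag, and it assumes the tag set [intro], [verse], [chorus], [bridge], [inst] and [outro]. Everything else, such as chatty introductions or closing remarks, is dropped, and the remaining sections are joined with " , ".

```python
import re
from typing import List

# Assumed SongBloom section tags; extend the list if your version supports more.
SECTION_RE = re.compile(r"^\[(intro|verse|chorus|bridge|inst|outro)\]", re.IGNORECASE)


def clean_songbloom_text(raw: str) -> str:
    sections: List[str] = []
    for line in raw.splitlines():
        # Drop surrounding whitespace and a trailing section comma.
        line = line.strip().rstrip(",").strip()
        m = SECTION_RE.match(line)
        if m:
            # Lower-case the leading tag, keep the rest of the line unchanged.
            sections.append("[" + m.group(1).lower() + "]" + line[m.end():])
    return " , ".join(sections)
```

If a model puts the tag and the lyrics on separate lines, this line-based filter would lose the lyrics, which is one more reason the editable field is useful.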
Step 2: Generate and Archive Audio
- Upload Style Prompt: Under “Style Prompt Audio”, upload a 10-second WAV, MP3, or FLAC file that dictates the desired musical direction. The app automatically adjusts the length.
- Select Output Format: Under “Output Format”, choose the desired final format (FLAC, WAV, or MP3).
- Start Generation: Click on “▶️ Generate Audio”.
Upon successful generation, your song will be played directly in the player under “🎧 Your Song (Audio)”.
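When you click “▶️ Generate Audio”, the clean text and the prepared prompt WAV have to be handed over to SongBloom. The sketch below assumes the input format described in the SongBloom README: a JSONL file with one object per song containing the keys idx, lyrics and prompt_wav, passed to infer.py via --input-jsonl. The helper names are hypothetical; check the README of your SongBloom version for further options.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import List


def jsonl_line(idx: str, lyrics: str, prompt_wav: Path) -> str:
    # Absolute path, because infer.py runs with the SongBloom folder as cwd.
    record = {"idx": idx, "lyrics": lyrics, "prompt_wav": str(Path(prompt_wav).resolve())}
    return json.dumps(record, ensure_ascii=False)


def write_input_jsonl(path: Path, idx: str, lyrics: str, prompt_wav: Path) -> Path:
    path.write_text(jsonl_line(idx, lyrics, prompt_wav) + "\n", encoding="utf-8")
    return path


def infer_cmd(jsonl: Path) -> List[str]:
    # Uses the interpreter of the current (SongBloom) environment.
    return [sys.executable, "infer.py", "--input-jsonl", str(Path(jsonl).resolve())]


def run_songbloom(jsonl: Path, songbloom_dir: Path) -> None:
    # check=True raises CalledProcessError if generation fails.
    subprocess.run(infer_cmd(jsonl), cwd=str(songbloom_dir), check=True)
```

A typical call sequence would be write_input_jsonl(Path("song.jsonl"), "robot_truck", clean_text, Path("prompt.wav")) followed by run_songbloom(Path("song.jsonl"), Path.home() / "SongBloom").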
💾 Archiving
All created files (the clean song text, the JSONL input file, the pre-processed prompt WAV file, and the finished song) are stored permanently in the following directory:
~/SongBloom/generated_songs_archive/
This allows you to track and reuse your creations at any time.
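The archive can be pictured as one folder per generation run. The sketch below assumes a timestamped sub-folder layout, which may differ from what the assistant actually writes; the function name and folder naming are hypothetical. It copies the given files with shutil.copy2 so their modification times are preserved.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Iterable

# Archive location from the article; the per-run sub-folder is a hypothetical layout.
ARCHIVE_ROOT = Path.home() / "SongBloom" / "generated_songs_archive"


def archive_run(files: Iterable[Path], root: Path = ARCHIVE_ROOT, name: str = "song") -> Path:
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = root / (stamp + "_" + name)
    target.mkdir(parents=True, exist_ok=True)
    for f in files:
        # Keep the original file name inside the run folder.
        shutil.copy2(str(f), str(target / Path(f).name))
    return target
```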
▶️ Example Prompt
“Write a song about a small robot-truck which loves his colorful chassis and power to drive around in the snow during x-maz suprising kids. The song should follow the lyris from imagine from John Lennon.”
🚀 Lyrics Generated by gpt-oss:20b
[intro] [intro] [intro] [intro] [intro] ,
[verse] I think of a tiny truck with chrome shining bright. Its wheels spin in a line of color that gleams. It loves to glide through snow so fine. ,
[chorus] And in that bright world the truck is a dream. In a line its power makes everything gleam. It will drive across the snow. ,
[verse] The truck rolls along the snow with bright lines. It lights up the winter path like a shining sign. It keeps the kids laughing in a winter design. ,
[chorus] The truck keeps moving through the winter cold. Its bright chassis shines like a story told. The kids feel the thrill of a bright bright bold. ,
[verse] The truck hums with a happy sound. The snow around it shimmers with a bright round. It stays strong while the wind is profound. ,
[chorus] The truck moves through the winter bright. Its colorful chassis shines like a bright light. The kids cheer while the truck takes flight. ,
[verse] The truck's journey brings a sparkle in the snow. Its bright hull glows like a glowing show. The children sing as the wheels roll with a bright glow. ,
[chorus] The truck keeps the world bright with a gentle sound. Its colorful lines shine where the winter hounds. Kids cheer as the truck goes around and around. ,
[verse] The truck moves ahead like a bright star. Its chassis glows in the winter night afar. The kids shout as it goes with a bright guitar. ,
[chorus] The truck's lights shine like a winter star. Its colorful chassis glows like a bright star. Kids jump as it rolls across the snowy yard. ,
[inst] [inst] [inst] ,
[outro] [outro] [outro] [outro] [outro]
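The output above shows the structure SongBloom expects: vocal sections ([verse], [chorus]) carry sung sentences, non-vocal sections ([intro], [inst], [outro]) consist only of repeated tags, and sections are separated by commas. The following sketch checks that every section starts with a known tag, that non-vocal sections contain nothing but their tag, and that vocal sections are not empty. The rules are inferred from this example and the tag set (including [bridge]) is an assumption, so treat it as a plausibility check rather than SongBloom's own validator.

```python
import re
from typing import List, Tuple

VOCAL_TAGS = {"verse", "chorus", "bridge"}   # sections that carry sung lyrics (assumed)
NON_VOCAL_TAGS = {"intro", "inst", "outro"}  # sections made of repeated tags only (assumed)


def parse_sections(text: str) -> List[Tuple[str, str]]:
    """Split SongBloom text into (tag, content) pairs.

    A comma only counts as a section separator when the next token is a tag,
    so commas inside lyric lines are kept.
    """
    sections = []
    for chunk in re.split(r"\s*,\s*(?=\[)", text.strip()):
        m = re.match(r"\[([a-z]+)\]\s*(.*)", chunk.strip().rstrip(",").strip(), re.DOTALL)
        if m is None:
            raise ValueError("section without a leading tag: %r" % chunk)
        tag, content = m.group(1), m.group(2)
        if tag in NON_VOCAL_TAGS:
            if content.replace("[%s]" % tag, "").strip():
                raise ValueError("non-vocal section %s contains lyrics" % tag)
        elif tag in VOCAL_TAGS:
            if not content.strip():
                raise ValueError("vocal section %s has no lyrics" % tag)
        else:
            raise ValueError("unknown tag: %s" % tag)
        sections.append((tag, content))
    return sections
```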
🎧 Generated Song as MP3
Here is an example song that was generated with the prompt above.
💡 Project Summary
The SongBloom AI Assistant bridges the critical gap between human creativity and the complex input format of AI music generators like SongBloom.
- The Challenge: SongBloom requires highly specialized, tokenized lyrics.
- The Solution: A Gradio web application uses local LLMs (via Ollama) to automatically translate intuitive, natural-language song ideas into the correct, machine-readable token format.
- The Workflow: Users enter an idea → Ollama formats the lyrics → SongBloom generates the audio.
- The Technology: Robust architecture with Gradio (Frontend), Ollama (Creativity/Formatting), and FFMPEG (Audio Conversion), optimized for a simple Ubuntu installation via Bash script.
This tool streamlines the workflow and, because the cleaned lyrics can be reviewed and edited before audio generation, helps ensure that they arrive in the correct format, enabling successful and reproducible AI music productions.







