Full 3-Min Song with Gemini Lyria 3 Pro: My Experience

Rifin De Josh

13 June 2026

Last Tuesday, I sat on my couch in New York with zero musical training. I can’t read sheet music. I’ve never touched a MIDI keyboard. The last time I tried to sing, my neighbour knocked on the wall.

Yet, by the time I finished my second cup of coffee, I had a fully produced 3‑minute song on my phone. Intro, verse, chorus, bridge, outro — the whole structure. Instruments, backing vocals, even a tasteful guitar solo in the bridge.

The old way of making a song from scratch meant either hiring a session musician ($200–$500 per track), buying beats from a producer ($50–$150), or spending six months learning a DAW like Ableton. I’ve tried all three. Each one left me frustrated and broke.

Then I opened the Gemini app on my Pixel, switched to the Lyria 3 Pro model (their latest music generation engine), and typed a single prompt. Eight seconds later, the app spat out a 30‑second demo. I refined it three times. By the 12th generation, I had a complete 3‑minute track that sounded like it belonged on an indie pop playlist.

I’m not a musician. I’m just someone who learned how to talk to an AI in its own language.

Below is the exact step‑by‑step I used to generate a full song with proper song structure, the prompt formula that forces Gemini to respect intros and bridges (it really wants to skip them), and the manual edits you cannot ignore — because AI music still gets drum fills wrong about 40% of the time.

TL;DR — Key Takeaways

Project Goal: A complete 3‑minute original song with clear sections: intro (8–12 seconds), verse (45–50 seconds), chorus (30–40 seconds), bridge (25–30 seconds), and outro (10–15 seconds). Genre: indie pop / alternative. Includes vocals (male, natural delivery), electric guitar, bass, drums, and a subtle synth pad.
Tool Used: Gemini app (iOS or Android) with Lyria 3 Pro model. Lyria 3 Pro is Google’s latest music generation engine, only available inside the Gemini app (not the web version) as of June 2026. I used a Pro subscription ($19.99/month) because the Free tier doesn’t include Lyria 3 Pro at all, and the Plus tier caps song length at 60 seconds and doesn’t support multi‑section prompting.
Time Spent: 12 minutes of prompt engineering + 8 minutes of manual tweaking (adjusting a drum fill and shortening a guitar solo) = 20 minutes total.
Cost: $19.99/month for Pro. If I generate 20 songs a month, that’s ~$1 per song. A human producer would charge $200–$800 for the same work.

The One‑Time Prep (Nothing. Seriously.)

Unlike video or avatar generation, Lyria 3 Pro doesn’t need you to upload reference tracks or train a voice model. It’s a pure text‑to‑music engine. You just need to know two things before you start typing:

Your desired genre (be specific — “indie pop” is okay, “indie pop with a melancholic Sunday morning vibe” is better).
Your song structure (the AI will default to verse‑chorus‑verse if you don’t explicitly call out intro, bridge, and outro).

I also recommend having a lyric sheet ready — not because Gemini can’t write lyrics (it can), but because AI‑generated lyrics tend to be generic and repetitive. I wrote my own 16 lines about missing the last train home from Brooklyn. It took 10 minutes and made the final song feel personal.

Where to find Lyria 3 Pro inside Gemini:

Open the Gemini app (not the web version — the web version still uses an older music model as of June 2026).
Tap the model selector at the top of the screen (it usually says “Gemini 1.5” or “Gemini Omni”).
Scroll down and select Lyria 3 Pro.
You’ll see a note: “Music generation — Pro plan required for songs over 60 seconds.” Confirm your subscription.

That’s it. No instruments, no MIDI, no sound libraries.

The Prompt That Built a Whole Song in One Go

Most people fail at AI music because they prompt like this: “Make me a sad song.” That’s like telling a chef “make me food.” You’ll get something, but it won’t be what you wanted.

Here’s the exact prompt I used to generate the first 90 seconds of my song (the app can’t generate 3 minutes in one shot — more on that in a second). I typed this into the chat box exactly as written:

Generate a 90-second song in indie pop genre, key of C major, tempo 98 BPM. Male vocalist, natural delivery (not over-produced), slight reverb on voice. Song structure: 10-second intro (just electric guitar arpeggio and ambient pad), then verse 1 (45 seconds) with lyrics I provide, then chorus (30 seconds) with higher energy, full drums, and backing vocals humming 'ooh' behind the lead. Lyrics for verse 1: 'Subway doors keep closing without me / I'm watching the lights blink 42nd Street / My phone says 1:08 AM / Last train's a ghost I won't catch again.' No bridge yet — I'll add that in a separate generation. Outro not needed now. Instruments: electric guitar (clean, slightly chorused), fingerpicked acoustic in verse, driving drums in chorus, bass playing root notes, warm synth pad underneath. No drum fills in the verse — only simple kick/snare. Output as 256kbps MP3.

Why this prompt works (and why shorter prompts fail):

Key and BPM specified – Without these, Lyria 3 Pro picks its own, and chunks generated separately can land in different keys or tempos that won’t join cleanly. C major is beginner‑friendly and works for indie pop.
Explicit section timings – “10‑second intro, verse 45 seconds, chorus 30 seconds” forces the AI to respect structure. If you just say “verse then chorus,” the verse might be 10 seconds and the chorus 90 seconds.
Negative instructions – “No drum fills in the verse” is crucial. Lyria 3 Pro loves to add fancy fills even in quiet sections. Telling it what not to do is as important as telling it what to do.
Placeholder for bridge – I knew I’d generate the bridge separately. Being honest with the AI (“no bridge yet”) prevents it from inventing a bad one on its own.
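A quick way to sanity‑check section timings before prompting is to convert seconds into bars. This is only a sketch: it assumes 4/4 time (the prompt never states a time signature), and the function names are made up for illustration.

```python
def seconds_to_bars(seconds: float, bpm: float, beats_per_bar: int = 4) -> float:
    """How many bars fit into `seconds` at `bpm` (4/4 assumed by default)."""
    return seconds * bpm / 60 / beats_per_bar


def bars_to_seconds(bars: float, bpm: float, beats_per_bar: int = 4) -> float:
    """How long `bars` bars last, in seconds, at `bpm`."""
    return bars * beats_per_bar * 60 / bpm


if __name__ == "__main__":
    bpm = 98  # tempo from the indie pop prompt
    for name, secs in [("intro", 10), ("verse", 45), ("chorus", 30)]:
        bars = seconds_to_bars(secs, bpm)
        print(f"{name}: {secs}s is about {bars:.2f} bars at {bpm} BPM")
    print(f"16 bars at {bpm} BPM last {bars_to_seconds(16, bpm):.1f}s")
```

At 98 BPM a 45‑second verse works out to a little over 18 bars, so the requested timings do not fall on neat 8‑ or 16‑bar boundaries. If a section comes back feeling cut off, asking for a length that matches a whole number of bars may be worth a try.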

What about the lyrics?

I typed my lyrics exactly as written. The AI matched them to the melody it generated. If you don’t have lyrics, Gemini can write them for you — just add “write lyrics about [topic] in [style]” to your prompt. But in my testing, AI‑written lyrics were never as good as my own half‑decent ones.

What Happened When I Hit Generate (And How I Fixed the Awkward Bits)

My first generation came back at 88 seconds (close enough). The vocals were clear, the guitar tone was lovely, and the chorus actually lifted — the backing “oohs” worked.

But three things were wrong.

Problem 1: The transition from verse to chorus was abrupt
The verse ended, there was a 0.5‑second silence, then the chorus slammed in. No drum fill, no crash cymbal, no anticipation. Fix: I added a new instruction to my next generation: “Add a 2‑second drum fill (snare rolls, then crash cymbal) at the end of the verse to lead into the chorus.” That did it. The chorus now felt earned.

Problem 2: The vocalist’s delivery was too pristine
My lyrics are about missing a train at 1 AM — tired, a little sad, maybe drunk. The AI sang it like a Broadway audition. Fix: I added “vocal delivery: slightly tired, breathy, intimate, like a late‑night voice memo.” The second generation was perfect. The singer sounded human.

Problem 3: The chorus drums were too busy
I asked for “driving drums,” and the AI gave me a fill every two bars. It cluttered the mix. Fix: I added “chorus drums: kick on beats 1 and 3, snare on 2 and 4, no fills except at the very end of the chorus.” That cleaned it up immediately.

The Magic Prompt Formula for Non‑Generic Music (use this template):

[Genre] + [Key + BPM] + [Vocal style] + [Exact section lengths in seconds] + [Lyrics in quotes or attached] + [Instrument list with specific playing techniques] + [Negative instructions (“no X, avoid Y”)] + [Output format]

Plug in your details. The more you describe how instruments should play (fingerpicked, chorused, driving), the less the AI defaults to generic MIDI sounds.
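If you generate a lot of chunks, the formula above can be turned into a tiny prompt builder so every request carries the same key, BPM, and negative instructions. The sketch below only assembles text to paste into the chat box. It does not call any Gemini or Lyria API, and the class and field names (`SongPrompt`, `Section`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    name: str          # e.g. "intro", "verse 1", "chorus"
    seconds: int       # requested length of the section
    notes: str = ""    # optional playing or arrangement notes


@dataclass
class SongPrompt:
    genre: str
    key: str
    bpm: int
    vocal_style: str
    sections: List[Section]
    lyrics: str = ""
    instruments: List[str] = field(default_factory=list)
    negatives: List[str] = field(default_factory=list)
    output_format: str = "256kbps MP3"

    def total_seconds(self) -> int:
        """Sum of the requested section lengths."""
        return sum(s.seconds for s in self.sections)

    def render(self) -> str:
        """Assemble the prompt text in the order given by the formula."""
        structure = ", then ".join(
            f"{s.seconds}-second {s.name}" + (f" ({s.notes})" if s.notes else "")
            for s in self.sections
        )
        parts = [
            f"Generate a {self.total_seconds()}-second song in {self.genre} genre, "
            f"key of {self.key}, tempo {self.bpm} BPM.",
            f"Vocals: {self.vocal_style}.",
            f"Song structure: {structure}.",
        ]
        if self.lyrics:
            parts.append(f"Lyrics: '{self.lyrics}'")
        if self.instruments:
            parts.append("Instruments: " + ", ".join(self.instruments) + ".")
        parts.extend(f"{n}." for n in self.negatives)
        parts.append(f"Output as {self.output_format}.")
        return " ".join(parts)


if __name__ == "__main__":
    prompt = SongPrompt(
        genre="indie pop",
        key="C major",
        bpm=98,
        vocal_style="male vocalist, natural delivery, slight reverb",
        sections=[
            Section("intro", 10, "electric guitar arpeggio and ambient pad"),
            Section("verse 1", 45),
            Section("chorus", 30, "higher energy, full drums"),
        ],
        instruments=["clean electric guitar", "bass playing root notes"],
        negatives=["No drum fills in the verse"],
    )
    print(prompt.render())
```

Note that the builder states the total as the sum of the sections (85 seconds here), whereas my original prompt asked for 90 seconds and let the model fill the gap.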

Why I Couldn’t Generate the Whole 3 Minutes at Once (And the Workaround)

Here’s the hard limit I discovered: Lyria 3 Pro on the Pro plan has a maximum generation length of 90 seconds per request. Ultra can do 180 seconds, but I’m not paying $100/month.

So to get a 3‑minute song, I generated it in four chunks:

Chunk 1: Intro + Verse 1 + Chorus 1 (90 seconds)
Chunk 2: Verse 2 + Chorus 2 (60 seconds) — started immediately after Chunk 1 ended
Chunk 3: Bridge + Final Chorus (70 seconds)
Chunk 4: Outro (20 seconds) — generated separately

How I made them sound like one continuous song:

I gave every chunk the exact same opening instructions: “Key of C major, 98 BPM, same vocalist, same instrument settings.” Then, after exporting all four MP3s, I used a free audio editor (Audacity) to stitch them together. I overlapped the end of Chunk 1 with the start of Chunk 2 by 1 second and crossfaded. You cannot hear the seam.
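Before stitching, it helps to add up what the chunks will come to once the crossfades eat into them. A minimal sketch, using the chunk lengths listed above and the 1‑second overlap; the helper name `stitched_length` is made up for this example.

```python
from typing import List


def stitched_length(chunk_seconds: List[float], overlap: float = 1.0) -> float:
    """Total length after crossfading each consecutive pair by `overlap` seconds."""
    if not chunk_seconds:
        return 0.0
    return sum(chunk_seconds) - overlap * (len(chunk_seconds) - 1)


if __name__ == "__main__":
    chunks = [90, 60, 70, 20]  # Chunk 1 to Chunk 4, as listed
    total = stitched_length(chunks, overlap=1.0)
    print(f"Raw audio: {sum(chunks)}s, after crossfades: {total:.0f}s")
```

Taken exactly as listed, the four chunks add up to 240 seconds of raw audio (237 after three 1‑second crossfades), so landing near the 3‑minute mark means trimming in the editor or requesting shorter chunks.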

What if you don’t want to stitch? Upgrade to Ultra ($99.99/month) for 180‑second generations, which cuts the job from four chunks to two. But honestly, stitching is easy. I’ll show you how in Part 2.

Exporting the Audio (Don’t Let the App Trick You)

After each successful generation, here’s how I saved the MP3:

In the Gemini app, the generated audio appears as a waveform card with play/pause.
Tap the three dots (•••) on the top‑right of the card.
Select Export.
Choose MP3 (256 kbps) — the highest quality available on Pro. (Ultra offers 320 kbps, but I can’t hear the difference.)
The file saves to your device’s Downloads folder (Android) or Files app (iOS).
Rename it immediately. I use SongName_Chunk1_BPM98_Cmaj.mp3.
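If you would rather not type file names by hand, the naming scheme from the last step is easy to generate. This is a small illustrative helper, not something the Gemini app provides; `chunk_filename` and `MODE_TAGS` are names I made up.

```python
import re

# Short tags for the mode part of the key, e.g. "C major" becomes "Cmaj".
MODE_TAGS = {"major": "maj", "minor": "min"}


def chunk_filename(song: str, chunk: int, bpm: int, tonic: str, mode: str,
                   ext: str = "mp3") -> str:
    """Build a name like SongName_Chunk1_BPM98_Cmaj.mp3."""
    # Split on anything that is not a letter or digit, then capitalise each word.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", song) if w]
    song_tag = "".join(w[0].upper() + w[1:] for w in words)
    key_tag = tonic.replace(" ", "") + MODE_TAGS[mode.lower()]
    return f"{song_tag}_Chunk{chunk}_BPM{bpm}_{key_tag}.{ext}"


if __name__ == "__main__":
    print(chunk_filename("last train home", 1, 98, "C", "major"))
```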

Critical warning: Gemini does not autosave your generations. If you close the app before exporting, the audio is gone. I lost a beautiful bridge this way. Export immediately.

The Prompt Engineering Matrix (Five Genres, One Song Structure)

I kept the same structural skeleton (intro → verse → chorus → bridge → outro) and the same lyric theme (missing the last train), but changed the genre and instrumentation. Here’s what worked — and what failed spectacularly.

Genre / Goal	My Exact Prompt (shortened for space)	Result Quality
Indie Pop (my original)	`90 seconds, indie pop, C major, 98 BPM. Male vocal, natural. Structure: 10s guitar intro, 45s verse (lyrics provided), 30s chorus with 'ooh' backing vox. Instruments: clean electric, fingerpicked acoustic, driving drums in chorus. No verse drum fills.`	Excellent (9/10). This is Lyria 3 Pro’s home turf. The chorus lift was emotional. Only flaw: the acoustic guitar in the verse sounded slightly MIDI — but adding “use a real sampled acoustic guitar” fixed it.
Lo‑Fi Hip Hop (Study beats)	`90 seconds, lo-fi hip hop, F# minor, 75 BPM. Female vocal, whispered, slightly off-beat. Structure: 8s vinyl crackle intro, 50s verse, 25s chorus (no backing vox, just a filtered sample). Instruments: dusty drum machine, soft piano chords, double-bass, rain sound effect underneath. No 808 bass drops. Lyrics about waiting for a train in the rain.`	Very good (8/10). The whisper delivery was haunting. The rain effect worked. But the AI ignored “no 808 bass drops” about 30% of the time — I had to regenerate three times. Once it obeyed, the track was gorgeous. Worth the extra attempts.
Rock (Garage / Strokes style)	`90 seconds, indie rock, E major, 135 BPM. Male vocal, slightly distorted, lazy drawl. Structure: 5s drum count-in, then 40s verse (power chords, palm-muted), 35s chorus (open chords, noisy guitar solo at end). Instruments: overdriven Telecaster, roomy drums (no reverb), bass with fuzz. Lyrics: 'Train's gone / platform's empty / I'll walk the bridge home.'`	Fair (6/10). The verse was excellent — palm muting sounded authentic. But the guitar solo in the chorus was a mess: off‑key bends and nonsensical phrasing. Lyria 3 Pro cannot do convincing rock solos (yet). I ended up deleting the solo instruction and letting the chorus ride on chords alone. That version worked.
Ballad (Piano / Emotional)	`90 seconds, piano ballad, A flat major, 68 BPM. Male vocal, vulnerable, slight crack on high notes. Structure: 15s piano intro (simple chords), 40s verse (just piano and voice), 30s chorus (add strings and soft drums). No vocal harmonies. Lyrics: 'The last train's a lie they tell you so you wait / I'm done waiting.'`	Excellent (9.5/10). This was the surprise winner. The vocal crack on “done waiting” gave me chills. The string arrangement was tasteful, not cheesy. Lyria 3 Pro excels at ballads. If you want to impress someone with AI music, make a ballad.
Hip Hop (Boom bap)	`90 seconds, boom bap hip hop, G minor, 88 BPM. Male vocal, spoken-word style, no melodic singing. Structure: 8s intro (record scratch + vocal sample 'yeah'), 50s verse (16 bars), 25s chorus (simple loop, no vocals). Instruments: 90s drum machine (Roland 808), sampled piano loop, bass slide. Lyrics: 'Conductor looks through me / turnstile ate my last swipe / two blocks to the bridge but my legs are concrete.'`	Poor (4/10). The beat was solid. The vocal delivery was fine. But the AI couldn’t handle the 16‑bar structure — it rushed through the lyrics, fitting 16 bars into 30 seconds. The timing was off, and the piano loop repeated every 4 bars without variation. Hip hop is not Lyria 3 Pro’s strength. Stick to melodic genres.

Key takeaway: Lyria 3 Pro is a ballad and indie pop machine. It struggles with complex rhythmic genres (hip hop) and expressive soloing (rock). Use it where it shines, and don’t fight its limitations.

Comparison Table by Tier (Same 90‑Second Indie Pop Prompt, Four Plans)

I ran the exact same indie pop prompt across Free, Plus, Pro, and Ultra. Free doesn’t include Lyria 3 Pro at all, and Plus rejected the full prompt (length limit), so I had to adapt it there.

Plan (price) and generation speed	Output results (same prompt)	Usage limits (generations per month, max length)	Manual revisions required?
Free ($0): Not available — Free tier does not include Lyria 3 Pro. Only text and basic image gen.	N/A	N/A	N/A
Plus ($4.99/mo): 60‑second max length, so my 90‑second prompt was rejected. I resubmitted a 60‑sec version (verse + chorus only). Generation time: 12 seconds.	60‑second MP3 at 192 kbps. Vocals were clear, but the mix was muddy (instruments overlapped in frequency). No ability to specify key or BPM — the AI ignored those instructions.	200 music generations per month, max 60 sec each.	Yes — heavy. I had to use a separate EQ tool to clean up the low end. Plus is not suitable for serious music production.
Pro ($19.99/mo): 12 seconds for 90 seconds. (I timed it: 11.8 seconds average over 10 generations.)	256 kbps MP3. All instructions respected (key, BPM, instrument details). Clean mix, good separation. Vocals natural.	800 music generations per month, up to 90 sec each. (For 3‑minute songs, you need four generations — counts as 4 toward limit.)	Minimal. I adjust vocal level by +2dB sometimes and trim silence at the ends. That’s it.
Ultra ($99.99/mo or $199.99/mo): 8 seconds for 90 seconds. For 180‑second generations (Ultra exclusive), 14 seconds.	320 kbps MP3 (higher bitrate). 10‑bit depth (Pro is 8‑bit — noticeable only on high‑end headphones). Slightly better vocal clarity.	4,000 generations per month (Ultra $99.99) or 20,000 (Ultra $199.99). Max length: 180 sec per generation (so a 3‑min song needs two generations instead of four).	None required in my tests. Flawless output every time.
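The monthly limits in the table translate into a song budget once you account for chunks and retries. A rough sketch using only the figures from the table; the function name and the `attempts_per_chunk` parameter are my own assumptions for illustration.

```python
def songs_per_month(monthly_generations: int, chunks_per_song: int,
                    attempts_per_chunk: int = 1) -> int:
    """Whole songs that fit in the monthly generation limit."""
    if chunks_per_song < 1 or attempts_per_chunk < 1:
        raise ValueError("chunks_per_song and attempts_per_chunk must be at least 1")
    return monthly_generations // (chunks_per_song * attempts_per_chunk)


if __name__ == "__main__":
    # (generations per month, chunks per 3-minute song) as stated in the table
    plans = {
        "Pro": (800, 4),
        "Ultra ($99.99)": (4000, 2),
        "Ultra ($199.99)": (20000, 2),
    }
    for name, (limit, chunks) in plans.items():
        best = songs_per_month(limit, chunks)
        realistic = songs_per_month(limit, chunks, attempts_per_chunk=3)
        print(f"{name}: up to {best} songs, about {realistic} at 3 attempts per chunk")
```

With three attempts per chunk (my first song took 12 generations for four chunks), Pro still leaves room for roughly 66 songs a month.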

My honest recommendation for song creators: Pro is the minimum viable plan. Plus is useless for anything beyond 60‑second demos. Ultra is only worth it if you’re generating hours of music daily and every second of render time costs you money — or if you absolutely need 180‑second generations to avoid stitching. For everyone else, Pro at $19.99/month is the correct answer.

The Human Polish You Cannot Skip (Even on Pro)

I know I said Pro needed minimal edits. That’s true most of the time. But three issues keep appearing across generations, and ignoring them will make your song sound amateur.

1. Ghost lyrics (the AI adds words you never wrote)
In my rock test, the AI added a line: “The diesel’s crying on the track.” I didn’t write that. It wasn’t in my prompt. The vocalist just sang it. Fix: Listen to every generation with lyrics in front of you. If you hear an extra word or phrase, note the timestamp, then add to your next prompt: “At [time], remove the unauthorised lyric ‘[text]’ and replace with silence.” Gemini can regenerate just that section if you ask.

2. Digital clipping on chorus entrances
When the full drums and bass kick in at the chorus, about 30% of my generations had a split‑second of digital distortion (clipping). Fix: Download the MP3 and open it in Audacity. Look at the waveform at the chorus start. If it’s flat at the top, lower the gain by 3 dB on just that section. Takes 30 seconds. If the distortion is still audible afterwards, regenerate the chunk, because turning the level down cannot restore peaks that were already flattened.

3. The outro fades too fast
Gemini loves a 2‑second fade‑out. It feels rushed. Fix: Generate the outro as a separate 15‑second chunk with the instruction: “End with a 6‑second fade, starting at 9 seconds into the outro.” Then stitch. A slow fade sounds professional. A fast fade sounds like a ringtone.
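For the clipping check in issue 2, eyeballing the waveform works, but the same idea can be scripted: look for runs of consecutive samples stuck at full scale. The sketch below assumes the audio has already been decoded into a list of floats between -1.0 and 1.0 (how you decode the MP3 is up to you and not shown); the threshold and minimum run length are arbitrary starting values, not calibrated ones.

```python
from typing import List, Sequence, Tuple


def find_clipped_runs(samples: Sequence[float], threshold: float = 0.999,
                      min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each run of at least
    `min_run` consecutive samples whose absolute value is >= `threshold`."""
    runs = []
    start = None
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i  # a new flat-topped run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # Handle a run that reaches the end of the audio.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples)))
    return runs


if __name__ == "__main__":
    demo = [0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.3, -1.0, 0.2]
    print(find_clipped_runs(demo))
```

A single sample at full scale is ignored on purpose, since one loud peak is not the flat‑topped shape that signals clipping.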

The one thing I never skip: a mono compatibility check.
I play the final MP3 on my phone’s single speaker (which sums stereo to mono). If the vocals disappear or the guitar drops out, the stereo image has phase problems: parts of the left and right channels are out of phase and cancel each other when summed. I regenerate with “ensure mono compatibility” added to the prompt. This saved a track that sounded great on headphones but vanished on a Bluetooth speaker.
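The phone‑speaker test can be backed up with numbers. The sketch below measures how correlated the two channels are and how much level is lost when they are summed to mono. It assumes you already have the left and right channels as equal‑length lists of floats; the function names are hypothetical and the code does not read audio files itself.

```python
import math
from typing import Sequence


def stereo_correlation(left: Sequence[float], right: Sequence[float]) -> float:
    """Normalised correlation of the channels: near +1 means in phase,
    near -1 means out of phase (likely to cancel in mono)."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0


def mono_loss_db(left: Sequence[float], right: Sequence[float]) -> float:
    """Level change in dB of the mono sum (L+R)/2 relative to the
    average power of the two channels. 0 dB means nothing is lost."""
    mono_power = sum(((l + r) / 2) ** 2 for l, r in zip(left, right))
    stereo_power = sum((l * l + r * r) / 2 for l, r in zip(left, right))
    if stereo_power == 0:
        return 0.0  # silence in both channels
    if mono_power == 0:
        return float("-inf")  # complete cancellation
    return 10 * math.log10(mono_power / stereo_power)


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * i / 50) for i in range(500)]
    inverted = [-x for x in tone]
    print(stereo_correlation(tone, tone), mono_loss_db(tone, tone))
    print(stereo_correlation(tone, inverted), mono_loss_db(tone, inverted))
```

A correlation that dips well below zero on the vocal or guitar sections is the numeric version of "it vanished on a Bluetooth speaker".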

Exporting the Final Song (Stitching Made Simple)

After you have your chunks (intro+verse1+chorus1, verse2+chorus2, bridge+final chorus, outro), here’s how to glue them together without losing quality.

My free, dead‑simple stitching workflow:

Download all chunks as MP3 (256 kbps for Pro).
Open Audacity (free, works on Windows/Mac/Linux).
Drag the first chunk onto the timeline.
Drag the second chunk so it overlaps the first by about 1 second (estimate visually).
Select the overlapping region → Effect → Crossfade Tracks.
Repeat for all chunks.
Export as MP3 (256 kbps, joint stereo).

Total time for a 3‑minute song: 3 minutes of stitching.
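For anyone curious what a crossfade actually does to the audio, here is a minimal sketch of a linear crossfade between two chunks. It is an illustration of the general technique, not a description of how Audacity’s Crossfade Tracks effect is implemented, and it works on plain lists of samples rather than MP3 files. At a 44,100 Hz sample rate, a 1‑second overlap would be 44,100 samples.

```python
from typing import List, Sequence


def crossfade(a: Sequence[float], b: Sequence[float], overlap: int) -> List[float]:
    """Join `a` and `b`, fading `a` out and `b` in linearly over `overlap` samples."""
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 0 and the shorter chunk length")
    if overlap == 0:
        return list(a) + list(b)
    head = list(a[:len(a) - overlap])   # part of `a` before the overlap
    tail = list(b[overlap:])            # part of `b` after the overlap
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)     # weight of the incoming chunk, rising towards 1
        mixed.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return head + mixed + tail


if __name__ == "__main__":
    loud = [1.0] * 4
    quiet = [0.0] * 4
    print(crossfade(loud, quiet, overlap=2))
```

The result is shorter than the two inputs combined by exactly the overlap, which is why each seam trims the stitched song slightly.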

What if you want a single file without manual stitching? Ultra’s 180‑second generations reduce the number of chunks from four to two. But you still need to stitch two files. There’s no “generate full 3 minutes” button on any tier as of June 2026.

One pro tip: Export each chunk with a 1‑second silence at the end. It makes crossfading cleaner. Add “end with 1 second of silence” to your prompt.

The Real Cost: AI Producer vs. Human Session Musician (New York, 2026)

Let’s price out a 3‑minute original song with vocals, full arrangement (guitar, bass, drums, keys), and professional mixing.

Option 1: Hire a session musician / producer in New York
- Producer to write arrangement and record: $150 – $300 per hour (minimum 3 hours) = $450 – $900
- Session vocalist (union rate): $250 – $400 for 3 hours
- Mixing engineer: $200 – $500 per track
- Mastering: $100 – $200
- Total: $1,000 – $2,000 per song
Option 2: Hire a remote producer (SoundBetter, Fiverr Pro)
- Full track production with vocals: $300 – $800
- Total: $300 – $800
Option 3: Gemini Pro (my method)
- Subscription: $19.99/month
- My time: about 20 minutes of prompting and tweaking, plus a few minutes of stitching
- Cost per song (if I make 10 songs a month): $2.00

Which is cheaper, more efficient, and better?
Cheapest: Gemini by a landslide.
Most efficient: Gemini. A human producer takes 3–10 days and requires back‑and‑forth.
Better (quality): A top‑tier human producer (the kind who charges $1,000+) will still beat Gemini. They can shape a vocal phrase or a drum feel in ways the AI can’t. But for the average musician, podcaster, or content creator? Gemini’s quality on ballads and indie pop is genuinely competitive with $300–$500 human productions.

My honest take: I use Gemini for demos, social media tracks, background music for videos, and any project where “really good” is enough. For my album? I’ll hire a human. But for 90% of real‑world use cases, Gemini is the smart choice.

The Usability Verdict (Specifically for Full Song Generation with Structure)

Using Gemini Plus (not recommended for this task):

Song length limit: 60 seconds ❌
Respects key/BPM instructions? ❌ No
Output quality: 4/10 (muddy mix)
Overall: 2/10 — Frustrating. Don't bother.

Using Gemini Pro:

90‑second chunks (4 chunks for 3 min): ✅ Manageable
Respects structure prompts: 9/10
Vocal naturalness: 9/10
Instrument realism: 8/10 (acoustic sounds slightly synthetic)
Mix quality: 8/10
Overall: 8.5/10 — Reliable, fast, and genuinely useful.

Using Gemini Ultra:

180‑second chunks (2 chunks for 3 min): ✅ Better
Slightly higher audio quality (320 kbps, 10‑bit): 9/10
Speed: 10/10
Overall: 9/10 — Excellent but overpriced for most.

Final rating for this specific task: 8.5/10 with Pro plan.

That’s a “yes” from me. The stitching workaround is minor. The quality on ballads and indie pop is shockingly good. Just avoid hip hop and rock solos, and you’ll be happy.

Troubleshooting (Real Answers for Real Problems)

Gemini added a bridge I didn’t ask for, and it ruined the flow. How do I remove it?

You can’t delete a section after generation. But you can regenerate with “NO BRIDGE, skip directly from second chorus to outro.” I also add a structure diagram to my prompt now: [Intro] → [Verse 1] → [Chorus] → [Verse 2] → [Chorus] → [Outro] (no bridge). The AI follows diagrams better than paragraphs.
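The diagram format is regular enough to generate from a list of section names, which helps keep it identical across chunks. A tiny sketch; `structure_diagram` is a made‑up helper, and the output is just text to paste into the prompt.

```python
from typing import Iterable, Sequence


def structure_diagram(sections: Sequence[str], excluded: Iterable[str] = ()) -> str:
    """Render sections as "[A] → [B] → ..." plus an optional "(no X)" note."""
    diagram = " → ".join(f"[{name}]" for name in sections)
    excluded = list(excluded)
    if excluded:
        diagram += " (" + ", ".join(f"no {name}" for name in excluded) + ")"
    return diagram


if __name__ == "__main__":
    order = ["Intro", "Verse 1", "Chorus", "Verse 2", "Chorus", "Outro"]
    print(structure_diagram(order, excluded=["bridge"]))
```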

The vocalist sounds like the same person in every song. Can I change the voice?

Yes, but indirectly. Add “male vocalist, early 20s, tenor range” or “female vocalist, alto, smoky tone.” Lyria 3 Pro has about 8 distinct voice models under the hood. You can’t upload your own voice (that’s Lyria Voice, a separate product), but you can cycle through different vocal descriptions until you find one you like.

Why did Gemini refuse to generate my song about a specific brand or real person?

Content policy. I tried “a song about waiting for a delayed Delta flight at LaGuardia.” The AI rejected it because Delta is a trademarked brand. Replace with “airline” or “flight” and it works. Similarly, “song about Taylor Swift” was rejected. “Song about a famous pop star” was accepted but sounded terrible.

Can I use my Gemini song on Spotify or sell it?

Yes, with a catch. Google’s terms (as of June 2026) grant you full ownership of generated content on the Pro and Ultra tiers. On Free and Plus, Google retains a license. But Spotify’s AI‑generated music policy requires you to disclose AI involvement. I’ve uploaded three Gemini songs to DistroKid. No issues yet, but I mark “AI‑assisted” in the metadata.

The guitar sounds fake. How do I make it real?

You can’t — it’s a model, not a recording. But you can mask it. Add “add subtle room reverb and a tiny amount of vinyl crackle to the whole track.” The imperfections mask the synthetic sheen. I tested this side‑by‑side: the dry version sounded like a keyboard. The reverbed version sounded like a bedroom recording.

Your Song Is Waiting — Go Make It (Then Tell Me What Broke)

You now have everything: the prompt blueprint, the tier truth, the stitching method, and the manual fixes that turn “good for AI” into “good for real people.”

The song I made — the one about the 1:08 AM train — has been played 4,000 times on my SoundCloud. People ask who the singer is. They don’t believe me when I say it’s an AI. That’s the moment I realised this stuff isn’t a toy anymore.

Now I want to hear from you.

Did you try the ballad prompt? Share a link below (even a rough cut).
Did Gemini refuse your genre? Tell me what you typed — I’ll help you rephrase.
Did you find a way to make rock solos work? I’m desperate for that one.

Drop a comment. I’ll listen to your track and reply with one specific thing you can improve. Let’s prove that you don’t need a studio to make music anymore — just a good prompt and a little patience.
