LuvAI Journal · Apr 02, 2026 · 4 min read

Notes from building seven prompt generators

What I learned shipping a Midjourney generator, then a Flux generator, then five more — and why each one was harder than the last.

By Sakuya · An independent studio in Taipei, Taiwan

Between February and April this year I shipped seven prompt generators on PromptCraft — small in-browser tools that compose a working prompt for a specific AI model from a set of structured inputs. Midjourney first. Then Flux. Then Stable Diffusion. Then Suno. Then a chat-model generator (ChatGPT / Claude / Gemini). Then video (Sora / Veo / Kling / Runway / Pika). Then Ideogram, which specializes in text-in-image.

Seven is more than I expected to build. I want to write down what I learned, partly so I remember and partly because the lessons turned out to be different from what I assumed going in.

The first one is the easiest. This sounds backwards. Surely the first one is hardest, because you're inventing the pattern? In my experience, no. The first generator is the easiest because you don't yet know how much variation lives between models, so you build for *the one model*. You make assumptions that work. You get something shipped quickly. The pain comes later.

The second one is where the real work starts. When I sat down to build the Flux generator after the Midjourney one, my first instinct was: just copy the Midjourney code, swap a few labels, done. That instinct was wrong. Flux's prompt grammar is meaningfully different — it weights natural-language phrasing more, it doesn't use Midjourney's parameter syntax, its negative-prompt behavior is different. By the time I'd ported the code I'd basically rewritten it. And in the rewriting I noticed all the assumptions baked into the Midjourney generator that *weren't actually about Midjourney* — they were just the first way I'd done it. That's where the real abstraction work happens. Not at generator one. At generator two.

By the time you've built three or four, the abstraction starts to fight you. Around the Stable Diffusion generator I had a moment where I tried to consolidate the previous three into a shared schema, and the schema kept growing optional fields to accommodate model-specific features. After about an hour I deleted the consolidation work and went back to letting each generator be its own component with its own template file. The abstraction was making the code worse, not better. Generators look similar from the outside, but the structure of *what they let the user choose* is genuinely model-specific. Forcing them through a shared shape was costing more than it was saving.
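To make the trade-off concrete, here is a minimal sketch of the two shapes, not the actual PromptCraft code. Every type, field, and function name below is hypothetical. The only model-specific syntax it leans on is Midjourney's `--ar` and `--stylize` parameters, and the Flux template is just a stand-in for "plain natural-language phrasing".

```typescript
// Shape A: one shared schema. Each model-specific feature adds another
// optional field, and the template logic has to guard every one of them.
interface SharedInput {
  subject: string;
  style?: string;
  aspectRatio?: string; // not every model takes this as a prompt parameter
  stylize?: number;     // Midjourney only
}

function composeFromShared(model: "midjourney" | "flux", input: SharedInput): string {
  if (model === "midjourney") {
    const parts: string[] = [input.subject];
    if (input.style !== undefined) parts.push(input.style);
    let prompt = parts.join(", ");
    if (input.aspectRatio !== undefined) prompt += ` --ar ${input.aspectRatio}`;
    if (input.stylize !== undefined) prompt += ` --stylize ${input.stylize}`;
    return prompt;
  }
  // The Flux branch silently ignores fields that mean nothing to it.
  return input.style !== undefined
    ? `${input.subject}, in a ${input.style} style`
    : input.subject;
}

// Shape B: each generator owns its input type and its template.
// Nothing is optional, and nothing leaks across models.
interface MidjourneyInput {
  subject: string;
  style: string;
  aspectRatio: string;
  stylize: number;
}

interface FluxInput {
  subject: string;
  style: string;
}

function composeMidjourney(input: MidjourneyInput): string {
  return `${input.subject}, ${input.style} --ar ${input.aspectRatio} --stylize ${input.stylize}`;
}

function composeFlux(input: FluxInput): string {
  return `${input.subject}, in a ${input.style} style`;
}
```

With two models, Shape A already needs a branch per model and a guard per field. Shape B repeats a little code but keeps each template readable on its own, which is roughly the trade described above.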

Music is harder than image. This surprised me. I assumed the music generator (Suno) would be simpler than the image generators because the input space is smaller — fewer "style" axes, no aspect ratio, etc. Wrong. Music has duration, structure (verse / chorus), tempo, key, instrumentation, mood, genre, vocal style, and lyrical content, and most of these interact in ways the image models don't have analogues for. The Suno generator ended up being the most complex of the seven. It took twice as long as I estimated.

Video is mostly aspect-ratio plumbing. This was the inverse surprise. I expected video to be the hardest — more dimensions than image, motion to manage, etc. — and it turned out to be mostly about defaults and aspect ratios. Video models share more structure than they don't. A single generator works for Sora, Veo, Kling, Runway, and Pika with model-specific defaults swapped in.
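A rough sketch of what "one generator, model-specific defaults swapped in" can look like. The names are hypothetical, the default values are placeholders chosen for illustration rather than the models' documented settings, and the duration field is my own assumption about what such defaults might include.

```typescript
type VideoModel = "sora" | "veo" | "kling" | "runway" | "pika";

interface VideoDefaults {
  aspectRatio: string;
  durationSeconds: number;
}

// Placeholder values only. Real defaults would come from each model's docs.
const VIDEO_DEFAULTS: Record<VideoModel, VideoDefaults> = {
  sora: { aspectRatio: "16:9", durationSeconds: 10 },
  veo: { aspectRatio: "16:9", durationSeconds: 8 },
  kling: { aspectRatio: "16:9", durationSeconds: 5 },
  runway: { aspectRatio: "16:9", durationSeconds: 5 },
  pika: { aspectRatio: "9:16", durationSeconds: 5 },
};

interface VideoInput {
  subject: string;
  motion: string;
  aspectRatio?: string;     // overrides the model default when set
  durationSeconds?: number; // overrides the model default when set
}

// One template for all five models; only the defaults table varies.
function composeVideoPrompt(model: VideoModel, input: VideoInput): string {
  const defaults = VIDEO_DEFAULTS[model];
  const aspectRatio = input.aspectRatio ?? defaults.aspectRatio;
  const duration = input.durationSeconds ?? defaults.durationSeconds;
  return `${input.subject}, ${input.motion}. Aspect ratio ${aspectRatio}, about ${duration} seconds.`;
}
```

Here a shared shape works because the optional fields mean the same thing for every model. That is the difference from the image generators, where the optional fields were model-specific features.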

Text-in-image is its own genre. Ideogram is the seventh generator and it is genuinely different. Most image models are bad at rendering text reliably; Ideogram's whole point is that it's good at it. So the generator's primary input isn't the subject — it's the text content, which then gets wrapped in style + layout choices around it. Building it forced me to question the input ordering pattern I'd reused six times. That was useful. It exposed an assumption I hadn't realized I was making.

The cumulative lesson is: don't trust your first generator's structure to be the right structure. Build a couple, look at them side by side, then decide what (if anything) is actually shared. Most of the apparent commonality is superficial. The deep commonality is in the *flow* — input, preview, copy, run again — not in the *fields*.
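One way to express "shared flow, unshared fields" in code is a sketch like the following. All names are hypothetical and the templates are illustrative only: the flow is generic over the input type, while each generator keeps its own fields and its own ordering (text first for the Ideogram-style one, subject first for the other).

```typescript
// The only thing every generator has in common: it turns its own
// input type into a prompt string.
interface PromptGenerator<TInput> {
  name: string;
  compose(input: TInput): string;
}

// Text-first input, as in the Ideogram case.
interface TextInImageInput {
  text: string;
  layout: string;
  style: string;
}

// Subject-first input, the ordering reused by the earlier generators.
interface SubjectFirstInput {
  subject: string;
  style: string;
}

const textInImage: PromptGenerator<TextInImageInput> = {
  name: "Text-in-image",
  compose: (input) => `${input.layout} with the text "${input.text}", ${input.style} style`,
};

const subjectFirst: PromptGenerator<SubjectFirstInput> = {
  name: "Subject-first",
  compose: (input) => `${input.subject}, ${input.style} style`,
};

// The shared flow: compose a preview, hand it to a copy function, and
// return it so the caller can render it. Running again is just calling
// this again with edited input.
function previewAndCopy<TInput>(
  generator: PromptGenerator<TInput>,
  input: TInput,
  copy: (text: string) => void,
): string {
  const preview = generator.compose(input);
  copy(preview);
  return preview;
}
```

In a browser, the `copy` argument could be something like `(text) => navigator.clipboard.writeText(text)`. Passing it in keeps the flow testable without a clipboard.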

Seven generators in two months is a faster pace than I'll sustain. The eighth will probably take longer, because more of the underlying complexity has now surfaced. But the pace was useful while it lasted, because it let me see the pattern across multiple models before I overcommitted to any one architecture. If I'd built one generator and spent six months polishing it, I'd be in worse shape than I am now.

Sometimes the right move is shipping seven imperfect things in two months instead of one perfect thing in six. Especially when you're trying to figure out what perfect even means.