LuvAI Journal · May 13, 2026 · 5 min read

A tiny tool for an annoying chore

On shipping a free LRC subtitle generator because the API got cheap enough that the problem became embarrassing to leave unsolved. Plus a Chinese-language version below.

By Sakuya · An independent studio in Taipei, Taiwan

There's a particular kind of busywork I've watched friends do over the years that has always seemed like a category error: sitting in front of a music file, pausing every two seconds, typing a timestamp into a text file, and pressing play again. Karaoke makers do it. Indie filmmakers do it for music-heavy scenes. Language-learning content creators do it. Anyone making a TikTok with synchronized captions for a song does it. The work is mechanical, error-prone, and roughly thirty minutes per three-minute song if you're slow about it. It's also a problem that *was* unsolvable in 2018 and *is* trivially solvable in 2026, and the gap between those two facts is wider than most people seem to have noticed.

The category error is this: humans should not be the ones aligning text to audio when the audio already says the text and a machine can listen. The only reason it stayed a manual job for so long is that automatic transcription was unreliable enough, and word-level timestamps were rare enough, that the workflow never quite worked. You'd get a transcript that disagreed with the actual lyrics, or you'd get timestamps without granularity, or you'd get a working system locked inside a $30/month app that asked for your credit card before showing you whether it actually did the thing.

OpenAI's Whisper, the original 2022 release, was the first version of this technology that I'd describe as good enough. The newer transcription models are even better, though counterintuitively the original Whisper is still the only one that returns word-level timestamps in a format usable for forced alignment. Pricing is now $0.006 a minute. A three-and-a-half minute pop song costs about two cents to transcribe. The arithmetic on this finally tipped over: a tool that does the boring thirty minutes of work, in twenty seconds, for less than the cost of a single SMS message, is a tool that should exist.
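The two-cent figure is easy to check. Here is a minimal sketch of the arithmetic, assuming billing is simply proportional to audio duration; the function name is made up for illustration and the rate is the $0.006 per minute quoted above.

```python
# Rate quoted in this post; check current pricing before relying on it.
WHISPER_USD_PER_MINUTE = 0.006


def transcription_cost_usd(duration_seconds: float) -> float:
    """Estimate the transcription cost in US dollars for one audio file.

    Assumes cost scales linearly with duration (hypothetical helper).
    """
    return duration_seconds / 60 * WHISPER_USD_PER_MINUTE


# A three-and-a-half minute song is 210 seconds.
cost = transcription_cost_usd(3.5 * 60)
print(f"${cost:.3f}")  # $0.021
```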

So I built it. It lives on PromptCraft at [/lrc-sync-generator](https://prompt.luvai.net/lrc-sync-generator). You upload an MP3, paste the lyrics you already have, click a button, and a few seconds later there's an LRC file with timestamps. The hard part isn't the transcription; it's the alignment, because Whisper hears what's in the audio and your lyrics are what you typed, and the two don't always match. So there's a small algorithm: take the first four characters of each lyric line, scan Whisper's word stream linearly looking for them, and take the timestamp at that position. When it can't find a match, it falls back to the first two characters; when it can't find those either, it leaves the line untimestamped and asks the human to fix it. Most of the time, on Chinese pop or Western pop, it gets 90%+ of lines right on the first pass.
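For the curious, the prefix-matching idea can be sketched in a few lines of Python. This is not the production code, just an illustration under stated assumptions: the word stream is a list of `{"word", "start"}` dicts (a made-up shape for this example), punctuation and case are ignored when matching, and the scan only moves forward so timestamps stay in song order. The names `normalize`, `align_lyrics`, and `to_lrc` are hypothetical.

```python
def normalize(text):
    """Lowercase and keep only letters/digits so 'Hello,' matches 'hello'."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def align_lyrics(lines, words, prefix_len=4, fallback_len=2):
    """Return (start_seconds_or_None, line) for every lyric line."""
    # Flatten the word stream into one string, remembering which word
    # each character came from so a match can be mapped to a timestamp.
    chars, owner = [], []
    for index, word in enumerate(words):
        for ch in normalize(word["word"]):
            chars.append(ch)
            owner.append(index)
    stream = "".join(chars)

    aligned = []
    cursor = 0  # assumption: never search backwards in the song
    for line in lines:
        key = normalize(line)
        start = None
        for n in (prefix_len, fallback_len):
            prefix = key[:n]
            if not prefix:
                break
            pos = stream.find(prefix, cursor)  # linear scan from the cursor
            if pos != -1:
                # Use the start time of the word containing the first match.
                start = words[owner[pos]]["start"]
                cursor = pos + len(prefix)
                break
        aligned.append((start, line))  # None means "left for the human"
    return aligned


def to_lrc(aligned):
    """Format aligned lines as LRC text: [mm:ss.xx]lyric."""
    out = []
    for start, line in aligned:
        if start is None:
            out.append(line)  # untimed line, to be fixed by hand
            continue
        centis = round(start * 100)
        minutes, rest = divmod(centis, 6000)
        seconds, hundredths = divmod(rest, 100)
        out.append(f"[{minutes:02d}:{seconds:02d}.{hundredths:02d}]{line}")
    return "\n".join(out)


# Made-up data: the transcript mishears "Mourning" as "Moaning".
words = [
    {"word": "Morning", "start": 1.0}, {"word": "light", "start": 1.4},
    {"word": "on", "start": 1.8}, {"word": "the", "start": 2.0},
    {"word": "water", "start": 2.2}, {"word": "Moaning", "start": 5.25},
    {"word": "doves", "start": 5.8}, {"word": "are", "start": 6.2},
    {"word": "calling", "start": 6.5},
]
lines = ["Morning light on the water", "Mourning doves are calling",
         "Zzz never sung"]
print(to_lrc(align_lyrics(lines, words)))
```

In this example the first line matches on four characters, the second only matches after falling back to two ("mo"), and the third finds nothing and comes out without a timestamp. A two-character fallback can obviously hit the wrong word in a repetitive song, which is part of why untimed or suspicious lines go back to a human.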

There are three other modes in the same tool because once I had the audio infrastructure built, the marginal cost of adding them was zero. A manual mode where you tap the spacebar while the song plays, for when the AI gets a tricky song wrong. A format converter between LRC and SRT (because karaoke uses LRC and video subtitles use SRT and nobody had a clean local-only converter that did both vertical and horizontal Chinese). A reverse extractor that strips timestamps out of either format, for when someone hands you a karaoke file and you just want the plain lyrics. None of these three need to call an API. They're all string manipulation that runs in the browser. Free, no auth, no servers.
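To give a sense of how little is involved in the converter and the extractor, here is a rough Python sketch of LRC to SRT conversion and timestamp stripping. The real tool runs in the browser, so this is an illustration rather than its code. Assumptions worth flagging: LRC carries only start times, so each cue is taken to end where the next begins; the last cue gets an arbitrary 3-second duration; and vertical versus horizontal Chinese layout is not handled. All function names are invented for the example.

```python
import re

# [mm:ss], [mm:ss.xx] or [mm:ss.xxx] time tags used by LRC.
LRC_TAG = re.compile(r"\[(\d+):(\d{1,2})(?:[.:](\d{1,3}))?\]")
# Metadata tags such as [ar:Artist] or [ti:Title].
LRC_META = re.compile(r"^\[[A-Za-z]+:[^\]]*\]$")
# SRT timing line: 00:00:01,000 --> 00:00:05,250
SRT_TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def parse_lrc(text):
    """Return (start_ms, lyric) pairs sorted by time; untimed lines are skipped."""
    cues = []
    for raw in text.splitlines():
        lyric = LRC_TAG.sub("", raw).strip()
        if not lyric:
            continue
        for tag in LRC_TAG.finditer(raw):  # one lyric may carry several tags
            fraction = (tag.group(3) or "0").ljust(3, "0")  # "25" -> 250 ms
            seconds = int(tag.group(1)) * 60 + int(tag.group(2))
            cues.append((seconds * 1000 + int(fraction), lyric))
    return sorted(cues)


def ms_to_srt(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def lrc_to_srt(text, last_cue_ms=3000):
    """Convert LRC text to SRT; each cue ends when the next one starts."""
    cues = parse_lrc(text)
    blocks = []
    for i, (start, lyric) in enumerate(cues):
        end = cues[i + 1][0] if i + 1 < len(cues) else start + last_cue_ms
        blocks.append(f"{i + 1}\n{ms_to_srt(start)} --> {ms_to_srt(end)}\n{lyric}\n")
    return "\n".join(blocks)


def strip_timestamps(text):
    """Return plain lyrics from LRC or SRT text.

    Limitation: a lyric line made only of digits is mistaken for an SRT index.
    """
    lyrics = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.isdigit():
            continue
        if SRT_TIMING.match(line) or LRC_META.match(line):
            continue
        line = LRC_TAG.sub("", line).strip()
        if line:
            lyrics.append(line)
    return "\n".join(lyrics)


lrc = "[ar:Demo]\n[00:01.00]First line\n[00:05.25]Second line\n"
srt = lrc_to_srt(lrc)
print(srt)
print(strip_timestamps(srt))
```

Going the other way, from SRT to LRC, is the same idea in reverse: keep each cue's start time and drop the end time.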

The AI mode is gated to logged-in members at five credits a call, because the Whisper bill is real and I don't want it growing without bound. New signups get thirty credits on arrival, which buys you six songs. That seems like enough to find out whether the tool is useful to you. If you need more, the credit packages on the other LuvAI surface (stock.luvai.net) work here too — same accounts, same wallet, cross-domain by design.

There's nothing particularly clever about this tool. It's a thin algorithm on top of an existing API. What I want to record, because I think it's the thing worth recording, is that the *decision* to build it was based on a calculation I now find myself making more and more often: this thing used to be hard, and it isn't anymore, and the gap between "still hard" and "trivially easy" is filled with chores that real people are still doing manually because nobody got around to writing the trivial version. I think that's where most of the useful indie-AI-tool work is sitting right now. Not the moonshots. The boring chores that finally became automatable.

The other thing worth saying is that the architecture this tool sits on — the credits, the auth, the cross-domain wallet — was the work. Whisper integration was a couple of hours. The infrastructure that lets a brand-new subdomain plug into LuvAI's existing user system, charge against the existing point bucket, and stay synchronized with the existing admin tooling: that was months. This is the part the casual observer misses about indie software. The visible feature is the tip of an iceberg of less-visible plumbing. Every new tool is cheap because the plumbing was expensive once.

If you have lyrics-with-audio sitting somewhere on your hard drive that you've been meaning to align, give it a try. It's free if you've already got a LuvAI account; if not, the signup credits cover six songs. And if you do something interesting with the output, send it. I love hearing what tools get used for; it's how I know which ones to build next.

---

## Chinese version

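There is a kind of work I have watched friends do for years and have always felt people should not be doing: sit in front of an MP3 file, listen for two seconds, hit pause, type a timestamp into a text file, keep listening. Karaoke subtitle makers do it, people making lip-sync TikTok videos do it, and people producing language-learning materials do it too. Timing a three-and-a-half-minute song by hand takes about thirty minutes. This work could not be automated in 2018, yet in 2026 it can be solved at almost zero cost, and many people have not yet noticed the gap between the two.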

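The essence of the problem is this: when the audio itself already "says" the text and a machine can understand what it hears, humans should not keep matching the two by hand. It stayed unsolved until now only because automatic recognition used to be too inaccurate, word-level timestamps were too coarse, and the tools that did work were stuck behind a US$30-a-month subscription wall.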

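OpenAI's Whisper (the original, released in 2022) is the first version I would call "good enough". It now costs US$0.006 per minute, so transcribing a three-and-a-half-minute pop song costs about NT$0.7. The arithmetic finally tipped over: a tool that squeezes thirty minutes of tedious work into twenty seconds, for less than the cost of one text message, is a tool that should exist.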

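So I built it. It lives on the PromptCraft subsite at [/lrc-sync-generator](https://prompt.luvai.net/lrc-sync-generator). The flow: upload an MP3, paste the lyrics you already have, click generate, and a few seconds later you get an LRC file with timestamps. The hard part is not the transcription but the alignment: what Whisper hears does not always match the lyrics you pasted, so I wrote a small algorithm. It takes the first 4 characters of each lyric line, does a linear scan through Whisper's word stream to find them, and takes the timestamp at that position. If the 4 characters are not found, it falls back to 2; if those are not found either, the line gets no timestamp and a human fills it in. On average, both Chinese pop and Western songs align above 90% on the first run.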

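The same tool has three more modes, because once the audio infrastructure was built the marginal cost of adding them was zero: a manual mode (tap the spacebar to mark the times yourself, for when the AI gets it wrong), a two-way LRC ↔ SRT format converter (karaoke uses LRC, video subtitles use SRT, Chinese additionally distinguishes vertical from horizontal text, and there was no clean front-end-only tool on the market), and a reverse extractor (strips the timestamps from LRC/SRT subtitles to get plain text). These three modes never call an API, run as pure in-browser JavaScript, are free, and need no login.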

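AI mode is members-only and costs 5 points per call. The Whisper bill is real and I do not want it to be uncapped. New users get 30 points on signup, enough to run 6 songs. If you find yourself using it often, the point top-up packages on stock.luvai.net are shared: same account, same wallet, cross-subsite by design.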

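There is nothing remarkable about this tool; it is a thin layer of algorithm on top of an existing API. What I want to record is that the "decision" to build it rests on a calculation I have been making more and more often lately: "this used to be hard, now it is not, and the gap in between is full of people still doing it by hand." I think most of the useful work in indie AI tools right now sits in that zone. Not moonshots, but the boring chores that have finally become automatable.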

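The other thing I want to say: the infrastructure this tool sits on (the points system, login, the cross-subsite wallet) is the real work. The Whisper integration took a few hours. Getting a brand-new subsite to plug into LuvAI's existing user system, charge against the existing points, and stay in sync with the existing admin tools took months. This is what outsiders often miss when they look at indie software: the visible feature is only the tip of the iceberg, and the plumbing underneath is where the big money went. Each additional tool looks cheap because the expensive one-time cost of the plumbing has already been paid.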

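If you have lyrics and audio sitting separately on your hard drive and have been too lazy to time them, give it a try. It is free if you already have a LuvAI account; if not, the signup points are enough to time 6 songs. If you make something fun with the result, email me and tell me about it. Seeing how people use the tools is how I know what to build next.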