Overview
Audio2PDF converts audio recordings or live speech into structured, formatted PDF documents. It detects natural breakpoints, infers headings, and organises content into readable sections — turning raw transcription into a polished output.
The Inspiration
The idea came from sitting in back-to-back meetings and watching valuable spoken insights evaporate because nobody had time to take proper notes. I wanted a tool that could listen, understand structure, and produce a shareable document without any manual cleanup.
Tech Stack
OpenAI Whisper
Chosen for its strong transcription accuracy, especially on technical vocabulary and accented speech.
React
Component-driven UI made the multi-step upload → transcribe → review → export flow easy to reason about.
Node.js + Express
Lightweight API layer to orchestrate Whisper calls, text processing, and PDF assembly without introducing unnecessary complexity.
PDFKit
Programmatic PDF generation with fine-grained control over layout, fonts, and structure.
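The upload → transcribe → review → export flow mentioned under React can be modelled as a small state machine. The sketch below is only an illustration, not the project's actual code: the state shapes and action names (`FILE_SELECTED`, `TRANSCRIPT_READY`, and so on) are hypothetical.

```typescript
// Hypothetical state shapes: each step carries only the data it needs.
type FlowState =
  | { step: "upload" }
  | { step: "transcribe"; fileName: string }
  | { step: "review"; fileName: string; transcript: string }
  | { step: "export"; fileName: string; transcript: string };

// Hypothetical actions that move the flow forward or reset it.
type FlowAction =
  | { type: "FILE_SELECTED"; fileName: string }
  | { type: "TRANSCRIPT_READY"; transcript: string }
  | { type: "REVIEW_APPROVED" }
  | { type: "RESET" };

// Pure reducer: an action that does not fit the current step is ignored.
function flowReducer(state: FlowState, action: FlowAction): FlowState {
  switch (action.type) {
    case "FILE_SELECTED":
      return state.step === "upload"
        ? { step: "transcribe", fileName: action.fileName }
        : state;
    case "TRANSCRIPT_READY":
      return state.step === "transcribe"
        ? { step: "review", fileName: state.fileName, transcript: action.transcript }
        : state;
    case "REVIEW_APPROVED":
      return state.step === "review"
        ? { step: "export", fileName: state.fileName, transcript: state.transcript }
        : state;
    case "RESET":
      return { step: "upload" };
  }
}
```

In a React component, a reducer like this could be passed to `useReducer(flowReducer, { step: "upload" })`, so each screen renders from `state.step` and out-of-order events cannot skip a step.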
Challenges & Solutions
Long audio files caused timeout errors in serverless functions before transcription completed.
Implemented chunked audio splitting at silence boundaries, processing each chunk independently and merging transcripts with overlap-deduplication logic.
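A minimal sketch of the two ideas in that fix, under stated assumptions: the audio is already decoded to mono samples in the range [-1, 1], and each chunk's transcript is a plain string. The function names, the amplitude threshold, and the word-level matching are illustrative choices, not the project's actual implementation.

```typescript
// Find split points (sample indices) in the middle of sufficiently long silent runs.
// Silence at the very start or end of the recording is ignored.
function findSilenceSplitPoints(
  samples: ArrayLike<number>,
  sampleRate: number,
  silenceThreshold = 0.01, // assumed amplitude below which a sample counts as silent
  minSilenceSeconds = 0.5, // assumed minimum pause length worth splitting on
): number[] {
  const minRun = Math.floor(minSilenceSeconds * sampleRate);
  const splits: number[] = [];
  let runStart = -1; // index where the current silent run began, or -1 if none
  for (let i = 0; i < samples.length; i++) {
    if (Math.abs(samples[i]) < silenceThreshold) {
      if (runStart === -1) runStart = i;
    } else if (runStart !== -1) {
      if (runStart > 0 && i - runStart >= minRun) {
        splits.push(Math.floor((runStart + i) / 2));
      }
      runStart = -1;
    }
  }
  return splits;
}

// Compare words case-insensitively and without common punctuation.
function normalizeWord(word: string): string {
  return word.toLowerCase().replace(/[.,!?;:"']/g, "");
}

// Merge chunk transcripts, dropping words repeated across a chunk boundary:
// the longest run where the end of the merged text equals the start of the next chunk.
function mergeWithOverlapDedup(chunks: string[], maxOverlapWords = 50): string {
  const merged: string[] = [];
  for (const chunk of chunks) {
    const words = chunk.trim().split(/\s+/).filter(Boolean);
    const limit = Math.min(maxOverlapWords, merged.length, words.length);
    let overlap = 0;
    for (let k = limit; k > 0; k--) {
      const tail = merged.slice(merged.length - k);
      if (tail.every((w, j) => normalizeWord(w) === normalizeWord(words[j]))) {
        overlap = k;
        break;
      }
    }
    merged.push(...words.slice(overlap));
  }
  return merged.join(" ");
}
```

Exact word matching is the simplest form of deduplication. It can miss an overlap when the two chunks transcribe the same audio slightly differently, and it can wrongly drop a single word that legitimately repeats at a boundary, so a production version would likely want fuzzy matching or a minimum overlap length.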
Raw transcriptions lacked structure — no headings, no paragraphs, just a wall of text.
Added a post-processing pass using GPT-4 to infer document structure from semantic cues in the transcript, injecting markdown formatting before PDF render.
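As a rough sketch of the last step, the markdown produced by the structuring pass could be parsed into typed blocks and then rendered. The code below assumes only a small markdown subset (`#` headings and blank-line-separated paragraphs). `PdfSink` is a hypothetical interface covering just the chained `fontSize`, `text`, and `moveDown` calls that a PDFKit document offers; the font sizes are arbitrary.

```typescript
type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "paragraph"; text: string };

// Parse headings (#, ##, ...) and paragraphs separated by blank lines.
function parseBlocks(markdown: string): Block[] {
  const blocks: Block[] = [];
  let paragraph: string[] = [];
  const flush = (): void => {
    if (paragraph.length > 0) {
      blocks.push({ kind: "paragraph", text: paragraph.join(" ") });
      paragraph = [];
    }
  };
  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (heading) {
      flush();
      blocks.push({ kind: "heading", level: heading[1].length, text: heading[2] });
    } else if (line === "") {
      flush();
    } else {
      paragraph.push(line); // consecutive lines belong to the same paragraph
    }
  }
  flush();
  return blocks;
}

// Hypothetical minimal surface of a PDF document; a PDFKit document is
// expected to fit it, but that is an assumption of this sketch.
interface PdfSink {
  fontSize(size: number): PdfSink;
  text(content: string): PdfSink;
  moveDown(lines?: number): PdfSink;
}

// Larger type for shallower headings, body size for paragraphs.
function renderBlocks(doc: PdfSink, blocks: Block[]): void {
  for (const block of blocks) {
    const isHeading = block.kind === "heading";
    const size = block.kind === "heading" ? Math.max(12, 26 - block.level * 4) : 11;
    doc.fontSize(size).text(block.text).moveDown(isHeading ? 0.5 : 1);
  }
}
```

Keeping the parser separate from the renderer means the structure can be checked, or shown in a review step, before any PDF is produced.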
What I Learned
- 01: Audio chunking at silence boundaries dramatically improves both accuracy and reliability over fixed-size splits.
- 02: LLMs are excellent post-processors for unstructured text when given a clear output schema.
- 03: File upload UX needs careful progress communication; users abandon flows when they feel stuck.
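To illustrate the second lesson, one way to give a model a "clear output schema" is to describe a JSON shape in the prompt and validate the reply before using it. The write-up does not specify the schema, so the `StructuredDoc` shape, the prompt wording, and the function names below are hypothetical.

```typescript
interface Section {
  heading: string;
  paragraphs: string[];
}

interface StructuredDoc {
  title: string;
  sections: Section[];
}

// Illustrative prompt fragment describing the expected reply shape.
const SCHEMA_INSTRUCTIONS =
  'Return only JSON of the form {"title": string, ' +
  '"sections": [{"heading": string, "paragraphs": string[]}]}.';

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parse and validate a model reply. Throws on malformed JSON or a wrong shape,
// so a bad reply can be retried instead of reaching the PDF renderer.
function parseStructuredDoc(raw: string): StructuredDoc {
  const data: unknown = JSON.parse(raw);
  if (!isRecord(data)) throw new Error("Reply is not a JSON object");
  const title = data.title;
  const rawSections = data.sections;
  if (typeof title !== "string" || !Array.isArray(rawSections)) {
    throw new Error("Reply is missing a string title or a sections array");
  }
  const sections = rawSections.map((s: unknown, i: number): Section => {
    if (!isRecord(s)) throw new Error("Section " + i + " is not an object");
    const heading = s.heading;
    const paragraphs = s.paragraphs;
    if (
      typeof heading !== "string" ||
      !Array.isArray(paragraphs) ||
      !paragraphs.every((p) => typeof p === "string")
    ) {
      throw new Error("Section " + i + " has an invalid heading or paragraphs");
    }
    return { heading, paragraphs: paragraphs as string[] };
  });
  return { title, sections };
}
```

Validating at this boundary keeps the rest of the pipeline typed: anything downstream can rely on `StructuredDoc` without re-checking the model's output.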
