2024 · Live

Audio2PDF

Transform spoken words into structured documents instantly

React · Whisper API · Node.js · PDF Generation

Overview

Audio2PDF converts audio recordings or live speech into structured, formatted PDF documents. It detects natural breakpoints, infers headings, and organises content into readable sections — turning raw transcription into a polished output.

The Inspiration

I kept sitting in back-to-back meetings, watching valuable spoken insights evaporate because nobody had time to take proper notes. I wanted a tool that could listen, understand structure, and produce a shareable document without any manual cleanup.

Tech Stack

OpenAI Whisper

Chosen for its transcription accuracy, especially on technical vocabulary and accented speech.

React

Component-driven UI made the multi-step upload → transcribe → review → export flow easy to reason about.

Node.js + Express

Lightweight API layer to orchestrate Whisper calls, text processing, and PDF assembly without introducing unnecessary complexity.

PDFKit

Programmatic PDF generation with fine-grained control over layout, fonts, and structure.

Challenges & Solutions

The Problem

Long audio files caused timeout errors in serverless functions before transcription completed.

The Solution

Split the audio into chunks at silence boundaries, transcribed each chunk independently, and merged the transcripts with overlap-deduplication logic.
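
The merge step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes adjacent chunks share a short stretch of audio, so the last words of one transcript reappear at the start of the next. The names `mergeTranscripts`, `overlapLength`, and the `maxOverlap` parameter are hypothetical.

```typescript
// Strip common punctuation and case so "World." and "world" compare equal.
function normalise(word: string): string {
  return word.toLowerCase().replace(/[.,!?;:"()]/g, "");
}

// Length of the longest run of words that ends `prev` and also starts `next`.
function overlapLength(prev: string[], next: string[], maxOverlap: number): number {
  const limit = Math.min(maxOverlap, prev.length, next.length);
  // Try the longest candidate first so the largest repeated run is removed.
  for (let size = limit; size > 0; size--) {
    let matches = true;
    for (let i = 0; i < size; i++) {
      if (normalise(prev[prev.length - size + i]) !== normalise(next[i])) {
        matches = false;
        break;
      }
    }
    if (matches) return size;
  }
  return 0;
}

// Join chunk transcripts in order, skipping words already emitted by the
// previous chunk. `maxOverlap` (an assumed tuning knob) caps the search.
function mergeTranscripts(chunks: string[], maxOverlap = 20): string {
  const merged: string[] = [];
  for (const chunk of chunks) {
    const words = chunk.trim().split(/\s+/).filter((w) => w.length > 0);
    const skip = overlapLength(merged, words, maxOverlap);
    merged.push(...words.slice(skip));
  }
  return merged.join(" ");
}
```

Exact word matching is the simplest possible rule. If the transcription model renders the overlapping audio slightly differently in each chunk, a fuzzy or timestamp-based comparison would likely be needed instead.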

The Problem

Raw transcriptions lacked structure — no headings, no paragraphs, just a wall of text.

The Solution

Added a post-processing pass that uses GPT-4 to infer document structure from semantic cues in the transcript and inject markdown formatting before the PDF is rendered.
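
One way to bridge the markdown output and the PDF renderer is to parse the markdown into a small list of typed blocks first. The sketch below assumes the model returns only `#`-style headings and plain paragraphs; the `Block` type and `parseMarkdown` function are illustrative names, not part of the original project.

```typescript
// Hypothetical block model for the structured transcript.
type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "paragraph"; text: string };

function parseMarkdown(markdown: string): Block[] {
  const blocks: Block[] = [];
  let paragraph: string[] = [];

  // Emit the buffered paragraph lines as a single block.
  const flush = (): void => {
    if (paragraph.length > 0) {
      blocks.push({ kind: "paragraph", text: paragraph.join(" ") });
      paragraph = [];
    }
  };

  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    // One to six leading '#' characters followed by whitespace mark a heading.
    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (heading) {
      flush();
      blocks.push({ kind: "heading", level: heading[1].length, text: heading[2] });
    } else if (line === "") {
      // A blank line ends the current paragraph.
      flush();
    } else {
      paragraph.push(line);
    }
  }
  flush();
  return blocks;
}
```

Each block could then map to a render call, for example a larger `fontSize` followed by `text` in PDFKit for headings. Keeping the parse step separate also makes it possible to validate the model's output before any PDF is produced.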

What I Learned

01
Audio chunking at silence boundaries dramatically improves both accuracy and reliability over fixed-size splits.
02
LLMs are excellent post-processors for unstructured text when given a clear output schema.
03
File upload UX needs careful progress communication — users abandon flows when they feel stuck.
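
To make the first lesson concrete, here is a rough sketch of how a cut point might be moved from a fixed offset to the nearest quiet frame. It assumes the audio has already been reduced to one loudness value per frame (for example RMS energy); `chooseSplitFrames`, `targetFrames`, and `searchWindow` are hypothetical names, and the real pipeline may detect silence differently.

```typescript
// Choose cut points for long audio: near each target chunk length, cut at the
// quietest frame within a search window instead of at the exact fixed offset.
function chooseSplitFrames(
  frameEnergies: number[],
  targetFrames: number,
  searchWindow: number
): number[] {
  if (targetFrames < 1 || searchWindow < 0) {
    throw new Error("targetFrames must be >= 1 and searchWindow >= 0");
  }
  const cuts: number[] = [];
  let start = 0;
  // Keep cutting while the remaining audio is longer than one target chunk.
  while (frameEnergies.length - start > targetFrames) {
    const ideal = start + targetFrames; // where a fixed-size split would cut
    const from = Math.max(start + 1, ideal - searchWindow);
    const to = Math.min(frameEnergies.length - 1, ideal + searchWindow);
    let best = from;
    for (let i = from; i <= to; i++) {
      if (frameEnergies[i] < frameEnergies[best]) best = i;
    }
    cuts.push(best);
    start = best; // the next chunk begins at the chosen cut
  }
  return cuts;
}
```

A fixed-size split is the special case `searchWindow = 0`. Widening the window trades even chunk lengths for a lower chance of cutting through a word.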
