Speech to Text
AI · Runs in browser
Transcribe audio to text for free: upload MP3, WAV, or WebM, or record from your microphone. Powered by Whisper, fully on-device and private.
Last updated 01 Apr 2026
Transcribes audio using Whisper Tiny, an AI speech recognition model running entirely in your browser. Supports MP3, WAV, WebM, M4A, OGG, and FLAC up to 100 MB, or record live from your microphone. Outputs plain text plus timestamped SRT subtitles. Auto-detects language. Your audio never leaves your device.
How to use
1. Choose input mode
Select Upload File to transcribe a saved audio file, or Microphone to record and transcribe live speech directly from your browser.
2. Provide your audio
In file mode, drag and drop or click to upload an MP3, WAV, WebM, M4A, OGG, or FLAC file up to 100 MB. In microphone mode, click Record, speak clearly, then click Stop.
3. Wait for transcription
On first use, the Whisper Tiny model (~40 MB) downloads and is cached, so later transcriptions skip the download and start right away. Progress updates as the model processes each 30-second audio chunk.
4. Use your transcript
Copy the plain text, download a .txt file, or export a .srt subtitle file with timestamps. The detected language is shown above the transcript.
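The 30-second chunking mentioned in step 3 can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names `splitIntoChunks` and `progressPercent` are hypothetical, the 16 kHz sample rate is taken from the description further down the page, and the sketch assumes plain back-to-back chunks with no overlap.

```typescript
// Assumed values: 16 kHz mono input and 30-second windows, as described on this page.
const SAMPLE_RATE = 16000;
const CHUNK_SECONDS = 30;

interface AudioChunk {
  index: number;         // position of the chunk in the recording
  startSeconds: number;  // offset of the chunk from the start of the audio
  samples: Float32Array; // view onto the original samples (no copy)
}

// Split mono PCM samples into consecutive chunks of at most 30 seconds.
// Hypothetical helper: a real pipeline may overlap chunks or pad the last one.
function splitIntoChunks(
  samples: Float32Array,
  sampleRate: number = SAMPLE_RATE,
  chunkSeconds: number = CHUNK_SECONDS
): AudioChunk[] {
  const chunkLength = sampleRate * chunkSeconds;
  const chunks: AudioChunk[] = [];
  for (let start = 0, index = 0; start < samples.length; start += chunkLength, index++) {
    const end = Math.min(start + chunkLength, samples.length);
    chunks.push({
      index,
      startSeconds: start / sampleRate,
      samples: samples.subarray(start, end),
    });
  }
  return chunks;
}

// Progress as a whole-number percentage of chunks processed so far.
function progressPercent(chunksDone: number, totalChunks: number): number {
  if (totalChunks === 0) return 100;
  return Math.round((chunksDone / totalChunks) * 100);
}
```

Under these assumptions, a 70-second recording yields three chunks (30 s, 30 s, and 10 s), and the progress indicator would advance once per chunk.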
Frequently asked questions
What audio formats are supported?
Is my audio uploaded to a server?
How large is the model download?
How accurate is Whisper Tiny?
Can it detect the language automatically?
What is an SRT file?
How long can the audio be?
Does the microphone mode save my recording?
Does it work on mobile?
How does it compare to Otter.ai?
Professional audio transcription used to mean uploading recordings to a remote
service and waiting. This tool flips that model: your audio is decoded and
transcribed locally using Whisper Tiny — OpenAI's open-source speech recognition
model — running in your browser via WebAssembly. Your recordings and files stay
completely private.
Two input modes: File mode accepts MP3, WAV, WebM, M4A, OGG, and FLAC uploads up
to 100 MB. Microphone mode uses your browser's MediaRecorder API to capture audio
directly — click Record, speak, then Stop to trigger transcription automatically.
Audio is resampled to 16 kHz mono (the format Whisper expects) using the browser's
OfflineAudioContext before being passed to the model.
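To make the "16 kHz mono" step concrete, here is a rough sketch of what that conversion does to the samples. It is not the tool's implementation: the page states that the browser's OfflineAudioContext performs the resampling, while this stand-in uses plain channel averaging and linear interpolation so the arithmetic is visible. The names `downmixToMono` and `resampleLinear` are hypothetical, and linear interpolation without a low-pass filter is lower quality than a browser resampler.

```typescript
// Average all channels into a single mono channel.
// Assumes every channel has the same length, as decoded audio normally does.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (let c = 0; c < channels.length; c++) {
    for (let i = 0; i < length; i++) {
      mono[i] += channels[c][i] / channels.length;
    }
  }
  return mono;
}

// Resample by linear interpolation between neighbouring input samples.
// Illustrative stand-in for the browser's resampler, not a replacement for it.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.round((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

For example, one second of 48 kHz stereo audio passed through `downmixToMono` and then `resampleLinear(mono, 48000, 16000)` comes out as 16,000 mono samples.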
Whisper automatically detects the spoken language from the first 30 seconds of
audio — no need to specify it in advance. Output includes a full plain-text
transcript alongside timestamped segments. Download as .txt or export as .srt
subtitle files for use in video editing software, YouTube captions, or
accessibility workflows.
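The .srt export can be pictured as a simple text transformation of the timestamped segments. The sketch below assumes each segment carries a start time, an end time (both in seconds), and its text; the `Segment` shape and the function names are hypothetical and are not taken from the tool itself. The layout follows the usual SubRip convention: a running number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, and a blank line between entries.

```typescript
// Hypothetical segment shape: times are in seconds from the start of the audio.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a number with zeros to a fixed width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm (comma before milliseconds).
function formatSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build the SRT file body: numbered cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${formatSrtTime(seg.start)} --> ${formatSrtTime(seg.end)}\n${seg.text.trim()}\n`
    )
    .join("\n");
}
```

A segment running from 0 to 2.5 seconds would appear as cue 1 with the time line `00:00:00,000 --> 00:00:02,500`, which is the form video editors and caption uploaders generally expect.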
Competitors like Otter.ai (paid) and Google Docs voice typing require sending your
audio to remote servers. Kordu processes everything on-device after a one-time
~40 MB model download.
Who is this for? Journalists transcribing interviews, video creators generating
captions, meeting participants converting recordings to notes, researchers
transcribing qualitative data, and anyone who values audio privacy.
Related tools
Word Counter
Count words, characters, sentences, and paragraphs with reading time, speaking time, and keyword density.
AI Text Summarizer
Summarize articles, essays, and documents with AI. Choose length, get bullet points, see word-count reduction. Runs on-device.
AI Language Detector
Identify the language of any text with a BERT neural network — accurate on short snippets and closely related languages. Runs on-device.