Speech to Text


Transcribe audio to text for free: upload an MP3, WAV, or WebM file, or record from your microphone. Powered by Whisper, running fully on-device and private.

Last updated 01 Apr 2026

Transcribes audio using Whisper Tiny, an AI speech recognition model running entirely in your browser. Supports MP3, WAV, WebM, M4A, OGG, and FLAC up to 100 MB, or record live from your microphone. Outputs plain text plus timestamped SRT subtitles. Auto-detects language. Your audio never leaves your device.

~40 MB download


How to use

1. Choose input mode

Select Upload File to transcribe a saved audio file, or Microphone to record and transcribe live speech directly from your browser.

2. Provide your audio

In file mode, drag and drop or click to upload an MP3, WAV, WebM, M4A, OGG, or FLAC file up to 100 MB. In microphone mode, click Record, speak clearly, then click Stop.

3. Wait for transcription

On first use, the Whisper Tiny model (~40 MB) downloads and is cached, so later sessions skip the download and start transcribing right away. Progress updates as the model processes each 30-second audio chunk.

4. Use your transcript

Copy the plain text, download a .txt file, or export a .srt subtitle file with timestamps. The detected language is shown above the transcript.

Frequently asked questions

What audio formats are supported?

MP3, WAV, WebM, M4A (AAC), OGG, and FLAC, up to 100 MB per file. Audio is decoded by your browser's built-in AudioContext, so whether a given file decodes depends on the codecs your browser supports natively.
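The format and size limits above can be checked before any decoding starts. The following is a minimal sketch, not this tool's actual code: the names `validateAudioFile`, `SUPPORTED_EXTENSIONS`, and `MAX_FILE_BYTES` are hypothetical, and it assumes "100 MB" means 100 × 1024 × 1024 bytes.

```typescript
// Hypothetical pre-upload check; names and messages are illustrative only.
const SUPPORTED_EXTENSIONS: readonly string[] = ["mp3", "wav", "webm", "m4a", "ogg", "flac"];

// Assumption: the 100 MB limit is interpreted as 100 MiB.
const MAX_FILE_BYTES = 100 * 1024 * 1024;

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateAudioFile(name: string, sizeBytes: number): ValidationResult {
  // Take everything after the last dot as the extension, case-insensitively.
  const dot = name.lastIndexOf(".");
  const ext = dot >= 0 ? name.slice(dot + 1).toLowerCase() : "";

  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    return { ok: false, reason: "Unsupported format: " + (ext ? "." + ext : "(no extension)") };
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return { ok: false, reason: "File is larger than 100 MB" };
  }
  return { ok: true };
}
```

In a browser, the two arguments would come from a `File` object's `name` and `size` properties. An extension check is only a first filter: a file with a supported extension can still fail to decode if the browser lacks the codec.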

Is my audio uploaded to a server?

No. All transcription runs in your browser using WebAssembly. Your audio files and microphone recordings never leave your device.

How large is the model download?

Whisper Tiny is approximately 40 MB. It downloads once and is cached, so later uses skip the download. We use the multilingual variant rather than the English-only model to support language detection.

How accurate is Whisper Tiny?

Whisper Tiny is the smallest Whisper model. It handles clean, clearly spoken audio well in English and dozens of other languages, but accuracy decreases with heavy background noise, strong accents, or very fast speech. For critical transcription work, larger Whisper models (Base, Small) offer better accuracy at the cost of a larger download and slower processing.

Can it detect the language automatically?

Yes. Whisper identifies the spoken language from the first 30 seconds of audio and shows a language badge above the transcript. Manual language override is not required.

What is an SRT file?

SRT (SubRip Text) is one of the most widely supported subtitle formats. Each entry has a sequence number, a start and end timestamp, and the transcript text. SRT files can be uploaded to YouTube, imported into Premiere Pro, Final Cut Pro, and DaVinci Resolve, and loaded by most video players.
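To make the entry layout concrete, here is a small sketch that turns timestamped segments into SRT text. The `Segment` shape and the function names are assumptions made for illustration; they are not taken from this tool's source.

```typescript
// Hypothetical segment shape: times are in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// SRT timestamps use HH:MM:SS,mmm with a comma before the milliseconds.
function formatSrtTimestamp(seconds: number): string {
  // Round to whole milliseconds first so 59.9996 s becomes 00:01:00,000.
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "," + pad(ms, 3);
}

// Each entry: 1-based sequence number, time range, text, then a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      formatSrtTimestamp(seg.start) + " --> " + formatSrtTimestamp(seg.end) + "\n" +
      seg.text.trim() + "\n"
    )
    .join("\n");
}
```

For example, `formatSrtTimestamp(3723.5)` returns `01:02:03,500`, and two segments produce two numbered entries separated by a blank line.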

How long can the audio be?

There is no hard duration limit. Whisper processes audio in 30-second chunks, so longer files take proportionally longer. Files over 30 minutes may take several minutes to transcribe and require sufficient device RAM.
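The 30-second chunking and the memory cost can be estimated with simple arithmetic. The sketch below assumes 16 kHz mono audio held as 32-bit floats and plain non-overlapping windows; a real pipeline may overlap adjacent chunks so words are not cut at the boundary. All names here (`planChunks`, `progressPercent`, `float32Bytes`) are hypothetical.

```typescript
// Assumptions: 16 kHz mono input and fixed, non-overlapping 30-second windows.
const SAMPLE_RATE = 16000;
const CHUNK_SECONDS = 30;

interface Chunk {
  index: number;
  startSample: number; // inclusive
  endSample: number;   // exclusive
}

// Split a recording of totalSamples samples into consecutive windows.
function planChunks(
  totalSamples: number,
  sampleRate: number = SAMPLE_RATE,
  chunkSeconds: number = CHUNK_SECONDS
): Chunk[] {
  const chunkSize = sampleRate * chunkSeconds;
  const chunks: Chunk[] = [];
  for (let start = 0, i = 0; start < totalSamples; start += chunkSize, i++) {
    chunks.push({
      index: i,
      startSample: start,
      endSample: Math.min(start + chunkSize, totalSamples), // last chunk may be short
    });
  }
  return chunks;
}

// Progress as a whole-number percentage of chunks finished.
function progressPercent(chunksDone: number, chunkCount: number): number {
  return chunkCount === 0 ? 100 : Math.round((chunksDone / chunkCount) * 100);
}

// Memory needed to hold the samples as 32-bit floats (4 bytes each).
function float32Bytes(totalSamples: number): number {
  return totalSamples * 4;
}
```

Under these assumptions a 30-minute recording is 1800 × 16000 = 28,800,000 samples, which splits into 60 chunks and occupies 115,200,000 bytes (about 115 MB) as 32-bit floats. The decoded audio before resampling is typically larger, since source files are often 44.1 or 48 kHz stereo.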

Does the microphone mode save my recording?

No. The recording is held in memory only long enough to transcribe it. Once you reset or close the page, it is gone — nothing is stored on disk or sent anywhere.

Does it work on mobile?

Yes. Microphone transcription works in Safari and Chrome on iOS and in Chrome on Android. File upload works on any modern mobile browser. The 40 MB download applies on first use, so Wi-Fi is recommended.

How does it compare to Otter.ai?

Otter.ai is a paid service that uploads your audio to remote servers. Kordu's transcription is free, processes everything on-device, and never uploads your audio — making it better for private or sensitive recordings. Otter.ai offers real-time collaboration and meeting integration that this tool does not.

Professional audio transcription used to mean uploading recordings to a remote service and waiting. This tool flips that model: your audio is decoded and transcribed locally using Whisper Tiny, OpenAI's open-source speech recognition model, running in your browser via WebAssembly. Your recordings and files stay completely private.

Two input modes: File mode accepts MP3, WAV, WebM, M4A, OGG, and FLAC uploads up to 100 MB. Microphone mode uses your browser's MediaRecorder API to capture audio directly: click Record, speak, then Stop to trigger transcription automatically. Audio is resampled to 16 kHz mono (the format Whisper expects) using the browser's OfflineAudioContext before being passed to the model.
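To illustrate what "resampled to 16 kHz mono" means, the sketch below averages the channels and then resamples with linear interpolation. This is a deliberately simplified stand-in that runs outside a browser, not what OfflineAudioContext does internally: the browser's resampler is higher quality, while plain linear interpolation has no low-pass filter and can alias when downsampling. The function names are hypothetical.

```typescript
// Average all channels into one. Assumes every channel has the same length.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (let c = 0; c < channels.length; c++) {
    const ch = channels[c];
    for (let i = 0; i < length; i++) {
      mono[i] += ch[i] / channels.length;
    }
  }
  return mono;
}

// Linear-interpolation resampler (illustrative only, no anti-alias filter).
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}

// Convenience wrapper targeting the 16 kHz mono format Whisper expects.
function toWhisperInput(channels: Float32Array[], sourceRate: number): Float32Array {
  return resampleLinear(downmixToMono(channels), sourceRate, 16000);
}
```

With a 48 kHz source, every third sample is kept, so one second of stereo input (2 × 48,000 samples) becomes 16,000 mono samples.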

Whisper automatically detects the spoken language from the first 30 seconds of audio, so there is no need to specify it in advance. Output includes a full plain-text transcript alongside timestamped segments. Download as .txt or export as .srt subtitle files for use in video editing software, YouTube captions, or accessibility workflows.

Competitors like Otter.ai (paid) and Google Docs voice typing require sending your audio to remote servers. Kordu processes everything on-device after a one-time ~40 MB model download.

Who is this for? Journalists transcribing interviews, video creators generating captions, meeting participants converting recordings to notes, researchers transcribing qualitative data, and anyone who values audio privacy.

