Kordu Tools

AI Image Caption & Alt Text Generator

AI Runs in browser

Generate descriptive captions and accessibility alt text for images using AI — free, no account, runs in your browser.

Last updated 01 Apr 2026

Generate natural-language captions or screen-reader-ready alt text for any image using ViT-GPT2, an AI vision-language model. Caption mode produces a full descriptive sentence. Alt Text mode produces a concise description of 125 characters or fewer, following common alt text guidance, plus a ready-to-paste HTML snippet. Runs in your browser after a one-time 90MB model download. Your images are never uploaded.


How to use

  1. Load the AI model

    Click Download Model to load the 90MB ViT-GPT2 model into your browser. This is a one-time download; the model is cached for future visits.

  2. Choose your output mode

    Select Caption for a full descriptive sentence, or Alt Text for a concise description of 125 characters or fewer, optimized for screen readers.

  3. Upload your image

    Drag and drop or click to upload a PNG, JPG, or WebP image up to 20MB.

  4. Generate and copy

    The AI generates a description automatically. Copy the caption text or use the ready-made HTML snippet for immediate use in your code or CMS.
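The upload limits in step 3 can be expressed as a small client-side check. This is a minimal sketch, not the tool's own code: the function and constant names are hypothetical, and it assumes that "20MB" means 20 × 1024 × 1024 bytes and that JPG files report the MIME type image/jpeg.

```typescript
// Hypothetical client-side check mirroring the stated limits:
// PNG, JPG or WebP, at most 20MB.
// Assumption: 20MB = 20 × 1024 × 1024 bytes (the tool may use 20 × 1000 × 1000).

// Anything with a MIME type and a byte size, e.g. a browser File object.
interface UploadCandidate {
  type: string; // MIME type, e.g. "image/png"
  size: number; // size in bytes
}

const ALLOWED_TYPES: ReadonlySet<string> = new Set([
  "image/png",
  "image/jpeg", // browsers report .jpg and .jpeg files as image/jpeg
  "image/webp",
]);

const MAX_BYTES = 20 * 1024 * 1024;

// Returns null when the file is acceptable, otherwise a human-readable reason.
function validateUpload(file: UploadCandidate): string | null {
  if (!ALLOWED_TYPES.has(file.type)) {
    return `Unsupported file type: ${file.type || "unknown"}`;
  }
  if (file.size > MAX_BYTES) {
    return `File is larger than 20MB (${file.size} bytes)`;
  }
  return null;
}
```

Because the check only reads `type` and `size`, a `File` from a drop or file-input event can be passed to it directly.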

Frequently asked questions

Why does this tool require a 90MB download?
The ViT-GPT2 model runs locally in your browser for privacy, so your images are never sent to a server. The 90MB model downloads once and is cached, so later visits skip the download.
What is alt text and why does it matter?
Alt text (alternative text) is a text description supplied in the alt attribute of an HTML img element. Screen readers read it aloud to blind and visually impaired users, and search engines use it to understand image content and index pages correctly. WCAG 2.2 (Success Criterion 1.1.1, Non-text Content) requires a meaningful text alternative for every informative image.
How accurate are the generated captions?
ViT-GPT2 performs well on common subjects — people, animals, outdoor scenes, and everyday objects. Abstract art, data charts, technical diagrams, screenshots of text, and highly stylized images may produce less accurate descriptions.
What is the alt text character limit?
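WCAG sets no character limit. Around 125 characters is a widely used guideline, often traced to older screen reader behaviour; modern screen readers generally read the full alt text, but short descriptions are quicker to listen to. Alt Text mode constrains output to this length and strips filler phrases like 'image of' for maximum information density.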
Are my images private?
Yes. All processing runs locally in your browser using WebAssembly. Your images are never uploaded to any server or seen by anyone else.
Can I generate multiple caption options?
Yes. Click Generate again to produce an alternative caption for the same image — the model uses some sampling randomness in text generation.
Does this help with SEO?
Yes. Search engines like Google use alt text to understand image content and match images to relevant queries. Descriptive alt text that naturally includes relevant keywords can improve image search visibility; stuffing it with keywords does not.
What languages does the caption output in?
ViT-GPT2 was trained on English data and generates captions in English. For other languages, translate the output using a separate translation tool.
Does this tool work for e-commerce product images?
Yes. It generates descriptive captions for product photos, which give you a starting point when writing alt text for large product catalogues.
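The variation you get from clicking Generate again comes from sampling during text generation. The sketch below shows generic temperature sampling over a toy set of logits. It assumes the decoder samples rather than always taking the top token; it illustrates the technique and is not the tool's actual decoding code, and all names are hypothetical.

```typescript
// Generic temperature sampling over one decoding step's logits (toy values).
// Illustration only; not the tool's decoding code.

// Convert logits to probabilities; a lower temperature sharpens the distribution.
function softmax(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Draw one token index. `rand` returns a number in [0, 1);
// injecting it makes the behaviour reproducible in tests.
function sampleToken(
  logits: number[],
  temperature: number,
  rand: () => number = Math.random,
): number {
  const probs = softmax(logits, temperature);
  let r = rand();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r < 0) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

// Greedy decoding always picks the highest logit, so it never varies between runs.
function greedyToken(logits: number[]): number {
  return logits.indexOf(Math.max(...logits));
}
```

With logits `[2.0, 1.5, 0.1]` at temperature 1, token 0 has a probability of about 0.57, so a random draw of 0.7 selects token 1 while `greedyToken` always returns 0. Repeating such choices word by word is how two runs on the same image can end in different captions.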

AI image captioning converts a photo into a descriptive sentence, a task that matters for accessibility, SEO, and content production. Screen readers rely on alt text to describe images to visually impaired users, search engines index alt attributes to understand page content, and CMS workflows need accurate descriptions for photo libraries and social posts. WCAG 2.2 requires meaningful alt text on every informative image, yet many images on the web still have empty or generic alt attributes.
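To find which of your own images have empty or generic alt attributes, a small audit pass can classify each alt value. This is a hypothetical sketch: it assumes the src and alt values were already extracted (for example from the DOM), and the generic-word list and filename check are illustrative heuristics, not part of WCAG. An empty alt is correct for purely decorative images, so apply it only to informative ones.

```typescript
// Hypothetical alt-attribute audit for images that are meant to be informative.
// Heuristics are illustrative; alt="" is valid for decorative images.

type AltStatus = "missing" | "empty" | "generic" | "ok";

interface ImageRecord {
  src: string;
  alt: string | null; // null = no alt attribute at all
}

// Single words that say nothing about the image content.
const GENERIC_ALTS: ReadonlySet<string> = new Set([
  "image", "img", "photo", "picture", "graphic", "logo", "icon",
]);

function classifyAlt(img: ImageRecord): AltStatus {
  if (img.alt === null) return "missing";
  const alt = img.alt.trim().toLowerCase();
  if (alt === "") return "empty";
  // Treat a bare generic word or a filename such as "IMG_1234.jpg" as generic.
  if (GENERIC_ALTS.has(alt) || /\.(png|jpe?g|webp|gif|svg)$/.test(alt)) {
    return "generic";
  }
  return "ok";
}

// Group image sources by status so the problem images can be reviewed together.
function auditImages(images: ImageRecord[]): Map<AltStatus, string[]> {
  const report = new Map<AltStatus, string[]>();
  for (const img of images) {
    const status = classifyAlt(img);
    const list = report.get(status) ?? [];
    list.push(img.src);
    report.set(status, list);
  }
  return report;
}
```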

This tool uses ViT-GPT2, a vision-language transformer that combines a Vision Transformer (ViT) image encoder with a GPT-2 text decoder. Upload any photo and get a fluent English description of the scene. Switch between two modes: Caption mode for full descriptive sentences, and Alt Text mode for concise, screen-reader-optimized descriptions capped at 125 characters, with a ready-to-paste HTML img snippet.
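The Alt Text mode post-processing described above can be sketched as three small functions: one strips leading filler such as "a picture of" and caps the text at 125 characters on a word boundary, and the other two build the img snippet with the alt value HTML-escaped. The filler pattern, the truncation rule and the function names are assumptions for illustration, not the tool's implementation.

```typescript
// Hypothetical Alt Text post-processing: strip filler, cap length, build an <img> snippet.
// Patterns and names are illustrative assumptions.

const MAX_ALT_LENGTH = 125;

// Leading phrases that add nothing for a screen reader user,
// e.g. "a picture of", "an image of", "photograph of".
const FILLER_PREFIX = /^(an?\s+)?(image|picture|photo(graph)?)\s+of\s+/i;

function toAltText(caption: string, maxLength: number = MAX_ALT_LENGTH): string {
  // Normalize whitespace, then drop the filler prefix.
  let text = caption.trim().replace(/\s+/g, " ").replace(FILLER_PREFIX, "");
  if (text.length > maxLength) {
    // Cut at the last space that fits so no word is split in half.
    const cut = text.lastIndexOf(" ", maxLength);
    text = text.slice(0, cut > 0 ? cut : maxLength);
  }
  // Capitalize the first letter.
  return text.charAt(0).toUpperCase() + text.slice(1);
}

// Escape characters that could break out of a double-quoted HTML attribute.
// "&" must be replaced first so the other entities are not double-escaped.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function imgSnippet(src: string, alt: string): string {
  return `<img src="${escapeAttribute(src)}" alt="${escapeAttribute(alt)}">`;
}
```

For example, `imgSnippet("dog.jpg", toAltText("a picture of a dog sitting on a couch"))` produces `<img src="dog.jpg" alt="A dog sitting on a couch">`. Escaping the attribute matters because generated captions can contain quotes or ampersands that would otherwise corrupt the pasted HTML.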

Perfect for web developers and content teams doing accessibility audits, bloggers captioning stock photos, e-commerce teams writing product image descriptions, and anyone improving image SEO. The 90MB model downloads once and is cached in your browser, so later sessions skip the download. Your images are never uploaded to any server.
