HTML to Markdown
Convert HTML to clean Markdown — paste HTML source and get properly formatted .md output with headings, links, lists, and code blocks.
How to use HTML to Markdown
- Paste your HTML. Paste HTML source into the input panel. It can be a full page, a fragment copied from view-source, a rendered email, a CMS export, or the output of a scraper. Pasting works for both escaped and unescaped markup.
- Choose a heading style. Pick ATX (`# Heading`) for maximum compatibility across GitHub, Obsidian, and every modern static site generator, or Setext (underline) if your toolchain specifically requires it. ATX is the safe default.
- Pick a link style. Inline `[text](url)` keeps URLs next to their anchor text and is best for short content. Reference `[text][1]` moves URLs into a numbered bibliography at the bottom, which is preferable when the same long URL appears many times.
- Select a bullet character. Choose `-`, `*`, or `+` for unordered list markers. All three are valid CommonMark; match whatever convention your existing repository or style guide already uses so diffs stay clean.
- Pick a code block style. Fenced (triple backticks) is the GFM standard and works in almost every renderer. Indented (four-space) is strict original Markdown. Use fenced unless you specifically need to target a legacy renderer.
- Convert to Markdown. Click Convert to Markdown. Output appears instantly in the right-hand panel with a live line count. The conversion runs in your browser, and no network request is made.
- Review the output for edge cases. Skim the Markdown for custom elements, inline styles, or iframes that were stripped. Re-add any language hints on fenced code blocks (for example `ts` or `python` after the opening fence) that the conversion may have dropped.
- Copy or save the result. Click the copy button to place the Markdown on your clipboard, or select and copy manually. Paste into Obsidian, a `.md` file in your repo, a Notion page, or wherever you need portable formatted text.
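To make the difference between the two link styles concrete, here is a minimal TypeScript sketch that renders the same links both ways. The `Link` type and both function names are hypothetical, made up for illustration; this is not the tool's or Turndown's code. Reusing one number for a repeated URL is an assumption of the sketch.

```typescript
// Hypothetical types and helpers, for illustration only.
interface Link {
  text: string;
  url: string;
}

// Inline style: each URL sits right next to its anchor text.
function renderInline(links: Link[]): string {
  return links.map((l) => `[${l.text}](${l.url})`).join(" and ");
}

// Reference style: anchors carry a number, URLs move to a list at the bottom.
// Assumption: a repeated URL reuses the number it was first given.
function renderReference(links: Link[]): string {
  const urls: string[] = [];
  const body = links
    .map((l) => {
      let index = urls.indexOf(l.url);
      if (index === -1) {
        urls.push(l.url);
        index = urls.length - 1;
      }
      return `[${l.text}][${index + 1}]`;
    })
    .join(" and ");
  const refs = urls.map((url, i) => `[${i + 1}]: ${url}`);
  return `${body}\n\n${refs.join("\n")}`;
}

const links: Link[] = [
  { text: "the spec", url: "https://example.com/a/very/long/path" },
  { text: "the same spec", url: "https://example.com/a/very/long/path" },
];

const inlineOutput = renderInline(links);
const referenceOutput = renderReference(links);
```

With reference style the long URL appears once at the bottom instead of twice in the sentence, which is the readability gain described in the link-style step.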
HTML to Markdown FAQ
Is my HTML sent to a server?
Does it handle tables?
What Markdown flavor does it produce?
How are classes, IDs, and inline styles treated?
Can it convert an entire webpage from a URL?
Does it handle MathJax, LaTeX, or KaTeX content?
Are code-block language hints preserved?
Does it remove `<script>`, `<style>`, and `<iframe>`?
Can I customize which HTML tags are converted?
How does it handle `<br>` line breaks?
Will the Markdown round-trip back to HTML accurately?
Does it work with malformed or invalid HTML?
Is it suitable for large documents?
Why use Turndown instead of a hand-rolled converter?
Background
The Kordu HTML to Markdown converter transforms raw HTML into clean,
portable Markdown in a single click. Paste a full page, a component
fragment, a rendered email, a scraped article, or the output of
any CMS, and get properly formatted .md that renders cleanly in
GitHub, GitLab, Bitbucket, Obsidian, Notion, VS Code preview, Hugo, Jekyll,
Astro, Next.js, Docusaurus, MkDocs, and other Markdown-aware
platforms. No signup, no upload, no byte limit: your HTML never leaves
your browser.
Why convert HTML to Markdown
Markdown is the lingua franca of modern documentation, static site
generators, note-taking apps, and LLM prompt engineering. HTML is how
the open web stores and transmits formatted content. Teams constantly
need to move between the two: migrating a WordPress blog to Jekyll or
Hugo, importing a legacy Confluence or SharePoint knowledge base into
Notion or Obsidian, extracting the article body from a web page for an
LLM context window, converting Mailchimp or Stripo email templates into
newsletter Markdown, turning API reference pages into .md files that
ship next to code, or pulling a rendered React component's HTML back
into a .mdx snippet. Doing this by hand is tedious and error-prone —
headings, links, nested lists, and tables all have different syntax in
Markdown, and manual rewrites lose formatting on every pass.
How it works
The converter is powered by Turndown, the mature, battle-tested HTML-to-Markdown library used by Obsidian's web clipper, Notion's import tools, Ghost's content importer, and countless developer utilities. Your HTML is first parsed into a DOM tree by the browser's own HTML parser; Turndown then walks the tree node by node and applies rule-based rewriting to produce idiomatic Markdown. Because it operates on a real DOM rather than on raw text, nested structures, unclosed tags, and the quirky output that CMSs and rich-text editors generate are normalized before conversion starts. These are the same edge cases that trip up naive regex-based converters.
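The idea of walking a tree and applying one rule per element can be sketched in a few lines of TypeScript. This is a simplified illustration, not Turndown's implementation: the `HtmlNode` shape, the `rules` map, and `convert` are hypothetical names, and the tree is built by hand because the sketch does not assume a DOM parser is available.

```typescript
// Hypothetical, simplified node shape; a real DOM node carries far more.
type HtmlNode =
  | { kind: "text"; value: string }
  | { kind: "element"; tag: string; attrs: Record<string, string>; children: HtmlNode[] };

// A rule turns an element's already-converted content into Markdown.
type Rule = (content: string, attrs: Record<string, string>) => string;

const rules = new Map<string, Rule>([
  ["h1", (content) => `# ${content}\n\n`],
  ["p", (content) => `${content}\n\n`],
  ["strong", (content) => `**${content}**`],
  ["em", (content) => `_${content}_`],
  ["a", (content, attrs) => `[${content}](${attrs["href"] || ""})`],
]);

// Depth-first walk: convert the children first, then apply the parent's rule.
function convert(node: HtmlNode): string {
  if (node.kind === "text") return node.value;
  const content = node.children.map(convert).join("");
  const rule = rules.get(node.tag);
  // Elements without a rule fall back to their converted content.
  return rule ? rule(content, node.attrs) : content;
}

// Hand-built tree for:
// <div><h1>Hello <em>world</em></h1><p>See <a href="https://example.com">docs</a>.</p></div>
const tree: HtmlNode = {
  kind: "element", tag: "div", attrs: {}, children: [
    { kind: "element", tag: "h1", attrs: {}, children: [
      { kind: "text", value: "Hello " },
      { kind: "element", tag: "em", attrs: {}, children: [{ kind: "text", value: "world" }] },
    ] },
    { kind: "element", tag: "p", attrs: {}, children: [
      { kind: "text", value: "See " },
      { kind: "element", tag: "a", attrs: { href: "https://example.com" }, children: [{ kind: "text", value: "docs" }] },
      { kind: "text", value: "." },
    ] },
  ],
};

const markdown = convert(tree).trim();
```

Because each rule only sees the converted content of its own subtree, nesting depth does not matter to the rules, which is what a regex over raw text cannot offer.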
Supported HTML elements
- Headings `h1` through `h6`, rendered as ATX (`# Heading`) or Setext (underline) depending on your setting
- Paragraphs, `<br>` line breaks, and soft-wrap reflow
- Inline formatting: `<strong>`, `<b>`, `<em>`, `<i>`, `<code>`, `<del>`, `<s>` for strikethrough
- Links (`<a>`) in either inline `[text](url)` or reference `[text][1]` style with full reference keys
- Images (`<img>`) with alt text and src preserved as `![alt](src)`
- Unordered and ordered lists, including deeply nested mixed lists
- Code blocks (`<pre><code>`) as GFM fenced blocks with triple backticks, or as indented four-space blocks
- Inline code (`<code>`) wrapped with single backticks
- Blockquotes (`<blockquote>`) prefixed with `>` on every wrapped line
- Horizontal rules (`<hr>`) rendered as `---`
- Tables (`<table>`) converted to GitHub-Flavored Markdown pipe tables with alignment separator rows
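For readers unfamiliar with GFM pipe tables, the following TypeScript sketch shows what an alignment separator row looks like. The `Align` type and `toPipeTable` function are hypothetical and written for illustration; how the tool itself emits tables is not shown in this document.

```typescript
// Hypothetical helper, for illustration only.
type Align = "left" | "center" | "right" | "none";

function toPipeTable(header: string[], aligns: Align[], rows: string[][]): string {
  // GFM marks column alignment with colons in the separator row.
  const separator: Record<Align, string> = {
    left: ":---",
    center: ":---:",
    right: "---:",
    none: "---",
  };
  // A literal pipe inside a cell must be escaped or it starts a new column.
  const escapeCell = (cell: string): string => cell.replace(/\|/g, "\\|");
  const line = (cells: string[]): string => `| ${cells.join(" | ")} |`;

  return [
    line(header.map(escapeCell)),
    line(aligns.map((a) => separator[a])),
    ...rows.map((r) => line(r.map(escapeCell))),
  ].join("\n");
}

const table = toPipeTable(
  ["Name", "Price"],
  ["left", "right"],
  [
    ["Tea", "3"],
    ["A|B", "10"],
  ],
);
```

The second line of the result is the separator row: `:---` left-aligns the first column and `---:` right-aligns the second.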
Configurable output
The tool exposes the four Turndown options most likely to matter when you're feeding the output into a specific downstream system:
- Heading style: ATX (`# Heading`) for maximum compatibility across every Markdown flavor, or Setext (`Heading\n======`) if your toolchain is one of the few that prefers it. ATX is the safe default.
- Link style: Inline `[text](url)` for short content, or reference `[text][1]` with a numbered bibliography at the bottom when you have repeated long URLs that would make paragraphs unreadable.
- Bullet character: `-`, `*`, or `+`. CommonMark accepts all three; pick whichever matches your repo's existing convention to keep diffs tidy.
- Code block style: Fenced triple-backtick blocks (GFM, Obsidian, most static site generators) or indented four-space blocks (strict original Markdown).
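As a rough sketch of how four UI choices like these could map onto Turndown, the TypeScript below builds an options object. The option names and values (`headingStyle`, `linkStyle`, `bulletListMarker`, `codeBlockStyle`) follow Turndown's documented options; the `UiChoices` shape and the `toTurndownOptions` helper are hypothetical and not taken from the tool.

```typescript
// Mirrors four of Turndown's documented options (names and values as in its README).
interface TurndownLikeOptions {
  headingStyle: "atx" | "setext";
  linkStyle: "inlined" | "referenced";
  bulletListMarker: "-" | "*" | "+";
  codeBlockStyle: "fenced" | "indented";
}

// Hypothetical shape of the four dropdowns in the UI.
interface UiChoices {
  heading: "ATX" | "Setext";
  link: "Inline" | "Reference";
  bullet: "-" | "*" | "+";
  codeBlock: "Fenced" | "Indented";
}

// Hypothetical helper: translate UI labels into library option values.
function toTurndownOptions(ui: UiChoices): TurndownLikeOptions {
  return {
    headingStyle: ui.heading === "ATX" ? "atx" : "setext",
    linkStyle: ui.link === "Inline" ? "inlined" : "referenced",
    bulletListMarker: ui.bullet,
    codeBlockStyle: ui.codeBlock === "Fenced" ? "fenced" : "indented",
  };
}

const options = toTurndownOptions({
  heading: "ATX",
  link: "Reference",
  bullet: "-",
  codeBlock: "Fenced",
});
```

An object like `options` is what would be passed to `new TurndownService(options)` before calling `.turndown(html)`. Turndown's own defaults are Setext headings, `*` bullets, indented code blocks, and inlined links, so ATX and fenced output have to be requested explicitly.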
Real-world use cases
- CMS migrations: Moving a WordPress, Ghost, Drupal, or Contentful blog into a static site generator that expects `.md` files. Export the HTML, run it through the converter, and commit the Markdown files.
- Knowledge-base imports: Pulling a Confluence, SharePoint, or HelpScout knowledge base into Obsidian, Notion, Logseq, or a Git-backed docs site. Most exporters dump HTML; this turns it into something human-editable.
- LLM context preparation: Feeding a web page into Claude, GPT, or Gemini. Markdown is far more token-efficient than HTML because the angle brackets, class names, and inline styles vanish, leaving only semantic structure.
- Email archiving: Converting rendered HTML newsletters and transactional emails into searchable Markdown notes for an internal wiki or customer-communications archive.
- Documentation round-trips: Pulling rendered API docs, release notes, or blog posts back into the Markdown source files that generated them, so they can be edited and re-published.
- MDX authoring: Turning a designer's HTML mockup or a Figma-exported snippet into `.mdx` content that ships in an Astro or Next.js site.
Common pitfalls and edge cases
- Inline styles and classes are discarded. Markdown has no representation for `style="color: red"` or `class="callout"`, so they're stripped. If styling matters, keep the original HTML or use MDX with custom components.
- Custom elements and Web Components are ignored. Turndown only knows the standard HTML vocabulary; `<my-tab>`, `<mat-card>`, or framework-specific wrappers fall back to their inner text.
- `<iframe>`, `<script>`, and `<style>` content is dropped. This is intentional: embeds have no Markdown equivalent and scripts should never survive a content conversion.
- `<img srcset>` is not preserved. Markdown images support a single source URL, so responsive-image metadata is lost. If you need art direction, keep the HTML or switch to MDX with a custom `<Image>` component.
- Deeply nested lists may reflow. Markdown is stricter than HTML about indentation, and some authors rely on HTML-only visual nesting that doesn't survive the round trip.
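- Syntax-highlighting hints can be lost. Stock Turndown copies a `language-*` class from the `<code>` element inside a `<pre>` onto the fence, so `<code class="language-ts">` should survive; hints stored any other way (a class on the `<pre>`, a data attribute, a highlighter-specific class name) produce a plain fenced block, and you may need to add the language back by hand or in batch with a regex.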
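One way to add a missing language hint back in batch is a small line-by-line pass rather than a single regex, so that closing fences are left alone. The TypeScript below is a minimal sketch under stated assumptions: `addLanguageHint` is a hypothetical helper, fences are exactly three backticks at the start of a line, and every bare opening fence should get the same language.

```typescript
// Three backticks, built at runtime so this example can sit inside a fenced block.
const FENCE = "`".repeat(3);

// Hypothetical helper: append `lang` to every opening fence that has no info string.
// Assumption: fences start at column 0 and use exactly three backticks.
function addLanguageHint(markdown: string, lang: string): string {
  let insideFence = false;
  return markdown
    .split("\n")
    .map((line) => {
      if (line.indexOf(FENCE) !== 0) return line; // not a fence line
      if (insideFence) {
        insideFence = false; // closing fence: leave untouched
        return line;
      }
      insideFence = true; // opening fence
      return line === FENCE ? FENCE + lang : line; // keep existing hints
    })
    .join("\n");
}

const before = [
  "Intro",
  FENCE,
  "const x = 1;",
  FENCE,
  "",
  FENCE + "python",
  "print(1)",
  FENCE,
].join("\n");

const after = addLanguageHint(before, "ts");
```

Only the first block gains a `ts` hint here; the block already marked `python` and both closing fences pass through unchanged.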
Markdown flavors and compatibility
The output is CommonMark-compatible by default, with GitHub-Flavored Markdown extensions for tables and strikethrough — the same superset used by GitHub, GitLab, Bitbucket, Obsidian, VS Code, and the majority of static site generators. It pastes cleanly into Notion, which parses Markdown on import, and into MDX files, provided you escape any literal JSX-like brackets that survive the conversion. If your target is strict original Markdown (no fenced code, no tables), switch to indented code blocks and convert tables manually.
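Escaping for MDX can be sketched as a small post-processing pass. The TypeScript below is an illustration, not part of the tool: `escapeForMdx` is a hypothetical helper, and it assumes that only `{` and `<` need a backslash in MDX prose, that fences are three backticks at the start of a line, and that the input contains no brackets that are already escaped.

```typescript
// Three backticks, built at runtime so this example can sit inside a fenced block.
const MDX_FENCE = "`".repeat(3);

// Hypothetical helper: backslash-escape `{` and `<` in prose,
// leaving fenced code blocks and inline code spans untouched.
function escapeForMdx(markdown: string): string {
  let insideFence = false;
  return markdown
    .split("\n")
    .map((line) => {
      if (line.indexOf(MDX_FENCE) === 0) {
        insideFence = !insideFence; // fence lines toggle code-block state
        return line;
      }
      if (insideFence) return line; // code block content stays as-is
      // Splitting on a capturing group keeps inline code spans at odd indexes.
      return line
        .split(/(`[^`]*`)/)
        .map((part, i) => (i % 2 === 1 ? part : part.replace(/[{<]/g, "\\$&")))
        .join("");
    })
    .join("\n");
}

const mdxInput = [
  "Use {curly} and a < b, but not `{code}`",
  MDX_FENCE,
  "if (a < b) {}",
  MDX_FENCE,
].join("\n");

const mdxOutput = escapeForMdx(mdxInput);
```

Code is skipped on purpose: MDX treats fenced blocks and inline code as literal text, so escaping there would put stray backslashes into the rendered output.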
How this compares to alternatives
Manual rewriting is slow and loses formatting on every cycle. Pandoc is powerful but requires a local install and CLI fluency, and its output is tuned for academic writing rather than web content. Online converters based on regex patterns fail on nested structures and malformed input. Turndown strikes the right balance: DOM-accurate, configurable, and battle-tested in production tools used by millions. That's why we wrap it rather than reinventing the wheel.
When NOT to use a Markdown converter
If your downstream system needs to render the exact original HTML — with all its classes, scripts, embedded media, and inline styles — Markdown is a lossy target. Keep the HTML. Markdown is the right choice when the output will be edited by humans, committed to Git, diffed across versions, or passed to an LLM that reasons better on semantic text than on angle brackets.
Privacy
Conversion runs entirely in your browser. There is no backend, no upload, no logging, and no rate limit. Paste sensitive internal HTML — product spec pages, customer email templates, internal wiki articles — without it ever touching a server. The page works offline once loaded, so you can keep using it on a plane, a train, or behind a corporate firewall.