Kordu Tools
Developer Tools Runs in browser Updated 18 Apr 2026

HTML to Markdown

Convert HTML to clean Markdown — paste HTML source and get properly formatted .md output with headings, links, lists, and code blocks.

How to use HTML to Markdown

  1. Paste your HTML

    Paste HTML source into the input panel. It can be a full page, a fragment copied from view-source, a rendered email, a CMS export, or the output of a scraper. Pasting works for both escaped and unescaped markup.

  2. Choose a heading style

    Pick ATX (`# Heading`) for maximum compatibility across GitHub, Obsidian, and modern static site generators, or Setext (underlined headings) if your toolchain specifically requires it. Setext syntax only exists for h1 and h2, so h3 through h6 are written as ATX in either mode. ATX is the safe default.

  3. Pick a link style

    Inline `[text](url)` keeps URLs next to their anchor text and is best for short content. Reference `[text][1]` moves URLs into numbered link definitions at the bottom, which keeps paragraphs readable when they contain many long URLs.

  4. Select a bullet character

    Choose `-`, `*`, or `+` for unordered list markers. All three are valid CommonMark; match whatever convention your existing repository or style guide already uses so diffs stay clean.

  5. Pick a code block style

    Fenced (triple backticks) is the GFM standard and works in almost every renderer. Indented (four-space) is strict original Markdown. Use fenced unless you specifically need to target a legacy renderer.

  6. Convert to Markdown

    Click Convert to Markdown. Output appears instantly in the right-hand panel with a live line count. The conversion runs in your browser — no network request is made.

  7. Review the output for edge cases

    Skim the Markdown for custom elements, inline styles, or iframes that were stripped. Re-add any language hints on fenced code blocks (```ts, ```python) that the original HTML may have dropped.

  8. Copy or save the result

    Click the copy button to place the Markdown on your clipboard, or select and copy manually. Paste into Obsidian, a `.md` file in your repo, a Notion page, or wherever you need portable formatted text.
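To compare the two link styles from step 3 directly, here is a minimal TypeScript sketch. `Link` and `renderLinks` are hypothetical helpers for illustration, not this tool's code or Turndown's API. The sketch numbers links in order of appearance and does not merge duplicate URLs.

```typescript
// Hypothetical shape for a link extracted from HTML: anchor text plus href.
interface Link {
  text: string;
  href: string;
}

type LinkStyle = "inlined" | "referenced";

// Render a run of links in either style.
// "referenced" gives each link the next number and appends one
// definition line per link after a blank line.
function renderLinks(links: Link[], style: LinkStyle): string {
  if (style === "inlined") {
    return links.map((l) => `[${l.text}](${l.href})`).join(" ");
  }
  const body = links.map((l, i) => `[${l.text}][${i + 1}]`).join(" ");
  const defs = links.map((l, i) => `[${i + 1}]: ${l.href}`).join("\n");
  return `${body}\n\n${defs}`;
}

const links: Link[] = [
  { text: "spec", href: "https://spec.commonmark.org/" },
  { text: "GFM", href: "https://github.github.com/gfm/" },
];

console.log(renderLinks(links, "inlined"));
console.log(renderLinks(links, "referenced"));
```

Reference style trades a shorter paragraph for a lookup at the bottom of the file; for a handful of short links, inline is usually easier to read and diff.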

HTML to Markdown FAQ

Is my HTML sent to a server?

No. Conversion runs entirely in your browser using the Turndown library. Your HTML, text, and settings never leave your device — there is no backend, no logging, and no rate limiting. The page works offline once loaded.

Does it handle tables?

Yes. HTML `<table>` elements are converted to GitHub-Flavored Markdown pipe tables, with a header row, an alignment separator row of dashes, and one line per body row. Nested tables are flattened because Markdown tables cannot nest — review those cases manually.
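The header, separator, and body layout described above can be sketched in a few lines of TypeScript. `toPipeTable` is a hypothetical helper for illustration, not the tool's implementation; it assumes the cell text has already been extracted from the `<th>` and `<td>` elements, and it emits a plain `---` separator with no alignment colons.

```typescript
// Build a GitHub-Flavored Markdown pipe table from a header row and body rows.
// Pipes inside cells are escaped and internal newlines collapsed, because a
// GFM table row must fit on a single line.
function toPipeTable(header: string[], rows: string[][]): string {
  const cell = (s: string): string =>
    s.replace(/\|/g, "\\|").replace(/\s*\n\s*/g, " ").trim();
  const line = (cells: string[]): string => `| ${cells.map(cell).join(" | ")} |`;
  const separator = `| ${header.map(() => "---").join(" | ")} |`;
  // Pad short rows so every line has as many columns as the header.
  const padded = rows.map((r) => header.map((_, i) => r[i] ?? ""));
  return [line(header), separator, ...padded.map(line)].join("\n");
}

console.log(toPipeTable(["Name", "Value"], [["a|b", "1"], ["c"]]));
```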

What Markdown flavor does it produce?

CommonMark by default, plus GitHub-Flavored Markdown extensions for tables and strikethrough. The output renders as expected in CommonMark- and GFM-compatible tools such as GitHub, GitLab, Bitbucket, Obsidian, VS Code, Hugo, Jekyll, Astro, Next.js, Docusaurus, and MkDocs.

How are classes, IDs, and inline styles treated?

They are stripped. Markdown has no representation for `class`, `id`, `style`, `data-*`, or ARIA attributes. If presentation matters, keep the original HTML or convert to MDX where you can wrap content in React components that carry the styling.

Can it convert an entire webpage from a URL?

Not directly. The tool works on HTML you paste in: open the page, use your browser's View Source (or copy an element's outer HTML from DevTools), paste the markup here, then convert. Browsers block most cross-origin requests unless the target site allows them via CORS, so fetching arbitrary URLs would require a server-side fetcher; working from pasted HTML keeps the tool fully client-side.

Does it handle MathJax, LaTeX, or KaTeX content?

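Math markup passes through as inline text. If your downstream renderer supports math in Markdown (KaTeX in Obsidian, MathJax in MkDocs Material), you may need to re-wrap the expressions in `$…$` or `$$…$$` delimiters after conversion. Turndown also backslash-escapes Markdown-significant characters in ordinary text, such as `_`, `*`, and `\`, so TeX source may need those escapes removed as well.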

Are code-block language hints preserved?

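Partially. With fenced output, Turndown's code block rule reads a `language-*` class on the `<code>` element inside `<pre>`, so `<pre><code class="language-ts">` becomes a fence opened with ```ts. Hints stored anywhere else (a class on `<pre>`, a `lang-ts` or highlighter-specific class, a `data-*` attribute) are lost, and indented code blocks cannot carry a hint at all. Batch-add missing hints after conversion, or paste the output into a Markdown editor that can detect the language automatically.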
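When fences come out without a language hint, one way to batch-add them is a small line-based pass rather than a single regex, since a regex on its own cannot easily tell opening fences from closing ones. `addFenceLanguage` is a hypothetical helper, not part of the tool; it assumes every untagged block in the file uses the same language and that fences start at column 0.

```typescript
// Add a language hint to every fenced code block that lacks one.
// Tracks whether we are inside a fence so that closing fences, which must
// not carry an info string, are left untouched.
function addFenceLanguage(md: string, lang: string): string {
  let openFence: string | null = null; // fence that opened the current block
  return md
    .split("\n")
    .map((line) => {
      const m = /^(`{3,}|~{3,})(.*)$/.exec(line);
      if (!m) return line;
      const fence = m[1];
      const info = m[2];
      if (openFence === null) {
        // Opening fence: tag it only if it has no info string yet.
        openFence = fence;
        return info.trim() === "" ? fence + lang : line;
      }
      // A closing fence uses the same character, is at least as long as
      // the opening fence, and has nothing after it.
      const closes =
        fence[0] === openFence[0] &&
        fence.length >= openFence.length &&
        info.trim() === "";
      if (closes) openFence = null;
      return line;
    })
    .join("\n");
}
```

For example, `addFenceLanguage(markdown, "ts")` turns a bare ```` ``` ```` opener into ```` ```ts ```` while leaving ```` ```python ```` blocks and all closing fences alone.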

Does it remove `<script>`, `<style>`, and `<iframe>`?

Yes — and that is intentional. Scripts, stylesheets, and embedded iframes have no Markdown representation, and leaving them in would produce unsafe output. If you need embeds, use MDX with custom components or keep the HTML in a raw block.

Can I customize which HTML tags are converted?

Not through the UI — the current options cover the four settings that matter most in practice (headings, links, bullets, code fences). For fully custom rules you would run Turndown directly in Node with a `.addRule()` call. Open a feature request if you need a specific preset added here.

How does it handle `<br>` line breaks?

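A `<br>` is converted to two trailing spaces followed by a newline, which is one of CommonMark's two hard line break syntaxes. Most renderers display this as expected, but the trailing spaces are invisible and are easily lost to editors or linters that trim trailing whitespace; CommonMark also accepts a backslash before the newline as a hard break. If neither form suits your renderer, use a blank line for a paragraph break.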
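Editors that trim trailing whitespace can silently delete two-space hard breaks. One workaround is a post-processing pass that rewrites them as backslash hard breaks (a `\` immediately before the newline), which CommonMark also treats as a hard line break. `toBackslashBreaks` is a hypothetical helper, not part of the tool; it uses a naive fence toggle and only rewrites breaks followed by another non-blank line, because a trailing backslash at the end of a paragraph renders as a literal character.

```typescript
// Rewrite two-trailing-space hard breaks as backslash hard breaks.
// Lines inside fenced code blocks are left untouched.
function toBackslashBreaks(md: string): string {
  const lines = md.split("\n");
  let inFence = false;
  return lines
    .map((line, i) => {
      // Naive toggle: any line starting with ``` or ~~~ opens or closes a fence.
      if (/^(`{3,}|~{3,})/.test(line)) {
        inFence = !inFence;
        return line;
      }
      const next = i + 1 < lines.length ? lines[i + 1] : "";
      // Only a break followed by more text in the same paragraph is a real
      // hard break; two spaces before a blank line are simply ignored.
      if (!inFence && line.trim() !== "" && next.trim() !== "" && / {2,}$/.test(line)) {
        return line.replace(/ {2,}$/, "\\");
      }
      return line;
    })
    .join("\n");
}
```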

Will the Markdown round-trip back to HTML accurately?

For semantic content — headings, paragraphs, links, lists, tables, emphasis, code — yes. Presentational HTML (classes, inline styles, custom elements, iframes) is not preserved on the way out, so a Markdown → HTML round trip returns plain semantic HTML rather than the original decorated markup.

Does it work with malformed or invalid HTML?

Yes. Turndown parses input with the browser's own HTML parser, which follows the forgiving HTML5 parsing algorithm that browsers apply to every page they load. Unclosed tags, missing quotes, and tag-soup from legacy CMSs are handled gracefully.

Is it suitable for large documents?

Yes for anything under a few megabytes of HTML. The conversion is CPU-bound and runs on the main thread, so very large pages (tens of MB) may briefly block the UI while parsing. For multi-document batches, consider running Turndown in a Node script instead.

Why use Turndown instead of a hand-rolled converter?

Turndown operates on a real DOM tree and ships with rules for the standard HTML elements, which means it correctly handles nested structures, inline formatting inside headings, tables, and malformed input. Regex-based converters typically break on these cases, because regular expressions cannot reliably track arbitrarily nested tags.

Background

The Kordu HTML to Markdown converter transforms raw HTML into clean, portable Markdown in a single click. Paste a full page, a component fragment, a rendered email, a scraped article, or the output of any CMS, and get properly formatted .md for GitHub, GitLab, Bitbucket, Obsidian, Notion, VS Code preview, Hugo, Jekyll, Astro, Next.js, Docusaurus, MkDocs, and other Markdown-aware platforms. No signup, no upload, no byte limit: your HTML never leaves your browser.

Why convert HTML to Markdown

Markdown is the lingua franca of modern documentation, static site generators, note-taking apps, and LLM prompt engineering. HTML is how the open web stores and transmits formatted content. Teams constantly need to move between the two: migrating a WordPress blog to Jekyll or Hugo, importing a legacy Confluence or SharePoint knowledge base into Notion or Obsidian, extracting the article body from a web page for an LLM context window, converting Mailchimp or Stripo email templates into newsletter Markdown, turning API reference pages into .md files that ship next to code, or pulling a rendered React component's HTML back into a .mdx snippet. Doing this by hand is tedious and error-prone: headings, links, nested lists, and tables all have different syntax in Markdown, and manual rewrites lose formatting on every pass.

How it works

The converter is powered by Turndown, the mature, battle-tested HTML-to-Markdown library used by Obsidian's web clipper, Notion's import tools, Ghost's content importer, and countless developer utilities. Turndown parses your HTML into a DOM tree inside the browser, walks the tree node by node, and applies rule-based rewriting to produce idiomatic Markdown. Because it operates on a real DOM, it correctly handles nested structures, malformed input, unclosed tags, and the quirky output that CMSs and rich-text editors generate — the same edge cases that trip up naïve regex-based converters.
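To make the "walk the tree, apply rules" idea concrete, here is a deliberately tiny TypeScript model of a rule-based converter. It is not Turndown's code or API: the `MdNode` tree, the `Rule` shape, and `convert` are hypothetical, and the sketch skips text escaping, whitespace handling, and most elements. It only mirrors the general design of pairing a filter with a replacement function, loosely like the rules Turndown's `addRule` accepts.

```typescript
// Toy node types standing in for the DOM tree the browser's parser builds.
interface TextNode {
  kind: "text";
  value: string;
}
interface ElementNode {
  kind: "element";
  tag: string; // lowercase tag name, e.g. "p"
  attrs: Record<string, string>;
  children: MdNode[];
}
type MdNode = TextNode | ElementNode;

// A rule pairs a filter (which elements it handles) with a replacement
// (how to wrap the already-converted children in Markdown syntax).
interface Rule {
  filter: (tag: string) => boolean;
  replacement: (content: string, node: ElementNode) => string;
}

const rules: Rule[] = [
  { filter: (t) => /^h[1-6]$/.test(t), replacement: (c, n) => `\n\n${"#".repeat(Number(n.tag[1]))} ${c}\n\n` },
  { filter: (t) => t === "p", replacement: (c) => `\n\n${c}\n\n` },
  { filter: (t) => t === "strong" || t === "b", replacement: (c) => `**${c}**` },
  { filter: (t) => t === "em" || t === "i", replacement: (c) => `_${c}_` },
  { filter: (t) => t === "a", replacement: (c, n) => `[${c}](${n.attrs["href"] ?? ""})` },
];

// Depth-first walk: convert the children first, then apply the first rule
// whose filter matches. Elements without a rule keep only their inner
// content, which is how unknown custom elements degrade to plain text.
function convert(node: MdNode): string {
  if (node.kind === "text") return node.value;
  const tag = node.tag;
  const content = node.children.map(convert).join("");
  const rule = rules.find((r) => r.filter(tag));
  return rule ? rule.replacement(content, node) : content;
}

// Collapse the extra blank lines produced by adjacent block rules.
function toMarkdown(root: MdNode): string {
  return convert(root).replace(/\n{3,}/g, "\n\n").trim();
}

// Small constructors for building a tree by hand.
const text = (value: string): MdNode => ({ kind: "text", value });
const el = (tag: string, attrs: Record<string, string>, ...children: MdNode[]): MdNode =>
  ({ kind: "element", tag, attrs, children });

const doc = el("div", {},
  el("h2", {}, text("Intro")),
  el("p", {}, text("See "), el("a", { href: "/docs" }, el("strong", {}, text("docs"))), text(".")),
);
console.log(toMarkdown(doc));
```

Converting children before their parent is what lets formatting nest correctly (here, bold text inside a link inside a paragraph) without any pattern matching on raw markup.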

Supported HTML elements

  • Headings h1 through h6, rendered as ATX (# Heading) or Setext (underline) depending on your setting; Setext only exists for h1 and h2, so h3 to h6 always use ATX
  • Paragraphs, <br> line breaks, and soft-wrap reflow
  • Inline formatting: <strong>, <b>, <em>, <i>, <code>, <del>, <s> for strikethrough
  • Links (<a>) in either inline [text](url) or reference [text][1] style with full reference keys
  • Images (<img>) with alt text and src preserved as ![alt](src)
  • Unordered and ordered lists, including deeply nested mixed lists
  • Code blocks (<pre><code>) as GFM fenced blocks with triple backticks, or as indented four-space blocks
  • Inline code (<code>) wrapped with single backticks
  • Blockquotes (<blockquote>) prefixed with > on every wrapped line
  • Horizontal rules (<hr>) rendered as ---
  • Tables (<table>) converted to GitHub-Flavored Markdown pipe tables with alignment separator rows

Configurable output

The tool exposes the four Turndown options most likely to matter when you're feeding the output into a specific downstream system:

  • Heading style: ATX (# Heading) for maximum compatibility across every Markdown flavor, or Setext (Heading\n======) if your toolchain is one of the few that prefers it. Setext only defines h1 and h2, so h3 to h6 stay ATX. ATX is the safe default.
  • Link style: Inline [text](url) for short content, or reference [text][1] with numbered link definitions at the bottom when long URLs would otherwise make paragraphs unreadable.
  • Bullet character: -, *, or +. CommonMark accepts all three; pick whichever matches your repo's existing convention to keep diffs tidy.
  • Code block style: Fenced triple-backtick blocks (GFM, Obsidian, most static site generators) or indented four-space blocks (strict original Markdown).
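The heading setting can be sketched as a small TypeScript function. `renderHeading` is a hypothetical helper for illustration, not the tool's code; the fallback to ATX for levels 3 to 6 reflects the fact that Setext syntax only defines two levels, and Turndown's heading rule falls back the same way.

```typescript
type HeadingStyle = "atx" | "setext";

// Render one heading (level 1-6) in the chosen style.
// Setext uses "===" under h1 and "---" under h2; levels 3-6 fall back
// to ATX even when "setext" is selected.
function renderHeading(level: number, text: string, style: HeadingStyle): string {
  if (!Number.isInteger(level) || level < 1 || level > 6) {
    throw new RangeError(`heading level must be 1-6, got ${level}`);
  }
  if (style === "setext" && level <= 2) {
    const underline = (level === 1 ? "=" : "-").repeat(text.length);
    return `${text}\n${underline}`;
  }
  return `${"#".repeat(level)} ${text}`;
}

for (const level of [1, 2, 3]) {
  console.log(renderHeading(level, "Title", "setext"));
}
```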

Real-world use cases

  • CMS migrations — Moving a WordPress, Ghost, Drupal, or Contentful blog into a static site generator that expects .md files. Export the HTML, run it through the converter, and commit the Markdown files.
  • Knowledge-base imports — Pulling a Confluence, SharePoint, or HelpScout knowledge base into Obsidian, Notion, Logseq, or a Git-backed docs site. Most exporters dump HTML; this turns it into something human-editable.
  • LLM context preparation — Feeding a web page into Claude, GPT, or Gemini. Markdown is far more token-efficient than HTML because the angle brackets, class names, and inline styles vanish, leaving only semantic structure.
  • Email archiving — Converting rendered HTML newsletters and transactional emails into searchable Markdown notes for an internal wiki or customer-communications archive.
  • Documentation round-trips — Pulling rendered API docs, release notes, or blog posts back into the Markdown source files that generated them, so they can be edited and re-published.
  • MDX authoring — Turning a designer's HTML mockup or a Figma-exported snippet into .mdx content that ships in an Astro or Next.js site.

Common pitfalls and edge cases

  • Inline styles and classes are discarded. Markdown has no representation for style="color: red" or class="callout", so they're stripped. If styling matters, keep the original HTML or use MDX with custom components.
  • Custom elements and Web Components are ignored. Turndown only knows the standard HTML vocabulary; <my-tab>, <mat-card>, or framework-specific wrappers fall back to their inner text.
  • <iframe>, <script>, and <style> content is dropped. This is intentional — embeds have no Markdown equivalent and scripts should never survive a content conversion.
  • <img srcset> is not preserved. Markdown images support a single source URL, so responsive-image metadata is lost. If you need art-direction, keep the HTML or switch to MDX with a custom <Image> component.
  • Deeply nested lists may reflow. Markdown is stricter than HTML about indentation, and some authors rely on HTML-only visual nesting that doesn't survive the round trip.
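  • Language hints survive in only one form. Turndown's fenced code block rule reads a language-* class on the <code> element inside <pre>, so <pre><code class="language-ts"> becomes a ts-tagged fence; hints stored elsewhere (a class on <pre>, other class names, data attributes) are lost, and indented code blocks cannot carry a hint at all. Add those back in batch after conversion.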
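The nested-list reflow problem mostly comes down to indentation: CommonMark only nests a block inside a list item when it is indented to that item's content column, which is the marker width plus the following space (two columns under `- `, three under `1. `, four under `10. `). The hypothetical `renderList` helper below is a sketch that applies that rule, not the tool's or Turndown's implementation; it assumes each item is a single line of text.

```typescript
// A list item with an optional nested sub-list.
interface ListItem {
  text: string;
  children?: { ordered: boolean; items: ListItem[] };
}

// Render a (possibly nested) list. Each nested list is indented to the
// content column of its parent item: the parent's indent plus the marker
// width plus one space.
function renderList(items: ListItem[], ordered: boolean, bullet = "-", indent = ""): string {
  const lines: string[] = [];
  items.forEach((item, i) => {
    const marker = ordered ? `${i + 1}.` : bullet;
    lines.push(`${indent}${marker} ${item.text}`);
    if (item.children) {
      const childIndent = indent + " ".repeat(marker.length + 1);
      lines.push(renderList(item.children.items, item.children.ordered, bullet, childIndent));
    }
  });
  return lines.join("\n");
}

console.log(renderList([{ text: "Step", children: { ordered: false, items: [{ text: "detail" }] } }], true));
```

A sub-list under `1. Step` therefore needs three spaces, not the two that often appear in hand-written Markdown, which is one reason visually nested HTML lists can flatten after conversion.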

Markdown flavors and compatibility

The output is CommonMark-compatible by default, with GitHub-Flavored Markdown extensions for tables and strikethrough, the same superset used by GitHub, GitLab, Bitbucket, Obsidian, VS Code, and the majority of static site generators. It pastes cleanly into Notion, which parses Markdown on import, and into MDX files, provided you escape any literal angle brackets and curly braces that survive the conversion, since MDX treats `<` and `{` as the start of JSX and JavaScript expressions. If your target is strict original Markdown (no fenced code, no tables), switch to indented code blocks and convert tables manually.
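For readers switching between the two code block styles, here is a minimal TypeScript sketch of what each produces. `renderCodeBlock` is a hypothetical helper, not this tool's implementation, and it assumes the code text has already been extracted from `<pre><code>`.

```typescript
type CodeBlockStyle = "fenced" | "indented";

// Render a code block in either style.
// Fenced: the fence is made longer than the longest backtick run in the
// code, so a fence line inside the code cannot close the block early.
// Indented: every non-blank line gets four spaces; this style has no slot
// for a language hint, so `lang` is ignored.
function renderCodeBlock(code: string, style: CodeBlockStyle, lang = ""): string {
  const body = code.replace(/\n$/, ""); // drop one trailing newline
  if (style === "indented") {
    return body
      .split("\n")
      .map((line) => (line === "" ? "" : "    " + line))
      .join("\n");
  }
  const runs: string[] = body.match(/`+/g) ?? [];
  const longest = Math.max(0, ...runs.map((r) => r.length));
  const fence = "`".repeat(Math.max(3, longest + 1));
  return `${fence}${lang}\n${body}\n${fence}`;
}

console.log(renderCodeBlock("const x = 1;\n", "fenced", "ts"));
console.log(renderCodeBlock("const x = 1;\n", "indented"));
```

As far as we know, Turndown's fenced rule applies a similar safeguard against fences inside code; this sketch is deliberately more conservative because it counts backtick runs anywhere in the code, not only at line starts.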

How this compares to alternatives

Manual rewriting is slow and loses formatting on every cycle. Pandoc is powerful but requires a local install and CLI fluency, and its default Markdown output is Pandoc's own dialect, so targeting web tooling usually means passing an option such as `-t gfm`. Online converters based on regex patterns often fail on nested structures and malformed input. Turndown strikes the right balance: DOM-accurate, configurable, and battle-tested in production tools used by millions. That's why we wrap it rather than reinventing the wheel.

When NOT to use a Markdown converter

If your downstream system needs to render the exact original HTML — with all its classes, scripts, embedded media, and inline styles — Markdown is a lossy target. Keep the HTML. Markdown is the right choice when the output will be edited by humans, committed to Git, diffed across versions, or passed to an LLM that reasons better on semantic text than on angle brackets.

Privacy

Conversion runs entirely in your browser. There is no backend, no upload, no logging, and no rate limit. Paste sensitive internal HTML — product spec pages, customer email templates, internal wiki articles — without it ever touching a server. The page works offline once loaded, so you can keep using it on a plane, a train, or behind a corporate firewall.