Kordu Tools
Developer Tools Runs in browser Updated 18 Apr 2026

HTML Encoder / Decoder

Encode the five HTML-special characters into entities and decode named, decimal, and hex entities back to text. Runs fully in-browser.


How to use HTML Encoder / Decoder

  1. Choose encode or decode

    Switch the mode toggle to Encode when you need to convert plain text or HTML into entities, or Decode when you need to turn entity-laden content back into readable text.

  2. Paste your content

    Paste the text, HTML snippet, or entity-encoded string into the input area. The tool accepts anything from a single character to multi-kilobyte documents.

  3. Run the conversion

    The result appears instantly as you type — encoding turns &, <, >, ", and ' into their entity forms; decoding resolves named, decimal, and hex entities back to the original characters.

  4. Inspect the result for edge cases

    Scan the output for double-encoded remnants like &amp;amp; (a sign the source was already encoded) or partial encoding where only some ampersands were transformed. Decode twice if needed.

  5. Copy the result

    Click Copy to place the encoded or decoded text on your clipboard. The tool does not alter whitespace, line breaks, or character order, so the output is safe to paste into templates, editors, or diff tools.

  6. Integrate into your template

    Drop encoded content into HTML element positions (inside <p>, <div>, <span>, etc.). For attributes, confirm the attribute is quoted. For JavaScript strings, CSS, or URL parameters, apply the additional context-specific escaping that HTML encoding alone does not cover.

  7. Verify in the target context

    Load the page in a browser to confirm entities render as the intended characters, and view source to confirm that no stray < or & survived the encoding pass. For security-sensitive contexts, also run the input through a production-grade server-side escape library.

HTML Encoder / Decoder FAQ

Why encode HTML?

HTML is a markup language with characters that carry structural meaning. Unescaped <, >, &, and quotes can break the page, cause rendering errors, or open an XSS vulnerability when untrusted content is injected. Encoding converts those characters to inert entity references so the browser displays them as literal text instead of parsing them as markup.

What are the five critical characters that must always be encoded?

& becomes &amp;, < becomes &lt;, > becomes &gt;, " becomes &quot;, and ' becomes &#39;. OWASP and the WHATWG HTML spec agree these five are the minimum-safe set for HTML element content and attribute values. The ampersand must be encoded first so later substitutions do not produce double-encoding.

Are there more HTML entities than the five?

Yes — the WHATWG HTML Living Standard defines 2,125+ named character references, covering Latin supplements, punctuation, mathematical symbols, Greek letters, arrows, currency signs, and more. Common examples: &nbsp; (non-breaking space), &copy; (©), &trade; (™), &mdash; (—), &hellip; (…), &larr; (←). Any character can also be referenced numerically.

Named entities vs numeric entities — which should I use?

Named entities are more readable (&copy; is clearer than &#169;). Numeric entities are universal — every Unicode code point has a decimal and hex form, even when no name exists. For content you edit by hand, named is friendlier. For machine output or rare characters, numeric is safer. All three forms decode identically; pick the one your tooling emits consistently.

Is this tool safe to use for preventing XSS?

It correctly applies the five-character rule for HTML element content, which is one layer of XSS defence. It is not a complete security solution. Always escape on the server at the template boundary using an audited library (OWASP Java Encoder, html-escaper, Django auto-escape, Rails h(), Go html/template). Client-side encoding alone is never a security boundary — an attacker controlling the client can skip it.

Attribute values vs element content — same rules?

Both need the five-character rule, but attribute values must also be quoted. An unquoted attribute (title=foo) accepts whitespace, =, backticks, and more as terminators, which vastly expands the escape surface. Always quote attribute values with double quotes and apply the standard five-character encoding inside them.

What about content inside <script> tags or JavaScript strings?

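HTML encoding is not sufficient in JavaScript contexts. Inside a <script> element the browser does not decode entities, so HTML-encoded data reaches the string as literal &lt; text, while characters that HTML encoding leaves alone, such as backslashes and newlines, can still break the string literal. Inside an event handler attribute the opposite happens: the browser decodes entities before running the script, so an encoded &#39; becomes a live quote again. You need JavaScript string escaping (backslash escapes for quotes, newlines, <, >, /) plus HTML escaping if the script sits inside an event handler attribute. Use a templating engine that understands the JS context, not manual concatenation.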

How do I handle double-encoded text like &amp;amp;?

Double-encoding happens when already-encoded content is encoded again. Decode the string and inspect the result; if visible entities like &amp; remain, decode a second time. Some producers triple-encode, so decode until the output stabilizes. Decoding plain text that contains no entities returns it unchanged, so a pass over already-clean text does no harm.

Does the decoder handle Unicode and emoji?

Yes. Every Unicode code point can be expressed as a numeric reference. &#x263A; decodes to ☺, &#x1F600; decodes to 😀, and &#x4E2D;&#x6587; decodes to 中文. The decoder supports the full Unicode range, including supplementary planes and emoji in the astral planes (code points above U+FFFF).

Can I decode hex references like &#x263A;?

Yes. Hex numeric references (&#x followed by hex digits and a semicolon), decimal numeric references (&# followed by decimal digits and a semicolon), and named entities (&copy;) are all accepted. A document that mixes all three formats decodes cleanly — &copy;, &#169;, and &#xA9; all resolve to ©.

Do I need both &quot; and &#39;?

Yes, if the encoded text might appear inside either single-quoted or double-quoted attribute values. Most templating engines encode both to be safe, since you may not know at encode time which quote style the downstream template uses. Kordu emits &quot; for " and &#39; for ' (rather than &apos;) because &apos; is not a valid named entity in HTML 4 and has inconsistent legacy support.

What about &lt; inside a URL parameter?

URL parameters need percent-encoding (%3C for <), not HTML entity encoding (&lt;). They are different encoding layers that solve different problems. For a tagged URL placed inside an href attribute, percent-encode the query values first (encodeURIComponent), then HTML-encode the whole URL when inserting it into the attribute. Skipping either step creates broken links or injection opportunities.

Does this tool replace a server-side escape library?

No. It is a browser utility for content preparation, inspection, and debugging. Production code that renders untrusted input into HTML must escape on the server at the template boundary with an audited library that understands every context (element, attribute, script, URL, CSS). Client-side escaping is convenient but never a security boundary by itself.

Can I encode a whole HTML page at once?

You can, but the result will be literal text that displays the page's source code rather than a usable HTML document. Whole-page encoding is useful for tutorials, documentation, or blog posts where you want readers to see raw HTML. For normal development, only encode the dynamic content being injected — never the static template markup around it.

Is my text uploaded anywhere?

No. All encoding and decoding runs entirely in your browser using JavaScript. Your text never leaves your device, nothing is logged, and the tool works offline once the page has loaded. Safe to use for proprietary content, private code snippets, or internal data.

Background

The Kordu HTML Encoder/Decoder converts between plain text and HTML entity notation in both directions. Encoding replaces characters that have special meaning in HTML markup with their safe entity equivalents so that user-supplied or untrusted text can be injected into an HTML document without breaking the page or opening a cross-site scripting (XSS) vulnerability. Decoding reverses the process — converting HTML entities scraped from feeds, APIs, datasets, or AI outputs back to the original characters so the content is human-readable again.

Why encode HTML in the first place

HTML is a markup language, which means a handful of characters carry structural meaning. The moment any of those characters appear in text that the browser parses as HTML, the parser treats them as part of the markup rather than as data. An unescaped < starts a tag, an unescaped & can begin an entity reference, and an unescaped " can close an attribute and let an attacker inject arbitrary attributes or event handlers. Three concrete reasons to encode:

  1. XSS prevention. Reflecting untrusted input — URL parameters, form submissions, database fields, webhook payloads — directly into HTML is the classic XSS vector. Encoding turns <script> into the inert literal text &lt;script&gt; so the browser renders it as characters rather than executing it. This mitigates stored XSS (attacker payload persisted in the database), reflected XSS (payload echoed back from a query string), and DOM-based XSS (payload injected client-side via innerHTML or similar).
  2. Character display. When you want the literal characters <div> to appear on the page as text (inside a tutorial, code snippet, or documentation block), you must encode them so the browser shows them instead of parsing them as a tag.
  3. Data integrity. Round-tripping content through HTML without encoding silently mutates ampersands, quotes, and angle brackets. Encoding preserves the original bytes across serialization boundaries.

The five minimum-safe characters

For HTML element content and attribute values, the OWASP HTML escape rule and the WHATWG HTML spec agree on five characters that must always be encoded:

  • & becomes &amp; — the ampersand starts every entity reference, so it has to be escaped first (before any other character) to avoid double-encoding.
  • < becomes &lt; — opens a tag; unescaped < can close surrounding elements or start an attacker-controlled tag.
  • > becomes &gt; — strictly speaking not required everywhere, but encoding it makes parsing deterministic and avoids edge cases in legacy parsers.
  • " becomes &quot; — closes double-quoted attribute values; unescaped " inside an attribute can terminate the attribute and inject new ones.
  • ' becomes &#39; (or &apos; in XHTML/XML) — closes single-quoted attribute values. Most encoders emit the numeric form &#39; because &apos; is not a valid named entity in HTML 4 and some older browsers treat it inconsistently.

The Kordu encoder emits &#39; for the apostrophe to maximize compatibility across HTML 4, HTML 5, and XML consumers.
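
The five-character rule above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the Kordu tool's actual source; the names HTML_ESCAPES, encodeHtml, and encodeHtmlChained are hypothetical. The single-pass version avoids the ordering problem entirely, and the chained version shows why the ampersand has to go first when replacements run one after another.

```javascript
// Hypothetical helpers for illustration; not taken from the Kordu tool.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Single pass: each special character in the input is replaced exactly once,
// so ampersands introduced by the replacements are never re-encoded.
function encodeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Chained replacements: here the order matters. If "&" ran last, it would
// turn the "&lt;" produced earlier into "&amp;lt;".
function encodeHtmlChained(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// encodeHtml("Tom & Jerry's <b>") returns "Tom &amp; Jerry&#39;s &lt;b&gt;"
```

Both functions produce the same output for the same input; the single-pass form is usually easier to keep correct when more characters are added later.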

Entity families — named, decimal, hex

HTML supports three notations for referencing characters, and all three decode to the same underlying code point:

  • Named entities — human-readable labels like &amp;, &lt;, &nbsp; (non-breaking space), &copy; (©), &trade; (™), &mdash; (—), &hellip; (…), &larr; (←). The WHATWG HTML Living Standard defines a list of 2,125+ named character references, ranging from Latin supplements to mathematical operators and arrows.
  • Decimal numeric references — &# followed by the decimal code point and a semicolon: &#160; for &nbsp;, &#169; for ©, &#8212; for —. Every Unicode code point can be referenced this way, including emoji and characters outside the Basic Multilingual Plane.
  • Hexadecimal numeric references — &#x followed by the hex code point and a semicolon: &#xA0; is the same non-breaking space as &#160; and &nbsp;. Hex is often more compact for high code points and aligns with how Unicode tables are typically published.

All three resolve to the same character. The Kordu decoder accepts every form — a document containing &copy;, &#169;, and &#xA9; will decode all three to © identically.
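
A decoder for the three notations can be sketched as a single regular-expression pass. This is a simplified illustration and not the Kordu implementation: the NAMED table is a tiny hypothetical subset of the WHATWG list, the function name decodeHtml is an assumption, and the sketch skips the special cases a real HTML parser applies (references without a trailing semicolon, the Windows-1252 remapping of some numeric values).

```javascript
// Tiny illustrative subset; the full WHATWG list has over two thousand names.
const NAMED = {
  amp: "&", lt: "<", gt: ">", quot: '"', apos: "'",
  nbsp: "\u00A0", copy: "\u00A9",
};

function decodeHtml(text) {
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  return String(text).replace(pattern, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave zero, out-of-range, and surrogate values untouched in this sketch.
      const valid =
        codePoint > 0 &&
        codePoint <= 0x10ffff &&
        !(codePoint >= 0xd800 && codePoint <= 0xdfff);
      return valid ? String.fromCodePoint(codePoint) : match;
    }
    // Unknown names are returned as-is rather than guessed.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

// decodeHtml("&copy; &#169; &#xA9;") returns "© © ©"
```

Because the replacement runs once over the original string, "&amp;lt;" decodes to "&lt;" and not to "<"; each call removes exactly one layer of encoding.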

Context-sensitive escaping

This is the most commonly overlooked aspect of safe HTML output. Different contexts require different escaping rules, and applying the wrong rule can silently leave you vulnerable even when the output looks encoded:

  • Element content (<p>HERE</p>) — the five-character rule is sufficient.
  • Attribute value (<a title="HERE">) — also needs the five-character rule, but quoting the attribute is mandatory. An unquoted attribute value opens a much larger escape surface (whitespace, =, backticks, and more can end the attribute).
  • JavaScript string (<script>var x = "HERE";</script>) — HTML encoding alone is not safe. You must also escape JavaScript string terminators (\, ", ', <, >, /, newline) using JS-escape rules, then optionally HTML-escape the result if it sits inside an event handler attribute. The correct production approach is a templating engine that understands the JS context, not manual concatenation.
  • URL parameter (<a href="/page?q=HERE">) — use percent-encoding (encodeURIComponent) first, then HTML-encode the whole URL for attribute placement. HTML-encoding alone leaves ?, &, =, and # active as URL separators, which breaks the link and can let an attacker inject additional parameters.
  • CSS string (<div style="background: url('HERE')">) — needs CSS escaping rules (backslash-hex notation) plus HTML escaping at the attribute boundary.
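
To make the JavaScript string case concrete, here is a minimal sketch of a string escaper. It is an illustration under stated assumptions, not a replacement for a context-aware templating engine: the name escapeJsString is hypothetical, and the sketch writes every risky character as a \uXXXX escape instead of a short backslash escape. It also covers carriage return and the U+2028 and U+2029 line separators, which go beyond the characters listed above.

```javascript
// Hypothetical helper: escapes characters that can end a JavaScript string
// literal or close the surrounding <script> element.
function escapeJsString(text) {
  return String(text).replace(/[\\"'<>\/\n\r\u2028\u2029]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// escapeJsString("</script>") returns the text \u003c\u002fscript\u003e
// which a JavaScript engine reads back as the original string.
const userInput = '</script>"; alert(1) //';
const scriptBody = 'var x = "' + escapeJsString(userInput) + '";';
```

The escaped value contains no quote, angle bracket, slash, or raw newline, so it can neither terminate the literal nor close the script element. If the script sits inside an event handler attribute, the result still needs HTML encoding for the attribute layer.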

Use a trusted library on the server

For production server code that handles untrusted input, do not hand-roll encoding. Use a maintained library: OWASP Java Encoder, DOMPurify (for sanitizing rather than escaping), html-escaper or he in Node.js, Django or Jinja auto-escaping, Rails h(), Go's html/template. Coverage differs between them: html-escaper and he target HTML text and attribute values only, while Go's html/template escapes according to the context it detects, so check which contexts (element, attribute, script, URL, CSS) your library actually handles. Client-side encoding on its own is not a security boundary, because an attacker controlling the client can simply skip it. Treat the Kordu tool as a utility for content preparation, inspection, and debugging, not as your last line of defense against XSS.

Dangerous characters beyond the five

In specialised contexts, more characters matter. Backticks can terminate unquoted attribute values in old IE. Forward slashes close self-closing tags. Parentheses matter inside JavaScript or CSS url(). Control characters and non-printing Unicode (bidi overrides, zero-width joiners) can hide payloads. If you're building a sanitizer rather than an encoder, reach for DOMPurify or a server-side HTML sanitizer that strips dangerous tags and attributes entirely.
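
As an inspection aid for the hidden characters mentioned above, the following sketch reports control characters, zero-width characters, and bidirectional controls in a string. The character ranges and the name findHiddenCharacters are assumptions chosen for illustration; the list is not exhaustive and this is not a sanitizer.

```javascript
// C0/C1 controls (except tab, LF, CR), zero-width characters and directional
// marks (U+200B to U+200F), bidi embeddings, overrides, and isolates,
// word joiner, and the byte order mark.
const HIDDEN_CHARS =
  /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F\u200B-\u200F\u202A-\u202E\u2060\u2066-\u2069\uFEFF]/g;

// Hypothetical helper: returns the position and code point of each match.
function findHiddenCharacters(text) {
  const found = [];
  for (const m of String(text).matchAll(HIDDEN_CHARS)) {
    const hex = m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    found.push({ index: m.index, codePoint: "U+" + hex });
  }
  return found;
}

// findHiddenCharacters("abc\u202Edef") returns [{ index: 3, codePoint: "U+202E" }]
```

A non-empty result is a reason to look more closely at the input, not proof of an attack; zero-width joiners, for example, are legitimate inside many emoji sequences.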

Decoding scraped content and feeds

RSS feeds, JSON APIs, scraped web pages, and AI model outputs often contain HTML entities, sometimes double-encoded (&amp;amp; means a literal &amp; was itself encoded). If one decode pass still leaves visible &amp; in your output, decode a second time. Some producers triple-encode; decode until the output stabilizes. Running the Kordu decoder on plain text that contains no entities returns the text unchanged.
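
The decode-until-stable loop can be sketched as follows. The single-pass decodeOnce here is a deliberately tiny stand-in that knows only five entities, and both function names and the maxPasses cap are assumptions for illustration, not Kordu behaviour.

```javascript
const BASIC = { amp: "&", lt: "<", gt: ">", quot: '"' };

// Hypothetical one-layer decoder covering only the five basic entities.
function decodeOnce(text) {
  return text.replace(/&(amp|lt|gt|quot|#39);/g, (match, name) =>
    name === "#39" ? "'" : BASIC[name]
  );
}

// Repeats decodeOnce until the text stops changing or the cap is reached.
function decodeUntilStable(text, maxPasses = 5) {
  let current = String(text);
  for (let pass = 1; pass <= maxPasses; pass++) {
    const next = decodeOnce(current);
    if (next === current) {
      return { text: current, passes: pass - 1 };
    }
    current = next;
  }
  return { text: current, passes: maxPasses };
}

// decodeUntilStable("Tom &amp;amp;amp; Jerry")
// returns { text: "Tom & Jerry", passes: 3 }
```

The pass count is worth logging: a value above one points at an upstream producer that encodes more than once. Decoding to stability is only appropriate when the content is not supposed to contain literal entity text, such as a tutorial that displays "&amp;lt;" on purpose.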

Unicode and numeric character references

Any Unicode code point can be expressed as a numeric reference, which is how legacy systems transport characters they cannot emit directly. &#x263A; decodes to ☺, &#x1F600; decodes to 😀, and &#x4E2D;&#x6587; decodes to 中文. The Kordu decoder handles the full Unicode range including supplementary planes and emoji.
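
The encoding direction, turning non-ASCII characters into hex references for a system that can only carry ASCII, can be sketched like this. The name toHexReferences is hypothetical, and the sketch leaves ASCII untouched, so it does not apply the five-character rule by itself.

```javascript
// Hypothetical helper: writes every non-ASCII code point as &#xHHHH;.
function toHexReferences(text) {
  let out = "";
  // for...of iterates by code point, so a surrogate pair such as an emoji
  // is handled as one character instead of two halves.
  for (const ch of String(text)) {
    const codePoint = ch.codePointAt(0);
    out += codePoint > 0x7f
      ? "&#x" + codePoint.toString(16).toUpperCase() + ";"
      : ch;
  }
  return out;
}

// toHexReferences("☺ 😀 中文") returns "&#x263A; &#x1F600; &#x4E2D;&#x6587;"
// String.fromCodePoint(0x1F600) rebuilds the emoji from its code point.
```

Iterating with charCodeAt and a plain index would split 😀 into the two surrogates D83D and DE00, which are not valid numeric references on their own.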

Common mistakes

  • Encoding already-encoded content, producing &amp;amp; in your output. Always decode first if you're not sure of input state.
  • Forgetting to encode quotes inside attribute values, or leaving the attribute unquoted. Encoding < alone is not enough when attacker input lands inside title="…".
  • Relying on client-side escaping for security. Escape on the server, at the template boundary, where the output is actually rendered.
  • Confusing HTML encoding with URL encoding. They use different rules and serve different purposes — %20 is a URL-encoded space, &nbsp; is a non-breaking space entity, they are not interchangeable.

HTML encoding vs URL encoding

They look similar but apply in different layers. URL encoding (percent-encoding) makes bytes safe inside a URL — spaces become %20, & becomes %26. HTML encoding makes characters safe inside HTML markup — spaces stay spaces, & becomes &amp;. A tagged URL destined for an href attribute needs percent-encoding on the query values and HTML-encoding on the whole string when it's placed into the attribute.
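
The two layers can be shown in order with a short sketch. encodeURIComponent is the standard JavaScript function; encodeHtml and buildHref are hypothetical helpers written for this example, and basePath is assumed to be a trusted, developer-supplied path.

```javascript
// Hypothetical five-character HTML encoder (single pass).
function encodeHtml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Layer 1: percent-encode each query key and value.
// Layer 2: HTML-encode the finished URL for placement in a quoted attribute.
function buildHref(basePath, params) {
  const query = Object.entries(params)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  const url = query ? basePath + "?" + query : basePath;
  return 'href="' + encodeHtml(url) + '"';
}

// buildHref("/page", { q: "a<b & c", utm_source: "news" }) returns
// href="/page?q=a%3Cb%20%26%20c&amp;utm_source=news"
```

The ampersand inside the value becomes %26 in the first layer, so it cannot split the parameter. The ampersand that separates parameters becomes &amp; in the second layer, and the browser turns it back into a plain & when it reads the attribute.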

Privacy and data processing

All encoding and decoding runs entirely in your browser. The Kordu HTML Encoder/Decoder is pure client-side JavaScript — there is no backend request, no logging, and no storage. Your text never leaves your device, which means the tool works offline once the page is loaded and is safe to use for proprietary content, snippets from private codebases, or internal data.