HTML Encoder / Decoder
Encode the five HTML-special characters into entities and decode named, decimal, and hex entities back to text. Runs fully in-browser.
How to use HTML Encoder / Decoder
-
Choose encode or decode
Switch the mode toggle to Encode when you need to convert plain text or HTML into entities, or Decode when you need to turn entity-laden content back into readable text.
-
Paste your content
Paste the text, HTML snippet, or entity-encoded string into the input area. The tool accepts anything from a single character to multi-kilobyte documents and streams.
-
Run the conversion
The result appears instantly as you type — encoding turns &, <, >, ", and ' into their entity forms; decoding resolves named, decimal, and hex entities back to the original characters.
-
Inspect the result for edge cases
Scan the output for double-encoded remnants like &amp; (a sign the source was already encoded) or partial encoding where only some ampersands were transformed. Decode twice if needed.
-
Copy the result
Click Copy to place the encoded or decoded text on your clipboard. The tool does not alter whitespace, line breaks, or character order, so the output is safe to paste into templates, editors, or diff tools.
-
Integrate into your template
Drop encoded content into HTML element positions (inside <p>, <div>, <span>, etc.). For attributes, confirm the attribute is quoted. For JavaScript strings, CSS, or URL parameters, apply the additional context-specific escaping that HTML encoding alone does not cover.
-
Verify in the target context
Load the page in a browser and view source to confirm entities render as the intended characters and that no stray < or & survived the encoding pass. For security-sensitive contexts, also run the input through a production-grade server-side escape library.
HTML Encoder / Decoder FAQ
Why encode HTML?
What are the five critical characters that must always be encoded?
Are there more HTML entities than the five?
Named entities vs numeric entities — which should I use?
Is this tool safe to use for preventing XSS?
Attribute values vs element content — same rules?
What about content inside <script> tags or JavaScript strings?
How do I handle double-encoded text like &amp;?
Does the decoder handle Unicode and emoji?
Can I decode hex references like &#x263A;?
Do I need both " and '?
What about < inside a URL parameter?
Does this tool replace a server-side escape library?
Can I encode a whole HTML page at once?
Is my text uploaded anywhere?
Background
The Kordu HTML Encoder/Decoder converts between plain text and HTML entity notation in both directions. Encoding replaces characters that have special meaning in HTML markup with their safe entity equivalents so that user-supplied or untrusted text can be injected into an HTML document without breaking the page or opening a cross-site scripting (XSS) vulnerability. Decoding reverses the process — converting HTML entities scraped from feeds, APIs, datasets, or AI outputs back to the original characters so the content is human-readable again.
Why encode HTML in the first place
HTML is a markup language, which means a handful of characters carry structural
meaning. The moment any of those characters appear in text that the browser
parses as HTML, the parser treats them as part of the markup rather than as
data. An unescaped < starts a tag, an unescaped & can begin an entity
reference, and an unescaped " can close an attribute and let an attacker
inject arbitrary attributes or event handlers. Three concrete reasons to
encode:
- XSS prevention. Reflecting untrusted input (URL parameters, form submissions, database fields, webhook payloads) directly into HTML is the classic XSS vector. Encoding turns <script> into the inert literal text &lt;script&gt; so the browser renders it as characters rather than executing it. This mitigates stored XSS (attacker payload persisted in the database), reflected XSS (payload echoed back from a query string), and DOM-based XSS (payload injected client-side via innerHTML or similar).
- Character display. When you want the literal characters <div> to appear on the page as text (inside a tutorial, code snippet, or documentation block), you must encode them so the browser shows them instead of parsing them as a tag.
- Data integrity. Round-tripping content through HTML without encoding silently mutates ampersands, quotes, and angle brackets. Encoding preserves the original bytes across serialization boundaries.
The five minimum-safe characters
For HTML element content and attribute values, the OWASP HTML escape rule and the WHATWG HTML spec agree on five characters that must always be encoded:
- & becomes &amp;: the ampersand starts every entity reference, so it has to be escaped first (before any other character) to avoid double-encoding.
- < becomes &lt;: opens a tag; unescaped < can close surrounding elements or start an attacker-controlled tag.
- > becomes &gt;: strictly speaking not required everywhere, but encoding it makes parsing deterministic and avoids edge cases in legacy parsers.
- " becomes &quot;: closes double-quoted attribute values; unescaped " inside an attribute can terminate the attribute and inject new ones.
- ' becomes &#39; (or &apos; in XHTML/XML): closes single-quoted attribute values. Most encoders emit the numeric form &#39; because &apos; is not a valid named entity in HTML 4 and some older browsers treat it inconsistently.
The Kordu encoder emits &#39; for the apostrophe to maximize compatibility
across HTML 4, HTML 5, and XML consumers.
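The five-character rule can be sketched in a few lines. This is a minimal illustration in Python, not the Kordu implementation; the name encode_html is hypothetical, and production code should still use a maintained library.

```python
# Minimal sketch of the five-character rule (illustrative only).
# The ampersand pair comes first so that the "&" inside entities
# produced by later replacements is never encoded a second time.
_REPLACEMENTS = (
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#39;"),  # numeric form, as described above
)


def encode_html(text: str) -> str:
    """Encode the five HTML-special characters in order."""
    for char, entity in _REPLACEMENTS:
        text = text.replace(char, entity)
    return text


print(encode_html('<a title="Tom & Jerry\'s">'))
# prints: &lt;a title=&quot;Tom &amp; Jerry&#39;s&quot;&gt;
```

Moving the ampersand pair to the end of the table would turn &lt; into &amp;lt;, which is the double-encoding the ordering rule is meant to prevent.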
Entity families — named, decimal, hex
HTML supports three notations for referencing characters, and all three decode to the same underlying code point:
- Named entities: human-readable labels like &amp;, &lt;, &nbsp; (non-breaking space), &copy; (©), &trade; (™), &mdash; (—), &hellip; (…), &larr; (←). The WHATWG HTML Living Standard defines a list of 2,125+ named character references, ranging from Latin supplements to mathematical operators and arrows.
- Decimal numeric references: &# followed by the decimal code point and a semicolon: &#160; for &nbsp;, &#169; for ©, &#8212; for —. Every Unicode code point can be referenced this way, including emoji and characters outside the Basic Multilingual Plane.
- Hexadecimal numeric references: &#x followed by the hex code point and a semicolon: &#xA0; is the same non-breaking space as &nbsp; and &#160;. Hex is often more compact for high code points and aligns with how Unicode tables are typically published.
All three resolve to the same character. The Kordu decoder accepts every form:
a document containing &copy;, &#169;, and &#xA9; will decode all three
to © identically.
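The equivalence of the three notations is easy to check. The sketch below uses html.unescape from Python's standard library as a stand-in decoder; it is an assumption for illustration and says nothing about how the Kordu decoder is built.

```python
import html

# Named, decimal, and hexadecimal references for the same code point (U+00A9).
forms = ["&copy;", "&#169;", "&#xA9;"]
decoded = [html.unescape(form) for form in forms]

print(decoded)            # ['©', '©', '©']
print(len(set(decoded)))  # 1, because all three notations yield one character
```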
Context-sensitive escaping
This is the most commonly overlooked aspect of safe HTML output. Different contexts require different escaping rules, and applying the wrong rule can silently leave you vulnerable even when the output looks encoded:
- Element content (<p>HERE</p>): the five-character rule is sufficient.
- Attribute value (<a title="HERE">): also needs the five-character rule, but quoting the attribute is mandatory. An unquoted attribute value opens a much larger escape surface (whitespace, =, backticks, and more can end the attribute).
- JavaScript string (<script>var x = "HERE";</script>): HTML encoding alone is not safe. You must also escape JavaScript string terminators (\, ", ', <, >, /, newline) using JS-escape rules, then HTML-escape the result as well if it sits inside an event handler attribute. The correct production approach is a templating engine that understands the JS context, not manual concatenation.
- URL parameter (<a href="/page?q=HERE">): use percent-encoding (encodeURIComponent) first, then HTML-encode the whole URL for attribute placement. HTML-encoding alone leaves ?, &, =, and # active as URL separators, which breaks the link and can let an attacker inject additional parameters.
- CSS string (<div style="background: url('HERE')">): needs CSS escaping rules (backslash-hex notation) plus HTML escaping at the attribute boundary.
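To make the JavaScript-string case concrete, here is a hedged sketch of one common approach: serialize the value as a JSON string literal, then replace <, >, and & with \u escapes so the literal cannot close the surrounding script element. The helper name js_string_literal is hypothetical, and the sketch assumes the value lands inside a script block rather than an event handler attribute.

```python
import json


def js_string_literal(value: str) -> str:
    """Return a double-quoted JavaScript string literal for use in a script block."""
    # json.dumps escapes quotes, backslashes, newlines, and (by default)
    # every non-ASCII character, including U+2028 and U+2029.
    literal = json.dumps(value)
    # Replace the HTML-significant characters with \u escapes so that a
    # sequence like "</script>" never appears in the page source.
    for char, escape in (("<", "\\u003c"), (">", "\\u003e"), ("&", "\\u0026")):
        literal = literal.replace(char, escape)
    return literal


payload = '</script><script>alert("x")</script>'
print("var x = " + js_string_literal(payload) + ";")
```

The JavaScript engine turns the \u escapes back into the original characters at run time, so the value is unchanged while the HTML parser never sees a tag. A context-aware templating engine remains the safer choice for production.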
Use a trusted library on the server
For production server code that handles untrusted input, do not hand-roll
encoding. Use a maintained library: OWASP Java Encoder, DOMPurify (for
sanitizing rather than escaping), html-escaper or he in Node.js, Django
or Jinja auto-escaping, Rails h(), Go's html/template. Each of these covers
a specific set of output contexts, so match the library to the context as
described in the OWASP XSS Prevention Cheat Sheet. Client-side encoding on
its own is not a security boundary: an attacker controlling the client can
simply skip it.
Treat the Kordu tool as a utility for content preparation, inspection, and
debugging — not as your last line of defense against XSS.
Dangerous characters beyond the five
In specialised contexts, more characters matter. Backticks can terminate
unquoted attribute values in old IE. Forward slashes close self-closing tags.
Parentheses matter inside JavaScript or CSS url(). Control characters and
non-printing Unicode (bidi overrides, zero-width joiners) can hide payloads.
If you're building a sanitizer rather than an encoder, reach for DOMPurify or
a server-side HTML sanitizer that strips dangerous tags and attributes
entirely.
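As a rough illustration of the hidden-character point, the sketch below flags format and control characters by Unicode general category. The function name find_hidden_characters is hypothetical, and the category test is a simplifying assumption rather than a complete detection rule.

```python
import unicodedata


def find_hidden_characters(text: str) -> list[tuple[int, str, str]]:
    """Report (index, code point, name) for invisible or control characters."""
    hits = []
    for index, char in enumerate(text):
        category = unicodedata.category(char)
        # "Cf" covers bidi overrides and zero-width joiners; "Cc" covers
        # control characters. Tab, LF, and CR are allowed as ordinary whitespace.
        if category == "Cf" or (category == "Cc" and char not in "\t\n\r"):
            name = unicodedata.name(char, "UNNAMED")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits


print(find_hidden_characters("ab\u202ecd\u200d"))
# [(2, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE'), (5, 'U+200D', 'ZERO WIDTH JOINER')]
```

The function only reports positions. Zero-width joiners are legitimate inside many emoji sequences, so stripping every hit automatically would damage valid text.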
Decoding scraped content and feeds
RSS feeds, JSON APIs, scraped web pages, and AI model outputs often contain
HTML entities, sometimes double-encoded (&amp;amp;, meaning a literal
&amp; was itself encoded). If one decode pass still leaves visible &amp;
in your output, decode a second time. Some producers triple-encode; decode
until the output stabilizes. Plain text passes through untouched: running the
Kordu decoder on text that contains no entities returns the text unchanged.
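The "decode until the output stabilizes" advice can be sketched as a bounded loop. This is an illustration using Python's html.unescape as a stand-in decoder; the helper name decode_until_stable and the max_passes cap are assumptions, not features of the Kordu tool.

```python
import html


def decode_until_stable(text: str, max_passes: int = 5) -> tuple[str, int]:
    """Decode repeatedly until nothing changes; return (result, passes that changed it)."""
    passes = 0
    while passes < max_passes:
        decoded = html.unescape(text)
        if decoded == text:  # no entities left to resolve
            break
        text = decoded
        passes += 1
    return text, passes


print(decode_until_stable("Tom &amp;amp; Jerry"))  # ('Tom & Jerry', 2)
print(decode_until_stable("no entities here"))     # ('no entities here', 0)
```

Repeated decoding is only appropriate when the source is known to be over-encoded. Text that is meant to show a literal &lt;, such as documentation, will lose it after the extra pass.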
Unicode and numeric character references
Any Unicode code point can be expressed as a numeric reference, which is how
legacy systems transport characters they cannot emit directly. &#x263A;
decodes to ☺, &#128512; decodes to 😀, and &#20013;&#25991; decodes
to 中文. The Kordu decoder handles the full Unicode range including
supplementary planes and emoji.
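A short sketch of the round trip, assuming Python for illustration: the hypothetical helper to_numeric_references rewrites every non-ASCII character as a numeric reference, and html.unescape from the standard library reverses it.

```python
import html


def to_numeric_references(text: str, use_hex: bool = False) -> str:
    """Rewrite every non-ASCII character as a decimal or hex numeric reference."""
    parts = []
    for char in text:
        code_point = ord(char)
        if code_point < 128:
            parts.append(char)  # ASCII passes through unchanged
        elif use_hex:
            parts.append(f"&#x{code_point:X};")
        else:
            parts.append(f"&#{code_point};")
    return "".join(parts)


print(to_numeric_references("☺"))                # &#9786;
print(to_numeric_references("☺", use_hex=True))  # &#x263A;
print(to_numeric_references("😀"))               # &#128512;
print(html.unescape(to_numeric_references("中文")) == "中文")  # True
```

This helper deliberately leaves ASCII untouched, so it does not encode the five special characters; it only shows how code points map to references.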
Common mistakes
- Encoding already-encoded content, producing &amp;amp; in your output. Always decode first if you're not sure of input state.
- Forgetting to encode attribute values: the < rule alone is not enough if attacker input lands inside title="…" unquoted.
- Relying on client-side escaping for security. Escape on the server, at the template boundary, where the output is actually rendered.
- Confusing HTML encoding with URL encoding. They use different rules and serve different purposes: %20 is a URL-encoded space and &nbsp; is a non-breaking space entity; they are not interchangeable.
HTML encoding vs URL encoding
They look similar but apply in different layers. URL encoding (percent-
encoding) makes bytes safe inside a URL — spaces become %20, & becomes
%26. HTML encoding makes characters safe inside HTML markup — spaces stay
spaces, & becomes &amp;. A tagged URL destined for an href attribute
needs percent-encoding on the query values and HTML-encoding on the whole
string when it's placed into the attribute.
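The two layers can be sketched together. In this Python illustration, urllib.parse.quote with safe="" roughly plays the role of encodeURIComponent, and html.escape handles the attribute layer; the helper name build_href and the example parameters are hypothetical.

```python
from html import escape
from urllib.parse import quote


def build_href(path: str, params: dict[str, str]) -> str:
    """Percent-encode each query component, then HTML-encode the whole URL."""
    query = "&".join(
        f"{quote(key, safe='')}={quote(value, safe='')}"
        for key, value in params.items()
    )
    # The "&" separators between parameters become &amp; at the HTML layer.
    return escape(f"{path}?{query}", quote=True)


href = build_href("/page", {"q": "fish & chips", "lang": "en"})
print(f'<a href="{href}">search</a>')
# <a href="/page?q=fish%20%26%20chips&amp;lang=en">search</a>
```

The ampersand inside the value becomes %26 at the URL layer, while the ampersand separating parameters becomes &amp; at the HTML layer. The browser undoes the HTML layer first, then the server undoes the URL layer.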
Privacy and data processing
All encoding and decoding runs entirely in your browser. The Kordu HTML Encoder/Decoder is pure client-side JavaScript — there is no backend request, no logging, and no storage. Your text never leaves your device, which means the tool works offline once the page is loaded and is safe to use for proprietary content, snippets from private codebases, or internal data.