Inspect each character's code point, UTF-8/UTF-16 encoding, Unicode category, script, and HTML entity — entirely in your browser.
unicode is a per-character breakdown tool for any text. Paste or type text and instantly see a detailed card for every Unicode code point: its hex code point, decimal value, raw bytes in UTF-8 and UTF-16, Unicode general category, script, and HTML entity. All processing happens locally — no text is sent to any server.
This is useful any time you need to debug encoding issues, understand how a character is represented in memory, or inspect emoji and multi-byte characters.
Each Unicode code point gets its own card with eight fields:
Code Point The Unicode code point in U+XXXX format (e.g. U+0041 for A, U+1F600 for 😀).
Decimal The code point expressed as a decimal integer (e.g. 65 for A).
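As a rough illustration of the first two fields, the sketch below derives the U+XXXX label and the decimal value from a character. The helper name formatCodePoint is hypothetical and not taken from the tool's source.

```javascript
// Hypothetical helper: format a code point as U+XXXX, padded to at least
// four uppercase hex digits (supplementary characters use five or six).
function formatCodePoint(cp) {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

const letterA = "A".codePointAt(0);
console.log(formatCodePoint(letterA), letterA); // U+0041 65

const grin = "😀".codePointAt(0);
console.log(formatCodePoint(grin), grin); // U+1F600 128512
```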
UTF-8 The byte sequence used to encode this code point in UTF-8, shown as space-separated uppercase hex bytes (e.g. 41 for ASCII characters, F0 9F 98 80 for 😀). ASCII characters are single-byte; characters above U+007F require 2–4 bytes.
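One way to produce this field in a browser is the standard TextEncoder API, which always encodes to UTF-8. This is a minimal sketch; the function name utf8Hex is an assumption, and the tool may compute the bytes differently.

```javascript
// Hypothetical helper: UTF-8 bytes of one code point as uppercase hex.
// Note: TextEncoder replaces lone surrogates (U+D800–U+DFFF) with U+FFFD.
function utf8Hex(cp) {
  const bytes = new TextEncoder().encode(String.fromCodePoint(cp));
  return Array.from(bytes, (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

console.log(utf8Hex(0x41));    // 41          (1 byte, ASCII)
console.log(utf8Hex(0xe9));    // C3 A9       (2 bytes)
console.log(utf8Hex(0x20ac));  // E2 82 AC    (3 bytes)
console.log(utf8Hex(0x1f600)); // F0 9F 98 80 (4 bytes)
```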
UTF-16 The code unit(s) used in UTF-16 encoding, shown as space-separated uppercase hex values. Characters in the Basic Multilingual Plane (U+0000–U+FFFF) are a single 4-digit code unit. Characters above U+FFFF (supplementary characters, many emoji) are represented as a surrogate pair — two 4-digit code units (e.g. D83D DE00 for 😀).
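The surrogate-pair arithmetic can be sketched directly: subtract 0x10000, then split the remaining 20 bits into a high and a low half. The function name utf16Hex is hypothetical.

```javascript
// Hypothetical helper: UTF-16 code unit(s) of one code point as
// space-separated, 4-digit uppercase hex values.
function utf16Hex(cp) {
  const hex = (unit) => unit.toString(16).toUpperCase().padStart(4, "0");
  if (cp <= 0xffff) {
    return hex(cp); // BMP: a single code unit
  }
  const offset = cp - 0x10000;           // 20 bits remain
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return hex(high) + " " + hex(low);
}

console.log(utf16Hex(0x41));    // 0041
console.log(utf16Hex(0x1f600)); // D83D DE00
```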
Category The Unicode general category, which classifies what kind of character it is. Examples:
Letter, Uppercase (Lu): A, Z
Letter, Lowercase (Ll): a, z
Number, Decimal (Nd): 0–9
Punctuation, Other (Po): ., !, ?
Symbol, Other (So): most emoji
Separator, Space (Zs): space
Other, Control (Cc): tab, newline
Script The Unicode script the character belongs to. Detected scripts include: Latin, Greek, Cyrillic, Han, Hiragana, Katakana, Arabic, Hebrew, Devanagari, Bengali, Thai, Hangul, Georgian, Armenian, Ethiopic, Common, and Emoji. Characters not matching a known script show Unknown.
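The tool's detection method is not documented here. One plausible approach, sketched below, uses JavaScript's Unicode property escapes (\p{...} with the u flag). The function names, the order of checks, and the use of Extended_Pictographic to stand in for the "Emoji" label are all assumptions.

```javascript
// Two-letter general category abbreviations defined by Unicode.
const CATEGORIES = [
  "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl", "No",
  "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sm", "Sc", "Sk", "So",
  "Zs", "Zl", "Zp", "Cc", "Cf", "Cs", "Co", "Cn",
].map((gc) => [gc, new RegExp(`^\\p{General_Category=${gc}}$`, "u")]);

// Scripts named in the text above (Emoji is handled separately, since it
// is a character property rather than a Unicode script).
const SCRIPTS = [
  "Latin", "Greek", "Cyrillic", "Han", "Hiragana", "Katakana", "Arabic",
  "Hebrew", "Devanagari", "Bengali", "Thai", "Hangul", "Georgian",
  "Armenian", "Ethiopic", "Common",
].map((name) => [name, new RegExp(`^\\p{Script=${name}}$`, "u")]);

// Hypothetical helper: general category of a single-code-point string.
function generalCategory(ch) {
  const hit = CATEGORIES.find(([, re]) => re.test(ch));
  return hit ? hit[0] : "Cn";
}

// Hypothetical helper: script label, checking the emoji property first
// (assumed order) because most emoji have Script=Common.
function detectScript(ch) {
  if (/^\p{Extended_Pictographic}$/u.test(ch)) return "Emoji";
  const hit = SCRIPTS.find(([, re]) => re.test(ch));
  return hit ? hit[0] : "Unknown";
}

console.log(generalCategory("A"), detectScript("A"));   // Lu Latin
console.log(generalCategory("😀"), detectScript("😀")); // So Emoji
console.log(generalCategory(" "), detectScript(" "));   // Zs Common
```

Checking the emoji property before the script list is a design choice in this sketch: without it, 😀 would be reported as Common.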
HTML Entity The HTML entity representation. Common characters have named entities (e.g. &amp;, &lt;, &copy;). All other characters use the hexadecimal numeric form &#xXXXX; (e.g. &#x41; for A).
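A minimal sketch of that lookup-then-fallback rule follows. The table holds only a handful of named entities for illustration (the tool's actual table is not specified), and the unpadded hex form is an assumption.

```javascript
// Illustrative subset of named entities, keyed by code point.
const NAMED_ENTITIES = new Map([
  [0x26, "&amp;"],
  [0x3c, "&lt;"],
  [0x3e, "&gt;"],
  [0x22, "&quot;"],
  [0xa9, "&copy;"],
]);

// Hypothetical helper: named entity if known, else hex numeric reference.
function htmlEntity(cp) {
  return NAMED_ENTITIES.get(cp) ?? `&#x${cp.toString(16).toUpperCase()};`;
}

console.log(htmlEntity(0x26));    // &amp;
console.log(htmlEntity(0x41));    // &#x41;
console.log(htmlEntity(0x1f600)); // &#x1F600;
```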
Text is iterated by Unicode code point, not by JavaScript string index. This matters for characters above U+FFFF, which occupy two string indices (a surrogate pair) but still get a single card.
It also matters for accented characters: é can be either a precomposed code point (U+00E9) or a base letter e plus a combining accent (U+0065 + U+0301). Both forms are shown faithfully.
For performance, the inspector displays a maximum of 512 code points. If your input exceeds this, a notice appears below the text area showing how many total code points were found. Only the first 512 are rendered.
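The difference between string indices and code points, and the 512-card cap, can be sketched as follows. JavaScript's string iterator walks code points, so Array.from never splits a surrogate pair. The names MAX_CARDS and inspect are hypothetical.

```javascript
const MAX_CARDS = 512; // limit stated above

// Hypothetical helper: split text into code points and apply the cap.
function inspect(text, limit = MAX_CARDS) {
  const codePoints = Array.from(text, (ch) => ch.codePointAt(0));
  return { total: codePoints.length, shown: codePoints.slice(0, limit) };
}

console.log("😀".length);               // 2 (UTF-16 code units)
console.log(inspect("😀").total);       // 1 (code point)

// Precomposed vs. decomposed é: one card vs. two cards.
console.log(inspect("\u00E9").total);   // 1
console.log(inspect("e\u0301").total);  // 2

// Input longer than the cap: all are counted, only the first 512 kept.
const long = inspect("a".repeat(600));
console.log(long.total, long.shown.length); // 600 512
```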
The URL updates live as you type — no button required. The query parameter used:
v: the text content, encoded as btoa(encodeURIComponent(text))
Share or bookmark the URL to return to the same inspection state.
/unicode?v=SGVsbG8%3D
Loads the text Hello and shows cards for each of its 5 code points.
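The encoding of the v parameter can be reproduced with the sketch below. The encodeURIComponent step matters because btoa only accepts characters up to U+00FF; percent-escaping first makes any text safe to Base64-encode. The helper names, and the choice to return the bare /unicode path for empty input, are assumptions.

```javascript
// Encode text for the v parameter, as described above.
function encodeState(text) {
  return btoa(encodeURIComponent(text));
}

// Reverse of encodeState.
function decodeState(v) {
  return decodeURIComponent(atob(v));
}

// Hypothetical helper: build the shareable path. The Base64 value is
// URL-escaped so that "=", "+", and "/" survive inside a query string.
function buildPath(text) {
  if (text === "") return "/unicode";
  return "/unicode?v=" + encodeURIComponent(encodeState(text));
}

// Hypothetical helper: recover the text from a path.
function readPath(path) {
  const query = path.split("?")[1] ?? "";
  const v = new URLSearchParams(query).get("v");
  return v === null ? "" : decodeState(v);
}

console.log(buildPath("Hello"));                // /unicode?v=SGVsbG8%3D
console.log(readPath("/unicode?v=SGVsbG8%3D")); // Hello
```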
Use the Copy button to copy the current URL to your clipboard.
Use the Reset button to clear the input and return to the bare /unicode path.