Zero-Width Character Steganography

Introduction

You can hide a complete paragraph inside a single sentence without changing how that sentence looks. Four zero-width Unicode characters, U+200B (Zero Width Space), U+200C (Zero Width Non-Joiner), U+200D (Zero Width Joiner), and U+FEFF (Zero Width No-Break Space), are rendered as nothing by most browsers, word processors, and terminals. This tool uses those four code points as a base-4 alphabet to encode any text as invisible sequences interspersed throughout ordinary prose. The carrier text looks and prints normally; the payload is recovered only by running it through the decoder. All processing happens in your browser with no server contact.

What this tool does

Encodes any UTF-8 text into sequences of four Unicode zero-width characters (U+200B, U+200C, U+200D, U+FEFF), each representing one base-4 digit.
Interleaves the invisible encoded payload into a visible carrier string so the combined output looks and copies normally.
Decodes any text containing zero-width characters back to the original hidden message.
Shows an optional 'inspector' view that highlights zero-width characters in the output so you can verify encoding.
Runs entirely client-side — no data is transmitted over the network.
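
The inspector idea can be sketched in a few lines. This is an illustrative Python sketch, not the tool's own code: the label strings and the function names `reveal` and `count_hidden` are assumptions made for the example.

```python
# Short labels for the four code points this tool uses. The labels are
# illustrative choices, not taken from the tool itself.
LABELS = {
    "\u200b": "[ZWSP]",
    "\u200c": "[ZWNJ]",
    "\u200d": "[ZWJ]",
    "\ufeff": "[ZWNBSP]",
}

def reveal(text: str) -> str:
    """Replace each zero-width character with a visible label."""
    return "".join(LABELS.get(ch, ch) for ch in text)

def count_hidden(text: str) -> int:
    """Count how many of the four zero-width characters the text contains."""
    return sum(1 for ch in text if ch in LABELS)

sample = "Hi\u200c\u200b there\ufeff"
print(reveal(sample))        # Hi[ZWNJ][ZWSP] there[ZWNBSP]
print(count_hidden(sample))  # 3
```

A count of zero on text you expected to carry a payload is a quick sign that some step in the channel stripped it.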

How this tool works

Each character in the secret message is converted to its Unicode code point, then expressed as a base-4 number. Because base-4 digits only run from 0 to 3, each digit maps to one of the four zero-width characters: '0' → U+200B, '1' → U+200C, '2' → U+200D, '3' → U+FEFF. Four base-4 digits cover 4^4 = 256 values, so a four-digit group can represent code points up to U+00FF (Latin-1); higher code points need more digits per character.
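
As a worked example of that mapping, here is a minimal Python sketch. The digit-to-character table comes from the paragraph above; the function names, the most-significant-digit-first order, and the choice to raise an error when a code point does not fit the chosen width are assumptions for illustration.

```python
# Digit-to-character table as described in the text: index = base-4 digit.
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d", "\ufeff"]

def to_base4(code_point: int, width: int = 4) -> list[int]:
    """Return exactly `width` base-4 digits, most significant first."""
    if not 0 <= code_point < 4 ** width:
        raise ValueError(
            f"U+{code_point:04X} does not fit in {width} base-4 digits"
        )
    digits = []
    for _ in range(width):
        digits.append(code_point % 4)   # peel off the lowest digit
        code_point //= 4
    return digits[::-1]                 # reverse to most-significant-first

def encode_char(ch: str, width: int = 4) -> str:
    """Map one character to its run of zero-width characters."""
    return "".join(ZERO_WIDTH[d] for d in to_base4(ord(ch), width))

# 'A' is code point 65 = 1*64 + 0*16 + 0*4 + 1
print(to_base4(ord("A")))                             # [1, 0, 0, 1]
print([f"U+{ord(c):04X}" for c in encode_char("A")])  # ['U+200C', 'U+200B', 'U+200B', 'U+200C']
print(to_base4(0x20AC, width=8))                      # [0, 2, 0, 0, 2, 2, 3, 0]
```

With the default width of four, `to_base4(0x20AC)` raises `ValueError`, which is the Latin-1 limit in practice: the euro sign needs a wider group.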

The encoder then walks through the carrier text and inserts these invisible sequences between the visible characters. The decoder does the reverse: it scans the input for the four zero-width code points, skips everything else, accumulates the base-4 digits in groups of four (the group size that covers code points up to U+00FF), converts each group back to a code point, and reassembles the original message. The visible carrier text is ignored during decoding and left unchanged.
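
The full round trip might look like the following Python sketch. It is not the tool's implementation and rests on two assumptions: the secret is encoded as UTF-8 bytes with one four-digit group per byte (so that fixed groups of four work for any text), and one group is placed after each visible carrier character, with any leftover groups appended at the end. All function names are hypothetical.

```python
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d", "\ufeff"]
DIGIT_OF = {ch: digit for digit, ch in enumerate(ZERO_WIDTH)}

def encode_groups(secret: str) -> list[str]:
    """One run of four zero-width characters per UTF-8 byte (assumption)."""
    groups = []
    for byte in secret.encode("utf-8"):
        digits = [(byte >> shift) & 3 for shift in (6, 4, 2, 0)]
        groups.append("".join(ZERO_WIDTH[d] for d in digits))
    return groups

def embed(carrier: str, secret: str) -> str:
    """Place one group after each visible character; append any leftovers."""
    groups = encode_groups(secret)
    out = []
    for i, visible in enumerate(carrier):
        out.append(visible)
        if i < len(groups):
            out.append(groups[i])
    out.extend(groups[len(carrier):])   # payload longer than the carrier
    return "".join(out)

def extract(text: str) -> str:
    """Collect zero-width digits, regroup in fours, rebuild the bytes."""
    digits = [DIGIT_OF[ch] for ch in text if ch in DIGIT_OF]
    usable = len(digits) - len(digits) % 4   # drop an incomplete final group
    data = bytearray()
    for i in range(0, usable, 4):
        a, b, c, d = digits[i:i + 4]
        data.append(a << 6 | b << 4 | c << 2 | d)
    return data.decode("utf-8", errors="replace")

def visible_only(text: str) -> str:
    """The carrier as a reader sees it."""
    return "".join(ch for ch in text if ch not in DIGIT_OF)

stego = embed("Hello, world", "hi ✓")
print(visible_only(stego))   # Hello, world
print(extract(stego))        # hi ✓
print(len(stego))            # 36
```

One consequence of this design: a carrier that already contains any of the four code points, such as a compound emoji joined with U+200D, would be misread as payload, so a real encoder would need to reject or escape such carriers.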

How the cipher or encoding works

Zero-width characters entered the Unicode standard for legitimate typographic and linguistic purposes. U+200B (Zero Width Space) was introduced to indicate line-break opportunities in writing systems that lack word spaces, such as Thai and Khmer. U+200C (Zero Width Non-Joiner) prevents cursive joining in Arabic and Indic scripts. U+200D (Zero Width Joiner) forces joining; it is what connects the components of compound emoji like "👨‍👩‍👧" (family). U+FEFF is the Byte Order Mark, used as a Unicode signature at the start of files; its older use inside text as a zero-width no-break space is deprecated in favour of U+2060 (Word Joiner).

As covert channels, zero-width characters have been studied extensively. A 2019 paper by Rizzo et al. (DOI: 10.1145/3368089.3417046) analysed Unicode-based text watermarking for authorship attribution. Because these characters have no visible glyph, human readers cannot see them, and because they are ordinary code points in the text stream, they survive copy-paste in most environments. They typically go unnoticed in word counts, spell checkers, and search engine snippets, which makes them effective for both watermarking and covert messaging.

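Compared to whitespace steganography (extra spaces/tabs), zero-width characters are harder to strip accidentally: Unicode does not classify them as whitespace, so typical trimming and whitespace-collapsing routines skip them, and the normalisation forms NFC and NFKC leave them intact. One exception is worth knowing: JavaScript treats U+FEFF as whitespace, so \s matches it and String.prototype.trim() removes it from the ends of a string. Dedicated Unicode sanitisers and some messaging platforms (WhatsApp, for example) also remove zero-width characters aggressively, so survivability must be verified per channel.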
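
The difference between accidental and deliberate stripping is easy to check. The sketch below uses Python's standard library only; `strip_zero_width` is a hypothetical, deliberately blunt sanitiser written for the example, and the results apply to Python's string handling, not to every platform.

```python
import unicodedata

ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"

def strip_zero_width(text: str) -> str:
    """A blunt sanitiser: delete all four code points wherever they occur."""
    return text.translate({ord(ch): None for ch in ZERO_WIDTH})

sample = "\u200bpay\u200cload\u200d text\ufeff"

# Python does not treat any of the four as whitespace...
print(any(ch.isspace() for ch in ZERO_WIDTH))           # False
# ...so strip() leaves the leading and trailing ones in place.
print(sample.strip() == sample)                         # True
# NFKC normalisation keeps them as well.
print(unicodedata.normalize("NFKC", sample) == sample)  # True
# Only the dedicated sanitiser removes them.
print(strip_zero_width(sample))                         # payload text
```

A sanitiser this blunt also breaks legitimate uses, such as compound emoji and non-joiners in Persian text, which is one reason many pipelines do not apply it.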

How to use this tool

Enter the visible carrier text — the public-facing message that will hide the payload.
Enter the secret message you want to conceal.
Click 'Encode'. The output box contains your carrier text with zero-width characters woven through it.
Copy the encoded output and send it via your chosen channel (email, chat, document).
To recover the secret, paste the output into the 'Decode' tab and click 'Decode'. The hidden message appears instantly.

Real-world examples

Invisible authorship watermark

A technical writer encodes their employee ID ("E-4821") into every document they export. The visible text is a standard product spec sheet. If the document leaks, they paste any paragraph from it into the decoder and confirm "E-4821" — establishing a chain of custody without any visible footer, watermark image, or metadata that could be stripped.
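
For any single paragraph to reveal the ID, the mark has to be present in every paragraph. The following Python sketch shows one way to do that. It assumes paragraphs are separated by blank lines, that the mark is encoded as UTF-8 bytes with four zero-width characters per byte, and that the whole mark is inserted as one run after the first visible character of each paragraph. The function names are hypothetical.

```python
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d", "\ufeff"]
DIGIT_OF = {ch: digit for digit, ch in enumerate(ZERO_WIDTH)}

def to_invisible(mark: str) -> str:
    """Four zero-width characters per UTF-8 byte of the mark (assumption)."""
    out = []
    for byte in mark.encode("utf-8"):
        out.extend(ZERO_WIDTH[(byte >> shift) & 3] for shift in (6, 4, 2, 0))
    return "".join(out)

def watermark(document: str, mark: str) -> str:
    """Insert the invisible mark after the first character of each paragraph."""
    hidden = to_invisible(mark)
    marked = []
    for para in document.split("\n\n"):
        marked.append(para[:1] + hidden + para[1:] if para else para)
    return "\n\n".join(marked)

def read_mark(fragment: str) -> str:
    """Decode whatever zero-width payload a pasted fragment contains."""
    digits = [DIGIT_OF[ch] for ch in fragment if ch in DIGIT_OF]
    data = bytearray()
    for i in range(0, len(digits) - len(digits) % 4, 4):
        a, b, c, d = digits[i:i + 4]
        data.append(a << 6 | b << 4 | c << 2 | d)
    return data.decode("utf-8", errors="replace")

spec = "Operating voltage: 5 V.\n\nMaximum current: 2 A."
leaked_paragraph = watermark(spec, "E-4821").split("\n\n")[1]
print(read_mark(leaked_paragraph))   # E-4821
```

Pasting several paragraphs at once returns the mark once per paragraph, so a real decoder might deduplicate repeated marks.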

Proof-of-knowledge challenge

A developer wants to prove ownership of a domain without adding a DNS TXT record. They post a public article on their site whose first paragraph encodes the string "ownership-token-7a3b". A third-party verifier copies the article text, runs it through the decoder, and confirms the token — a zero-visible-change proof-of-control suitable for lightweight domain verification flows.

Covert channel detection in a security audit

A penetration tester checks whether an internal document management system strips non-standard Unicode before storing user input. They paste a string containing U+200D into a text field and retrieve the stored value via the API. If the zero-width characters survive, they report a potential covert channel: an insider could exfiltrate identifiers through document metadata without triggering DLP rules that only inspect visible character content.
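
The comparison step of such an audit could be scripted as in this Python sketch. The submit-and-retrieve step depends entirely on the system under test, so it is replaced here by two stand-in functions (`store_verbatim` and `store_without_joiners`), both hypothetical; the probe string and report format are likewise assumptions.

```python
PROBES = {
    "\u200b": "U+200B",
    "\u200c": "U+200C",
    "\u200d": "U+200D",
    "\ufeff": "U+FEFF",
}

def build_probe() -> str:
    """Visible letters with one of each zero-width character between them."""
    return "a\u200bb\u200cc\u200dd\ufeffe"

def survival_report(sent: str, stored: str) -> dict[str, bool]:
    """For each probe character sent, report whether every copy survived."""
    return {
        name: stored.count(ch) == sent.count(ch)
        for ch, name in PROBES.items()
        if ch in sent
    }

# Hypothetical stand-ins for "submit the value, then read it back via the API".
def store_verbatim(value: str) -> str:
    return value

def store_without_joiners(value: str) -> str:
    return value.replace("\u200c", "").replace("\u200d", "")

probe = build_probe()
print(survival_report(probe, store_verbatim(probe)))
# {'U+200B': True, 'U+200C': True, 'U+200D': True, 'U+FEFF': True}
print(survival_report(probe, store_without_joiners(probe)))
# {'U+200B': True, 'U+200C': False, 'U+200D': False, 'U+FEFF': True}
```

Testing all four code points rather than U+200D alone matters because partial filtering is common: a system that removes joiners may still pass U+200B and U+FEFF, which leaves a usable two-symbol channel.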

Comparison with similar methods

Method	Encoding density	Typical use
Zero-width characters (this tool)	4-symbol alphabet; 2 bits per inserted character	Document watermarking, covert identifiers in prose
Emoji variation selectors	2-symbol alphabet; 1 bit per selector	Emoji-carrier strings; higher capacity per carrier
Whitespace steganography	1 bit per space/tab	Source code comments, plain text files
Image LSB steganography	1 bit per pixel channel	Large payloads hidden in image files

Limitations or considerations

Some platforms strip zero-width characters entirely: WhatsApp, many CMS sanitisers, and some Markdown renderers remove them on input. Code points above U+00FF (most non-Latin scripts) require more than four base-4 digits per character, which reduces capacity. Forensic Unicode analysis tools like Homoglyph and Unicode Inspector will flag zero-width characters immediately, so this technique provides obscurity, not security. Always encrypt the payload before encoding if confidentiality is required.
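
The capacity cost of higher code points is simple arithmetic: a code point needs the smallest number of base-4 digits n such that 4^n exceeds it. The Python sketch below computes that. The second function assumes every character in a message is padded to the width of its widest character, which is one possible scheme and not necessarily the tool's; both function names are hypothetical.

```python
def digits_needed(code_point: int) -> int:
    """Smallest n with 4**n > code_point (at least one digit)."""
    # Each base-4 digit carries two bits, so round the bit length up to pairs.
    return max(1, (code_point.bit_length() + 1) // 2)

def invisible_cost(secret: str) -> int:
    """Zero-width characters needed if every character is padded to the
    width of the widest one in the message (an assumed scheme)."""
    if not secret:
        return 0
    width = max(digits_needed(ord(ch)) for ch in secret)
    return width * len(secret)

for ch in ("A", "ÿ", "Ā", "€", "😀"):
    print(f"U+{ord(ch):04X}", digits_needed(ord(ch)))
# U+0041 4
# U+00FF 4
# U+0100 5
# U+20AC 7
# U+1F600 9

print(invisible_cost("E-4821"))   # 24
print(invisible_cost("€5"))       # 14
```

Under this scheme a single wide character raises the cost of every character in the message, which is why Latin-1 payloads such as short IDs and tokens are the cheapest to hide.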

Frequently asked questions

Conclusion

Zero-width steganography is one of the cleanest ways to mark digital text without altering its appearance. The base-4 encoding over four Unicode code points is compact and survives most copy-paste workflows. Use it for lightweight watermarking, covert identifiers, and steganographic puzzles — but test your target channel first, and always pair with encryption when the payload must stay private. For emoji-based channels, see the Emoji Steganography tool.
