CrossSpeak

Imagine a writing layer designed for how modern AI reads. It looks different. It means the same. AI still understands. That’s CrossSpeak.

CrossSpeak is a Unicode homoglyph transformation system developed by Syhunt that re-expresses text using parallel characters from diverse scripts, preserving semantic meaning while altering its orthographic representation. Modern multilingual LLMs with robust Unicode generalization can often still interpret the transformed text.

With CrossSpeak encoding, you get semantic invariance: the meaning of the text remains intact even as its character layer undergoes a structural transformation. Characters from Greek, Cyrillic, Armenian, phonetic notation, mathematics, and historical alphabets reshape the surface of the text without altering intent. The result is text that:

Looks unfamiliar to ASCII-bound validators.
Remains readable to humans.
Stays semantically coherent for robust AI models.

Security Research Applications

CrossSpeak can be used by security researchers to evaluate how AI-integrated web applications handle orthographic variation. In particular, it helps analyze:

The interaction between Unicode normalization and LLM-driven transformations.
How AI-mediated rewriting or canonicalization may affect input validation.
The consistency of XSS defenses when model output is reintroduced into web contexts.

These evaluation scenarios are examined in detail in our paper on Cross-Model Scripting (XMS), which formalizes AI-mediated injection conditions arising from post-model transformations and inconsistent validation boundaries.

By revealing the gap between byte-level validation and semantic interpretation, CrossSpeak serves as a diagnostic instrument for studying the XMS class of vulnerabilities. It is not an exploitation framework, but a structured method for understanding how AI-driven processing can reshape security assumptions — and for strengthening defenses accordingly.

Why the Name CrossSpeak: CrossSpeak is named for what it enables, which is text that crosses orthographic and system boundaries without semantic loss.

It crosses scripts - moving between Latin, Greek, Cyrillic, Armenian, and beyond.
It crosses assumptions - from ASCII-bound validation to Unicode diversity.
It crosses systems - remaining interpretable by modern AI models even when rigid validators falter.

LLM-compatible ≠ universally compatible: Models with limited Unicode robustness or narrow tokenization assumptions may degrade, fragment, or respond unpredictably to CrossSpeak text.

Not a new language: CrossSpeak does not create a new language. It demonstrates that language can travel across representational layers while preserving semantic intent.

Not encryption: CrossSpeak does not conceal information. It re-encodes text at the character layer while preserving semantic content.

Example

As you can see, the resulting writing is:

recognizable to human readers
semantically intact for multilingual AI models
structurally foreign to systems that expect strict ASCII
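To make the last point concrete, here is a small Python sketch that runs the README's sample word `еոϲοɗеɗ` ("encoded") through two rigid, byte-level checks. The code points are written as escapes and are assumed to match the sample; `passes_ascii_validator` and `contains_keyword` are hypothetical stand-ins for ASCII-bound filters, not part of CrossSpeak.

```python
import re

# CrossSpeak-style spelling of "encoded"; these escapes are assumed to be the
# code points used in the README's sample (Cyrillic, Armenian, Greek, IPA letters).
SAMPLE = "\u0435\u0578\u03f2\u03bf\u0257\u0435\u0257"

# Hypothetical ASCII-bound validator: accepts printable ASCII only.
PRINTABLE_ASCII = re.compile(r"[\x20-\x7e]*")


def passes_ascii_validator(text: str) -> bool:
    """Return True if every character is printable ASCII."""
    return PRINTABLE_ASCII.fullmatch(text) is not None


def contains_keyword(text: str, keyword: str) -> bool:
    """Naive substring check, as a rigid keyword filter might do."""
    return keyword in text


print(SAMPLE.isascii())                     # False
print(passes_ascii_validator(SAMPLE))       # False
print(contains_keyword(SAMPLE, "encoded"))  # False
print(len("encoded".encode("utf-8")), len(SAMPLE.encode("utf-8")))  # 7 14
```

Both checks fail on text that a human, or a robust model, could still read as "encoded".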

Live Demo

CrossSpeak

CrossSpeak is a Unicode transformation system that preserves semantic meaning while altering the orthographic encoding of text.

Text processed through CrossSpeak remains readable to humans and interpretable by robust large language models. Its defining property is semantic invariance: the meaning of the text does not change, even though its character representation does.

CrossSpeak re-expresses ordinary language using parallel characters drawn from real-world scripts. It modifies only how text is еոϲοɗеɗ in Unicode - not what the text says.

It does not encrypt, hide, or obfuscate information. It preserves meaning while геѕτгυϲτυгɩոɡ геρгеѕеոτɑτɩοո.

Why CrossSpeak Works

Unicode was designed to represent the full diversity of human writing - Greek, Cyrillic, Armenian, phonetic notation, mathematics, and historical alphabets. Many of these characters resemble Latin letters through shared ancestry, yet remain distinct codepoints so that each language can exist digitally without corruption.

This structural richness creates parallel glyph space: visually similar characters that are semantically equivalent in context, but computationally distinct at the encoding level.
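One way to see parallel glyph space directly is to ask Python's standard `unicodedata` module what each character of the sample word actually is. This is a sketch; the escaped code points below are an assumption about how the README's sample is encoded.

```python
import unicodedata

# Assumed code points for the README's sample word "еոϲοɗеɗ" ("encoded").
SAMPLE = "\u0435\u0578\u03f2\u03bf\u0257\u0435\u0257"


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, official Unicode name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# A Latin "e" for comparison: similar look, different code point and script.
print(describe("e"))
for code_point, name in describe(SAMPLE):
    print(code_point, name)
```

Each character resolves to its own named code point: the first letter, for example, is U+0435 CYRILLIC SMALL LETTER IE rather than U+0065 LATIN SMALL LETTER E.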

CrossSpeak leverages this property.

Instead of altering words, grammar, or meaning, it substitutes characters with carefully selected Unicode counterparts drawn from real-world scripts. The resulting text remains readable to humans and interpretable by robust language models, because the semantic structure of the sentence is unchanged.
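The substitution step can be sketched as a per-character lookup table. The table below is hypothetical: it was inferred from the transformed sample words in this README (`encoded`, `restructuring`, `representation`) and is not CrossSpeak's actual mapping, which may cover more letters and choose among several lookalikes per letter.

```python
# Hypothetical Latin -> lookalike table, inferred from the sample words in this
# README. It illustrates the technique only; it is not CrossSpeak's mapping.
HOMOGLYPHS = {
    "a": "\u0251",  # LATIN SMALL LETTER ALPHA
    "c": "\u03f2",  # GREEK LUNATE SIGMA SYMBOL
    "d": "\u0257",  # LATIN SMALL LETTER D WITH HOOK
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "g": "\u0261",  # LATIN SMALL LETTER SCRIPT G
    "i": "\u0269",  # LATIN SMALL LETTER IOTA
    "n": "\u0578",  # ARMENIAN SMALL LETTER VO
    "o": "\u03bf",  # GREEK SMALL LETTER OMICRON
    "p": "\u03c1",  # GREEK SMALL LETTER RHO
    "r": "\u0433",  # CYRILLIC SMALL LETTER GHE
    "s": "\u0455",  # CYRILLIC SMALL LETTER DZE
    "t": "\u03c4",  # GREEK SMALL LETTER TAU
    "u": "\u03c5",  # GREEK SMALL LETTER UPSILON
}

# str.translate expects {code point: replacement}; str.maketrans builds it.
TO_LOOKALIKE = str.maketrans(HOMOGLYPHS)
TO_LATIN = str.maketrans({v: k for k, v in HOMOGLYPHS.items()})


def cross_speak(text: str) -> str:
    """Swap mapped letters for lookalikes; unmapped characters pass through."""
    return text.translate(TO_LOOKALIKE)


def back_to_latin(text: str) -> str:
    """Invert the table; no secret key is involved, so this is re-encoding."""
    return text.translate(TO_LATIN)


encoded = cross_speak("It modifies only how text is encoded")
print(encoded)                 # same sentence, mapped letters swapped for lookalikes
print(back_to_latin(encoded))  # It modifies only how text is encoded
```

Because the inverse is simply the same table read backwards, anyone can undo the transformation, which is consistent with CrossSpeak not being encryption.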

Modern LLMs are trained on multilingual corpora that already include a wide range of scripts and orthographic variation. As a result, many models learn to generalize across visually and structurally similar characters. When they encounter CrossSpeak text, they can often resolve these variations back into coherent linguistic meaning.

In short:

Unicode permits orthographic diversity. Language models learn semantic generalization. CrossSpeak sits at the intersection of the two.

It changes representation - not meaning.

Legitimate Uses

1. AI-Native Communication: Keep documents and prompts intelligible to modern language models in environments where rigid ASCII filters distort otherwise valid expression.
2. Security & Robustness Research: Examine how tokenizers, LLMs, and moderation systems respond to realistic cross-script input and design normalization-aware defenses.
3. Reduction of False Positives: Avoid over-blocking caused by keyword lists that ignore the multilingual nature of Unicode.
4. Dataset Evaluation: Test watermarking, detection methods, and training pipelines against script-diverse perturbations.
5. Creative Literacy: Enable artistic and narrative forms that live between alphabets without altering intent.
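For the defensive side of the Security & Robustness Research and Reduction of False Positives uses, a common approach is to compare a "skeleton" of the text rather than its raw code points. The sketch below assumes a tiny hand-written confusables table; a real implementation would load the confusables data from Unicode Technical Standard #39. It also shows that NFKC normalization alone does not fold these cross-script letters back to Latin.

```python
import unicodedata

# Hypothetical confusables table (lookalike -> Latin), limited to the letters
# seen in this README's samples. A real defense would load the full
# confusables data from Unicode Technical Standard #39 instead.
CONFUSABLES = str.maketrans({
    "\u0251": "a",  # LATIN SMALL LETTER ALPHA
    "\u03f2": "c",  # GREEK LUNATE SIGMA SYMBOL
    "\u0257": "d",  # LATIN SMALL LETTER D WITH HOOK
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0261": "g",  # LATIN SMALL LETTER SCRIPT G
    "\u0269": "i",  # LATIN SMALL LETTER IOTA
    "\u0578": "n",  # ARMENIAN SMALL LETTER VO
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u03c1": "p",  # GREEK SMALL LETTER RHO
    "\u0433": "r",  # CYRILLIC SMALL LETTER GHE
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u03c4": "t",  # GREEK SMALL LETTER TAU
    "\u03c5": "u",  # GREEK SMALL LETTER UPSILON
})


def skeleton(text: str) -> str:
    """Fold text to a comparison form before keyword matching.

    Lookalikes are mapped before NFKC because NFKC would otherwise turn
    U+03F2 into a final sigma (U+03C2) that the table does not cover.
    """
    folded = text.casefold().translate(CONFUSABLES)
    return unicodedata.normalize("NFKC", folded).casefold()


def matches_keyword(text: str, keyword: str) -> bool:
    """Match on skeletons so cross-script spellings of a keyword still match."""
    return skeleton(keyword) in skeleton(text)


sample = "\u0435\u0578\u03f2\u03bf\u0257\u0435\u0257"     # "encoded", cross-script
fullwidth = "\uff25\uff2e\uff23\uff2f\uff24\uff25\uff24"  # fullwidth "ENCODED"

print(unicodedata.normalize("NFKC", sample) == "encoded")  # False: NFKC alone is not enough
print(matches_keyword(sample, "encoded"))                  # True
print(matches_keyword(fullwidth, "encoded"))               # True
print(matches_keyword("decoded", "encoded"))               # False
```

Matching on skeletons lets a filter recognize a keyword regardless of script without having to block every non-ASCII character, which is also how it avoids the over-blocking described above.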

Why the Name CrossSpeak

The name CrossSpeak was chosen deliberately. At its core, CrossSpeak enables text to cross representational boundaries without losing meaning. It moves across scripts - Latin, Greek, Cyrillic, Armenian, mathematical and historical alphabets - while preserving semantic intent. It also crosses system expectations, remaining interpretable to modern AI models even when ASCII-centric validation routines do not recognize the transformed character patterns.

In a cybersecurity context, the name carries a second, intentional resonance: cross-site scripting (XSS).

CrossSpeak helps security researchers study scenarios where:

Unicode-transformed input bypasses ASCII-bound pattern checks
An LLM is asked to rewrite, translate, or normalize that input
The model outputs canonical ASCII representations
That output is inserted into an HTML context without proper escaping

In these pipelines, the “cross” is not just cross-site - it is cross-boundary:

crossing from filtered input to model transformation,
crossing from Unicode variation back to executable form.

CrossSpeak does not exist to facilitate attacks. It exists to make these boundary transitions visible, testable, and understandable in AI-integrated applications.
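A minimal test harness for the four-step pipeline above might look like the following. Everything here is hypothetical: `stand_in_model` replaces the LLM with NFKC normalization purely as a deterministic proxy (a real model's rewriting is not guaranteed to match it), and a harmless `<b>` tag stands in for markup. The sketch shows where the two defenses belong: re-validating after the model step, and escaping model output at the HTML boundary.

```python
import html
import unicodedata


def ascii_filter_allows(text: str) -> bool:
    """Hypothetical ASCII-bound check (step 1): reject literal < or >."""
    return "<" not in text and ">" not in text


def stand_in_model(text: str) -> str:
    """Hypothetical stand-in for an LLM asked to normalize input (steps 2-3).
    NFKC is only a deterministic proxy; a real model may behave differently."""
    return unicodedata.normalize("NFKC", text)


def render_unescaped(model_output: str) -> str:
    """Step 4 as described: model output placed into HTML as-is."""
    return "<p>" + model_output + "</p>"


def render_escaped(model_output: str) -> str:
    """Defense: treat model output as untrusted and escape it for HTML."""
    return "<p>" + html.escape(model_output) + "</p>"


user_input = "\uff1cb\uff1ebold\uff1c/b\uff1e"  # fullwidth angle brackets

print(ascii_filter_allows(user_input))  # True: the filter sees no ASCII < or >
out = stand_in_model(user_input)
print(out)                              # <b>bold</b>
print(ascii_filter_allows(out))         # False: re-validating after the model catches it
print(render_unescaped(out))            # <p><b>bold</b></p>
print(render_escaped(out))              # <p>&lt;b&gt;bold&lt;/b&gt;</p>
```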

Angle Bracket Encoding Options

CrossSpeak lets you choose how < and > are handled.

Modes

Untouched (default): keeps < > as ASCII.
Option A (Mathematical): < > → ⟨ ⟩
Option B (Guillemets): < > → ‹ ›
Option C (Fullwidth): < > → ＜ ＞
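The four modes amount to a simple bracket replacement. The sketch below uses hypothetical names (`BRACKET_MODES`, `encode_brackets`) rather than CrossSpeak's own code, and adds one observation relevant to defenders: under NFKC normalization only the fullwidth forms fold back to ASCII `<` and `>`, while the mathematical brackets and guillemets do not.

```python
import unicodedata

# Bracket replacements from the modes listed above; names are hypothetical.
BRACKET_MODES = {
    "untouched": ("<", ">"),
    "mathematical": ("\u27e8", "\u27e9"),  # MATHEMATICAL LEFT/RIGHT ANGLE BRACKET
    "guillemets": ("\u2039", "\u203a"),    # SINGLE LEFT/RIGHT-POINTING ANGLE QUOTATION MARK
    "fullwidth": ("\uff1c", "\uff1e"),     # FULLWIDTH LESS-THAN/GREATER-THAN SIGN
}


def encode_brackets(text: str, mode: str = "untouched") -> str:
    """Replace ASCII angle brackets with the pair chosen for `mode`."""
    left, right = BRACKET_MODES[mode]
    return text.replace("<", left).replace(">", right)


for mode in BRACKET_MODES:
    encoded = encode_brackets("<tag>", mode)
    # Only characters with compatibility decompositions to < and > fold back.
    folds_back = unicodedata.normalize("NFKC", encoded) == "<tag>"
    print(f"{mode:12} {encoded!r:12} NFKC restores ASCII: {folds_back}")
```

Running it should report `True` for the untouched and fullwidth modes and `False` for the mathematical and guillemet modes, so a pipeline that relies on NFKC alone will treat those two modes differently from the fullwidth one.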

Related Cybersecurity Research

Introducing Cross-Model Scripting (XMS) vulnerabilities https://www.syhunt.com/en/?n=Articles.2026-CrossModelScripting
Evading AI-Generated Content Detectors using Homoglyphs https://arxiv.org/abs/2406.11239v1
Defending LLM Applications Against Unicode Character Smuggling https://www.cloudthat.com/resources/blog/defending-llm-applications-against-unicode-character-smuggling

License & Credits

Released under a 3-clause BSD license for research and experimental use; see the LICENSE file for details.
