Structural check: detect invisible Unicode characters and directional overrides

## Summary

The validator should detect invisible Unicode characters and directional text overrides in skill files. These characters can hide or disguise content from both human reviewers and pattern-matching scanners, while remaining fully visible to agents that consume the raw file.

## Background

Skill files are plain-text Markdown consumed verbatim by LLM agents. A skill file that contains zero-width characters or directional overrides is either corrupted or deliberately trying to hide content. These characters have no legitimate use in skill files, which should contain only plain, visible UTF-8 text.

### Threat model

- **Zero-width characters** can insert invisible content between visible characters, making a file appear clean to reviewers while containing hidden instructions for the agent.
- **Directional overrides** (RTL/LTR) can make text render in a different order than it's stored, so a reviewer reads one thing while the agent reads another.
- Both can also be used to bypass keyword-based scanners — for example, inserting a zero-width space into `su​do` (where ​ is U+200B) defeats a regex match for `sudo` while the agent may still interpret it as the same instruction.
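
To make the scanner-bypass case concrete, here is a minimal sketch of how a single U+200B defeats a literal match. The `sudoPattern` rule and the `stripZeroWidthSpace` helper are hypothetical and exist only for this illustration; they are not part of the validator.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sudoPattern stands in for a keyword-based scanner rule (hypothetical).
var sudoPattern = regexp.MustCompile(`sudo`)

// stripZeroWidthSpace removes U+200B so the two match results can be compared.
func stripZeroWidthSpace(s string) string {
	return strings.ReplaceAll(s, "\u200B", "")
}

func main() {
	// The zero-width space sits between "su" and "do" and renders as nothing.
	hidden := "run su\u200Bdo rm -rf /tmp/x"

	fmt.Println(sudoPattern.MatchString(hidden))                      // false
	fmt.Println(sudoPattern.MatchString(stripZeroWidthSpace(hidden))) // true
}
```

The point of the check proposed here is to flag the character itself rather than try to normalize it away before scanning.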

## What to detect

### Zero-width characters

| Character | Codepoint | Name |
|-----------|-----------|------|
| ​ | U+200B | Zero-width space |
| ‌ | U+200C | Zero-width non-joiner |
| ‍ | U+200D | Zero-width joiner |
| ﻿ | U+FEFF | BOM (when not at byte position 0) |

A BOM at byte position 0 is a legitimate encoding marker and should be excluded. A BOM appearing _mid-file_ is suspicious and should be flagged.
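
The BOM rule could be sketched as follows, assuming the file is UTF-8 so that U+FEFF is encoded as the bytes `EF BB BF`. The function name `midFileBOMOffsets` is an assumption for illustration, not an existing API.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// midFileBOMOffsets returns the byte offset of every U+FEFF that is not
// at byte position 0. A leading BOM is skipped, not reported.
func midFileBOMOffsets(data []byte) []int {
	var offsets []int
	start := 0
	if bytes.HasPrefix(data, utf8BOM) {
		start = len(utf8BOM) // legitimate encoding marker
	}
	for {
		i := bytes.Index(data[start:], utf8BOM)
		if i < 0 {
			return offsets
		}
		offsets = append(offsets, start+i)
		start += i + len(utf8BOM)
	}
}

func main() {
	fmt.Println(midFileBOMOffsets([]byte("\uFEFF# Skill\n")))  // []
	fmt.Println(midFileBOMOffsets([]byte("# Sk\uFEFFill\n"))) // [4]
}
```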

### Directional override characters

| Range | Name |
|-------|------|
| U+202A–U+202E | LRE, RLE, PDF, LRO, RLO |
| U+2066–U+2069 | LRI, RLI, FSI, PDI |

These characters embed, override, or isolate text direction. They're used in internationalized text but have no place in English-language Markdown skill files.
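
Because both ranges are contiguous, the membership test reduces to two comparisons. A minimal sketch, with `isDirectionalControl` as a hypothetical helper name:

```go
package main

import "fmt"

// isDirectionalControl reports whether r falls in U+202A–U+202E or
// U+2066–U+2069, the two ranges listed in the table above.
func isDirectionalControl(r rune) bool {
	return (r >= 0x202A && r <= 0x202E) || (r >= 0x2066 && r <= 0x2069)
}

func main() {
	// U+202E (RLO) makes the text after it render right-to-left,
	// so the stored order and the displayed order differ.
	s := "safe\u202Eexe.txt"
	for i, r := range s {
		if isDirectionalControl(r) {
			fmt.Printf("offset %d: U+%04X\n", i, r) // offset 4: U+202E
		}
	}
}
```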

## Proposed severity levels

| Pattern | Level |
|---------|-------|
| Zero-width characters (U+200B, U+200C, U+200D) | ERROR |
| BOM mid-file (U+FEFF not at position 0) | ERROR |
| Directional overrides (U+202A–U+202E, U+2066–U+2069) | ERROR |

All of these should be errors rather than warnings: skill files have no legitimate use for these characters, and their presence is a strong signal of corruption or tampering.

## Implementation notes

- Scan all files in the skill package: SKILL.md, all reference files, and any other text files
- Report the file path, byte offset, and character name for each finding so authors can locate and remove them
- The check can operate on raw bytes/runes — no Markdown parsing needed
- Consider a new check function like `CheckInvisibleCharacters(dir string, files []string) []types.Result` in the `structure` package
- The scan should iterate over runes in each file and check against a set of flagged codepoints
- False positive rate is near zero — skill files should never contain these characters
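
The notes above could come together roughly as in the sketch below. It is not the proposed `CheckInvisibleCharacters` itself: `Finding` is a hypothetical stand-in for `types.Result` (whose fields are not shown in this issue), and `scanInvisible` works on in-memory bytes so the example stays self-contained. A real check would read each file in the package and convert findings to the project's result type.

```go
package main

import "fmt"

// Finding is a hypothetical stand-in for the project's types.Result.
type Finding struct {
	Path   string
	Offset int    // byte offset of the character within the file
	Name   string // human-readable character name
}

// flagged maps each codepoint from the tables above to its Unicode name.
var flagged = map[rune]string{
	0x200B: "ZERO WIDTH SPACE",
	0x200C: "ZERO WIDTH NON-JOINER",
	0x200D: "ZERO WIDTH JOINER",
	0x202A: "LEFT-TO-RIGHT EMBEDDING",
	0x202B: "RIGHT-TO-LEFT EMBEDDING",
	0x202C: "POP DIRECTIONAL FORMATTING",
	0x202D: "LEFT-TO-RIGHT OVERRIDE",
	0x202E: "RIGHT-TO-LEFT OVERRIDE",
	0x2066: "LEFT-TO-RIGHT ISOLATE",
	0x2067: "RIGHT-TO-LEFT ISOLATE",
	0x2068: "FIRST STRONG ISOLATE",
	0x2069: "POP DIRECTIONAL ISOLATE",
}

// scanInvisible iterates over the runes of data and records every flagged
// codepoint. U+FEFF is reported only when it is not at byte offset 0.
func scanInvisible(path string, data []byte) []Finding {
	var out []Finding
	for off, r := range string(data) {
		name, ok := flagged[r]
		if r == 0xFEFF && off > 0 {
			name, ok = "BOM (mid-file)", true
		}
		if ok {
			out = append(out, Finding{Path: path, Offset: off, Name: name})
		}
	}
	return out
}

func main() {
	// Leading BOM is tolerated; the U+200B inside "sudo" is reported.
	data := []byte("\uFEFF# Skill\nrun su\u200Bdo\n")
	for _, f := range scanInvisible("SKILL.md", data) {
		fmt.Printf("%s:%d: %s\n", f.Path, f.Offset, f.Name)
	}
}
```

Ranging over a Go string yields the starting byte index of each rune, which gives the byte offset required for reporting without any extra bookkeeping.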

## Out of scope (for now)

Homoglyph detection (e.g., Cyrillic `о` vs Latin `o`) is related but significantly harder to implement reliably without internationalization false positives. That can be considered separately if there's demand.
