diffURL is an Advanced URL Parser differential and canonicalization tool. It enumerates how different parsers (Python urllib.parse vs. WHATWG URL in Node.js) and normalization forms interpret the same URL—then highlights security-relevant mismatches and produces minimized witnesses you can replay.
It’s built for product security teams, red teams, bug-bounty hunters, and CI/CD pipelines that demand rigor, speed, and explainability.
Most URL “validators” normalize aggressively or assume a single parser worldview. Real systems don’t. diffURL embraces parser diversity and shows you where assumptions break:
- Security-first differentials — Host/authority shifts, ETLD+1, scheme downgrades, traversal and backslash confusion.
- Unicode-aware — NFC/NFKC/NFD/NFKD passes, confusables (UTS#39) & homograph signals.
- Variant engines — Percent-encoding (single/double), dot-segments, backslashes, host forms.
- PSL-powered — Effective TLD+1 comparisons using the Public Suffix List (optional).
- Witness minimization — Auto-reduce to the shortest PoC that still reproduces the bug.
- Operator-ready outputs — Terminal (color), JSON, NDJSON, SARIF. Clipboard PoC for quick sharing.
- CI-friendly — Severity tiers + --fail-on flags; deterministic exit codes.
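To make the backslash and authority confusion concrete, here is a minimal sketch of the Python half only, using the standard `urllib.parse.urlsplit`. The helper `authority_view` and the sample URL are illustrative assumptions, not diffURL's code. WHATWG-style parsers treat a backslash in an `https:` URL like a forward slash, so they would be expected to report `example.com` as the host for the same input; that side is not executed here.

```python
from urllib.parse import urlsplit


def authority_view(url: str) -> dict:
    """Return the security-relevant pieces that urllib.parse extracts."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        "hostname": parts.hostname,
        "path": parts.path,
    }


# Hypothetical input: a backslash placed before "@" inside the authority.
sample = "https://example.com\\@evil.com/"
view = authority_view(sample)

# urllib.parse keeps the backslash inside the netloc and takes everything
# after the last "@" as the host, so the hostname here is "evil.com".
print(view)
```

A validator that checks the host with one parser while the fetch goes through the other is the situation a HOST_SHIFT-style finding is meant to expose.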
- Dual Parser Core: Python urllib.parse vs. WHATWG (Node) side-by-side.
- Normalization Lab: NFC/NFKC/NFD/NFKD component-level normalization.
- Confusables Analyzer: UTS#39-inspired skeletons & risk signals.
- Variant Generators: Percent-encoding (depth/hex-case), backslash, dot segments, host forms.
- Set-Cover “Apex”: Budget-aware selection that maximizes category diversity.
- Risk Scoring & Tiers: CRITICAL/HIGH/MEDIUM/LOW with exploit flags (e.g., HOST_SHIFT, ETLD+1_SHIFT).
- Witness Minimizer: Shrinks a long payload to a tiny, still-bad “witness.”
- PSL Updater: Offline PSL snapshot with on-demand refresh.
- Reporting: Terminal, JSON, NDJSON (streaming), SARIF (GitHub Code Scanning).
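The difference between the normalization passes and the confusables analysis can be illustrated with the standard `unicodedata` module. This is a rough sketch under stated assumptions: `normalize_host`, `toy_skeleton`, and the three-entry `TOY_CONFUSABLES` table are hypothetical stand-ins, and a real analyzer would draw on the full UTS #39 confusables data rather than a hand-written map.

```python
import unicodedata

FORMS = ("NFC", "NFKC", "NFD", "NFKD")


def normalize_host(host: str) -> dict:
    """Map each normalization form to the normalized hostname."""
    return {form: unicodedata.normalize(form, host) for form in FORMS}


# Toy look-alike table (assumption): Latin alpha, Cyrillic a, Cyrillic o.
TOY_CONFUSABLES = {"\u0251": "a", "\u0430": "a", "\u043e": "o"}


def toy_skeleton(host: str) -> str:
    """Very rough skeleton: NFKC-normalize, then replace known look-alikes."""
    folded = unicodedata.normalize("NFKC", host)
    return "".join(TOY_CONFUSABLES.get(ch, ch) for ch in folded)


fullwidth = "\uff45xample.com"   # starts with FULLWIDTH LATIN SMALL LETTER E
homograph = "ex\u0251mple.com"   # contains LATIN SMALL LETTER ALPHA

# Compatibility forms fold the fullwidth letter to ASCII; NFC does not.
print(normalize_host(fullwidth))
# No normalization form changes the Latin alpha; only the skeleton maps it.
print(normalize_host(homograph), toy_skeleton(homograph))
```

This is why the two features are listed separately: normalization catches compatibility variants, while look-alike letters from other alphabets need a confusables mapping.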
Flow: Provide URL(s) → Normalize & generate variants → Parse with two engines → Detect differentials → Score & rank → Minimize witnesses → Report/Export.
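The "Minimize witnesses" step in the flow can be pictured as a greedy reduction loop. The sketch below is a simplified, delta-debugging-style reducer, not diffURL's actual minimizer; `minimize`, `toy_predicate`, and the sample URL are hypothetical. The predicate stands in for "the differential still reproduces."

```python
from typing import Callable
from urllib.parse import unquote, urlsplit


def minimize(witness: str, still_bad: Callable[[str], bool]) -> str:
    """Greedily delete chunks of the string while the predicate still holds."""
    if not still_bad(witness):
        raise ValueError("the starting witness must reproduce the issue")
    chunk = len(witness) // 2
    while chunk >= 1:
        i = 0
        while i < len(witness):
            candidate = witness[:i] + witness[i + chunk:]
            if candidate and still_bad(candidate):
                witness = candidate  # keep the shorter string, retry same offset
            else:
                i += chunk           # this chunk is needed, move on
        chunk //= 2                  # try finer-grained deletions next
    return witness


def toy_predicate(url: str) -> bool:
    """Stand-in check: same scheme and host, and the decoded path has '..'."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return (
        parts.scheme == "https"
        and parts.hostname == "a.example"
        and ".." in unquote(parts.path)
    )


sample = "https://a.example/static/%2e%2e/admin?x=1"
reduced = minimize(sample, toy_predicate)
print(len(sample), len(reduced), reduced)
```

A single greedy pass like this does not guarantee the globally shortest witness; it only guarantees that the result is no longer than the input and still satisfies the predicate.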
- Python ≥ 3.11
- Node.js ≥ 16 (WHATWG worker)
- macOS/Linux/WSL2 recommended (Windows supported; see notes)
- For Linux clipboard support: xclip or xsel (optional)
git clone https://github.com/bl4ck0w1/diffURL.git
cd diffURL
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
# Optional (recommended)
pip install .

If you prefer running in place without installing the console script, you can invoke the repo’s CLI directly:
python3 diffurl.py --help

If installed as a package, you’ll have the diffurl command on PATH.
node -v
# Optional: healthcheck (if you pulled the worker package.json)
npm run healthcheck

diffurl psl-update
# or
python3 diffurl.py psl-update

diffurl "https://exɑmple.com/..%2fadmin" --verbose
# or
python3 diffurl.py "https://exɑmple.com/..%2fadmin" --verbose

diffurl -f urls.txt --json out.json --budget 24

diffurl -f targets.txt --apex --norms nfc,nfkc,nfd,nfkd --budget 48 --ndjson

diffurl "https://a.b.example.com" --psl off

diffurl "https://exɑmple.com/%252e%252e/%255cadmin" --copy-poc

diffurl --help
_ ,__ ,__ . . .___ .
___/ ` / ` / ` / / / \ /
/ | | |__ |__ | | |__-' |
,' | | | | | | | \ |
`___,' / | | `._.' / \ /---/
` / /
URL Parser Differential Security Tool
Input Options:
urls (positional) One or more URLs to analyze
-f, --file FILE Read URLs from file (one per line)
-i, --stdin Read URLs from standard input
Analysis Modes:
--mode {standard,apex,minimal} Analysis mode (default: standard)
--budget INT Maximum variants per URL (default: 24)
--apex Alias for --mode apex
Normalization Controls:
--norms FORMS Comma-separated: nfc,nfkc,nfd,nfkd (default: nfc,nfkc)
Variant Generation:
--variants TYPES Comma-separated: pct,double-pct,hexcase,backslash,dotsegs,host-forms
Output Formats:
--json FILE Write JSON report to file
--ndjson Stream NDJSON to stdout
--sarif FILE Write SARIF report to file
--minimal Minimal terminal output (summary only)
--verbose Detailed terminal output with metadata
--copy-poc Copy first minimized witness to clipboard
Security Policies:
--fail-on FLAGS Comma-separated exploit flags to fail CI (e.g., HOST_SHIFT,ETLD+1_SHIFT)
--top-witness-per-severity N Emit top N minimized witnesses per severity (default: 1)
PSL & Data:
--psl {on,off} Enable/disable PSL for ETLD+1 analysis (default: on)
Subcommands:
psl-update [--force] Update the vendored PSL snapshot
General Options:
--node-path PATH Path to Node.js executable (default: node)
-v, --version Show version and exit
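To give a feel for what some of the --variants types in the help text above could produce, here is a small sketch. The functions `pct_encode` and `toy_variants` are hypothetical, and the mapping from type names to transformations is an assumption based only on the names (pct, double-pct, hexcase, backslash); diffURL's real generators may differ.

```python
def pct_encode(text: str, targets: str, upper: bool = False, depth: int = 1) -> str:
    """Percent-encode only the characters in `targets`.

    With depth > 1, each extra round re-encodes the "%" signs produced by
    the previous round (for example "%2e" becomes "%252e").
    """
    fmt = "%{:02X}" if upper else "%{:02x}"
    out = text
    for level in range(depth):
        chars = targets if level == 0 else "%"
        out = "".join(fmt.format(ord(c)) if c in chars else c for c in out)
    return out


def toy_variants(path: str) -> dict:
    """Build one illustrative variant per assumed type name."""
    return {
        "pct": pct_encode(path, "."),
        "double-pct": pct_encode(path, ".", depth=2),
        "hexcase": pct_encode(path, ".", upper=True),
        "backslash": path.replace("/", "\\"),
    }


for name, variant in toy_variants("/../admin").items():
    print(f"{name:12} {variant}")
```

Each variant would then be parsed by both engines; the --budget option caps how many such variants are tried per URL.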
1) Minimal run
diffurl "https://shop.example.com/%2e%2e/admin"

2) JSON report + verbose logs
diffurl "https://portal.example.com/%252f../" --json reports/portal.json --budget 24 --verbose

3) Apex coverage + all norms + NDJSON stream
diffurl -f targets.txt --apex --norms nfc,nfkc,nfd,nfkd --ndjson

4) CI fail on critical authority shifts
diffurl -f release_urls.txt --fail-on HOST_SHIFT,ETLD+1_SHIFT --sarif code-scanning.sarif

- Terminal: human-readable summary (colorized), with CRITICAL/HIGH sections and minimized witnesses.
- JSON: all differentials, flags, metadata, parser versions (stable schema for automation).
- NDJSON: one record per URL—great for large batch pipelines.
- SARIF: importable into GitHub Advanced Security / Code Scanning.
PoC Minimized Witness: The --copy-poc switch copies the first minimized witness to your clipboard (Linux needs xclip/xsel).
- What makes diffURL different from “URL validators” or a single parser? Real systems mix libraries and environments. diffURL compares two widely used parsers (urllib vs. WHATWG) across normalization forms and attack-style variants, surfacing security-relevant discrepancies (host/authority/scheme/path).
- Do I need the Public Suffix List (PSL)? For ETLD+1 comparisons (ETLD+1_SHIFT), yes. The tool vendors a PSL snapshot and provides psl-update. You can disable PSL with --psl off (you’ll still get other flags).
- Does this catch Unicode homographs automatically? diffURL includes a confusables analyzer and Unicode normalization passes (NFC/NFKC/etc.). It flags suspicious hostnames and shows the skeleton, but you should still review context (fonts/IDNs/policies vary).
- How are false positives controlled? Findings are differentials: two parsers disagree on a security-relevant component. Risk scoring prioritizes CRITICAL/HIGH issues (e.g., HOST_SHIFT, ETLD+1_SHIFT, SCHEME_SHIFT) ahead of cosmetic differences.
- Can I integrate it with CI and fail builds on specific flags? Yes. Use --fail-on HOST_SHIFT,ETLD+1_SHIFT (or any flags you care about). Exit code 10 signals a policy violation.
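For pipelines that wrap the CLI in a script, the exit-code contract above could be handled roughly as follows. This is a sketch, not an official integration: `build_command`, `classify`, and `run_gate` are hypothetical helpers, only exit code 10 is documented here, and treating 0 as "pass" and everything else as "error" is an assumption based on common CLI convention.

```python
import subprocess

POLICY_VIOLATION = 10  # documented: a --fail-on flag was hit


def build_command(url_file: str, flags: list, sarif_path: str) -> list:
    """Assemble a diffurl invocation using only options shown in the help text."""
    return [
        "diffurl",
        "-f", url_file,
        "--fail-on", ",".join(flags),
        "--sarif", sarif_path,
    ]


def classify(returncode: int) -> str:
    """Translate an exit code into a coarse CI outcome (0 and 'error' are assumed)."""
    if returncode == 0:
        return "pass"
    if returncode == POLICY_VIOLATION:
        return "policy-violation"
    return "error"


def run_gate(url_file: str, flags: list, sarif_path: str) -> str:
    """Run diffurl (must be on PATH) and return the classified outcome."""
    completed = subprocess.run(build_command(url_file, flags, sarif_path))
    return classify(completed.returncode)
```

A CI job could call `run_gate("release_urls.txt", ["HOST_SHIFT", "ETLD+1_SHIFT"], "code-scanning.sarif")` and fail the build on anything other than "pass".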
- If you encounter any issues, please open an issue on GitHub.
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the Apache 2.0 License; see the LICENSE file for details.
Security Researcher
- LinkedIn: www.linkedin.com/in/elie-uwimana
Remember: With great power comes great responsibility. Use diffURL ethically, legally, and only on systems you have explicit permission to test.
