bl4ck0w1/diffURL

diffURL

diffURL is an advanced URL parser differential and canonicalization tool. It enumerates how different parsers (Python urllib.parse vs. WHATWG URL in Node.js) and normalization forms interpret the same URL, then highlights security-relevant mismatches and produces minimized witnesses you can replay.

It’s built for product security teams, red teams, bug-bounty hunters, and CI/CD pipelines that demand rigor, speed, and explainability.

Why diffURL?

Most URL “validators” normalize aggressively or assume a single parser worldview. Real systems don’t. diffURL embraces parser diversity and shows you where assumptions break:

  • Security-first differentials — Host/authority shifts, ETLD+1, scheme downgrades, traversal and backslash confusion.
  • Unicode-aware — NFC/NFKC/NFD/NFKD passes, confusables (UTS#39) & homograph signals.
  • Variant engines — Percent-encoding (single/double), dot-segments, backslashes, host forms.
  • PSL-powered — Effective TLD+1 comparisons using the Public Suffix List (optional).
  • Witness minimization — Auto-reduce to the shortest PoC that still reproduces the bug.
  • Operator-ready outputs — Terminal (color), JSON, NDJSON, SARIF. Clipboard PoC for quick sharing.
  • CI-friendly — Severity tiers + --fail-on flags; deterministic exit codes.

Features at a Glance

  • Dual Parser Core: Python urllib.parse vs. WHATWG (Node) side-by-side.
  • Normalization Lab: NFC/NFKC/NFD/NFKD component-level normalization.
  • Confusables Analyzer: UTS#39-inspired skeletons & risk signals.
  • Variant Generators: Percent-encoding (depth/hex-case), backslash, dot segments, host forms.
  • Set-Cover “Apex”: Budget-aware selection that maximizes category diversity.
  • Risk Scoring & Tiers: CRITICAL/HIGH/MEDIUM/LOW with exploit flags (e.g., HOST_SHIFT, ETLD+1_SHIFT).
  • Witness Minimizer: Shrinks a long payload to a tiny, still-bad “witness.”
  • PSL Updater: Offline PSL snapshot with on-demand refresh.
  • Reporting: Terminal, JSON, NDJSON (streaming), SARIF (GitHub Code Scanning).
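The variant generators and normalization passes listed above can be pictured with a short sketch. The helper names (`pct_encode_all`, `pct_variants`, `norm_variants`) are hypothetical and are not taken from the diffURL source; they only show what percent-encoding depth, hex case, and Unicode normalization forms do to an input.

```python
import unicodedata

def pct_encode_all(text: str, upper: bool = False) -> str:
    """Percent-encode every byte, including unreserved characters like '.'."""
    fmt = "%{:02X}" if upper else "%{:02x}"
    return "".join(fmt.format(byte) for byte in text.encode("utf-8"))

def pct_variants(text: str, depth: int = 2) -> list[str]:
    """Return encodings at depth 1..depth; each extra level re-encodes '%' as '%25'."""
    variants = []
    current = pct_encode_all(text)
    for _ in range(depth):
        variants.append(current)
        current = current.replace("%", "%25")
    return variants

def norm_variants(text: str) -> dict[str, str]:
    """Apply the four Unicode normalization forms to the same string."""
    forms = ("NFC", "NFKC", "NFD", "NFKD")
    return {form: unicodedata.normalize(form, text) for form in forms}

print(pct_variants("../"))                # ['%2e%2e%2f', '%252e%252e%252f']
print(pct_encode_all("../", upper=True))  # %2E%2E%2F

# U+FF45 is FULLWIDTH LATIN SMALL LETTER E; only the compatibility forms fold it.
host = "\uff45xample.com"
forms = norm_variants(host)
print(forms["NFC"] == host)  # True
print(forms["NFKC"])         # example.com
```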

Flow: Provide URL(s) → Normalize & generate variants → Parse with two engines → Detect differentials → Score & rank → Minimize witnesses → Report/Export.


🚀 Quick Start

Requirements

  • Python ≥ 3.11
  • Node.js ≥ 16 (WHATWG worker)
  • macOS/Linux/WSL2 recommended (Windows supported; see notes)
  • For Linux clipboard support: xclip or xsel (optional)

Install

git clone https://github.com/bl4ck0w1/diffURL.git
cd diffURL

python3 -m venv .venv && source .venv/bin/activate

pip install --upgrade pip
pip install -r requirements.txt

# Optional (recommended): install as a package
pip install .

If you prefer running in place without installing the console script, you can invoke the repo’s CLI directly:

python3 diffurl.py --help

If installed as a package, you’ll have the diffurl command on PATH.

Verify WHATWG worker

node -v
# Optional: healthcheck (if you pulled the worker package.json)
npm run healthcheck

Update the Public Suffix List

diffurl psl-update
# or
python3 diffurl.py psl-update

Single Target

diffurl "https://exɑmple.com/..%2fadmin" --verbose
# or
python3 diffurl.py "https://exɑmple.com/..%2fadmin" --verbose
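The hostname in the command above looks like example.com, but its third letter appears to be U+0251 (LATIN SMALL LETTER ALPHA) rather than ASCII "a". The sketch below shows the skeleton idea behind a confusables check. It assumes a tiny hand-written mapping (`TOY_CONFUSABLES`), which is a stand-in for the much larger UTS #39 data and is not diffURL's analyzer.

```python
import unicodedata

# Toy confusables table for illustration only; the real UTS #39 data file
# (confusables.txt) contains thousands of mappings.
TOY_CONFUSABLES = {
    "\u0251": "a",  # LATIN SMALL LETTER ALPHA
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}

def toy_skeleton(host: str) -> str:
    """NFD, map each character to its prototype, then NFD again."""
    decomposed = unicodedata.normalize("NFD", host)
    mapped = "".join(TOY_CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

lookalike = "ex\u0251mple.com"
print(lookalike == "example.com")                # False
print(toy_skeleton(lookalike) == "example.com")  # True
```

Two hosts with the same skeleton but different code points are a homograph signal worth reviewing.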

Multiple Targets (file)

diffurl -f urls.txt --json out.json --budget 24

Deep / Apex Mode

diffurl -f targets.txt --apex --norms nfc,nfkc,nfd,nfkd --budget 48 --ndjson

Turn off PSL (quick scans; skip ETLD+1)

diffurl "https://a.b.example.com" --psl off

Copy PoC to Clipboard

diffurl "https://exɑmple.com/%252e%252e/%255cadmin" --copy-poc

🧰 CLI Reference

diffurl --help

      _   ,__  ,__  .     . .___  .    
   ___/ ` /  ` /  ` /     / /   \ /    
  /   | | |__  |__  |     | |__-' |    
 ,'   | | |    |    |     | |  \  |    
 `___,' / |    |     `._.'  /   \ /---/
      `   /    /                       

URL Parser Differential Security Tool

Input Options:
  urls (positional)           One or more URLs to analyze
  -f, --file FILE             Read URLs from file (one per line)
  -i, --stdin                 Read URLs from standard input

Analysis Modes:
      --mode {standard,apex,minimal}   Analysis mode (default: standard)
      --budget INT             Maximum variants per URL (default: 24)
      --apex                   Alias for --mode apex

Normalization Controls:
      --norms FORMS           Comma-separated: nfc,nfkc,nfd,nfkd (default: nfc,nfkc)

Variant Generation:
      --variants TYPES         Comma-separated: pct,double-pct,hexcase,backslash,dotsegs,host-forms

Output Formats:
      --json FILE              Write JSON report to file
      --ndjson                 Stream NDJSON to stdout
      --sarif FILE             Write SARIF report to file
      --minimal                Minimal terminal output (summary only)
      --verbose                Detailed terminal output with metadata
      --copy-poc               Copy first minimized witness to clipboard

Security Policies:
      --fail-on FLAGS          Comma-separated exploit flags to fail CI (e.g., HOST_SHIFT,ETLD+1_SHIFT)
      --top-witness-per-severity N  Emit top N minimized witnesses per severity (default: 1)

PSL & Data:
      --psl {on,off}           Enable/disable PSL for ETLD+1 analysis (default: on)
  Subcommands:
      psl-update [--force]     Update the vendored PSL snapshot

General Options:
      --node-path PATH         Path to Node.js executable (default: node)
  -v, --version                Show version and exit

1) Minimal run

diffurl "https://shop.example.com/%2e%2e/admin"

2) JSON report + verbose logs

diffurl "https://portal.example.com/%252f../" --json reports/portal.json --budget 24 --verbose

3) Apex coverage + all norms + NDJSON stream

diffurl -f targets.txt --apex --norms nfc,nfkc,nfd,nfkd --ndjson

4) CI fail on critical authority shifts

diffurl -f release_urls.txt --fail-on HOST_SHIFT,ETLD+1_SHIFT --sarif code-scanning.sarif

🧾 Reports & PoCs

  • Terminal: human-readable summary (colorized), with CRITICAL/HIGH sections and minimized witnesses.
  • JSON: all differentials, flags, metadata, parser versions (stable schema for automation).
  • NDJSON: one record per URL—great for large batch pipelines.
  • SARIF: importable into GitHub Advanced Security / Code Scanning.

PoC Minimized Witness: The --copy-poc switch copies the first minimized witness to your clipboard (Linux needs xclip/xsel).
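Witness minimization can be sketched as a greedy reduction: drop one character at a time and keep the shorter string whenever the finding still reproduces. The code below is an assumption about the general technique, not diffURL's minimizer. The predicate `still_bad` is a crude stand-in that rewrites "\" to "/" to approximate WHATWG handling for https, where the real tool compares two actual parsers.

```python
from urllib.parse import urlsplit

def still_bad(url: str) -> bool:
    """Stand-in differential: urllib sees evil.example, while the
    backslash-as-slash reading (roughly WHATWG for https) sees good.example."""
    try:
        python_host = urlsplit(url).hostname
        approx_whatwg_host = urlsplit(url.replace("\\", "/")).hostname
    except ValueError:
        return False
    return python_host == "evil.example" and approx_whatwg_host == "good.example"

def minimize(url: str, predicate) -> str:
    """Greedily delete single characters while the predicate keeps holding."""
    if not predicate(url):
        raise ValueError("input does not reproduce the finding")
    changed = True
    while changed:
        changed = False
        for i in range(len(url)):
            candidate = url[:i] + url[i + 1:]
            if predicate(candidate):
                url = candidate
                changed = True
                break  # restart the scan on the shorter string
    return url

original = "https://good.example\\@evil.example/a/b/c?next=1"
witness = minimize(original, still_bad)
print(len(original), len(witness))
print(witness)
```

The result is 1-minimal: removing any single remaining character makes the predicate fail. Smarter strategies such as delta debugging remove larger chunks first and need fewer predicate calls.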

FAQ — Questions You Should Ask

  1. What makes diffURL different from “URL validators” or a single parser? Real systems mix libraries and environments. diffURL compares two widely used parsers (urllib.parse vs. WHATWG) across normalization forms and attack-style variants, surfacing security-relevant discrepancies in host, authority, scheme, and path.

  2. Do I need the Public Suffix List (PSL)? For ETLD+1 comparisons (ETLD+1_SHIFT), yes. The tool vendors a PSL snapshot and provides psl-update. You can disable PSL with --psl off (you’ll still get other flags).

  3. Does this catch Unicode homographs automatically? diffURL includes a confusables analyzer and Unicode normalization passes (NFC/NFKC/etc.). It flags suspicious hostnames and shows the skeleton, but you should still review context (fonts/IDNs/policies vary).

  4. How are false positives controlled? Findings are differentials—two parsers disagree on a security-relevant component. Risk scoring prioritizes CRITICAL/HIGH issues (e.g., HOST_SHIFT, ETLD+1_SHIFT, SCHEME_SHIFT) ahead of cosmetic differences.

  5. Can I integrate it with CI and fail builds on specific flags? Yes. Use --fail-on HOST_SHIFT,ETLD+1_SHIFT (or any flags you care about). Exit code 10 signals a policy violation.
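For pipelines that wrap the CLI in a script, a gate around the documented exit code might look like the sketch below. Only exit code 10 is stated in this README. Treating 0 as "clean" and every other code as a tool error is an assumption, and `classify_exit` and `run_gate` are hypothetical helper names.

```python
import subprocess

POLICY_VIOLATION = 10  # exit code this README documents for --fail-on hits

def classify_exit(code: int) -> str:
    """Map a diffurl exit code to a coarse CI outcome (0 and 'error' are assumed)."""
    if code == 0:
        return "clean"
    if code == POLICY_VIOLATION:
        return "policy-violation"
    return "error"

def run_gate(url_file: str, flags: str = "HOST_SHIFT,ETLD+1_SHIFT") -> str:
    """Run diffurl on a URL file with --fail-on and classify the result."""
    proc = subprocess.run(
        ["diffurl", "-f", url_file, "--fail-on", flags,
         "--sarif", "code-scanning.sarif"],
        check=False,  # inspect the return code instead of raising
    )
    return classify_exit(proc.returncode)
```

Calling `run_gate("release_urls.txt")` from a pipeline step would return "policy-violation" when any listed flag fires, which the step can then turn into a failed build.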

🛠 Troubleshooting

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

👤 Author

Security Researcher

Remember: With great power comes great responsibility. Use diffURL ethically, legally, and only on systems you have explicit permission to test.
