webReaper

webReaper is a lightweight web reconnaissance and endpoint-ranking tool designed to help security professionals quickly identify the most interesting parts of a web attack surface.

It collects URLs from multiple discovery sources, probes them for behavior and metadata, and ranks the results with a novel ReapScore so you know where to start.

🆕 WebSentinel - Security Scanner

This repository now includes WebSentinel, a defensive web application security scanner CLI for identifying common web security misconfigurations and vulnerabilities. See WEBSENTINEL.md for full documentation.

# Scan targets for security issues
websentinel scan --target https://example.com --out results/

# Scan multiple targets from file
websentinel scan --targets targets.txt --out results/ --format json,md

Why webReaper?

Web reconnaissance often produces hundreds or thousands of URLs, making it difficult to decide what deserves attention first.
webReaper focuses on prioritization over volume, surfacing high-signal endpoints that are more likely to be useful during manual testing.

How webReaper Works

Target
  │
  ▼
┌─────────────────┐
│  HARVEST PHASE  │  Crawling (katana)
├─────────────────┤  Historical URLs (gau)
│  URL Discovery  │  Known paths (path packs)
└────────┬────────┘  robots.txt / sitemap.xml
         │
         ▼
┌─────────────────┐
│   PROBE PHASE   │  HTTP status & redirects
├─────────────────┤  Content type & title
│  HTTP Metadata  │  Technology detection (httpx)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   RANK PHASE    │  Discovery value
├─────────────────┤  Input/parameter signals
│   ReapScore     │  Access hints (auth/forbidden)
└────────┬────────┘  Anomalies (errors/timing)
         │
         ▼
┌─────────────────┐
│  REPORT PHASE   │  Ranked endpoints (ReapScore)
├─────────────────┤  Markdown + JSON output
│   Structured    │  Technical + ELI5 formats
└─────────────────┘

webReaper does not exploit targets — it provides discovery, context, and prioritization to guide manual investigation.
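The harvest phase can be pictured as merging URL lists from several tools while remembering which tools reported each URL, since source diversity feeds the HarvestIndex. The following is a minimal Python sketch under that assumption; `merge_sources`, `normalize`, and the light normalization rules (lowercase scheme and host, drop the `#fragment`) are illustrative and not webReaper's actual code.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host and drop the #fragment so duplicates collapse."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def merge_sources(results: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized URL to the set of tools that reported it."""
    merged: dict[str, set[str]] = {}
    for tool, urls in results.items():
        for url in urls:
            if not url.strip():
                continue  # skip blank lines from raw tool output
            merged.setdefault(normalize(url), set()).add(tool)
    return merged


if __name__ == "__main__":
    found = merge_sources({
        "katana": ["https://Example.com/admin", "https://example.com/login#top"],
        "gau": ["https://example.com/admin", "https://example.com/old.php?id=1"],
    })
    for url, tools in sorted(found.items()):
        print(url, sorted(tools))
```

A URL reported by both katana and gau ends up with two sources, which is the kind of signal a diversity-based subscore could reward.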

Key Features

🎯 Smart Prioritization: ReapScore algorithm ranks endpoints by testing value
🕷️ Multi-Source Discovery: Integrates katana, gau, and intelligent path guessing
⚡ Fast Probing: Configurable threading and rate limiting with httpx
📊 Dual Reports: Technical (JSON/MD) and beginner-friendly (ELI5) formats
🔧 Highly Configurable: Fine-tune filtering, scoping, and tool behavior
🛡️ Safety First: Safe mode enabled by default, with ethical controls

Quick Start

Prerequisites

Required:

Python 3.10 or higher
httpx (ProjectDiscovery's Go binary, not the Python httpx library)

Optional (for full functionality):

katana (ProjectDiscovery) — web crawler
gau — historical URL aggregator
gospider — fast web spider (optional)
hakrawler — fast web crawler (optional)

Installation

Option 1: Automated Setup (Recommended for Kali Linux)

The setup script automatically installs all dependencies including Go and required tools:

# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Run the automated setup script
./setup.sh

# Create virtual environment and install webReaper
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

The setup.sh script will:

Check and install Go if not present
Install required tools (httpx, katana, gau)
Install optional tools (gospider, hakrawler)
Configure your PATH environment variables
Provide helpful error messages and next steps

Option 2: Manual Installation

# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Install Go (if not already installed)
# Visit https://go.dev/doc/install

# Install required tools
go install github.com/projectdiscovery/httpx/cmd/httpx@latest
go install github.com/projectdiscovery/katana/cmd/katana@latest
go install github.com/lc/gau/v2/cmd/gau@latest

# Install optional tools
go install github.com/jaeles-project/gospider@latest
go install github.com/hakluke/hakrawler@latest

# Ensure Go binaries are in PATH
export PATH=$PATH:$HOME/go/bin

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install webReaper
pip install -e .

Option 3: Windows Installer (Recommended for Windows)

For Windows users who want a complete installation experience:

Download and Install:

Download the latest installer from GitHub Releases
Run WebReaper-Setup.exe
Follow the installation wizard
Install required Go tools (httpx, katana, gau) - see installer prompts

Or build the installer yourself:

# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Build the complete installer (requires Python and Inno Setup)
build_installer.bat

The installer provides:

Professional installation experience with GUI and silent modes
Both webReaper and WebSentinel executables
Desktop shortcuts and Start Menu integration
Optional PATH configuration
Complete uninstaller
All documentation included

For detailed installer documentation, see INSTALLER.md.

Option 4: Windows Executable Only (No Python Required)

For Windows users who prefer standalone executables without an installer:

# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Run the build script (requires Python 3.10+ on build machine)
build_windows.bat

This creates standalone executables in the dist/ folder that can run on any Windows machine without Python installed. You'll still need to install the Go tools (httpx, katana, gau, etc.) on the target machine.

For detailed Windows executable build instructions, see BUILDING_WINDOWS.md.

Automatic Dependency Installation

webReaper can automatically check and install missing tools at runtime:

# Enable automatic installation without prompting (useful for automation)
export WEBREAPER_AUTO_INSTALL=true
webreaper reap https://example.com -o out/

# Or install interactively when prompted
webreaper reap https://example.com -o out/
# You'll be prompted to install any missing tools
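The runtime check can be approximated with a PATH lookup plus the environment variable shown above. This is a hedged sketch, not webReaper's code: `missing_tools` and `auto_install_enabled` are hypothetical names, and treating only `true` (case-insensitive) as opt-in is an assumption. Note that `shutil.which("httpx")` only proves that *some* `httpx` is on PATH; the Python httpx package's command-line client, if installed in the same environment, can shadow ProjectDiscovery's binary.

```python
import os
import shutil

REQUIRED_TOOLS = ["httpx"]
OPTIONAL_TOOLS = ["katana", "gau", "gospider", "hakrawler"]


def missing_tools(names: list[str]) -> list[str]:
    """Return the tools that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


def auto_install_enabled() -> bool:
    """Assumption: only WEBREAPER_AUTO_INSTALL=true (any case) means 'install without prompting'."""
    return os.environ.get("WEBREAPER_AUTO_INSTALL", "").strip().lower() == "true"


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS + OPTIONAL_TOOLS)
    if not missing:
        print("All tools found on PATH")
    elif auto_install_enabled():
        print("Would install without prompting:", ", ".join(missing))
    else:
        print("Missing tools (you would be prompted):", ", ".join(missing))
```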

Troubleshooting

Tools not found after installation:

If you get errors about missing tools even after installation, ensure Go binaries are in your PATH:

# Add to your ~/.bashrc or ~/.zshrc
export PATH=$PATH:/usr/local/go/bin
export PATH=$PATH:$HOME/go/bin

# Reload your shell configuration
source ~/.bashrc  # or source ~/.zshrc

Go not installed:

If you don't have Go installed, either:

Run ./setup.sh which will install it automatically
Visit https://go.dev/doc/install for manual installation

Permission issues:

If you encounter permission issues during setup:

# Make sure setup.sh is executable
chmod +x setup.sh

# Some operations may require sudo
sudo ./setup.sh

Basic Usage

# Simple scan with default settings
webreaper reap https://example.com -o out/

# Advanced scan with custom filters
webreaper reap https://example.com -o out/ \
  --exclude-ext png,jpg,jpeg,gif,css,js,svg,ico,woff,woff2 \
  --exclude-path logout,signout \
  --max-params 8

# Scan with specific tools
webreaper reap https://example.com -o out/ \
  --katana --no-gau \
  --paths-pack api,auth

# List available path packs
webreaper packs

CLI Options

Commands

webreaper reap <target> — Run full reconnaissance pipeline
webreaper scan <target> — Alias for reap command
webreaper packs — List available path packs

Core Options

| Option | Default | Description |
|---|---|---|
| `-o, --out` | `out/` | Output directory for results |
| `-q, --quiet` | `false` | Disable banner and progress output |
| `-v, --verbose` | `false` | Show detailed timing and stage info |
| `--safe/--active` | `--safe` | Safe mode (disables JS execution) |
| `--timeout` | `600` | Timeout in seconds per tool |

Discovery Tools

| Option | Default | Description |
|---|---|---|
| `--katana/--no-katana` | `--katana` | Enable/disable katana web crawler |
| `--gau/--no-gau` | `--gau` | Enable/disable gau historical URLs |
| `--gospider/--no-gospider` | `--no-gospider` | Enable/disable gospider web crawler (optional) |
| `--hakrawler/--no-hakrawler` | `--no-hakrawler` | Enable/disable hakrawler web crawler (optional) |
| `--robots/--no-robots` | `--robots` | Enable/disable robots.txt and sitemap.xml discovery |
| `--katana-depth` | `2` | Maximum crawl depth for katana |
| `--katana-rate` | `50` | Rate limit (requests/sec) for katana |
| `--katana-concurrency` | `5` | Concurrent connections for katana |
| `--gospider-depth` | `2` | Maximum crawl depth for gospider |
| `--gospider-concurrency` | `5` | Concurrent connections for gospider |
| `--hakrawler-depth` | `2` | Maximum crawl depth for hakrawler |
| `--gau-limit` | `1500` | Maximum URLs to fetch from gau |

Path Discovery

| Option | Default | Description |
|---|---|---|
| `--paths/--no-paths` | `--paths` | Enable/disable path pack probing |
| `--paths-pack` | `common` | Comma-separated pack names (see `webreaper packs`) |
| `--paths-top` | `120` | Number of paths to include from packs |
| `--paths-extra` | (none) | Comma-separated custom paths to add |

Available packs: common, auth, api, ops, files, sensitive, admin, discovery, all
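To illustrate how `--paths-pack`, `--paths-top`, and `--paths-extra` might combine, here is a minimal sketch. The pack contents below are invented placeholders (the real wordlists live in `webreaper/paths_packs.py`), and the order of operations (dedupe packs in the order given, cap at `top`, then append extras) is an assumption rather than documented behavior.

```python
from urllib.parse import urljoin

# Hypothetical pack contents for illustration only.
PACKS: dict[str, list[str]] = {
    "common": ["robots.txt", "sitemap.xml", ".well-known/security.txt"],
    "auth": ["login", "logout", "oauth/authorize"],
    "api": ["api/", "graphql", "swagger.json", "openapi.json"],
}


def select_paths(pack_names: str, top: int = 120, extra: str = "") -> list[str]:
    """Combine packs in the order given, dedupe, cap at `top`, then append extras."""
    names = [n.strip() for n in pack_names.split(",") if n.strip()]
    if "all" in names:
        names = list(PACKS)
    paths: list[str] = []
    for name in names:
        for path in PACKS.get(name, []):  # unknown pack names are skipped in this sketch
            if path not in paths:
                paths.append(path)
    paths = paths[:top]
    for path in (p.strip() for p in extra.split(",")):
        if path and path not in paths:
            paths.append(path)
    return paths


def candidate_urls(base: str, paths: list[str]) -> list[str]:
    """Resolve each path relative to the target URL."""
    root = base.rstrip("/") + "/"
    return [urljoin(root, p.lstrip("/")) for p in paths]


if __name__ == "__main__":
    chosen = select_paths("api,auth", top=5, extra="/debug")
    for url in candidate_urls("https://example.com", chosen):
        print(url)
```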

HTTP Probing

Option	Default	Description
| Option | Default | Description |
|---|---|---|
| `--httpx-threads` | `25` | Number of concurrent httpx threads |
| `--httpx-rate` | `50` | Rate limit (requests/sec) for httpx |
| `--max-urls` | `1500` | Hard cap on total URLs to probe |

Filtering & Scope

| Option | Default | Description |
|---|---|---|
| `--scope` | (none) | Comma-separated hosts in scope (e.g., `example.com,api.example.com`) |
| `--no-subdomains` | `false` | Require exact host match (disable subdomain inclusion) |
| `--exclude-host` | (none) | Comma-separated hosts to exclude |
| `--include-path` | (none) | Only keep URLs with these path tokens (substring match) |
| `--exclude-path` | (none) | Drop URLs with these path tokens (substring match) |
| `--exclude-ext` | (none) | Drop URLs with these file extensions (e.g., `png,jpg,css,js`) |
| `--max-params` | `10` | Drop URLs with more than N query parameters |
| `--require-param` | `false` | Keep only URLs that have query parameters |
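The filtering options above can be read as a chain of keep/drop checks per URL. This is a hedged sketch modelled on the option descriptions, not webReaper's implementation: `host_in_scope` and `keep_url` are hypothetical names; an empty scope is treated as "no restriction" (the real tool may default to the target's host); path-token matching is case-insensitive; and `--exclude-host` is assumed to be an exact host match.

```python
from collections.abc import Iterable
from urllib.parse import parse_qsl, urlsplit


def host_in_scope(host: str, scope: Iterable[str], subdomains: bool = True) -> bool:
    """True if host equals a scope entry, or (with subdomains) ends with '.<entry>'."""
    host = host.lower().rstrip(".")
    for allowed in (s.strip().lower() for s in scope):
        if host == allowed or (subdomains and host.endswith("." + allowed)):
            return True
    return False


def keep_url(
    url: str,
    *,
    scope: Iterable[str] = (),
    subdomains: bool = True,
    exclude_hosts: Iterable[str] = (),
    include_path: Iterable[str] = (),
    exclude_path: Iterable[str] = (),
    exclude_ext: Iterable[str] = (),
    max_params: int = 10,
    require_param: bool = False,
) -> bool:
    """Apply scope and filter rules modelled on the options above to one URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    scope = list(scope)
    if scope and not host_in_scope(host, scope, subdomains):
        return False  # empty scope means no restriction in this sketch
    if host in {h.strip().lower() for h in exclude_hosts}:
        return False
    path = parts.path.lower()
    include_path = [t.lower() for t in include_path]
    if include_path and not any(tok in path for tok in include_path):
        return False
    if any(tok.lower() in path for tok in exclude_path):
        return False
    last_segment = path.rsplit("/", 1)[-1]
    blocked_exts = {e.strip().lower().lstrip(".") for e in exclude_ext}
    if "." in last_segment and last_segment.rsplit(".", 1)[-1] in blocked_exts:
        return False
    params = parse_qsl(parts.query, keep_blank_values=True)
    if len(params) > max_params:
        return False
    if require_param and not params:
        return False
    return True
```

Comma-separated CLI values map naturally onto the keyword arguments, for example `keep_url(u, exclude_ext="png,jpg".split(","), max_params=8)`. Note that `host_in_scope` checks for a leading dot, so `badexample.com` is not treated as a subdomain of `example.com`.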

Output

webReaper writes structured output to the specified directory:

| File | Description |
|---|---|
| `REPORT.md` | Ranked endpoints with ReapScore details (top 25) |
| `ELI5-REPORT.md` | Plain-language summary for non-technical stakeholders |
| `findings.json` | Complete machine-readable results with all metadata |
| `urls.txt` | Simple list of all discovered URLs |
| `hosts.txt` | List of all discovered hosts |
| `raw_katana_*.txt` | Raw output from katana crawler |
| `raw_gau_*.txt` | Raw output from gau historical URLs |
| `raw_gospider_*.txt` | Raw output from gospider crawler (if enabled) |
| `raw_hakrawler_*.txt` | Raw output from hakrawler crawler (if enabled) |
| `raw_robots.txt` | Raw robots.txt content (if robots discovery enabled) |
| `raw_sitemap_*.xml` | Raw sitemap XML content (if robots discovery enabled) |
| `raw_httpx.jsonl` | Raw JSON-lines output from httpx |
| `run.log` | Timestamped execution log with timing info |
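Because `raw_httpx.jsonl` is one JSON object per line, a quick status-code breakdown takes only a few lines of Python. This sketch assumes each record carries a `status_code` key, as ProjectDiscovery httpx's `-json` output does in recent versions; check the field names against your httpx version. `parse_jsonl` and `status_summary` are hypothetical helper names.

```python
import json
import sys
from collections import Counter
from collections.abc import Iterable
from pathlib import Path


def parse_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSON-lines text, skipping blank, malformed, or non-object lines."""
    records: list[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a truncated final line from an interrupted run
        if isinstance(obj, dict):
            records.append(obj)
    return records


def status_summary(records: Iterable[dict]) -> Counter:
    """Count responses per HTTP status code (assumes a 'status_code' key)."""
    return Counter(r["status_code"] for r in records if "status_code" in r)


if __name__ == "__main__":
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "out/raw_httpx.jsonl")
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            for status, count in sorted(status_summary(parse_jsonl(fh)).items()):
                print(status, count)
    else:
        print(f"{path} not found")
```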

Start with the top-ranked endpoints in REPORT.md to guide further investigation.

Understanding ReapScore

ReapScore is a 0-100 composite score made up of four weighted subscores:

🌱 HarvestIndex (30%) — Discovery & Surface Expansion

Source diversity (katana, gau, path packs)
New hosts/vhosts discovery
Path depth and uniqueness
Application content types

🧪 JuiceScore (35%) — Input & Sensitivity Potential

Query parameters present
High-signal parameter names (id, token, redirect, file, etc.)
Path keywords (admin, login, api, graphql, etc.)
Dynamic extensions (.php, .aspx, .jsp)
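The JuiceScore signals above map onto simple URL checks. The sketch below is illustrative only: the keyword sets and point values are invented, `juice_score` is a hypothetical name, and the reason strings merely imitate the "Why" column of the example output further down. Path keywords are matched against path segments rather than raw substrings, so `/rapid` does not count as `api`.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Hypothetical keyword lists and point values, loosely following the signals listed above.
HIGH_SIGNAL_PARAMS = {"id", "token", "redirect", "file", "url", "next", "path"}
PATH_KEYWORDS = {"admin", "login", "api", "graphql", "debug", "upload"}
DYNAMIC_EXTS = (".php", ".aspx", ".jsp")


def juice_score(url: str) -> tuple[int, list[str]]:
    """Return a 0-100 JuiceScore and 'why' reasons for one URL."""
    parts = urlsplit(url)
    path = parts.path.lower()
    params = {k.lower() for k, _ in parse_qsl(parts.query, keep_blank_values=True)}
    segments = {s for s in re.split(r"[/._-]+", path) if s}
    score, why = 0, []
    if params:
        score += 20
        why.append(f"params:{len(params)}")
    hot = sorted(params & HIGH_SIGNAL_PARAMS)
    if hot:
        score += 30
        why.append("high_signal_params:" + ",".join(hot))
    if segments & PATH_KEYWORDS:
        score += 25
        why.append("path_keywords")
    if path.endswith(DYNAMIC_EXTS):
        score += 15
        why.append("dynamic_ext")
    return min(score, 100), why
```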

🚪 AccessSignal (20%) — Authentication Hints

HTTP 401 (Unauthorized) and 403 (Forbidden)
Redirects to login/auth pages
WWW-Authenticate and Set-Cookie headers

⚠️ AnomalySignal (15%) — Errors & Oddities

5xx server errors
Slow responses (>2 seconds)
Large responses (>1MB)
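AccessSignal and AnomalySignal depend on the probe response rather than the URL. In this sketch `ProbeResult` is a hypothetical stand-in for one httpx result, the point values are invented, and "1 MB" is taken as 1,000,000 bytes (an assumption; the tool may use 1,048,576). The 2-second and 1 MB thresholds come from the lists above.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeResult:
    """Minimal stand-in for one probed URL (hypothetical field names)."""
    status: int
    headers: dict[str, str] = field(default_factory=dict)
    location: str = ""
    elapsed_s: float = 0.0
    body_bytes: int = 0


AUTH_HINTS = ("login", "signin", "auth", "sso")


def access_signal(r: ProbeResult) -> tuple[int, list[str]]:
    """Score authentication hints: 401/403, redirects to login, auth-related headers."""
    score, why = 0, []
    if r.status in (401, 403):
        score += 50
        why.append(f"status:{r.status}")
    if 300 <= r.status < 400 and any(h in r.location.lower() for h in AUTH_HINTS):
        score += 30
        why.append("redirect_to_auth")
    names = {k.lower() for k in r.headers}
    if "www-authenticate" in names:
        score += 15
        why.append("www_authenticate")
    if "set-cookie" in names:
        score += 5
        why.append("set_cookie")
    return min(score, 100), why


def anomaly_signal(r: ProbeResult) -> tuple[int, list[str]]:
    """Score oddities: 5xx errors, slow (>2 s) and large (>1 MB) responses."""
    score, why = 0, []
    if 500 <= r.status < 600:
        score += 60
        why.append(f"status:{r.status}")
    if r.elapsed_s > 2.0:
        score += 25
        why.append("slow_response")
    if r.body_bytes > 1_000_000:
        score += 15
        why.append("large_response")
    return min(score, 100), why
```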

Example output (subscores: H = HarvestIndex, J = JuiceScore, A = AccessSignal, N = AnomalySignal):

| Pri | ReapScore | Status | Sources | URL | Why | Subscores |
|---:|---:|---:|---|---|---|---|
| 🔴 | 78 | 403 | katana,gau | example.com/admin/users?id=1 | status:403; high_signal_params:id; path_keywords | 🌱H:45 🧪J:75 🚪A:35 ⚠️N:0 |
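As a worked check of the weights above, the example row's subscores give 0.30×45 + 0.35×75 + 0.20×35 + 0.15×0 = 13.5 + 26.25 + 7 + 0 = 46.75, or about 47, while the table shows 78. The shipped score therefore presumably includes terms beyond the plain weighted sum (the scoring-tool bonuses described under Tools System are one candidate). Treat the sketch below, which implements only the documented 30/35/20/15 weighting with a hypothetical `reap_score` function, as an approximation rather than webReaper's formula.

```python
# Documented weights; they sum to 1.0, so 0-100 subscores give a 0-100 composite.
WEIGHTS = {"harvest": 0.30, "juice": 0.35, "access": 0.20, "anomaly": 0.15}


def reap_score(harvest: float, juice: float, access: float, anomaly: float) -> int:
    """Weighted sum of four 0-100 subscores, rounded to an integer."""
    subs = {"harvest": harvest, "juice": juice, "access": access, "anomaly": anomaly}
    for name, value in subs.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} subscore must be in 0-100, got {value}")
    total = sum(WEIGHTS[name] * value for name, value in subs.items())
    return round(total)


if __name__ == "__main__":
    print(reap_score(45, 75, 35, 0))  # weighted sum only, without any bonuses
```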

Examples

Basic Reconnaissance

# Scan a target with default settings
webreaper reap https://example.com -o results/

API-Focused Scan

# Focus on API endpoints with relevant path packs
webreaper reap https://api.example.com -o api-results/ \
  --paths-pack api,ops \
  --include-path api,graphql,swagger \
  --exclude-ext html,css,js

Subdomain-Aware Scope

# Scan with subdomain inclusion
webreaper reap https://example.com -o wide-scan/ \
  --scope example.com
  
# Scan with exact host matching (no subdomains)
webreaper reap https://example.com -o narrow-scan/ \
  --scope example.com \
  --no-subdomains

Aggressive Scan (Active Mode)

# Enable JavaScript execution and increase limits
webreaper reap https://example.com -o aggressive/ \
  --active \
  --katana-depth 3 \
  --max-urls 3000 \
  --gau-limit 2000

Quiet Mode for Automation

# Minimal console output, suitable for scripts
webreaper reap https://example.com -o automated/ --quiet

# Parse results programmatically
jq '.endpoints[] | select(.reap.score > 50)' automated/findings.json
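If `jq` is not available, the same filter is easy to express in Python. The sketch relies on the `endpoints[].reap.score` layout implied by the `jq` command above; the `url` field is an assumption about the rest of `findings.json`, and `high_scoring` is a hypothetical name. Unlike `jq`, it returns an empty list instead of failing when `endpoints` is missing.

```python
import json
import sys
from pathlib import Path


def high_scoring(findings: dict, threshold: float = 50) -> list[dict]:
    """Roughly equivalent to: jq '.endpoints[] | select(.reap.score > 50)'."""
    return [
        ep
        for ep in findings.get("endpoints", [])
        if (ep.get("reap") or {}).get("score", 0) > threshold
    ]


if __name__ == "__main__":
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "automated/findings.json")
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        for ep in sorted(high_scoring(data), key=lambda e: e["reap"]["score"], reverse=True):
            print(ep["reap"]["score"], ep.get("url", "?"))
    else:
        print(f"{path} not found")
```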

Architecture & Development

For detailed architecture documentation, see ARCHITECTURE.md.

For contribution guidelines, see CONTRIBUTING.md.

For SIEM integration patterns, see SIEM_INTEGRATION.md.

For detailed tools documentation, see TOOLS.md.

Key Design Principles

Prioritization over volume — Surface high-signal endpoints first
Modular tool integration — Easy to add new crawlers and parsers
Transparent scoring — ReapScore reasons included in output
Safety by default — Conservative settings to avoid harm
Community extensibility — Plugin support for custom scoring functions

Tools System

webReaper includes a modular tools system for extending data collection and scoring:

Discovery Tools - Find URLs from various sources:

🌐 robots/sitemap - Parse robots.txt and sitemap.xml (enabled by default)
🕷️ katana - Modern web crawler (enabled by default)
📜 gau - Historical URL aggregator (enabled by default)
🕸️ gospider - Fast web spider (optional)
🦀 hakrawler - Fast web crawler (optional)
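The robots/sitemap source boils down to reading `Allow`/`Disallow`/`Sitemap` lines and `<loc>` elements. This is a minimal standard-library sketch, not the built-in tool: `parse_robots` and `parse_sitemap` are hypothetical names, wildcard rules are cut back to their literal prefix, and bare `/` rules are skipped because they name the whole site.

```python
import xml.etree.ElementTree as ET  # for untrusted input, defusedxml.ElementTree is a safer drop-in
from urllib.parse import urljoin

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_robots(text: str, base: str) -> tuple[list[str], list[str]]:
    """Return (absolute URLs from Allow/Disallow rules, Sitemap URLs) found in robots.txt."""
    paths: list[str] = []
    sitemaps: list[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key in ("allow", "disallow"):
            # Keep only the literal prefix before any wildcard or end anchor.
            prefix = value.split("*", 1)[0].split("$", 1)[0]
            if prefix not in ("", "/"):
                url = urljoin(base, prefix)
                if url not in paths:
                    paths.append(url)
        elif key == "sitemap" and value:
            sitemaps.append(value)
    return paths, sitemaps


def parse_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value; for a sitemap index these are child sitemap URLs."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
```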

Analyzer Tools - Extract metadata from responses:

🔒 security_headers - Analyze HTTP security headers for auth signals
🔍 content_patterns - Detect sensitive data patterns in response bodies
📊 technology_scorer - Score based on detected web technologies
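A content-pattern analyzer is essentially a set of regular expressions run over response bodies. The patterns below are common, well-known examples (AWS access key IDs, PEM private-key headers, Python tracebacks, SQL error strings) chosen for illustration; the real `content_patterns` tool may use different ones, and `detect_patterns` is a hypothetical name.

```python
import re

# Illustrative patterns only; not the built-in content_patterns rule set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "python_traceback": re.compile(r"Traceback \(most recent call last\):"),
    "sql_error": re.compile(r"SQL syntax.*MySQL|ORA-\d{5}|SQLSTATE\[", re.IGNORECASE),
}


def detect_patterns(body: str) -> list[str]:
    """Return the names of all patterns that occur in a response body."""
    return [name for name, rx in PATTERNS.items() if rx.search(body)]
```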

Scoring Tools - Enhance ReapScore calculation:

🎯 technology_scorer - Bonus points for high-value tech stacks (admin panels, debug tools, etc.)

See TOOLS.md for complete documentation on built-in tools and how to create custom tools.
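The real DiscoveryTool, AnalyzerTool, and ScoringTool interfaces are defined in TOOLS.md; the sketch below only illustrates the plugin idea. `Endpoint`, the `ScoringTool` protocol, `TechnologyBonus`, and `apply_scorers` are all hypothetical, and the technology names and bonus points are invented in the spirit of `technology_scorer`.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Endpoint:
    """Hypothetical record passed between tools."""
    url: str
    tech: list[str] = field(default_factory=list)
    score: float = 0.0
    reasons: list[str] = field(default_factory=list)


class ScoringTool(Protocol):
    name: str

    def score(self, endpoint: Endpoint) -> float:
        """Return bonus points for this endpoint (0 if none)."""
        ...


class TechnologyBonus:
    """Toy scorer; technology names and points are invented for illustration."""
    name = "technology_bonus"
    BONUSES = {"jenkins": 15, "phpmyadmin": 15, "grafana": 10, "wordpress": 5}

    def score(self, endpoint: Endpoint) -> float:
        return float(sum(self.BONUSES.get(t.lower(), 0) for t in endpoint.tech))


def apply_scorers(endpoint: Endpoint, scorers: list[ScoringTool], cap: float = 100.0) -> Endpoint:
    """Add each scorer's bonus, record it as a reason, and cap the total."""
    for scorer in scorers:
        bonus = scorer.score(endpoint)
        if bonus:
            endpoint.score = min(cap, endpoint.score + bonus)
            endpoint.reasons.append(f"{scorer.name}:+{bonus:g}")
    return endpoint
```

Using a structural `Protocol` means a plugin only needs a `name` and a `score` method; it does not have to inherit from a base class.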

Extending webReaper

Add new crawlers: Create a parser in webreaper/parsers/ and integrate it in the CLI (see: gospider, hakrawler)
Add custom tools: Implement DiscoveryTool, AnalyzerTool, or ScoringTool interfaces (see: TOOLS.md)
Customize scoring: Modify weights and signals in webreaper/scoring.py or use extensions
Add path packs: Extend wordlists in webreaper/paths_packs.py
New report formats: Add renderers in webreaper/report/
SIEM integration: Follow patterns in SIEM_INTEGRATION.md
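A new crawler parser mostly needs to turn a tool's raw text output into a clean URL list. The sketch below is a deliberately generic approach (regex-extract every http(s) URL, trim trailing punctuation, dedupe in order) rather than a parser for any specific tool's format; `parse_raw_output` is a hypothetical name and the sample lines in the test are made up.

```python
import re
from collections.abc import Iterable

URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def parse_raw_output(lines: Iterable[str]) -> list[str]:
    """Pull every http(s) URL out of raw crawler output, de-duplicated in order."""
    seen: set[str] = set()
    urls: list[str] = []
    for line in lines:
        for match in URL_RE.findall(line):
            url = match.rstrip(".,;)]")  # trailing punctuation from log-style output
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls
```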

See CONTRIBUTING.md for detailed instructions.

Roadmap

Recent enhancements (v0.6.5+):

✅ Tools system — Modular framework for discovery, analysis, and scoring tools
✅ robots.txt/sitemap.xml — Automatic discovery file parsing
✅ Security headers analyzer — HTTP security header analysis for scoring
✅ Content pattern detector — Identifies sensitive data and error patterns
✅ Technology scorer — Bonus scoring for high-value technology stacks
✅ gospider/hakrawler integration — Additional crawler options with noise controls
✅ Enhanced path packs — More specialized wordlists (auth, sensitive files, APIs, admin, discovery)
✅ Modular scoring system — Community-contributed scoring extensions support
✅ SIEM integration patterns — Export formats for enterprise workflows
✅ Comprehensive documentation — ARCHITECTURE.md, CONTRIBUTING.md, SIEM_INTEGRATION.md, TOOLS.md

Planned future enhancements:

Advanced content analysis — JavaScript API extraction and form analysis
DNS enumeration tool — Subdomain discovery via DNS
Certificate transparency — CT log parsing for subdomain discovery
Improved noise filtering — ML-based false positive reduction
Custom report templates — User-defined report formats
Distributed scanning — Multi-node scanning for large targets

License

This project is open source. See LICENSE file for details.

Credits

webReaper integrates the following excellent open-source tools:

httpx by ProjectDiscovery
katana by ProjectDiscovery
gau by lc
gospider by jaeles-project
hakrawler by hakluke

Disclaimer

webReaper is intended for authorized security testing only. Users are responsible for obtaining proper authorization before scanning any target. The authors are not responsible for misuse or damage caused by this tool.

Always follow responsible disclosure practices and respect scope limitations.
