webReaper is a lightweight web reconnaissance and endpoint-ranking tool designed to help security professionals quickly identify the most interesting parts of a web attack surface.
It collects URLs from multiple discovery sources, probes them for behavior and metadata, and ranks results using a novel ReapScore so you know where to start first.
This repository now includes WebSentinel, a defensive web application security scanner CLI for identifying common web security misconfigurations and vulnerabilities. See WEBSENTINEL.md for full documentation.
```bash
# Scan targets for security issues
websentinel scan --target https://example.com --out results/

# Scan multiple targets from file
websentinel scan --targets targets.txt --out results/ --format json,md
```

Web reconnaissance often produces hundreds or thousands of URLs, making it difficult to decide what deserves attention first.
webReaper focuses on prioritization over volume, surfacing high-signal endpoints that are more likely to be useful during manual testing.
```
Target
   │
   ▼
┌─────────────────┐
│  HARVEST PHASE  │  Crawling (katana)
├─────────────────┤  Historical URLs (gau)
│  URL Discovery  │  Known paths (path packs)
└────────┬────────┘  robots.txt / sitemap.xml
         │
         ▼
┌─────────────────┐
│   PROBE PHASE   │  HTTP status & redirects
├─────────────────┤  Content type & title
│  HTTP Metadata  │  Technology detection (httpx)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   RANK PHASE    │  Discovery value
├─────────────────┤  Input/parameter signals
│    ReapScore    │  Access hints (auth/forbidden)
└────────┬────────┘  Anomalies (errors/timing)
         │
         ▼
┌─────────────────┐
│  REPORT PHASE   │  Ranked endpoints (ReapScore)
├─────────────────┤  Markdown + JSON output
│   Structured    │  Technical + ELI5 formats
└─────────────────┘
```
webReaper does not exploit targets – it provides discovery, context, and prioritization to guide manual investigation.
- 🎯 Smart Prioritization: ReapScore algorithm ranks endpoints by testing value
- 🏷️ Multi-Source Discovery: Integrates katana, gau, and intelligent path guessing
- ⚡ Fast Probing: Configurable threading and rate limiting with httpx
- 📊 Dual Reports: Technical (JSON/MD) and beginner-friendly (ELI5) formats
- 🔧 Highly Configurable: Fine-tune filtering, scoping, and tool behavior
- 🛡️ Safety First: Safe mode enabled by default, with ethical controls
Required:
- Python 3.10 or higher
- httpx (ProjectDiscovery)
Optional (for full functionality):
- katana (ProjectDiscovery) – web crawler
- gau – historical URL aggregator
- gospider – fast web spider (optional)
- hakrawler – fast web crawler (optional)
The setup script automatically installs all dependencies including Go and required tools:
```bash
# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Run the automated setup script
./setup.sh

# Create virtual environment and install webReaper
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

The setup.sh script will:
- Check and install Go if not present
- Install required tools (`httpx`, `katana`, `gau`)
- Install optional tools (`gospider`, `hakrawler`)
- Configure your PATH environment variables
- Provide helpful error messages and next steps
```bash
# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Install Go (if not already installed)
# Visit https://go.dev/doc/install

# Install required tools
go install github.com/projectdiscovery/httpx/cmd/httpx@latest
go install github.com/projectdiscovery/katana/cmd/katana@latest
go install github.com/lc/gau/v2/cmd/gau@latest

# Install optional tools
go install github.com/jaeles-project/gospider@latest
go install github.com/hakluke/hakrawler@latest

# Ensure Go binaries are in PATH
export PATH=$PATH:$HOME/go/bin

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install webReaper
pip install -e .
```

For Windows users who want a complete installation experience:
Download and Install:
- Download the latest installer from GitHub Releases
- Run `WebReaper-Setup.exe`
- Follow the installation wizard
- Install required Go tools (httpx, katana, gau) - see installer prompts
Or build the installer yourself:
```bash
# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Build the complete installer (requires Python and Inno Setup)
build_installer.bat
```

The installer provides:
- Professional installation experience with GUI and silent modes
- Both webReaper and WebSentinel executables
- Desktop shortcuts and Start Menu integration
- Optional PATH configuration
- Complete uninstaller
- All documentation included
For detailed installer documentation, see INSTALLER.md.
For Windows users who prefer standalone executables without an installer:
```bash
# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Run the build script (requires Python 3.10+ on build machine)
build_windows.bat
```

This creates standalone executables in the dist/ folder that can run on any Windows machine without Python installed. You'll still need to install the Go tools (httpx, katana, gau, etc.) on the target machine.
For detailed Windows executable build instructions, see BUILDING_WINDOWS.md.
webReaper can automatically check and install missing tools at runtime:
```bash
# Enable automatic installation without prompting (useful for automation)
export WEBREAPER_AUTO_INSTALL=true
webreaper reap https://example.com -o out/

# Or install interactively when prompted
webreaper reap https://example.com -o out/
# You'll be prompted to install any missing tools
```

Tools not found after installation:
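The runtime tool check can be sketched roughly as below. This is an illustrative re-implementation, not webReaper's actual code; the function names are hypothetical, and only the `WEBREAPER_AUTO_INSTALL` variable and the tool names come from the documentation above:

```python
import os
import shutil

# Tools webReaper documents as required (katana/gau are technically optional).
REQUIRED_TOOLS = ["httpx", "katana", "gau"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

def auto_install_enabled():
    """Check the WEBREAPER_AUTO_INSTALL opt-in (truthy string values)."""
    value = os.environ.get("WEBREAPER_AUTO_INSTALL", "").strip().lower()
    return value in ("1", "true", "yes")
```

`shutil.which` honors the current PATH, which is why the PATH fixes below resolve "tool not found" errors without reinstalling anything.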
If you get errors about missing tools even after installation, ensure Go binaries are in your PATH:
```bash
# Add to your ~/.bashrc or ~/.zshrc
export PATH=$PATH:/usr/local/go/bin
export PATH=$PATH:$HOME/go/bin

# Reload your shell configuration
source ~/.bashrc  # or source ~/.zshrc
```

Go not installed:
If you don't have Go installed, either:
- Run `./setup.sh`, which will install it automatically
- Visit https://go.dev/doc/install for manual installation
Permission issues:
If you encounter permission issues during setup:
```bash
# Make sure setup.sh is executable
chmod +x setup.sh

# Some operations may require sudo
sudo ./setup.sh
```

```bash
# Simple scan with default settings
webreaper reap https://example.com -o out/

# Advanced scan with custom filters
webreaper reap https://example.com -o out/ \
  --exclude-ext png,jpg,jpeg,gif,css,js,svg,ico,woff,woff2 \
  --exclude-path logout,signout \
  --max-params 8

# Scan with specific tools
webreaper reap https://example.com -o out/ \
  --katana --no-gau \
  --paths-pack api,auth

# List available path packs
webreaper packs
```

- `webreaper reap <target>` – Run full reconnaissance pipeline
- `webreaper scan <target>` – Alias for the `reap` command
- `webreaper packs` – List available path packs
| Option | Default | Description |
|---|---|---|
| `-o, --out` | `out/` | Output directory for results |
| `-q, --quiet` | `false` | Disable banner and progress output |
| `-v, --verbose` | `false` | Show detailed timing and stage info |
| `--safe/--active` | `--safe` | Safe mode (disables JS execution) |
| `--timeout` | `600` | Timeout in seconds per tool |
| Option | Default | Description |
|---|---|---|
| `--katana/--no-katana` | `--katana` | Enable/disable katana web crawler |
| `--gau/--no-gau` | `--gau` | Enable/disable gau historical URLs |
| `--gospider/--no-gospider` | `--no-gospider` | Enable/disable gospider web crawler (optional) |
| `--hakrawler/--no-hakrawler` | `--no-hakrawler` | Enable/disable hakrawler web crawler (optional) |
| `--robots/--no-robots` | `--robots` | Enable/disable robots.txt and sitemap.xml discovery |
| `--katana-depth` | `2` | Maximum crawl depth for katana |
| `--katana-rate` | `50` | Rate limit (requests/sec) for katana |
| `--katana-concurrency` | `5` | Concurrent connections for katana |
| `--gospider-depth` | `2` | Maximum crawl depth for gospider |
| `--gospider-concurrency` | `5` | Concurrent connections for gospider |
| `--hakrawler-depth` | `2` | Maximum crawl depth for hakrawler |
| `--gau-limit` | `1500` | Maximum URLs to fetch from gau |
| Option | Default | Description |
|---|---|---|
| `--paths/--no-paths` | `--paths` | Enable/disable path pack probing |
| `--paths-pack` | `common` | Comma-separated pack names (see `webreaper packs`) |
| `--paths-top` | `120` | Number of paths to include from packs |
| `--paths-extra` | (none) | Comma-separated custom paths to add |

Available packs: `common`, `auth`, `api`, `ops`, `files`, `sensitive`, `admin`, `discovery`, `all`
| Option | Default | Description |
|---|---|---|
| `--httpx-threads` | `25` | Number of concurrent httpx threads |
| `--httpx-rate` | `50` | Rate limit (requests/sec) for httpx |
| `--max-urls` | `1500` | Hard cap on total URLs to probe |
| Option | Default | Description |
|---|---|---|
| `--scope` | (none) | Comma-separated hosts in scope (e.g., example.com,api.example.com) |
| `--no-subdomains` | `false` | Require exact host match (disable subdomain inclusion) |
| `--exclude-host` | (none) | Comma-separated hosts to exclude |
| `--include-path` | (none) | Only keep URLs with these path tokens (substring match) |
| `--exclude-path` | (none) | Drop URLs with these path tokens (substring match) |
| `--exclude-ext` | (none) | Drop URLs with these file extensions (e.g., png,jpg,css,js) |
| `--max-params` | `10` | Drop URLs with more than N query parameters |
| `--require-param` | `false` | Keep only URLs that have query parameters |
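The scoping and parameter rules above can be sketched as follows. This is an illustrative re-implementation under the documented semantics, not webReaper's actual filtering code; the function and parameter names are hypothetical:

```python
from urllib.parse import urlparse

def in_scope(url, scope_hosts, allow_subdomains=True):
    """Return True if the URL's host matches a --scope entry.

    With allow_subdomains (the default), api.example.com matches a scope
    entry of example.com; with --no-subdomains semantics the host must
    match exactly.
    """
    host = urlparse(url).hostname or ""
    for scoped in scope_hosts:
        if host == scoped:
            return True
        if allow_subdomains and host.endswith("." + scoped):
            return True
    return False

def passes_param_filter(url, max_params=10, require_param=False):
    """Apply --max-params / --require-param semantics to a single URL."""
    query = urlparse(url).query
    n = len([p for p in query.split("&") if p]) if query else 0
    return n <= max_params and (n > 0 or not require_param)
```

Note the leading dot in the subdomain check: it ensures `evil-example.com` does not match a scope entry of `example.com`.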
webReaper writes structured output to the specified directory:
| File | Description |
|---|---|
| `REPORT.md` | Ranked endpoints with ReapScore details (top 25) |
| `ELI5-REPORT.md` | Plain-language summary for non-technical stakeholders |
| `findings.json` | Complete machine-readable results with all metadata |
| `urls.txt` | Simple list of all discovered URLs |
| `hosts.txt` | List of all discovered hosts |
| `raw_katana_*.txt` | Raw output from katana crawler |
| `raw_gau_*.txt` | Raw output from gau historical URLs |
| `raw_gospider_*.txt` | Raw output from gospider crawler (if enabled) |
| `raw_hakrawler_*.txt` | Raw output from hakrawler crawler (if enabled) |
| `raw_robots.txt` | Raw robots.txt content (if robots discovery enabled) |
| `raw_sitemap_*.xml` | Raw sitemap XML content (if robots discovery enabled) |
| `raw_httpx.jsonl` | Raw JSON-lines output from httpx |
| `run.log` | Timestamped execution log with timing info |
Start with the top-ranked endpoints in REPORT.md to guide further investigation.
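Since `raw_httpx.jsonl` is JSON Lines (one JSON object per line), it can be consumed generically; this small sketch makes no assumptions about httpx's field names:

```python
import json

def read_jsonl(path):
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)
```

Streaming line by line avoids loading the whole probe log into memory on large scans.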
ReapScore is a 0-100 composite score made up of four weighted subscores:

Discovery value:
- Source diversity (katana, gau, path packs)
- New hosts/vhosts discovery
- Path depth and uniqueness
- Application content types

Input/parameter signals:
- Query parameters present
- High-signal parameter names (`id`, `token`, `redirect`, `file`, etc.)
- Path keywords (`admin`, `login`, `api`, `graphql`, etc.)
- Dynamic extensions (`.php`, `.aspx`, `.jsp`)

Access hints:
- HTTP 401 (Unauthorized) and 403 (Forbidden)
- Redirects to login/auth pages
- WWW-Authenticate and Set-Cookie headers

Anomalies:
- 5xx server errors
- Slow responses (>2 seconds)
- Large responses (>1MB)
Example output:
| Pri | ReapScore | Status | Sources | URL | Why | Subscores |
|---:|---:|---:|---|---|---|---|
| 🔴 | 78 | 403 | katana,gau | example.com/admin/users?id=1 | status:403; high_signal_params:id; path_keywords | 🌱H:45 🧪J:75 🪙A:35 ⚠️N:0 |
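A composite like this can be sketched as a weighted sum of the four subscores. The weights below are illustrative placeholders, not webReaper's actual values (the real weighting lives in webreaper/scoring.py):

```python
def reap_score(harvest, juice, access, anomaly,
               weights=(0.25, 0.35, 0.25, 0.15)):
    """Combine four 0-100 subscores into a 0-100 composite.

    The weights are made-up placeholders that sum to 1.0; webReaper's
    real weighting is defined in webreaper/scoring.py.
    """
    subscores = (harvest, juice, access, anomaly)
    total = sum(w * s for w, s in zip(weights, subscores))
    # Clamp and round so the composite stays on the documented 0-100 scale.
    return round(min(max(total, 0.0), 100.0))
```

Because the weights here are guesses, this function will not reproduce the 78 in the example row above; it only shows the shape of the calculation.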
```bash
# Scan a target with default settings
webreaper reap https://example.com -o results/
```

```bash
# Focus on API endpoints with relevant path packs
webreaper reap https://api.example.com -o api-results/ \
  --paths-pack api,ops \
  --include-path api,graphql,swagger \
  --exclude-ext html,css,js
```

```bash
# Scan with subdomain inclusion
webreaper reap https://example.com -o wide-scan/ \
  --scope example.com

# Scan with exact host matching (no subdomains)
webreaper reap https://example.com -o narrow-scan/ \
  --scope example.com \
  --no-subdomains
```

```bash
# Enable JavaScript execution and increase limits
webreaper reap https://example.com -o aggressive/ \
  --active \
  --katana-depth 3 \
  --max-urls 3000 \
  --gau-limit 2000
```

```bash
# Minimal console output, suitable for scripts
webreaper reap https://example.com -o automated/ --quiet

# Parse results programmatically
jq '.endpoints[] | select(.reap.score > 50)' automated/findings.json
```

For detailed architecture documentation, see ARCHITECTURE.md.
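The same filtering can be done in Python. This sketch assumes only the findings.json shape implied by the jq query (a top-level `endpoints` array whose items carry a nested `reap.score` field); the function name is hypothetical:

```python
import json

def high_value_endpoints(path, min_score=50):
    """Return endpoints from findings.json whose reap.score exceeds min_score."""
    with open(path, encoding="utf-8") as fh:
        findings = json.load(fh)
    return [e for e in findings.get("endpoints", [])
            if e.get("reap", {}).get("score", 0) > min_score]
```

The defensive `.get(...)` chain mirrors jq's tolerance for missing fields instead of raising KeyError on partial records.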
For contribution guidelines, see CONTRIBUTING.md.
For SIEM integration patterns, see SIEM_INTEGRATION.md.
For detailed tools documentation, see TOOLS.md.
- Prioritization over volume – Surface high-signal endpoints first
- Modular tool integration – Easy to add new crawlers and parsers
- Transparent scoring – ReapScore reasons included in output
- Safety by default – Conservative settings to avoid harm
- Community extensibility – Plugin support for custom scoring functions
webReaper includes a modular tools system for extending data collection and scoring:
Discovery Tools - Find URLs from various sources:
- 📋 robots/sitemap - Parse robots.txt and sitemap.xml (enabled by default)
- 🏷️ katana - Modern web crawler (enabled by default)
- 📜 gau - Historical URL aggregator (enabled by default)
- 🕸️ gospider - Fast web spider (optional)
- 🦅 hakrawler - JS-heavy crawler (optional)
Analyzer Tools - Extract metadata from responses:
- 🔒 security_headers - Analyze HTTP security headers for auth signals
- 🔍 content_patterns - Detect sensitive data patterns in response bodies
- 📊 technology_scorer - Score based on detected web technologies
Scoring Tools - Enhance ReapScore calculation:
- 🎯 technology_scorer - Bonus points for high-value tech stacks (admin panels, debug tools, etc.)
See TOOLS.md for complete documentation on built-in tools and how to create custom tools.
- Add new crawlers: Create parser in `webreaper/parsers/`, integrate in CLI (see: gospider, hakrawler)
- Add custom tools: Implement DiscoveryTool, AnalyzerTool, or ScoringTool interfaces (see: TOOLS.md)
- Customize scoring: Modify weights and signals in `webreaper/scoring.py` or use extensions
- Add path packs: Extend wordlists in `webreaper/paths_packs.py`
- New report formats: Add renderers in `webreaper/report/`
- SIEM integration: Follow patterns in SIEM_INTEGRATION.md
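To give a flavor of a custom scoring extension, here is a toy sketch. The `ScoringTool` base class and its `score` method are hypothetical stand-ins, not webReaper's real interface; consult TOOLS.md for the actual class and method names:

```python
class ScoringTool:
    """Hypothetical base class standing in for the real one (see TOOLS.md)."""
    name = "base"

    def score(self, endpoint):
        """Return a bonus score (0-100) for a single endpoint dict."""
        return 0

class DebugEndpointScorer(ScoringTool):
    """Toy extension: award bonus points to debug-looking URLs."""
    name = "debug_scorer"
    KEYWORDS = ("debug", "trace", "actuator", "phpinfo")

    def score(self, endpoint):
        url = endpoint.get("url", "").lower()
        return 40 if any(k in url for k in self.KEYWORDS) else 0
```

The pattern is the point: a scorer receives one endpoint record and returns a bonus that the pipeline folds into ReapScore.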
See CONTRIBUTING.md for detailed instructions.
Recent enhancements (v0.6.5+):
- ✅ Tools system – Modular framework for discovery, analysis, and scoring tools
- ✅ robots.txt/sitemap.xml – Automatic discovery file parsing
- ✅ Security headers analyzer – HTTP security header analysis for scoring
- ✅ Content pattern detector – Identifies sensitive data and error patterns
- ✅ Technology scorer – Bonus scoring for high-value technology stacks
- ✅ gospider/hakrawler integration – Additional crawler options with noise controls
- ✅ Enhanced path packs – More specialized wordlists (auth, sensitive files, APIs, admin, discovery)
- ✅ Modular scoring system – Community-contributed scoring extensions support
- ✅ SIEM integration patterns – Export formats for enterprise workflows
- ✅ Comprehensive documentation – ARCHITECTURE.md, CONTRIBUTING.md, SIEM_INTEGRATION.md, TOOLS.md
Planned future enhancements:
- Advanced content analysis – JavaScript API extraction and form analysis
- DNS enumeration tool – Subdomain discovery via DNS
- Certificate transparency – CT log parsing for subdomain discovery
- Improved noise filtering – ML-based false positive reduction
- Custom report templates – User-defined report formats
- Distributed scanning – Multi-node scanning for large targets
This project is open source. See LICENSE file for details.
webReaper integrates excellent open-source tools, including httpx, katana, gau, gospider, and hakrawler.
webReaper is intended for authorized security testing only. Users are responsible for obtaining proper authorization before scanning any target. The authors are not responsible for misuse or damage caused by this tool.
Always follow responsible disclosure practices and respect scope limitations.