gh0stshe11/webreaper

webReaper

webReaper is a lightweight web reconnaissance and endpoint-ranking tool designed to help security professionals quickly identify the most interesting parts of a web attack surface.

It collects URLs from multiple discovery sources, probes them for behavior and metadata, and ranks results using a novel ReapScore so you know where to start first.

🆕 WebSentinel - Security Scanner

This repository now includes WebSentinel, a defensive web application security scanner CLI for identifying common web security misconfigurations and vulnerabilities. See WEBSENTINEL.md for full documentation.

# Scan targets for security issues
websentinel scan --target https://example.com --out results/

# Scan multiple targets from file
websentinel scan --targets targets.txt --out results/ --format json,md

Why webReaper?

Web reconnaissance often produces hundreds or thousands of URLs, making it difficult to decide what deserves attention first.
webReaper focuses on prioritization over volume, surfacing high-signal endpoints that are more likely to be useful during manual testing.

How webReaper Works

Target
  │
  ▼
┌─────────────────┐
│  HARVEST PHASE  │  Crawling (katana)
├─────────────────┤  Historical URLs (gau)
│  URL Discovery  │  Known paths (path packs)
└────────┬────────┘  robots.txt / sitemap.xml
         │
         ▼
┌─────────────────┐
│   PROBE PHASE   │  HTTP status & redirects
├─────────────────┤  Content type & title
│  HTTP Metadata  │  Technology detection (httpx)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   RANK PHASE    │  Discovery value
├─────────────────┤  Input/parameter signals
│   ReapScore     │  Access hints (auth/forbidden)
└────────┬────────┘  Anomalies (errors/timing)
         │
         ▼
┌─────────────────┐
│  REPORT PHASE   │  Ranked endpoints (ReapScore)
├─────────────────┤  Markdown + JSON output
│   Structured    │  Technical + ELI5 formats
└─────────────────┘

webReaper does not exploit targets; it provides discovery, context, and prioritization to guide manual investigation.
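The harvest phase merges URLs from several sources before anything is probed. A minimal sketch of that merge step follows. It assumes (this is not taken from webReaper's code) that duplicates are collapsed on a lightly normalized URL and that each URL remembers which sources reported it. The names `normalize` and `merge_sources` and the normalization rules are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lower-case scheme and host and drop the fragment (illustrative rules only)."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_sources(found: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized URL to the set of sources that discovered it."""
    merged: dict[str, set[str]] = {}
    for source, urls in found.items():
        for url in urls:
            merged.setdefault(normalize(url), set()).add(source)
    return merged


# Made-up harvest results keyed by source name.
harvest = {
    "katana": ["https://Example.com/admin/users?id=1", "https://example.com/"],
    "gau": ["https://example.com/admin/users?id=1#top", "https://example.com/old"],
}
merged = merge_sources(harvest)
for url, sources in sorted(merged.items()):
    print(url, ",".join(sorted(sources)))
```

Keeping the source set per URL is what would allow a report to show a combined value such as katana,gau for an endpoint seen by more than one tool.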

Key Features

  • 🎯 Smart Prioritization: ReapScore algorithm ranks endpoints by testing value
  • 🕷️ Multi-Source Discovery: Integrates katana, gau, and intelligent path guessing
  • ⚡ Fast Probing: Configurable threading and rate limiting with httpx
  • 📊 Dual Reports: Technical (JSON/MD) and beginner-friendly (ELI5) formats
  • 🔧 Highly Configurable: Fine-tune filtering, scoping, and tool behavior
  • 🛡️ Safety First: Safe mode enabled by default, with ethical controls

Quick Start

Prerequisites

Required:

  • Python 3.10 or higher
  • httpx (ProjectDiscovery)

Optional (for full functionality):

  • katana (ProjectDiscovery) - web crawler
  • gau - historical URL aggregator
  • gospider - fast web spider (optional)
  • hakrawler - fast web crawler (optional)

Installation

Option 1: Automated Setup (Recommended for Kali Linux)

The setup script automatically installs all dependencies including Go and required tools:

# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Run the automated setup script
./setup.sh

# Create virtual environment and install webReaper
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

The setup.sh script will:

  • Check and install Go if not present
  • Install required tools (httpx, katana, gau)
  • Install optional tools (gospider, hakrawler)
  • Configure your PATH environment variables
  • Provide helpful error messages and next steps

Option 2: Manual Installation

# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Install Go (if not already installed)
# Visit https://go.dev/doc/install

# Install required tools
go install github.com/projectdiscovery/httpx/cmd/httpx@latest
go install github.com/projectdiscovery/katana/cmd/katana@latest
go install github.com/lc/gau/v2/cmd/gau@latest

# Install optional tools
go install github.com/jaeles-project/gospider@latest
go install github.com/hakluke/hakrawler@latest

# Ensure Go binaries are in PATH
export PATH=$PATH:$HOME/go/bin

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install webReaper
pip install -e .

Option 3: Windows Installer (Recommended for Windows)

For Windows users who want a complete installation experience:

Download and Install:

  1. Download the latest installer from GitHub Releases
  2. Run WebReaper-Setup.exe
  3. Follow the installation wizard
  4. Install required Go tools (httpx, katana, gau) - see installer prompts

Or build the installer yourself:

# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Build the complete installer (requires Python and Inno Setup)
build_installer.bat

The installer provides:

  • Professional installation experience with GUI and silent modes
  • Both webReaper and WebSentinel executables
  • Desktop shortcuts and Start Menu integration
  • Optional PATH configuration
  • Complete uninstaller
  • All documentation included

For detailed installer documentation, see INSTALLER.md.

Option 4: Windows Executable Only (No Python Required)

For Windows users who prefer standalone executables without an installer:

# Clone the repository
git clone https://github.com/gh0stshe11/webreaper.git
cd webreaper

# Run the build script (requires Python 3.10+ on build machine)
build_windows.bat

This creates standalone executables in the dist/ folder that can run on any Windows machine without Python installed. You'll still need to install the Go tools (httpx, katana, gau, etc.) on the target machine.

For detailed Windows executable build instructions, see BUILDING_WINDOWS.md.

Automatic Dependency Installation

webReaper can automatically check and install missing tools at runtime:

# Enable automatic installation without prompting (useful for automation)
export WEBREAPER_AUTO_INSTALL=true
webreaper reap https://example.com -o out/

# Or install interactively when prompted
webreaper reap https://example.com -o out/
# You'll be prompted to install any missing tools
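Before relying on automatic installation, it can help to see which tools are already reachable. The sketch below is a standalone check, not webReaper's own logic. It only uses `shutil.which` to look up the tool names listed under Prerequisites; the helper name `missing_tools` is hypothetical.

```python
import shutil

# Tool names as listed in the Prerequisites section.
REQUIRED = ["httpx"]
OPTIONAL = ["katana", "gau", "gospider", "hakrawler"]


def missing_tools(names: list[str]) -> list[str]:
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
    missing = missing_tools(names)
    status = "all found" if not missing else "missing " + ", ".join(missing)
    print(f"{label}: {status}")
```

A binary named httpx on the PATH is not necessarily ProjectDiscovery's httpx, because the Python HTTPX library can also install a command with that name. This check only confirms that some executable with the name exists.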

Troubleshooting

Tools not found after installation:

If you get errors about missing tools even after installation, ensure Go binaries are in your PATH:

# Add to your ~/.bashrc or ~/.zshrc
export PATH=$PATH:/usr/local/go/bin
export PATH=$PATH:$HOME/go/bin

# Reload your shell configuration
source ~/.bashrc  # or source ~/.zshrc

Go not installed:

If you don't have Go installed, either:

  1. Run ./setup.sh, which installs it automatically
  2. Visit https://go.dev/doc/install for manual installation

Permission issues:

If you encounter permission issues during setup:

# Make sure setup.sh is executable
chmod +x setup.sh

# Some operations may require sudo
sudo ./setup.sh

Basic Usage

# Simple scan with default settings
webreaper reap https://example.com -o out/

# Advanced scan with custom filters
webreaper reap https://example.com -o out/ \
  --exclude-ext png,jpg,jpeg,gif,css,js,svg,ico,woff,woff2 \
  --exclude-path logout,signout \
  --max-params 8

# Scan with specific tools
webreaper reap https://example.com -o out/ \
  --katana --no-gau \
  --paths-pack api,auth

# List available path packs
webreaper packs

CLI Options

Commands

  • webreaper reap <target> - Run full reconnaissance pipeline
  • webreaper scan <target> - Alias for reap command
  • webreaper packs - List available path packs

Core Options

| Option | Default | Description |
|---|---|---|
| -o, --out | out/ | Output directory for results |
| -q, --quiet | false | Disable banner and progress output |
| -v, --verbose | false | Show detailed timing and stage info |
| --safe/--active | --safe | Safe mode (disables JS execution) |
| --timeout | 600 | Timeout in seconds per tool |

Discovery Tools

| Option | Default | Description |
|---|---|---|
| --katana/--no-katana | --katana | Enable/disable katana web crawler |
| --gau/--no-gau | --gau | Enable/disable gau historical URLs |
| --gospider/--no-gospider | --no-gospider | Enable/disable gospider web crawler (optional) |
| --hakrawler/--no-hakrawler | --no-hakrawler | Enable/disable hakrawler web crawler (optional) |
| --robots/--no-robots | --robots | Enable/disable robots.txt and sitemap.xml discovery |
| --katana-depth | 2 | Maximum crawl depth for katana |
| --katana-rate | 50 | Rate limit (requests/sec) for katana |
| --katana-concurrency | 5 | Concurrent connections for katana |
| --gospider-depth | 2 | Maximum crawl depth for gospider |
| --gospider-concurrency | 5 | Concurrent connections for gospider |
| --hakrawler-depth | 2 | Maximum crawl depth for hakrawler |
| --gau-limit | 1500 | Maximum URLs to fetch from gau |

Path Discovery

| Option | Default | Description |
|---|---|---|
| --paths/--no-paths | --paths | Enable/disable path pack probing |
| --paths-pack | common | Comma-separated pack names (see webreaper packs) |
| --paths-top | 120 | Number of paths to include from packs |
| --paths-extra | (empty) | Comma-separated custom paths to add |

Available packs: common, auth, api, ops, files, sensitive, admin, discovery, all
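To make the interaction of --paths-pack, --paths-top, and --paths-extra concrete, here is a rough sketch of how candidate URLs could be assembled from packs. The pack contents, the function name `build_candidates`, and the choice to apply the top-N cap before adding extra paths are all assumptions for illustration; the real lists and ordering live in the webReaper source.

```python
from urllib.parse import urljoin

# Hypothetical pack contents; the real wordlists are defined by webReaper itself.
PACKS = {
    "api": ["/api", "/api/v1", "/graphql", "/swagger.json"],
    "auth": ["/login", "/admin", "/api"],
}


def build_candidates(base_url: str, pack_names: str, top: int = 120, extra: str = "") -> list[str]:
    """Merge the named packs in order, de-duplicate, cap at `top`, then append extras."""
    ordered: dict[str, None] = {}
    for name in pack_names.split(","):
        for path in PACKS.get(name.strip(), []):
            ordered.setdefault(path, None)
    paths = list(ordered)[:top]
    for path in (p.strip() for p in extra.split(",")):
        if path and path not in paths:
            paths.append(path)
    return [urljoin(base_url, path) for path in paths]


candidates = build_candidates("https://example.com", "api,auth", top=5, extra="/healthz")
for url in candidates:
    print(url)
```

In this sketch the custom path is always kept, even when the cap has already been reached, which is one plausible reading of the options table rather than documented behavior.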

HTTP Probing

| Option | Default | Description |
|---|---|---|
| --httpx-threads | 25 | Number of concurrent httpx threads |
| --httpx-rate | 50 | Rate limit (requests/sec) for httpx |
| --max-urls | 1500 | Hard cap on total URLs to probe |

Filtering & Scope

| Option | Default | Description |
|---|---|---|
| --scope | (none) | Comma-separated hosts in scope (e.g., example.com,api.example.com) |
| --no-subdomains | false | Require exact host match (disable subdomain inclusion) |
| --exclude-host | (none) | Comma-separated hosts to exclude |
| --include-path | (none) | Only keep URLs with these path tokens (substring match) |
| --exclude-path | (none) | Drop URLs with these path tokens (substring match) |
| --exclude-ext | (none) | Drop URLs with these file extensions (e.g., png,jpg,css,js) |
| --max-params | 10 | Drop URLs with more than N query parameters |
| --require-param | false | Keep only URLs that have query parameters |
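The filter semantics in the table can be sketched in a few lines of Python. This is an approximation built from the option descriptions, not webReaper's implementation. The function names `in_scope` and `keep_url` are hypothetical, and the subdomain rule (a host matches if it equals a scope entry or ends with a dot plus that entry) is an assumption.

```python
from urllib.parse import parse_qsl, urlsplit


def in_scope(host: str, scope: list[str], allow_subdomains: bool = True) -> bool:
    """Exact host match, or subdomain match when subdomains are allowed."""
    host = host.lower()
    for allowed in (s.lower() for s in scope):
        if host == allowed or (allow_subdomains and host.endswith("." + allowed)):
            return True
    return False


def keep_url(url, scope, allow_subdomains=True, exclude_ext=(), exclude_path=(),
             max_params=10, require_param=False) -> bool:
    """Apply scope, extension, path-token, and parameter-count filters in turn."""
    parts = urlsplit(url)
    if scope and not in_scope(parts.hostname or "", scope, allow_subdomains):
        return False
    path = parts.path.lower()
    last_segment = path.rsplit("/", 1)[-1]
    ext = last_segment.rsplit(".", 1)[-1] if "." in last_segment else ""
    if ext in exclude_ext:
        return False
    if any(token in path for token in exclude_path):  # substring match
        return False
    n_params = len(parse_qsl(parts.query, keep_blank_values=True))
    if n_params > max_params:
        return False
    if require_param and n_params == 0:
        return False
    return True


print(keep_url("https://api.example.com/v1/users?id=1", ["example.com"]))
print(keep_url("https://api.example.com/v1/users?id=1", ["example.com"], allow_subdomains=False))
```

Matching on a leading dot keeps a look-alike host such as notexample.com out of scope while still admitting api.example.com.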

Output

webReaper writes structured output to the specified directory:

| File | Description |
|---|---|
| REPORT.md | Ranked endpoints with ReapScore details (top 25) |
| ELI5-REPORT.md | Plain-language summary for non-technical stakeholders |
| findings.json | Complete machine-readable results with all metadata |
| urls.txt | Simple list of all discovered URLs |
| hosts.txt | List of all discovered hosts |
| raw_katana_*.txt | Raw output from katana crawler |
| raw_gau_*.txt | Raw output from gau historical URLs |
| raw_gospider_*.txt | Raw output from gospider crawler (if enabled) |
| raw_hakrawler_*.txt | Raw output from hakrawler crawler (if enabled) |
| raw_robots.txt | Raw robots.txt content (if robots discovery enabled) |
| raw_sitemap_*.xml | Raw sitemap XML content (if robots discovery enabled) |
| raw_httpx.jsonl | Raw JSON-lines output from httpx |
| run.log | Timestamped execution log with timing info |

Start with the top-ranked endpoints in REPORT.md to guide further investigation.

Understanding ReapScore

ReapScore is a 0-100 composite score made up of four weighted subscores:

🌱 HarvestIndex (30%) - Discovery & Surface Expansion

  • Source diversity (katana, gau, path packs)
  • New hosts/vhosts discovery
  • Path depth and uniqueness
  • Application content types

🧪 JuiceScore (35%) - Input & Sensitivity Potential

  • Query parameters present
  • High-signal parameter names (id, token, redirect, file, etc.)
  • Path keywords (admin, login, api, graphql, etc.)
  • Dynamic extensions (.php, .aspx, .jsp)

🚪 AccessSignal (20%) - Authentication Hints

  • HTTP 401 (Unauthorized) and 403 (Forbidden)
  • Redirects to login/auth pages
  • WWW-Authenticate and Set-Cookie headers

⚠️ AnomalySignal (15%) - Errors & Oddities

  • 5xx server errors
  • Slow responses (>2 seconds)
  • Large responses (>1MB)
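The signals listed above can be read as simple checks on a probed endpoint. The following sketch derives a list of reason strings from a URL and a few response properties. The parameter and keyword sets come from the lists above (only the examples named there), the 1,000,000-byte reading of "1MB" is an assumption, and the function name `reasons` and the labels `server_error`, `slow_response`, and `large_response` are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

HIGH_SIGNAL_PARAMS = {"id", "token", "redirect", "file"}
PATH_KEYWORDS = ("admin", "login", "api", "graphql")


def reasons(url: str, status: int, seconds: float = 0.0, size_bytes: int = 0) -> list[str]:
    """Collect human-readable reasons why an endpoint might deserve attention."""
    parts = urlsplit(url)
    why: list[str] = []
    if status in (401, 403):
        why.append(f"status:{status}")
    names = {key.lower() for key, _ in parse_qsl(parts.query, keep_blank_values=True)}
    hits = sorted(names & HIGH_SIGNAL_PARAMS)
    if hits:
        why.append("high_signal_params:" + ",".join(hits))
    if any(word in parts.path.lower() for word in PATH_KEYWORDS):  # naive substring match
        why.append("path_keywords")
    if 500 <= status <= 599:
        why.append("server_error")
    if seconds > 2.0:
        why.append("slow_response")
    if size_bytes > 1_000_000:
        why.append("large_response")
    return why


print("; ".join(reasons("https://example.com/admin/users?id=1", 403)))
```

Substring matching on path keywords is deliberately crude here; a path such as /capital would also match api, so a real implementation would likely match on path segments.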

Example output:

| Pri | ReapScore | Status | Sources | URL | Why | Subscores |
|---:|---:|---:|---|---|---|---|
| 🔴 | 78 | 403 | katana,gau | example.com/admin/users?id=1 | status:403; high_signal_params:id; path_keywords | 🌱H:45 🧪J:75 🚪A:35 ⚠️N:0 |
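The four weights can be illustrated with a plain weighted sum. This is only a sketch of how the stated percentages combine; the function name `reap_score` and the clamping to 0-100 are assumptions. Applied to the subscores in the sample row it gives 46.75, not the 78 shown there, so the actual scorer evidently does more than this (for example, bonuses from scoring tools or a different normalization).

```python
# Weights as stated for the four subscores (they sum to 1.0).
WEIGHTS = {"harvest": 0.30, "juice": 0.35, "access": 0.20, "anomaly": 0.15}


def reap_score(subscores: dict[str, float]) -> float:
    """Combine 0-100 subscores into a 0-100 composite using the stated weights."""
    total = sum(WEIGHTS[name] * subscores.get(name, 0.0) for name in WEIGHTS)
    return max(0.0, min(100.0, total))


# Subscores taken from the sample row: H:45 J:75 A:35 N:0.
sample = {"harvest": 45, "juice": 75, "access": 35, "anomaly": 0}
print(round(reap_score(sample), 2))
```

Because JuiceScore carries the largest weight, an endpoint with sensitive-looking parameters moves up the ranking faster than one that is merely slow or erroring.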

Examples

Basic Reconnaissance

# Scan a target with default settings
webreaper reap https://example.com -o results/

API-Focused Scan

# Focus on API endpoints with relevant path packs
webreaper reap https://api.example.com -o api-results/ \
  --paths-pack api,ops \
  --include-path api,graphql,swagger \
  --exclude-ext html,css,js

Subdomain-Aware Scope

# Scan with subdomain inclusion
webreaper reap https://example.com -o wide-scan/ \
  --scope example.com
  
# Scan with exact host matching (no subdomains)
webreaper reap https://example.com -o narrow-scan/ \
  --scope example.com \
  --no-subdomains

Aggressive Scan (Active Mode)

# Enable JavaScript execution and increase limits
webreaper reap https://example.com -o aggressive/ \
  --active \
  --katana-depth 3 \
  --max-urls 3000 \
  --gau-limit 2000

Quiet Mode for Automation

# Minimal console output, suitable for scripts
webreaper reap https://example.com -o automated/ --quiet

# Parse results programmatically
jq '.endpoints[] | select(.reap.score > 50)' automated/findings.json
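The same selection can be done in Python when the results feed a larger script. The sketch below assumes only what the jq filter above implies: a top-level `endpoints` array whose items carry `reap.score`. The `url` key and the helper name `high_value` are assumptions, and the inline sample stands in for a real findings.json.

```python
import json


def high_value(findings: dict, threshold: float = 50) -> list[dict]:
    """Return endpoints whose reap.score exceeds the threshold, highest first."""
    hits = [
        ep for ep in findings.get("endpoints", [])
        if ep.get("reap", {}).get("score", 0) > threshold
    ]
    return sorted(hits, key=lambda ep: ep["reap"]["score"], reverse=True)


# Inline sample standing in for automated/findings.json; the "url" key is assumed.
sample = json.loads("""
{"endpoints": [
  {"url": "https://example.com/admin/users?id=1", "reap": {"score": 78}},
  {"url": "https://example.com/about", "reap": {"score": 12}},
  {"url": "https://example.com/api/login", "reap": {"score": 64}}
]}
""")
for ep in high_value(sample):
    print(ep["reap"]["score"], ep.get("url", "?"))
```

For a real run, load the file with `json.load` from the output directory instead of the inline sample, and check the actual field names in findings.json first.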

Architecture & Development

For detailed architecture documentation, see ARCHITECTURE.md.

For contribution guidelines, see CONTRIBUTING.md.

For SIEM integration patterns, see SIEM_INTEGRATION.md.

For detailed tools documentation, see TOOLS.md.

Key Design Principles

  1. Prioritization over volume - Surface high-signal endpoints first
  2. Modular tool integration - Easy to add new crawlers and parsers
  3. Transparent scoring - ReapScore reasons included in output
  4. Safety by default - Conservative settings to avoid harm
  5. Community extensibility - Plugin support for custom scoring functions

Tools System

webReaper includes a modular tools system for extending data collection and scoring:

Discovery Tools - Find URLs from various sources:

  • 🌐 robots/sitemap - Parse robots.txt and sitemap.xml (enabled by default)
  • 🕷️ katana - Modern web crawler (enabled by default)
  • 📜 gau - Historical URL aggregator (enabled by default)
  • 🕸️ gospider - Fast web spider (optional)
  • 🦀 hakrawler - JS-heavy crawler (optional)

Analyzer Tools - Extract metadata from responses:

  • 🔒 security_headers - Analyze HTTP security headers for auth signals
  • 🔍 content_patterns - Detect sensitive data patterns in response bodies
  • 📊 technology_scorer - Score based on detected web technologies

Scoring Tools - Enhance ReapScore calculation:

  • 🎯 technology_scorer - Bonus points for high-value tech stacks (admin panels, debug tools, etc.)

See TOOLS.md for complete documentation on built-in tools and how to create custom tools.

Extending webReaper

  • Add new crawlers: Create parser in webreaper/parsers/, integrate in CLI (see: gospider, hakrawler)
  • Add custom tools: Implement DiscoveryTool, AnalyzerTool, or ScoringTool interfaces (see: TOOLS.md)
  • Customize scoring: Modify weights and signals in webreaper/scoring.py or use extensions
  • Add path packs: Extend wordlists in webreaper/paths_packs.py
  • New report formats: Add renderers in webreaper/report/
  • SIEM integration: Follow patterns in SIEM_INTEGRATION.md

See CONTRIBUTING.md for detailed instructions.
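As a rough idea of what a custom scoring extension might look like, the sketch below defines a small protocol and one implementation that adds a bonus for debug-style paths. The real DiscoveryTool, AnalyzerTool, and ScoringTool interfaces are defined in TOOLS.md and may differ substantially. Every name here (`ScoringToolLike`, `DebugPathBonus`, `apply_bonus`), the endpoint dictionary shape, and the 10-point bonus are hypothetical.

```python
from typing import Protocol


class ScoringToolLike(Protocol):
    """Hypothetical shape of a scoring extension; see TOOLS.md for the real interface."""

    name: str

    def score(self, endpoint: dict) -> float: ...


class DebugPathBonus:
    """Adds a fixed bonus when the URL contains a debug-style keyword."""

    name = "debug_path_bonus"
    KEYWORDS = ("debug", "actuator", "phpinfo")

    def score(self, endpoint: dict) -> float:
        url = endpoint.get("url", "").lower()
        return 10.0 if any(word in url for word in self.KEYWORDS) else 0.0


def apply_bonus(base: float, endpoint: dict, tools: list[ScoringToolLike]) -> float:
    """Add each tool's bonus to a base score, capped at 100."""
    return min(100.0, base + sum(tool.score(endpoint) for tool in tools))


endpoint = {"url": "https://example.com/actuator/health"}
print(apply_bonus(46.75, endpoint, [DebugPathBonus()]))
```

Using a structural protocol keeps the sketch independent of any base class, so it can be adapted once the actual interface from TOOLS.md is known.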

Roadmap

Recent enhancements (v0.6.5+):

  • ✅ Tools system - Modular framework for discovery, analysis, and scoring tools
  • ✅ robots.txt/sitemap.xml - Automatic discovery file parsing
  • ✅ Security headers analyzer - HTTP security header analysis for scoring
  • ✅ Content pattern detector - Identifies sensitive data and error patterns
  • ✅ Technology scorer - Bonus scoring for high-value technology stacks
  • ✅ gospider/hakrawler integration - Additional crawler options with noise controls
  • ✅ Enhanced path packs - More specialized wordlists (auth, sensitive files, APIs, admin, discovery)
  • ✅ Modular scoring system - Community-contributed scoring extensions support
  • ✅ SIEM integration patterns - Export formats for enterprise workflows
  • ✅ Comprehensive documentation - ARCHITECTURE.md, CONTRIBUTING.md, SIEM_INTEGRATION.md, TOOLS.md

Planned future enhancements:

  • Advanced content analysis - JavaScript API extraction and form analysis
  • DNS enumeration tool - Subdomain discovery via DNS
  • Certificate transparency - CT log parsing for subdomain discovery
  • Improved noise filtering - ML-based false positive reduction
  • Custom report templates - User-defined report formats
  • Distributed scanning - Multi-node scanning for large targets

License

This project is open source. See the LICENSE file for details.

Credits

webReaper integrates the following open-source tools: httpx and katana (ProjectDiscovery), gau, gospider, and hakrawler.

Disclaimer

webReaper is intended for authorized security testing only. Users are responsible for obtaining proper authorization before scanning any target. The authors are not responsible for misuse or damage caused by this tool.

Always follow responsible disclosure practices and respect scope limitations.
