
Project Settings Reference

Every project in RedAmon has 245+ configurable parameters that control the behavior of each reconnaissance module, the AI agent, and CypherFix automated remediation. These settings are managed through the project form UI (16 tabs across four groups: Scope, Recon Pipeline, AI Agent, Remediation), stored in PostgreSQL, and fetched by the recon container and agent at runtime.

Project Form Tabs

Defaults: Sensible defaults are loaded automatically from the server when creating a new project. You only need to fill in the required fields (project name and target domain — or target IPs in IP mode) and adjust what you want.

Recon Presets: Instead of configuring the 215+ parameters below individually, you can apply a Recon Preset that sets all recon parameters at once. See Recon Presets for the full list of 21 built-in presets and how to create your own.


Table of Contents


Target Configuration

Parameter Default Description
Start from IP (IP Mode) false Toggle between domain mode and IP/CIDR targeting mode. Locked after project creation. When enabled, hides domain fields and shows IP/CIDR input
Target Domain The root domain to assess (required in domain mode, hidden in IP mode)
Target IPs / CIDRs [] IP addresses and CIDR ranges to scan (IP mode only). Accepts IPv4, IPv6, and CIDR notation up to /24 (256 hosts)
Subdomain List [] Specific subdomain prefixes to scan (empty = discover all). Domain mode only
Verify Domain Ownership false Require DNS TXT record proof before scanning. Domain mode only
Ownership Token (auto) Unique token for TXT record verification
Ownership TXT Prefix _redamon DNS record name prefix
Stealth Mode false Forces passive-only techniques — disables active scanning, brute force, and GVM
Use Tor false Route all recon traffic through the Tor network
Use Bruteforce true Enable Knockpy active subdomain bruteforcing. Domain mode only

Scan Module Toggles

Modules can be individually enabled/disabled with automatic dependency resolution — disabling a parent module automatically disables all children:

domain_discovery (root)
  └── port_scan
       └── http_probe
            ├── resource_enum
            └── vuln_scan
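
For illustration, a minimal Python sketch of the cascading disable shown in the tree above (names are hypothetical; the real module has its own structures):

# Hypothetical parent -> children map mirroring the dependency tree
MODULE_CHILDREN = {
    "domain_discovery": ["port_scan"],
    "port_scan": ["http_probe"],
    "http_probe": ["resource_enum", "vuln_scan"],
}

def disable_module(enabled: set, module: str) -> set:
    """Remove a module and every descendant from the enabled set."""
    stack = [module]
    while stack:
        current = stack.pop()
        enabled.discard(current)
        stack.extend(MODULE_CHILDREN.get(current, []))
    return enabled

# Disabling port_scan also drops http_probe, resource_enum and vuln_scan
enabled = {"domain_discovery", "port_scan", "http_probe", "resource_enum", "vuln_scan"}
print(disable_module(enabled, "port_scan"))  # {'domain_discovery'}
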
Parameter Default Description
Scan Modules all enabled Array of phases to execute
Update Graph DB true Auto-import results into Neo4j
WHOIS Max Retries 3 Retry attempts for WHOIS lookups
DNS Max Retries 3 Retry attempts for DNS resolution

Port Scanner (Masscan)

High-speed SYN port scanner optimized for large networks and IP/CIDR ranges. Runs in parallel with Naabu — results are merged and deduplicated automatically. Incompatible with Tor (raw SYN packets bypass TCP stack). Both scanners are enabled by default.

Graph nodes — consumes: IP, Domain | produces: Port, Service

Parameter Default Description
Enabled true Toggle Masscan on/off
Top Ports 1000 Port selection: 100, 1000, or "full" for all 65535
Custom Ports Manual port range (e.g., 80,443,8080-8090). Overrides Top Ports
Rate 1000 Packets per second. Masscan handles very high rates (10k+)
Banners false Capture service banners (SSH, HTTP, etc.). Increases scan time
Wait 10 Seconds to wait for late responses after scan completes
Retries 1 Retry attempts for unresponsive ports
Exclude Targets Comma-separated IPs/CIDRs to exclude from scanning

Warning: If both Masscan and Naabu are disabled, port scanning is skipped entirely and downstream modules (HTTP probe, vulnerability scanning) will produce no results.

How it works

Masscan is a stateless asynchronous SYN scanner — instead of completing TCP handshakes, it crafts and sends raw SYN packets directly via a custom user-space TCP/IP stack and listens promiscuously for SYN-ACK replies. The custom stack is what makes it fast (the OS kernel TCP stack is the throughput ceiling for normal scanners) and what makes it Tor-incompatible (raw packets ignore the SOCKS proxy entirely).

The module first calls resolve_targets_to_ips(recon_data) to extract every IPv4 from the in-progress recon graph (subdomain A records, raw IPs, anything with a numeric address), filtering out non-routable space. Hostnames are not passed to masscan — it only scans IPs — but an ip_to_hostnames map is preserved so the parser can stamp the original hostname back onto each discovered Port for the graph merge.

build_masscan_command then assembles the CLI: target IPs are written to a file (-iL), the port spec comes from either Top Ports (translated to a literal range) or Custom Ports verbatim, the --rate <pps> flag controls packets-per-second pacing, --wait <seconds> controls how long masscan listens for late SYN-ACKs after the scan ends (default 10 — too low loses slow responders, too high wastes time), and an optional --excludefile is generated from the Exclude Targets list and IP-filter blocklist.

The subprocess is launched via subprocess.Popen with stdout/stderr captured. Masscan writes its own NDJSON to disk (one record per port-open event), which parse_masscan_output then ingests line-by-line. Each record gets matched back to its hostnames via ip_to_hostnames and emitted as a Port + Service node tuple ready for graph MERGE.
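
A minimal sketch of that parsing step (illustrative only; the field handling is an assumption based on masscan's JSON output format):

import json

def parse_masscan_output(path, ip_to_hostnames):
    results = []
    with open(path) as fh:
        for line in fh:
            line = line.strip().rstrip(",")
            if not line or line in ("[", "]"):   # skip JSON-array wrapper lines if present
                continue
            try:
                record = json.loads(line)
            except ValueError:
                continue                          # skip trailer lines that aren't valid JSON
            ip = record.get("ip")
            for port_info in record.get("ports", []):
                results.append({
                    "ip": ip,
                    "port": port_info.get("port"),
                    "protocol": port_info.get("proto", "tcp"),
                    # stamp the original hostname(s) back onto the finding
                    "hostnames": ip_to_hostnames.get(ip, []),
                })
    return results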

Permission requirement: masscan needs CAP_NET_RAW to craft raw packets. The recon image already grants this capability on container start, so no additional setup is needed inside the pipeline. If you ever see Permission denied — masscan requires root or CAP_NET_RAW, the container's capabilities have been stripped — check the docker-compose security_opt section.

When to use Masscan over Naabu: masscan is the right choice for IP/CIDR sweeps (e.g. 192.0.2.0/24, 198.51.100.0/16) where the target set runs to tens of thousands or millions of IPs. Naabu's scan loop is per-host and starts to slow down past tens of thousands of targets. For typical web-app pentest scope (single apex, dozens of subdomains, tens of IPs), Naabu and Masscan finish in seconds either way — running both is essentially free and gives you cross-validation of open-port findings.


Port Scanner (Naabu)

Controls how ports are discovered on target hosts.

Graph nodes — consumes: IP, Domain | produces: Port, Service

Parameter Default Description
Top Ports 1000 Port selection: 100, 1000, or custom
Custom Ports Manual port range (e.g., 80,443,8080-8090)
Scan Type SYN SYN (fast, requires root) or CONNECT (slower, no root needed)
Rate Limit 1000 Packets per second
Threads 25 Parallel scanning threads
Timeout 10000 Per-port timeout in milliseconds
Retries 3 Retry attempts for unresponsive ports
Exclude CDN true Skip CDN-hosted IPs (Cloudflare, Akamai, etc.)
Display CDN true Show CDN info but don't scan deeper
Skip Host Discovery false Skip ping-based host check
Verify Ports false Double-check ports with TCP handshake
Passive Mode false Use Shodan InternetDB instead of active scanning (zero packets)

How it works

Naabu is a Go-based stateless port scanner from ProjectDiscovery, run inside a Docker container (projectdiscovery/naabu:latest). Unlike masscan, it accepts hostnames directly and does its own DNS resolution before scanning, which means it can probe both example.com:443 and 198.51.100.42:443 from the same target file.

The module starts by checking that the Docker daemon is reachable (is_docker_installed, is_docker_running) and pulling the configured image if missing. extract_targets_from_recon then walks the in-progress graph and pulls every IP, Subdomain, and apex Domain into a target file — three sets, written one entry per line.

SYN with auto-fallback to CONNECT: when Scan Type is s (SYN), naabu needs raw-socket privileges. The module runs the scan once with -scan-type s; if the container exits with a permission error or returns no results, it automatically retries with -scan-type c (CONNECT scan via the kernel TCP stack). This fallback handles environments where CAP_NET_RAW was stripped without forcing the user to manually flip the toggle. The actual scan type used is logged so you know which one produced the results.
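
A minimal sketch of the fallback logic (hypothetical wrapper; base_cmd stands in for the assembled docker/naabu invocation):

import subprocess

def run_naabu(base_cmd, scan_type="s"):
    """Run naabu with the requested scan type; fall back to CONNECT if SYN fails."""
    result = subprocess.run(base_cmd + ["-scan-type", scan_type],
                            capture_output=True, text=True)
    output = result.stdout.strip()
    if scan_type == "s" and (result.returncode != 0 or not output):
        # SYN needs raw sockets; retry via the kernel TCP stack (CONNECT scan)
        print("naabu SYN scan failed or returned nothing, retrying with -scan-type c")
        return run_naabu(base_cmd, scan_type="c")
    print(f"naabu results produced with scan type '{scan_type}'")
    return output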

build_naabu_command assembles the full Docker invocation: image name, mount points for input/output files, naabu CLI flags. Notable flags:

  • -tp <N> for Top Ports (100/1000/full) or -p <range> for Custom Ports
  • -rate <N> for packets/sec ceiling
  • -c <N> for concurrent scanning threads
  • -timeout <ms> per-port timeout
  • -retries <N> retry budget per port
  • -Pn when Skip Host Discovery is on (skips the ping/ARP check, recommended for web-host targets that block ICMP)
  • -verify when Verify Ports is on (does a follow-up TCP handshake on every claimed-open port — adds 10-20% scan time but weeds out false positives from stray SYN-ACK reflections)
  • -exclude-cdn and -display-cdn toggle CDN-IP behaviour: by default naabu skips deep scanning of CDN-fronted IPs (since you'd just be scanning Cloudflare's datacenters) but still records that the IP belongs to a CDN

Passive Mode: when on, naabu is bypassed entirely and the module instead queries https://internetdb.shodan.io/{ip} for each IP — Shodan's free, key-less InternetDB endpoint that returns the ports it has cataloged for that IP. Zero packets are sent to the target. Useful for stealth scans, pre-engagement reconnaissance, and any case where you want a port snapshot without burning rate-limit budget on the target's network.
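
A minimal sketch of the passive lookup (the endpoint is real and key-less; error handling here is simplified):

import json
import urllib.request

def internetdb_ports(ip, timeout=10):
    """Return the ports Shodan's InternetDB has on record for an IP. Zero packets to the target."""
    url = f"https://internetdb.shodan.io/{ip}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.loads(resp.read())
    except Exception:
        return []                       # unknown IP or network error: no data
    return data.get("ports", [])        # e.g. [22, 80, 443]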

Tor / proxychains: when Tor is enabled, naabu is run with the -proxy <socks5> flag pointing at the bundled Tor SOCKS5 proxy. Note that Tor's bandwidth ceiling makes Naabu over Tor extremely slow — 1000 ports × 50 hosts can take an hour. SYN scans are not supported over Tor (they bypass SOCKS); the module forces CONNECT mode under Tor.

parse_naabu_output ingests the JSON-line output and emits (host, ip, port, service) tuples. The service guess comes from naabu's IANA service-name table baked into the binary — these are advisory and get refined by httpx's tech detection in the next phase.

File ownership fix: naabu runs as root inside the container, so the output files it writes are owned by root on the host. fix_file_ownership chowns them back to the calling user before the parser reads them — without this you'd hit permission errors on subsequent reads.


Nmap Service Detection

Deep service version detection (-sV) and NSE vulnerability scripts (--script vuln) on discovered open ports. Runs after the port-scan merge, only probing ports already confirmed open by Masscan/Naabu. Detected service versions feed into the CVE lookup pipeline for NVD/Vulners enrichment.

Graph nodes -- consumes: IP, Port | produces: Port (enriched), Service (enriched), Technology, Vulnerability, CVE

Parameter Default Description
Enabled true Toggle Nmap service detection on/off
Version Detection (-sV) true Probe open ports for service/version info
NSE Vulnerability Scripts true Run --script vuln for vulnerability detection
Timing Template T3 Nmap timing template: T1 (Sneaky), T2 (Polite), T3 (Normal), T4 (Aggressive), T5 (Insane)
Total Timeout 600 Maximum total scan duration in seconds
Per-Host Timeout 300 Maximum time per target host in seconds
Parallelism 2 Number of IPs to scan concurrently. Higher values speed up scanning but increase network load (1-10)

Stealth mode overrides: timing forced to T2 (Polite), NSE scripts disabled.

How it works

Unlike Naabu/Masscan which discover open ports, nmap is run only against the open-port set already discovered — the module never re-scans the full port range. build_nmap_targets walks the merged port-scan output, builds a per-IP map of {ip: [open_ports]}, and emits one nmap invocation per IP with a comma-separated -p <ports> argument. This is the single biggest reason Nmap is fast in this pipeline despite being slower per-port than Naabu/Masscan: it's only ever asked to probe ~5-50 ports per host, never 65k.
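
A minimal sketch of the grouping step (the record shape is an assumption about the merged port-scan output):

from collections import defaultdict

def build_nmap_targets(port_scan_results):
    """Group confirmed-open ports by IP so each nmap run only probes those ports."""
    targets = defaultdict(set)
    for record in port_scan_results:           # e.g. {"ip": "198.51.100.42", "port": 443}
        targets[record["ip"]].add(int(record["port"]))
    return {ip: sorted(ports) for ip, ports in targets.items()}

# Each IP then gets its own invocation with a comma-separated port list, e.g.:
#   nmap -sV --script vuln -p 22,80,443 -oX nmap_198.51.100.42.xml 198.51.100.42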

build_nmap_command assembles each per-IP command:

  • -sV for version detection — sends carefully-chosen probes to each open port (a bank of fingerprints in nmap's nmap-service-probes file, ~12,000 probe/match patterns) and matches responses against known service signatures. Returns the exact software + version string (Apache 2.4.49, OpenSSH 8.2p1, MySQL 5.7.34) which is the input the CVE Lookup pipeline needs for accurate matching against NVD/Vulners.
  • --script vuln for NSE vulnerability scripts — runs the entire vuln script category (~150 scripts) against open services. These scripts are written in Lua and target specific CVEs (Heartbleed, Shellshock, MS17-010, smb-vuln-*, http-shellshock, ssl-poodle, etc.). Each match emits a structured Vulnerability record with CVE ID, severity, and the script's confidence verdict.
  • -T<N> timing template — controls inter-packet delay, parallelism, retries, and timeout values. T3 (Normal) is the default. T1 (Sneaky) slows to roughly one probe every 15 seconds for stealth. T5 (Insane) saturates the link and is rarely useful (services start dropping packets).
  • --host-timeout <sec> — Per-Host Timeout. Caps the time spent on any single IP to prevent a slow/blackholed host from dragging the whole scan.
  • -oX <path> writes XML output, which is what the parser consumes.

Concurrent scanning: instead of one big nmap invocation against all IPs (which would be a single slow process), the module runs Parallelism separate nmap subprocesses concurrently — one per IP. Each subprocess has its own per-host timeout. This is significantly faster for multi-host scans because nmap's own --max-parallelism is per-process and doesn't leverage multiple cores well.

XML parsing: parse_nmap_xml reads the per-IP XML output via xml.etree.ElementTree. Three things get extracted:

  1. Service version data for each port (<service> elements with name, product, version, extrainfo) — feeds Port and Service node enrichment plus a Technology node when the service is a recognized stack (apache, nginx, openssh, etc.).
  2. NSE vuln output (<script id="..." output="..."> elements where the id starts with one of the known vuln-script prefixes) — parsed into per-finding Vulnerability records with CVE IDs extracted via regex from the script output (CVE-\d{4}-\d{4,7} pattern).
  3. Nmap version string for telemetry (which version of the scanner produced these results — useful when reproducing findings months later).
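
A minimal parsing sketch for the three extractions above (simplified; the real parser covers more fields and host-level scripts):

import re
import xml.etree.ElementTree as ET

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,7}")

def parse_nmap_xml(path):
    root = ET.parse(path).getroot()             # <nmaprun version="...">
    services, vulns = [], []
    for host in root.findall("host"):
        for port in host.findall(".//port"):
            svc = port.find("service")
            if svc is not None:
                services.append({
                    "port": port.get("portid"),
                    "name": svc.get("name"),
                    "product": svc.get("product"),
                    "version": svc.get("version"),
                })
            for script in port.findall("script"):            # NSE vuln output
                output = script.get("output", "")
                for cve in CVE_RE.findall(output):
                    vulns.append({"script": script.get("id"), "cve": cve})
    return {"nmap_version": root.get("version"), "services": services, "vulns": vulns}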

Stealth mode behavior: when stealth is enabled at the project level, the timing template is forced to T2 (Polite — adds 0.4s delay between probes, drops parallelism to 1) and NSE scripts are disabled (the vuln script category sends loud probes that defeat stealth). Service version detection (-sV) stays on because version probes are necessary for CVE matching and modern WAFs typically don't alert on standard -sV traffic.

Why the pipeline runs Nmap after Naabu/Masscan rather than instead: Naabu/Masscan are 10-100x faster at the port-discovery step but their service detection is shallow (just IANA port-number guesses). Nmap is the gold standard for version detection but slow at port discovery. Splitting the work — fast scanners do the discovery, nmap does the deep probing only on confirmed-open ports — gets the best of both.


HTTP Prober (httpx)

Controls what metadata is extracted from live HTTP services.

Graph nodes — consumes: Domain, IP, Port, Service | produces: BaseURL, Certificate, Technology, Header, Service, Port

How it works

httpx is the bridge between port discovery and web-layer scanning. It takes the open-port set from Naabu/Masscan plus the subdomain set from Domain Discovery and decides which of those endpoints actually speak HTTP — then it harvests every piece of metadata it can in a single request per target.

Target construction: build_targets_from_naabu walks the merged port-scan output and emits one URL per (host, port) pair. The scheme is auto-selected — port 443/8443 and any service whose IANA name contains https/ssl/tls get https://, everything else gets http://. build_targets_from_dns adds a fallback layer: any Subdomain that doesn't have an open port from the scanners gets probed on the standard ports (80, 443) anyway, since some hosts only respond to web traffic and were silent during port scanning.
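
A minimal sketch of the scheme selection (helper names here are illustrative, not the module's actual functions):

HTTPS_PORTS = {443, 8443}

def build_target_url(host, port, service_name=""):
    """Pick http/https per (host, port) pair based on port number and IANA service name."""
    name = (service_name or "").lower()
    if port in HTTPS_PORTS or any(k in name for k in ("https", "ssl", "tls")):
        scheme = "https"
    else:
        scheme = "http"
    return f"{scheme}://{host}:{port}"

def fallback_targets(subdomains_without_ports):
    """Subdomains with no scanner-confirmed port still get probed on the standard web ports."""
    return [f"{scheme}://{sub}" for sub in subdomains_without_ports
            for scheme in ("http", "https")]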

Docker invocation: httpx runs as projectdiscovery/httpx:latest inside Docker. The Docker command is built dynamically from the project's probe toggles — every individual flag (-status-code, -content-length, -title, -server, -tech-detect, -tls-grab, -jarm, -favicon, -asn, -cdn, etc.) maps 1:1 to a probe toggle in the form. Probes you don't need are simply not requested, which makes the per-target round-trip faster.

Probe internals:

  • -tls-grab performs a full TLS handshake and serializes the certificate chain (subject CN, SAN list, issuer, validity range, signature algorithm). The full cert is stored on the BaseURL as a Certificate node — used downstream by Subdomain Takeover detection (cert fingerprint lookups) and Security Checks (expiry-soon detection).
  • -jarm computes the JARM TLS fingerprint — a hash that identifies the server's TLS stack quirks. Useful for clustering hosts (CDN-fronted vs origin-direct, same product family, malware C2 signatures).
  • -favicon computes mmh3 hash of the favicon — fingerprint for finding identical hosts in Shodan/Censys/uncover via favicon hash search (a classic way to find dev/staging copies of a target).
  • -tech-detect runs httpx's built-in lightweight tech detection (header, body, and HTML-element pattern matching).
  • -asn and -cdn annotate the response with autonomous system + CDN provider (extracted from IP allocation databases) — feeds the Exclude CDN logic in port scanning.

Banner grabbing for non-HTTP ports: separately, run_banner_grab takes every port that's not in the HTTP-port allowlist (e.g. SSH, FTP, MySQL, Redis) and opens a raw socket with a configurable timeout, reads up to Max Length bytes, and runs identify_service on the banner — pattern-matching against known service signatures (e.g. the banner SSH-2.0-OpenSSH_8.2p1 is identified as openssh 8.2p1). This fills the gap where naabu/masscan only know IANA port numbers and httpx only handles HTTP.
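
A minimal sketch of that raw-socket grab and signature match (the signature table here is a tiny illustrative subset):

import re
import socket

SIGNATURES = [
    (re.compile(r"SSH-2\.0-OpenSSH[_-]([\w.]+)"), "openssh"),
    (re.compile(r"^220 .*FTP", re.IGNORECASE), "ftp"),
    (re.compile(r"mysql_native_password"), "mysql"),
]

def grab_banner(ip, port, timeout=5, max_length=1024):
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            return sock.recv(max_length).decode("utf-8", errors="replace").strip()
    except OSError:
        return None

def identify_service(banner):
    for pattern, name in SIGNATURES:
        match = pattern.search(banner or "")
        if match:
            version = match.group(1) if match.groups() else None
            return name, version
    return None, None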

Wappalyzer second pass: when Wappalyzer is enabled, every HTML response captured by httpx is run through a second-pass technology fingerprinter (~6,000 fingerprints from the open-source Wappalyzer dataset). This catches stack components httpx's built-in detection misses (specific JS libraries, CMS plugins, analytics frameworks). Results merge into the same Technology nodes — duplicates are MERGE-deduplicated.

Why httpx runs once per host: httpx is built around the principle "extract everything possible in one request." Other tools like wappalyzer-cli or sslyze would each make their own connection — running them against thousands of hosts doubles or triples the round-trip count. httpx's flag-driven probe model means one connection, all metadata.

Connection Settings:

Parameter Default Description
Threads 50 Concurrent HTTP probes
Timeout 15 Request timeout (seconds)
Retries 0 Retry attempts for failed requests
Rate Limit 150 Requests per second
Follow Redirects true Follow HTTP redirects
Max Redirects 10 Maximum redirect chain depth

Probe Toggles (each individually enabled/disabled):

Probe Default Description
Status Code true HTTP response status code
Content Length true Response body size
Content Type true MIME type of response
Title true HTML page title
Server true Server header value
Response Time true Time to first byte
Word Count true Number of words in response
Line Count true Number of lines in response
Tech Detect true Built-in technology fingerprinting
IP true Resolved IP address
CNAME true CNAME DNS records
TLS Info true TLS certificate details
TLS Grab true Full TLS handshake data
Favicon false Favicon hash (for fingerprinting)
JARM false JARM TLS fingerprint
ASN true Autonomous System Number
CDN true CDN provider detection
Response Hash Hash algorithm for response body
Include Response false Include full response body
Include Response Headers false Include all response headers

Filtering:

Parameter Default Description
Paths [] Additional paths to probe on each host
Custom Headers [] Extra headers to send with requests
Match Codes [] Only keep responses with these status codes
Filter Codes [] Exclude responses with these status codes

Technology Detection (Wappalyzer)

Second-pass technology fingerprinting engine with 6,000+ fingerprints.

Parameter Default Description
Enabled true Master toggle for Wappalyzer
Min Confidence 50 Minimum detection confidence (0-100%)
Require HTML false Only fingerprint responses with HTML content
Auto Update true Update fingerprint database from npm
NPM Version 6.10.56 Wappalyzer npm package version
Cache TTL (hours) 24 How long to cache fingerprint data

Banner Grabbing

Raw socket banner extraction for non-HTTP services.

Parameter Default Description
Enabled true Master toggle for banner grabbing
Timeout 5 Connection timeout (seconds)
Threads 10 Concurrent banner grab connections
Max Length 1024 Maximum banner size (bytes)

Web Crawler (Katana)

Active web crawling for endpoint and parameter discovery.

Graph nodes — consumes: BaseURL | produces: Endpoint, Parameter, BaseURL

How it works

Katana is the primary discovery engine in the resource-enumeration phase. It runs as projectdiscovery/katana:latest inside a Docker container, takes every BaseURL produced by httpx as input, and crawls each one to extract URLs, query parameters, form action targets, and JavaScript-embedded endpoints.

Two crawl modes:

  • Standard (HTML parser) — parses HTML statically with goquery, follows <a href>, <form action>, <script src>, <link href>, and a few less-common element references. Fast, lightweight, no browser overhead. Misses anything that requires JavaScript execution to render.
  • Headless mode (-jc) — spins up a headless Chromium inside the container, loads each page, waits for JS execution, and crawls the rendered DOM. Resolves SPA routes (React Router, Vue Router, Next.js dynamic routes), client-side fetch()/axios calls, and dynamically-injected forms. Significantly slower (5-10x) and uses more memory, but catches modern apps where the standard parser would just see a <div id="root"></div> shell.

Scope control: the --scope flag is set to the comma-separated list of allowed apex domains (the project apex plus any explicitly added scope domains). This stops katana from wandering off into third-party CDN URLs and Google Analytics endpoints. Anything outside scope still gets recorded — but as ExternalDomain nodes for situational awareness, never crawled deeper.

Form parsing: when katana hits a <form> element, the parse_forms_from_html helper extracts the action URL, method, and every named input field. Each input becomes a Parameter node attached to the form's endpoint. This is one of the highest-signal sources of parameter data in the whole pipeline because form fields are guaranteed to be backend-recognized (vs Arjun's brute-forced names which might or might not be read).
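
A minimal sketch of the idea using the standard-library HTML parser (the real helper's internals may differ):

from html.parser import HTMLParser

class FormParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.forms, self._current = [], None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action", ""),
                             "method": attrs.get("method", "GET").upper(),
                             "params": []}
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:
                self._current["params"].append(name)   # each named field -> Parameter node

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

parser = FormParser()
parser.feed('<form action="/login" method="post"><input name="user"><input name="pass"></form>')
print(parser.forms)  # [{'action': '/login', 'method': 'POST', 'params': ['user', 'pass']}]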

Custom headers: when set, headers are passed to katana via -H flags — used for authenticated crawls (Cookie/Bearer/Basic), specific User-Agent overrides, or X-Forwarded-For-style header injection for testing trusted-header bugs.

Output streaming: katana writes JSON-line output to stdout. The module reads it line-by-line so progress is visible in the recon log even on long-running crawls. Each line is parsed into a (url, method, source, fields) tuple and merged into the graph.

Why headless mode is off by default: a single React/Vue app load can take 2-5 seconds (DOM parse + JS execution + idle wait). Multiply that by a few hundred BaseURLs and you've added 30+ minutes to the scan. For most targets the standard parser catches >80% of the URLs anyway because most modern frameworks ship a static manifest of routes that the HTML parser can find. Turn headless on when you know the target is a JS-heavy SPA where the static parser is finding suspiciously few endpoints.

Parameter Default Description
Enable Katana true Master toggle for active web crawling
Crawl Depth 2 How many links deep to follow (1-10). Each level adds ~50% time
Max URLs 300 Maximum URLs to collect per domain. 300: ~1-2 min/domain, 1000+: scales linearly
Rate Limit 50 Requests per second
Timeout 3600 Overall crawl timeout in seconds (default: 60 minutes)
JavaScript Crawling false Parse JS files with headless browser (+50-100% time)
Parameters Only false Only keep URLs with query parameters for DAST fuzzing
Exclude Patterns [100+ patterns] URL patterns to skip — static assets, images, CDN URLs
Custom Headers [] Browser-like headers to avoid detection
Parallelism 5 Number of target URLs to crawl simultaneously via -p flag (1-50)
Concurrency 10 Concurrent HTTP fetchers per target URL via -c flag (1-50)

Passive URL Discovery (GAU)

Passive URL discovery from web archives and threat intelligence sources.

Graph nodes — consumes: Domain, Subdomain | produces: Endpoint, Parameter, BaseURL

How it works

GAU (GetAllUrls) queries four archive sources in parallel for every URL ever recorded against the apex + each Subdomain. The four sources have different blind spots, so combining them produces broader coverage than any one alone:

  • wayback (Internet Archive Wayback Machine) — strength: historical depth, URLs that existed in 2010 are still here; blind spot: recent URLs (<24h) often not yet crawled
  • commoncrawl (Common Crawl public dataset) — strength: web-scale breadth; blind spot: targeted URLs may be missed if not crawled
  • otx (AlienVault OTX URL list) — strength: threat-intelligence-flagged URLs; blind spot: limited to URLs in security feeds
  • urlscan (URLScan.io scan history) — strength: URLs that were submitted for scanning (often by researchers/operators); note: auto-disabled when the URLScan module already ran (avoids double-counting)

run_gau_for_domain runs gau as a Docker container per (provider × domain) combination — all combinations are launched concurrently via a worker pool sized by Workers. Results are streamed back, deduplicated against the in-memory URL set, and capped at Max URLs per domain. The Year Range filter (Wayback only) is applied as a --from / --to flag pair on the gau command — useful when you want to exclude very old URLs whose paths don't exist anymore.

Blacklist extensions filter out static-asset URLs (.png, .jpg, .css, .pdf, .zip, .woff, etc.) that pollute results without being attack surface. The default list covers ~30 extensions; you can extend or shrink it per-project.

URL Verification (optional second pass): when on, every archived URL is re-checked with a HEAD request against the live target. This:

  1. Strips dead URLs — a URL recorded by Wayback in 2018 might 404 today; without verification it would pollute the graph and waste downstream scanner time.
  2. Captures live status code — useful for differentiating live endpoints from those that consistently 401/403 (still attack surface) vs 404 (gone).
  3. Optionally detects HTTP methods — when Detect Methods is on, an additional OPTIONS probe per URL extracts the Allow: header listing supported methods (POST/PUT/DELETE/PATCH). This produces full per-method endpoint coverage without needing to actually send those methods.

The verification pass is rate-limited and threaded separately from the discovery pass (since it's actually touching the target — turning a passive scan into a low-volume active scan).
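
A minimal single-URL sketch of the verification pass (the accept list and defaults are placeholders; the real pass is threaded and rate-limited):

import urllib.error
import urllib.request

ACCEPT_CODES = {200, 201, 301, 302, 401, 403}

def verify_url(url, timeout=5, detect_methods=False):
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code                      # 4xx/5xx still proves the URL resolves
    except Exception:
        return None                            # dead: DNS failure, timeout, refused
    if status not in ACCEPT_CODES:
        return None
    methods = []
    if detect_methods:                         # optional OPTIONS probe for the Allow header
        opt = urllib.request.Request(url, method="OPTIONS")
        try:
            with urllib.request.urlopen(opt, timeout=timeout) as resp:
                methods = [m.strip() for m in resp.headers.get("Allow", "").split(",") if m.strip()]
        except Exception:
            pass
    return {"url": url, "status": status, "methods": methods}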

merge_gau_into_by_base_url maps each discovered URL back to its Subdomain/host, adds it to the in-progress by_base_url map (the canonical structure that downstream modules read), and emits Endpoint + Parameter + BaseURL nodes for the graph. Duplicates against existing Katana/Hakrawler/JsLuice findings are folded silently.

Why GAU complements Katana rather than replacing it: Katana finds URLs the target currently links to; GAU finds URLs the target ever exposed. Many production apps have abandoned endpoints (/api/v1/legacy/..., /admin/old-panel/, debug routes from previous deployments) that aren't in any sitemap or HTML response today but still respond when hit directly. These are some of the highest-yield findings in a typical engagement, and only GAU surfaces them.

Parameter Default Description
Enable GAU false Master toggle for passive URL discovery
Providers wayback, commoncrawl, otx, urlscan Data sources for archived URLs
Max URLs 1000 Maximum URLs per domain (0 = unlimited)
Timeout 60 Request timeout per provider (seconds)
Threads 5 Parallel fetch threads (1-20)
Year Range [] Filter Wayback by year (e.g., "2020, 2024"). Empty = all
Verbose Output false Detailed logging
Blacklist Extensions [png, jpg, css, pdf, zip, ...] File extensions to exclude
Workers 10 Parallel domain query workers (replaces hardcoded limit of 5) (1-20)

URL Verification (when enabled, GAU confirms URLs are still live):

Parameter Default Description
Verify URLs false HTTP check on archived URLs
Verify Timeout 5 Seconds per URL check
Verify Rate Limit 100 Verification requests per second
Verify Threads 50 Concurrent verification threads (1-100)
Accept Status Codes [200, 201, 301, ...] Status codes indicating a live URL
Filter Dead Endpoints true Exclude 404/500/timeout URLs

HTTP Method Detection (when verification is enabled):

Parameter Default Description
Detect Methods false Send OPTIONS to discover allowed methods
Method Detect Timeout 5 Seconds per OPTIONS request
Method Detect Rate Limit 50 Requests per second
Method Detect Threads 25 Concurrent threads

ParamSpider Passive Parameter Discovery

ParamSpider discovers URL parameters from the Wayback Machine archives. It queries web.archive.org for historical URLs containing query parameters, providing passive parameter discovery without sending any requests to the target. Disabled by default.

Graph nodes - consumes: Domain, Subdomain | produces: Endpoint, Parameter

How it works

ParamSpider queries the Wayback Machine's CDX index API (http://web.archive.org/cdx/search/cdx?url=*.<domain>/*&output=text&fl=original&collapse=urlkey) for every snapshot URL ever recorded for the apex domain plus each discovered Subdomain. The CDX API returns one URL per line — fast and structured.

The crucial filter is what comes next: every returned URL is parsed and only URLs containing a query string (?key=value) are kept. Everything else (static pages, asset URLs, parameterless API routes) is dropped. Where GAU returns all historical URLs, ParamSpider returns only the parameterized ones — the surface area you actually need for SQLi/XSS/SSRF/IDOR fuzzing.

For each surviving URL, ParamSpider replaces every parameter value with the string FUZZ (or whatever placeholder is configured). So https://example.com/api/users?id=42&debug=true becomes https://example.com/api/users?id=FUZZ&debug=FUZZ. This makes the output drop-in compatible with downstream fuzzers — Nuclei DAST, FFuf, sqlmap, ffuf wordlists, etc. all consume FUZZ-templated URLs natively.
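
A minimal single-domain sketch of the CDX query and FUZZ templating (the real module fans this out across Workers):

import urllib.parse
import urllib.request

def wayback_param_urls(domain, placeholder="FUZZ", timeout=120):
    cdx = ("http://web.archive.org/cdx/search/cdx"
           f"?url=*.{domain}/*&output=text&fl=original&collapse=urlkey")
    with urllib.request.urlopen(cdx, timeout=timeout) as resp:
        urls = resp.read().decode("utf-8", errors="replace").splitlines()

    templated = set()
    for url in urls:
        parsed = urllib.parse.urlparse(url.strip())
        if not parsed.query:
            continue                           # keep only parameterized URLs
        params = urllib.parse.parse_qsl(parsed.query, keep_blank_values=True)
        fuzzed = "&".join(f"{name}={placeholder}" for name, _ in params)
        templated.add(parsed._replace(query=fuzzed).geturl())
    return sorted(templated)

# https://example.com/api/users?id=42&debug=true -> .../api/users?id=FUZZ&debug=FUZZ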

The Workers setting controls how many domains are queried concurrently. Each worker is a separate request to web.archive.org's CDX API — increasing this past ~10 hits the archive's per-IP rate limits, which return slow/empty responses rather than HTTP 429.

Why this is so effective in practice: developers tend to rotate parameters less than they rotate URLs. A debug parameter (?debug=1, ?test=true, ?admin=1) added once during development frequently survives across years of refactoring even when the URL it was on gets renamed. ParamSpider catches these "ghost params" by mining the historical record.

Domain mode only: ParamSpider is skipped in IP mode because the Wayback Machine indexes by hostname, not by IP. There's no useful CDX query for 198.51.100.42.

Parameter Default Description
Enable ParamSpider false Master toggle for passive parameter discovery
Placeholder FUZZ Placeholder string injected into parameter values for downstream fuzzing
Timeout 120 Overall timeout in seconds
Workers 5 Parallel domain workers for Wayback Machine queries (1-10)

API Discovery (Kiterunner)

API endpoint brute-forcing using real-world Swagger/OpenAPI wordlists.

Graph nodes — consumes: BaseURL | produces: Endpoint, BaseURL

How it works

Kiterunner is Assetnote's purpose-built API route brute-forcer, run as a native Go binary (auto-downloaded on first use from the Assetnote GitHub releases). The crucial difference from generic content discovery (FFuf) is the wordlist source: kiterunner ships with .kite-format wordlists derived from tens of thousands of real-world Swagger/OpenAPI specifications scraped from public APIs. So instead of testing /admin, /login, /test from common.txt, it tests routes humans actually deploy: /api/v1/users/{id}/profile, /api/v2/orders/search, /v1/auth/refresh-token, /api/admin/users/{userId}/permissions.

Binary + wordlist provisioning (ensure_kiterunner_binary): on first scan, the helper detects the host architecture (linux-amd64 / linux-arm64 / macos-amd64 / macos-arm64), downloads the corresponding kiterunner release tarball from the Assetnote GitHub release, and unpacks the binary into tools/kiterunner/. The chosen wordlist (routes-large.kite ~140k routes, or routes-small.kite ~20k routes) is downloaded from https://wordlists-cdn.assetnote.io/data/kiterunner/... and cached on disk. Subsequent scans reuse the cached binary and wordlist.

Method detection layer: where most fuzzers only test GET, kiterunner enumerates all HTTP methods on each found route. Two modes:

  • bruteforce (default) — for each route returned with a 2xx/3xx/4xx, sends additional POST, PUT, DELETE, PATCH requests and records which methods are accepted. Slower but reliable.
  • options — sends a single OPTIONS request per route and parses the Allow: response header. Faster but unreliable on misconfigured servers that don't honor OPTIONS.

This matters because write-side methods (POST/PUT/DELETE) are the highest-yield attack surface for IDOR, mass-assignment, and unauthenticated mutation bugs — and most discovery tools never test them.
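
For illustration, a standalone sketch of the bruteforce-mode idea (not kiterunner's internals; the status codes treated as "no handler" are an assumption):

import urllib.error
import urllib.request

def detect_methods(url, methods=("POST", "PUT", "DELETE", "PATCH"), timeout=5):
    """Re-request a found route with write-side methods and record which are accepted."""
    accepted = []
    for method in methods:
        req = urllib.request.Request(url, method=method)
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                status = resp.status
        except urllib.error.HTTPError as err:
            status = err.code
        except Exception:
            continue
        if status not in (404, 405, 501):      # anything else means a handler exists
            accepted.append((method, status))
    return accepted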

Status code filtering (Match Status Codes / Ignore Status Codes): the default match list keeps [200, 201, 202, 204, 301, 302, 303, 307, 308, 401, 403]. Note that 401 and 403 are kept by default — they indicate the route exists but requires auth. From an attacker's perspective those are still attack surface (potential privesc, auth bypass, or auth-header injection targets), so the default policy is "find them all, filter at review time."

Authenticated scans: when Custom Headers are set (Authorization: Bearer <token>, Cookie: session=..., etc.), they're passed to kiterunner via -H flags and applied to every request. This is essential when scanning APIs behind auth — the routes a logged-in user sees are different from the public ones.

Concurrency: Connections controls per-target concurrent connections, Threads controls how many targets are scanned in parallel. The two are multiplicative — 100 connections × 50 threads means up to 5000 simultaneous requests at peak, throttled overall by the Rate Limit (req/sec) setting.

Why this is run after Katana/Hakrawler rather than instead: crawlers find routes the app links to. Kiterunner finds routes the app implements but doesn't expose. The two are complementary — each catches a different blind spot.

Parameter Default Description
Enable Kiterunner true Master toggle for API brute-forcing
Wordlist routes-large routes-large (~100k, 10-30 min) or routes-small (~20k, 5-10 min)
Rate Limit 100 Requests per second
Connections 100 Concurrent connections per target
Timeout 10 Per-request timeout (seconds)
Scan Timeout 1000 Overall scan timeout (seconds)
Threads 50 Parallel scanning threads
Min Content Length 0 Ignore responses smaller than this (bytes)
Parallelism 2 Number of wordlists to process in parallel (1-5)

Status Code Filters:

Parameter Default Description
Ignore Status Codes [] Blacklist: filter out noise (e.g., 404, 500)
Match Status Codes [200, 201, ...] Whitelist: only keep these codes. Includes 401/403
Custom Headers [] For authenticated API scanning

Method Detection:

Parameter Default Description
Detect Methods true Find POST/PUT/DELETE methods beyond GET
Detection Mode bruteforce bruteforce (slower, more accurate) or options (faster)
Bruteforce Methods POST, PUT, DELETE, PATCH Methods to try in bruteforce mode
Method Detect Timeout 5 Seconds per request
Method Detect Rate Limit 50 Requests per second
Method Detect Threads 25 Concurrent threads

Web Crawler (Hakrawler)

Hakrawler is a DOM-aware web crawler that runs as a Docker container alongside Katana. It provides an additional crawling perspective with scope-aware link following.

Graph nodes — consumes: BaseURL | produces: Endpoint, Parameter, BaseURL

How it works

Hakrawler (by hakluke) is a lightweight Go-based crawler run as jauderho/hakrawler:latest inside Docker. Its key differentiator from Katana is implementation simplicity: it's ~500 lines of Go using colly for HTTP and goquery for HTML parsing, no headless browser, no JS execution, just fast HTML link extraction. This means it's faster per page than Katana but less thorough on JS-heavy targets.

Stdin-based invocation: hakrawler reads its target list from stdin instead of a file, so the module pipes BaseURLs directly into the container via echo "..." | docker run -i .... This avoids the file-mount round-trip and makes per-URL invocation cleaner.

Per-URL parallel containers: instead of one big hakrawler invocation against all BaseURLs, the module spins up Parallelism separate Docker containers — each crawls one BaseURL — and runs them concurrently. This is faster than a single in-process hakrawler crawl because it lets Docker's kernel-level parallelism scale across CPU cores, and each container has its own goroutine pool. The trade-off is container-startup overhead per URL (small, ~200ms each).
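
A minimal sketch of the fan-out (flag set abbreviated to the depth, TLS, and subdomain options; the full option list is in the table below):

import subprocess
from concurrent.futures import ThreadPoolExecutor

def crawl_one(base_url, depth=2, timeout=30):
    """Pipe one BaseURL into a hakrawler container via stdin and collect discovered URLs."""
    cmd = ["docker", "run", "--rm", "-i", "jauderho/hakrawler:latest",
           "-d", str(depth), "-insecure", "-subs"]
    result = subprocess.run(cmd, input=base_url + "\n", capture_output=True,
                            text=True, timeout=timeout + 30)
    return [line for line in result.stdout.splitlines() if line.strip()]

def crawl_all(base_urls, parallelism=4):
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(crawl_one, base_urls))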

Crawl Depth controls link-follow recursion: depth=2 (the default) means hakrawler crawls the start URL, then every URL discovered there, then stops. Higher depths exponentially increase crawl time and rarely find more attack surface than depth 2-3 on typical web apps.

Include Subdomains toggle: when on, hakrawler will follow links to subdomains of the apex (e.g. crawling app.example.com discovers a link to api.example.com and follows it). When off, it stays on the exact host of the start URL. Even when on, the discovered URLs are still scope-filtered before merge — anything outside the project apex set is recorded as ExternalDomain rather than crawled deeper.

Skip TLS Verify: the default is on because targets in scope frequently have self-signed or mis-issued certs (staging hosts, internal CAs). Without this flag, hakrawler would refuse to crawl those targets and you'd silently lose coverage.

Custom Headers: passed via -h flag — used identically to Katana for authenticated crawls.

Auto-disabled in stealth mode: the project-level stealth toggle forces hakrawler off entirely, leaving Katana as the sole active crawler. This halves the request count to the target during stealth scans without significantly cutting coverage.

Why run two crawlers in parallel: Katana and hakrawler use different parsers, different scope-following heuristics, and different priority orderings. On most real-world targets they each find routes the other misses. The merge cost is essentially zero (deduplicated against each other and against Katana's output before graph insert), so unless you're optimizing for stealth, running both is a clear net win.

Parameter Default Description
Enable Hakrawler true Master toggle for Hakrawler crawling
Docker Image jauderho/hakrawler:latest Docker image to use
Crawl Depth 2 How many links deep to follow (1-10)
Threads 5 Concurrent crawling threads
Per-URL Timeout 30 Timeout per URL in seconds
Max URLs 500 Maximum URLs to discover
Include Subdomains true Allow crawler to follow links to subdomains. Results are still scope-filtered
Skip TLS Verify true Skip TLS certificate verification
Custom Headers [] Custom HTTP headers for requests
Parallelism 4 Number of URLs to crawl in parallel Docker containers (1-10)

Stealth mode: Hakrawler is automatically disabled in stealth mode to reduce the active crawling footprint.


JavaScript Analysis (jsluice)

jsluice is a JavaScript analysis tool compiled into the recon container. It downloads JS files discovered by Katana/Hakrawler from the target and analyzes them to extract hidden URLs, API endpoints, and embedded secrets.

Graph nodes — consumes: BaseURL, Endpoint | produces: Endpoint, Parameter, BaseURL, Secret

How it works

jsluice is Bishop Fox's Go-based static-analysis tool, compiled directly into the recon container image (no Docker round-trip per invocation). It uses tree-sitter-javascript to parse JS into an Abstract Syntax Tree, then walks the AST looking for two pattern classes that regex-grep approaches miss:

  1. URL builders — fetch('/api/v1/' + userId), axios.get(`${BASE}/users/${id}`), template-literal patterns, string concatenation patterns. jsluice reconstructs the resulting URL by tracing variable assignments back through the AST. This catches API routes that are dynamically assembled at runtime, which a static regex like https?://[^"']+ would never see.
  2. Secret patterns — AWS access keys, GCP service-account JSON snippets, JWT secrets, generic high-entropy strings, hardcoded API tokens. Each match is annotated with a confidence score based on context (a string starting with AKIA next to aws_secret_access_key is high-confidence, a random 40-char base64 string in isolation is low).

Pipeline: the module first walks the in-progress graph to find every JS-file URL referenced by BaseURLs and Endpoints (anything ending in .js, .mjs, .tsx, or with application/javascript content type). Each URL is downloaded once via urllib.request (no Selenium/headless browser — just the raw script source). The downloads are concurrent up to Concurrency workers and capped at Max Files total.
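
A minimal sketch of that capped, concurrent download step (names are illustrative):

import urllib.request
from concurrent.futures import ThreadPoolExecutor

def download_js(url, timeout=30):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return url, resp.read().decode("utf-8", errors="replace")
    except Exception:
        return url, None

def download_all(js_urls, max_files=50, concurrency=5):
    targets = list(js_urls)[:max_files]              # cap at Max Files
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return {url: body for url, body in pool.map(download_js, targets) if body}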

For each downloaded file, jsluice is run as a subprocess with the file as input. Its stdout is JSON-line: each line is one finding (URL, parameter, or secret). The parser ingests these and merges them into the graph:

  • URLs become Endpoint + BaseURL nodes (if the URL is in-scope) or ExternalDomain nodes (if it's third-party).
  • Parameters become Parameter nodes attached to the discovered Endpoint.
  • Secrets become Secret nodes — these are flagged for the AI agent's review and surface in the Insights dashboard's secrets panel.

No additional crawling: the only network traffic from jsluice is the JS file downloads themselves. All parsing is local to the container. So a jsluice run against 50 JS files is 50 GET requests total — significantly quieter than Katana's depth-3 crawl which can be hundreds of requests per BaseURL.

Why pair jsluice with JS Recon (the deeper module): jsluice is the "fast pass" — it surfaces low-hanging URLs and obvious secrets in seconds. JS Recon (GROUP 5b in the pipeline) does the deeper work: live API validation against 21 services, source-map discovery, dependency-confusion checks, DOM-XSS sink detection, framework version fingerprinting. Jsluice runs in the resource-enumeration phase to catch URLs early so they feed into Nuclei DAST in the same scan; JS Recon runs after and goes deeper.

Parameter Default Description
Enable jsluice true Master toggle for JavaScript analysis
Max Files 50 Maximum number of JS files to analyze
Timeout 120 Overall analysis timeout in seconds
Concurrency 5 Files to process concurrently
Extract URLs true Extract URLs and API endpoints from JS
Extract Secrets true Detect API keys, tokens, and credentials
Parallelism 3 Number of base URLs to analyze in parallel (1-10)

Note: jsluice downloads JS files from the target (HTTP requests) and analyzes them locally. No additional crawling beyond fetching the JS files themselves.


JS Reconnaissance

JS Recon is a deep JavaScript analysis engine that runs after resource enumeration. It downloads discovered JS files and runs parallel analysis modules to extract secrets, endpoints, dependency confusion risks, source map exposures, DOM XSS sinks, and framework fingerprints. Disabled by default.

How it works

JS Recon is the deeper companion to jsluice. Where jsluice does fast pattern matching during resource enumeration, JS Recon waits until all JavaScript files in the graph have been collected (Katana, Hakrawler, jsluice, GAU/Wayback) and then runs seven independent analysis passes in parallel on the unified file set. Each pass is a separate Python module under recon/helpers/js_recon/.

File source aggregation: at run time, JS Recon walks the in-progress graph and collects every JS-flavored URL from BaseURLs and Endpoints. The Include toggles control which classes are pulled:

  • Include Webpack Chunks — .chunk.js, .bundle.js, and [hash].js files that Katana usually deprioritizes (they're hashed and treated as cache-busting noise) but actually contain the bulk of an SPA's logic
  • Include Framework JS — Next.js (/_next/static/chunks/), Nuxt.js (/_nuxt/), Astro (_astro/) framework chunks
  • Include Archived JS — historical JS URLs from GAU/Wayback that may contain secrets that have since been rotated but are still valid in practice

Files are downloaded via urllib.request with Concurrency workers, capped at Max Files total, and time-boxed by the global timeout.

The seven analysis modules (each one its own helper module):

# Module What it does
1 Secret detection (patterns.py, validators.py) Scans every JS file with 90+ regex patterns covering AWS keys, GCP service accounts, GitHub PATs, Slack tokens, Stripe keys, JWT secrets, GitHub Apps, Twilio creds, Mailgun, SendGrid, npm tokens, Heroku, Datadog, etc. Then for any matched secret, runs live API validation against the real provider (https://sts.amazonaws.com/, https://api.github.com/user, etc.) — verified secrets get high confidence, regex-only matches get low. Custom user patterns can be uploaded per-project (additive to defaults)
2 Source-map discovery (sourcemap.py) Three independent strategies: parse //# sourceMappingURL=... comments at the file end; check the response SourceMap: and X-SourceMap: HTTP headers; probe common path patterns ({file}.map, {file.replace('.js', '.js.map')}, /{base}/{filename}.map, etc.). Custom probe paths uploadable per-project. Exposed source maps leak the entire pre-minification source tree
3 Dependency-confusion check (dependency.py) Extracts every import/require reference to scoped packages (@company/internal-utils) and queries https://registry.npmjs.org/<scope>/<name> to check if the public registry has an entry. If the scope is registered to a different owner, an attacker could publish a malicious version with a higher semver and supply-chain the build pipeline
4 Endpoint extraction (endpoints.py) Pattern-matches against extended endpoint signatures: REST routes (/api/v\d+/...), GraphQL endpoints (anything POST-only with graphql in the path), WebSocket URLs (ws://, wss://), router declarations ({ path: '/admin/...' } from React Router/Vue Router/Angular), admin/debug routes (/admin/..., /debug/..., /internal/...). User-uploaded custom keywords extend the default set
5 DOM-XSS sink detection Pattern-matches against 15 dangerous JS sinks: innerHTML, outerHTML, document.write, document.writeln, eval, Function(), setTimeout with string arg, setInterval with string arg, setAttribute('src'/'href', ...), __proto__, Object.assign(window, ...), dynamic import(), etc. The output flags the file + line number for manual review — these sinks aren't always bugs but they're always candidates
6 Framework detection (framework.py) 12 frameworks with version extraction — React, Vue, Angular, Svelte, Next.js, Nuxt.js, Gatsby, Ember, Backbone, Knockout, jQuery, MooTools. Version regexes are tuned per-framework (e.g. React's React.version = '17.0.2' vs Vue's Vue.version = '3.2.45'). Custom framework signatures uploadable as JSON
7 Dev-comment extraction Walks every comment in the JS files matching TODO, FIXME, or HACK markers and keeps those containing sensitive keywords for manual review
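
As an illustration of module 3's registry check, a simplified sketch (here a 404 from the public registry is treated as an unclaimed, squattable name; the owner comparison described above is omitted):

import json
import re
import urllib.error
import urllib.parse
import urllib.request

SCOPED_REF_RE = re.compile(r"['\"](@[\w.-]+/[\w.-]+)['\"]")

def scoped_packages(js_source):
    """Collect @scope/name references from import/require strings in a JS file."""
    return sorted(set(SCOPED_REF_RE.findall(js_source)))

def public_registry_entry(package, timeout=10):
    """Return public npm metadata for a scoped package, or None if no public entry exists."""
    url = "https://registry.npmjs.org/" + urllib.parse.quote(package, safe="@")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None          # nothing published publicly: confusion-claimable name
        raise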

Concurrency model: each of the seven modules runs in its own worker pool concurrently per file, so a 500-file scan with 10 concurrent files runs all seven analyses on each file in parallel. The total wall-clock time is bounded by the slowest module per file (usually the secret-validation pass when it has to make live API calls).

Output: every finding becomes a JsReconFinding node in the graph plus, when applicable, a typed Secret or Endpoint node. The Insights dashboard surfaces them in the JS-Recon panel; the AI agent reads them as context during exploit planning.

Graph nodes -- consumes: BaseURL, Endpoint | produces: JsReconFinding, Secret, Endpoint

Core Settings:

Parameter Default Description
Enable JS Recon false Master toggle for JS Reconnaissance (GROUP 5b)
Max Files 500 Maximum number of JS files to download and analyze
Concurrency 10 Concurrent file download threads
Timeout 900 Overall JS Recon timeout in seconds
Include Framework JS true Include framework-specific chunks (/_next/static/, /_nuxt/)
Include Chunks true Include .chunk.js and .bundle.js files
Include Archived JS true Include JS URLs from GAU/Wayback archive sources

Module Toggles:

Parameter Default Description
Secret Pattern Scanning true Scan JS files against 90+ regex patterns for credentials, tokens, and secrets
Source Map Discovery true Discover exposed .map files via comment parsing, HTTP headers, and path probing
Dependency Confusion Check true Check scoped npm packages against public registry for confusion risks
Endpoint Extraction true Extract REST, GraphQL, WebSocket, router, and admin/debug endpoints
DOM Sink Detection true Detect 15 DOM XSS sink patterns (innerHTML, eval, proto, etc.)
Framework Detection true Identify 12 frameworks with version extraction
Dev Comment Extraction true Extract TODO/FIXME/HACK comments with sensitive keywords

Secret Validation:

Parameter Default Description
Validate Discovered Keys true Test discovered secrets against their service APIs (21 services supported)
Validation Timeout 5 Per-validation request timeout in seconds
Minimum Confidence low Minimum confidence level to keep findings: low, medium, or high

JS File Sources:

Parameter Default Description
Include Webpack Chunks true Analyze .chunk.js and .bundle.js files excluded by Katana
Include Framework JS true Fetch Next.js (/_next/static/chunks/) and Nuxt.js (/_nuxt/) bundles
Include Archived JS true Analyze historical JS files from Wayback Machine/GAU

Custom Extensions (file uploads -- edit mode only):

Parameter Default Format Description
Custom Secret Patterns -- JSON array or TXT (name|regex|severity|confidence per line) Additional regex patterns. JSON schema: [{name, regex, severity?, confidence?}]
Custom Source Map Paths -- TXT (one URL template per line using {url}, {base}, {filename}) Extra paths to probe for .map files
Custom Internal Packages -- TXT (one @scope/name per line) Known internal npm package names to check against public registry
Custom Endpoint Keywords -- TXT (one keyword per line, min 2 chars) Extra keywords to search for in JS content
Custom Framework Signatures -- JSON array [{name, patterns[], version_regex}] Detection signatures for custom frameworks

All custom files have client-side validation before upload. Files are additive and do not replace built-in defaults.

Manual File Upload (edit mode only):

Parameter Default Description
Uploaded JS Files [] JS files for analysis without crawling -- from Burp Suite, mobile APKs, DevTools, or authenticated areas (.js, .mjs, .map, .json, max 10 MB each, multiple files supported)

Note: JS Recon is passive -- it downloads JS files already discovered by crawlers and analyzes them locally. Secret validation sends one minimally-scoped API request per secret with per-service rate limiting (1 req/sec). See JS Reconnaissance for full upload schemas, validation rules, and format examples.


Directory Fuzzer (FFuf)

How it works

FFuf (Fuzz Faster U Fool) is a Go-based directory + content brute-forcer baked into the recon container as a binary. The module runs it once per BaseURL with a generated wordlist + matcher/filter set, then ingests the JSON output for graph merge.

The discovery problem FFuf solves: crawlers (Katana, Hakrawler) can only find URLs the application links to. FFuf finds URLs the application responds to but doesn't link to — unlinked admin panels, .git / .env / .bak backups, dotfile leaks (.DS_Store, .svn/entries), debug consoles, undocumented API routes, common CMS paths (wp-admin/, phpmyadmin/), and version-control artifacts.

Wordlist source: the module ships a default wordlist (curated paths with the highest hit-rate per request) and supports per-project user-uploaded wordlists. Uploaded wordlists are stored under the project ID and mounted into the recon container at scan time, so you can load engagement-specific wordlists (e.g. tech-stack-specific lists for known PHP/Java/Node targets).

Smart Fuzzing toggle: when on, FFuf only fuzzes paths under directories already discovered by Katana/Hakrawler — instead of fuzzing the root /, it fuzzes /api/, /admin/, /static/, etc. that crawlers found. This focuses the request budget where it pays off: a typical web app has 5-10 discovered directories, so a 5,000-word list yields 25k-50k requests spread across real application paths rather than 5k requests against a single root that returns 404 for every word.

Filter and matcher engines: ffuf supports filtering on response status (-fc), size (-fs), word count (-fw), line count (-fl), regex (-fr), and time (-ft). The flip side is matchers (-mc, -ms, -mw, -ml, -mr). The default project policy uses status-code matchers — only keep responses with codes in the configured allowlist (typically 200,201,301,302,401,403). Tweak the size filter when the target returns a custom error page of a fixed size (helps cut false positives).
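
A minimal sketch of how those matcher/filter flags map onto an ffuf invocation (argument assembly only; paths, recursion, and output handling omitted):

def build_ffuf_args(base_url, wordlist, match_codes, filter_codes=None,
                    filter_size=None, rate=0, threads=40):
    args = ["ffuf", "-u", f"{base_url.rstrip('/')}/FUZZ", "-w", wordlist,
            "-mc", ",".join(str(c) for c in match_codes),     # status-code allowlist
            "-t", str(threads), "-of", "json", "-o", "ffuf_output.json"]
    if filter_codes:
        args += ["-fc", ",".join(str(c) for c in filter_codes)]
    if filter_size:
        args += ["-fs", str(filter_size)]      # drop fixed-size custom error pages
    if rate:
        args += ["-rate", str(rate)]           # 0 = unlimited, so only pass when set
    return args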

Recursion: FFuf supports recursive fuzzing of discovered directories. The default depth is conservative (1-2 levels) because depth=3+ exponentially balloons the request count. Increase only when you have specific reason to believe deep nested paths exist.

Custom HTTP method, headers, and POST body: configurable for testing API endpoints that require POST/PUT/DELETE or auth. The headers field is the same place you set Authorization: / Cookie: for authenticated fuzzing.

Per-target rate limiting: the rate-limit setting is enforced via ffuf's built-in -rate flag (req/sec). Lower values keep the scan stealthy and avoid tripping WAFs that throttle at high rates.

FFuf (Fuzz Faster U Fool) brute-forces directory and endpoint paths using wordlists to discover hidden content that crawlers cannot find — admin panels, backup files, configuration pages, and undocumented APIs. Runs after jsluice and before Kiterunner in the pipeline. Disabled by default.

Graph nodes — consumes: BaseURL, Endpoint | produces: Endpoint, BaseURL

Parameter Default Description
Enable FFuf false Master toggle for directory fuzzing
Wordlist common.txt SecLists wordlist: common.txt, raft-medium-directories.txt, or directory-list-2.3-small.txt. Custom uploaded wordlists also appear here
Threads 40 Concurrent fuzzing threads
Rate 0 Requests per second (0 = unlimited). Capped by RoE if active
Timeout 10 Per-request timeout in seconds
Max Time 600 Overall fuzzing timeout in seconds (per target)
Match Codes 200, 201, 204, 301, 302, 307, 308, 401, 403, 405 HTTP status codes to keep
Filter Codes [] HTTP status codes to exclude
Filter Size Response sizes to filter (comma-separated, e.g., 0,4242)
Extensions [] File extensions to append (e.g., .php, .bak, .env)
Recursion false Enable recursive fuzzing into discovered directories
Recursion Depth 2 Maximum recursion depth (1-5)
Auto-Calibrate true Automatically filter false positives
Follow Redirects false Follow HTTP redirects
Custom Headers [] Custom HTTP headers (one per line, Name: Value format)
Smart Fuzz true Fuzz under base paths discovered by crawlers (e.g., /api/v1/FUZZ)
Parallelism 3 Number of targets to fuzz in parallel. Per-target threads are automatically reduced to avoid resource contention (1-10)

Custom Wordlists:

Upload your own .txt wordlists per-project via the FFuf settings UI. Uploaded wordlists appear in the dropdown under "Your custom lists" alongside the built-in SecLists. Maximum file size: 50 MB.

Stealth mode: FFuf is automatically disabled in stealth mode (it is an active brute-force tool).

RoE: When Rules of Engagement are active and FFUF_RATE is 0 (unlimited), it is automatically capped to the RoE max requests per second.


Parameter Discovery (Arjun)

How it works

Arjun (by s0md3v) is a Python-based hidden-parameter discovery tool. The core technique is response-differential analysis: for each endpoint, Arjun sends a baseline request and records the response signature (status code, content length, body hash, response time). Then it injects parameter names from a built-in wordlist (~25,000 common names like debug, admin, access_token, redirect, callback, test, id, uid, userId, email) and watches for any response whose signature differs from the baseline. A difference means the backend read the injected parameter and behaved differently — proof the parameter is recognized even though no link, form, or JS file ever named it.
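
A minimal sketch of the differential idea (one candidate name per request for clarity; Arjun batches names in chunks and also tracks response time):

import hashlib
import urllib.error
import urllib.parse
import urllib.request

def signature(url, timeout=15):
    """Response signature: (status, body length, body hash)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            return (resp.status, len(body), hashlib.md5(body).hexdigest())
    except urllib.error.HTTPError as err:
        body = err.read()
        return (err.code, len(body), hashlib.md5(body).hexdigest())

def probe_params(endpoint, candidate_names, canary="redamon123"):
    baseline = signature(endpoint)
    hits = []
    for name in candidate_names:
        test_url = endpoint + ("&" if "?" in endpoint else "?") + urllib.parse.urlencode({name: canary})
        if signature(test_url) != baseline:    # the backend read the injected parameter
            hits.append(name)
    return hits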

Why this catches "ghost params" no other tool finds: developers leave parameters wired up long after the UI that used them is gone. A debug toggle (?debug=1) added during a 2-week sprint typically survives across years of deployments because nothing forces it to be removed. These are goldmines for:

  • IDOR — a UI says "you can only see your own profile" but the backend still honors ?userId=42 if you remember to send it
  • Mass assignment — a User model accepts a role field that the frontend never sends, but Arjun finds the backend silently honoring it
  • Debug-mode toggles — ?debug=true returns stack traces, ?test=1 skips auth checks, ?admin=1 reveals hidden UI
  • SSRF — an internal ?callback=http://... param the API uses for webhooks but never advertises
  • Open redirects — ?next=, ?return_to=, ?redirect= honored without validation

Multiple HTTP methods in parallel: Arjun tests GET (query string), POST (form-urlencoded), POST (JSON body), and POST (XML body) in parallel. Each method gets its own injection round because backends often accept the same parameter in multiple places (e.g. read from query string in GET, read from JSON body in POST). The output records which method-location combinations work for each found parameter.

Threads + delay: configurable to balance speed vs WAF triggering. Modern WAFs alert on rapid parameter-name fuzzing more than on path fuzzing, so Arjun's defaults are deliberately more conservative than FFuf's.

Output merging: each found parameter becomes a Parameter node MERGE-deduplicated against those already discovered by Katana (form fields), GAU (archived URLs), Hakrawler (link extraction), and JS Recon (JS endpoint extraction). The combined Parameter set is what Nuclei DAST then fuzzes — having Arjun in the pipeline can increase the parameter count fed to DAST by 2-5x, dramatically improving coverage.

Required input: Arjun runs against existing Endpoints (URLs from Katana/GAU/etc.) plus BaseURLs. If those are empty (no resource enum has run yet), Arjun has no targets and exits without findings.

Arjun discovers hidden HTTP query and body parameters on discovered endpoints by testing ~25,000 common parameter names. It finds debug parameters, admin functionality, and hidden API inputs that aren't visible in HTML forms or JavaScript. Runs after FFuf in the pipeline, testing endpoints already discovered by crawlers and fuzzers. Disabled by default.

Graph nodes — consumes: BaseURL, Endpoint | produces: Parameter

Setting Default Description
Enable Arjun false Master toggle for parameter discovery
HTTP Methods GET Methods to test: GET (query params), POST (form body), JSON (JSON body), XML (XML body). Multiple methods run in parallel.
Max Endpoints 50 Maximum number of discovered endpoints to test. API and dynamic endpoints are prioritized over static ones.
Threads 2 Concurrent parameter testing threads per Arjun process
Request Timeout 15s Per-request timeout
Scan Timeout 600s Overall scan timeout per method
Chunk Size 500 Number of parameters tested per request batch. Lower values increase accuracy but make more requests.
Rate Limit 0 Max requests per second (0 = unlimited)
Stable Mode false Add random delays between requests to avoid WAF detection. Forces threads to 1 internally.
Passive Mode false Use CommonCrawl, OTX, and WaybackMachine only — no active requests to target
Disable Redirects false Do not follow HTTP redirects during parameter testing
Custom Headers [] Custom HTTP headers (e.g., auth tokens) added to every request

Stealth mode: Arjun is automatically switched to passive mode in stealth mode (queries archives only, sends no requests to the target).

RoE: When Rules of Engagement are active and ARJUN_RATE_LIMIT is 0 (unlimited), it is automatically capped to the RoE max requests per second.


GraphQL Security Testing

Dedicated GraphQL security testing module that discovers GraphQL endpoints, tests for exposed introspection, extracts the schema, flags sensitive fields, and (optionally) runs the external graphql-cop Docker container for 12 additional misconfiguration checks (alias overloading, batch query DoS, GraphiQL detection, trace mode, CSRF variants, etc.). Runs in parallel with Nuclei — both scanners read BaseURL/Endpoint/Technology and write Vulnerability nodes, but have zero data dependency on each other. Disabled by default.

How it works

GraphQL is fundamentally different from REST: a single endpoint accepts a query language that lets the client describe the exact shape of data it wants. This makes GraphQL much more expressive but introduces a unique attack surface: introspection (the schema-discovery feature) leaks the entire API surface; alias overloading lets an attacker bundle many queries in one request for DoS amplification; batching lets a single HTTP request exfiltrate hundreds of records; and the per-field resolver model means that authorization bugs are usually field-level and trivially bypassable.

Five-source endpoint discovery: the scanner doesn't blindly probe /graphql — it builds an evidence-weighted candidate list:

Source Signal
User-specified Custom Endpoints field in the project form — explicit operator input, highest priority
HTTP probe Endpoints with application/graphql Content-Type or GraphQL indicators in httpx response bodies
Resource enum Endpoints from Katana/Hakrawler/FFuf/GAU whose path contains graphql/gql/query (POST only — GET on these paths usually means a query string with the GraphQL document) — plus endpoints with query, mutation, variables, or operationName parameters
JS Recon Findings of type graphql or graphql_introspection extracted from JS analysis
Pattern probing Common GraphQL paths appended to every BaseURL (/graphql, /api/graphql, /v1/graphql, /v2/graphql). Secondary paths (/query, /api/query, /gql, /api/gql, /graphiql, /api/graphiql, /playground, /api/playground) only on BaseURLs that already showed GraphQL evidence

The evidence-gating on secondary paths matters because blind-probing every BaseURL with 12 path variants generates a lot of 404s; gating to "only probe further on hosts that look graphql-y" cuts noise sharply.

Three-stage probe per endpoint: for each candidate, the scanner sends a sequence:

  1. Sanity probe — { __typename }. Cheapest possible valid GraphQL query. If this fails entirely (timeout / connection refused / 500), the endpoint is marked unreachable and the deeper tests skip it. If it succeeds, the endpoint is confirmed as a GraphQL server.
  2. Simple introspection — { __schema { types { name } } }. Quick check whether introspection is enabled at all. If the response is errors: [{message: "GraphQL introspection is not allowed"}], the scanner stops here and records the endpoint as introspection-disabled.
  3. Deep introspection — full IntrospectionQuery recursing through every type, field, argument, and directive (with the Introspection Depth Limit controlling how many levels of TypeRef fragment recursion it requests). This returns the full schema, which the scanner then walks to count operations, hash for change-detection, and pattern-match field names against the sensitive-field regex (password|token|secret|key|ssn|credit|cvv|pin|apikey|api_key|...). A minimal sketch of the first two probe stages follows this list.
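
A minimal sketch of the first two probe stages using requests. The function name and return shape are illustrative assumptions; the two query strings are the standard probes named above.

```python
import requests

SANITY = {"query": "{ __typename }"}
SIMPLE_INTROSPECTION = {"query": "{ __schema { types { name } } }"}

def probe_graphql(endpoint, headers=None, timeout=30, verify=True):
    """Stage 1 (sanity) and stage 2 (simple introspection) of the probe sequence."""
    try:
        r = requests.post(endpoint, json=SANITY, headers=headers,
                          timeout=timeout, verify=verify)
        body = r.json()
    except (requests.RequestException, ValueError):
        return {"reachable": False}
    if "data" not in body and "errors" not in body:
        return {"reachable": True, "graphql": False}   # responded, but not a GraphQL server
    try:
        r2 = requests.post(endpoint, json=SIMPLE_INTROSPECTION, headers=headers,
                           timeout=timeout, verify=verify)
        schema = (r2.json().get("data") or {}).get("__schema")
    except (requests.RequestException, ValueError):
        schema = None
    return {"reachable": True, "graphql": True, "introspection_enabled": bool(schema)}
```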

Auth injection: the auth subsystem supports bearer, cookie, basic, header, and apikey modes. Each emits a different header pattern (Authorization: Bearer <value>, Authorization: Basic <base64>, custom name, etc.). Auth values are masked in logs ([masked]). Used by both the native scanner and the graphql-cop sidecar (forwarded via -H JSON header arg).

graphql-cop sidecar (when enabled): runs dolevf/graphql-cop:1.14 in Docker-in-Docker against each discovered endpoint. graphql-cop is the canonical GraphQL misconfig scanner with 12 tests covering alias overloading (DoS), batch-query DoS, directive overloading, circular query introspection, GraphiQL/Playground IDE exposure, GET-method support (CSRF surface), trace mode (Apollo timing leakage), GET-based mutations (full CSRF), POST url-encoded CSRF, field suggestions, unhandled-error stack traces, and a redundant introspection probe (off by default to dedupe with the native test).

The reason for pinning to 1.14: v1.15+ added an -e flag for excluding tests, but the image carrying that flag has not yet been published to DockerHub. Per-test exclusions are therefore applied post-execution in Python by filtering the JSON output — heavier-traffic DoS tests (alias/batch/directive/circular) still hit the target if the master toggle is on, but their findings are filtered out before merge. For genuine stealth, use the master Enable graphql-cop toggle.

Endpoint capability flags: beyond emitting Vulnerability nodes for positive findings, graphql-cop sets boolean flags directly on the Endpoint node — graphql_graphiql_exposed, graphql_tracing_enabled, graphql_get_allowed, graphql_field_suggestions_enabled, graphql_batching_enabled, graphql_cop_ran. These flags are recorded even when the test result is negative (e.g. graphql_graphiql_exposed: false), so the AI agent can reason about confirmed-safe vs unprobed.

RoE filtering: discovered endpoints are filtered against ROE_EXCLUDED_HOSTS (with *.example.com wildcards supported) before testing. Out-of-scope endpoints are skipped and counted in the endpoints_skipped summary field.

Stealth mode overrides: rate limit forced to 2 req/s, concurrency forced to 1 (sequential), per-request timeout extended to 60s, and the four DoS-class graphql-cop tests forced off. The native introspection probe still runs because it's passive (a single read query).

Graph nodes — consumes: BaseURL, Endpoint, Domain, Technology | produces: Endpoint (with GraphQL capability flags), Vulnerability, CVE

Core Settings:

Parameter Default Description
Enable GraphQL Security false Master toggle for the GraphQL security scanner (GROUP 6 Phase A)
Introspection Test true Probe each candidate endpoint for exposed introspection (__schema, __type). When enabled, extracts the full schema, counts queries/mutations/subscriptions, computes a schema hash, and flags sensitive fields (password, token, secret, key, ssn, credit, cvv, etc.)
Request Timeout 30 Per-request timeout in seconds (clamped 1-600). Applies to the initial { __typename } probe, the simple introspection query, and the deep introspection query
Rate Limit 10 Maximum requests per second across all endpoints (clamped 0-100, 0 = unlimited). Enforced globally — delay = 1/rate_limit between submissions
Concurrency 5 Parallel endpoint-testing threads (clamped 1-20, auto-reduced when fewer endpoints than threads). Endpoints are tested via ThreadPoolExecutor; 1 forces sequential mode
Introspection Depth Limit 10 Recursion depth for the TypeRef fragment in the full introspection query (clamped 1-20). Higher values extract more info on deeply-wrapped types (NON_NULL → LIST → NON_NULL → NAMED). Lower values avoid server-side query rejection on limit-aware GraphQL engines
Retry Count 3 HTTP retry attempts on transient failures (clamped 0-10). Targets 429, 500, 502, 503, 504 and connection-level errors
Retry Backoff 2.0 Base backoff factor in seconds between retries (clamped 0-10). Uses exponential backoff via urllib3 Retry(backoff_factor=)
Verify SSL true Verify TLS certificates on all GraphQL probes. Disable to test endpoints with self-signed or untrusted certificates
Custom Endpoints Comma-separated GraphQL endpoint URLs to test explicitly, in addition to auto-discovered ones (e.g. https://api.example.com/graphql,https://app.example.com/v1/query)

Endpoint Discovery:

The module auto-discovers GraphQL endpoints from five sources, deduplicated and sorted:

Source How It Discovers
User-specified Values in the Custom Endpoints setting
HTTP probe Endpoints with application/graphql Content-Type or GraphQL indicators in response
Resource enum Katana/Hakrawler/FFuf/GAU endpoints whose path contains graphql, gql, or query (POST only) — plus endpoints with query, mutation, variables, or operationName parameters
JS Recon Findings of type graphql or graphql_introspection extracted from JavaScript analysis
Pattern probing Appends common GraphQL paths to every discovered base URL: /graphql, /api/graphql, /v1/graphql, /v2/graphql. Secondary patterns (/query, /api/query, /gql, /api/gql, /graphiql, /api/graphiql, /playground, /api/playground) are tested only on base URLs that already show GraphQL evidence elsewhere

Authentication:

When Auth Type is set, the scanner attaches auth headers to every introspection probe and to graphql-cop (via -H JSON headers). Authentication values are masked in logs.

Parameter Default Description
Auth Type One of: bearer, cookie, header, basic, apikey (case-insensitive). Empty = no auth
Auth Value The token, cookie string, raw header value, or username:password pair (basic)
Auth Header Name Custom header name used when Auth Type is header (defaults to X-Auth-Token) or apikey (defaults to X-API-Key)

Auth type behavior:

Type Emitted Header
bearer Authorization: Bearer <value>
cookie Cookie: <value>
basic Authorization: Basic <base64(username:password)>
header <Auth Header Name>: <value>
apikey <Auth Header Name or X-API-Key>: <value>

graphql-cop External Scanner (opt-in Docker-in-Docker):

An optional Phase 2 scanner that wraps dolevf/graphql-cop:1.14 and runs 12 additional misconfiguration checks per endpoint. Automatically skipped when disabled or when all 12 per-test toggles are off. Uses Docker-in-Docker — requires the Docker socket to be mounted to the recon container.

Parameter Default Description
Enable graphql-cop false Master toggle for graphql-cop (opt-in — requires Docker socket access)
Docker Image dolevf/graphql-cop:1.14 Docker image to execute. Pinned to 1.14 because the -e exclusion flag (v1.15+) is not yet on DockerHub — per-test exclusions are applied Python-side
Timeout 120 Seconds per endpoint before the container is killed (subprocess.TimeoutExpired)
Force Scan false Pass the -f flag to scan the endpoint even when graphql-cop does not detect it as GraphQL. Useful when the endpoint returns non-standard errors or custom wrappers
Debug Mode false Pass the -d flag to add X-GraphQL-Cop-Test header to every request for correlation with target logs

Network mode: graphql-cop uses the default Docker bridge network. When Use Tor is enabled at the project level, the container is started with --network host and passed the -T flag to route probes through Tor. When a global HTTP_PROXY is set, it is forwarded via -x.

Heavy-traffic tests: alias_overloading, batch_query, directive_overloading, and circular_query_introspection send DoS-class probes. In stealth mode the four DoS toggles are automatically forced to false. Because graphql-cop 1.14 doesn't honor -e, those probes still hit the target if the master toggle is on — use the master Enable graphql-cop toggle for true stealth.

Per-Test Toggles (12 tests — all run by default except introspection):

Each toggle below enables/disables one graphql-cop test. Exclusions are applied post-execution because the v1.14 image ignores the -e flag.

Parameter Default Severity Description
Field Suggestions true info Detects "Did you mean..." schema leakage that bypasses introspection-disabled defences
Introspection (cop) false high Secondary introspection probe — disabled by default to deduplicate with the native introspection test above
GraphiQL IDE Exposed true medium Detects exposed GraphiQL / GraphQL Playground / Apollo Studio IDE pages
GET Method Support true medium Endpoint accepts queries via HTTP GET (enables cache poisoning + CSRF)
Alias Overloading true low Tests server tolerance of aliased-field DoS. DoS — disabled in stealth mode
Array-based Query Batching true low Tests array-batched query DoS amplification. DoS — disabled in stealth mode
Trace Mode true info Apollo tracing extension exposes query timings (schema and resolver info leak)
Directive Overloading true low Tests server tolerance of repeated directives on a single field. DoS — disabled in stealth mode
Circular Introspection true low Recursive introspection query causing exponential parse cost. DoS — disabled in stealth mode
GET-based Mutation true high Mutation allowed over GET (full CSRF surface)
POST url-encoded CSRF true medium Mutation accepts application/x-www-form-urlencoded (cross-origin CSRF possible)
Unhandled Error Detection true info Endpoint leaks stack traces / internal error paths on malformed queries

Endpoint Capability Flags:

Beyond creating Vulnerability nodes on positive findings, graphql-cop also sets these boolean flags directly on the GraphQL Endpoint node — even when a test returned negative (e.g. "GraphiQL exposed: false" is recorded explicitly):

Flag Set By Meaning
graphql_graphiql_exposed detect_graphiql IDE page served at the endpoint
graphql_tracing_enabled trace_mode Apollo tracing extension returns timing data
graphql_get_allowed get_method_support Endpoint accepts GET queries
graphql_field_suggestions_enabled field_suggestions "Did you mean..." responses enabled
graphql_batching_enabled batch_query Server responds to array-batched requests
graphql_cop_ran Set to true after graphql-cop completes

Output:

Per tested endpoint, results are stored under combined_result.graphql_scan.endpoints[endpoint]:

  • introspection_enabled, schema_extracted — booleans
  • queries_count, mutations_count, subscriptions_count — operation counts
  • schema_hash — 16-char SHA256 prefix for change detection
  • operations.{queries,mutations,subscriptions} — lists of operation names
  • error — last error message if tests failed
  • All graphql-cop endpoint-flag booleans from the table above

Scan-wide summary at combined_result.graphql_scan.summary: endpoints_discovered, endpoints_tested, endpoints_skipped (RoE excluded), introspection_enabled, vulnerabilities_found, by_severity.{critical,high,medium,low,info}.

Rules of Engagement: Discovered endpoints are filtered by ROE_EXCLUDED_HOSTS (supports *.example.com wildcards) before testing. Out-of-scope endpoints are skipped and counted in endpoints_skipped.

Stealth mode overrides: GRAPHQL_RATE_LIMIT=2, GRAPHQL_CONCURRENCY=1 (sequential only), GRAPHQL_TIMEOUT=60, and the four DoS-class graphql-cop tests (alias/batch/directive/circular) forced to false. The native introspection test still runs because it is passive.

Partial recon: GraphQL scanning is available as a Partial Recon tool. The modal accepts custom URLs (validated against project scope) that are injected via GRAPHQL_ENDPOINTS and expanded by the same discovery pipeline. GRAPHQL_SECURITY_ENABLED is force-set to true for partial runs regardless of the project toggle. See Recon Pipeline Workflow — Partial Recon.


Subdomain Takeover Detection

Layered takeover scanner that stacks three independent engines against dangling DNS records and orphaned SaaS targets: Subjack (Apache-2.0 Go binary, DNS-first fingerprints), Nuclei takeover templates (-t http/takeovers/ -t dns/, HTTP-fingerprint coverage), and the BadDNS sidecar (AGPL-3.0, opt-in, Docker-in-Docker isolated image with 10 addressable modules covering CNAME/NS/MX/TXT/SPF/DMARC/wildcard/NSEC/zone-transfer/references). Findings are deduplicated across tools on (hostname, provider, method), scored 0-100, mapped to a confirmed / likely / manual_review verdict, and emitted as Vulnerability nodes with source="takeover_scan". Runs in parallel with Nuclei and GraphQL. Disabled by default. See the dedicated Subdomain Takeover Detection page for the full design and scoring rules.

How it works

A subdomain takeover happens when a DNS record (CNAME, NS, MX, A, etc.) points to an external service that the target no longer owns — e.g. a CNAME to oldproject.github.io whose GitHub Pages site has been deleted, leaving the namespace claimable by anyone. The result: an attacker registers the namespace and serves arbitrary content from a hostname under the target's apex.

This module layers three independent detection engines because no single engine catches everything: Subjack is fast and DNS-first but misses pure-HTTP fingerprints; Nuclei templates catch HTTP-fingerprint cases but require alive URLs; BadDNS catches advanced cases (NSEC walking, zone transfers, reference loops) but is heavy and AGPL-isolated.

Scoring algorithm (additive, clamped 0-100):

Signal Weight
Confirmed by 2+ tools +30
Subjack reports confirmed +25
Provider is in the auto-exploitable list +20
Nuclei takeover template matches +15
Detection method is cname (most reliable) +10
Detection method is stale_a or mx (probabilistic, needs human verification) -15
Provider is unknown / not in the lookup table -10

Verdicts: score >= threshold + 10 becomes confirmed; score >= threshold becomes likely; everything else is manual_review. Default threshold is 60, so confirmed >= 70, likely 60-69, manual review <60.
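
In code, the scoring and verdict logic reduces to a few additive checks. The sketch below follows the documented weights and thresholds; the finding-dict field names are illustrative assumptions.

```python
AUTO_EXPLOITABLE = {"github-pages", "heroku", "aws-s3", "shopify", "fastly", "ghost",
                    "unbounce", "readthedocs", "surge", "webflow", "tumblr", "statuspage"}

def score_finding(f, threshold=60):
    """Additive confidence score and verdict, following the documented weights."""
    score = 0
    if f.get("tool_count", 0) >= 2:
        score += 30                        # confirmed by 2+ tools
    if f.get("subjack_confirmed"):
        score += 25
    if f.get("provider") in AUTO_EXPLOITABLE:
        score += 20
    if f.get("nuclei_match"):
        score += 15
    if f.get("method") == "cname":
        score += 10
    if f.get("method") in ("stale_a", "mx"):
        score -= 15                        # probabilistic, needs human verification
    if not f.get("provider"):
        score -= 10                        # provider unknown / not in lookup table
    score = max(0, min(100, score))        # clamp 0-100
    if score >= threshold + 10:
        verdict = "confirmed"
    elif score >= threshold:
        verdict = "likely"
    else:
        verdict = "manual_review"
    return score, verdict
```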

Auto-exploitable providers (single-step claim, +20 confidence bonus): GitHub Pages, Heroku, AWS S3, Shopify, Fastly, Ghost, Unbounce, ReadTheDocs, Surge, Webflow, Tumblr, Statuspage. The full fingerprint table covers ~40 service signals plus ~30 CNAME patterns and is in recon/helpers/takeover_helpers.py::PROVIDER_FROM_SIGNAL.

Subjack engine: native Go binary baked into the recon image. DNS-first — for every Subdomain in the graph, queries CNAME/NS/MX records and matches the targets against the takeover-prone provider list. Fast (~100s of subdomains per minute) and produces low false positives. Optional flags:

  • -ssl / Force HTTPS — probe over HTTPS (default off — most takeovers are detectable on HTTP)
  • -a / Test Every URL — probe every subdomain, not just CNAME-bearing ones (slower, more thorough)
  • -ns / Check NS Takeovers — detects expired NS delegations and dangling cloud-DNS zones (e.g. abandoned Route53 hosted zones)
  • -ar / Check Stale A Records — flags A records pointing to dead cloud IPs (probabilistic — high false-positive rate, requires human verification, scoring penalty applied automatically)
  • -mail / Check SPF/MX Takeovers — audits SPF includes and MX records for dead infrastructure references

Nuclei takeover templates engine: invokes the same nuclei binary as the main vuln scan but with two specific template directories: http/takeovers/ (HTTP-response-body fingerprints for ~50 SaaS providers — "There isn't a GitHub Pages site here" page, Heroku's "No such app" page, etc.) and dns/ (DNS-response-pattern templates). Targets are the alive URLs from httpx, so this layer only fires on hosts that respond to HTTP. Severity filter defaults to critical, high, medium. Has its own rate limit (default 50 req/s) independent of the main vuln scan. Interactsh is always off here since takeover templates don't need OOB.

BadDNS sidecar (opt-in): the 10-module BadDNS toolkit runs in an isolated Docker container (built once via docker compose --profile tools build baddns-scanner). Each module checks a different DNS record class:

Module Catches
cname Standard CNAME-to-dead-provider takeovers
ns Dangling NS delegations (whole-zone takeover potential)
mx MX records pointing to dead mail providers
txt Dangling references in TXT records (verification tokens for services no longer used)
spf SPF includes pointing to dead infrastructure (email spoofing surface)
dmarc DMARC misconfig + reporting-address takeover
wildcard Wildcard-record interaction with takeovers
nsec NSEC zone walking (only opt-in — slow on large zones)
zonetransfer AXFR allowed (full zone disclosure)
references Cross-zone reference loops

Default module set is cname, ns, mx, txt, spf — the others are opt-in because they can be slow or noisy. Custom DNS resolvers can be configured to bypass the system resolvers if needed (useful in environments where the local resolver is rate-limited).

Deduplication and rescan idempotence: findings from all three engines get a deterministic ID hash from (hostname, provider, method) so re-running the scan converges on the same Vulnerability node instead of creating duplicates. When a finding is found by 2+ tools, the +30 confirmation bonus pushes it from likely to confirmed automatically.
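
A minimal sketch of the deterministic ID, assuming a SHA-256 over the hostname|provider|method triple; the exact hash function and prefix length used by the real helper are assumptions.

```python
import hashlib

def takeover_finding_id(hostname, provider, method):
    """Deterministic Vulnerability id so rescans converge on the same node
    (illustrative -- the real helper may hash or format differently)."""
    raw = f"{hostname.lower()}|{(provider or 'unknown').lower()}|{method.lower()}"
    return "takeover_" + hashlib.sha256(raw.encode()).hexdigest()[:16]
```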

Graph nodes — consumes: Domain, Subdomain, DNSRecord, BaseURL (alive URLs) | produces: Vulnerability (source="takeover_scan", type="subdomain_takeover")

Master toggle:

Parameter Default Description
Enable Subdomain Takeover false Master toggle. When off the whole module is skipped and output contains { "skipped_reason": "disabled" }

Subjack (Apache-2.0, native Go binary baked into the recon image):

Parameter Default Description
Enable Subjack true Enable the DNS-first Subjack layer. Requires the master toggle on
Threads 10 Concurrent Subjack workers (-t, clamped 1-100)
Request Timeout 30 Per-request connection timeout in seconds (-timeout)
Force HTTPS true Probe over HTTPS (-ssl). Improves accuracy against HTTPS-only SaaS providers
Test Every URL false Probe every subdomain, not just CNAME-bearing ones (-a). Slower but more thorough
Check NS Takeovers false Detect expired nameserver delegations and dangling cloud DNS zones (-ns)
Check Stale A Records false Flag A records pointing to dead cloud IPs (-ar). Probabilistic, requires human verification
Check SPF/MX Takeovers false Audit SPF includes and MX records for dead infrastructure references (-mail)
Subjack Run Timeout 900 Overall hard cap on the Subjack subprocess in seconds (minimum 60)

Nuclei Takeover Templates (HTTP fingerprint layer):

Parameter Default Description
Enable Nuclei Takeovers true Enable the Nuclei takeover layer. Targets are the alive URLs from httpx
Nuclei Takeover Run Timeout 1800 Overall hard cap on the Nuclei takeover subprocess in seconds
Severity Filter critical, high, medium Severity filter passed to Nuclei. Defaults to the three action-worthy levels
Rate Limit 50 Nuclei req/s rate limit for this layer only (does not affect the main vuln scan)

Shared from the main Nuclei block: NUCLEI_BULK_SIZE, NUCLEI_CONCURRENCY, NUCLEI_TIMEOUT, NUCLEI_RETRIES, NUCLEI_SYSTEM_RESOLVERS, NUCLEI_FOLLOW_REDIRECTS, NUCLEI_MAX_REDIRECTS, NUCLEI_DOCKER_IMAGE. Global NUCLEI_EXCLUDE_TAGS is not inherited here (would drop the takeover tag and neuter the whole layer). Interactsh is always off for this layer since takeover templates do not need OOB interactions.

Scoring & Verdicts:

Parameter Default Description
Confidence Threshold 60 Minimum score for likely; threshold + 10 for confirmed (clamped 0-100)
Auto-publish Manual Review false Promote manual_review findings from severity=info to severity=medium so they appear in the main findings table instead of the review queue

Scoring is additive: +30 (confirmed by 2+ tools), +25 (Subjack confirmed), +20 (provider in auto-exploitable list), +15 (Nuclei template match), +10 (method=cname), -15 (method=stale_a/mx), -10 (provider unknown). Verdicts: >= threshold+10 -> confirmed; >= threshold -> likely; otherwise manual_review.

BadDNS (AGPL-3.0 isolated Docker-in-Docker sidecar, opt-in):

Parameter Default Description
Enable BadDNS false Opt-in AGPL sidecar. Requires docker compose --profile tools build baddns-scanner once before the first run
Docker Image redamon-baddns:latest Sidecar image tag. Override only when testing a non-default build
Modules cname, ns, mx, txt, spf Active module set. Full addressable list: cname, ns, mx, txt, spf, dmarc, wildcard, nsec, references, zonetransfer. nsec and zonetransfer are opt-in because they can be slow on large targets
Nameservers [] Optional custom DNS resolvers. Empty = system resolvers
BadDNS Run Timeout 1800 Overall hard cap on the baddns subprocess in seconds. Orphan containers are reaped via docker kill <container_name> on timeout

Auto-exploitable providers (single-step claim, +20 confidence bonus): github-pages, heroku, aws-s3, shopify, fastly, ghost, unbounce, readthedocs, surge, webflow, tumblr, statuspage. Full fingerprint table lives in recon/helpers/takeover_helpers.py::PROVIDER_FROM_SIGNAL and covers ~40 signals plus ~30 CNAME patterns.

Stealth mode overrides: NUCLEI_TAKEOVERS_ENABLED=false, BADDNS_ENABLED=false, SUBJACK_ALL=false, SUBJACK_CHECK_NS=true, SUBJACK_CHECK_MAIL=true (both DNS-only and safe at low concurrency), SUBJACK_THREADS=3, TAKEOVER_RATE_LIMIT=10. Subjack stays on in DNS-only mode because CNAME/NS/MX resolution does not generate HTTP traffic to the target.

Partial recon: Subdomain Takeover is a Partial Recon tool. The modal accepts custom subdomains (validated against project scope -- entry must equal the apex or end with .<apex>). User-provided dangling subdomains with no A/AAAA are still scanned because they are the prime takeover candidates. SUBDOMAIN_TAKEOVER_ENABLED is force-set to true for partial runs. Rescans converge on the same Vulnerability.id (deterministic hash of hostname|provider|method) instead of duplicating. See Recon Pipeline Workflow -- Partial Recon.


VHost & SNI Enumeration

Discovers hidden virtual hosts on every target IP by sending two crafted curl probes per candidate hostname. The L7 probe overrides the HTTP Host: header to catch classic Apache/Nginx vhosts that route on the application layer. The L4 probe uses curl --resolve to force the TLS SNI value to the candidate hostname, catching modern reverse proxies (k8s ingress, Traefik, NGINX-ingress, Cloudflare, AWS ALB) that route at the TLS handshake before any HTTP is parsed. Each response is compared to a baseline (raw IP request, no Host override) and anomalies are emitted as Vulnerability nodes with source="vhost_sni_enum". When L7 and L4 disagree on the same hostname, the finding is escalated to host_header_bypass (high severity) — a routing inconsistency that can bypass edge controls. Runs in parallel with Nuclei, GraphQL, and Subdomain Takeover. Disabled by default. See the dedicated VHost & SNI Enumeration page for the full design, candidate-source priority, and severity rules.

How it works

The fundamental insight: many web servers serve different content for https://example.com than they do for https://198.51.100.42 even though those resolve to the same IP. The selection mechanism is one of two things — the HTTP Host: header (L7 routing — Apache, nginx, classic vhosts) or the TLS Server Name Indication value sent during the handshake (L4 routing — most modern reverse proxies, k8s ingress, Cloudflare, AWS ALB). This module probes both layers independently to enumerate the full set of hostnames a target IP serves.

Per-candidate, per-IP probe sequence:

  1. Baseline — curl https://<IP>/ with no Host override and no SNI hint. Records status code, body size, response hash. This is what an unhinted attacker sees.
  2. L7 probe — curl -H "Host: <candidate>" https://<IP>/. Sends the configured candidate hostname as the HTTP Host header but keeps the TLS SNI as the IP. If the response differs meaningfully from the baseline, the IP is serving a vhost for that candidate at the application layer.
  3. L4 probe — curl --resolve <candidate>:<port>:<IP> https://<candidate>/. Forces curl to send the candidate as both the SNI value and the Host header, but pins the resolution to the target IP. If the response differs from the baseline, the IP is serving a vhost selected at the TLS layer (the candidate's certificate is presented even though the connection went to the IP). A minimal sketch of this probe trio follows the list.
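
A minimal sketch of the probe trio using curl via subprocess.run (shell=False, matching the safety note later in this section). The -w write-out format captures status code and body size; the real module also hashes the body and handles plain-HTTP ports. Function names here are illustrative.

```python
import subprocess

WRITE_OUT = "%{http_code} %{size_download}"

def curl_probe(args, connect_timeout=3):
    """Run one curl probe (shell=False) and return (status_code, body_size)."""
    cmd = ["curl", "-sk", "-o", "/dev/null", "-w", WRITE_OUT,
           "--connect-timeout", str(connect_timeout),
           "--max-time", str(connect_timeout * 3)] + args
    out = subprocess.run(cmd, capture_output=True, text=True, shell=False)
    parts = out.stdout.split()
    return (int(parts[0]), int(float(parts[1]))) if len(parts) == 2 else (0, 0)

def probe_candidate(ip, port, candidate):
    baseline = curl_probe([f"https://{ip}:{port}/"])                        # no Host, no SNI hint
    l7 = curl_probe(["-H", f"Host: {candidate}", f"https://{ip}:{port}/"])  # Host header only
    l4 = curl_probe(["--resolve", f"{candidate}:{port}:{ip}",
                     f"https://{candidate}:{port}/"])                        # SNI + Host, pinned to IP
    return baseline, l7, l4
```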

The host_header_bypass finding (escalated to high severity): when L7 and L4 disagree on the same candidate (i.e. L7 returns one response, L4 returns a different one), it means the target's edge proxy and origin disagree on routing. This is exploitable: a request crafted with mismatched Host: and SNI values can route past edge controls (WAF, auth, geo-fencing) to a different origin. This pattern is common in misconfigured CDN-fronted apps and k8s clusters with ingress mismatches.

Candidate source priority (deduplicated across all sources, capped at Max Candidates Per IP):

Priority Source Why
1 Existing Subdomain nodes Already known to be valid hostnames in scope
2 ExternalDomain nodes Known third-party associations — useful for shared-infrastructure detection
3 TLS SAN list from existing Certificates The cert says the host is valid for these names — high signal
4 CNAME targets resolving to this IP DNS evidence of association
5 Reverse-DNS PTR records The IP claims this hostname
6 Bundled vhost-common.txt wordlist ~2,380 common admin/dev/staging/internal/modern-stack prefixes expanded as {prefix}.{apex}
7 Custom user wordlist Per-project additions in the form

When the candidate count exceeds the per-IP cap, excess entries are dropped deterministically (alphabetic sort) so reruns hit the exact same set — enables idempotent finding deduplication.

Severity model:

  • high — L7 and L4 disagree on the same hostname (proxy bypass primitive)
  • medium — discovered hostname matches an internal-keyword pattern (admin, jenkins, k8s, vault, argocd, phpmyadmin, grafana, kibana, gitlab, internal, …) — internal services exposed via vhost
  • low — status code differs from baseline (some response routing happened, but no obvious internal-keyword signal)
  • info — only body-size differs by more than tolerance (subtle routing, mostly noise)

The internal-keyword matcher uses longest-match-wins with a lexicographic tiebreak, so reruns produce the same severity tag for the same hostname.

Performance & concurrency: per-IP probes run in a ThreadPoolExecutor sized by Concurrency (default 20). Each probe has a configurable connect timeout (default 3s); the total budget per probe is 3× that. Baseline Size Tolerance (default 50 bytes) controls how much body-size delta is considered noise vs signal — useful to suppress Set-Cookie / CSRF-token / timestamp jitter that varies between requests.

Hostname injection safety: candidates pass through an RFC-1123 validator that's anchored with \Z (not $, which would let evil\n.example.com slip past). Colons, newlines, spaces, quotes, backticks, dollar signs, NUL bytes, underscores, and labels longer than 63 chars are all rejected before reaching curl --resolve. All subprocess calls use subprocess.run([...], shell=False) — defense in depth even though shell metacharacters can never reach a shell.
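
A sketch of such a validator. The exact regex used by the module is not shown on this page, so the pattern below is an approximation that reproduces the documented behavior: anchored with \A…\Z, labels capped at 63 characters, underscores and control characters rejected.

```python
import re

# Anchored with \A ... \Z (not $) so embedded newlines cannot slip past.
# Underscores are intentionally rejected, matching the behavior described above.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
_HOSTNAME_RE = re.compile(rf"\A(?:{_LABEL}\.)+{_LABEL}\Z", re.IGNORECASE)

def is_safe_candidate(hostname: str) -> bool:
    """Illustrative RFC-1123-style gate applied before a name reaches curl --resolve."""
    return len(hostname) <= 253 and bool(_HOSTNAME_RE.match(hostname))

assert is_safe_candidate("admin.example.com")
assert not is_safe_candidate("evil\n.example.com")
assert not is_safe_candidate("bad_host.example.com")
```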

Inject Discovered URLs: when on, every confirmed hidden vhost is automatically added as a BaseURL node and pushed into http_probe.by_url so downstream modules (sister Subdomain Takeover scanner, follow-up partial-recon Nuclei/Katana) pick it up. This converts a discovery-phase finding directly into expanded attack surface for vuln scanning in the same run.

Tools used: only curl (already in the recon container) and httpx (already pulled by the HTTP probing phase, used in partial-recon mode only). No new Docker image, pip dependency, or API key — the module is self-contained and doesn't add to the container build.

Graph nodes — consumes: Subdomain, IP, Port, BaseURL, Certificate, DNSRecord, ExternalDomain | produces: Vulnerability (source="vhost_sni_enum"), BaseURL (for hidden vhosts), defensive Subdomain | enriches: Subdomain (vhost_hidden, vhost_routing_layer, sni_routed, ...), IP (is_reverse_proxy, vhost_baseline_*, hidden_vhost_count, ...)

Master toggle:

Parameter Default Description
Enable VHost & SNI false Master toggle. When off the module is skipped and output contains { "skipped_reason": "disabled" }

Test layers:

Parameter Default Description
L7 (HTTP Host header) true Sends curl -H "Host: candidate" https://IP. Catches classic Apache/Nginx vhost routing
L4 (TLS SNI) true Sends curl --resolve candidate:port:IP https://candidate. Only fires on HTTPS ports. Catches reverse-proxy / k8s ingress / Cloudflare routing

If both layers are off the module exits with { "skipped_reason": "all_layers_disabled" }.

Candidate sources:

Parameter Default Description
Use Graph Candidates true Pull hostnames from existing Subdomain, ExternalDomain, TLS SAN list, CNAME targets, and reverse-DNS PTR records resolving to each target IP. Highest-signal source
Use Default Wordlist true Use the bundled recon/wordlists/vhost-common.txt (~2,380 admin/dev/staging/internal/modern-stack prefixes), expanded as {prefix}.{target_apex} per IP
Custom Wordlist "" Optional newline-separated prefixes/hostnames pasted in the project form. Bare prefixes are expanded against the apex; full hostnames (containing a dot) are used as-is. Stored in a Text column (no length cap). Excluded from project-preset export (per-project content)
Max Candidates Per IP 2000 Hard cap on candidates per IP. Excess entries are dropped deterministically (sorted alphabetically) so reruns hit the same set

Performance:

Parameter Default Clamp Description
Per-Request Timeout 3 >= 1 curl --connect-timeout per probe in seconds. Total budget per probe is 3× this value
Concurrency 20 >= 1 Parallel probes per (IP, port) via ThreadPoolExecutor. Higher = faster, louder
Baseline Size Tolerance 50 >= 0 Bytes of size delta to ignore when status code matches baseline. Suppresses Set-Cookie / CSRF token / timestamp jitter
Inject Discovered URLs true - When a hidden vhost is confirmed, create a BaseURL node and add the URL to http_probe.by_url so downstream tools (sister Subdomain Takeover scanner, follow-up partial-recon Nuclei/Katana) pick it up

Severity model: high when L7 and L4 disagree on the same hostname (proxy bypass primitive); medium when the discovered hostname matches an internal-keyword pattern (admin, jenkins, k8s, vault, argocd, phpmyadmin, etc.); low when status code differs from baseline; info when only body size differs beyond tolerance. The internal-keyword matcher uses the longest match (lex tiebreak) so reruns produce the same severity tag.

Tools used: curl (already in the recon container) and httpx (already pulled by GROUP 4 — used in partial-recon mode only). No new Docker image, pip dependency, or API key.

Hostname injection safety: the candidate pipeline ends with an RFC-1123 validator anchored with \Z (not $, which would let evil\n.example.com slip past). Colons, newlines, spaces, quotes, backticks, dollar signs, NUL bytes, underscores, and labels longer than 63 chars are all rejected before reaching curl --resolve. Subprocess calls use subprocess.run([...], shell=False) — defense in depth even though shell metacharacters can never reach a shell.

Stealth tuning: there is no automatic stealth-mode override at the runtime layer. Stealth is handled at the preset layer instead — red-team-operator sets VHOST_SNI_TEST_L4=false, VHOST_SNI_USE_DEFAULT_WORDLIST=false, VHOST_SNI_CONCURRENCY=5. stealth-recon disables the module entirely (2,380 probes through Tor would be catastrophic).

Partial recon: VHost & SNI is a Partial Recon tool. The modal accepts custom subdomains (added as candidate hostnames, must be in scope) and custom IPs (added as extra targets, validated as IPv4 or CIDR /24-/32). VHOST_SNI_ENABLED is force-set to true for partial runs. Rescans converge on the same Vulnerability.id (deterministic vhost_sni_<host>_<ip>_<port>_<layer>) instead of duplicating. When both graph and custom inputs are empty, the run exits with a "no IP targets" message. See Recon Pipeline Workflow -- Partial Recon.


Vulnerability Scanner (Nuclei)

Template-based vulnerability scanning with 9,000+ community templates.

Graph nodes — consumes: BaseURL, Endpoint, Technology, Domain | produces: Vulnerability, Endpoint, Parameter, CVE, MitreData, Capec

How it works

Nuclei is the heaviest single module in the pipeline by output volume — most web-layer findings come from here. It runs as projectdiscovery/nuclei:latest inside Docker with templates auto-updated from the public ProjectDiscovery template repository on each scan (when Auto Update Templates is on). Templates are stored in a persistent Docker volume so updates are incremental, not from-scratch.

Target construction (UNION-based): nuclei targets are built as the deduplicated UNION of every web-layer source already in the graph:

  1. Endpoints with parameters from Resource Enumeration (Katana/Hakrawler/GAU/Arjun)
  2. BaseURLs verified by httpx (the live web-host set)
  3. http(s)://<sub> fallbacks for any Subdomain whose host isn't already covered by sources 1 or 2

This third bucket exists because it's possible for Domain Discovery to surface a Subdomain that hasn't yet been probed by httpx (e.g. a recently-discovered host or one where httpx errored). Without the fallback, those subdomains would be silently skipped; with it, Nuclei probes them on the default HTTP/HTTPS ports.

IPs are excluded by default to avoid scanning shared infrastructure. The Scan All IPs toggle includes them when needed (e.g. raw-IP exposed services).

DAST mode is a filter, not an addition: when DAST Mode is on, nuclei is invoked with the -dast flag which filters the loaded template set down to only templates with a fuzz: directive (~300 of the ~9,000 total) — these are the active fuzzing templates for SQLi, XSS, SSRF, OS injection, etc. Detection-class templates (CVE detection, exposure detection, panel detection) are skipped entirely. So if you turn DAST on but use detection-class tags (graphql, apollo, hasura, exposure), the resulting template set is empty and nuclei errors out. Use DAST-native tags only when DAST is on (sqli, xss, ssrf, lfi, rfi, xxe, ssti, openredirect, cmdi).

Most production scans should leave DAST off — the detection templates catch real CVEs and misconfigurations on a much larger set of templates, while DAST is best run as a focused targeted scan after detection finds something interesting.

Interactsh integration: when on, nuclei is wired up to the public Interactsh server (or a self-hosted one). For blind injection templates (blind SQLi, blind SSRF, OOB XXE), nuclei generates a unique callback URL on the Interactsh server, embeds it in the payload, and listens for the OOB hit that confirms the target reached the callback. This catches a class of vulnerabilities that have no in-band response signal at all.

Severity filter and tag includes/excludes: -severity filters which severity levels to keep (excluding info is ~70% faster because info-level templates are the bulk count). -include-tags and -exclude-tags further narrow the template set — dos and fuzz are excluded by default for production scans because they generate volumetric load.

Stream parsing: _execute_nuclei_pass runs nuclei via subprocess.Popen with stdout streamed line-by-line. Each JSON-line is parsed via parse_nuclei_finding and merged into the graph immediately, so progress is visible in the recon log even on multi-hour scans. Per-template-id deduplication keeps the same finding from being recorded multiple times when nuclei retries.
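
A simplified sketch of the streaming loop: run nuclei with a JSON-lines output flag, parse each line as it arrives, and deduplicate before merging. The field names mirror nuclei's JSONL output (template-id, matched-at); the dedup key here is an assumption, since the real _execute_nuclei_pass may key on template-id alone.

```python
import json
import subprocess

def stream_nuclei(cmd):
    """Run nuclei and yield findings as soon as each JSON line arrives.
    `cmd` is the full argv, including nuclei's JSON-lines output flag."""
    seen = set()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        line = line.strip()
        if not line.startswith("{"):
            continue                              # progress/log noise, not a finding
        try:
            finding = json.loads(line)
        except json.JSONDecodeError:
            continue
        key = (finding.get("template-id"), finding.get("matched-at"))
        if key in seen:                           # skip findings duplicated by retries
            continue
        seen.add(key)
        yield finding                             # merge into the graph immediately
    proc.wait()
```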

Output enrichment: each finding emits a Vulnerability node + a CVE node (when the template has a CVE ID in its info block) + MitreData / Capec nodes (when the template references CWE/CAPEC). Vulnerability + CVE + MITRE in one pass means nuclei findings are immediately ready for the Insights dashboard's CVE/MITRE views without a separate enrichment phase.

Performance Settings:

Parameter Default Description
Severity Levels critical, high, medium, low, info Severity filter. Excluding "info" is ~70% faster
Rate Limit 100 Requests per second
Bulk Size 25 Hosts processed in parallel
Concurrency 25 Templates executed in parallel
Timeout 10 Request timeout per check (seconds)
Retries 1 Retry attempts for failed requests (0-10)
Max Redirects 10 Maximum redirect chain (0-50)

Template Configuration:

Parameter Default Description
Template Folders [] Directories to include (cves, vulnerabilities, misconfiguration, exposures, etc.). Empty = all
Exclude Template Paths [] Exclude specific directories or files
Custom Template Paths [] Your own templates in addition to the official repo
Include Tags [] Filter by tags: cve, xss, sqli, rce, lfi, ssrf, xxe, ssti. Empty = all
Exclude Tags [] Exclude tags — recommended: dos, fuzz for production

Template Options:

Parameter Default Description
Auto Update Templates true Download latest before scan (+10-30 seconds)
New Templates Only false Only run templates added since last update
DAST Mode true Active fuzzing for XSS, SQLi, RCE (+50-100% time)

Advanced Options:

Parameter Default Description
Headless Mode false Use headless browser for JS pages (+100-200% time)
System DNS Resolvers false Use OS DNS instead of Nuclei defaults
Interactsh true Blind vulnerability detection via out-of-band callbacks
Follow Redirects true Follow HTTP redirects during scanning
Scan All IPs false Scan all resolved IPs, not just hostnames

CVE Enrichment

Enrich findings with CVSS scores, descriptions, and references.

Graph nodes — consumes: Technology | produces: CVE, MitreData, Capec

Parameter Default Description
Enable CVE Lookup true Master toggle
CVE Source nvd Data source: nvd or vulners
Max CVEs per Finding 20 Maximum entries per technology (1-100)
Min CVSS Score 0 Only include CVEs at or above this score (0-10)

Note: NVD and Vulners API keys are configured in Global Settings → API Keys (user-scoped), not in project settings.

How it works

CVE enrichment turns a Technology node like Apache 2.4.49 into a list of CVE nodes attached to the same host. The challenge is that Technology strings come from many sources with inconsistent formatting (httpx, Wappalyzer, Nmap NSE, OSINT tools each format service identifiers differently), so the lookup pipeline goes through three normalization steps before it queries the upstream database:

  1. Server-header splitting (split_server_header): a single Server: Apache/2.4.49 (Ubuntu) PHP/7.4.3 header contains multiple products. The splitter parses this into separate (name, version) tuples (apache 2.4.49, php 7.4.3) — each gets its own CVE lookup.
  2. Technology-string parsing (parse_technology_string): handles formats from Wappalyzer (React 17.0.2), Nmap NSE (OpenSSH 8.2p1), and httpx tech-detect (nginx-1.18.0). Returns a normalized (name, version) pair.
  3. Product-name normalization (normalize_product_name): canonicalizes vendor naming inconsistencies — microsoft-iis, iis, and Microsoft IIS all map to a single key. Also strips marketing suffixes (Enterprise/Pro/Lite) that NVD doesn't track separately. Semver is extracted via _extract_semver so 2.4.49-deb10u1 becomes 2.4.49 for the CPE lookup. A rough sketch of these steps follows the list.
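
A rough sketch of the splitting and semver-extraction steps. The real split_server_header and _extract_semver helpers are not shown on this page, so the regexes below are illustrative approximations of the described behavior.

```python
import re

_PRODUCT_RE = re.compile(r"([A-Za-z][\w.+-]*)/(\d+(?:\.\d+)*[\w.-]*)")
_SEMVER_RE = re.compile(r"\d+\.\d+(?:\.\d+)?")

def split_server_header(value):
    """'Apache/2.4.49 (Ubuntu) PHP/7.4.3' -> [('apache', '2.4.49'), ('php', '7.4.3')]"""
    return [(name.lower(), version) for name, version in _PRODUCT_RE.findall(value)]

def extract_semver(version):
    """'2.4.49-deb10u1' -> '2.4.49' for the CPE lookup."""
    m = _SEMVER_RE.search(version)
    return m.group(0) if m else version

print(split_server_header("Apache/2.4.49 (Ubuntu) PHP/7.4.3"))
print(extract_semver("2.4.49-deb10u1"))
```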

NVD backend (lookup_cves_nvd): queries https://services.nvd.nist.gov/rest/json/cves/2.0 with the normalized product name + version. The response contains the full CVE record with both CVSS v3.1 and v2.0 metrics — the parser prefers v3.1 (newer, richer attack-vector data) and falls back to v2.0 only when v3 isn't available. CVSS score and severity are extracted from metrics.cvssMetricV31[0].cvssData.baseScore (v3) or metrics.cvssMetricV2[0].cvssData.baseScore (v2). The classified severity (classify_cvss_score) is recomputed from the score using the standard NVD bands (Critical >= 9.0, High >= 7.0, Medium >= 4.0, Low >= 0.1, None = 0).
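
The severity classification is a straightforward banding function. A sketch assuming the bands quoted above (the real classify_cvss_score may use different return labels):

```python
def classify_cvss_score(score):
    """Standard NVD severity bands, applied after the CVSS base score is extracted."""
    if score is None:
        return "unknown"
    if score >= 9.0:
        return "critical"
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    if score >= 0.1:
        return "low"
    return "none"
```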

Vulners backend: queries https://vulners.com/api/v3/burp/software/ — Vulners' Burp-API endpoint that takes a product+version pair and returns the matching CVE list, EPSS exploit probability scores, and exploit-DB references. Vulners is generally faster and includes more recent CVEs than NVD's REST API but requires an API key for any meaningful query volume.

Min CVSS Score filter: applied client-side after the API response. Setting this above 0 dramatically cuts noise (CVE records with no CVSS score get dropped, which is most third-party CMS plugin CVEs).

Max CVEs per Finding cap: applied per-technology before merge. A single Apache version can have 100+ historical CVEs — the cap keeps the graph from being dominated by ancient CVEs that don't matter for current exploitation. Sort order is severity desc, score desc, date desc, so the top-N are always the most exploitable.

API key handling: keys live in Global Settings (user-scoped), so the same key works across projects without re-entry. Without a key, NVD enforces a 5 req/30s rate limit (5 lookups per 30 seconds) — fine for small targets but a bottleneck on large multi-tech graphs.


MITRE Mapping

CWE/CAPEC enrichment of CVE findings.

Parameter Default Description
Auto Update DB true Auto-update CWE/CAPEC database
Include CWE true Map CVEs to CWE weaknesses
Include CAPEC true Map CWEs to CAPEC attack patterns
Enrich Recon CVEs true Enrich CVEs from reconnaissance
Enrich GVM CVEs true Enrich CVEs from GVM scans
Cache TTL (hours) 24 Database cache duration

How it works

The MITRE module enriches every CVE in the graph with two layers of attacker-knowledge metadata: the CWE (Common Weakness Enumeration) — the underlying weakness class — and the CAPEC (Common Attack Pattern Enumeration and Classification) — the attacker techniques used to exploit that weakness class. Together they let you go from "this is a CVE-2021-44228 finding" to "this is a CWE-502 deserialization weakness exploitable via CAPEC-586 (Object Injection) and CAPEC-129 (Pointer Manipulation)".

Database provisioning: on first scan, the module downloads two official MITRE datasets:

  • CWE database — https://cwe.mitre.org/data/xml/cwec_latest.xml.zip — the canonical CWE hierarchy (~900 weakness entries) with descriptions, mitigation guidance, and direct links to CAPEC patterns
  • CAPEC database — https://capec.mitre.org/data/xml/capec_latest.xml — the canonical CAPEC catalog (~600 attack patterns) with prerequisites, attack steps, and example instances

These XML files are parsed (xml.etree.ElementTree) and converted into JSON databases (cwe_db.json, capec_db.json, cwe_metadata.json) stored in the recon container's data directory. The conversion happens once and the JSONs are reused; the XMLs are big and slow to parse on every scan.

Cache TTL: the database is considered fresh for Cache TTL (hours) after download (default 24h). Once expired, the next scan re-downloads if Auto Update DB is on. This balances "always have the latest CWE/CAPEC entries" against "don't burn 30s downloading XMLs for every scan."
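
The freshness check amounts to comparing the local JSON database's age against the TTL. A minimal sketch (file names from this section; the helper itself is hypothetical):

```python
import time
from pathlib import Path

def is_fresh(db_path: str, ttl_hours: int = 24) -> bool:
    """Treat the local cwe_db.json / capec_db.json as fresh within the TTL window."""
    p = Path(db_path)
    if not p.exists():
        return False
    age_hours = (time.time() - p.stat().st_mtime) / 3600
    return age_hours < ttl_hours

# if not is_fresh("cwe_db.json", ttl_hours=24): re-download and re-convert the XML
```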

The mapping itself: for every CVE in the graph, the module:

  1. Looks up the CVE's primary CWE from the NVD response (NVD records ship a weaknesses array with the official CWE classification per CVE — usually 1-2 entries)
  2. Loads the CWE node from the local cwe_db.json (description, parent CWE relationships, related CWEs)
  3. Walks the CWE's Related_Attack_Patterns list to find directly-mapped CAPECs
  4. Loads each CAPEC's attack-pattern details (prerequisites, attack-steps narrative, MITRE ATT&CK technique mappings where applicable)
  5. Emits MitreData (CWE) and Capec nodes attached to the CVE in the graph

Why direct mappings only: CWE has a notion of "indirect" relationships (CWE-79 → CWE-78 → CWE-88, transitively related). The module deliberately uses only direct CWE→CAPEC mappings (Include CAPEC toggle controls this) because indirect chains generate noise — a single CVE can transitively map to dozens of unrelated CAPECs through deep CWE inheritance.

Dual-source enrichment: the module enriches CVEs from both reconnaissance (Nuclei findings, Nmap NSE, JS Recon) and GVM (the network-vuln-scan output) when both are enabled. Each toggle controls whether that source's CVEs go through MITRE enrichment — useful when you want fast scans (skip MITRE on recon CVEs which are usually well-known) and only enrich the GVM scan's deeper findings.

Output usage: MitreData and Capec nodes feed two consumers — the Insights dashboard (CWE-breakdown chart, attack-patterns chart, top-CWE-by-frequency) and the AI agent, which reads the attack-pattern descriptions during exploit planning to align its tool selection with documented attack steps for that weakness class.


Security Checks

25+ individual toggle-controlled checks grouped into six categories. Each check creates a Vulnerability node in the graph if the condition is detected.

Graph nodes — consumes: BaseURL, IP, Subdomain, Domain | produces: Vulnerability

How it works

Security Checks runs after every other recon module so it has the full graph (Subdomains, IPs, BaseURLs, Certificates, DNS records) to query. Each check is a small focused Python function that hits a specific configuration question — no Docker, no external tools, just requests + socket + ssl + dns.resolver calls done in parallel via a ThreadPoolExecutor sized by Max Workers.

Each category targets a specific class of misconfiguration:

Network Exposure: checks whether origin IPs leak past a CDN / WAF (the classic WAF-bypass primitive). check_direct_ip_http and check_direct_ip_https open raw IP-based connections (http://198.51.100.42/), but emit a finding only when the IP exposure is a real risk. Without filtering, every cloud-hosted site's load balancer would generate findings; three layers suppress that noise.

  1. Pre-filter on known edges. IPs in Cloudflare's published prefix list (cloudflare.com/ips-v4, ips-v6, fetched per scan with a hardcoded fallback), IPs in known CDN ASNs (Cloudflare 13335 / 209242), and IPs flagged is_cdn=true by Naabu or httpx with a reliable edge CDN name are removed before any probe. The reliable list is cloudflare, cloudfront, akamai, fastly, imperva, incapsula, sucuri, stackpath, azurefrontdoor, gcore. Generic cloud-provider labels (aws, amazon, azure, gcp, google) are deliberately excluded because they cover bare ALB / EC2 / Cloud-LB origins that legitimately serve the application; trusting those would suppress real findings on Cloudflare-fronted apps whose origin happens to be on AWS.

  2. Response fingerprint. Each remaining IP probe is inspected for edge markers: headers cf-ray, cf-cache-status, x-amz-cf-id, x-served-by: cache-..., Server: cloudflare, and body matching Error 1003 / Direct IP access not allowed / Attention Required! | Cloudflare. Match suppresses the finding.

  3. Bare-origin comparison test. The IP response is compared against the response of every hostname that resolves to it. If any hostname and the IP return the same status code, the same Server header, similar content size (within 10% or 500 bytes), and neither carries CDN/WAF markers, the IP is the bare public origin and there is no protection layer in between, so the finding is suppressed. If the hostname carries CDN markers (cf-ray, Server: cloudflare, ...) absent from the IP response, that is a real WAF bypass and the finding fires.
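
A sketch of the bare-origin comparison and the WAF-bypass decision, assuming each probe has already been reduced to a small dict of status, Server header, body size, and detected CDN/WAF markers (the field names are assumptions, not the module's actual data model):

```python
def looks_like_bare_origin(ip_resp, host_resp, tolerance_ratio=0.10, tolerance_bytes=500):
    """Suppress the finding when the IP is just the un-fronted public origin:
    same status, same Server header, similar size, no CDN/WAF markers on either side."""
    same_status = ip_resp["status"] == host_resp["status"]
    same_server = ip_resp.get("server", "").lower() == host_resp.get("server", "").lower()
    size_delta = abs(ip_resp["size"] - host_resp["size"])
    similar_size = (size_delta <= tolerance_bytes or
                    size_delta <= tolerance_ratio * max(host_resp["size"], 1))
    no_edge_markers = not ip_resp["cdn_markers"] and not host_resp["cdn_markers"]
    return same_status and same_server and similar_size and no_edge_markers

def is_waf_bypass(ip_resp, host_resp):
    """The opposite case: the hostname goes through a CDN/WAF, the raw IP does not."""
    return bool(host_resp["cdn_markers"]) and not ip_resp["cdn_markers"]
```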

check_ip_api_exposed runs the same prefilter-and-fingerprint flow against API-shaped paths (/api, /graphql, /v1, ...). check_waf_bypass is the consolidated higher-severity finding produced when the comparison test concludes that the hostname goes through a different stack from the IP.

A Direct IP Access finding therefore means one of:

  • Hostname goes through a CDN / WAF, IP does not: real origin leak
  • IP serves materially different content from the hostname (different status, different Server, large size delta): misconfigured exposure
  • No hostnames resolved to the IP, comparison could not run, IP still responds: bare-IP scan target, informational
  • IP 30x-redirects to a hostname: legacy info-severity finding (the IP responds at all, even though it enforces hostname-based access)

It does NOT fire for:

  • Cloudflare / CloudFront / Akamai / Fastly edge IPs (pre-filtered)
  • AWS ALB / Azure Front Door / GCP LB origins where the IP IS the public endpoint by design (suppressed by the bare-origin comparison)
  • IPs that respond with CDN error templates (response fingerprint match)
  • IPs that are unreachable or return 5xx

TLS / Certificate — get_ssl_certificate opens a TLS handshake (ssl.create_default_context, verify_mode=CERT_NONE to allow self-signed) and pulls the cert. parse_cert_date parses the validity range, check_tls_expiring_soon computes days-to-expiry against the configurable threshold (default 30 days). Note that verify_mode=CERT_NONE is intentional: scanning self-signed staging hosts is a common need; cert chain verification is a separate concern handled by the broader Certificate node analysis.
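
A minimal sketch of the expiry computation. It combines the get_ssl_certificate / check_tls_expiring_soon steps into one hypothetical helper and assumes the cryptography package is available for parsing the DER certificate (getpeercert() returns an empty dict when verification is disabled, so the raw binary form is fetched instead).

```python
import socket
import ssl
from datetime import datetime, timezone
from cryptography import x509   # assumed available in the recon container

def tls_days_to_expiry(host, port=443, timeout=10):
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE          # intentional: accept self-signed staging certs
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)   # raw cert is available even unverified
    cert = x509.load_der_x509_certificate(der)
    not_after = getattr(cert, "not_valid_after_utc", None) \
        or cert.not_valid_after.replace(tzinfo=timezone.utc)
    return (not_after - datetime.now(timezone.utc)).days

# expiring_soon = tls_days_to_expiry("example.com") <= 30   # TLS Expiry Days threshold
```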

Security Headers — check_security_headers makes a single HEAD (or GET if HEAD is rejected) request and checks for the presence of: Referrer-Policy, Permissions-Policy, Cross-Origin-Opener-Policy (COOP), Cross-Origin-Resource-Policy (CORP), Cross-Origin-Embedder-Policy (COEP). check_cache_control_missing is split out because it's about caching/sensitive-data leakage, not the cross-origin posture. CSP Unsafe Inline parses the Content-Security-Policy header and looks for 'unsafe-inline' directives that defeat XSS protections.

Authentication — check_login_no_https walks every BaseURL looking for forms with <input type="password"> and verifies the form action submits to HTTPS (forms posting passwords to HTTP is a credential-exposure issue). check_session_cookies parses Set-Cookie headers from BaseURL responses and flags missing Secure and HttpOnly flags on session-shaped cookies. check_basic_auth_no_tls looks for WWW-Authenticate: Basic responses on HTTP-only endpoints.

DNS Security — uses dns.resolver to query each domain's DNS records:

  • SPF Missing — no v=spf1 ... TXT record
  • DMARC Missing — no v=DMARC1 record at _dmarc.<domain>
  • DNSSEC Missing — no DS/DNSKEY records
  • Zone Transfer — attempts AXFR against the apex's nameservers; if the transfer succeeds, the entire zone is publicly disclosed (high-severity, classic misconfig)
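
As a sketch, the SPF, DMARC, and zone-transfer checks map onto dnspython roughly like this. Helper names are illustrative, and the DNSSEC check follows the same pattern with DS/DNSKEY lookups:

```python
import dns.resolver
import dns.query
import dns.zone


def spf_missing(domain: str) -> bool:
    try:
        answers = dns.resolver.resolve(domain, "TXT")
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return True
    return not any(b"v=spf1" in b"".join(r.strings) for r in answers)


def dmarc_missing(domain: str) -> bool:
    try:
        answers = dns.resolver.resolve(f"_dmarc.{domain}", "TXT")
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return True
    return not any(b"v=DMARC1" in b"".join(r.strings) for r in answers)


def zone_transfer_allowed(domain: str) -> bool:
    """Attempt AXFR against each authoritative nameserver of the apex."""
    try:
        nameservers = [r.target.to_text() for r in dns.resolver.resolve(domain, "NS")]
    except Exception:
        return False
    for ns in nameservers:
        try:
            ns_ip = dns.resolver.resolve(ns, "A")[0].to_text()
            dns.zone.from_xfr(dns.query.xfr(ns_ip, domain, timeout=10))
            return True                       # transfer completed: entire zone disclosed
        except Exception:
            continue                          # refused or timed out: the expected case
    return False
```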

Exposed Services — uses the existing port-scan output to check for common misconfigured services:

  • Admin Port Exposed — port 22 (SSH), 23 (Telnet), 3389 (RDP), 5985/5986 (WinRM), 8089 (Splunk), … reachable from the public internet
  • Database Exposed — 3306 (MySQL), 5432 (Postgres), 27017 (MongoDB), 6379 (Redis), 9200 (Elasticsearch), 11211 (Memcached), 1521 (Oracle) reachable
  • Redis No Auth — opens a TCP connection to Redis port and sends INFO\r\n — if the response includes redis_version: (rather than NOAUTH), Redis is unauthenticated
  • Kubernetes API Exposed — checks 6443 (kube-apiserver) and 10250 (kubelet) for unauthenticated /healthz and /api responses
  • SMTP Open Relay — connects to port 25 and runs the classic relay test (MAIL FROM: <off-domain> followed by RCPT TO: <off-domain> — if the server accepts both, it's an open relay)
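
The Redis probe in particular is small enough to sketch in full (names illustrative):

```python
import socket


def redis_unauthenticated(ip: str, port: int = 6379, timeout: int = 10) -> bool:
    """Send INFO over a raw TCP connection and look for a real reply rather than NOAUTH."""
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            sock.sendall(b"INFO\r\n")
            reply = sock.recv(4096).decode(errors="replace")
    except OSError:
        return False                        # closed or filtered: no finding
    if "NOAUTH" in reply or "DENIED" in reply:
        return False                        # auth required or protected mode
    return "redis_version:" in reply        # unauthenticated Redis: immediate finding
```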

Application — Insecure Form Action looks for forms with action="http://..." (not just login forms). No Rate Limiting opens a tight loop of 100 requests against a representative endpoint over ~10 seconds and checks whether any 429 / Retry-After response was returned — absence of throttling is a finding.

Why these are split into a dedicated module rather than rolled into Nuclei: Nuclei is template-based and excels at known-CVE matching, but most of these checks are configuration-state questions that don't have a CVE attached. Implementing them as Python functions in the recon container is faster, more testable, and integrates more cleanly with the graph than maintaining custom Nuclei templates would. They're also high-signal findings the AI agent uses to prioritize early-stage exploit attempts (a missing-Secure-cookie finding is a fast lead to session hijacking; an exposed Redis is immediate code-execution).

Global Settings:

Parameter Default Description
Enable Security Checks true Master toggle for all checks
Timeout 10 Per-check timeout (seconds)
Max Workers 10 Concurrent check threads

Network Exposure:

Check Default Description
Direct IP HTTP true HTTP accessible via IP address
Direct IP HTTPS true HTTPS accessible via IP address
IP API Exposed true API endpoints accessible via IP
WAF Bypass true WAF can be bypassed via direct IP

TLS/Certificate:

Check Default Description
TLS Expiring Soon true Certificate expires within configurable days
TLS Expiry Days 30 Days before expiry to trigger warning

Security Headers:

Check Default Description
Missing Referrer-Policy true No Referrer-Policy header
Missing Permissions-Policy true No Permissions-Policy header
Missing COOP true No Cross-Origin-Opener-Policy
Missing CORP true No Cross-Origin-Resource-Policy
Missing COEP true No Cross-Origin-Embedder-Policy
Cache-Control Missing true No Cache-Control header
CSP Unsafe Inline true Content-Security-Policy allows unsafe-inline

Authentication:

Check Default Description
Login No HTTPS true Login form served over HTTP
Session No Secure true Session cookie missing Secure flag
Session No HttpOnly true Session cookie missing HttpOnly flag
Basic Auth No TLS true Basic Authentication without TLS

DNS Security:

Check Default Description
SPF Missing true No SPF record for the domain
DMARC Missing true No DMARC record
DNSSEC Missing true DNSSEC not configured
Zone Transfer true DNS zone transfer allowed

Exposed Services:

Check Default Description
Admin Port Exposed true Administrative ports publicly accessible
Database Exposed true Database ports publicly accessible
Redis No Auth true Redis accessible without authentication
Kubernetes API Exposed true Kubernetes API publicly accessible
SMTP Open Relay true SMTP server allows open relay

Application:

Check Default Description
Insecure Form Action true Form submits over HTTP
No Rate Limiting true No rate limiting detected on endpoints

GVM Vulnerability Scan

Configure GVM/OpenVAS network-level scanning.

Graph nodes — consumes: IP, Port, Subdomain, Domain | produces: Vulnerability, Technology, Traceroute, Certificate, ExploitGvm, CVE, MitreData, Capec

How it works

GVM (Greenbone Vulnerability Management, the OpenVAS suite) is a separate on-demand scanning pipeline rather than part of the main recon flow — start it from the Red Zone toolbar after recon completes. It operates at the network layer where Nuclei doesn't reach: SMB/NetBIOS misconfigs, FTP/SMTP/POP/IMAP weaknesses, SSH cipher audits, SNMP defaults, exposed RPC, Telnet, and deep CVE matching against Nmap-style service fingerprints — all driven by 170,000+ Network Vulnerability Tests (NVTs) maintained by Greenbone.

Architecture: GVM runs as its own dockerized service stack (gvm_scan/) with the Greenbone Community Feed pulling NVT updates daily. The recon container talks to GVM via GMP (the Greenbone Management Protocol) over its native socket. The protocol is XML-based and stateful — every scan is a sequence of GMP commands: create target, create task, start task, poll for status, retrieve report, cleanup.

Scan flow for a project:

  1. Target preparation — depending on Targets Strategy (both / ips_only / hostnames_only), the module pulls IPs and/or hostnames from the recon graph and constructs a GMP <create_target> command. Target lists that exceed GVM's per-target host limit are split into multiple GVM target objects.
  2. Task creation — <create_task> with the chosen Scan Profile config UUID. The seven profiles (Full and fast, Full and very deep, Full and very deep ultimate, Discovery, Host discovery, System discovery, Empty) trade speed against thoroughness — Full and fast runs ~50k NVTs in 30-60min on a typical target, Full and very deep ultimate runs ~150k in 4-6 hours.
  3. Task start — <start_task>. GVM begins running NVTs against each host in the target. NVTs are written in NASL (Nessus Attack Scripting Language) and run as a forked process per check; GVM internally parallelizes within the configured scanner profile.
  4. Status polling loop — every Poll Interval seconds (default 5, range 5-300), <get_tasks> is queried for the task's progress percentage. The loop continues until the task reports Done status or Task Timeout seconds have elapsed (default 14400 = 4 hours; 0 = unlimited).
  5. Report extraction — <get_reports> retrieves the structured XML report. Each finding has CVE IDs, CVSS metrics, NVT family, severity, and remediation guidance. The XML is parsed and converted into Vulnerability + CVE + ExploitGvm graph nodes (ExploitGvm is GVM's exploit-availability indicator distinct from regular CVEs).
  6. Cleanup — when Cleanup After Scan is on, <delete_target> and <delete_task> are called to remove the GVM-side artifacts. Without this, GVM's database accumulates targets/tasks across scans, eventually slowing the management UI.
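
A minimal sketch of this create, start, poll, report, cleanup sequence using the python-gvm client library. The socket path, credentials, and UUIDs below are stock-install defaults and placeholders; the module's actual wrapper code differs:

```python
import time

from gvm.connections import UnixSocketConnection
from gvm.protocols.gmp import Gmp
from gvm.transforms import EtreeTransform

FULL_AND_FAST = "daba56c8-73ec-11df-a475-002264764cea"        # stock "Full and fast" config
OPENVAS_SCANNER = "08b69003-5fc2-4037-a479-93b440211c73"      # stock OpenVAS scanner

connection = UnixSocketConnection(path="/run/gvmd/gvmd.sock")
with Gmp(connection, transform=EtreeTransform()) as gmp:
    gmp.authenticate("admin", "admin")                         # placeholder credentials

    target = gmp.create_target(name="redamon-scan", hosts=["198.51.100.10", "198.51.100.11"])
    target_id = target.get("id")

    task = gmp.create_task(name="redamon-task", config_id=FULL_AND_FAST,
                           target_id=target_id, scanner_id=OPENVAS_SCANNER)
    task_id = task.get("id")
    gmp.start_task(task_id)

    # Poll every Poll Interval seconds until the task reports Done (timeout handling omitted)
    while gmp.get_task(task_id).find(".//status").text != "Done":
        time.sleep(5)

    report_id = gmp.get_task(task_id).find(".//report").get("id")
    report_xml = gmp.get_report(report_id)        # parsed into Vulnerability/CVE/ExploitGvm nodes

    gmp.delete_task(task_id)                      # Cleanup After Scan
    gmp.delete_target(target_id)
```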

Why GVM scans are slow: each NVT is a separate forked process running its own protocol implementation. A single SMB-vulnerabilities NVT might take 30-60 seconds per host as it negotiates dialects, attempts authentication, queries shares, etc. Across 170k NVTs and N hosts, the wall-clock can easily exceed a working day. The trade-off vs Nuclei: GVM has dramatically broader and deeper coverage of network services but is impractical for fast iteration.

Output enrichment: the parsed Vulnerability nodes get the same MITRE enrichment pass as recon CVEs (controlled by Enrich GVM CVEs in the MITRE Mapping section) — so GVM findings show up in the Insights dashboard's CWE/CAPEC views with the same depth as Nuclei findings.

Scan Configuration:

Parameter Default Description
Scan Profile Full and fast GVM scan preset — see GVM Vulnerability Scanning for all 7 profiles
Scan Targets Strategy both both (IPs + hostnames), ips_only, or hostnames_only

Timeouts & Polling:

Parameter Default Description
Task Timeout 14400 Maximum seconds per scan task (4 hours). 0 = unlimited
Poll Interval 5 Seconds between status checks (5-300)

Post-Scan:

Parameter Default Description
Cleanup After Scan true Remove targets/tasks from GVM after results are extracted

Subdomain Discovery

Configure passive and active subdomain enumeration. Located in the Discovery & OSINT tab.

Graph nodes — consumes: Domain | produces: Domain, Subdomain, IP, DNSRecord

Each passive source has an enabled toggle and a max results cap. All sources run in parallel and results are merged and deduplicated. After merging, Puredns validates the combined list against public DNS resolvers to remove wildcard and DNS-poisoned entries before DNS resolution proceeds.

How it works

The module fans out the apex domain across five enumeration engines that run concurrently in a ThreadPoolExecutor, then folds their outputs into a single deduplicated set. Each engine has its own discovery strategy:

Engine Source How
crt.sh Certificate Transparency logs HTTPS query against crt.sh?q=%25.<domain>&output=json — extracts every CN/SAN ever issued for the apex. Picks up any subdomain for which a TLS certificate was ever requested (which is most of them)
HackerTarget Passive DNS database HTTPS query against api.hackertarget.com/hostsearch/?q=<domain> — returns historical DNS-resolved hostnames. Free tier: 50 queries/day
Subfinder 50+ passive sources Runs projectdiscovery/subfinder:latest Docker image with a 720-second timeout. Aggregates results from CT logs, DNS databases (SecurityTrails, BinaryEdge), web archives, and search engines (Bing, DNSDumpster). Typically yields the highest subdomain count of any single engine
Amass 50+ data sources Runs caffix/amass:latest Docker image. Passive mode by default; optional active mode enables zone transfers and certificate name grabs (forced off in stealth). Optional bruteforce mode runs a DNS brute-force after passive enumeration (forced off in stealth, significantly slower)
Knockpy Wordlist-based Runs as a subprocess (no Docker). Active brute-forcing against a built-in wordlist. With Use Bruteforce off it falls back to passive mode

After the five engines complete, results are unioned and deduplicated into a single candidate set. Puredns then runs as a Docker sidecar (frost19k/puredns:latest, 600-second timeout) and validates each candidate against public DNS resolvers — its job is to strip three classes of noise that would otherwise pollute downstream modules:

  1. Wildcard records — a domain like *.<apex> resolves to a single IP for every imaginable hostname, generating thousands of false positives. Puredns identifies the wildcard signature by querying random nonsense subdomains and learning the wildcard answer set, then removes any candidate whose answer matches it.
  2. DNS-poisoned entries — open-resolver poisoning attacks return injected IPs for unrelated domains. Puredns cross-validates against multiple resolvers and discards inconsistent answers.
  3. Stale NXDOMAIN entries — hostnames that some passive sources still report but that no longer resolve.

After Puredns, the survivor list is passed to a parallel DNS resolution pass (ThreadPoolExecutor with up to DNS Max Workers threads — default 50, max 200). Each subdomain is queried for all 7 record types simultaneously (A, AAAA, MX, NS, TXT, SOA, CNAME) using a per-hostname inner thread pool when DNS Record Parallelism is on. Each record-type query has its own retry budget (default 3) with exponential backoff.
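
A sketch of that two-level pool, with the outer pool sized by DNS Max Workers and an inner pool firing all seven record types at once (function names are illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

import dns.exception
import dns.resolver

RECORD_TYPES = ("A", "AAAA", "MX", "NS", "TXT", "SOA", "CNAME")


def query_with_retries(hostname: str, rtype: str, retries: int = 3) -> list[str]:
    for attempt in range(retries):
        try:
            return [r.to_text() for r in dns.resolver.resolve(hostname, rtype)]
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
            return []                          # definitive empty answer: no retry needed
        except dns.exception.Timeout:
            time.sleep(2 ** attempt)           # exponential backoff before the next attempt
    return []


def resolve_all_records(hostname: str) -> dict:
    # Inner per-hostname pool: all 7 record types queried simultaneously
    with ThreadPoolExecutor(max_workers=len(RECORD_TYPES)) as pool:
        futures = {rt: pool.submit(query_with_retries, hostname, rt) for rt in RECORD_TYPES}
        return {rt: f.result() for rt, f in futures.items()}


def resolve_survivors(subdomains: list[str], max_workers: int = 50) -> dict:
    # Outer pool sized by DNS Max Workers (default 50, max 200)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(subdomains, pool.map(resolve_all_records, subdomains)))
```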

Tor / proxychains support: when anonymous mode is on, every requests-based call goes through a Tor SOCKS session and every Docker-based call is wrapped in a proxychains prefix that funnels traffic through Tor. Puredns and Amass active mode are both forced off under Tor since their high-rate DNS queries would burn the circuit.

Domain ownership verification: a verify_domain_ownership helper publishes a TXT record under _redamon-verify.<domain> and queries for it — used during scan setup to confirm the operator actually controls the target apex before active mode is allowed.

Why five engines instead of one? Each source has a different blind spot. CT logs miss internal-only subdomains never issued a public cert. Passive DNS databases miss recently-deployed names. Search engines miss subdomains never indexed. Wordlist brute-force misses anything not in the dictionary. Running them in parallel and merging gets coverage that no single tool can match.

Parameter Default Description
crt.sh enabled, max 5000 Certificate Transparency log queries for subdomain discovery
HackerTarget enabled, max 5000 Passive DNS lookup database
Subfinder enabled, max 5000 Passive enumeration using 50+ online sources (CT logs, DNS databases, web archives). Runs via Docker (projectdiscovery/subfinder). No API key required
Amass disabled, max 5000 OWASP Amass subdomain enumeration using 50+ data sources (certificate logs, DNS databases, web archives, WHOIS). Runs via Docker (caffix/amass). No API key required for passive mode
Amass Timeout 10 Enumeration timeout in minutes (1-120)
Amass Active Mode false Enable zone transfers and certificate name grabs — sends DNS queries directly to target. Forced off in stealth mode
Amass Bruteforce false DNS brute forcing after passive enumeration — significantly increases scan time. Forced off in stealth mode
Knockpy Recon enabled, max 5000 Passive wordlist-based subdomain enumeration
Use Bruteforce true Enable Knockpy active subdomain brute-forcing. Domain mode only
Puredns Wildcard Filtering enabled Validates discovered subdomains against public DNS resolvers and removes wildcard entries and DNS-poisoned results. Runs after all discovery tools complete, before DNS resolution. Active tool — sends DNS queries. Runs via Docker (frost19k/puredns). Disabled in stealth mode
Puredns Threads 0 Parallel resolution threads (0 = auto-detect)
Puredns Rate Limit 0 DNS queries per second (0 = unlimited). Capped by RoE global rate limit when enabled
WHOIS Max Retries 3 Retry attempts for WHOIS lookups
DNS Max Retries 3 Retry attempts for DNS resolution
DNS Max Workers 50 Parallel DNS resolution worker threads (was hardcoded at 20) (1-200)
DNS Record Parallelism Enabled Query all 7 DNS record types (A, AAAA, MX, NS, TXT, SOA, CNAME) in parallel per hostname

URLScan.io Enrichment

Passive OSINT enrichment using URLScan.io historical scan data. Runs in the recon pipeline after domain discovery and before port scanning. Located in the Discovery & OSINT tab.

Parameter Default Description
URLScan Enabled false Master toggle for URLScan.io enrichment
Max Results 500 Maximum scan results to fetch per domain (1-10000)

API Key: Optional. Configure in Global Settings → API Keys. Without an API key, only public scan results are available with lower rate limits. With a key, you get access to private scans and higher rate limits.

Graph nodes — consumes: Domain, BaseURL | produces: Domain, Subdomain, ExternalDomain, IP, Endpoint, Parameter. URL paths from historical scans are parsed into Endpoint and Parameter nodes (only when a matching BaseURL already exists from httpx). External domains encountered in scans are tracked as ExternalDomain nodes for situational awareness.

GAU deduplication: When URLScan enrichment runs successfully, the urlscan provider is automatically removed from GAU's data sources to avoid redundant API calls.

How it works

The module hits urlscan.io/api/v1/search/ with the query domain:<apex> OR page.domain:<apex> and paginates through results until either Max Results is reached or the API runs out (each page returns up to 100 records, with a 60-second per-request timeout). For each historical scan record returned, four data extraction passes run on the JSON envelope:

  1. Subdomain harvesting — every task.url and page.url is parsed; anything ending in .<apex> becomes a Subdomain candidate. Anything not ending in .<apex> becomes an ExternalDomain (third-party assets the target loaded — useful for supply-chain mapping).
  2. IP enrichment — page.ip and task.ip are extracted and merged with existing IP nodes; new IPs trigger downstream port-scan eligibility.
  3. Endpoint reconstruction — the URL path + query string is split into a relative endpoint and individual parameters. These are only attached when a matching BaseURL already exists in the graph (i.e. httpx already confirmed the host serves HTTP) — otherwise they're held aside to avoid orphan endpoints. Each parameter gets its own Parameter node deduplicated against future Arjun/Katana findings.
  4. Metadata pulls — ASN, country, server header, technology fingerprints from page.server and the Wappalyzer rollup are merged into existing nodes. Screenshot URLs (screenshot field) are stored on the BaseURL for later display in the Insights dashboard.
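
A minimal sketch of the search-and-paginate loop. Pagination uses the search_after cursor, which is how URLScan's search API pages; the response field names here are assumptions based on its documented shape:

```python
import requests


def urlscan_search(apex: str, api_key: str | None = None, max_results: int = 500) -> list[dict]:
    url = "https://urlscan.io/api/v1/search/"
    headers = {"API-Key": api_key} if api_key else {}          # key is optional (public results only)
    query = f"domain:{apex} OR page.domain:{apex}"
    results, search_after = [], None

    while len(results) < max_results:
        params = {"q": query, "size": 100}
        if search_after:
            params["search_after"] = search_after
        resp = requests.get(url, headers=headers, params=params, timeout=60)
        resp.raise_for_status()
        page = resp.json()
        batch = page.get("results", [])
        if not batch:
            break                                              # API ran out of records
        results.extend(batch)
        # The "sort" value of the last record is the cursor for the next page
        search_after = ",".join(str(v) for v in batch[-1]["sort"])
        if not page.get("has_more", False):
            break
    return results[:max_results]


for record in urlscan_search("example.com"):
    hostname = record.get("page", {}).get("domain", "")        # subdomain harvesting pass
    ip = record.get("page", {}).get("ip")                      # IP enrichment pass
```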

Key rotation is supported transparently: if multiple URLScan keys are set in Global Settings, requests round-robin across them automatically — useful for large target sets that would otherwise hit per-key quotas mid-scan.

Why URLScan is run before port scanning: the IPs and subdomains it surfaces feed into the port-scan target list, expanding coverage to assets that passive DNS/CT logs missed but that someone (the public, a researcher, an automated scan) once submitted to URLScan. This is especially valuable for catching CDN-fronted hosts and short-lived staging environments.


Shodan OSINT Enrichment

Passive internet-wide OSINT enrichment using the Shodan REST API. Runs in the recon pipeline after domain/IP discovery and before port scanning. Located in the Discovery & OSINT tab. Each feature is independently toggled and all require a Shodan API key set in Global Settings.

API Key Required: All toggles are disabled until a Shodan API key is configured in Global Settings. Host Lookup, Reverse DNS, and Passive CVEs automatically fall back to the free InternetDB API when the paid Shodan API returns 403. Domain DNS requires a paid Shodan plan (no free fallback).

Parameter Default Description
Host Lookup false Query each discovered IP for OS, ISP, organization, geolocation, and known vulnerabilities. Uses /shodan/host/{ip} (paid plan: full banners, geo, services) or falls back to InternetDB (free: ports, hostnames, CPEs, CVEs, tags — no geo or banners)
Reverse DNS false Discover hostnames for known IPs. Uses /dns/reverse (paid) or falls back to InternetDB hostnames (free). Can reveal subdomains missed by standard enumeration
Domain DNS false Subdomain enumeration and DNS records via /dns/domain/{domain}. Requires paid Shodan plan — no free fallback. Domain mode only (skipped in IP mode)
Passive CVEs false Extract known CVEs associated with discovered IPs. Reuses Host Lookup data if available; otherwise queries InternetDB directly (free, no key needed)
Workers 5 Parallel IP lookup workers for Shodan/InternetDB queries (1-20)

Graph nodes — consumes: IP, Subdomain, Domain | produces: IP, Port, Service, Subdomain, ExternalDomain, DNSRecord, Vulnerability, CVE. All use MERGE-based deduplication — data from Shodan is automatically merged with findings from Naabu, Nuclei, and other tools.

How it works

The module pulls every IP that prior subdomain discovery has resolved into the graph (_extract_ips_from_recon) and feeds the list to four independent enrichment passes. Each pass owns its own ThreadPoolExecutor (sized by Workers) plus a custom _RateLimiter that paces requests to avoid burning through the daily quota:

  1. Host Lookup — GET /shodan/host/{ip}. Paid plan returns the full host record: open ports with raw service banners, OS guess, ISP, organization, ASN, geolocation (country/city/lat-lon), domain associations, and the Shodan-CVE list per detected service. On HTTP 403 (free key trying paid endpoint) or 404 (no record), the request falls back to https://internetdb.shodan.io/{ip} — this returns a stripped-down record (ports, CPE strings, CVE list, hostnames, tags) with no rate limit and no key required. The result is unified into a common shape so downstream code doesn't care which source answered.
  2. Reverse DNS — GET /dns/reverse?ips=<list>. Paid plan only. The free InternetDB fallback uses the hostnames field from the host lookup instead. Hostnames returned here that match the apex pattern feed back into the Subdomain set — Shodan often surfaces internal-naming-convention hosts (db-prod-1.<apex>, staging-eu-west-2.<apex>) that no passive DNS database knows about.
  3. Domain DNS — GET /dns/domain/{domain}. Paid plan only. Returns Shodan's view of every DNS record for the apex plus every subdomain Shodan has ever indexed. Often the single highest-yield enrichment pass for paid plans — surfaces hundreds of subdomains in one request. Skipped silently in IP mode.
  4. Passive CVEs — extracts CVE IDs from the Host Lookup response when available; otherwise issues a separate InternetDB query per IP. CVEs are matched by Shodan against the service version banners — they're advisory until validated by Nuclei or Nmap NSE, but they sharply prioritize which hosts get aggressive vulnerability scanning later.
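
A sketch of the Host Lookup pass with the InternetDB fallback. Response field names are assumptions based on the two APIs' documented shapes; the module's unification code differs:

```python
import requests


def shodan_host_lookup(ip: str, api_key: str | None) -> dict:
    """Try the paid host endpoint first, fall back to InternetDB on 403/404 (or when no key is set)."""
    if api_key:
        resp = requests.get(f"https://api.shodan.io/shodan/host/{ip}",
                            params={"key": api_key}, timeout=30)
        if resp.status_code == 200:
            data = resp.json()
            return {"ports": data.get("ports", []),
                    "hostnames": data.get("hostnames", []),
                    "cves": data.get("vulns", []),
                    "source": "shodan"}
        if resp.status_code not in (403, 404):
            resp.raise_for_status()

    # Free fallback: stripped-down record, no key and no rate limit
    resp = requests.get(f"https://internetdb.shodan.io/{ip}", timeout=30)
    if resp.status_code == 404:                               # Shodan has never seen this IP
        return {"ports": [], "hostnames": [], "cves": [], "source": "internetdb"}
    resp.raise_for_status()
    data = resp.json()
    return {"ports": data.get("ports", []),
            "hostnames": data.get("hostnames", []),
            "cves": data.get("vulns", []),
            "source": "internetdb"}
```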

Key rotation: when multiple Shodan keys are set in Global Settings, every API call rotates through the pool — useful for large IP sets where a single key would hit the 1 query/sec or daily-credit limit. Failed requests on one key automatically retry on the next.

Why Shodan runs before active port scanning: the open ports it returns expand the port-scan target list. If Shodan already knows 198.51.100.42:8080 is open, Naabu/Masscan will probe that port even if it's not in the configured Top Ports range — so the active scan doesn't miss services on weird ports that Shodan has already cataloged.


Uncover Multi-Engine Search

ProjectDiscovery's uncover queries up to 13 search engines simultaneously to discover exposed hosts, IPs, and endpoints associated with the target. Runs before port scanning so discovered assets are processed by all downstream modules.

Parameter Default Description
Uncover Enabled false Enable/disable multi-engine target expansion
Uncover Max Results 500 Maximum results to collect across all engines (1-10,000)
Uncover Docker Image projectdiscovery/uncover:latest Docker image for the uncover container

Key configuration: Uncover automatically reuses API keys configured for standalone OSINT tools (Shodan, Censys, FOFA, ZoomEye, Netlas, CriminalIP). Additional engines require their own keys configured in Global Settings > API Keys under the "Uncover (Multi-Engine Search)" group: Quake, Hunter, PublicWWW, HunterHow, Google Custom Search (key + CX), Onyphe, Driftnet.

IP filtering: All discovered IPs pass through centralized filtering (ip_filter.py) that removes non-routable addresses (RFC 1918, CGNAT, loopback, reserved) and CDN IPs (detected by Naabu/httpx) before entering the pipeline. This prevents wasting API credits on downstream enrichment.

How it works

The module first walks the configured key set in _build_provider_config and assembles two outputs: a list of engines that have valid keys (skipping any silently when no key is set) and the corresponding env-var injection for the uncover container (SHODAN_API_KEY, CENSYS_API_TOKEN, FOFA_KEY, etc.). Engines without keys are simply omitted — the module never fails just because one source is unconfigured.

Next, _build_queries constructs search-engine-specific query strings for the apex domain. Each engine has its own DSL — Shodan uses hostname:/ssl.cert.subject.cn:, Censys uses parsed.names:, FOFA uses domain="...", ZoomEye uses hostname:, etc. — so uncover translates a single conceptual query ("everything related to <apex>") into the right syntax per source. Custom queries can be added through Global Settings.

The container is invoked via docker run --rm projectdiscovery/uncover:latest -e <engines> -q <queries> -limit <max> with all keys piped in as environment variables. uncover internally fans out across the configured engines in parallel and streams JSON results to stdout. The module captures stdout with a configurable timeout, then runs _deduplicate_results to fold identical host:port pairs across engines (one host can appear from Shodan, Censys, and FOFA simultaneously).
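
A rough sketch of the container invocation and host:port deduplication. The -json flag and the JSON field names are assumptions about uncover's line-delimited output; the module's real command builder differs:

```python
import json
import subprocess


def run_uncover(engines: list[str], queries: list[str], keys: dict[str, str],
                limit: int = 500, timeout: int = 600) -> list[dict]:
    """Invoke the uncover container and return deduplicated parsed results."""
    cmd = ["docker", "run", "--rm"]
    for var, value in keys.items():                    # SHODAN_API_KEY, CENSYS_API_TOKEN, FOFA_KEY, ...
        cmd += ["-e", f"{var}={value}"]
    cmd += ["projectdiscovery/uncover:latest",
            "-e", ",".join(engines),
            "-q", ",".join(queries),
            "-limit", str(limit),
            "-json"]                                   # assumed: JSON-lines output to stdout
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)

    results, seen = [], set()
    for line in proc.stdout.splitlines():              # one JSON object per line
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue                                   # skip banner / log lines
        key = (item.get("ip") or item.get("host"), item.get("port"))
        if key not in seen:                            # fold duplicates across engines
            seen.add(key)
            results.append(item)
    return results
```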

_extract_hosts_and_ips then splits each result into structured fields:

  • IPs are validated as IPv4/IPv6 via _is_valid_ip and pushed through the central ip_filter — anything in RFC 1918, CGNAT (100.64.0.0/10), loopback (127/8), link-local (169.254/16), or marked as CDN by Naabu/httpx in the existing graph is dropped before merging.
  • Hostnames are extracted from result URLs via _extract_hostname_from_url and matched against the apex; matches become Subdomain candidates, others become ExternalDomains.
  • Ports discovered alongside the hosts are queued for the same kind of MERGE that Naabu/Masscan does, expanding the port-scan starting set.

merge_uncover_into_pipeline writes everything into the combined recon result so downstream modules see uncover's findings as if they came from the standard discovery path. The merge is idempotent — re-running uncover doesn't duplicate.

Why uncover instead of querying each engine directly? Each engine has its own auth flow, response schema, pagination quirks, and rate limits. uncover normalizes all 13 into one CLI surface and one JSON output schema. The trade-off is that it runs as a single container so retries are at the container level — if your Shodan key 429s mid-run, the whole uncover invocation has to be retried, not just that engine. For finer-grained control over a specific engine, use the dedicated OSINT enrichment modules (Shodan, Censys, FOFA, OTX, Netlas, VirusTotal, ZoomEye, CriminalIP) which all have their own retry, rate-limit, and key-rotation logic.


Threat Intelligence Enrichment (7 OSINT Tools)

Seven passive threat intelligence enrichment tools that run concurrently with port scanning. All tools query external intelligence platforms using IPs and domains discovered during subdomain enumeration. Located in the Discovery & OSINT tab.

API Keys: All API keys are stored in Global Settings > API Keys (user-scoped, not per-project). Project settings contain only enable/disable toggles and optional limits. Enable a tool here, then add its key in Global Settings.

OTX Exception: OTX is enabled by default and works without an API key (anonymous requests, 1,000 req/hr).

Key Rotation: FOFA, OTX, Netlas, VirusTotal, ZoomEye, and CriminalIP support automatic round-robin key rotation — configure extra keys in Global Settings to avoid rate limiting mid-scan.

Graph nodes — consumes: IP, Domain, Subdomain | produces: threat intelligence properties stored on existing IP and Domain nodes (no new node types). Results also written to recon_domain.json under per-tool keys.

How it works (shared mechanics)

All seven enrichment tools follow an identical engineering pattern — only the upstream API and response parser change. Understanding the shared mechanics applies to every tool below:

Target extraction: each tool calls _extract_ips_from_recon(combined_result) to walk the in-progress recon JSON and pull every IP that subdomain discovery has produced so far. Some tools also enrich domains directly (VirusTotal, OTX) using the apex + every discovered Subdomain.

Worker pool + rate limiter: every tool spawns a ThreadPoolExecutor sized by its Workers setting and pairs it with a custom _RateLimiter class. The rate limiter uses a simple time.monotonic()-based interval gate — before each request, the worker calls rate_limiter.wait() which sleeps just long enough to enforce the configured req/sec ceiling. This pacing is per-tool, so one tool hitting its limit doesn't block the others.

Key rotation: when multiple keys are configured in Global Settings, each tool calls _effective_key(settings, key_rotator) (or _otx_effective_key, etc.) to pick the next key from a round-robin rotator. On HTTP 429 the worker either retries with the next key or stops querying and logs the limit-hit — preventing one bad key from poisoning the whole run.
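
A minimal sketch of the pacing-and-rotation pattern the two paragraphs above describe. Class names here are illustrative stand-ins for the internal _RateLimiter and key rotator:

```python
import itertools
import threading
import time


class RateLimiter:
    """Monotonic-clock interval gate: at most one call per (1 / rate_per_sec) seconds."""
    def __init__(self, rate_per_sec: float):
        self.interval = 1.0 / rate_per_sec
        self._lock = threading.Lock()
        self._next_allowed = time.monotonic()

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            if now < self._next_allowed:
                sleep_for = self._next_allowed - now
                self._next_allowed += self.interval
            else:
                sleep_for = 0.0
                self._next_allowed = now + self.interval
        if sleep_for:
            time.sleep(sleep_for)


class KeyRotator:
    """Round-robin over the keys configured in Global Settings."""
    def __init__(self, keys: list[str]):
        self._cycle = itertools.cycle(keys)
        self._lock = threading.Lock()

    def next_key(self) -> str:
        with self._lock:
            return next(self._cycle)


limiter = RateLimiter(rate_per_sec=1.0)   # e.g. Shodan's 1 query/sec ceiling
rotator = KeyRotator(["key-a", "key-b"])
limiter.wait()                            # each worker calls this before issuing a request
api_key = rotator.next_key()
```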

Failure isolation: each tool wraps its API calls in try/except requests.RequestException and degrades gracefully — a single 5xx or timeout from one tool never aborts the others. Tools that hit a hard rate limit (429) return early with whatever they collected before the limit.

Stop-on-rate-limit: most tools log "[!][TOOL] Rate limit hit — stopping for this run" and return their partial results rather than retry-loop. This is deliberate: rate limits usually mean the daily quota is exhausted, and burning a retry loop just delays the rest of the pipeline without producing new data.

Why all run after subdomain discovery but before vuln scanning: the IPs discovered here feed into the port-scan target list, the threat-intelligence flags (VPN/Tor/proxy/scanner) feed into the AI agent's host-prioritization heuristic, and the malware/pulse data feeds into the Insights dashboard's threat-intelligence view.

Censys

Parameter Default Description
Enabled false Enable Censys host intelligence enrichment. Requires both Censys API ID and API Secret in Global Settings
Workers 5 Parallel IP enrichment workers for Censys (1-20)

Queries /v2/hosts/{ip} for each discovered IP. Returns open ports, running services + banners, TLS certificate chains, geolocation, ASN, and OS fingerprint. On HTTP 429 (rate limit), stops querying and logs the limit.

FOFA

Parameter Default Description
Enabled false Enable FOFA internet asset search enrichment. Requires FOFA API Key in Global Settings
Max Results 1000 Maximum rows to fetch per query (hard cap: 10,000)
Workers 5 Parallel IP enrichment workers for FOFA (1-20)

Queries the FOFA API using base64-encoded syntax (domain="<domain>" or per-IP queries). Returns IP:port pairs, HTTP titles, server headers, geolocation, certificate info, and protocol details. Supports both legacy (email:key) and modern (key-only) authentication formats.

OTX (AlienVault Open Threat Exchange)

Parameter Default Description
Enabled true Enable OTX threat intelligence enrichment. Works without an API key (anonymous). Add OTX API Key in Global Settings for higher rate limits
Workers 5 Parallel IP enrichment workers for AlienVault OTX (1-20)

Queries the OTX Indicators API v1 for each IP and domain. Returns threat reputation, pulse count, associated malware families, MITRE ATT&CK attack IDs, passive DNS records (first/last seen), and individual pulse details (adversaries, TLP, tags). Anonymous rate limit: 1,000 req/hr. With API key: 10,000 req/hr.

OTX is the only enrichment tool enabled by default. It requires no API key to function, making it active in every scan out of the box.

Netlas

Parameter Default Description
Enabled false Enable Netlas internet intelligence enrichment. Requires Netlas API Key in Global Settings
Max Results 1000 Maximum items to fetch per query (hard cap: 1,000)
Workers 5 Parallel IP enrichment workers for Netlas (1-20)

Queries the Netlas Responses API (host:{domain} or host:{ip}). Returns port/service data, HTTP response headers and body snippets, geolocation (country, city, latitude/longitude, timezone), TLS certificate details, DNS records, and WHOIS data.

VirusTotal

Parameter Default Description
Enabled false Enable VirusTotal reputation enrichment. Requires VirusTotal API Key in Global Settings
Rate Limit 4 Requests per minute (free-tier limit). Increase for paid plans. On 429, the pipeline automatically waits 65 seconds and retries once
Max Targets 20 Maximum number of domains + IPs to query per scan (caps API usage for large target sets)
Workers 3 Parallel IP enrichment workers for VirusTotal (1-10, lower due to strict rate limits)

Queries VirusTotal API v3 for each discovered domain (/v3/domains/{domain}) and IP (/v3/ip_addresses/{ip}). Returns reputation score, last analysis stats (malicious/suspicious/undetected AV engine counts), categories, tags, JARM fingerprint, registrar, total votes, and last analysis date.

ZoomEye

Parameter Default Description
Enabled false Enable ZoomEye host search enrichment. Requires ZoomEye API Key in Global Settings
Max Results 1000 Maximum items to fetch per query
Workers 5 Parallel IP enrichment workers for ZoomEye (1-20)

Queries the ZoomEye API for hostname and IP searches. Returns open ports, service banners, device type/OS, web application fingerprints, geolocation (country, city, lat/lon, timezone), ASN, ISP, and SSL certificate details.

CriminalIP

Parameter Default Description
Enabled false Enable Criminal IP threat intelligence enrichment. Requires CriminalIP API Key in Global Settings
Workers 5 Parallel IP enrichment workers for CriminalIP (1-20)

Queries the Criminal IP API v1 for each IP (/v1/ip/data?full=true) and domain (/v1/domain/data). Returns IP risk score, threat tags (VPN, cloud, Tor, proxy, hosting, mobile, darkweb, scanner, Snort IDS), geolocation, ISP, hosted services, and abuse history. On HTTP 429, automatically waits 2 seconds and retries once.


GitHub Secret Hunting

How it works

GitHub Secret Hunting is an orchestrated dorking module that searches public GitHub repositories for leaked credentials, API keys, hostnames, and config files referencing the target. It runs as a separate dockerized scanner (github_secret_hunt/) similar to GVM — invoked from the Red Zone toolbar rather than as part of the main recon flow.

Authentication is mandatory: the module uses GitHub's Code Search API (https://api.github.com/search/code) which strictly requires an authenticated request. Without a Personal Access Token (PAT), API access is rate-limited to 10 req/hr and the module is disabled in the UI. With a PAT, the limit jumps to 30 req/min — enough to run a meaningful dorking session.

Dork strategy: the module assembles a curated set of search dorks combining the target apex with secret-shaped tokens. Examples:

  • "<domain>" password
  • "<domain>" api_key
  • "<domain>" AKIA (AWS access key prefix)
  • "<domain>" -----BEGIN (PEM-formatted keys)
  • "<domain>" Bearer
  • extension:env "<domain>" (.env files referencing the domain)
  • extension:json "<domain>" "password"
  • path:.aws/credentials "<domain>"

Each dork is run in turn and paginated through its results; for each hit, the file blob is downloaded via https://api.github.com/repos/<owner>/<repo>/contents/<path> and run through the same secret-detection regex bank used by JS Recon and TruffleHog. This separates regex-only false positives (random base64 strings that aren't actually secrets) from real exposures.
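
A minimal sketch of the dork loop against the Code Search API. The dork set is truncated, and error handling plus the blob-download step are omitted for brevity:

```python
import requests

DORKS = ['"{d}" password', '"{d}" api_key', '"{d}" AKIA', 'extension:env "{d}"']


def search_code(domain: str, token: str, per_page: int = 100) -> list[dict]:
    """Run each dork through GitHub's Code Search API and collect file hits."""
    headers = {"Authorization": f"token {token}",          # a PAT is mandatory for code search
               "Accept": "application/vnd.github+json"}
    hits = []
    for dork in DORKS:
        page = 1
        while True:
            resp = requests.get("https://api.github.com/search/code",
                                headers=headers,
                                params={"q": dork.format(d=domain),
                                        "per_page": per_page, "page": page},
                                timeout=30)
            if resp.status_code == 403:                     # rate limit hit: stop this dork
                break
            resp.raise_for_status()
            items = resp.json().get("items", [])
            if not items:
                break
            for item in items:
                hits.append({"repo": item["repository"]["full_name"],
                             "path": item["path"],
                             "html_url": item["html_url"]})
            page += 1
    return hits
```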

Scope handling: hits are returned with repo, path, and html_url — recorded as Secret nodes attached to a synthetic source identifier so the AI agent and the Insights dashboard can distinguish "secret found in a public repo" from "secret found on the target's site."

False-positive control: GitHub's search index includes a lot of noise — sample code, security-research repos, documentation files, user-credential dumps unrelated to the target. The module applies several filters: skip repos in known-noise orgs (security blog repos that quote secret patterns), skip files matching dump-shaped paths (leaks/, dumps/, pastebin/), skip files where the secret is also accompanied by [REDACTED] or EXAMPLE markers. These filters cut ~80% of the noise without losing real findings.

Pair with TruffleHog: GitHub Hunting is the public-internet sweep — it finds leaks where someone (employee, contractor, AI tool) committed a target-related secret to a public repo. TruffleHog is the targeted deep-dive — it clones a specific known org/repo list and walks every commit (including deleted/orphan branches) for secrets that aren't in the search index. Use both: GitHub Hunting for breadth, TruffleHog for depth on specific targets.

Configure GitHub repository scanning for leaked credentials.

Graph nodes — consumes: Domain | produces: GithubHunt, GithubRepository, GithubPath, GithubSecret, GithubSensitiveFile

Parameter Default Description
GitHub Access Token Personal Access Token (ghp_...)
Target Organization GitHub org or username to scan
Target Repositories (all) Comma-separated repo names to limit scope
Scan Member Repositories false Include individual member repos
Scan Gists false Search gists for secrets
Scan Commits false Examine git history for removed secrets
Max Commits to Scan 100 Max commits per repo (1-1000)
Output as JSON false Save results as downloadable JSON

See GitHub Secret Hunting for a step-by-step setup guide including how to create a GitHub Personal Access Token.


TruffleHog Secret Scanning

How it works

TruffleHog is the deep companion to GitHub Hunting. Where GitHub Hunting greps the GitHub Code Search API (which only sees what's currently in the default branch), TruffleHog clones the entire repository locally and walks every commit on every branch, including deleted blobs and orphan refs. The result: secrets that were committed and reverted, or rotated and force-pushed-over, are still recoverable because git's content-addressable model keeps the old blobs alive in the object database.

Why this matters: the canonical secret-leakage pattern is "developer commits AWS key → tests CI → realizes they leaked → reverts → force-pushes." The current branch tip looks clean, but the original commit's blob is still in the repo's object database, reachable via reflog or via cloning. GitHub Code Search misses this entirely. TruffleHog finds it.

Detection engine: 700+ regex-based detectors covering AWS, GCP, Azure, Slack, Stripe, Twilio, Okta, GitHub PATs, JWTs, SSH keys, npm tokens, Datadog, Heroku, Mailgun, SendGrid, PagerDuty, OpenAI API keys, Anthropic keys, Google service accounts, plus generic high-entropy patterns (Shannon entropy threshold + character-class checks). Each detector is a self-contained Go module with its own regex and post-match validation.

Live verification mode (the killer feature, toggleable): for each candidate secret, TruffleHog hits the corresponding provider's API with the secret as auth. AWS keys go to https://sts.amazonaws.com/?Action=GetCallerIdentity (a free no-side-effect call); GitHub PATs go to https://api.github.com/user; Slack tokens go to https://slack.com/api/auth.test. If the API call returns a 200 with valid identity info, the secret is verified (high confidence, active, critical severity). If the API returns 401/403/invalid, the secret is unverified (regex matched but the secret is rotated/expired/example — informational).
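
As an illustration of the verification concept only (TruffleHog's detectors are Go modules, not this code), here is how a candidate AWS key can be checked with the same no-side-effect STS call:

```python
import boto3
from botocore.exceptions import ClientError


def verify_aws_key(access_key_id: str, secret_access_key: str) -> bool:
    """GetCallerIdentity succeeds only for live credentials and changes nothing in the account."""
    sts = boto3.client("sts",
                       aws_access_key_id=access_key_id,
                       aws_secret_access_key=secret_access_key)
    try:
        identity = sts.get_caller_identity()
        print("VERIFIED:", identity["Arn"])    # live key: critical, actionable finding
        return True
    except ClientError:
        return False                            # rotated / expired / example: informational
```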

This eliminates the regex-only false-positive problem that plagues less-sophisticated secret scanners. A typical scan returns 100 regex matches but only 5-10 verified secrets — those 5-10 are the actionable findings. The unverified ones still get recorded for completeness but are deprioritized in the Insights dashboard.

Targeting strategy: TruffleHog operates on either a GitHub org/user (clones every public repo for that org) or a specific repository list. The org option is broad and slow (cloning 100 repos can take 30+ minutes over slow connections); the repo list option is fast and focused (use it when GitHub Hunting has already surfaced specific repos of interest, then deep-scan just those).

Authentication: TruffleHog uses the same GitHub PAT as GitHub Hunting (configured in Global Settings → API Keys). The PAT is needed for cloning private repos when scanning your own org's private repository list, and for higher API rate limits when cloning many public repos.

Output: every verified or unverified secret becomes a Secret node in the graph with the detector name, the verified status, and a redacted snippet of the file context. The AI agent reads verified secrets as immediate exploitation leads (a verified AWS key with iam:* permissions is game-over for the cloud account); unverified ones go to manual review.

Configure TruffleHog secret scanning with 700+ detectors and optional live API verification.

Graph nodes — consumes: Domain | produces: TrufflehogScan, TrufflehogRepository, TrufflehogFinding

Parameter Default Description
Target Organization GitHub org or username to scan
Target Repositories (all) Comma-separated repo names to limit scope
Only Verified false Only report findings verified as active against live APIs
No Verification false Skip all API verification — faster but unconfirmed
Concurrency 8 Concurrent scanning workers (1-20)
Include Detectors (all) Comma-separated detector names to include
Exclude Detectors (none) Comma-separated detector names to exclude

Note: TruffleHog uses the GitHub Access Token from Global Settings > API Keys (shared with GitHub Secret Hunt). See TruffleHog Secret Scanning for a step-by-step setup guide.
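
For reference, a sketch of how these parameters might map onto a trufflehog v3 command line. Flag names follow trufflehog's documented CLI; the wrapper itself is illustrative:

```python
import subprocess


def build_trufflehog_cmd(org: str, token: str, settings: dict) -> list[str]:
    cmd = ["trufflehog", "github", f"--org={org}", f"--token={token}", "--json"]
    if settings.get("only_verified"):
        cmd.append("--only-verified")                  # report only live-verified secrets
    if settings.get("no_verification"):
        cmd.append("--no-verification")                # skip provider API calls entirely
    cmd.append(f"--concurrency={settings.get('concurrency', 8)}")
    if settings.get("include_detectors"):
        cmd.append(f"--include-detectors={settings['include_detectors']}")
    if settings.get("exclude_detectors"):
        cmd.append(f"--exclude-detectors={settings['exclude_detectors']}")
    return cmd


cmd = build_trufflehog_cmd("target-org", "ghp_placeholder", {"only_verified": True})
subprocess.run(cmd, capture_output=True, text=True)    # one JSON finding per stdout line
```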


Agent Behavior

Configure the AI agent orchestrator for autonomous pentesting.

Agent Behaviour Settings

LLM & Phase Configuration:

Parameter Default Description
Guardrail Enabled true Enable/disable the LLM-based scope guardrail that verifies the target on agent startup. When disabled, the agent skips scope verification. Fail-closed: if the check itself fails, the agent is blocked
LLM Model claude-opus-4-6 AI model for the agent. 400+ models from 5 providers — see AI Model Providers
Deep Think true When enabled, the agent performs an explicit deep reasoning step at key decision points (start of session, phase transitions, failure loops) to plan multi-step attack strategies before acting. Adds ~1 extra LLM call at these moments. Recommended for complex targets with multiple services.
Post-Exploitation Type statefull statefull (Meterpreter sessions) or stateless (one-shot commands)
Activate Post-Exploitation Phase true Whether post-exploitation is available
Informational Phase System Prompt Custom instructions for the informational phase
Exploitation Phase System Prompt Custom instructions for the exploitation phase
Post-Exploitation Phase System Prompt Custom instructions for the post-exploitation phase

Payload Direction:

Parameter Default Description
Tunnel Provider None Dropdown: None (manual LHOST/LPORT), ngrok (single port — free, no VPS), or chisel (multi-port — requires VPS). Only one tunnel can be active at a time. ngrok tunnels port 4444 only, requires the ngrok authtoken configured in Global Settings → Tunneling, auto-detects LHOST/LPORT from the ngrok public URL, stageless payloads only. Requires identity verification on your ngrok account (free). chisel tunnels ports 4444 + 8080, requires Chisel Server URL (and optionally Chisel Auth) configured in Global Settings → Tunneling, enables web delivery and HTA delivery (which need two ports), stageless payloads required (staged payloads fail through the tunnel). Requires a VPS running chisel server -p 9090 --reverse. See AI Agent Guide — Tunnel Providers for setup instructions.
LHOST (Attacker IP) Your IP for reverse shell callbacks. Leave empty for bind mode. Hidden when a tunnel provider is enabled.
LPORT Listening port for reverse shells. Leave empty for bind mode. Hidden when a tunnel provider is enabled.
Bind Port on Target Port the target opens for bind shell payloads
Payload Use HTTPS false Use reverse_https instead of reverse_tcp

Agent Limits:

Parameter Default Description
Max Iterations 100 Maximum LLM reasoning-action loops per objective
Trace Memory Steps 100 Past steps kept in agent's working context
Tool Output Max Chars 20000 Truncation limit for tool output (min: 1000)

Approval Gates:

Parameter Default Description
Require Approval for Exploitation true User confirmation before exploitation phase
Require Approval for Post-Exploitation true User confirmation before post-exploitation phase

Kali Shell — Library Installation:

Parameter Default Description
Allow Library Installation false Let the agent install packages (pip/apt) via kali_shell at runtime. Prompt-based control only — no server-side enforcement. Installed packages are ephemeral (lost on container restart).
Authorized Packages Comma-separated whitelist. If non-empty, only these packages may be installed.
Forbidden Packages Comma-separated blacklist. These packages must never be installed.

Retries, Logging & Debug:

Parameter Default Description
Cypher Max Retries 3 Neo4j query retry attempts (0-10)
Log Max MB 10 Maximum log file size before rotation
Log Backups 5 Number of rotated log backups
Create Graph Image on Init false Generate a LangGraph visualization on startup

Cross-Site Scripting (XSS)

Configure the XSS attack skill (reflected, stored, DOM-based, blind, WAF/CSP bypass).

Parameter Default Description
dalfox WAF Evasion Enabled true Allow dalfox automated scanning + WAF bypass when manual context-aware payloads fail. Runs in background mode (--silence --waf-evasion --deep-domxss --mining-dom)
Blind Callback Enabled false Allow interactsh-client OOB callbacks for blind/stored XSS detection. Opt-in — when enabled, the agent may send document.cookie and other browser data to a third-party callback domain (oast.fun). Disabled by default
CSP Bypass Guidance true Include the CSP bypass reference table in the workflow prompt (covers unsafe-inline, unsafe-eval, JSONP gadgets, nonce reuse, AngularJS template injection, <base> hijack)

See Agent Skills > Cross-Site Scripting (XSS) for the full 8-step workflow documentation.


Hydra Credential Testing

Configure THC Hydra password cracking (50+ protocols: SSH, FTP, RDP, SMB, HTTP forms, databases, etc.).

Agent Skills Settings

Parameter Default Description
Hydra Enabled true Enable/disable Hydra brute force
Threads (-t) 16 Parallel connections per target. Protocol limits: SSH max 4, RDP max 1, VNC max 4
Wait Between Connections (-W) 0 Seconds between each connection. 0 = no delay
Connection Timeout (-w) 32 Max seconds to wait for a response
Stop On First Found (-f) true Stop when valid credentials are found
Extra Password Checks (-e) nsr Additional checks: n=null, s=username-as-password, r=reversed username
Verbose Output (-V) true Show each login attempt
Max Wordlist Attempts 3 Wordlist strategies to try before giving up (1-10)
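
The flags in the table are standard Hydra switches; a sketch of how the agent might assemble a run from these settings (wordlists and target are placeholders):

```python
import subprocess

PROTOCOL_THREAD_CAPS = {"ssh": 4, "rdp": 1, "vnc": 4}   # per the limits noted above


def build_hydra_cmd(target: str, service: str, userlist: str, wordlist: str,
                    settings: dict) -> list[str]:
    threads = min(settings.get("threads", 16), PROTOCOL_THREAD_CAPS.get(service, 64))
    cmd = ["hydra",
           "-L", userlist,                               # username wordlist
           "-P", wordlist,                               # password wordlist
           "-t", str(threads),
           "-W", str(settings.get("wait", 0)),           # delay between connections
           "-w", str(settings.get("timeout", 32)),       # connection timeout
           "-e", settings.get("extra_checks", "nsr")]    # null / same-as-user / reversed
    if settings.get("stop_on_first", True):
        cmd.append("-f")
    if settings.get("verbose", True):
        cmd.append("-V")
    cmd.append(f"{service}://{target}")
    return cmd


subprocess.run(build_hydra_cmd("198.51.100.7", "ssh", "users.txt", "rockyou.txt", {}),
               capture_output=True, text=True)
```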

Social Engineering Simulation

Configure SMTP settings for the phishing agent skill email delivery capability. The agent reads this configuration when the phishing_social_engineering agent skill is active and the user requests email delivery.

Parameter Default Description
SMTP Configuration (empty) Free-text SMTP settings for email delivery. The agent parses this naturally when sending phishing emails via Python smtplib

Example configuration:

SMTP_HOST: smtp.gmail.com
SMTP_PORT: 587
SMTP_USER: pentest@gmail.com
SMTP_PASS: abcd efgh ijkl mnop
SMTP_FROM: it-support@company.com
USE_TLS: true

If left empty, the agent asks the user at runtime for SMTP credentials when email delivery is requested. The agent never attempts to send email without proper SMTP configuration.
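
A minimal sketch of what that delivery looks like with smtplib, using the example values above (all addresses and credentials are placeholders):

```python
import smtplib
from email.message import EmailMessage

cfg = {"SMTP_HOST": "smtp.gmail.com", "SMTP_PORT": 587,
       "SMTP_USER": "pentest@gmail.com", "SMTP_PASS": "abcd efgh ijkl mnop",
       "SMTP_FROM": "it-support@company.com", "USE_TLS": True}

msg = EmailMessage()
msg["Subject"] = "Password expiry notice (authorized phishing simulation)"
msg["From"] = cfg["SMTP_FROM"]
msg["To"] = "employee@company.com"
msg.set_content("Your password expires today. Reset it via the link provided by the assessment team.")

with smtplib.SMTP(cfg["SMTP_HOST"], cfg["SMTP_PORT"]) as smtp:
    if cfg["USE_TLS"]:
        smtp.starttls()                     # upgrade the connection to TLS on port 587
    smtp.login(cfg["SMTP_USER"], cfg["SMTP_PASS"])
    smtp.send_message(msg)
```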

See Agent Skills > Social Engineering Simulation for the full phishing workflow documentation.


CypherFix Configuration

Configure CypherFix automated vulnerability remediation. These settings control how the CodeFix agent interacts with your GitHub repository.

CypherFix Settings

Parameter Default Description
GitHub Token (CypherFix) Personal Access Token with repo scope for cloning, pushing, and creating PRs
Default Repository Target repository in owner/repo format (e.g., redis/redis)
Default Branch main Base branch for creating fix branches
Branch Prefix cypherfix/ Prefix for auto-created fix branches (e.g., cypherfix/fix-sqli-42)
Require Approval true Pause before each code edit for human review. When disabled, pending edits are auto-accepted after a 5-minute wait
LLM Model Override (Agent default) Use a specific model for CodeFix instead of the model configured in Agent Behaviour

See CypherFix — Automated Remediation for the full usage guide.


Tool Phase Restrictions

A matrix controlling which tools the agent can use in each operational phase. Each tool can be independently enabled/disabled per phase. Tools that require an external API key (web_search, shodan, google_dork) display a warning with a quick-add modal when enabled without a key configured in Global Settings.

Tool Informational Exploitation Post-Exploitation
query_graph
web_search
shodan --
google_dork -- --
execute_curl
execute_httpx --
execute_naabu --
execute_subfinder --
execute_gau --
execute_nmap
execute_nuclei --
execute_wpscan --
execute_jsluice --
execute_amass --
execute_katana --
execute_arjun --
execute_ffuf --
kali_shell
execute_code --
execute_playwright
execute_hydra --
metasploit_console --
msf_restart --

This matrix is configurable per project in the dedicated Tool Matrix tab of the project settings form (under the AI Agent tab group).

User MCP Tool Plugins also surface here: any Model-Context-Protocol server you add as a tool plugin via Global Settings → MCP Tool Plugins appears in this same Tool Matrix as a separate "MCP Tool Plugins" group below the built-ins, with the same 3-phase checkboxes per tool. New tools default to all three phases enabled. See the MCP Tool Plugins wiki page for the full operator manual.
