Crawl. Detect. Classify. Actually understand what's broken.
I built this because every free scanner I tried had the same annoying problem — it floods you with 60 findings and treats a missing X-XSS-Protection header with the same urgency as a raw SQL injection. You end up spending more time sorting the output than actually fixing anything.
So I wrote my own. This one crawls your target first, runs five detection modules, and then feeds every finding into a trained ML model that scores severity based on what the vulnerability actually is — not some hardcoded priority list. The whole thing lives inside a Streamlit dashboard with a live terminal so you can watch it work. When it finishes, you get a clean JSON report you can keep, share, or pipe into whatever workflow you have.
The scanner doesn't just ping a URL. It crawls the site first using a BFS traversal, discovers internal pages up to your depth limit, then fans out across everything it found.
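Here is a minimal sketch of a depth-limited BFS crawl that stays on one host. It is not the project's `crawler.py`. `bfs_crawl`, `get_links` and `max_pages` are hypothetical names. Link extraction is injected as a function so the sketch runs offline; in practice that function would fetch the page and pull `href`s out with BeautifulSoup.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin, urlparse


def bfs_crawl(start: str, get_links: Callable[[str], Iterable[str]],
              max_depth: int = 2, max_pages: int = 50) -> List[str]:
    """Breadth-first crawl limited to the start URL's host (hypothetical helper)."""
    host = urlparse(start).netloc
    seen = {start}                      # dedupe at enqueue time so nothing is queued twice
    order: List[str] = []
    queue = deque([(start, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue                    # visit this page, but don't expand its links
        for href in get_links(url):
            # Resolve relative links and drop #fragments so /b and /b#top count as one page
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return order
```

Because pages are marked as seen when they are queued rather than when they are visited, a link that appears on many pages is still fetched only once.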
Cross-Site Scripting (XSS)
Throws 10 payloads at every URL parameter and HTML form input it finds. It's not just checking for raw reflection — it also catches partial-encoding bypasses using pattern matching. Both GET parameters and POST form fields are covered independently.
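To show what "partial-encoding" detection can look like, the sketch below classifies a reflection in one of three ways. It is "raw" if the payload comes back untouched. It is "partial" if only the quotes were encoded while `<` and `>` survived, which means the filter still lets tags through. Anything else counts as "none". This is an assumption about the technique, not the contents of `xss_detector.py`. `classify_reflection` and the quote-encoding lists are hypothetical.

```python
from itertools import product

# Hypothetical: quote encodings a filter might apply while leaving < and > alone
SINGLE_QUOTE_FORMS = ["'", "&#x27;", "&#39;", "\\'"]
DOUBLE_QUOTE_FORMS = ['"', "&quot;", "&#34;", '\\"']


def classify_reflection(payload: str, body: str) -> str:
    """Return 'raw', 'partial', or 'none' for how a payload shows up in a response body."""
    if payload in body:
        return "raw"
    for sq, dq in product(SINGLE_QUOTE_FORMS, DOUBLE_QUOTE_FORMS):
        variant = payload.replace("'", sq).replace('"', dq)
        # Only quotes changed, so any angle brackets in the variant are still unescaped
        if variant != payload and variant in body:
            return "partial"
    return "none"
```

A fully HTML-escaped reflection (`&lt;script&gt;...`) never matches a variant, because every variant keeps the raw `<`.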
SQL Injection
Two strategies running together. Error-based detection listens for database error signatures from MySQL, PostgreSQL, MSSQL, Oracle, and SQLite — over 25 patterns in total. Boolean-based detection compares response lengths between 1=1 and 1=2 conditions to catch cases where the server stays quiet but still behaves differently. URL params and form inputs both get tested.
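Here is a rough sketch of both strategies working on response bodies that have already been fetched. The error signatures are a small illustrative set of well-known messages, not the module's actual 25+ patterns. The 5% length tolerance is an assumed threshold. `find_sql_errors` and `boolean_based_suspect` are hypothetical names.

```python
import re
from typing import List

# Illustrative subset of well-known database error fragments
SQL_ERROR_PATTERNS = [
    r"You have an error in your SQL syntax",                    # MySQL
    r"Warning: mysql_",                                         # PHP + MySQL
    r"pg_query\(\)|PG::SyntaxError",                            # PostgreSQL
    r"Unclosed quotation mark after the character string",      # MSSQL
    r"ORA-\d{5}",                                               # Oracle
    r"sqlite3\.OperationalError|SQLite3::",                     # SQLite
]


def find_sql_errors(body: str) -> List[str]:
    """Error-based check: which signatures appear in the response body."""
    return [p for p in SQL_ERROR_PATTERNS if re.search(p, body, re.IGNORECASE)]


def boolean_based_suspect(body_true: str, body_false: str, tolerance: float = 0.05) -> bool:
    """Boolean-based check: flag when the 1=1 and 1=2 responses differ in length
    by more than `tolerance` (an assumed threshold) of the longer one."""
    longer = max(len(body_true), len(body_false), 1)
    return abs(len(body_true) - len(body_false)) / longer > tolerance
```

Using a relative tolerance rather than requiring exactly equal lengths is a design choice. It keeps small dynamic bits of a page, such as timestamps or CSRF tokens, from registering as a boolean difference.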
Security Headers
Checks eight headers: X-Frame-Options, X-Content-Type-Options, X-XSS-Protection, Content-Security-Policy, Strict-Transport-Security, Referrer-Policy, Permissions-Policy, and Cache-Control. Also flags the ones that leak information — Server, X-Powered-By, X-Generator — which hand attackers your full stack without them having to do anything.
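The header audit comes down to a case-insensitive set comparison. The sketch below is not the project's `detect_missing_headers`, whose signature isn't shown here. `audit_headers` is a hypothetical stand-in that takes any mapping of response headers.

```python
from typing import Dict, List, Mapping

EXPECTED_HEADERS = [
    "X-Frame-Options", "X-Content-Type-Options", "X-XSS-Protection",
    "Content-Security-Policy", "Strict-Transport-Security", "Referrer-Policy",
    "Permissions-Policy", "Cache-Control",
]
LEAKY_HEADERS = ["Server", "X-Powered-By", "X-Generator"]


def audit_headers(headers: Mapping[str, str]) -> Dict[str, object]:
    """Report missing security headers and headers that leak stack details."""
    present = {name.lower(): value for name, value in headers.items()}  # header names are case-insensitive
    missing: List[str] = [h for h in EXPECTED_HEADERS if h.lower() not in present]
    leaking = {h: present[h.lower()] for h in LEAKY_HEADERS if h.lower() in present}
    return {"missing": missing, "leaking": leaking}
```

Lowercasing the keys first means the function works the same on a plain dict and on a Requests `response.headers` object.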
Open Redirect
Tests over 20 redirect-flavored parameter names (url, next, return, goto, redirect, callback, destination, and more) against 9 payloads. Catches server-side 3xx redirects and client-side ones hiding in meta-refresh tags or window.location calls.
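Here is a sketch of the confirmation step, assuming the injected payloads point at an attacker-controlled host. It checks the three places the text mentions: a 3xx `Location` header, a meta-refresh tag, and a `window.location` assignment. `evil.example`, `redirect_target` and `is_open_redirect` are hypothetical. The regexes are deliberately simple and will miss some forms, such as `location.replace(...)`.

```python
import re
from typing import Mapping, Optional
from urllib.parse import urljoin, urlparse

EVIL_HOST = "evil.example"  # hypothetical host used inside the redirect payloads

META_REFRESH = re.compile(
    r"<meta[^>]+http-equiv=[\"']?refresh[\"']?[^>]*content=[\"']?\d+\s*;\s*url=([^\"'>]+)", re.I)
JS_LOCATION = re.compile(r"(?:window\.|document\.)?location(?:\.href)?\s*=\s*[\"']([^\"']+)", re.I)


def redirect_target(status: int, headers: Mapping[str, str], body: str, base_url: str) -> Optional[str]:
    """Where the response sends the browser, resolved to an absolute URL, if anywhere."""
    if 300 <= status < 400:
        location = {k.lower(): v for k, v in headers.items()}.get("location")
        if location:
            return urljoin(base_url, location)   # also resolves //evil.example style payloads
    for pattern in (META_REFRESH, JS_LOCATION):  # client-side redirects
        match = pattern.search(body)
        if match:
            return urljoin(base_url, match.group(1).strip())
    return None


def is_open_redirect(status: int, headers: Mapping[str, str], body: str,
                     base_url: str, evil_host: str = EVIL_HOST) -> bool:
    target = redirect_target(status, headers, body, base_url)
    return target is not None and urlparse(target).hostname == evil_host
```

The check compares the hostname of the resolved target, not a substring of it. That way a same-site redirect such as `/login?next=evil.example` doesn't count as a hit.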
Directory and File Discovery
Probes 40+ paths that people routinely forget to lock down: admin panels, .env files, .git/config, raw database dumps, backup zips, Swagger UI, server-status pages, upload directories. Anything that comes back with a 200, 301, or 302 gets flagged.
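The probing loop is simple. In the sketch below the path list is a small hypothetical subset, and the status lookup is injected as `probe(url) -> int` so the code runs offline. In a real run it would wrap an HTTP request, ideally one that doesn't follow redirects so 301 and 302 stay visible.

```python
from typing import Callable, List, Sequence, Tuple
from urllib.parse import urljoin

# Hypothetical subset of the sensitive paths
SENSITIVE_PATHS = ["admin/", ".env", ".git/config", "backup.zip",
                   "swagger-ui.html", "server-status", "uploads/"]
FLAGGED_STATUSES = {200, 301, 302}


def discover(base_url: str, probe: Callable[[str], int],
             paths: Sequence[str] = SENSITIVE_PATHS) -> List[Tuple[str, int]]:
    """Return (url, status) for every path whose status code gets flagged."""
    base = base_url if base_url.endswith("/") else base_url + "/"  # so urljoin appends instead of replacing
    hits = []
    for path in paths:
        url = urljoin(base, path)
        status = probe(url)
        if status in FLAGGED_STATUSES:
            hits.append((url, status))
    return hits
```

A rule based only on status codes has one known weakness: a site that answers 200 for every path will light up every probe.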
Every single finding goes through the AI module and comes out tagged as Critical, High, Medium, or Low.
Python 3.9 or higher. That's the only real requirement.
```bash
cd ai-web-vulnerability-scanner
pip install -r requirements.txt
streamlit run app.py
```

It opens at http://localhost:8501.
- Drop your target URL into the input field at the top (http:// or https://, it handles both)
- Pick which modules you want to run from the left sidebar (everything's on by default)
- Set your crawl depth, delay, and timeout if the defaults don't fit
- Press ▶ SCAN
- Watch the live terminal output as each phase runs
- Once results are in, use the filters to cut down to what you care about
- Grab the JSON report from the export panel at the bottom
One thing worth paying attention to: the Request delay slider in the sidebar controls how fast the scanner hits the server. Don't set it to zero just because you can.
```text
ai-web-vulnerability-scanner/
│
├── app.py ← Dashboard UI, state management, results rendering
├── scanner_engine.py ← Runs each phase in order, wires progress callbacks
├── crawler.py ← BFS link crawler with depth control and URL deduplication
├── requirements.txt
│
├── ai/
│   └── vulnerability_ai.py ← RandomForest classifier, feature extraction, severity scoring
│
├── detectors/
│   ├── xss_detector.py ← XSS via URL params and form fields
│   ├── sql_detector.py ← Error-based and boolean-based SQLi
│   ├── header_detector.py ← Missing headers, weak values, server info leakage
│   ├── redirect_detector.py ← Open redirect parameter injection
│   └── directory_detector.py ← Sensitive path and exposed file detection
│
├── utils/
│   ├── request_manager.py ← Shared HTTP session, retry handling, SSL fallback
│   └── payloads.py ← Every payload in one place, easy to extend
│
└── reports/
    └── report_generator.py ← JSON report structure, risk scoring, export logic
```
Every detector is completely independent. If you want to run just the header check against a single endpoint, import detect_missing_headers and call it directly — no need to drag in the whole engine.
```json
{
  "report_metadata": {
    "tool": "AI Web Vulnerability Scanner",
    "version": "1.0.0",
    "generated_at": "20260316_142205",
    "target": "http://testphp.vulnweb.com",
    "total_findings": 8
  },
  "executive_summary": {
    "risk_level": "CRITICAL",
    "total_vulnerabilities": 8,
    "pages_scanned": 14,
    "requests_made": 387,
    "scan_duration_seconds": 42.1,
    "severity_breakdown": {
      "Critical": 2,
      "High": 2,
      "Medium": 2,
      "Low": 2
    }
  },
  "vulnerabilities": [
    {
      "id": "VULN-0001",
      "type": "SQLi",
      "subtype": "Error-based SQL Injection",
      "severity": "Critical",
      "severity_score": 4,
      "url": "http://testphp.vulnweb.com/listproducts.php",
      "parameter": "cat",
      "http_method": "GET",
      "payload_used": "' OR 1=1 --",
      "evidence": "Database error message exposed in response",
      "description": "SQL injection confirmed in parameter 'cat'. The app returned a raw database error, meaning user input is going straight into the query without any sanitization.",
      "remediation": "Switch to parameterized queries or prepared statements. Kill verbose error messages in production — they're free recon for attackers."
    },
    {
      "id": "VULN-0002",
      "type": "XSS",
      "subtype": "Reflected XSS",
      "severity": "High",
      "severity_score": 3,
      "url": "http://testphp.vulnweb.com/search.php",
      "parameter": "q",
      "http_method": "GET",
      "payload_used": "<script>alert('XSS')</script>",
      "evidence": "Payload reflected in response body without encoding",
      "remediation": "Encode all output before it touches HTML. Add a Content-Security-Policy header while you're at it."
    }
  ],
  "remediation_priority": [
    {
      "vulnerability_type": "SQLi",
      "severity": "Critical",
      "count": 2,
      "remediation": "Use parameterized queries. Remove raw database errors from responses."
    },
    {
      "vulnerability_type": "XSS",
      "severity": "High",
      "count": 2,
      "remediation": "Encode user output. Implement Content-Security-Policy."
    }
  ]
}
```

The model is a RandomForest trained on synthetic feature vectors. Each vulnerability gets converted into nine numeric features before classification:
- Type score — base risk weight of the vulnerability class (SQLi = 4, XSS = 3, headers = 1, etc.)
- Subtype score — more granular risk for specific variants
- Method score — POST carries more weight than GET
- Has payload — whether an active injection payload was used to trigger the finding
- Has evidence — whether concrete confirmation was captured (error message, reflection, redirect)
- Is injection — binary flag for XSS and SQLi class
- Is header — binary flag for header-based findings
- Is redirect — binary flag for redirect-based findings
- Is disclosure — binary flag for information exposure and directory findings
This approach means the classifier isn't just matching a type name to a severity. It also takes into account how the vulnerability was found and confirmed. A SQLi finding backed by concrete evidence should outrank one without it, and missing headers stay low unless they're Strict-Transport-Security or Content-Security-Policy.
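Here is a sketch of how a finding could be turned into the nine-feature vector described above. Only the SQLi = 4, XSS = 3 and headers = 1 type weights come from this README. The other weights, the `"Header"`, `"Open Redirect"` and `"Directory"` type labels, and the `Finding` / `to_features` names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

# Only SQLi=4, XSS=3, Header=1 are documented; the rest are hypothetical weights
TYPE_SCORE = {"SQLi": 4, "XSS": 3, "Open Redirect": 2, "Directory": 2, "Header": 1}
SUBTYPE_SCORE = {"Error-based SQL Injection": 4, "Boolean-based SQL Injection": 3, "Reflected XSS": 3}
METHOD_SCORE = {"GET": 1, "POST": 2}  # POST weighted above GET, as described


@dataclass
class Finding:
    type: str
    subtype: str
    http_method: str = "GET"
    payload_used: str = ""
    evidence: str = ""


def to_features(f: Finding) -> List[int]:
    """Map one finding to the nine features, in the order listed above."""
    return [
        TYPE_SCORE.get(f.type, 1),                       # type score
        SUBTYPE_SCORE.get(f.subtype, 1),                 # subtype score
        METHOD_SCORE.get(f.http_method.upper(), 1),      # method score
        int(bool(f.payload_used)),                       # has payload
        int(bool(f.evidence)),                           # has evidence
        int(f.type in ("SQLi", "XSS")),                  # is injection
        int(f.type == "Header"),                         # is header
        int(f.type == "Open Redirect"),                  # is redirect
        int(f.type in ("Directory", "Info Disclosure")), # is disclosure
    ]
```

Stacking these vectors into a matrix `X` with severity labels `y` is what a scikit-learn `RandomForestClassifier().fit(X, y)` call would consume. `predict` on a new vector would then return the severity class.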
| Library | What it does here |
|---|---|
| Python | Everything |
| Streamlit | Dashboard UI and live state management |
| Requests | HTTP session, retries, SSL fallback |
| BeautifulSoup | HTML parsing for links and form extraction |
| scikit-learn | RandomForest classifier for severity scoring |
| NumPy | Feature vector construction |
| Pandas | Results table in the UI |
Don't scan anything you don't control or have written permission to test. These are specifically designed to be broken:
| Target | What's useful about it |
|---|---|
| http://testphp.vulnweb.com | Acunetix's deliberately vulnerable PHP app. Has SQLi, XSS, and open redirects baked in. Best place to start. |
| https://ginandjuice.shop | PortSwigger's vulnerable shop. Good for testing redirect and injection detection. |
| http://zero.webappsecurity.com | Demo banking app. Useful for header auditing. |
**It doesn't authenticate.** It won't find stored XSS, IDOR, broken access control, or any logic-layer vulnerability. Think of it as a surface scan: a solid starting point, not a substitute for a real pentest.
The AI model is trained on synthetic data. It's not pulling from a CVE database or live exploit feeds. It reasons about feature patterns, which works well for prioritization but won't give you CVSS scores or CWE mappings.
SSL issues are handled automatically. If a certificate is expired or self-signed, the request manager falls back to skipping verification. The scan continues either way.
Rate limiting is on for a reason. The built-in request delay keeps the scanner from looking like a DDoS, makes it less likely your IP gets blocked, and is simply the professional way to run a tool like this.
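Here is a minimal sketch of the two request-manager behaviours described above: a minimum delay between requests, and one retry without certificate verification after an SSL failure. It is not the project's `request_manager.py`. `ThrottledFetcher` is hypothetical, and the actual HTTP call is injected as `fetch(url, verify)` so the sketch stays self-contained.

```python
import ssl
import time
from typing import Callable, Optional, Tuple, Type


class ThrottledFetcher:
    """Hypothetical wrapper: enforce a gap between requests, fall back on SSL errors."""

    def __init__(self, fetch: Callable[[str, bool], str], delay: float = 0.5,
                 ssl_errors: Tuple[Type[BaseException], ...] = (ssl.SSLError,)):
        self.fetch = fetch              # fetch(url, verify) -> response body
        self.delay = delay              # minimum seconds between consecutive requests
        self.ssl_errors = ssl_errors    # exception types that trigger the unverified retry
        self.requests_made = 0
        self._last: Optional[float] = None

    def _throttle(self) -> None:
        # Sleep only for whatever part of the delay hasn't already elapsed
        if self._last is not None:
            remaining = self.delay - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

    def _call(self, url: str, verify: bool) -> str:
        self._throttle()
        self.requests_made += 1
        return self.fetch(url, verify)

    def get(self, url: str) -> str:
        try:
            return self._call(url, True)
        except self.ssl_errors:
            # Expired or self-signed certificate: retry once without verification
            return self._call(url, False)
```

With Requests, `fetch` could be `lambda url, verify: session.get(url, verify=verify, timeout=10).text`. In that case you would pass `ssl_errors=(requests.exceptions.SSLError,)`, because Requests raises its own exception type rather than `ssl.SSLError`. The retry also goes through the throttle, so the fallback never sends two requests back to back.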
Scanning systems without explicit written permission is illegal. This applies in the US (Computer Fraud and Abuse Act), UK (Computer Misuse Act), India (IT Act), and most other jurisdictions. This project exists for authorized security testing, CTF practice, and learning. How you use it is entirely on you.
Tirth — IT undergrad, IIT Delhi ethical hacking certified, IIT Guwahati AI/ML track in progress.
GitHub: @Tktirth