AI-Powered Web Vulnerability Scanner

Crawl. Detect. Classify. Actually understand what's broken.

I built this because every free scanner I tried had the same annoying problem — it floods you with 60 findings and treats a missing X-XSS-Protection header with the same urgency as a raw SQL injection. You end up spending more time sorting the output than actually fixing anything.

So I wrote my own. This one crawls your target first, runs five detection modules, and then feeds every finding into a trained ML model that scores severity based on what the vulnerability actually is — not some hardcoded priority list. The whole thing lives inside a Streamlit dashboard with a live terminal so you can watch it work. When it finishes, you get a clean JSON report you can keep, share, or pipe into whatever workflow you have.

What it tests

The scanner doesn't just ping a URL. It crawls the site first using a BFS traversal, discovers internal pages up to your depth limit, then fans out across everything it found.
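A minimal sketch of that crawl phase, for readers who want the idea in code. It is not the code in crawler.py: `bfs_crawl` and the `fetch_links` callable are hypothetical names, and the same-host filter is an assumption about what "internal pages" means.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def bfs_crawl(start_url, fetch_links, max_depth=2):
    """Breadth-first crawl of same-host pages up to max_depth.

    fetch_links(url) is a hypothetical callable returning the raw href
    values found on a page; the real crawler may fetch and parse differently.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []

    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # pages at the depth limit are visited but not expanded
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != host or absolute in seen:
                continue
            seen.add(absolute)
            queue.append((absolute, depth + 1))
    return order
```

Because the queue is first-in, first-out, every page at depth 1 is visited before any page at depth 2, which is what keeps the depth limit meaningful.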

Cross-Site Scripting (XSS)
Throws 10 payloads at every URL parameter and HTML form input it finds. It's not just checking for raw reflection — it also catches partial-encoding bypasses using pattern matching. Both GET parameters and POST form fields are covered independently.
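One way the partial-encoding check could work, as a sketch. The entity table and the `classify_reflection` name are assumptions for illustration, not the patterns shipped in xss_detector.py.

```python
import re

# Common entity spellings for the characters that matter in HTML context.
ENTITY_FORMS = {
    "<": ["&lt;", "&#60;", "&#x3c;"],
    ">": ["&gt;", "&#62;", "&#x3e;"],
    '"': ["&quot;", "&#34;", "&#x22;"],
    "'": ["&#x27;", "&#39;", "&apos;"],
}


def reflection_pattern(payload):
    """Build a regex that matches the payload with any mix of raw/encoded chars."""
    parts = []
    for ch in payload:
        if ch in ENTITY_FORMS:
            alternatives = [re.escape(ch)] + [re.escape(e) for e in ENTITY_FORMS[ch]]
            parts.append("(?:" + "|".join(alternatives) + ")")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def classify_reflection(payload, body):
    """Return 'raw', 'partial', 'encoded', or None if the payload is not reflected."""
    if payload in body:
        return "raw"
    match = reflection_pattern(payload).search(body)
    if match is None:
        return None
    # If any dangerous character survived unencoded, the filter is incomplete.
    if any(ch in match.group(0) for ch in ENTITY_FORMS):
        return "partial"
    return "encoded"
```

A "partial" result is the interesting case: the server encoded some characters but left others raw, which is often enough to break out of an attribute.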

SQL Injection
Two strategies running together. Error-based detection listens for database error signatures from MySQL, PostgreSQL, MSSQL, Oracle, and SQLite — over 25 patterns in total. Boolean-based detection compares response lengths between 1=1 and 1=2 conditions to catch cases where the server stays quiet but still behaves differently. URL params and form inputs both get tested.
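Both strategies fit in a few lines. The sketch below uses a small illustrative subset of error signatures, not the scanner's 25+ patterns, and the baseline comparison and 5% tolerance are my assumptions to cut noise on dynamic pages.

```python
import re

# Illustrative subset of database error signatures.
ERROR_SIGNATURES = [
    re.compile(r"you have an error in your sql syntax", re.I),                 # MySQL
    re.compile(r"unterminated quoted string", re.I),                           # PostgreSQL
    re.compile(r"unclosed quotation mark after the character string", re.I),   # MSSQL
    re.compile(r"ORA-\d{5}"),                                                  # Oracle
    re.compile(r"sqlite error|sqlite3\.OperationalError", re.I),               # SQLite
]


def has_db_error(body):
    """Error-based check: does the response contain a known DB error string?"""
    return any(sig.search(body) for sig in ERROR_SIGNATURES)


def boolean_based_suspect(baseline_len, true_len, false_len, tolerance=0.05):
    """Boolean-based check on response lengths.

    The 1=1 response should look like the untouched baseline, while the
    1=2 response should differ noticeably. tolerance is a hypothetical knob.
    """
    def close(a, b):
        return abs(a - b) <= tolerance * max(a, b, 1)

    return close(baseline_len, true_len) and not close(true_len, false_len)
```

Length comparison is a blunt signal, so a hit here is better treated as "worth a manual look" than as confirmation.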

Security Headers
Checks eight headers: X-Frame-Options, X-Content-Type-Options, X-XSS-Protection, Content-Security-Policy, Strict-Transport-Security, Referrer-Policy, Permissions-Policy, and Cache-Control. Also flags the ones that leak information — Server, X-Powered-By, X-Generator — which hand attackers your full stack without them having to do anything.
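As a rough sketch, the header audit boils down to two lookups against the response headers. The header names come from the list above; `audit_headers` is a hypothetical name and not the signature of the function in header_detector.py.

```python
REQUIRED_HEADERS = [
    "X-Frame-Options",
    "X-Content-Type-Options",
    "X-XSS-Protection",
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "Referrer-Policy",
    "Permissions-Policy",
    "Cache-Control",
]
LEAKY_HEADERS = ["Server", "X-Powered-By", "X-Generator"]


def audit_headers(headers):
    """Return (missing security headers, leaked info headers with their values)."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    present = {name.lower(): value for name, value in headers.items()}
    missing = [h for h in REQUIRED_HEADERS if h.lower() not in present]
    leaked = {h: present[h.lower()] for h in LEAKY_HEADERS if h.lower() in present}
    return missing, leaked
```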

Open Redirect
Tests over 20 redirect-flavored parameter names (url, next, return, goto, redirect, callback, destination, and more) against 9 payloads. Catches server-side 3xx redirects and client-side ones hiding in meta-refresh tags or window.location calls.
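The three redirect channels can be checked roughly like this. The regexes are simplified assumptions and will miss variants such as `location.replace(...)`; `attacker_host` stands for whatever external host the payload points at.

```python
import re
from urllib.parse import urlparse

META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*url=([^"\'>\s]+)', re.I
)
JS_LOCATION = re.compile(
    r'(?:window\.)?location(?:\.href)?\s*=\s*["\']([^"\']+)["\']', re.I
)


def redirect_target(status, headers, body):
    """Return the redirect destination from a 3xx, meta refresh, or JS assignment."""
    if 300 <= status < 400:
        for name, value in headers.items():
            if name.lower() == "location":
                return value
    for pattern in (META_REFRESH, JS_LOCATION):
        match = pattern.search(body)
        if match:
            return match.group(1)
    return None


def is_open_redirect(status, headers, body, attacker_host):
    """True if the response sends the browser to the injected external host."""
    target = redirect_target(status, headers, body)
    if target is None:
        return False
    # urlparse also handles scheme-relative payloads such as //evil.example/
    return urlparse(target).netloc.lower() == attacker_host.lower()
```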

Directory and File Discovery
Probes 40+ paths that people routinely forget to lock down: admin panels, .env files, .git/config, raw database dumps, backup zips, Swagger UI, server-status pages, upload directories. Anything that comes back with a 200, 301, or 302 gets flagged.
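The probing loop is simple enough to sketch. The path list here is a short illustrative subset, and `get_status` is a hypothetical callable standing in for the real HTTP request.

```python
from urllib.parse import urljoin

# Illustrative subset; the scanner's own list is described as 40+ paths.
SENSITIVE_PATHS = ["admin/", ".env", ".git/config", "backup.zip", "server-status"]
FLAGGED_STATUSES = {200, 301, 302}


def probe_paths(base_url, get_status, paths=SENSITIVE_PATHS):
    """Request each candidate path and keep the ones with a flagged status code.

    get_status(url) is a hypothetical callable returning the HTTP status.
    """
    findings = []
    for path in paths:
        url = urljoin(base_url.rstrip("/") + "/", path)
        status = get_status(url)
        if status in FLAGGED_STATUSES:
            findings.append({"url": url, "status": status})
    return findings
```

A site that answers 200 for every path will light up the whole list, so status-only matching like this tends to need a manual sanity check.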

Every single finding goes through the AI module and comes out tagged as Critical, High, Medium, or Low.

Getting started

Python 3.9 or higher. That's the only real requirement.

cd ai-web-vulnerability-scanner

pip install -r requirements.txt

streamlit run app.py

It opens at http://localhost:8501.

Running a scan

1. Drop your target URL into the input field at the top. Both http:// and https:// work.
2. Pick which modules you want to run from the left sidebar (everything's on by default).
3. Set your crawl depth, delay, and timeout if the defaults don't fit.
4. Press ▶ SCAN.
5. Watch the live terminal output as each phase runs.
6. Once results are in, use the filters to cut down to what you care about.
7. Grab the JSON report from the export panel at the bottom.

One thing worth paying attention to: the Request delay slider in the sidebar controls how fast the scanner hits the server. Don't set it to zero just because you can.
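For a sense of what that delay does under the hood, here is a minimal throttle sketch. The `Throttle` class is hypothetical and not taken from request_manager.py; the injectable clock and sleep exist only to make it easy to test.

```python
import time


class Throttle:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least delay_seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` before each request means a delay of zero turns the throttle into a no-op, which is exactly the situation the warning above is about.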

Project layout

ai-web-vulnerability-scanner/
│
├── app.py                     ← Dashboard UI, state management, results rendering
├── scanner_engine.py          ← Runs each phase in order, wires progress callbacks
├── crawler.py                 ← BFS link crawler with depth control and URL deduplication
├── requirements.txt
│
├── ai/
│   └── vulnerability_ai.py   ← RandomForest classifier, feature extraction, severity scoring
│
├── detectors/
│   ├── xss_detector.py        ← XSS via URL params and form fields
│   ├── sql_detector.py        ← Error-based and boolean-based SQLi
│   ├── header_detector.py     ← Missing headers, weak values, server info leakage
│   ├── redirect_detector.py   ← Open redirect parameter injection
│   └── directory_detector.py  ← Sensitive path and exposed file detection
│
├── utils/
│   ├── request_manager.py     ← Shared HTTP session, retry handling, SSL fallback
│   └── payloads.py            ← Every payload in one place, easy to extend
│
└── reports/
    └── report_generator.py    ← JSON report structure, risk scoring, export logic

Every detector is completely independent. If you want to run just the header check against a single endpoint, import detect_missing_headers and call it directly — no need to drag in the whole engine.

Sample report output

{
  "report_metadata": {
    "tool": "AI Web Vulnerability Scanner",
    "version": "1.0.0",
    "generated_at": "20260316_142205",
    "target": "http://testphp.vulnweb.com",
    "total_findings": 8
  },
  "executive_summary": {
    "risk_level": "CRITICAL",
    "total_vulnerabilities": 8,
    "pages_scanned": 14,
    "requests_made": 387,
    "scan_duration_seconds": 42.1,
    "severity_breakdown": {
      "Critical": 2,
      "High": 2,
      "Medium": 2,
      "Low": 2
    }
  },
  "vulnerabilities": [
    {
      "id": "VULN-0001",
      "type": "SQLi",
      "subtype": "Error-based SQL Injection",
      "severity": "Critical",
      "severity_score": 4,
      "url": "http://testphp.vulnweb.com/listproducts.php",
      "parameter": "cat",
      "http_method": "GET",
      "payload_used": "' OR 1=1 --",
      "evidence": "Database error message exposed in response",
      "description": "SQL injection confirmed in parameter 'cat'. The app returned a raw database error, meaning user input is going straight into the query without any sanitization.",
      "remediation": "Switch to parameterized queries or prepared statements. Kill verbose error messages in production — they're free recon for attackers."
    },
    {
      "id": "VULN-0002",
      "type": "XSS",
      "subtype": "Reflected XSS",
      "severity": "High",
      "severity_score": 3,
      "url": "http://testphp.vulnweb.com/search.php",
      "parameter": "q",
      "http_method": "GET",
      "payload_used": "<script>alert('XSS')</script>",
      "evidence": "Payload reflected in response body without encoding",
      "remediation": "Encode all output before it touches HTML. Add a Content-Security-Policy header while you're at it."
    }
  ],
  "remediation_priority": [
    {
      "vulnerability_type": "SQLi",
      "severity": "Critical",
      "count": 2,
      "remediation": "Use parameterized queries. Remove raw database errors from responses."
    },
    {
      "vulnerability_type": "XSS",
      "severity": "High",
      "count": 2,
      "remediation": "Encode user output. Implement Content-Security-Policy."
    }
  ]
}
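Since the report is plain JSON, piping it into another workflow takes only the standard library. The sketch below filters findings by `severity_score`. The embedded report is a trimmed stand-in using the field names from the sample; VULN-0003 and its score of 1 for Low are assumptions, as is the `findings_at_or_above` helper.

```python
import json

# Trimmed stand-in for an exported report; values are illustrative.
REPORT_JSON = """
{
  "vulnerabilities": [
    {"id": "VULN-0001", "type": "SQLi", "severity": "Critical", "severity_score": 4},
    {"id": "VULN-0002", "type": "XSS", "severity": "High", "severity_score": 3},
    {"id": "VULN-0003", "type": "Header", "severity": "Low", "severity_score": 1}
  ]
}
"""


def findings_at_or_above(report, min_score):
    """Return the IDs of findings whose severity_score is at least min_score."""
    return [
        v["id"]
        for v in report["vulnerabilities"]
        if v["severity_score"] >= min_score
    ]


report = json.loads(REPORT_JSON)
urgent = findings_at_or_above(report, 3)  # High and Critical only
```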

How the AI classifier works

The model is a RandomForest trained on synthetic feature vectors. Each vulnerability gets converted into nine numeric features before classification:

Type score — base risk weight of the vulnerability class (SQLi = 4, XSS = 3, headers = 1, etc.)
Subtype score — more granular risk for specific variants
Method score — POST carries more weight than GET
Has payload — whether an active injection payload was used to trigger the finding
Has evidence — whether concrete confirmation was captured (error message, reflection, redirect)
Is injection — binary flag for the injection classes (XSS and SQLi)
Is header — binary flag for header-based findings
Is redirect — binary flag for redirect-based findings
Is disclosure — binary flag for information exposure and directory findings

This approach means the classifier isn't just mapping a type name to a severity. It weighs how the vulnerability was found and confirmed: a SQLi finding backed by captured evidence should outrank an unconfirmed one, and missing headers stay low unless they're Strict-Transport-Security or Content-Security-Policy.
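To make the nine features concrete, here is a sketch of the extraction step only. Only the SQLi = 4, XSS = 3, and headers = 1 weights come from the list above. Every other weight, the type labels "Header", "Redirect", "Directory", and "Disclosure", and the `extract_features` name are assumptions, not values from vulnerability_ai.py.

```python
# Type weights: SQLi, XSS, and headers are from the README; the rest are assumed.
TYPE_SCORES = {"SQLi": 4, "XSS": 3, "Redirect": 2, "Directory": 2, "Header": 1}
# Hypothetical subtype weights, for illustration only.
SUBTYPE_SCORES = {"Error-based SQL Injection": 4, "Reflected XSS": 3}


def extract_features(finding):
    """Turn one finding dict into the nine-element numeric feature vector."""
    vtype = finding.get("type", "")
    method = finding.get("http_method", "GET").upper()
    return [
        TYPE_SCORES.get(vtype, 1),                           # type score
        SUBTYPE_SCORES.get(finding.get("subtype", ""), 1),   # subtype score
        2 if method == "POST" else 1,                        # method score (assumed scale)
        int(bool(finding.get("payload_used"))),              # has payload
        int(bool(finding.get("evidence"))),                  # has evidence
        int(vtype in {"XSS", "SQLi"}),                       # is injection
        int(vtype == "Header"),                              # is header
        int(vtype == "Redirect"),                            # is redirect
        int(vtype in {"Disclosure", "Directory"}),           # is disclosure
    ]
```

A vector like this is what would be handed to the RandomForest for prediction. The training and prediction calls are left out here because the README doesn't describe them.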

Stack

Library	What it does here
Python	Everything
Streamlit	Dashboard UI and live state management
Requests	HTTP session, retries, SSL fallback
BeautifulSoup	HTML parsing for links and form extraction
scikit-learn	RandomForest classifier for severity scoring
NumPy	Feature vector construction
Pandas	Results table in the UI

Legal practice targets

Don't scan anything you don't control or have written permission to test. These are specifically designed to be broken:

Target	What's useful about it
`http://testphp.vulnweb.com`	Acunetix's deliberately vulnerable PHP app. Has SQLi, XSS, and open redirects baked in. Best place to start.
`https://ginandjuice.shop`	PortSwigger's vulnerable shop. Good for testing redirect and injection detection.
`http://zero.webappsecurity.com`	Demo banking app. Useful for header auditing.

Honest limitations

It doesn't authenticate. It won't find stored XSS, IDOR, broken access control, or any logic-layer vulnerability. Think of it as a surface scan: a solid starting point, not a substitute for a real pentest.

The AI model is trained on synthetic data. It's not pulling from a CVE database or live exploit feeds. It reasons about feature patterns, which works well for prioritization but won't give you CVSS scores or CWE mappings.

SSL issues are handled automatically. If a certificate is expired or self-signed, the request manager falls back to skipping verification. The scan continues either way.

Rate limiting is on for a reason. There's a built-in request delay. It keeps the scanner from looking like a DDoS, prevents getting your IP blocked, and is just the professional way to run a tool like this.

Legal

Scanning systems without explicit written permission is illegal. This applies in the US (Computer Fraud and Abuse Act), UK (Computer Misuse Act), India (IT Act), and most other jurisdictions. This project exists for authorized security testing, CTF practice, and learning. How you use it is entirely on you.

Author

Tirth — IT undergrad, IIT Delhi ethical hacking certified, IIT Guwahati AI/ML track in progress.
GitHub: @Tktirth
