SiteSentry is a production-grade website audit engine for SEO, technical, performance, and conversion analysis — built for evidence-based detection and executive-ready reporting.
- Full Website Crawling – Unlimited pages/depth with concurrent crawling
- JavaScript Rendering – Playwright-powered rendering for SPAs and dynamic content
- Subdomain Discovery – Automatic detection and crawling of subdomains
- 8 Accuracy Guardrails – Evidence-based issue detection designed to minimize false positives
- Dual Scoring System – Site Health Score (0-100) + Revenue Score (0-100)
- Full-Stack Exports – 12 output files including 3 executive PDFs
- Technology Detection – Identifies CMS, frameworks, analytics, CDN, and more
- Python 3.9+
- pip
```bash
# Clone the repository
git clone https://github.com/BalaShankar9/parcellab-audit-toolkit.git
cd parcellab-audit-toolkit

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers (required for JS rendering)
playwright install chromium
```

```bash
# Basic audit (50 pages, auto rendering)
python audit.py --url https://example.com --max-pages 50

# Full unlimited audit with JS rendering
python audit.py --url https://example.com --max-pages 0 --max-depth 0 --render always

# Quick scan without rendering (faster)
python audit.py --url https://example.com --max-pages 100 --render never
```

| Option | Default | Description |
|---|---|---|
| `--url` | (required) | Target website URL |
| `--max-pages` | `0` | Max pages to crawl (0 = unlimited) |
| `--max-depth` | `0` | Max crawl depth (0 = unlimited) |
| `--render` | `auto` | JS rendering: `always`, `never`, `auto` |
| `--output` | `./audit-output` | Output directory |
| `--audit-date` | current date | Date string for reports |
| `--workers` | `10` | Concurrent crawl workers |
| `--timeout` | `30` | Request timeout (seconds) |
| `--accuracy` | `strict` | Accuracy mode: `strict` or `normal` |
| `--no-journeys` | (flag) | Skip journey tests |
| Mode | Speed | Use Case |
|---|---|---|
| `never` | ⚡ Fast (10-20 pages/sec) | Static HTML sites, quick scans |
| `auto` | 🔄 Adaptive | Renders only when JS detected |
| `always` | 🐢 Slow (2-5 pages/sec) | SPAs, React/Vue/Angular sites |
The toolkit uses 8 evidence-based rules to ensure accuracy:
- Evidence Required – Every issue must have HTML/data proof
- Confidence Levels – HIGH/MEDIUM/LOW with appropriate weighting
- Valid Pages Only – Only status 200 pages are scored
- No Inference – Never assume; only report what's verifiable
- Verification Steps – Each issue includes reproducible steps
- Category Caps – Prevents score from hitting 0 unfairly
- Manual Validation Flag – Low-confidence issues marked for review
- Source Attribution – Every finding cites its data source
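Conceptually, several of these guardrails fall out of how a finding is represented: each one carries its evidence, confidence, and source. A minimal sketch of such a record (field names are illustrative, not the toolkit's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    """One evidence-backed finding (illustrative schema, not the toolkit's)."""
    rule_id: str                 # e.g. "seo.missing_title"
    url: str                     # page where the issue was observed
    confidence: str              # "HIGH", "MEDIUM", or "LOW"
    evidence: str                # HTML snippet or data proving the issue
    source: str                  # data source attribution
    verification_steps: list = field(default_factory=list)
    needs_manual_review: bool = False

    def __post_init__(self):
        # "Evidence Required": reject any finding without proof attached.
        if not self.evidence:
            raise ValueError("Issue rejected: no evidence attached")
        # "Manual Validation Flag": low-confidence issues go to review.
        if self.confidence == "LOW":
            self.needs_manual_review = True
```

Making the evidence a required constructor argument means an unverifiable finding cannot even be instantiated, which is one way to enforce "no inference" structurally.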
Each audit generates a timestamped folder with:
| File | Description |
|---|---|
| `pages.csv` | One row per crawled page with metadata |
| `issues.csv` | All issues with affected URLs |
| `issues_summary.csv` | One row per issue type |
| `internal_links.csv` | Link graph (source → target) |
| `redirects.csv` | Redirect chains |
| `errors.csv` | Timeouts, failures, HTTP errors |
| `fix_backlog.csv` | Jira-ready task list |
| `audit_data.csv` | Comprehensive single-file export |

| File | Description |
|---|---|
| `run_manifest.json` | Run config, counts, durations, scores |

| File | Description |
|---|---|
| `Executive_Summary.pdf` | 3-5 page leadership briefing |
| `Full_Audit_Report.pdf` | 20-40 page detailed analysis |
| `Appendix.pdf` | Raw data tables and URL lists |
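Because the CSVs and manifest are plain files, they slot into any downstream analysis. For example, tallying issue types from one run folder with only the standard library (the `issue_type` column and manifest keys are assumed here; check the real headers in your output):

```python
import csv
import json
from collections import Counter
from pathlib import Path

def summarize(out_dir: str):
    """Summarize one audit run folder: top issue types plus the manifest.

    Assumes an "issue_type" column in issues.csv; adjust to the actual
    headers produced by your version of the toolkit.
    """
    out = Path(out_dir)
    with open(out / "issues.csv", newline="", encoding="utf-8") as f:
        counts = Counter(row["issue_type"] for row in csv.DictReader(f))
    manifest = json.loads((out / "run_manifest.json").read_text())
    return manifest, counts.most_common(5)
```

A call like `summarize("audit-output/2024-01-15")` would return the parsed manifest and the five most frequent issue types, ready for a dashboard or a weekly diff.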
Measures overall website technical health with category-capped penalties:
| Category | Max Penalty |
|---|---|
| Technical | 20 pts |
| SEO | 20 pts |
| Performance | 15 pts |
| UX-CRO | 15 pts |
| Content | 10 pts |
| Tracking | 8 pts |
| Security | 5 pts |
| Accessibility | 3 pts |
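The category caps keep any single problem area from dragging the score to zero. A sketch of the capped-penalty arithmetic (the real logic lives in `src/scoring/`; the raw penalty inputs below are invented for illustration):

```python
# Max penalty per category, taken from the Site Health Score table above.
CAPS = {
    "Technical": 20, "SEO": 20, "Performance": 15, "UX-CRO": 15,
    "Content": 10, "Tracking": 8, "Security": 5, "Accessibility": 3,
}

def health_score(penalties: dict) -> int:
    """Start at 100 and subtract each category's penalty, capped at its max."""
    deduction = sum(min(points, CAPS[cat]) for cat, points in penalties.items())
    return max(0, 100 - deduction)

# Example: SEO racked up 34 raw penalty points but is capped at 20,
# so the deduction is 20 + 7 + 2 = 29.
print(health_score({"SEO": 34, "Performance": 7, "Security": 2}))  # → 71
```

Note that even if every category maxes out, the caps sum to 96, so the score bottoms out at 4 rather than 0 under this scheme.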
Focuses on conversion-critical pages (pricing, demo, contact):
| Category | Max Penalty |
|---|---|
| Conversion Journey | 35 pts |
| Tracking & Attribution | 20 pts |
| Trust & Compliance | 15 pts |
| Performance (Money Pages) | 15 pts |
| UX Friction | 10 pts |
| SEO Readiness | 5 pts |
The toolkit automatically detects:
- CMS: WordPress, HubSpot, Contentful, Webflow, etc.
- Frameworks: React, Vue, Angular, Next.js, jQuery
- Analytics: GA4, Amplitude, Heap, FullStory, Mixpanel
- Tag Managers: GTM, Segment, Tealium
- CDN: Cloudflare, Fastly, Akamai, CloudFront
- Marketing: HubSpot, Marketo, Pardot, Intercom, Drift
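Detection of this kind typically works by matching fingerprints (generator meta tags, script URLs, well-known globals) against the page source. A simplified sketch with a few illustrative signatures (these patterns are examples, not the toolkit's actual ruleset):

```python
import re

# name -> regex matched against raw page HTML (illustrative signatures only)
SIGNATURES = {
    "WordPress":  r'<meta[^>]+name="generator"[^>]+WordPress',
    "React":      r'data-reactroot|__NEXT_DATA__',
    "GA4":        r'googletagmanager\.com/gtag/js\?id=G-',
    "GTM":        r'googletagmanager\.com/gtm\.js',
    "Cloudflare": r'cdn-cgi/|__cf_bm',
}

def detect_technologies(html: str) -> list:
    """Return the names whose fingerprint appears in the HTML."""
    return [name for name, pattern in SIGNATURES.items()
            if re.search(pattern, html, re.IGNORECASE)]
```

Real detectors also inspect response headers and cookies (e.g. CDN headers), which is why a rendered crawl can surface technologies a raw HTML fetch misses.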
```
parcellab-audit-toolkit/
├── audit.py              # Main entry point
├── score_audit.py        # Standalone scoring CLI
├── requirements.txt      # Python dependencies
├── src/
│   ├── crawler/          # Crawl engine & URL normalization
│   ├── render/           # Playwright renderer
│   ├── analyze/          # Issue detection & tech detection
│   ├── scoring/          # Health & Revenue scoring
│   ├── reporting/        # PDF generator
│   ├── outputs/          # Output manager
│   └── journeys/         # User journey tests
└── audit-output/         # Generated reports (gitignored)
```
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Playwright for browser automation
- ReportLab for PDF generation
- Beautiful Soup for HTML parsing
- Rich for terminal output
Built with ❤️ for website optimization professionals