SiteSentry

SiteSentry is a production-grade website audit engine for SEO, technical, performance, and conversion analysis — built for evidence-based detection and executive-ready reporting.


🎯 Features

  • Full Website Crawling – Unlimited pages/depth with concurrent crawling
  • JavaScript Rendering – Playwright-powered rendering for SPAs and dynamic content
  • Subdomain Discovery – Automatic detection and crawling of subdomains
  • 8 Accuracy Guardrails – Evidence-based issue detection designed to minimize false positives
  • Dual Scoring System – Site Health Score (0-100) + Revenue Score (0-100)
  • Full-Stack Exports – 12 output files including 3 executive PDFs
  • Technology Detection – Identifies CMS, frameworks, analytics, CDN, and more

📦 Installation

Prerequisites

  • Python 3.9+
  • pip

Setup

```bash
# Clone the repository
git clone https://github.com/BalaShankar9/parcellab-audit-toolkit.git
cd parcellab-audit-toolkit

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers (required for JS rendering)
playwright install chromium
```

🚀 Quick Start

```bash
# Basic audit (50 pages, auto rendering)
python audit.py --url https://example.com --max-pages 50

# Full unlimited audit with JS rendering
python audit.py --url https://example.com --max-pages 0 --max-depth 0 --render always

# Quick scan without rendering (faster)
python audit.py --url https://example.com --max-pages 100 --render never
```

⚙️ Command Line Options

| Option | Default | Description |
|--------|---------|-------------|
| `--url` | (required) | Target website URL |
| `--max-pages` | `0` | Max pages to crawl (`0` = unlimited) |
| `--max-depth` | `0` | Max crawl depth (`0` = unlimited) |
| `--render` | `auto` | JS rendering: `always`, `never`, `auto` |
| `--output` | `./audit-output` | Output directory |
| `--audit-date` | current date | Date string for reports |
| `--workers` | `10` | Concurrent crawl workers |
| `--timeout` | `30` | Request timeout (seconds) |
| `--accuracy` | `strict` | Accuracy mode: `strict` or `normal` |
| `--no-journeys` | flag | Skip journey tests |

🔧 Rendering Modes

| Mode | Speed | Use Case |
|------|-------|----------|
| `never` | ⚡ Fast (10-20 pages/sec) | Static HTML sites, quick scans |
| `auto` | 🔄 Adaptive | Renders only when JS detected |
| `always` | 🐢 Slow (2-5 pages/sec) | SPAs, React/Vue/Angular sites |
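
The `auto` mode's "renders only when JS detected" behavior could be approximated by a heuristic like the one below. This is a minimal sketch in the project's implementation language; the marker list and the word-count threshold are illustrative assumptions, not the toolkit's actual detection logic.

```python
import re

# Markers that suggest a client-rendered SPA (hypothetical list; the
# toolkit's real heuristics may differ).
SPA_MARKERS = [
    r'<div[^>]+id=["\'](root|app|__next)["\']',  # React / Vue / Next.js mount points
    r'window\.__NUXT__',                         # Nuxt payload
    r'ng-version=',                              # Angular
]

def needs_rendering(html: str) -> bool:
    """Return True when the raw HTML looks like a JS-rendered shell."""
    # A body with almost no visible text, or a known SPA marker, is a
    # strong signal that Playwright rendering is needed.
    text = re.sub(r'<[^>]+>', ' ', html)
    sparse = len(text.split()) < 50
    has_marker = any(re.search(p, html) for p in SPA_MARKERS)
    return has_marker or sparse

print(needs_rendering('<div id="root"></div><script src="/app.js"></script>'))  # True
```

A crawler using this check would fetch the static HTML first and fall back to the slower rendered path only when the function returns `True`.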

🛡️ Accuracy Guardrails (STRICT Mode)

The toolkit uses 8 evidence-based rules to ensure accuracy:

  1. Evidence Required – Every issue must have HTML/data proof
  2. Confidence Levels – HIGH/MEDIUM/LOW with appropriate weighting
  3. Valid Pages Only – Only status 200 pages are scored
  4. No Inference – Never assume; only report what's verifiable
  5. Verification Steps – Each issue includes reproducible steps
  6. Category Caps – Prevents score from hitting 0 unfairly
  7. Manual Validation Flag – Low-confidence issues marked for review
  8. Source Attribution – Every finding cites its data source
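
Taken together, the rules imply that every finding carries its proof with it. An issue record satisfying rules 1, 2, 5, 7, and 8 might look like the following sketch; the field names and schema are illustrative, not the toolkit's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    """One detected issue, carrying the evidence the guardrails require."""
    issue_type: str
    url: str
    evidence: str                                            # rule 1: HTML/data proof
    confidence: str                                          # rule 2: HIGH / MEDIUM / LOW
    verification_steps: list = field(default_factory=list)   # rule 5: reproducible steps
    source: str = ""                                         # rule 8: data source attribution

    @property
    def needs_manual_review(self) -> bool:
        # Rule 7: low-confidence findings are flagged for human review.
        return self.confidence == "LOW"

issue = Issue(
    issue_type="missing_meta_description",
    url="https://example.com/pricing",
    evidence='<head> contains no <meta name="description">',
    confidence="HIGH",
    verification_steps=["View page source", "Search for a meta description tag"],
    source="raw_html",
)
print(issue.needs_manual_review)  # False: HIGH confidence
```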

📊 Output Files

Each audit generates a timestamped folder with:

CSV Exports

| File | Description |
|------|-------------|
| `pages.csv` | One row per crawled page with metadata |
| `issues.csv` | All issues with affected URLs |
| `issues_summary.csv` | One row per issue type |
| `internal_links.csv` | Link graph (source → target) |
| `redirects.csv` | Redirect chains |
| `errors.csv` | Timeouts, failures, HTTP errors |
| `fix_backlog.csv` | Jira-ready task list |
| `audit_data.csv` | Comprehensive single-file export |
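
The CSV exports are meant for downstream analysis; for example, ranking issue types from `issues_summary.csv` takes only the standard library. The column names below are assumptions for illustration; check the actual header row of your export.

```python
import csv
import io

# Stand-in for audit-output/<timestamp>/issues_summary.csv; the real
# columns may differ, so adjust the field names to the actual header row.
sample = io.StringIO(
    "issue_type,pages_affected,severity\n"
    "missing_meta_description,42,MEDIUM\n"
    "broken_internal_link,7,HIGH\n"
)

rows = list(csv.DictReader(sample))
# Worst offenders first, by number of affected pages.
rows.sort(key=lambda r: int(r["pages_affected"]), reverse=True)
for r in rows:
    print(f'{r["issue_type"]}: {r["pages_affected"]} pages ({r["severity"]})')
```

To run against a real export, replace `sample` with `open("audit-output/<timestamp>/issues_summary.csv")`.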

JSON

| File | Description |
|------|-------------|
| `run_manifest.json` | Run config, counts, durations, scores |

PDF Reports

| File | Description |
|------|-------------|
| `Executive_Summary.pdf` | 3-5 page leadership briefing |
| `Full_Audit_Report.pdf` | 20-40 page detailed analysis |
| `Appendix.pdf` | Raw data tables and URL lists |

📈 Scoring System

Site Health Score (0-100)

Measures overall website technical health with category-capped penalties:

| Category | Max Penalty |
|----------|-------------|
| Technical | 20 pts |
| SEO | 20 pts |
| Performance | 15 pts |
| UX-CRO | 15 pts |
| Content | 10 pts |
| Tracking | 8 pts |
| Security | 5 pts |
| Accessibility | 3 pts |
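
Category capping means each category's raw penalties are clamped to the maxima above before being subtracted from 100, so no single category can sink the score on its own. A minimal sketch of that arithmetic (the toolkit's exact implementation may differ):

```python
# Max penalty per category, from the Site Health table above.
CAPS = {
    "Technical": 20, "SEO": 20, "Performance": 15, "UX-CRO": 15,
    "Content": 10, "Tracking": 8, "Security": 5, "Accessibility": 3,
}

def site_health_score(raw_penalties: dict) -> int:
    """Clamp each category's raw penalty to its cap, then subtract from 100."""
    deducted = sum(min(raw_penalties.get(cat, 0), cap) for cat, cap in CAPS.items())
    return max(0, 100 - deducted)

# Even a site with 60 raw SEO penalty points only loses the 20-point SEO cap.
print(site_health_score({"SEO": 60, "Performance": 5}))  # 75
```

The same capping scheme applies to the Revenue Score below, just with its own category table.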

Revenue Score (0-100)

Focuses on conversion-critical pages (pricing, demo, contact):

| Category | Max Penalty |
|----------|-------------|
| Conversion Journey | 35 pts |
| Tracking & Attribution | 20 pts |
| Trust & Compliance | 15 pts |
| Performance (Money Pages) | 15 pts |
| UX Friction | 10 pts |
| SEO Readiness | 5 pts |

🔍 Detected Technologies

The toolkit automatically detects:

  • CMS: WordPress, HubSpot, Contentful, Webflow, etc.
  • Frameworks: React, Vue, Angular, Next.js, jQuery
  • Analytics: GA4, Amplitude, Heap, FullStory, Mixpanel
  • Tag Managers: GTM, Segment, Tealium
  • CDN: Cloudflare, Fastly, Akamai, CloudFront
  • Marketing: HubSpot, Marketo, Pardot, Intercom, Drift
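
Detection of this kind typically works by matching known signatures against the page source and script URLs. The snippet below is a simplified illustration; the signatures shown are representative, not the toolkit's actual ruleset.

```python
import re

# A few representative signatures; a real detector covers far more
# technologies and also inspects headers and cookies.
SIGNATURES = {
    "WordPress": r"wp-content|wp-includes",
    "React": r"data-reactroot|__REACT_DEVTOOLS",
    "GA4": r"gtag\(\s*['\"]config['\"]",
    "Cloudflare": r"cdn\.cloudflare|__cf_bm",
    "HubSpot": r"js\.hs-scripts\.com",
}

def detect_technologies(html: str) -> list:
    """Return the names of technologies whose signature appears in the HTML."""
    return [name for name, pattern in SIGNATURES.items()
            if re.search(pattern, html)]

html = ('<script src="https://js.hs-scripts.com/123.js"></script>'
        '<link href="/wp-content/themes/x.css">')
print(detect_technologies(html))  # ['WordPress', 'HubSpot']
```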

📁 Project Structure

```text
parcellab-audit-toolkit/
├── audit.py              # Main entry point
├── score_audit.py        # Standalone scoring CLI
├── requirements.txt      # Python dependencies
├── src/
│   ├── crawler/          # Crawl engine & URL normalization
│   ├── render/           # Playwright renderer
│   ├── analyze/          # Issue detection & tech detection
│   ├── scoring/          # Health & Revenue scoring
│   ├── reporting/        # PDF generator
│   ├── outputs/          # Output manager
│   └── journeys/         # User journey tests
└── audit-output/         # Generated reports (gitignored)
```

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (`git checkout -b feature/amazing-feature`)
  3. Commit your changes (`git commit -m 'Add amazing feature'`)
  4. Push to the branch (`git push origin feature/amazing-feature`)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Built with ❤️ for website optimization professionals
