A comprehensive, multi-layer spam protection system for web forms using OpenResty (Lua) for intelligent form analysis and HAProxy for global rate limiting. Features a modern React-based Admin UI for real-time configuration management.
| Category | Features |
|---|---|
| Content Analysis | Keyword filtering, content hashing, honeypot detection, disposable email blocking, link analysis |
| Behavioral Detection | Form timing tokens, field anomaly detection, submission fingerprinting |
| Threat Intelligence | IP reputation (AbuseIPDB), GeoIP restrictions, datacenter/VPN detection |
| Bot Protection | CAPTCHA integration (reCAPTCHA, hCaptcha, Turnstile), rate limiting |
| Operations | Webhook notifications, audit logging, bulk import/export, Prometheus metrics |
| Multi-tenancy | Virtual hosts, per-endpoint configuration, field learning |
Client → Ingress → OpenResty → HAProxy → Backend
                       ↓          ↓
                     Redis   (Stick-table sync)
                       ↑
                Admin UI (port 3000)
                Admin API (port 8082)
- OpenResty (Port 8080/8081/8082) - Intelligent form analysis engine
  - Multi-format parsing (multipart, urlencoded, JSON)
  - Content-based spam scoring with 20+ detection rules
  - Redis-backed dynamic configuration
  - Per-vhost and per-endpoint customization
- HAProxy - Global rate limiting with stick-table sync
  - StatefulSet with automatic peer discovery
  - Per-hash, per-IP, and per-fingerprint rate limiting
  - Prometheus metrics export
- Redis - Dynamic configuration store
  - Keywords, hashes, IP lists
  - Virtual host and endpoint configurations
  - Session management for Admin UI
- Admin UI - React-based management dashboard
  - Real-time configuration updates
  - Visual management of all WAF features
  - Role-based authentication
Multi-instance deployments use Redis-based coordination:
- Instance Registration - Each pod registers with heartbeat (15s interval)
- Leader Election - Single leader via Redis SET NX PX
- Global Metrics - Leader aggregates metrics from all instances
- Automatic Cleanup - Stale instances removed after 5 minutes
See Cluster Coordination Guide for details.
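
The leader-election step can be sketched as follows. This is a minimal illustration, not the project's Lua implementation: it assumes a client with a redis-py style `set(name, value, nx=True, px=...)` call, and the key name `waf:cluster:leader` and the 30-second lease are hypothetical. A small in-memory stand-in is included so the sketch runs without a Redis server.

```python
import time


class FakeRedis:
    """In-memory stand-in for the subset of redis-py used below (illustration only)."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry as a monotonic timestamp)

    def set(self, name, value, nx=False, px=None):
        now = time.monotonic()
        current = self._data.get(name)
        if current is not None and current[1] <= now:
            current = None  # the previous holder's lease has expired
        if nx and current is not None:
            return None  # redis-py returns None when NX prevents the write
        expiry = now + px / 1000.0 if px is not None else float("inf")
        self._data[name] = (value, expiry)
        return True


LEADER_KEY = "waf:cluster:leader"  # hypothetical key name
LEADER_TTL_MS = 30_000             # hypothetical lease length


def try_become_leader(client, instance_id, ttl_ms=LEADER_TTL_MS):
    """Return True if this instance acquired the leader lease via SET NX PX."""
    return bool(client.set(LEADER_KEY, instance_id, nx=True, px=ttl_ms))
```

Because the key carries an expiry, a crashed leader loses the lease once the TTL elapses and another instance can take over on its next attempt.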
# Start all services
docker-compose up -d
# Initialize Redis with default data
docker-compose exec redis sh /init-data.sh
# Access Admin UI at http://localhost:3000
# Default credentials: admin / changeme
# Test the WAF
./scripts/test-waf.sh http://localhost:8080

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
cd helm/forms-waf
helm dependency build
helm install forms-waf . -n forms-waf --create-namespace

- Blocked Keywords: Instant rejection (e.g., "viagra", "casino")
- Flagged Keywords: Score-based with configurable weights
- Case-insensitive matching with word boundary detection
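
A minimal sketch of how blocked and flagged keywords might be evaluated. It assumes flagged entries use the `keyword:score` form listed for `waf:keywords:flagged`; the function names and the `keyword:flagged:` flag prefix are hypothetical, and the real matching lives in the Lua modules.

```python
import re


def parse_flagged(entries):
    """Parse 'keyword:score' strings into a {keyword: score} dict."""
    weights = {}
    for entry in entries:
        keyword, _, score = entry.rpartition(":")
        weights[keyword.lower()] = int(score)
    return weights


def scan_keywords(text, blocked, flagged):
    """Return (blocked_hit, score, flags) for one piece of form text."""

    def found(keyword):
        # \b enforces word boundaries; IGNORECASE makes the match case-insensitive
        pattern = r"\b" + re.escape(keyword) + r"\b"
        return re.search(pattern, text, re.IGNORECASE) is not None

    for keyword in sorted(blocked):
        if found(keyword):
            return True, 0, ["keyword:blocked:" + keyword]

    score, flags = 0, []
    for keyword, weight in sorted(flagged.items()):
        if found(keyword):
            score += weight
            flags.append("keyword:flagged:" + keyword)
    return False, score, flags
```

With word boundaries in place, a blocked keyword such as "casino" does not match inside a longer word like "casinos"; whether that is the desired behavior depends on how the lists are curated.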
- SHA256 hashing of normalized form content
- Duplicate submission detection across all users
- Configurable thresholds for blocking repeated content
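
The hashing step could look roughly like this. The normalization rules here (lowercasing, collapsing whitespace, sorting fields by name) are assumptions for illustration; the README does not specify what the WAF actually normalizes.

```python
import hashlib


def content_hash(fields):
    """SHA256 over normalized field values (normalization rules are assumed)."""
    parts = []
    for name in sorted(fields):
        # Lowercase and collapse runs of whitespace so trivial edits hash the same
        value = " ".join(str(fields[name]).lower().split())
        parts.append(f"{name}={value}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

Two submissions that differ only in case or spacing then share one hash, which is what lets a per-hash counter catch the same message being posted repeatedly.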
Automatically detects and scores submissions that fill hidden honeypot fields.
{
"honeypot_fields": ["website", "url", "phone2"],
"honeypot_action": "flag",
"honeypot_score": 50
}

Blocks or flags submissions using temporary email services:
- 250+ built-in disposable domains
- Custom domain blocklist via Redis
- Configurable action (block/flag/monitor)
- URL Shortener Detection: Flags bit.ly, tinyurl.com, etc. (25+ services)
- Suspicious TLD Detection: Flags .xyz, .top, .click, etc.
- Excessive Link Detection: Scores based on link count
Detects bot submissions by measuring time between form load and submission:
| Timing | Score | Reason |
|---|---|---|
| No cookie | +30 | Direct POST without loading form |
| < 2 seconds | +40 | Too fast for human |
| < 5 seconds | +20 | Suspiciously fast |
| ≥ 5 seconds | +0 | Normal behavior |
Enable via Admin UI: Security → Form Timing
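
The scoring table above maps directly onto a small function. This is a sketch only: the function name is hypothetical, and treating exactly 5 seconds as normal is an assumption.

```python
def timing_score(elapsed_seconds):
    """Map the time between form load and submit to (score, reason).

    Pass None when the request carried no timing cookie.
    """
    if elapsed_seconds is None:
        return 30, "Direct POST without loading form"
    if elapsed_seconds < 2:
        return 40, "Too fast for human"
    if elapsed_seconds < 5:
        return 20, "Suspiciously fast"
    return 0, "Normal behavior"
```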
Detects suspicious patterns in form submissions:
- Identical field lengths (bot-generated)
- Sequential/incremental values
- ALL CAPS submissions
- Test data patterns ("test", "asdf", "123")
- Unusually long field values
Creates client fingerprints based on:
- User-Agent string
- Accept-Language header
- Field names submitted
- Request characteristics
Used for cross-request correlation and flood detection.
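
One way to derive such a fingerprint is to hash the listed inputs together. The separator, the 16-character truncation, and the function name are assumptions made for this sketch.

```python
import hashlib


def submission_fingerprint(user_agent, accept_language, field_names):
    """Combine request traits into a short, stable identifier."""
    # Sort field names so the order in which fields arrive does not matter
    material = "|".join([user_agent, accept_language, ",".join(sorted(field_names))])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
```

Requests that share a fingerprint can then be counted together, even when they arrive from different IP addresses.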
Advanced statistical analysis of form submission patterns:
- Flow-based monitoring across multi-page forms
- Time-series aggregation (hour/day/week/month/year)
- Duration histograms for bot detection
- Baseline calculation with z-score anomaly detection
- HyperLogLog for unique IP counting
See Behavioral Tracking Guide for details.
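
The z-score check against a baseline can be illustrated as below. The use of the population standard deviation and the default threshold of 3.0 are assumptions; the Behavioral Tracking Guide is the authority on the actual parameters.

```python
from statistics import mean, pstdev


def z_score(value, baseline):
    """Number of standard deviations that value lies from the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        return 0.0  # a flat baseline gives no scale to measure against
    return (value - mu) / sigma


def is_anomalous(value, baseline, threshold=3.0):
    """Flag values that are more than `threshold` deviations from the mean."""
    return abs(z_score(value, baseline)) > threshold
```

For example, with hourly submission counts of `[10, 12, 11, 9, 10, 8, 10]` as the baseline, an hour with 40 submissions would be flagged while an hour with 11 would not.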
Checks IP addresses against multiple sources:
| Provider | Description |
|---|---|
| Local Blocklist | Redis-based manual blocklist |
| AbuseIPDB | External API (requires API key) |
| Custom Webhook | Your internal reputation service |
Enable via Admin UI: Security → IP Reputation
Configuration example:
{
"enabled": true,
"abuseipdb": {
"enabled": true,
"api_key": "your-api-key",
"min_confidence": 25
},
"block_score": 80,
"flag_score": 50
}

Country and ASN-based access control using MaxMind GeoLite2 databases:
- Country Blocking: Block specific countries
- Country Allowlist: Only allow specific countries
- ASN Blocking: Block specific networks
- Datacenter Detection: Flag/block cloud provider IPs
Enable via Admin UI: Security → GeoIP
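
The four controls above could combine into a single decision along these lines. All configuration key names (`allowed_countries`, `blocked_countries`, `blocked_asns`, `datacenter_asns`) and the evaluation order are hypothetical; the sketch also assumes the country code and ASN have already been looked up.

```python
def geoip_decision(country, asn, config):
    """Return (action, reason) for an already-resolved country code and ASN."""
    allow = config.get("allowed_countries") or []
    if allow and country not in allow:
        return "block", "country_not_allowed"  # allowlist mode: everything else is denied
    if country in config.get("blocked_countries", []):
        return "block", "country_blocked"
    if asn in config.get("blocked_asns", []):
        return "block", "asn_blocked"
    if asn in config.get("datacenter_asns", []):
        return "flag", "datacenter"  # could equally be configured to block
    return "allow", None
```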
Setup Requirements:
- Download GeoLite2 databases (free registration required)
- Mount to `/usr/share/GeoIP/`:
  - `GeoLite2-Country.mmdb`
  - `GeoLite2-ASN.mmdb`

# docker-compose.yml
volumes:
  - ./geoip:/usr/share/GeoIP:ro

Supports multiple CAPTCHA providers with automatic fallback:
| Provider | Features |
|---|---|
| reCAPTCHA v2 | Checkbox challenge |
| reCAPTCHA v3 | Invisible scoring |
| hCaptcha | Privacy-focused alternative |
| Cloudflare Turnstile | Frictionless verification |
Configure via Admin UI: CAPTCHA → Providers and CAPTCHA → Settings
Features:
- Per-endpoint CAPTCHA requirements
- Trust tokens for verified users
- Configurable score thresholds
- Fallback chain if primary fails
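
A fallback chain can be sketched as trying providers in order and moving on only when one is unreachable. The function name and the convention that a provider signals an outage by raising `OSError` (which covers `ConnectionError` and `TimeoutError`) are assumptions for this example.

```python
def verify_with_fallback(token, providers):
    """Try each (name, verify) pair in order; return (provider_name, passed).

    verify(token) returns a bool and raises OSError when the provider is unreachable.
    """
    for name, verify in providers:
        try:
            return name, bool(verify(token))
        except OSError:
            continue  # provider outage: fall through to the next one
    return None, False  # every provider was unavailable
```

Note that a provider rejecting the token is a final answer, not a reason to fall back; only outages move the check to the next provider.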
Multi-layer rate limiting:
| Layer | Scope | Default |
|---|---|---|
| OpenResty | Per-endpoint | Configurable |
| HAProxy | Per-IP | 30/min |
| HAProxy | Per-hash | 10/min |
| HAProxy | Per-fingerprint | 50/min |
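
To make the per-scope defaults concrete, here is a simplified fixed-window limiter using the HAProxy limits from the table. It is an illustration only: HAProxy stick-tables track rates differently, and the class name is hypothetical.

```python
import time
from collections import defaultdict

DEFAULT_LIMITS = {"ip": 30, "hash": 10, "fingerprint": 50}  # requests per minute


class WindowLimiter:
    """Fixed-window counter per (scope, key); old windows are never pruned here."""

    def __init__(self, limits=DEFAULT_LIMITS, window_seconds=60, clock=time.monotonic):
        self.limits = dict(limits)
        self.window = window_seconds
        self.clock = clock
        self.counts = defaultdict(int)

    def allow(self, scope, key):
        """Count one request and report whether it is within the scope's limit."""
        bucket = int(self.clock() // self.window)
        slot = (scope, key, bucket)
        self.counts[slot] += 1
        return self.counts[slot] <= self.limits[scope]
```

A request would typically be checked against all three scopes, and rejected if any one of them is over its limit.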
Send real-time notifications for WAF events:
{
"enabled": true,
"urls": ["https://your-webhook.example.com/waf-events"],
"events": ["blocked", "flagged", "captcha_required"],
"batch_size": 10,
"batch_interval_ms": 5000
}

Configure via Admin UI: Operations → Webhooks
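
The interaction between `batch_size` and `batch_interval_ms` can be sketched as follows: a batch is sent when it fills up or when its oldest event has waited for the full interval, whichever comes first. The class and method names are hypothetical, and the real delivery logic may differ.

```python
class WebhookBatcher:
    """Collect events and hand them to `send` in batches."""

    def __init__(self, send, batch_size=10, batch_interval_ms=5000):
        self.send = send
        self.batch_size = batch_size
        self.interval = batch_interval_ms / 1000.0
        self.pending = []
        self.first_at = None  # arrival time of the oldest pending event

    def add(self, event, now):
        if not self.pending:
            self.first_at = now
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()  # size trigger

    def tick(self, now):
        """Call periodically; flushes when the oldest event has waited long enough."""
        if self.pending and now - self.first_at >= self.interval:
            self.flush()  # time trigger

    def flush(self):
        batch, self.pending = self.pending, []
        self.send(batch)
```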
JSON-formatted structured logs for security events:
{
"@timestamp": "2024-01-15T10:30:00Z",
"event_type": "request_blocked",
"client_ip": "192.168.1.100",
"spam_score": 85,
"flags": ["keyword:blocked:viagra", "timing:too_fast"],
"vhost_id": "example-com",
"endpoint_id": "contact-form"
}

Import and export configurations via Admin UI: Operations → Bulk
Supported data types:
- Blocked/flagged keywords
- IP allowlist
- Blocked hashes
Formats: JSON, CSV (keywords only)
Available at /metrics endpoint:
| Metric | Description |
|---|---|
| `waf_requests_total` | Total requests by vhost, endpoint |
| `waf_requests_blocked_total` | Blocked requests |
| `waf_requests_monitored_total` | Monitored (would-block) requests |
| `waf_requests_allowed_total` | Allowed requests |
| `waf_spam_score_total` | Sum of spam scores |
| `waf_form_submissions_total` | Form submissions processed |
| `waf_shared_dict_bytes` | Shared dictionary memory usage |
Global Metrics Aggregation:
In multi-instance deployments, the leader aggregates metrics from all instances. The /api/metrics endpoint returns both local and global metrics side-by-side. See Metrics Aggregation Guide.
Configure per-domain WAF rules:
- Exact hostname matching (`example.com`)
- Wildcard support (`*.example.com`)
- Per-vhost thresholds and routing
- WAF modes: blocking, monitoring, passthrough
Configure via Admin UI: Virtual Hosts
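
Hostname resolution with exact and wildcard patterns might work like the sketch below. The precedence rules (exact match first, then the most specific wildcard) and the function name are assumptions; patterns are expected in lowercase.

```python
def match_vhost(host, vhosts):
    """Resolve a Host header to a vhost id, or None if nothing matches.

    vhosts maps patterns such as 'example.com' or '*.example.com' to ids.
    """
    host = host.lower().split(":")[0]  # drop any port (IPv6 literals not handled)
    if host in vhosts:
        return vhosts[host]  # an exact match always wins
    best = None
    for pattern, vhost_id in vhosts.items():
        if pattern.startswith("*."):
            suffix = pattern[1:]  # '*.example.com' -> '.example.com'
            if host.endswith(suffix) and len(host) > len(suffix):
                if best is None or len(suffix) > len(best[0]):
                    best = (suffix, vhost_id)  # prefer the longest suffix
    return best[1] if best else None
```

Under these rules `*.example.com` matches `shop.example.com` but not the bare `example.com`, which needs its own exact entry.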
Per-endpoint customization:
| Setting | Description |
|---|---|
| WAF Mode | blocking, monitoring, passthrough, strict |
| Thresholds | Override global spam score limits |
| Rate Limits | Per-endpoint request limits |
| Field Validation | Required fields, max lengths |
| Expected Fields | Block unexpected field names |
| Honeypot Fields | Hidden fields to detect bots |
Configure via Admin UI: Endpoints
Automatic field discovery from submissions:
- Enable learning mode on endpoint
- WAF records field names and infers types
- Review learned fields in Admin UI
- Mark fields as expected or honeypot
- Enable validation to block unexpected fields
| Key | Type | Description |
|---|---|---|
| `waf:keywords:blocked` | SET | Keywords that trigger immediate block |
| `waf:keywords:flagged` | SET | Keywords with scores (`keyword:score`) |
| `waf:hashes:blocked` | SET | Content hashes to block |
| `waf:config:thresholds` | HASH | Global thresholds |
| `waf:whitelist:ips` | SET | IPs to bypass filtering |
| `waf:vhosts:config:{id}` | STRING | Virtual host config (JSON) |
| `waf:endpoints:config:{id}` | STRING | Endpoint config (JSON) |
| `waf:config:timing_token` | STRING | Timing token config (JSON) |
| `waf:config:geoip` | STRING | GeoIP config (JSON) |
| `waf:config:ip_reputation` | STRING | IP reputation config (JSON) |
| `waf:config:captcha` | STRING | CAPTCHA config (JSON) |
| `waf:config:webhooks` | STRING | Webhook config (JSON) |
| `waf:reputation:blocked_ips` | SET | Local IP blocklist |
| `waf:disposable_domains` | SET | Custom disposable email domains |
| Threshold | Default | Description |
|---|---|---|
| `spam_score_block` | 80 | Score to block immediately |
| `spam_score_flag` | 50 | Score to flag for monitoring |
| `hash_count_block` | 10 | Block after N identical submissions |
| `ip_rate_limit` | 30 | Max submissions/minute per IP |
| `ip_daily_limit` | 500 | Max submissions/day per IP |
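
How the score and hash thresholds might translate into an action is sketched below, using the defaults from the table. The function name, the use of `>=` comparisons, and the order of the checks are assumptions.

```python
DEFAULT_THRESHOLDS = {
    "spam_score_block": 80,
    "spam_score_flag": 50,
    "hash_count_block": 10,
}


def decide(spam_score, hash_count, thresholds=DEFAULT_THRESHOLDS):
    """Return 'block', 'flag', or 'allow' for one submission."""
    if spam_score >= thresholds["spam_score_block"]:
        return "block"
    if hash_count >= thresholds["hash_count_block"]:
        return "block"  # same content seen too many times
    if spam_score >= thresholds["spam_score_flag"]:
        return "flag"
    return "allow"
```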
| Variable | Default | Description |
|---|---|---|
| `REDIS_HOST` | redis | Redis hostname |
| `REDIS_PORT` | 6379 | Redis port |
| `REDIS_PASSWORD` | - | Redis password (optional) |
| `REDIS_DB` | 0 | Redis database number |
| `HOSTNAME` | - | Instance ID for cluster coordination |
| `WAF_ADMIN_AUTH` | true | Require authentication for Admin API |
| `WAF_EXPOSE_HEADERS` | false | Expose debug headers in responses |
The WAF automatically seeds a default admin user on startup when environment variables are configured:
| Variable | Required | Default | Description |
|---|---|---|---|
| `WAF_ADMIN_SALT` | Yes | - | 32+ character random salt for password hashing |
| `WAF_ADMIN_PASSWORD` | No | changeme | Initial admin password |
Helm Configuration:
openresty:
  adminSeed:
    enabled: true
    salt:
      existingSecret: ""  # Use existing secret in production
      value: ""           # Or provide salt directly (32+ chars)
    password:
      existingSecret: ""
      value: "changeme"   # Change immediately after deployment

Security Notes:
- Change default password immediately after deployment
- Use `existingSecret` in production for salt management
- Admin user is only created if not already present in Redis
| Endpoint | Method | Description |
|---|---|---|
| `/api/auth/login` | POST | Login with credentials |
| `/api/auth/logout` | POST | End session |
| `/api/auth/verify` | GET | Verify session |
| `/api/auth/change-password` | POST | Change password |
All endpoints require authentication.
| Endpoint | Method | Description |
|---|---|---|
| `/api/status` | GET | WAF status |
| `/api/metrics` | GET | Metrics summary |
| `/api/sync` | POST | Force Redis sync |

| Endpoint | Method | Description |
|---|---|---|
| `/api/keywords/blocked` | GET/POST/DELETE | Blocked keywords |
| `/api/keywords/flagged` | GET/POST/PUT/DELETE | Flagged keywords |

| Endpoint | Method | Description |
|---|---|---|
| `/api/vhosts` | GET/POST | List/create vhosts |
| `/api/vhosts/{id}` | GET/PUT/DELETE | Manage vhost |
| `/api/vhosts/{id}/enable` | POST | Enable vhost |
| `/api/vhosts/{id}/disable` | POST | Disable vhost |

| Endpoint | Method | Description |
|---|---|---|
| `/api/endpoints` | GET/POST | List/create endpoints |
| `/api/endpoints/{id}` | GET/PUT/DELETE | Manage endpoint |
| `/api/endpoints/{id}/fields` | GET | Get learned fields |

| Endpoint | Method | Description |
|---|---|---|
| `/api/timing/config` | GET/PUT | Timing token config |
| `/api/geoip/config` | GET/PUT | GeoIP config |
| `/api/geoip/lookup` | GET | Lookup IP location |
| `/api/reputation/config` | GET/PUT | IP reputation config |
| `/api/reputation/check` | GET | Check IP reputation |
| `/api/reputation/blocklist` | GET/POST/DELETE | Local blocklist |

| Endpoint | Method | Description |
|---|---|---|
| `/api/captcha/providers` | GET/POST | List/create providers |
| `/api/captcha/providers/{id}` | GET/PUT/DELETE | Manage provider |
| `/api/captcha/config` | GET/PUT | Global CAPTCHA settings |

| Endpoint | Method | Description |
|---|---|---|
| `/api/webhooks/config` | GET/PUT | Webhook config |
| `/api/webhooks/test` | POST | Test webhook |
| `/api/bulk/export/{type}` | GET | Export data |
| `/api/bulk/import/{type}` | POST | Import data |
| Header | Description |
|---|---|
| `X-Spam-Score` | Calculated spam score |
| `X-Spam-Flags` | Triggered detection flags |
| `X-Form-Hash` | Content hash |
| `X-Blocked` | Whether request was blocked |
| `X-Block-Reason` | Reason for blocking |
| `X-GeoIP-Country` | Detected country code |
| `X-WAF-Mode` | Current WAF mode |
forms-waf/
├── admin-ui/  # React Admin Dashboard
│   └── src/
│       ├── pages/
│       │   ├── security/  # Timing, GeoIP, Reputation
│       │   ├── captcha/  # CAPTCHA providers/settings
│       │   ├── webhooks/  # Webhook configuration
│       │   └── bulk/  # Import/export
├── openresty/
│   ├── Dockerfile
│   └── lua/
│       ├── waf_handler.lua  # Main WAF orchestrator (1100+ lines)
│       ├── form_parser.lua  # Multipart/JSON/urlencoded parsing
│       ├── content_hasher.lua  # SHA256 content hashing
│       ├── keyword_filter.lua  # Pattern and keyword scanning
│       ├── vhost_resolver.lua  # 3-level config hierarchy
│       ├── vhost_matcher.lua  # Hostname matching
│       ├── endpoint_matcher.lua  # Path/method matching
│       ├── behavioral_tracker.lua  # ML-based anomaly detection
│       ├── instance_coordinator.lua  # Leader election and cluster coordination
│       ├── metrics.lua  # Prometheus metrics + global aggregation
│       ├── redis_sync.lua  # Bidirectional Redis sync
│       ├── timing_token.lua  # Form timing detection
│       ├── geoip.lua  # GeoIP restrictions
│       ├── ip_reputation.lua  # IP reputation checks
│       ├── captcha_handler.lua  # CAPTCHA integration
│       ├── captcha_providers.lua  # Provider implementations
│       ├── webhooks.lua  # Webhook notifications
│       ├── field_learner.lua  # Field learning system
│       ├── admin_auth.lua  # Session authentication
│       ├── rbac.lua  # Role-based access control
│       ├── sso_ldap.lua  # LDAP SSO integration
│       ├── sso_oidc.lua  # OpenID Connect integration
│       └── api_handlers/  # 18 modular API handlers
│           ├── system.lua  # /status, /metrics, /sync
│           ├── users.lua  # User management
│           ├── vhosts.lua  # Virtual host CRUD
│           ├── endpoints.lua  # Endpoint configuration
│           ├── keywords.lua  # Keyword management
│           ├── behavioral.lua  # Behavioral tracking API
│           ├── cluster.lua  # Cluster status API
│           └── ...  # 11 more handlers
├── haproxy/
│   ├── Dockerfile
│   └── haproxy.cfg
├── helm/forms-waf/  # Helm chart
├── docs/  # Documentation
│   ├── ARCHITECTURE.md  # Technical architecture
│   ├── RBAC.md  # Role-based access control
│   ├── BEHAVIORAL_TRACKING.md  # ML anomaly detection
│   ├── CLUSTER_COORDINATION.md  # Leader election
│   ├── METRICS_AGGREGATION.md  # Global metrics
│   └── SSO_*.md  # SSO integration guides
└── scripts/
    ├── test-waf.sh
    └── load-test.sh
- Redis HA: Use Redis Sentinel or Cluster
- TLS: Enable TLS for all external communication
- Admin Security:
  - Change default password immediately
  - Restrict Admin API access via network policies
- GeoIP Databases: Set up automatic updates for MaxMind databases
- Monitoring: Configure Prometheus alerts for:
  - High block rates
  - Unusual traffic patterns
  - API errors
- Logging: Ship audit logs to centralized logging system
- Rate Limits: Tune based on expected traffic patterns
GeoIP not working:
- Verify MaxMind databases are mounted at `/usr/share/GeoIP/`
- Check logs for "geoip: mmdb library not available"
CAPTCHA verification failing:
- Verify provider credentials in Admin UI
- Check network connectivity to CAPTCHA provider APIs
- Review timeout settings
High false positive rate:
- Review spam score thresholds
- Check keyword lists for overly broad terms
- Enable monitoring mode to analyze before blocking
Enable debug headers to troubleshoot:
# docker-compose.yml
environment:
  - WAF_EXPOSE_HEADERS=true

Then check response headers:
curl -v -X POST http://localhost:8080/submit -d "test=data"

MIT