Forms WAF - Advanced Spam Protection System

A comprehensive, multi-layer spam protection system for web forms using OpenResty (Lua) for intelligent form analysis and HAProxy for global rate limiting. Features a modern React-based Admin UI for real-time configuration management.

Features at a Glance

Category	Features
Content Analysis	Keyword filtering, content hashing, honeypot detection, disposable email blocking, link analysis
Behavioral Detection	Form timing tokens, field anomaly detection, submission fingerprinting
Threat Intelligence	IP reputation (AbuseIPDB), GeoIP restrictions, datacenter/VPN detection
Bot Protection	CAPTCHA integration (reCAPTCHA, hCaptcha, Turnstile), rate limiting
Operations	Webhook notifications, audit logging, bulk import/export, Prometheus metrics
Multi-tenancy	Virtual hosts, per-endpoint configuration, field learning

Architecture Overview

Client → Ingress → OpenResty → HAProxy → Backend
                      ↓            ↓
                    Redis    (Stick-table sync)
                      ↑
                  Admin UI (port 3000)
                  Admin API (port 8082)

Components

OpenResty (Port 8080/8081/8082) - Intelligent form analysis engine
- Multi-format parsing (multipart, urlencoded, JSON)
- Content-based spam scoring with 20+ detection rules
- Redis-backed dynamic configuration
- Per-vhost and per-endpoint customization
HAProxy - Global rate limiting with stick-table sync
- StatefulSet with automatic peer discovery
- Per-hash, per-IP, and per-fingerprint rate limiting
- Prometheus metrics export
Redis - Dynamic configuration store
- Keywords, hashes, IP lists
- Virtual host and endpoint configurations
- Session management for Admin UI
Admin UI - React-based management dashboard
- Real-time configuration updates
- Visual management of all WAF features
- Role-based authentication

Cluster Coordination

Multi-instance deployments use Redis-based coordination:

Instance Registration - Each pod registers with heartbeat (15s interval)
Leader Election - Single leader via Redis SET NX PX
Global Metrics - Leader aggregates metrics from all instances
Automatic Cleanup - Stale instances removed after 5 minutes

See Cluster Coordination Guide for details.

Quick Start

Local Development (Docker Compose)

# Start all services
docker-compose up -d

# Initialize Redis with default data
docker-compose exec redis sh /init-data.sh

# Access Admin UI at http://localhost:3000
# Default credentials: admin / changeme

# Test the WAF
./scripts/test-waf.sh http://localhost:8080

Kubernetes Deployment (Helm)

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

cd helm/forms-waf
helm dependency build
helm install forms-waf . -n forms-waf --create-namespace

Feature Details

1. Content Analysis

Keyword Filtering

Blocked Keywords: Instant rejection (e.g., "viagra", "casino")
Flagged Keywords: Score-based with configurable weights
Case-insensitive matching with word boundary detection

Content Hashing

SHA256 hashing of normalized form content
Duplicate submission detection across all users
Configurable thresholds for blocking repeated content

Honeypot Field Detection

Automatically detects and scores submissions that fill hidden honeypot fields.

{
  "honeypot_fields": ["website", "url", "phone2"],
  "honeypot_action": "flag",
  "honeypot_score": 50
}

Disposable Email Detection

Blocks or flags submissions using temporary email services:

250+ built-in disposable domains
Custom domain blocklist via Redis
Configurable action (block/flag/monitor)

Enhanced Link Analysis

URL Shortener Detection: Flags bit.ly, tinyurl.com, etc. (25+ services)
Suspicious TLD Detection: Flags .xyz, .top, .click, etc.
Excessive Link Detection: Scores based on link count

2. Behavioral Detection

Form Timing Tokens

Detects bot submissions by measuring time between form load and submission:

Timing	Score	Reason
No cookie	+30	Direct POST without loading form
< 2 seconds	+40	Too fast for human
< 5 seconds	+20	Suspiciously fast
> 5 seconds	+0	Normal behavior

Enable via Admin UI: Security → Form Timing

Field Anomaly Detection

Detects suspicious patterns in form submissions:

Identical field lengths (bot-generated)
Sequential/incremental values
ALL CAPS submissions
Test data patterns ("test", "asdf", "123")
Unusually long field values

Submission Fingerprinting

Creates client fingerprints based on:

User-Agent string
Accept-Language header
Field names submitted
Request characteristics

Used for cross-request correlation and flood detection.

ML-Based Behavioral Tracking

Advanced statistical analysis of form submission patterns:

Flow-based monitoring across multi-page forms
Time-series aggregation (hour/day/week/month/year)
Duration histograms for bot detection
Baseline calculation with z-score anomaly detection
HyperLogLog for unique IP counting

See Behavioral Tracking Guide for details.

3. Threat Intelligence

IP Reputation

Checks IP addresses against multiple sources:

Provider	Description
Local Blocklist	Redis-based manual blocklist
AbuseIPDB	External API (requires API key)
Custom Webhook	Your internal reputation service

Enable via Admin UI: Security → IP Reputation

Configuration example:

{
  "enabled": true,
  "abuseipdb": {
    "enabled": true,
    "api_key": "your-api-key",
    "min_confidence": 25
  },
  "block_score": 80,
  "flag_score": 50
}

GeoIP Restrictions

Country and ASN-based access control using MaxMind GeoLite2 databases:

Country Blocking: Block specific countries
Country Allowlist: Only allow specific countries
ASN Blocking: Block specific networks
Datacenter Detection: Flag/block cloud provider IPs

Enable via Admin UI: Security → GeoIP

Setup Requirements:

Download GeoLite2 databases (free registration required)
Mount to /usr/share/GeoIP/:
- GeoLite2-Country.mmdb
- GeoLite2-ASN.mmdb

# docker-compose.yml
volumes:
  - ./geoip:/usr/share/GeoIP:ro

4. Bot Protection

CAPTCHA Integration

Supports multiple CAPTCHA providers with automatic fallback:

Provider	Features
reCAPTCHA v2	Checkbox challenge
reCAPTCHA v3	Invisible scoring
hCaptcha	Privacy-focused alternative
Cloudflare Turnstile	Frictionless verification

Configure via Admin UI: CAPTCHA → Providers and CAPTCHA → Settings

Features:

Per-endpoint CAPTCHA requirements
Trust tokens for verified users
Configurable score thresholds
Fallback chain if primary fails

Rate Limiting

Multi-layer rate limiting:

Layer	Scope	Default
OpenResty	Per-endpoint	Configurable
HAProxy	Per-IP	30/min
HAProxy	Per-hash	10/min
HAProxy	Per-fingerprint	50/min

5. Operational Features

Webhook Notifications

Send real-time notifications for WAF events:

{
  "enabled": true,
  "urls": ["https://your-webhook.example.com/waf-events"],
  "events": ["blocked", "flagged", "captcha_required"],
  "batch_size": 10,
  "batch_interval_ms": 5000
}

Configure via Admin UI: Operations → Webhooks

Audit Logging

JSON-formatted structured logs for security events:

{
  "@timestamp": "2024-01-15T10:30:00Z",
  "event_type": "request_blocked",
  "client_ip": "192.168.1.100",
  "spam_score": 85,
  "flags": ["keyword:blocked:viagra", "timing:too_fast"],
  "vhost_id": "example-com",
  "endpoint_id": "contact-form"
}

Bulk Import/Export

Import and export configurations via Admin UI: Operations → Bulk

Supported data types:

Blocked/flagged keywords
IP allowlist
Blocked hashes

Formats: JSON, CSV (keywords only)

Prometheus Metrics

Available at /metrics endpoint:

Metric	Description
`waf_requests_total`	Total requests by vhost, endpoint
`waf_requests_blocked_total`	Blocked requests
`waf_requests_monitored_total`	Monitored (would-block) requests
`waf_requests_allowed_total`	Allowed requests
`waf_spam_score_total`	Sum of spam scores
`waf_form_submissions_total`	Form submissions processed
`waf_shared_dict_bytes`	Shared dictionary memory usage

Global Metrics Aggregation: In multi-instance deployments, the leader aggregates metrics from all instances. The /api/metrics endpoint returns both local and global metrics side-by-side. See Metrics Aggregation Guide.

6. Multi-tenancy

Virtual Hosts

Configure per-domain WAF rules:

Exact hostname matching (example.com)
Wildcard support (*.example.com)
Per-vhost thresholds and routing
WAF modes: blocking, monitoring, passthrough

Configure via Admin UI: Virtual Hosts

Endpoint Configuration

Per-endpoint customization:

Setting	Description
WAF Mode	blocking, monitoring, passthrough, strict
Thresholds	Override global spam score limits
Rate Limits	Per-endpoint request limits
Field Validation	Required fields, max lengths
Expected Fields	Block unexpected field names
Honeypot Fields	Hidden fields to detect bots

Configure via Admin UI: Endpoints

Field Learning

Automatic field discovery from submissions:

Enable learning mode on endpoint
WAF records field names and infers types
Review learned fields in Admin UI
Mark fields as expected or honeypot
Enable validation to block unexpected fields

Configuration Reference

Redis Keys

Key	Type	Description
`waf:keywords:blocked`	SET	Keywords that trigger immediate block
`waf:keywords:flagged`	SET	Keywords with scores (`keyword:score`)
`waf:hashes:blocked`	SET	Content hashes to block
`waf:config:thresholds`	HASH	Global thresholds
`waf:whitelist:ips`	SET	IPs to bypass filtering
`waf:vhosts:config:{id}`	STRING	Virtual host config (JSON)
`waf:endpoints:config:{id}`	STRING	Endpoint config (JSON)
`waf:config:timing_token`	STRING	Timing token config (JSON)
`waf:config:geoip`	STRING	GeoIP config (JSON)
`waf:config:ip_reputation`	STRING	IP reputation config (JSON)
`waf:config:captcha`	STRING	CAPTCHA config (JSON)
`waf:config:webhooks`	STRING	Webhook config (JSON)
`waf:reputation:blocked_ips`	SET	Local IP blocklist
`waf:disposable_domains`	SET	Custom disposable email domains

Global Thresholds

Threshold	Default	Description
`spam_score_block`	80	Score to block immediately
`spam_score_flag`	50	Score to flag for monitoring
`hash_count_block`	10	Block after N identical submissions
`ip_rate_limit`	30	Max submissions/minute per IP
`ip_daily_limit`	500	Max submissions/day per IP

Environment Variables

Variable	Default	Description
`REDIS_HOST`	redis	Redis hostname
`REDIS_PORT`	6379	Redis port
`REDIS_PASSWORD`	-	Redis password (optional)
`REDIS_DB`	0	Redis database number
`HOSTNAME`	-	Instance ID for cluster coordination
`WAF_ADMIN_AUTH`	true	Require authentication for Admin API
`WAF_EXPOSE_HEADERS`	false	Expose debug headers in responses

Admin User Seeding

The WAF automatically seeds a default admin user on startup when environment variables are configured:

Variable	Required	Default	Description
`WAF_ADMIN_SALT`	Yes	-	32+ character random salt for password hashing
`WAF_ADMIN_PASSWORD`	No	changeme	Initial admin password

Helm Configuration:

openresty:
  adminSeed:
    enabled: true
    salt:
      existingSecret: ""  # Use existing secret in production
      value: ""           # Or provide salt directly (32+ chars)
    password:
      existingSecret: ""
      value: "changeme"   # Change immediately after deployment

Security Notes:

Change default password immediately after deployment
Use existingSecret in production for salt management
Admin user is only created if not already present in Redis

API Reference

Authentication (Port 8082)

Endpoint	Method	Description
`/api/auth/login`	POST	Login with credentials
`/api/auth/logout`	POST	End session
`/api/auth/verify`	GET	Verify session
`/api/auth/change-password`	POST	Change password

WAF Admin API (Port 8082)

All endpoints require authentication.

Core

Endpoint	Method	Description
`/api/status`	GET	WAF status
`/api/metrics`	GET	Metrics summary
`/api/sync`	POST	Force Redis sync

Keywords

Endpoint	Method	Description
`/api/keywords/blocked`	GET/POST/DELETE	Blocked keywords
`/api/keywords/flagged`	GET/POST/PUT/DELETE	Flagged keywords

Virtual Hosts

Endpoint	Method	Description
`/api/vhosts`	GET/POST	List/create vhosts
`/api/vhosts/{id}`	GET/PUT/DELETE	Manage vhost
`/api/vhosts/{id}/enable`	POST	Enable vhost
`/api/vhosts/{id}/disable`	POST	Disable vhost

Endpoints

Endpoint	Method	Description
`/api/endpoints`	GET/POST	List/create endpoints
`/api/endpoints/{id}`	GET/PUT/DELETE	Manage endpoint
`/api/endpoints/{id}/fields`	GET	Get learned fields

Security Features

Endpoint	Method	Description
`/api/timing/config`	GET/PUT	Timing token config
`/api/geoip/config`	GET/PUT	GeoIP config
`/api/geoip/lookup`	GET	Lookup IP location
`/api/reputation/config`	GET/PUT	IP reputation config
`/api/reputation/check`	GET	Check IP reputation
`/api/reputation/blocklist`	GET/POST/DELETE	Local blocklist

CAPTCHA

Endpoint	Method	Description
`/api/captcha/providers`	GET/POST	List/create providers
`/api/captcha/providers/{id}`	GET/PUT/DELETE	Manage provider
`/api/captcha/config`	GET/PUT	Global CAPTCHA settings

Operations

Endpoint	Method	Description
`/api/webhooks/config`	GET/PUT	Webhook config
`/api/webhooks/test`	POST	Test webhook
`/api/bulk/export/{type}`	GET	Export data
`/api/bulk/import/{type}`	POST	Import data

Response Headers

Debug Headers (when `WAF_EXPOSE_HEADERS=true`)

Header	Description
`X-Spam-Score`	Calculated spam score
`X-Spam-Flags`	Triggered detection flags
`X-Form-Hash`	Content hash
`X-Blocked`	Whether request was blocked
`X-Block-Reason`	Reason for blocking
`X-GeoIP-Country`	Detected country code
`X-WAF-Mode`	Current WAF mode

Project Structure

forms-waf/
├── admin-ui/                 # React Admin Dashboard
│   └── src/
│       ├── pages/
│       │   ├── security/     # Timing, GeoIP, Reputation
│       │   ├── captcha/      # CAPTCHA providers/settings
│       │   ├── webhooks/     # Webhook configuration
│       │   └── bulk/         # Import/export
├── openresty/
│   ├── Dockerfile
│   └── lua/
│       ├── waf_handler.lua         # Main WAF orchestrator (1100+ lines)
│       ├── form_parser.lua         # Multipart/JSON/urlencoded parsing
│       ├── content_hasher.lua      # SHA256 content hashing
│       ├── keyword_filter.lua      # Pattern and keyword scanning
│       ├── vhost_resolver.lua      # 3-level config hierarchy
│       ├── vhost_matcher.lua       # Hostname matching
│       ├── endpoint_matcher.lua    # Path/method matching
│       ├── behavioral_tracker.lua  # ML-based anomaly detection
│       ├── instance_coordinator.lua # Leader election and cluster coordination
│       ├── metrics.lua             # Prometheus metrics + global aggregation
│       ├── redis_sync.lua          # Bidirectional Redis sync
│       ├── timing_token.lua        # Form timing detection
│       ├── geoip.lua               # GeoIP restrictions
│       ├── ip_reputation.lua       # IP reputation checks
│       ├── captcha_handler.lua     # CAPTCHA integration
│       ├── captcha_providers.lua   # Provider implementations
│       ├── webhooks.lua            # Webhook notifications
│       ├── field_learner.lua       # Field learning system
│       ├── admin_auth.lua          # Session authentication
│       ├── rbac.lua                # Role-based access control
│       ├── sso_ldap.lua            # LDAP SSO integration
│       ├── sso_oidc.lua            # OpenID Connect integration
│       └── api_handlers/           # 18 modular API handlers
│           ├── system.lua          # /status, /metrics, /sync
│           ├── users.lua           # User management
│           ├── vhosts.lua          # Virtual host CRUD
│           ├── endpoints.lua       # Endpoint configuration
│           ├── keywords.lua        # Keyword management
│           ├── behavioral.lua      # Behavioral tracking API
│           ├── cluster.lua         # Cluster status API
│           └── ...                 # 11 more handlers
├── haproxy/
│   ├── Dockerfile
│   └── haproxy.cfg
├── helm/forms-waf/           # Helm chart
├── docs/                     # Documentation
│   ├── ARCHITECTURE.md       # Technical architecture
│   ├── RBAC.md              # Role-based access control
│   ├── BEHAVIORAL_TRACKING.md # ML anomaly detection
│   ├── CLUSTER_COORDINATION.md # Leader election
│   ├── METRICS_AGGREGATION.md  # Global metrics
│   └── SSO_*.md             # SSO integration guides
└── scripts/
    ├── test-waf.sh
    └── load-test.sh

Production Considerations

Redis HA: Use Redis Sentinel or Cluster
TLS: Enable TLS for all external communication
Admin Security:
- Change default password immediately
- Restrict Admin API access via network policies
GeoIP Databases: Set up automatic updates for MaxMind databases
Monitoring: Configure Prometheus alerts for:
- High block rates
- Unusual traffic patterns
- API errors
Logging: Ship audit logs to centralized logging system
Rate Limits: Tune based on expected traffic patterns

Troubleshooting

Common Issues

GeoIP not working:

Verify MaxMind databases are mounted at /usr/share/GeoIP/
Check logs for "geoip: mmdb library not available"

CAPTCHA verification failing:

Verify provider credentials in Admin UI
Check network connectivity to CAPTCHA provider APIs
Review timeout settings

High false positive rate:

Review spam score thresholds
Check keyword lists for overly broad terms
Enable monitoring mode to analyze before blocking

Debug Mode

Enable debug headers to troubleshoot:

# docker-compose.yml
environment:
  - WAF_EXPOSE_HEADERS=true

Then check response headers:

curl -v -X POST http://localhost:8080/submit -d "test=data"

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 90 Commits
.github/workflows		.github/workflows
admin-ui		admin-ui
docs		docs
haproxy		haproxy
helm/forms-waf		helm/forms-waf
kd-templates		kd-templates
openresty		openresty
redis		redis
scripts		scripts
.dockerignore		.dockerignore
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml

License

dobrevit/forms-waf

Folders and files

Latest commit

History

Repository files navigation

Forms WAF - Advanced Spam Protection System

Features at a Glance

Architecture Overview

Components

Cluster Coordination

Quick Start

Local Development (Docker Compose)

Kubernetes Deployment (Helm)

Feature Details

1. Content Analysis

Keyword Filtering

Content Hashing

Honeypot Field Detection

Disposable Email Detection

Enhanced Link Analysis

2. Behavioral Detection

Form Timing Tokens

Field Anomaly Detection

Submission Fingerprinting

ML-Based Behavioral Tracking

3. Threat Intelligence

IP Reputation

GeoIP Restrictions

4. Bot Protection

CAPTCHA Integration

Rate Limiting

5. Operational Features

Webhook Notifications

Audit Logging

Bulk Import/Export

Prometheus Metrics

6. Multi-tenancy

Virtual Hosts

Endpoint Configuration

Field Learning

Configuration Reference

Redis Keys

Global Thresholds

Environment Variables

Admin User Seeding

API Reference

Authentication (Port 8082)

WAF Admin API (Port 8082)

Core

Keywords

Virtual Hosts

Endpoints

Security Features

CAPTCHA

Operations

Response Headers

Debug Headers (when WAF_EXPOSE_HEADERS=true)

Project Structure

Production Considerations

Troubleshooting

Common Issues

Debug Mode

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors 4

Uh oh!

Languages

Debug Headers (when `WAF_EXPOSE_HEADERS=true`)

Packages