Don't use form fields for fingerprint profiling.
Pull request overview
This PR refactors fingerprint profiling to remove form field names from fingerprints and introduces a new header consistency validation system for improved bot detection. The changes enhance the WAF's ability to identify spoofed browsers and suspicious clients by analyzing HTTP header patterns.
- Removed the include_field_names parameter from fingerprint generation to avoid using form data in fingerprints
- Added a new header_consistency.lua module with User-Agent parsing and browser header validation
- Introduced multiple new fingerprint profiles for monitoring bots, modern browsers, mobile apps, API clients, and suspicious patterns
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| openresty/lua/header_consistency.lua | New module for User-Agent parsing and header consistency validation using lua-resty-woothee |
| openresty/lua/fingerprint_profiles.lua | Removed form field fingerprinting, added 8 new built-in profiles, added request context classification functions |
| openresty/lua/waf_handler.lua | Integrated header consistency checks into request processing pipeline |
| openresty/lua/api_handlers/fingerprint_profiles.lua | Removed include_field_names from default profile configuration |
| openresty/Dockerfile | Added lua-resty-woothee dependency installation via luarocks |
| admin-ui/src/pages/security/FingerprintProfiles.tsx | Removed UI form field for include_field_names toggle |
| admin-ui/src/api/types.ts | Removed include_field_names from TypeScript interface |
| matching = { | ||
| conditions = { | ||
| -- Chrome 80+ should have Sec-Fetch headers (introduced in Chrome 76) | ||
| { header = "User-Agent", condition = "matches", pattern = "Chrome/([89][0-9]|1[0-2][0-9])" }, |
The regex pattern Chrome/([89][0-9]|1[0-2][0-9]) matches Chrome major versions 80 through 129 only. Chrome has already passed version 129, so current releases no longer match this profile. Widen the upper bound, for example with Chrome/([89][0-9]|1[0-9]{2}|[2-9][0-9]{2}), which covers versions 80 through 999.
| { header = "User-Agent", condition = "matches", pattern = "Chrome/([89][0-9]|1[0-2][0-9])" }, | |
| { header = "User-Agent", condition = "matches", pattern = "Chrome/([89][0-9]|1[0-9]{2}|[2-9][0-9]{2})" }, |
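An alternative to widening the regex is to extract the major version and compare it numerically, so the check has no upper bound to maintain. The following is a minimal sketch in plain Lua, not code from this PR: chrome_major and is_chrome_80_plus are hypothetical helpers, and wiring them into the declarative conditions table would need a custom condition type that the PR may not have.

```lua
-- Hypothetical helper (not part of the PR): returns the Chrome major
-- version found in a User-Agent string as a number, or nil if absent.
local function chrome_major(user_agent)
  if type(user_agent) ~= "string" then
    return nil
  end
  -- Lua pattern: literal "Chrome/" followed by one or more digits.
  local major = user_agent:match("Chrome/(%d+)")
  return major and tonumber(major) or nil
end

-- Hypothetical helper: true when the User-Agent claims Chrome 80 or newer.
local function is_chrome_80_plus(user_agent)
  local major = chrome_major(user_agent)
  return major ~= nil and major >= 80
end
```

Note that other Chromium-based browsers also send a Chrome/ token, so this sketch, like the regex, treats them as Chrome.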
| -- Java libraries | ||
| "java|httpclient|okhttp|apache-httpclient|spring-resttemplate|restassured|" .. | ||
| -- PHP libraries | ||
| "guzzle|guzzlehttp|symfony.*http|" .. |
In symfony.*http the unescaped .* matches any run of characters, so this alternative matches any User-Agent that contains symfony followed anywhere later by http. If the intent is to match a single separator between the two words, use symfony[.-]http; to require a literal dot after symfony, escape it as symfony\\..*http. Both tighter forms stop matching user agents that put a space or nothing between the two words, so check the pattern against real User-Agent strings before changing it.
| "guzzle|guzzlehttp|symfony.*http|" .. | |
| "guzzle|guzzlehttp|symfony[./-].*http|" .. |
| "jmeter|apache-jmeter|wrk|ab\\/|apachebench|bombardier|k6|locust|artillery|vegeta|" .. | ||
| -- API testing tools | ||
| "postman|insomnia|httpie|paw\\/|" .. |
The pattern ab\/ escapes the forward slash, which is unnecessary. The pattern is passed as a plain string rather than a /-delimited regex literal (as in JavaScript), so the slash has no special meaning. Simplify it to ab/ for readability; the same applies to paw\/.
| "jmeter|apache-jmeter|wrk|ab\\/|apachebench|bombardier|k6|locust|artillery|vegeta|" .. | |
| -- API testing tools | |
| "postman|insomnia|httpie|paw\\/|" .. | |
| "jmeter|apache-jmeter|wrk|ab/|apachebench|bombardier|k6|locust|artillery|vegeta|" .. | |
| -- API testing tools | |
| "postman|insomnia|httpie|paw/|" .. |
| } | ||
| -- Expected header profiles per browser family | ||
| -- Based on actual browser behavior in 2024/2025 |
The comment states "Based on actual browser behavior in 2024/2025" but this creates a maintenance issue as the comment will quickly become outdated. Consider either removing the year references or updating to say "Based on current browser behavior" or "Based on modern browser behavior (as of 2024/2025)".
| -- Based on actual browser behavior in 2024/2025 | |
| -- Based on current browser behavior |
| { header = "User-Agent", condition = "matches", pattern = "(?i)(curl|wget|" .. | ||
| -- Python libraries | ||
| "python-requests|python-urllib|aiohttp|httpx|urllib3|requests-html|scrapy|beautifulsoup|" .. | ||
| -- JavaScript/Node libraries | ||
| "axios|node-fetch|superagent|got|undici|cheerio|" .. | ||
| -- Java libraries | ||
| "java|httpclient|okhttp|apache-httpclient|spring-resttemplate|restassured|" .. | ||
| -- PHP libraries | ||
| "guzzle|guzzlehttp|symfony.*http|" .. | ||
| -- Go libraries | ||
| "go-http-client|fasthttp|go-resty|" .. | ||
| -- Rust libraries | ||
| "reqwest|hyper-client|" .. | ||
| -- Ruby/Perl | ||
| "ruby|perl|libwww|mechanize|httparty|" .. | ||
| -- Load testing tools | ||
| "jmeter|apache-jmeter|wrk|ab\\/|apachebench|bombardier|k6|locust|artillery|vegeta|" .. | ||
| -- API testing tools | ||
| "postman|insomnia|httpie|paw\\/|" .. | ||
| -- Generic patterns | ||
| "http-client|httpclient)" }, |
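The User-Agent pattern is a single alternation with roughly 50 alternatives, which is hard to maintain and could slow regex matching. Several entries are redundant because the match is unanchored: httpclient appears twice and already covers apache-httpclient, guzzle covers guzzlehttp, jmeter covers apache-jmeter, and http-client covers go-http-client. Consider removing the redundant entries and breaking the list into smaller, more specific groups. The suggestion below splits it into one condition per group; this preserves behavior only if the conditions in a profile are ORed. If they are ANDed, each group needs its own profile instead.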
| { header = "User-Agent", condition = "matches", pattern = "(?i)(curl|wget|" .. | |
| -- Python libraries | |
| "python-requests|python-urllib|aiohttp|httpx|urllib3|requests-html|scrapy|beautifulsoup|" .. | |
| -- JavaScript/Node libraries | |
| "axios|node-fetch|superagent|got|undici|cheerio|" .. | |
| -- Java libraries | |
| "java|httpclient|okhttp|apache-httpclient|spring-resttemplate|restassured|" .. | |
| -- PHP libraries | |
| "guzzle|guzzlehttp|symfony.*http|" .. | |
| -- Go libraries | |
| "go-http-client|fasthttp|go-resty|" .. | |
| -- Rust libraries | |
| "reqwest|hyper-client|" .. | |
| -- Ruby/Perl | |
| "ruby|perl|libwww|mechanize|httparty|" .. | |
| -- Load testing tools | |
| "jmeter|apache-jmeter|wrk|ab\\/|apachebench|bombardier|k6|locust|artillery|vegeta|" .. | |
| -- API testing tools | |
| "postman|insomnia|httpie|paw\\/|" .. | |
| -- Generic patterns | |
| "http-client|httpclient)" }, | |
| -- Core CLI tools | |
| { header = "User-Agent", condition = "matches", pattern = "(?i)(curl|wget)" }, | |
| -- Python libraries | |
| { header = "User-Agent", condition = "matches", pattern = "(?i)(python-requests|python-urllib|aiohttp|httpx|urllib3|requests-html|scrapy|beautifulsoup)" }, | |
| -- JavaScript/Node libraries | |
| { header = "User-Agent", condition = "matches", pattern = "(?i)(axios|node-fetch|superagent|got|undici|cheerio)" }, | |
| -- Java libraries | |
| { header = "User-Agent", condition = "matches", pattern = "(?i)(java|httpclient|okhttp|apache-httpclient|spring-resttemplate|restassured)" }, | |
| -- PHP and Go libraries | |
| { header = "User-Agent", condition = "matches", pattern = "(?i)(guzzle|guzzlehttp|symfony.*http|go-http-client|fasthttp|go-resty)" }, | |
| -- Rust, Ruby, Perl and related HTTP libraries | |
| { header = "User-Agent", condition = "matches", pattern = "(?i)(reqwest|hyper-client|ruby|perl|libwww|mechanize|httparty)" }, | |
| -- Load testing tools | |
| { header = "User-Agent", condition = "matches", pattern = "(?i)(jmeter|apache-jmeter|wrk|ab\\/|apachebench|bombardier|k6|locust|artillery|vegeta)" }, | |
| -- API testing tools | |
| { header = "User-Agent", condition = "matches", pattern = "(?i)(postman|insomnia|httpie|paw\\/)" }, | |
| -- Generic HTTP client patterns | |
| { header = "User-Agent", condition = "matches", pattern = "(?i)(http-client|httpclient)" }, |
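Another way to keep the list maintainable is to hold the client names in a table and test each one as a plain substring. This is a minimal sketch of that alternative technique, not the PR's matching engine: CLIENT_TOKENS and find_client_token are hypothetical names, the token list is only a subset of the names in the pattern above, and whether this is faster than one compiled alternation would need measuring.

```lua
-- Hypothetical token list (a subset of the names in the pattern above).
-- All tokens are lowercase because the User-Agent is lowercased first.
local CLIENT_TOKENS = {
  "curl", "wget", "python-requests", "okhttp",
  "guzzle", "go-http-client", "postman",
}

-- Hypothetical helper: returns the first token contained in the
-- User-Agent (case-insensitively), or nil when none is present.
local function find_client_token(user_agent)
  if type(user_agent) ~= "string" then
    return nil
  end
  local ua = user_agent:lower()
  for _, token in ipairs(CLIENT_TOKENS) do
    -- The fourth argument (plain = true) turns off Lua pattern magic,
    -- so "-" in a token is matched literally.
    if ua:find(token, 1, true) then
      return token
    end
  end
  return nil
end
```

Returning the matched token rather than a boolean makes it possible to log which client family triggered the profile.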
| -- Extract numeric version | ||
| local version = nil | ||
| if result.version and result.version ~= "UNKNOWN" then | ||
| version = tonumber(result.version:match("^(%d+)")) |
The version extraction uses tonumber(result.version:match("^(%d+)")), which raises a runtime error if result.version is not a string, because the match method is only available on strings. The check result.version ~= "UNKNOWN" suggests a string is expected, but the woothee library could return an unexpected value type. Consider adding a type check or calling tostring(result.version) before match.
| version = tonumber(result.version:match("^(%d+)")) | |
| local version_str = tostring(result.version) | |
| local major = version_str:match("^(%d+)") | |
| if major then | |
| version = tonumber(major) | |
| end |
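The same guard can be packaged as a small function so it is easy to test in isolation. This is a minimal sketch under the assumption that the version field is either nil, the string "UNKNOWN", a version string, or (unexpectedly) a number; major_version is a hypothetical name, not a function from the PR or from lua-resty-woothee.

```lua
-- Hypothetical helper: extracts the leading major version as a number.
-- Returns nil for nil, "UNKNOWN", or values without leading digits.
local function major_version(raw)
  if raw == nil or raw == "UNKNOWN" then
    return nil
  end
  -- tostring() makes the call safe for non-string values such as numbers.
  local major = tostring(raw):match("^(%d+)")
  return major and tonumber(major) or nil
end
```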
| -- Mobile SDKs | ||
| "react.native|flutter|expo|cordova|ionic|capacitor|" .. | ||
| -- Native app patterns (app name followed by version) | ||
| "[a-z]+app\\/[0-9])" }, |
The pattern [a-z]+app\\/[0-9] at the end looks as if it matches only lowercase app names, which would miss user agents such as MyApp/1 or TestApp/2. Because the whole pattern starts with the (?i) flag, [a-z] already matches uppercase letters, so no change is needed for case handling; writing [a-zA-Z]+app/[0-9] would only make the intent explicit. The escaped slash is unnecessary and can be written as a plain /.
| local function get_header(ngx_vars, header_name) | ||
| local var_name = "http_" .. header_name:lower():gsub("-", "_") | ||
| return ngx_vars[var_name] | ||
| end |
The get_header function duplicates the logic of get_header_value function from fingerprint_profiles.lua. Both functions convert header names to nginx variable format identically. Consider extracting this to a shared utility module to avoid code duplication and ensure consistent behavior across modules.
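One possible shape for the shared utility is sketched below. The module table _M, the function var_name, and the idea of a separate header utilities file are assumptions for illustration; only get_header and its behavior come from the PR. The sketch escapes the hyphen as %- because - is a quantifier in Lua patterns, which keeps the intent unambiguous.

```lua
-- Hypothetical shared module body (file name and _M are assumptions).
local _M = {}

-- Converts a header name such as "Sec-Fetch-Site" to the nginx
-- variable name "http_sec_fetch_site".
function _M.var_name(header_name)
  -- The extra parentheses drop gsub's second return value (the count).
  return "http_" .. (header_name:lower():gsub("%-", "_"))
end

-- Looks up a request header through an ngx.var-like table.
function _M.get_header(ngx_vars, header_name)
  return ngx_vars[_M.var_name(header_name)]
end

-- In a real module file, the last line would be: return _M
```

Both header_consistency.lua and fingerprint_profiles.lua could then require this module instead of keeping their own copies.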
| -- Load testing tools | ||
| "jmeter|apache-jmeter|wrk|ab\\/|apachebench|bombardier|k6|locust|artillery|vegeta|" .. | ||
| -- API testing tools | ||
| "postman|insomnia|httpie|paw\\/|" .. |
The pattern paw\/ escapes the forward slash, which is unnecessary because the pattern is a plain string, not a /-delimited regex literal. Simplify it to paw/ for readability.
| "postman|insomnia|httpie|paw\\/|" .. | |
| "postman|insomnia|httpie|paw/|" .. |
| opm get ledgetech/lua-resty-http && \ | ||
| opm get anjia0532/lua-resty-maxminddb && \ | ||
| opm get zmartzone/lua-resty-openidc && \ | ||
| luarocks-5.1 install lua-resty-woothee && \ |
The new luarocks-5.1 install lua-resty-woothee step downloads and installs a third-party Lua module at build time without any pinning (version, commit hash, or checksum) or integrity verification. This exposes the image build to a supply-chain attack if the LuaRocks index or the package is compromised or replaced. Code fetched this way can run during the Docker build with access to the build context and to any secrets the build exposes (for example through CI/CD or future extensions), so an attacker who controls the package could run arbitrary code, alter artifacts, or exfiltrate secrets. To mitigate this, pin lua-resty-woothee to a specific, vetted version (luarocks install accepts a version after the package name) or content hash and, where possible, verify its integrity with a checksum or vendor the module rather than installing the latest mutable release.