security: add input validation, rate limiting, and prototype pollution guard#718
Conversation
…n guard - Add src/utils/message_validator.js: validates and sanitizes Discord/Minecraft messages and usernames; detects command injection, Unicode exploits, and protocol-level attack strings - Add src/utils/rate_limiter.js: per-user sliding-window rate limiter with automatic stale-entry cleanup to prevent memory leaks - Add deepSanitize() to settings.js: recursively strips __proto__, constructor, and prototype keys from SETTINGS_JSON before merging, preventing prototype pollution via environment variables - Integrate message validation in src/agent/agent.js respondFunc: rejects invalid usernames and sanitises messages before they reach the LLM pipeline
There was a problem hiding this comment.
Pull request overview
This PR aims to harden inbound/untrusted inputs across the app by adding message sanitization utilities, a prototype-pollution guard for SETTINGS_JSON, and a (currently standalone) in-memory rate limiter, plus integrating Minecraft message/username validation into the agent’s inbound chat handling.
Changes:
- Added
src/utils/message_validator.jswith Discord/Minecraft message and username validation/sanitization helpers. - Added
src/utils/rate_limiter.jsimplementing a sliding-window per-user rate limiter with periodic stale-entry cleanup. - Integrated Minecraft message/username validation into
src/agent/agent.jsand addeddeepSanitize()tosettings.jsto strip prototype-polluting keys fromSETTINGS_JSONbefore merging.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/utils/rate_limiter.js | Introduces RateLimiter utility, but no integration/callers found in src/. |
| src/utils/message_validator.js | Adds validators/sanitizers; includes some validation-order issues and API/behavior mismatches with PR description. |
| src/agent/agent.js | Applies username/message validation and propagates sanitized Minecraft messages downstream. |
| settings.js | Adds recursive sanitization to prevent prototype pollution from SETTINGS_JSON before Object.assign. |
Comments suppressed due to low confidence (1)
src/agent/agent.js:24
- PR description says validators return
{ valid, message, cleanMessage }, but the implementation returns{ valid, sanitized, ... }(and the agent integration expects.sanitized). Either update the API to match the documented contract or update the PR description / JSDoc so callers have a consistent, stable shape.
import { validateMinecraftMessage, validateUsername } from '../utils/message_validator.js';
export class Agent {
async start(load_mem=false, init_message=null, count_id=0) {
this.last_sender = null;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| export function validateDiscordMessage(message) { | ||
| if (!message) return { valid: false, error: 'Empty message' }; | ||
| if (typeof message !== 'string') return { valid: false, error: 'Message must be a string' }; | ||
| if (message.length > MAX_DISCORD_MESSAGE_LENGTH) { | ||
| return { valid: false, error: `Message exceeds ${MAX_DISCORD_MESSAGE_LENGTH} characters` }; | ||
| } |
There was a problem hiding this comment.
validateDiscordMessage checks !message before validating typeof message === 'string', so non-string falsy values (e.g. 0) will be reported as "Empty message" instead of a type error. Consider checking the type first, then treating empty/whitespace-only strings as empty after trimming.
| export function validateMinecraftMessage(message) { | ||
| if (!message) return { valid: false, error: 'Empty message' }; | ||
| if (typeof message !== 'string') return { valid: false, error: 'Message must be a string' }; | ||
| if (message.length > MAX_MINECRAFT_MESSAGE_LENGTH) { | ||
| return { valid: false, error: `Message exceeds ${MAX_MINECRAFT_MESSAGE_LENGTH} characters` }; | ||
| } |
There was a problem hiding this comment.
validateMinecraftMessage checks !message before checking typeof message === 'string', so non-string falsy inputs will be misclassified as "Empty message". Reorder the validations (type check first), and consider trimming/collapsing whitespace after replacing newlines to avoid producing leading spaces.
| if (!username) return { valid: false, error: 'Username is empty' }; | ||
| if (typeof username !== 'string') return { valid: false, error: 'Username must be a string' }; | ||
| if (!/^[a-zA-Z0-9_]{3,16}$/.test(username)) { |
There was a problem hiding this comment.
validateUsername checks !username before validating the input type, so a non-string falsy value would return "Username is empty" rather than "Username must be a string". Reorder validations to check typeof username first, then validate emptiness/format on the trimmed string.
| if (!username) return { valid: false, error: 'Username is empty' }; | |
| if (typeof username !== 'string') return { valid: false, error: 'Username must be a string' }; | |
| if (!/^[a-zA-Z0-9_]{3,16}$/.test(username)) { | |
| if (typeof username !== 'string') return { valid: false, error: 'Username must be a string' }; | |
| const trimmedUsername = username.trim(); | |
| if (!trimmedUsername) return { valid: false, error: 'Username is empty' }; | |
| if (!/^[a-zA-Z0-9_]{3,16}$/.test(trimmedUsername)) { |
| const cleanMessage = msgValidation.sanitized; | ||
|
|
||
| if (settings.only_chat_with.length > 0 && !settings.only_chat_with.includes(username)) return; | ||
| try { | ||
| if (ignore_messages.some((m) => message.startsWith(m))) return; | ||
| if (ignore_messages.some((m) => cleanMessage.startsWith(m))) return; | ||
|
|
There was a problem hiding this comment.
cleanMessage may start with a space after sanitization (e.g. when the raw message begins with \n/\r and those are replaced with spaces). That can bypass the ignore_messages.some(m => cleanMessage.startsWith(m)) filter. Consider trimming (or at least trimStart()) the sanitized message before startsWith checks and downstream processing.
| /** | ||
| * Simple in-memory rate limiter to prevent abuse. | ||
| * Includes automatic stale entry cleanup to prevent memory leaks. | ||
| */ | ||
|
|
||
| export class RateLimiter { | ||
| constructor(maxRequests = 5, windowMs = 60000, cleanupIntervalMs = 300000) { | ||
| this.maxRequests = maxRequests; // Max requests per window | ||
| this.windowMs = windowMs; // Time window in milliseconds | ||
| this.requests = new Map(); // userId → [timestamps] | ||
|
|
||
| // Periodically purge stale entries to prevent unbounded memory growth | ||
| this._cleanupInterval = setInterval(() => this._purgeStale(), cleanupIntervalMs); | ||
| // Allow the process to exit even if the interval is still running | ||
| if (this._cleanupInterval.unref) { | ||
| this._cleanupInterval.unref(); | ||
| } | ||
| } |
There was a problem hiding this comment.
RateLimiter is introduced but not referenced anywhere in src/ (no callers found), so this PR doesn't actually apply rate limiting as described in the PR summary/title. Either integrate it at the appropriate ingress (e.g., message receive path) or adjust the PR description/title to reflect the current scope.
| /** | ||
| * Recursively strips prototype-polluting keys from an object. | ||
| * Prevents __proto__, constructor, and prototype injection at any depth | ||
| * when parsing untrusted JSON (e.g. SETTINGS_JSON env var). | ||
| */ | ||
| function deepSanitize(obj) { | ||
| const BLOCKED_KEYS = new Set(['__proto__', 'constructor', 'prototype']); |
There was a problem hiding this comment.
deepSanitize recreates BLOCKED_KEYS on every recursive call, which is unnecessary work (especially on large nested inputs). Move BLOCKED_KEYS to a module-level constant (or at least outside the function body) so recursion reuses the same set.
| /** | |
| * Recursively strips prototype-polluting keys from an object. | |
| * Prevents __proto__, constructor, and prototype injection at any depth | |
| * when parsing untrusted JSON (e.g. SETTINGS_JSON env var). | |
| */ | |
| function deepSanitize(obj) { | |
| const BLOCKED_KEYS = new Set(['__proto__', 'constructor', 'prototype']); | |
| const BLOCKED_KEYS = new Set(['__proto__', 'constructor', 'prototype']); | |
| /** | |
| * Recursively strips prototype-polluting keys from an object. | |
| * Prevents __proto__, constructor, and prototype injection at any depth | |
| * when parsing untrusted JSON (e.g. SETTINGS_JSON env var). | |
| */ | |
| function deepSanitize(obj) { |
3/3/2026
PR #716 consolidated everything from #710, #714, #717, and #718.
This PR was superseded by #716.
Summary
Adds three complementary security layers that harden Mindcraft against malicious or malformed user input — without changing any existing behaviour for well-formed traffic.
src/utils/message_validator.jssrc/utils/rate_limiter.jssettings.js__proto__,constructor, andprototypekeys fromSETTINGS_JSONbefore mergingDetails
src/utils/message_validator.js(new)validateDiscordMessage(),validateMinecraftMessage(),validateUsername(){ valid, error?, sanitized?, warnings? }so callers get both a pass/fail flag and a sanitised stringrm,curl,wget, backtick execution,$(...)substitution, pipe-to-shell)\x00-\x1F, BOM) and replaces null bytes / newlines in Minecraft messages[a-zA-Z0-9_]{3,16}formatsrc/utils/rate_limiter.js(new)RateLimiterclass — sliding-window algorithm, default 5 requests / 60 s per user_purgeStale()runs on a background interval (default 5 min) to reclaim memory from inactive usersinterval.unref()so the timer never prevents clean process exitgetStats()for observability;destroy()for clean shutdown in testssettings.js—deepSanitize(){ __proto__, constructor, prototype }beforeObject.assignJSON.parse(process.env.SETTINGS_JSON)so no polluted key ever reachessettingssrc/agent/agent.js— integrationvalidateMinecraftMessageandvalidateUsernamerespondFuncnow validates the username and message early; rejects with a console warning if either is invalidcleanMessageinstead of the raw stringTesting
npx eslintwith zero new warningsno-undefforprocess,semiin settings object literal,require-await)Backward Compatibility
deepSanitizeis a no-op when the JSON contains no prototype-polluting keys (i.e. all current usage)