feat: add diagnostic logging to agent and enrollment endpoint#7
feat: add diagnostic logging to agent and enrollment endpoint#7TerrifiedBug merged 9 commits intomainfrom
Conversation
Greptile SummaryThis PR adds diagnostic logging across the Go agent (HTTP client, enrollment flow, poll/heartbeat cycle) and always-on audit logging to the server-side enrollment endpoint, making Key changes:
Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant A as Agent (enrollment.go)
participant C as Client (client.go)
participant S as Server (route.ts)
participant DB as PostgreSQL
Note over A: slog.Debug "no existing node token"
Note over A: slog.Debug "enrolling" (tokenPrefix, hostname, os)
A->>C: Enroll(EnrollRequest)
Note over C: slog.Debug "http request POST /api/agent/enroll"
C->>S: POST /api/agent/enroll {token, hostname, os, agentVersion, vectorVersion}
Note over S: console.error on invalid input (always-on)
Note over S: console.log attempt from hostname/agentVersion (always-on)
S->>DB: findMany environments with enrollmentTokenHash
Note over S: console.log found N candidate environments (always-on)
S->>S: bcrypt.compare(token, env.enrollmentTokenHash) per env
alt No match
Note over S: console.error REJECTED (always-on)
S-->>C: 401 {error: "Invalid enrollment token"}
Note over C: slog.Debug "http response" status=401 + body
C-->>A: error
else Match found
S->>DB: vectorNode.create(hostname, nodeTokenHash, ...)
Note over S: console.log SUCCESS node enrolled (always-on)
S-->>C: 200 {nodeId, nodeToken, environmentId, environmentName}
Note over C: slog.Debug "http response" status=200
C-->>A: EnrollResponse
end
Note over A: slog.Info "enrolled successfully" (nodeId, environment)
A->>A: WriteFile(nodeToken)
Last reviewed commit: b61070c |
agent/internal/client/client.go
Outdated
| } | ||
| httpReq.Header.Set("Authorization", "Bearer "+c.nodeToken) | ||
|
|
||
| slog.Debug("http request", "method", "GET", "url", c.baseURL+"/api/agent/config", "tokenPrefix", tokenPrefix(c.nodeToken, 16)) |
There was a problem hiding this comment.
Node token prefix leaked into debug logs
c.nodeToken is the per-node bearer token — the plaintext value is returned only once at enrollment, bcrypt-hashed in the database, and used to authenticate every subsequent API call. Logging even 16 characters of it in a debug log violates the project's security rule ("Never log or expose decrypted secret values") and narrows the keyspace should debug logs be captured by an attacker.
The other two HTTP methods (Enroll, SendHeartbeat) correctly avoid logging the token, making this inconsistent as well.
| slog.Debug("http request", "method", "GET", "url", c.baseURL+"/api/agent/config", "tokenPrefix", tokenPrefix(c.nodeToken, 16)) | |
| slog.Debug("http request", "method", "GET", "url", c.baseURL+"/api/agent/config") |
Context Used: Rule from dashboard - ## Security & Cryptography Review Rules
When reviewing changes to authentication, authorization, en... (source)
src/app/api/agent/enroll/route.ts
Outdated
| } else { | ||
| console.log(`[enroll] env "${env.name}" -- token mismatch`); | ||
| } |
There was a problem hiding this comment.
Always-on logging enumerates all environment names on failed enrollment
This console.log fires for every environment whose token doesn't match the submitted one. For a deployment with many environments, a single failed enrollment (or a brute-force attempt) will emit one log line per environment, each containing the environment name. Since this is console.log (always-on, not debug-gated), it will appear in production logs unconditionally.
Two concerns:
- Log volume: a fleet attacker repeatedly attempting enrollment will flood production logs with per-environment mismatch lines.
- Information leakage: anyone with read access to server logs can enumerate all environment names from a single failed attempt.
The [enroll] REJECTED line below already captures the count of environments checked. Consider removing the per-environment mismatch lines entirely, or downgrading them to a single summary line.
agent/internal/agent/enrollment.go
Outdated
| hostname, _ := os.Hostname() | ||
| vectorVersion := detectVectorVersion(cfg.VectorBin) | ||
|
|
||
| slog.Debug("enrolling", "hostname", hostname, "os", runtime.GOOS+"/"+runtime.GOARCH, "tokenPrefix", cfg.Token[:min(len(cfg.Token), 24)]+"...") |
There was a problem hiding this comment.
Token prefix always appends "..." even when the full token is logged
When len(cfg.Token) <= 24, cfg.Token[:min(len(cfg.Token), 24)] returns the entire token, and then "..." is appended — making the log output look truncated when it actually contains the complete value. A short enrollment token (e.g. during testing) would be fully exposed while the trailing "..." misleads a log reviewer into believing it was safely redacted.
The tokenPrefix helper in client.go handles this correctly (no "..." for short tokens), but it's unexported. The inline equivalent here should replicate that guard:
prefix := cfg.Token
if len(cfg.Token) > 24 {
prefix = cfg.Token[:24] + "..."
}
slog.Debug("enrolling", "hostname", hostname, "os", runtime.GOOS+"/"+runtime.GOARCH, "tokenPrefix", prefix)- Remove node token prefix from GetConfig debug log (security) - Fix enrollment token prefix guard for short tokens - Remove per-environment name logging on mismatch (info leak)
agent/internal/client/client.go
Outdated
| // tokenPrefix returns the first N chars of a token for safe debug logging. | ||
| func tokenPrefix(token string, n int) string { | ||
| if len(token) <= n { | ||
| return token | ||
| } | ||
| return token[:n] + "..." | ||
| } |
There was a problem hiding this comment.
tokenPrefix helper is dead code
The function is defined here but is never called anywhere in the client package. The previous thread flagged that c.nodeToken was being logged, and it appears the intended fix was to use tokenPrefix(c.nodeToken, 16) before logging it — but the logging of c.nodeToken was removed entirely, leaving this helper unreachable. Either wire it up where intended, or remove it to keep the package clean.
src/app/api/agent/enroll/route.ts
Outdated
| const tokenPrefix = token.slice(0, 24) + "..."; | ||
| console.log(`[enroll] attempt from hostname="${hostname}" tokenPrefix="${tokenPrefix}" agentVersion="${agentVersion ?? "unknown"}"`); |
There was a problem hiding this comment.
Enrollment token prefix: unconditional ellipsis + always-on production log
Two issues on these lines:
.slice(0, 24) + "..."always appends"..."regardless of token length. For a short enrollment token (e.g. ≤ 24 chars), the full value is written to logs while the trailing"..."misleads a log reviewer into thinking it was safely redacted. The GotokenPrefixhelper added inclient.goguards against this correctly — the same guard is missing here. A fix would be:
const raw = token.slice(0, 24);
const tokenPrefix = token.length > 24 ? raw + "..." : raw;
- More critically, this
console.logis always-on — it fires in production on every enrollment attempt, unconditionally writing a portion of the enrollment token to server logs. Per the project security policy, enrollment tokens are encrypted at rest and are sensitive credentials; even a prefix should not appear in non-debug production logs. Consider droppingtokenPrefixfrom this log line, or gating the entire line behind a debug flag.
Context Used: Rule from dashboard - ## Security & Cryptography Review Rules
When reviewing changes to authentication, authorization, en... (source)
Remove dead tokenPrefix helper from client.go and strip enrollment token prefix from always-on production log in enrollment endpoint.
src/app/api/agent/enroll/route.ts
Outdated
| } | ||
|
|
||
| const { token, hostname, os, agentVersion, vectorVersion } = parsed.data; | ||
| console.log(`[enroll] attempt from hostname="${hostname}" agentVersion="${agentVersion ?? "unknown"}"`); |
There was a problem hiding this comment.
Log injection via user-controlled fields
hostname (and agentVersion) are attacker-controlled strings validated only for length by Zod — not sanitized for newlines or other control characters. Embedding them directly in a console.log template literal allows an unauthenticated attacker to forge arbitrary log lines.
Example payload:
{
"token": "anything",
"hostname": "attacker-host\"\n[enroll] SUCCESS -- node abc123 enrolled in \"production\""
}This would emit the following to production logs:
[enroll] attempt from hostname="attacker-host"
[enroll] SUCCESS -- node abc123 enrolled in "production"
A forged SUCCESS entry could mislead an operator monitoring enrollment activity or contaminate a SIEM alert pipeline. Since this endpoint requires no prior authentication, the injection is trivially reachable.
Consider stripping or escaping control characters before interpolating user-supplied strings into log messages:
const safeHostname = hostname.replace(/[\r\n\t]/g, " ");
const safeVersion = (agentVersion ?? "unknown").replace(/[\r\n\t]/g, " ");
console.log(`[enroll] attempt from hostname="${safeHostname}" agentVersion="${safeVersion}"`);Context Used: Rule from dashboard - ## Security & Cryptography Review Rules
When reviewing changes to authentication, authorization, en... (source)
Strip newlines and tabs from hostname and agentVersion before interpolating into log messages to prevent log injection attacks on the unauthenticated enrollment endpoint.
src/app/api/agent/enroll/route.ts
Outdated
| const safeHostname = hostname.replace(/[\r\n\t]/g, " "); | ||
| const safeVersion = (agentVersion ?? "unknown").replace(/[\r\n\t]/g, " "); | ||
| console.log(`[enroll] attempt from hostname="${safeHostname}" agentVersion="${safeVersion}"`); |
There was a problem hiding this comment.
Double-quote characters bypass sanitization, enabling intra-line field forgery
The sanitization strips \r, \n, and \t to block newline injection, but the log template wraps field values in double quotes (hostname="...", agentVersion="..."). An attacker who submits a hostname containing literal " characters can close one field and forge additional key-value pairs in the same log line without any newlines:
hostname: 'real-host" customField="injected'
→ [enroll] attempt from hostname="real-host" customField="injected" agentVersion="..."
This allows fabricating structured log fields on a single line, which can mislead log-analysis tools or SIEM rules that parse this log format. A consistent fix is to also strip (or escape) " from user-supplied values:
| const safeHostname = hostname.replace(/[\r\n\t]/g, " "); | |
| const safeVersion = (agentVersion ?? "unknown").replace(/[\r\n\t]/g, " "); | |
| console.log(`[enroll] attempt from hostname="${safeHostname}" agentVersion="${safeVersion}"`); | |
| const safeHostname = hostname.replace(/[\r\n\t"]/g, " "); | |
| const safeVersion = (agentVersion ?? "unknown").replace(/[\r\n\t"]/g, " "); | |
| console.log(`[enroll] attempt from hostname="${safeHostname}" agentVersion="${safeVersion}"`); |
Context Used: Rule from dashboard - ## Security & Cryptography Review Rules
When reviewing changes to authentication, authorization, en... (source)
| slog.Debug("http request", "method", "POST", "url", c.baseURL+"/api/agent/enroll") | ||
| resp, err := c.httpClient.Do(httpReq) | ||
| if err != nil { | ||
| slog.Debug("http error", "method", "POST", "url", "/api/agent/enroll", "error", err) | ||
| return nil, fmt.Errorf("enroll request failed: %w", err) | ||
| } | ||
| defer resp.Body.Close() | ||
|
|
||
| if resp.StatusCode != http.StatusOK { | ||
| respBody, _ := io.ReadAll(resp.Body) | ||
| slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", resp.StatusCode, "body", string(respBody)) | ||
| return nil, fmt.Errorf("enroll failed (status %d): %s", resp.StatusCode, string(respBody)) | ||
| } | ||
|
|
||
| var result EnrollResponse | ||
| if err := json.NewDecoder(resp.Body).Decode(&result); err != nil { | ||
| return nil, fmt.Errorf("decode enroll response: %w", err) | ||
| } | ||
| slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", 200) |
There was a problem hiding this comment.
Inconsistent URL representation in request vs. error/response debug logs
The request log uses the fully-qualified URL (c.baseURL + "/api/agent/enroll") while the subsequent error and response logs drop back to just the relative path ("/api/agent/enroll"). This pattern is repeated for all three methods (Enroll, GetConfig, SendHeartbeat).
In practice this means that when an http error or http response log line appears you lose the hostname/port context needed to identify which server instance the agent was talking to — which is exactly the information most useful during a 401 enrollment failure investigation, particularly when debugging multi-environment deployments.
Consider using the same full URL in all three log sites:
| slog.Debug("http request", "method", "POST", "url", c.baseURL+"/api/agent/enroll") | |
| resp, err := c.httpClient.Do(httpReq) | |
| if err != nil { | |
| slog.Debug("http error", "method", "POST", "url", "/api/agent/enroll", "error", err) | |
| return nil, fmt.Errorf("enroll request failed: %w", err) | |
| } | |
| defer resp.Body.Close() | |
| if resp.StatusCode != http.StatusOK { | |
| respBody, _ := io.ReadAll(resp.Body) | |
| slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", resp.StatusCode, "body", string(respBody)) | |
| return nil, fmt.Errorf("enroll failed (status %d): %s", resp.StatusCode, string(respBody)) | |
| } | |
| var result EnrollResponse | |
| if err := json.NewDecoder(resp.Body).Decode(&result); err != nil { | |
| return nil, fmt.Errorf("decode enroll response: %w", err) | |
| } | |
| slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", 200) | |
| endpoint := c.baseURL + "/api/agent/enroll" | |
| slog.Debug("http request", "method", "POST", "url", endpoint) | |
| // ... | |
| slog.Debug("http error", "method", "POST", "url", endpoint, "error", err) | |
| // ... | |
| slog.Debug("http response", "method", "POST", "url", endpoint, "status", resp.StatusCode, "body", string(respBody)) |
Apply the same fix to GetConfig() and SendHeartbeat() methods for consistency.
Add " to the control character strip regex to prevent intra-line field forgery via quote injection in the unauthenticated enrollment endpoint.
| team: { select: { id: true } }, | ||
| }, | ||
| }); | ||
| console.log(`[enroll] found ${environments.length} candidate environment(s)`); |
There was a problem hiding this comment.
Always-on pre-auth environment count log
This console.log fires on every enrollment attempt — including unauthenticated and brute-force ones — before any token verification has occurred. It unconditionally writes the number of environments with active enrollment tokens to production logs.
Two concerns:
- Log volume: A brute-force attacker can cause one
[enroll] found N candidate environment(s)line per attempt to be emitted, flooding production logs. The existing[enroll] REJECTEDline already captures this count on failed attempts, making this line redundant. - Pre-auth information disclosure: Anyone with read access to server logs can determine how many environments have enrollment tokens configured from any failed attempt, before the request is authenticated.
Consider removing this line, or gating it behind a debug logger, since the rejected/success lines already include the checked count.
agent/internal/agent/enrollment.go
Outdated
| fmt.Printf("Enrolled as node %s in environment %q\n", resp.NodeID, resp.EnvironmentName) | ||
| slog.Info("enrolled successfully", "nodeId", resp.NodeID, "environment", resp.EnvironmentName) |
There was a problem hiding this comment.
Duplicate enrollment success output
Both fmt.Printf (stdout) and slog.Info (structured logger) emit the same enrollment success message. Under a typical deployment the structured logger already writes to stdout, so operators see two identical lines on every enrollment:
Enrolled as node abc123 in environment "production"
time=... level=INFO msg="enrolled successfully" nodeId=abc123 environment=production
Consider removing the fmt.Printf and relying solely on slog.Info, which is the consistent pattern used everywhere else in this file and in agent.go.
| slog.Debug("http request", "method", "POST", "url", c.baseURL+"/api/agent/enroll") | ||
| resp, err := c.httpClient.Do(httpReq) | ||
| if err != nil { | ||
| slog.Debug("http error", "method", "POST", "url", "/api/agent/enroll", "error", err) | ||
| return nil, fmt.Errorf("enroll request failed: %w", err) | ||
| } | ||
| defer resp.Body.Close() | ||
|
|
||
| if resp.StatusCode != http.StatusOK { | ||
| respBody, _ := io.ReadAll(resp.Body) | ||
| slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", resp.StatusCode, "body", string(respBody)) | ||
| return nil, fmt.Errorf("enroll failed (status %d): %s", resp.StatusCode, string(respBody)) | ||
| } | ||
|
|
||
| var result EnrollResponse | ||
| if err := json.NewDecoder(resp.Body).Decode(&result); err != nil { | ||
| return nil, fmt.Errorf("decode enroll response: %w", err) | ||
| } | ||
| slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", 200) |
There was a problem hiding this comment.
Inconsistent URL representation in request vs. error/response debug logs
The request log uses the fully-qualified URL (c.baseURL + "/api/agent/enroll") while the subsequent error and response logs drop back to just the relative path ("/api/agent/enroll"). This pattern is repeated for all three methods (Enroll, GetConfig, SendHeartbeat).
In practice this means that when an http error or http response log line appears you lose the hostname/port context needed to identify which server instance the agent was talking to — which is exactly the information most useful during a 401 enrollment failure investigation, particularly when debugging multi-environment deployments.
Consider using the same full URL in all three log sites for consistency and debuggability.
Remove fmt.Printf that duplicated the slog.Info enrollment success message, causing two identical lines on stdout.
| metadata: { enrolledVia: "agent" }, | ||
| }, | ||
| }); | ||
| console.log(`[enroll] SUCCESS -- node ${node.id} enrolled in "${matchedEnv.name}"`); |
There was a problem hiding this comment.
matchedEnv.name embedded in log without sanitization
safeHostname and safeVersion are sanitized with /[\r\n\t"]/g before being interpolated into log lines, but matchedEnv.name on this success log is used raw. Environment names are admin-controlled (not from the enrollment request), so the risk is low — but if an admin sets a name containing a newline or a double-quote, it bypasses the same injection guard that was carefully applied to the request fields on lines 27–28.
For consistency, apply the same sanitization here:
| console.log(`[enroll] SUCCESS -- node ${node.id} enrolled in "${matchedEnv.name}"`); | |
| console.log(`[enroll] SUCCESS -- node ${node.id} enrolled in "${matchedEnv.name.replace(/[\r\n\t"]/g, " ")}"`); |
Context Used: Rule from dashboard - ## Security & Cryptography Review Rules
When reviewing changes to authentication, authorization, en... (source)
| if (!parsed.success) { | ||
| console.error("[enroll] invalid input:", parsed.error.flatten().fieldErrors); |
There was a problem hiding this comment.
Always-on validation-error log on unauthenticated endpoint
console.error("[enroll] invalid input:", ...) fires unconditionally for every malformed request to an unauthenticated endpoint. The enrollment endpoint requires no prior authentication, so a flood of crafted malformed requests will emit one console.error per request to production logs indefinitely — there is no rate limit or gating here.
The payload is parsed.error.flatten().fieldErrors, which contains only field names (e.g. { hostname: ["Required"] }), not actual input values, so this doesn't leak user data. The concern is log volume under a malformed-request DoS. Consider dropping this line (Zod validation errors are already returned to the caller in the 400 response body) or downgrading to a debug-only log.
Summary
slog.Debuglogging to Go agent HTTP client (request URL, response status, response body on errors)slog.Debuglogging to agent enrollment flow (token path, token prefix, hostname)slog.Debuglogging to agent poll/heartbeat cycle (action count, pipeline count)[enroll]-prefixed audit logging to server enrollment endpoint (attempt details, environment matching, rejection/success)These changes make
VF_LOG_LEVEL=debugactually useful for diagnosing agent connectivity issues like 401 enrollment failures, and give server-side visibility into enrollment attempts without any config changes.Test plan
VF_LOG_LEVEL=debugand verify debug output appears for enrollment, polling, and heartbeat[enroll] REJECTEDwith environment count[enroll] SUCCESSVF_LOG_LEVEL=infoproduces no extra output from agent