Skip to content

feat: add diagnostic logging to agent and enrollment endpoint#7

Merged
TerrifiedBug merged 9 commits intomainfrom
feat/agent-diagnostic-logging
Mar 5, 2026
Merged

feat: add diagnostic logging to agent and enrollment endpoint#7
TerrifiedBug merged 9 commits intomainfrom
feat/agent-diagnostic-logging

Conversation

@TerrifiedBug
Copy link
Owner

Summary

  • Add slog.Debug logging to Go agent HTTP client (request URL, response status, response body on errors)
  • Add slog.Debug logging to agent enrollment flow (token path, token prefix, hostname)
  • Add slog.Debug logging to agent poll/heartbeat cycle (action count, pipeline count)
  • Add always-on [enroll]-prefixed audit logging to server enrollment endpoint (attempt details, environment matching, rejection/success)

These changes make VF_LOG_LEVEL=debug actually useful for diagnosing agent connectivity issues like 401 enrollment failures, and give server-side visibility into enrollment attempts without any config changes.

Test plan

  • Deploy agent with VF_LOG_LEVEL=debug and verify debug output appears for enrollment, polling, and heartbeat
  • Trigger a failed enrollment and verify server logs show [enroll] REJECTED with environment count
  • Trigger a successful enrollment and verify server logs show [enroll] SUCCESS
  • Verify default VF_LOG_LEVEL=info produces no extra output from agent

@greptile-apps
Copy link

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR adds diagnostic logging across the Go agent (HTTP client, enrollment flow, poll/heartbeat cycle) and always-on audit logging to the server-side enrollment endpoint, making VF_LOG_LEVEL=debug genuinely useful for diagnosing agent connectivity failures.

Key changes:

  • agent/internal/client/client.go: slog.Debug request/error/response logs added to all three HTTP methods. The URL used in error and response logs is still the relative path ("/api/agent/enroll") while the request log uses the full URL — previously flagged inconsistency remains unresolved.
  • agent/internal/agent/enrollment.go: Token-path visibility and a token-prefix debug log added; fmt.Printf replaced with slog.Info for enrollment success, resolving the previously flagged duplicate-output issue.
  • agent/internal/agent/agent.go: Lightweight slog.Debug calls added for poll action count and heartbeat pipeline/sample-result count. Both are correctly scoped to the success path.
  • src/app/api/agent/enroll/route.ts: Multiple always-on [enroll]-prefixed console.log/console.error calls added. Hostname and agentVersion are sanitized with /[\r\n\t"]/g before interpolation. Previously flagged issues (pre-auth environment count log, token prefix, log injection, always-on token prefix in production) remain outstanding; additionally, matchedEnv.name in the success log is embedded without the same sanitization applied to request fields, and the always-on validation-error console.error on the unauthenticated endpoint can contribute to log noise under a malformed-request flood.

Confidence Score: 2/5

  • Several unresolved security and log-hygiene issues flagged in previous review rounds remain in the HEAD, making this PR not safe to merge as-is.
  • The PR still carries multiple previously flagged outstanding issues: always-on enrollment-token prefix in production logs (route.ts:27), pre-auth environment count disclosure (route.ts:44), URL inconsistency in all three HTTP methods (client.go), and the enrollment.go token still logged in full for short tokens. These were raised in earlier review rounds and do not appear to be resolved in the current HEAD. The two new issues found in this pass (unsanitized matchedEnv.name in success log and always-on validation-error log on an unauthenticated endpoint) are lower severity but add to the overall picture.
  • src/app/api/agent/enroll/route.ts requires the most attention — multiple always-on logs with token prefix, pre-auth data disclosure, and unsanitized field. agent/internal/client/client.go still has the inconsistent URL representation across all three methods.

Important Files Changed

Filename Overview
src/app/api/agent/enroll/route.ts Adds structured [enroll]-prefixed audit logging to the server enrollment endpoint; several always-on console.log/console.error calls were previously flagged (pre-auth environment count, token prefix with unconditional ellipsis, log-injection via user-controlled fields). Current code includes " in the sanitization regex for hostname/version fields, but matchedEnv.name in the success log remains unsanitized, and the always-on validation-error log can contribute to log noise under malformed-request attacks.
agent/internal/client/client.go Adds slog.Debug request/error/response logging to all three HTTP methods (Enroll, GetConfig, SendHeartbeat). Previously flagged: inconsistent URL representation (full URL on request, relative URL on error/response logs) and dead tokenPrefix helper. The current HEAD has no tokenPrefix function and the node token is not logged, but the URL inconsistency remains unresolved across all three methods.
agent/internal/agent/enrollment.go Adds token-path and token-prefix debug logs plus replaces fmt.Printf with slog.Info for enrollment success (eliminating the duplicate-output issue previously flagged). The token prefix guard (if len > 24) is correctly implemented for the > 24 case; the fmt.Printf duplicate was already removed. Looks clean otherwise.
agent/internal/agent/agent.go Adds two minor slog.Debug calls: one after a successful poll (action count) and one in the heartbeat success branch (pipeline and sampleResult counts). Both are correctly scoped to debug level and placed in non-error paths. No issues found.

Sequence Diagram

sequenceDiagram
    participant A as Agent (enrollment.go)
    participant C as Client (client.go)
    participant S as Server (route.ts)
    participant DB as PostgreSQL

    Note over A: slog.Debug "no existing node token"
    Note over A: slog.Debug "enrolling" (tokenPrefix, hostname, os)
    A->>C: Enroll(EnrollRequest)
    Note over C: slog.Debug "http request POST /api/agent/enroll"
    C->>S: POST /api/agent/enroll {token, hostname, os, agentVersion, vectorVersion}
    Note over S: console.error on invalid input (always-on)
    Note over S: console.log attempt from hostname/agentVersion (always-on)
    S->>DB: findMany environments with enrollmentTokenHash
    Note over S: console.log found N candidate environments (always-on)
    S->>S: bcrypt.compare(token, env.enrollmentTokenHash) per env
    alt No match
        Note over S: console.error REJECTED (always-on)
        S-->>C: 401 {error: "Invalid enrollment token"}
        Note over C: slog.Debug "http response" status=401 + body
        C-->>A: error
    else Match found
        S->>DB: vectorNode.create(hostname, nodeTokenHash, ...)
        Note over S: console.log SUCCESS node enrolled (always-on)
        S-->>C: 200 {nodeId, nodeToken, environmentId, environmentName}
        Note over C: slog.Debug "http response" status=200
        C-->>A: EnrollResponse
    end
    Note over A: slog.Info "enrolled successfully" (nodeId, environment)
    A->>A: WriteFile(nodeToken)
Loading

Last reviewed commit: b61070c

}
httpReq.Header.Set("Authorization", "Bearer "+c.nodeToken)

slog.Debug("http request", "method", "GET", "url", c.baseURL+"/api/agent/config", "tokenPrefix", tokenPrefix(c.nodeToken, 16))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node token prefix leaked into debug logs

c.nodeToken is the per-node bearer token — the plaintext value is returned only once at enrollment, bcrypt-hashed in the database, and used to authenticate every subsequent API call. Logging even 16 characters of it in a debug log violates the project's security rule ("Never log or expose decrypted secret values") and narrows the keyspace should debug logs be captured by an attacker.

The other two HTTP methods (Enroll, SendHeartbeat) correctly avoid logging the token, making this inconsistent as well.

Suggested change
slog.Debug("http request", "method", "GET", "url", c.baseURL+"/api/agent/config", "tokenPrefix", tokenPrefix(c.nodeToken, 16))
slog.Debug("http request", "method", "GET", "url", c.baseURL+"/api/agent/config")

Context Used: Rule from dashboard - ## Security & Cryptography Review Rules

When reviewing changes to authentication, authorization, en... (source)

Comment on lines 51 to 53
} else {
console.log(`[enroll] env "${env.name}" -- token mismatch`);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always-on logging enumerates all environment names on failed enrollment

This console.log fires for every environment whose token doesn't match the submitted one. For a deployment with many environments, a single failed enrollment (or a brute-force attempt) will emit one log line per environment, each containing the environment name. Since this is console.log (always-on, not debug-gated), it will appear in production logs unconditionally.

Two concerns:

  1. Log volume: a fleet attacker repeatedly attempting enrollment will flood production logs with per-environment mismatch lines.
  2. Information leakage: anyone with read access to server logs can enumerate all environment names from a single failed attempt.

The [enroll] REJECTED line below already captures the count of environments checked. Consider removing the per-environment mismatch lines entirely, or downgrading them to a single summary line.

hostname, _ := os.Hostname()
vectorVersion := detectVectorVersion(cfg.VectorBin)

slog.Debug("enrolling", "hostname", hostname, "os", runtime.GOOS+"/"+runtime.GOARCH, "tokenPrefix", cfg.Token[:min(len(cfg.Token), 24)]+"...")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Token prefix always appends "..." even when the full token is logged

When len(cfg.Token) <= 24, cfg.Token[:min(len(cfg.Token), 24)] returns the entire token, and then "..." is appended — making the log output look truncated when it actually contains the complete value. A short enrollment token (e.g. during testing) would be fully exposed while the trailing "..." misleads a log reviewer into believing it was safely redacted.

The tokenPrefix helper in client.go handles this correctly (no "..." for short tokens), but it's unexported. The inline equivalent here should replicate that guard:

prefix := cfg.Token
if len(cfg.Token) > 24 {
    prefix = cfg.Token[:24] + "..."
}
slog.Debug("enrolling", "hostname", hostname, "os", runtime.GOOS+"/"+runtime.GOARCH, "tokenPrefix", prefix)

- Remove node token prefix from GetConfig debug log (security)
- Fix enrollment token prefix guard for short tokens
- Remove per-environment name logging on mismatch (info leak)
Comment on lines +32 to +38
// tokenPrefix returns the first N chars of a token for safe debug logging.
func tokenPrefix(token string, n int) string {
if len(token) <= n {
return token
}
return token[:n] + "..."
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tokenPrefix helper is dead code

The function is defined here but is never called anywhere in the client package. The previous thread flagged that c.nodeToken was being logged, and it appears the intended fix was to use tokenPrefix(c.nodeToken, 16) before logging it — but the logging of c.nodeToken was removed entirely, leaving this helper unreachable. Either wire it up where intended, or remove it to keep the package clean.

Comment on lines +27 to +28
const tokenPrefix = token.slice(0, 24) + "...";
console.log(`[enroll] attempt from hostname="${hostname}" tokenPrefix="${tokenPrefix}" agentVersion="${agentVersion ?? "unknown"}"`);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enrollment token prefix: unconditional ellipsis + always-on production log

Two issues on these lines:

  1. .slice(0, 24) + "..." always appends "..." regardless of token length. For a short enrollment token (e.g. ≤ 24 chars), the full value is written to logs while the trailing "..." misleads a log reviewer into thinking it was safely redacted. The Go tokenPrefix helper added in client.go guards against this correctly — the same guard is missing here. A fix would be:
const raw = token.slice(0, 24);
const tokenPrefix = token.length > 24 ? raw + "..." : raw;
  1. More critically, this console.log is always-on — it fires in production on every enrollment attempt, unconditionally writing a portion of the enrollment token to server logs. Per the project security policy, enrollment tokens are encrypted at rest and are sensitive credentials; even a prefix should not appear in non-debug production logs. Consider dropping tokenPrefix from this log line, or gating the entire line behind a debug flag.

Context Used: Rule from dashboard - ## Security & Cryptography Review Rules

When reviewing changes to authentication, authorization, en... (source)

Remove dead tokenPrefix helper from client.go and strip enrollment
token prefix from always-on production log in enrollment endpoint.
}

const { token, hostname, os, agentVersion, vectorVersion } = parsed.data;
console.log(`[enroll] attempt from hostname="${hostname}" agentVersion="${agentVersion ?? "unknown"}"`);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log injection via user-controlled fields

hostname (and agentVersion) are attacker-controlled strings validated only for length by Zod — not sanitized for newlines or other control characters. Embedding them directly in a console.log template literal allows an unauthenticated attacker to forge arbitrary log lines.

Example payload:

{
  "token": "anything",
  "hostname": "attacker-host\"\n[enroll] SUCCESS -- node abc123 enrolled in \"production\""
}

This would emit the following to production logs:

[enroll] attempt from hostname="attacker-host"
[enroll] SUCCESS -- node abc123 enrolled in "production"

A forged SUCCESS entry could mislead an operator monitoring enrollment activity or contaminate a SIEM alert pipeline. Since this endpoint requires no prior authentication, the injection is trivially reachable.

Consider stripping or escaping control characters before interpolating user-supplied strings into log messages:

const safeHostname = hostname.replace(/[\r\n\t]/g, " ");
const safeVersion = (agentVersion ?? "unknown").replace(/[\r\n\t]/g, " ");
console.log(`[enroll] attempt from hostname="${safeHostname}" agentVersion="${safeVersion}"`);

Context Used: Rule from dashboard - ## Security & Cryptography Review Rules

When reviewing changes to authentication, authorization, en... (source)

Strip newlines and tabs from hostname and agentVersion before
interpolating into log messages to prevent log injection attacks
on the unauthenticated enrollment endpoint.
Comment on lines +27 to +29
const safeHostname = hostname.replace(/[\r\n\t]/g, " ");
const safeVersion = (agentVersion ?? "unknown").replace(/[\r\n\t]/g, " ");
console.log(`[enroll] attempt from hostname="${safeHostname}" agentVersion="${safeVersion}"`);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double-quote characters bypass sanitization, enabling intra-line field forgery

The sanitization strips \r, \n, and \t to block newline injection, but the log template wraps field values in double quotes (hostname="...", agentVersion="..."). An attacker who submits a hostname containing literal " characters can close one field and forge additional key-value pairs in the same log line without any newlines:

hostname: 'real-host" customField="injected'
→ [enroll] attempt from hostname="real-host" customField="injected" agentVersion="..."

This allows fabricating structured log fields on a single line, which can mislead log-analysis tools or SIEM rules that parse this log format. A consistent fix is to also strip (or escape) " from user-supplied values:

Suggested change
const safeHostname = hostname.replace(/[\r\n\t]/g, " ");
const safeVersion = (agentVersion ?? "unknown").replace(/[\r\n\t]/g, " ");
console.log(`[enroll] attempt from hostname="${safeHostname}" agentVersion="${safeVersion}"`);
const safeHostname = hostname.replace(/[\r\n\t"]/g, " ");
const safeVersion = (agentVersion ?? "unknown").replace(/[\r\n\t"]/g, " ");
console.log(`[enroll] attempt from hostname="${safeHostname}" agentVersion="${safeVersion}"`);

Context Used: Rule from dashboard - ## Security & Cryptography Review Rules

When reviewing changes to authentication, authorization, en... (source)

Comment on lines +61 to +79
slog.Debug("http request", "method", "POST", "url", c.baseURL+"/api/agent/enroll")
resp, err := c.httpClient.Do(httpReq)
if err != nil {
slog.Debug("http error", "method", "POST", "url", "/api/agent/enroll", "error", err)
return nil, fmt.Errorf("enroll request failed: %w", err)
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
respBody, _ := io.ReadAll(resp.Body)
slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", resp.StatusCode, "body", string(respBody))
return nil, fmt.Errorf("enroll failed (status %d): %s", resp.StatusCode, string(respBody))
}

var result EnrollResponse
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
return nil, fmt.Errorf("decode enroll response: %w", err)
}
slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", 200)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent URL representation in request vs. error/response debug logs

The request log uses the fully-qualified URL (c.baseURL + "/api/agent/enroll") while the subsequent error and response logs drop back to just the relative path ("/api/agent/enroll"). This pattern is repeated for all three methods (Enroll, GetConfig, SendHeartbeat).

In practice this means that when an http error or http response log line appears you lose the hostname/port context needed to identify which server instance the agent was talking to — which is exactly the information most useful during a 401 enrollment failure investigation, particularly when debugging multi-environment deployments.

Consider using the same full URL in all three log sites:

Suggested change
slog.Debug("http request", "method", "POST", "url", c.baseURL+"/api/agent/enroll")
resp, err := c.httpClient.Do(httpReq)
if err != nil {
slog.Debug("http error", "method", "POST", "url", "/api/agent/enroll", "error", err)
return nil, fmt.Errorf("enroll request failed: %w", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
respBody, _ := io.ReadAll(resp.Body)
slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", resp.StatusCode, "body", string(respBody))
return nil, fmt.Errorf("enroll failed (status %d): %s", resp.StatusCode, string(respBody))
}
var result EnrollResponse
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
return nil, fmt.Errorf("decode enroll response: %w", err)
}
slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", 200)
endpoint := c.baseURL + "/api/agent/enroll"
slog.Debug("http request", "method", "POST", "url", endpoint)
// ...
slog.Debug("http error", "method", "POST", "url", endpoint, "error", err)
// ...
slog.Debug("http response", "method", "POST", "url", endpoint, "status", resp.StatusCode, "body", string(respBody))

Apply the same fix to GetConfig() and SendHeartbeat() methods for consistency.

Add " to the control character strip regex to prevent intra-line
field forgery via quote injection in the unauthenticated enrollment
endpoint.
team: { select: { id: true } },
},
});
console.log(`[enroll] found ${environments.length} candidate environment(s)`);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always-on pre-auth environment count log

This console.log fires on every enrollment attempt — including unauthenticated and brute-force ones — before any token verification has occurred. It unconditionally writes the number of environments with active enrollment tokens to production logs.

Two concerns:

  1. Log volume: A brute-force attacker can cause one [enroll] found N candidate environment(s) line per attempt to be emitted, flooding production logs. The existing [enroll] REJECTED line already captures this count on failed attempts, making this line redundant.
  2. Pre-auth information disclosure: Anyone with read access to server logs can determine how many environments have enrollment tokens configured from any failed attempt, before the request is authenticated.

Consider removing this line, or gating it behind a debug logger, since the rejected/success lines already include the checked count.

Comment on lines +67 to +68
fmt.Printf("Enrolled as node %s in environment %q\n", resp.NodeID, resp.EnvironmentName)
slog.Info("enrolled successfully", "nodeId", resp.NodeID, "environment", resp.EnvironmentName)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate enrollment success output

Both fmt.Printf (stdout) and slog.Info (structured logger) emit the same enrollment success message. Under a typical deployment the structured logger already writes to stdout, so operators see two identical lines on every enrollment:

Enrolled as node abc123 in environment "production"
time=... level=INFO msg="enrolled successfully" nodeId=abc123 environment=production

Consider removing the fmt.Printf and relying solely on slog.Info, which is the consistent pattern used everywhere else in this file and in agent.go.

Comment on lines +61 to +79
slog.Debug("http request", "method", "POST", "url", c.baseURL+"/api/agent/enroll")
resp, err := c.httpClient.Do(httpReq)
if err != nil {
slog.Debug("http error", "method", "POST", "url", "/api/agent/enroll", "error", err)
return nil, fmt.Errorf("enroll request failed: %w", err)
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
respBody, _ := io.ReadAll(resp.Body)
slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", resp.StatusCode, "body", string(respBody))
return nil, fmt.Errorf("enroll failed (status %d): %s", resp.StatusCode, string(respBody))
}

var result EnrollResponse
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
return nil, fmt.Errorf("decode enroll response: %w", err)
}
slog.Debug("http response", "method", "POST", "url", "/api/agent/enroll", "status", 200)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent URL representation in request vs. error/response debug logs

The request log uses the fully-qualified URL (c.baseURL + "/api/agent/enroll") while the subsequent error and response logs drop back to just the relative path ("/api/agent/enroll"). This pattern is repeated for all three methods (Enroll, GetConfig, SendHeartbeat).

In practice this means that when an http error or http response log line appears you lose the hostname/port context needed to identify which server instance the agent was talking to — which is exactly the information most useful during a 401 enrollment failure investigation, particularly when debugging multi-environment deployments.

Consider using the same full URL in all three log sites for consistency and debuggability.

Remove fmt.Printf that duplicated the slog.Info enrollment
success message, causing two identical lines on stdout.
metadata: { enrolledVia: "agent" },
},
});
console.log(`[enroll] SUCCESS -- node ${node.id} enrolled in "${matchedEnv.name}"`);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matchedEnv.name embedded in log without sanitization

safeHostname and safeVersion are sanitized with /[\r\n\t"]/g before being interpolated into log lines, but matchedEnv.name on this success log is used raw. Environment names are admin-controlled (not from the enrollment request), so the risk is low — but if an admin sets a name containing a newline or a double-quote, it bypasses the same injection guard that was carefully applied to the request fields on lines 27–28.

For consistency, apply the same sanitization here:

Suggested change
console.log(`[enroll] SUCCESS -- node ${node.id} enrolled in "${matchedEnv.name}"`);
console.log(`[enroll] SUCCESS -- node ${node.id} enrolled in "${matchedEnv.name.replace(/[\r\n\t"]/g, " ")}"`);

Context Used: Rule from dashboard - ## Security & Cryptography Review Rules

When reviewing changes to authentication, authorization, en... (source)

Comment on lines 18 to +19
if (!parsed.success) {
console.error("[enroll] invalid input:", parsed.error.flatten().fieldErrors);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always-on validation-error log on unauthenticated endpoint

console.error("[enroll] invalid input:", ...) fires unconditionally for every malformed request to an unauthenticated endpoint. The enrollment endpoint requires no prior authentication, so a flood of crafted malformed requests will emit one console.error per request to production logs indefinitely — there is no rate limit or gating here.

The payload is parsed.error.flatten().fieldErrors, which contains only field names (e.g. { hostname: ["Required"] }), not actual input values, so this doesn't leak user data. The concern is log volume under a malformed-request DoS. Consider dropping this line (Zod validation errors are already returned to the caller in the 400 response body) or downgrading to a debug-only log.

@TerrifiedBug TerrifiedBug merged commit fc1c6eb into main Mar 5, 2026
9 checks passed
@TerrifiedBug TerrifiedBug deleted the feat/agent-diagnostic-logging branch March 5, 2026 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant