Skip to content

feat: dev binary channel with CLI flags and auto-update#11

Merged
TerrifiedBug merged 12 commits intomainfrom
feat/dev-binary-channel
Mar 5, 2026
Merged

feat: dev binary channel with CLI flags and auto-update#11
TerrifiedBug merged 12 commits intomainfrom
feat/dev-binary-channel

Conversation

@TerrifiedBug
Copy link
Owner

Summary

  • Add --version and --help CLI flags to the agent binary
  • CI publishes dev binaries as a rolling dev pre-release on every push to main
  • Install script supports --channel dev for dev binary installs
  • Server tracks dev agent latest version separately from stable
  • Fleet UI detects dev agent updates and offers self-update
  • Cross-channel safety: dev agents never offered release updates (or vice versa)

Test plan

  • vf-agent --version prints version and exits
  • vf-agent --help prints usage and exits
  • CI builds and publishes dev binaries on push to main
  • install.sh --channel dev downloads from dev pre-release
  • Fleet page shows "Update available" for dev agents with outdated SHA
  • Update button triggers self-update with correct dev binary URL
  • Stable agents unaffected — no cross-channel update suggestions

@greptile-apps
Copy link

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR introduces a dev binary channel end-to-end: CI publishes rolling dev pre-releases on every push to main, install.sh gains a --channel dev flag, vf-agent gains --version/--help CLI flags, the server tracks dev agent versions and checksums in a new DB migration, and the Fleet UI routes update offers through a per-node getNodeLatest helper that enforces cross-channel safety via isVersionOlder's new SHA-comparison path.

Key points:

  • The concurrency: group: dev-release, cancel-in-progress: true configuration in CI correctly prevents the previously-identified race condition on concurrent pushes to main.
  • The Prisma migration (20260305000000_add_dev_agent_version) covers all four new schema fields, including latestAgentChecksums which addresses the previously flagged checksum persistence gap for the stable channel.
  • checkDevAgentVersion correctly bumps checkedAt inside if (release) when dev-version.txt is missing, but leaves the timestamp un-updated when fetchDevRelease() itself returns null — meaning repeated GitHub API calls will occur with no cooldown if the dev tag is absent or the API is temporarily unavailable.
  • install.sh validates --channel dev + --version mutual exclusion but does not reject unknown channel names, causing typos to silently fall through to stable behaviour.

Confidence Score: 3/5

  • Safe to merge after addressing the rate-limit bypass in checkDevAgentVersion and the unvalidated --channel argument in the install script.
  • The core logic — cross-channel safety, checksum persistence, CI concurrency group, and the dev SHA comparison — is well-implemented and correctly addresses the issues identified in earlier review threads. Two new issues were found: fetchDevRelease() returning null silently disables the 24-hour rate-limit cooldown (a logic bug that could exhaust GitHub's unauthenticated API limit), and invalid --channel values in install.sh are not rejected. Neither is a security concern, but the first could cause operational problems in edge cases.
  • src/server/services/version-check.ts — the outer fetchDevRelease() null path needs a checkedAt upsert to enforce rate limiting.

Important Files Changed

Filename Overview
src/server/services/version-check.ts Adds checkDevAgentVersion service and persists stable agent checksums; the fetchDevRelease() null case skips the timestamp upsert entirely, bypassing the 24-hour rate-limiting cooldown and enabling unbounded GitHub API calls.
src/app/(dashboard)/fleet/page.tsx Adds getNodeLatest helper to route dev vs. stable version/checksum data per node; cross-channel safety is correctly enforced via isVersionOlder, and the dev tag is used directly in the download URL for dev agents.
src/lib/version.ts Clean extension of isVersionOlder with dev-SHA comparison and cross-channel guard; logic is correct and well-structured.
agent/install.sh Adds --channel dev support with correct URL routing and mutual-exclusion guard with --version; unknown channel values silently fall through to the stable path instead of being rejected.
.github/workflows/ci.yml Adds agent-dev-binaries job with correct concurrency group and cancel-in-progress: true, building and publishing amd64+arm64 dev binaries and dev-version.txt as a rolling pre-release on every push to main.
agent/main.go Adds --version/-v and --help/-h CLI flags cleanly before config loading; straightforward and correct.
prisma/migrations/20260305000000_add_dev_agent_version/migration.sql Adds the four new nullable columns (latestAgentChecksums, latestDevAgentRelease, latestDevAgentReleaseCheckedAt, latestDevAgentChecksums) to SystemSettings; correctly matches the schema changes.
src/server/routers/settings.ts Extends checkVersion to include devAgent data from checkDevAgentVersion; runs all three checks in parallel via Promise.all.
prisma/schema.prisma Adds four new nullable fields to SystemSettings for dev and stable agent checksums and dev agent version tracking; all covered by the accompanying migration.

Sequence Diagram

sequenceDiagram
    participant CI as GitHub Actions CI
    participant GH as GitHub Releases
    participant Browser as Fleet UI
    participant tRPC as settings.checkVersion
    participant DB as PostgreSQL
    participant Agent as vf-agent (node)

    Note over CI,GH: On every push to main
    CI->>CI: Build vf-agent-linux-amd64 + arm64 (dev-SHA)
    CI->>CI: Write dev-version.txt (e.g. dev-abc1234)
    CI->>GH: gh release delete dev (idempotent)
    CI->>GH: gh release create dev (prerelease, binaries + checksums)

    Note over Browser,Agent: Fleet page load
    Browser->>tRPC: checkVersion (staleTime: Infinity)
    tRPC->>DB: findUnique SystemSettings (×3 parallel)
    alt needsCheck (>24h or force)
        tRPC->>GH: GET /releases/latest (stable agent)
        tRPC->>GH: GET /releases/tags/dev (dev agent)
        GH-->>tRPC: release metadata + assets
        tRPC->>GH: GET dev-version.txt asset
        tRPC->>GH: GET checksums.txt assets
        tRPC->>DB: upsert SystemSettings (versions + checksums)
    end
    tRPC-->>Browser: { agent, devAgent: { latestVersion, checksums } }

    Browser->>Browser: getNodeLatest(node) — route by dev- prefix
    alt node.agentVersion starts with "dev-"
        Browser->>Browser: compare SHA: current !== latestDevVersion
        Browser-->>Browser: show "Update available" badge
        Browser->>tRPC: fleet.triggerAgentUpdate (tag="dev", url=.../dev/vf-agent-linux-amd64)
        tRPC->>Agent: pendingAction → self-update
    else stable node
        Browser->>Browser: semver compare
        Browser->>tRPC: fleet.triggerAgentUpdate (tag="v{ver}", stable binary)
        tRPC->>Agent: pendingAction → self-update
    end
Loading

Last reviewed commit: 3efbfc1

Comment on lines +415 to +418
latestServerReleaseCheckedAt DateTime?
latestAgentRelease String?
latestAgentReleaseCheckedAt DateTime?
latestDevAgentRelease String?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Prisma migration for new fields

Two new columns are added to SystemSettings (latestDevAgentRelease, latestDevAgentReleaseCheckedAt), but no corresponding migration file exists in prisma/migrations/. The most recent migration is 20260304000000_add_backup_settings — nothing covers these fields.

At runtime, Prisma's generated client will expect these columns to exist in PostgreSQL. Without a migration, queries in checkDevAgentVersion that read settings?.latestDevAgentRelease and settings?.latestDevAgentReleaseCheckedAt will fail (or silently return undefined) on a production database that hasn't had this column added.

Per the project conventions, a migration needs to be generated with:

npx prisma migrate dev --name add_dev_agent_version

and the resulting file committed alongside the schema change.

Prompt To Fix With AI
This is a comment left during a code review.
Path: prisma/schema.prisma
Line: 415-418

Comment:
**Missing Prisma migration for new fields**

Two new columns are added to `SystemSettings` (`latestDevAgentRelease`, `latestDevAgentReleaseCheckedAt`), but no corresponding migration file exists in `prisma/migrations/`. The most recent migration is `20260304000000_add_backup_settings` — nothing covers these fields.

At runtime, Prisma's generated client will expect these columns to exist in PostgreSQL. Without a migration, queries in `checkDevAgentVersion` that read `settings?.latestDevAgentRelease` and `settings?.latestDevAgentReleaseCheckedAt` will fail (or silently return `undefined`) on a production database that hasn't had this column added.

Per the project conventions, a migration needs to be generated with:
```
npx prisma migrate dev --name add_dev_agent_version
```
and the resulting file committed alongside the schema change.

How can I resolve this? If you propose a fix, please make it concise.

if (needsCheck) {
const release = await fetchDevRelease();
if (release) {
latestVersion = await fetchDevVersionString(release) ?? "dev-unknown";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"dev-unknown" fallback triggers false update notifications

When fetchDevVersionString returns null (e.g., the dev-version.txt asset is missing from the release), latestVersion is set to the string "dev-unknown". The isVersionOlder function in src/lib/version.ts then compares current !== latest for dev-to-dev checks.

This means any running dev agent (e.g. "dev-abc1234") will always be considered "outdated" when the server can't determine the real latest SHA, because "dev-abc1234" !== "dev-unknown" is true. The Fleet UI would show an "Update available" badge and — worse — clicking Update would dispatch an update action with targetVersion: "dev-unknown" and whatever incomplete checksums were fetched.

A safer approach is to return null and skip the update offer when the version string cannot be determined:

Suggested change
latestVersion = await fetchDevVersionString(release) ?? "dev-unknown";
latestVersion = await fetchDevVersionString(release) ?? null;

Then in the upsert, only write to the DB if latestVersion is non-null, and the caller already handles latestVersion: null safely.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/server/services/version-check.ts
Line: 216

Comment:
**`"dev-unknown"` fallback triggers false update notifications**

When `fetchDevVersionString` returns `null` (e.g., the `dev-version.txt` asset is missing from the release), `latestVersion` is set to the string `"dev-unknown"`. The `isVersionOlder` function in `src/lib/version.ts` then compares `current !== latest` for dev-to-dev checks.

This means any running dev agent (e.g. `"dev-abc1234"`) will always be considered "outdated" when the server can't determine the real latest SHA, because `"dev-abc1234" !== "dev-unknown"` is `true`. The Fleet UI would show an "Update available" badge and — worse — clicking Update would dispatch an update action with `targetVersion: "dev-unknown"` and whatever incomplete checksums were fetched.

A safer approach is to return `null` and skip the update offer when the version string cannot be determined:

```suggestion
      latestVersion = await fetchDevVersionString(release) ?? null;
```

Then in the `upsert`, only write to the DB if `latestVersion` is non-null, and the caller already handles `latestVersion: null` safely.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +209 to +234
let latestVersion = settings?.latestDevAgentRelease ?? null;
let checksums: Record<string, string> = {};
let checkedAt: Date | null = lastChecked ?? null;

if (needsCheck) {
const release = await fetchDevRelease();
if (release) {
latestVersion = await fetchDevVersionString(release) ?? "dev-unknown";
checksums = await fetchChecksums(release);
checkedAt = new Date();
await prisma.systemSettings.upsert({
where: { id: "singleton" },
update: {
latestDevAgentRelease: latestVersion,
latestDevAgentReleaseCheckedAt: checkedAt,
},
create: {
id: "singleton",
latestDevAgentRelease: latestVersion,
latestDevAgentReleaseCheckedAt: checkedAt,
},
});
}
}

return { latestVersion, checksums, checkedAt };
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checksums are not persisted and will be empty on cache hits

The same pattern applies to the existing checkAgentVersion function: checksums are fetched from GitHub when needsCheck = true, but they are never written to the database. When a cached response is served (needsCheck = false), the function returns checksums: {}.

In fleet/page.tsx the versionQuery uses staleTime: Infinity, so the React Query client-side cache preserves checksums for the session lifetime. However, for any new session where the server-side 24-hour cache is still valid, the server returns empty checksums, causing the Update button to dispatch an action with checksum: "sha256:". In the agent's updater.go, this results in a guaranteed checksum mismatch error and a failed update.

Consider adding a checksums (JSON string) column to SystemSettings and persisting the parsed map alongside the version, mirroring how latestDevAgentRelease is stored. This affects both stable and dev channels.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/server/services/version-check.ts
Line: 209-234

Comment:
**Checksums are not persisted and will be empty on cache hits**

The same pattern applies to the existing `checkAgentVersion` function: `checksums` are fetched from GitHub when `needsCheck = true`, but they are never written to the database. When a cached response is served (`needsCheck = false`), the function returns `checksums: {}`.

In `fleet/page.tsx` the `versionQuery` uses `staleTime: Infinity`, so the React Query client-side cache preserves checksums for the session lifetime. However, for any new session where the server-side 24-hour cache is still valid, the server returns empty checksums, causing the Update button to dispatch an action with `checksum: "sha256:"`. In the agent's `updater.go`, this results in a guaranteed checksum mismatch error and a failed update.

Consider adding a `checksums` (JSON string) column to `SystemSettings` and persisting the parsed map alongside the version, mirroring how `latestDevAgentRelease` is stored. This affects both stable and dev channels.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 206 to 207
VF_TOKEN=${VF_TOKEN}
VF_DATA_DIR=${DATA_DIR}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VF_CHANNEL is written to the env file but never read by the agent

The install script writes VF_CHANNEL=${CHANNEL} to the environment file, which suggests the agent should be channel-aware. However, agent/internal/config/config.go does not read VF_CHANNEL, and it is absent from the Config struct. The --help output in agent/main.go also does not list it.

In practice, cross-channel safety is enforced entirely by the server-side isVersionOlder check on the version string prefix (dev-). The VF_CHANNEL env var in the env file is currently dead code and may confuse operators who try to rely on it.

Either remove the write from the install script or wire it up through config.go so the agent can use it (e.g., to include the channel in heartbeat payloads for server-side validation).

Prompt To Fix With AI
This is a comment left during a code review.
Path: agent/install.sh
Line: 206-207

Comment:
**`VF_CHANNEL` is written to the env file but never read by the agent**

The install script writes `VF_CHANNEL=${CHANNEL}` to the environment file, which suggests the agent should be channel-aware. However, `agent/internal/config/config.go` does not read `VF_CHANNEL`, and it is absent from the `Config` struct. The `--help` output in `agent/main.go` also does not list it.

In practice, cross-channel safety is enforced entirely by the server-side `isVersionOlder` check on the version string prefix (`dev-`). The `VF_CHANNEL` env var in the env file is currently dead code and may confuse operators who try to rely on it.

Either remove the write from the install script or wire it up through `config.go` so the agent can use it (e.g., to include the channel in heartbeat payloads for server-side validation).

How can I resolve this? If you propose a fix, please make it concise.

- Replace "dev-unknown" fallback with null to prevent false update
  notifications for dev agents when version string can't be fetched
- Add missing Prisma migration for latestDevAgentRelease fields
- Add comment clarifying VF_CHANNEL is documentation-only
Comment on lines +213 to +235
if (needsCheck) {
const release = await fetchDevRelease();
if (release) {
const versionString = await fetchDevVersionString(release);
if (versionString) {
latestVersion = versionString;
checksums = await fetchChecksums(release);
checkedAt = new Date();
await prisma.systemSettings.upsert({
where: { id: "singleton" },
update: {
latestDevAgentRelease: latestVersion,
latestDevAgentReleaseCheckedAt: checkedAt,
},
create: {
id: "singleton",
latestDevAgentRelease: latestVersion,
latestDevAgentReleaseCheckedAt: checkedAt,
},
});
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkedAt never updated when dev-version.txt is missing

When fetchDevVersionString returns null (e.g., the dev-version.txt asset is absent from the release), the upsert block is never reached (lines 221–232). This means latestDevAgentReleaseCheckedAt is not written to the database, so lastChecked remains null on the next invocation of checkDevAgentVersion.

On subsequent calls, needsCheck evaluates to true (line 204–207) and fetchDevRelease() is called again—with no 24-hour cooldown in effect. Under normal load, repeated API calls without rate limiting can exhaust GitHub's unauthenticated API limit (60 req/hr), causing 403 errors.

Fix: Always update checkedAt when the release fetch succeeds, even if the version string is unavailable:

if (needsCheck) {
  const release = await fetchDevRelease();
  if (release) {
    checkedAt = new Date();  // ← always bump the timestamp on successful fetch
    const versionString = await fetchDevVersionString(release);
    if (versionString) {
      latestVersion = versionString;
      checksums = await fetchChecksums(release);
    }
    await prisma.systemSettings.upsert({
      where: { id: "singleton" },
      update: {
        latestDevAgentRelease: latestVersion,
        latestDevAgentReleaseCheckedAt: checkedAt,
      },
      create: {
        id: "singleton",
        latestDevAgentRelease: latestVersion,
        latestDevAgentReleaseCheckedAt: checkedAt,
      },
    });
  }
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/server/services/version-check.ts
Line: 213-235

Comment:
`checkedAt` never updated when `dev-version.txt` is missing

When `fetchDevVersionString` returns `null` (e.g., the `dev-version.txt` asset is absent from the release), the `upsert` block is never reached (lines 221–232). This means `latestDevAgentReleaseCheckedAt` is not written to the database, so `lastChecked` remains `null` on the next invocation of `checkDevAgentVersion`.

On subsequent calls, `needsCheck` evaluates to `true` (line 204–207) and `fetchDevRelease()` is called again—with no 24-hour cooldown in effect. Under normal load, repeated API calls without rate limiting can exhaust GitHub's unauthenticated API limit (60 req/hr), causing 403 errors.

**Fix:** Always update `checkedAt` when the release fetch succeeds, even if the version string is unavailable:

```typescript
if (needsCheck) {
  const release = await fetchDevRelease();
  if (release) {
    checkedAt = new Date();  // ← always bump the timestamp on successful fetch
    const versionString = await fetchDevVersionString(release);
    if (versionString) {
      latestVersion = versionString;
      checksums = await fetchChecksums(release);
    }
    await prisma.systemSettings.upsert({
      where: { id: "singleton" },
      update: {
        latestDevAgentRelease: latestVersion,
        latestDevAgentReleaseCheckedAt: checkedAt,
      },
      create: {
        id: "singleton",
        latestDevAgentRelease: latestVersion,
        latestDevAgentReleaseCheckedAt: checkedAt,
      },
    });
  }
}
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +204 to +205
downloadUrl: `https://github.com/${AGENT_REPO}/releases/download/${latest.tag}/vf-agent-linux-amd64`,
checksum: `sha256:${latest.checksums["vf-agent-linux-amd64"] ?? ""}`,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded amd64 architecture breaks arm64 agent updates

The download URL and checksum key both hardcode vf-agent-linux-amd64, but the CI workflow explicitly builds and publishes both vf-agent-linux-amd64 and vf-agent-linux-arm64 for the dev channel (.github/workflows/ci.yml lines 169–170, 188–189). An arm64 agent (e.g., running on AWS Graviton or Raspberry Pi) that is offered an update will download and attempt to execute the amd64 binary, causing the agent to crash or fail to restart.

The VectorNode model includes an os field but no architecture information. To fix this, either:

  1. Add an arch field to VectorNode (persisted from agent heartbeat metadata)
  2. Infer the architecture from agentVersion or metadata if available
  3. Default to amd64 with an override mechanism for known arm64 deployments

Then use the per-node architecture in the update command:

const arch = node.arch ?? "amd64"; // e.g. "amd64" | "arm64"
triggerUpdate.mutate({
  nodeId: node.id,
  targetVersion: latest.version!,
  downloadUrl: `https://github.com/${AGENT_REPO}/releases/download/${latest.tag}/vf-agent-linux-${arch}`,
  checksum: `sha256:${latest.checksums[`vf-agent-linux-${arch}`] ?? ""}`,
});
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/(dashboard)/fleet/page.tsx
Line: 204-205

Comment:
Hardcoded `amd64` architecture breaks arm64 agent updates

The download URL and checksum key both hardcode `vf-agent-linux-amd64`, but the CI workflow explicitly builds and publishes both `vf-agent-linux-amd64` and `vf-agent-linux-arm64` for the dev channel (`.github/workflows/ci.yml` lines 169–170, 188–189). An arm64 agent (e.g., running on AWS Graviton or Raspberry Pi) that is offered an update will download and attempt to execute the `amd64` binary, causing the agent to crash or fail to restart.

The `VectorNode` model includes an `os` field but no architecture information. To fix this, either:
1. Add an `arch` field to `VectorNode` (persisted from agent heartbeat metadata)
2. Infer the architecture from `agentVersion` or `metadata` if available
3. Default to `amd64` with an override mechanism for known arm64 deployments

Then use the per-node architecture in the update command:

```typescript
const arch = node.arch ?? "amd64"; // e.g. "amd64" | "arm64"
triggerUpdate.mutate({
  nodeId: node.id,
  targetVersion: latest.version!,
  downloadUrl: `https://github.com/${AGENT_REPO}/releases/download/${latest.tag}/vf-agent-linux-${arch}`,
  checksum: `sha256:${latest.checksums[`vf-agent-linux-${arch}`] ?? ""}`,
});
```

How can I resolve this? If you propose a fix, please make it concise.

The agent infers its channel from the version string prefix (dev- vs
semver). Writing VF_CHANNEL to the env file served no purpose.
Previously, checksums fetched from GitHub were only held in memory.
After a server restart within the 24h cache window, the version was
loaded from the DB but checksums were empty, causing agent self-update
checksum verification to fail.

Store checksums as JSON in SystemSettings for both stable and dev
channels, and load them on cache hits.
Comment on lines +156 to 195
- uses: actions/checkout@v4

- uses: actions/setup-go@v5
with:
go-version: "1.22"
cache-dependency-path: agent/go.sum

- name: Build dev binaries
working-directory: agent
run: |
SHORT_SHA="${GITHUB_SHA::7}"
VERSION="dev-${SHORT_SHA}"
LDFLAGS="-s -w -X github.com/TerrifiedBug/vectorflow/agent/internal/agent.Version=${VERSION}"
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="${LDFLAGS}" -o ../vf-agent-linux-amd64 .
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -ldflags="${LDFLAGS}" -o ../vf-agent-linux-arm64 .
echo "${VERSION}" > ../dev-version.txt

- name: Generate checksums
run: sha256sum vf-agent-linux-* > checksums.txt

- name: Publish dev pre-release
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Delete existing dev release if present (gh errors if not found, ignore)
gh release delete dev --yes --cleanup-tag 2>/dev/null || true
# Create fresh pre-release pointing at current commit
gh release create dev \
--title "Development Build" \
--notes "Rolling dev build from \`${GITHUB_SHA::7}\` on main. Not for production use." \
--target "${GITHUB_SHA}" \
--prerelease \
vf-agent-linux-amd64 \
vf-agent-linux-arm64 \
checksums.txt \
dev-version.txt

agent-binaries:
name: Agent Binaries
needs: check
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dev release race condition on concurrent pushes

The agent-dev-binaries job has no concurrency group. If two commits land on main in quick succession, two workflow runs will overlap. The delete → create sequence is not atomic, so the following interleaving is possible:

  1. Run A (older SHA): gh release delete dev ← removes dev
  2. Run A: gh release create dev ← publishes commit-A binaries
  3. Run B (newer SHA): gh release delete dev ← removes commit-A binaries
  4. Run A (still running): nothing left to do, exits cleanly
  5. Run B: gh release create dev ← publishes commit-B binaries ✓ (correct final state)

…or, if scheduling flips steps 2 and 3:

  1. Run A: gh release delete dev
  2. Run B: gh release delete dev (already gone — silently ignored with || true)
  3. Run B: gh release create dev ← publishes commit-B binaries
  4. Run A: gh release create devFAILS with "release already exists"

In scenario 4, gh release create exits non-zero and the step fails, causing the newer commit's release to persist — but subsequent CI runs will surface a noisy failure that needs to be retried manually.

Add a job-level concurrency key to cancel in-progress runs:

  agent-dev-binaries:
    name: Agent Dev Binaries
    needs: check
    if: github.event_name == 'push' && !startsWith(github.ref, 'refs/tags/v')
    runs-on: ubuntu-latest
    concurrency:
      group: dev-release
      cancel-in-progress: true

With cancel-in-progress: true, only the latest push's job ever reaches the delete/create steps, making the operation effectively atomic.

Prompt To Fix With AI
This is a comment left during a code review.
Path: .github/workflows/ci.yml
Line: 156-195

Comment:
**Dev release race condition on concurrent pushes**

The `agent-dev-binaries` job has no `concurrency` group. If two commits land on `main` in quick succession, two workflow runs will overlap. The delete → create sequence is not atomic, so the following interleaving is possible:

1. Run A (older SHA): `gh release delete dev` ← removes dev
2. Run A: `gh release create dev` ← publishes commit-A binaries
3. Run B (newer SHA): `gh release delete dev` ← removes commit-A binaries
4. Run A (still running): nothing left to do, exits cleanly
5. Run B: `gh release create dev` ← publishes commit-B binaries ✓ (correct final state)

…or, if scheduling flips steps 2 and 3:

1. Run A: `gh release delete dev`
2. Run B: `gh release delete dev` (already gone — silently ignored with `|| true`)
3. Run B: `gh release create dev` ← publishes commit-B binaries
4. Run A: `gh release create dev`**FAILS** with "release already exists"

In scenario 4, `gh release create` exits non-zero and the step fails, causing the newer commit's release to persist — but subsequent CI runs will surface a noisy failure that needs to be retried manually.

Add a job-level `concurrency` key to cancel in-progress runs:

```yaml
  agent-dev-binaries:
    name: Agent Dev Binaries
    needs: check
    if: github.event_name == 'push' && !startsWith(github.ref, 'refs/tags/v')
    runs-on: ubuntu-latest
    concurrency:
      group: dev-release
      cancel-in-progress: true
```

With `cancel-in-progress: true`, only the latest push's job ever reaches the delete/create steps, making the operation effectively atomic.

How can I resolve this? If you propose a fix, please make it concise.

if (node.agentVersion?.startsWith("dev-")) {
return { version: latestDevAgentVersion, checksums: devAgentChecksums, tag: "dev" };
}
return { version: latestAgentVersion, checksums: agentChecksums, tag: `v${latestAgentVersion}` };
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tag eagerly built with potentially-null version

latestAgentVersion is typed string | null (it's null when no stable release has been fetched yet). The template literal \v${latestAgentVersion}`is evaluated unconditionally, producing"vnull"when the version is absent. The downstream&&guard ongetNodeLatest(node).versionprevents the broken URL from being used in the current render logic, but this is a latent trap — any future code that reaches into.tagwithout checking.versionfirst will silently dispatch an update request to.../releases/download/vnull/vf-agent-linux-amd64`.

Suggested change
return { version: latestAgentVersion, checksums: agentChecksums, tag: `v${latestAgentVersion}` };
return { version: latestAgentVersion, checksums: agentChecksums, tag: latestAgentVersion ? `v${latestAgentVersion}` : null };

Then narrow the type at the call site: latest.tag!latest.tag ?? "" (or guard on it like version).

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/(dashboard)/fleet/page.tsx
Line: 79

Comment:
**Tag eagerly built with potentially-null version**

`latestAgentVersion` is typed `string | null` (it's `null` when no stable release has been fetched yet). The template literal `\`v${latestAgentVersion}\`` is evaluated unconditionally, producing `"vnull"` when the version is absent. The downstream `&&` guard on `getNodeLatest(node).version` prevents the broken URL from being used in the current render logic, but this is a latent trap — any future code that reaches into `.tag` without checking `.version` first will silently dispatch an update request to `.../releases/download/vnull/vf-agent-linux-amd64`.

```suggestion
    return { version: latestAgentVersion, checksums: agentChecksums, tag: latestAgentVersion ? `v${latestAgentVersion}` : null };
```

Then narrow the type at the call site: `latest.tag!``latest.tag ?? ""` (or guard on it like `version`).

How can I resolve this? If you propose a fix, please make it concise.

- Always update checkedAt when dev release fetch succeeds, even if
  dev-version.txt is missing, to prevent GitHub API rate limit exhaustion
- Add concurrency group to agent-dev-binaries CI job to prevent race
  conditions on concurrent pushes to main
- Guard stable tag construction against null version to avoid "vnull" URL
Comment on lines +217 to +240
if (needsCheck) {
const release = await fetchDevRelease();
if (release) {
checkedAt = new Date();
const versionString = await fetchDevVersionString(release);
if (versionString) {
latestVersion = versionString;
checksums = await fetchChecksums(release);
}
await prisma.systemSettings.upsert({
where: { id: "singleton" },
update: {
latestDevAgentRelease: latestVersion,
latestDevAgentReleaseCheckedAt: checkedAt,
...(versionString ? { latestDevAgentChecksums: JSON.stringify(checksums) } : {}),
},
create: {
id: "singleton",
latestDevAgentRelease: latestVersion,
latestDevAgentReleaseCheckedAt: checkedAt,
...(versionString ? { latestDevAgentChecksums: JSON.stringify(checksums) } : {}),
},
});
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outer release null case bypasses rate-limiting entirely

When fetchDevRelease() returns null (e.g., GitHub API is down, the dev release tag doesn't exist yet after the first push, or the request is rate-limited), the if (release) block is skipped, so latestDevAgentReleaseCheckedAt is never written to the database. lastChecked remains null or stale, and needsCheck evaluates to true on every subsequent call — causing unbounded GitHub API requests with no cooldown period.

This is distinct from the previously flagged versionString null case: when release is non-null but versionString is null, the upsert at line 226 still runs (updating latestDevAgentReleaseCheckedAt) because checkedAt = new Date() is set unconditionally inside if (release). The outer null case has no equivalent protection.

Consider updating checkedAt even when the release fetch fails, to enforce the 24-hour cooldown:

if (needsCheck) {
  checkedAt = new Date(); // bump timestamp even on failed fetch
  const release = await fetchDevRelease();
  if (release) {
    const versionString = await fetchDevVersionString(release);
    if (versionString) {
      latestVersion = versionString;
      checksums = await fetchChecksums(release);
    }
  }
  await prisma.systemSettings.upsert({
    where: { id: "singleton" },
    update: {
      latestDevAgentRelease: latestVersion,
      latestDevAgentReleaseCheckedAt: checkedAt,
      ...(checksums && Object.keys(checksums).length > 0
        ? { latestDevAgentChecksums: JSON.stringify(checksums) }
        : {}),
    },
    create: { /* ... */ },
  });
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/server/services/version-check.ts
Line: 217-240

Comment:
**Outer `release` null case bypasses rate-limiting entirely**

When `fetchDevRelease()` returns `null` (e.g., GitHub API is down, the `dev` release tag doesn't exist yet after the first push, or the request is rate-limited), the `if (release)` block is skipped, so `latestDevAgentReleaseCheckedAt` is never written to the database. `lastChecked` remains `null` or stale, and `needsCheck` evaluates to `true` on every subsequent call — causing unbounded GitHub API requests with no cooldown period.

This is distinct from the previously flagged `versionString` null case: when `release` is non-null but `versionString` is null, the upsert at line 226 still runs (updating `latestDevAgentReleaseCheckedAt`) because `checkedAt = new Date()` is set unconditionally inside `if (release)`. The outer null case has no equivalent protection.

Consider updating `checkedAt` even when the release fetch fails, to enforce the 24-hour cooldown:

```typescript
if (needsCheck) {
  checkedAt = new Date(); // bump timestamp even on failed fetch
  const release = await fetchDevRelease();
  if (release) {
    const versionString = await fetchDevVersionString(release);
    if (versionString) {
      latestVersion = versionString;
      checksums = await fetchChecksums(release);
    }
  }
  await prisma.systemSettings.upsert({
    where: { id: "singleton" },
    update: {
      latestDevAgentRelease: latestVersion,
      latestDevAgentReleaseCheckedAt: checkedAt,
      ...(checksums && Object.keys(checksums).length > 0
        ? { latestDevAgentChecksums: JSON.stringify(checksums) }
        : {}),
    },
    create: { /* ... */ },
  });
}
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 68 to 82
--url) VF_URL="$2"; shift 2 ;;
--token) VF_TOKEN="$2"; shift 2 ;;
--version) VERSION="$2"; shift 2 ;;
--channel) CHANNEL="$2"; shift 2 ;;
--help) usage ;;
*) fatal "Unknown option: $1 (use --help for usage)" ;;
esac
done

if [ "${CHANNEL}" = "dev" ] && [ "${VERSION}" != "latest" ]; then
fatal "--channel dev and --version are mutually exclusive"
fi

# ─────────────────────────────────────────────────
# Preflight checks
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unknown --channel values silently fall through to stable

The --channel flag only rejects the combination of dev + --version, but does not validate that the channel value itself is one of the supported options (stable | dev). Passing --channel foo silently treats the install as a stable-channel install, which can confuse operators who mistype the channel name.

Suggested change
--url) VF_URL="$2"; shift 2 ;;
--token) VF_TOKEN="$2"; shift 2 ;;
--version) VERSION="$2"; shift 2 ;;
--channel) CHANNEL="$2"; shift 2 ;;
--help) usage ;;
*) fatal "Unknown option: $1 (use --help for usage)" ;;
esac
done
if [ "${CHANNEL}" = "dev" ] && [ "${VERSION}" != "latest" ]; then
fatal "--channel dev and --version are mutually exclusive"
fi
# ─────────────────────────────────────────────────
# Preflight checks
if [ "${CHANNEL}" = "dev" ] && [ "${VERSION}" != "latest" ]; then
fatal "--channel dev and --version are mutually exclusive"
fi
if [ "${CHANNEL}" != "stable" ] && [ "${CHANNEL}" != "dev" ]; then
fatal "Unknown channel '${CHANNEL}'. Valid values are: stable, dev"
fi
Prompt To Fix With AI
This is a comment left during a code review.
Path: agent/install.sh
Line: 68-82

Comment:
**Unknown `--channel` values silently fall through to stable**

The `--channel` flag only rejects the combination of `dev` + `--version`, but does not validate that the channel value itself is one of the supported options (`stable` | `dev`). Passing `--channel foo` silently treats the install as a stable-channel install, which can confuse operators who mistype the channel name.

```suggestion
if [ "${CHANNEL}" = "dev" ] && [ "${VERSION}" != "latest" ]; then
    fatal "--channel dev and --version are mutually exclusive"
fi
if [ "${CHANNEL}" != "stable" ] && [ "${CHANNEL}" != "dev" ]; then
    fatal "Unknown channel '${CHANNEL}'. Valid values are: stable, dev"
fi
```

How can I resolve this? If you propose a fix, please make it concise.

@TerrifiedBug TerrifiedBug merged commit 10b09a1 into main Mar 5, 2026
1 check passed
@TerrifiedBug TerrifiedBug deleted the feat/dev-binary-channel branch March 5, 2026 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant