Merged
46 changes: 46 additions & 0 deletions .github/workflows/ci.yml
@@ -147,6 +147,52 @@ jobs:
           cache-from: type=gha
           cache-to: type=gha,mode=max

+  agent-dev-binaries:
+    name: Agent Dev Binaries
+    needs: check
+    if: github.event_name == 'push' && !startsWith(github.ref, 'refs/tags/v')
+    runs-on: ubuntu-latest
+    concurrency:
+      group: dev-release
+      cancel-in-progress: true
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-go@v5
+        with:
+          go-version: "1.22"
+          cache-dependency-path: agent/go.sum
+
+      - name: Build dev binaries
+        working-directory: agent
+        run: |
+          SHORT_SHA="${GITHUB_SHA::7}"
+          VERSION="dev-${SHORT_SHA}"
+          LDFLAGS="-s -w -X github.com/TerrifiedBug/vectorflow/agent/internal/agent.Version=${VERSION}"
+          CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="${LDFLAGS}" -o ../vf-agent-linux-amd64 .
+          CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -ldflags="${LDFLAGS}" -o ../vf-agent-linux-arm64 .
+          echo "${VERSION}" > ../dev-version.txt
+
+      - name: Generate checksums
+        run: sha256sum vf-agent-linux-* > checksums.txt
+
+      - name: Publish dev pre-release
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          # Delete existing dev release if present (gh errors if not found, ignore)
+          gh release delete dev --yes --cleanup-tag 2>/dev/null || true
+          # Create fresh pre-release pointing at current commit
+          gh release create dev \
+            --title "Development Build" \
+            --notes "Rolling dev build from \`${GITHUB_SHA::7}\` on main. Not for production use." \
+            --target "${GITHUB_SHA}" \
+            --prerelease \
+            vf-agent-linux-amd64 \
+            vf-agent-linux-arm64 \
+            checksums.txt \
+            dev-version.txt
+
   agent-binaries:
     name: Agent Binaries
     needs: check
Comment on lines +159 to 198

Dev release race condition on concurrent pushes

The agent-dev-binaries job has no concurrency group. If two commits land on main in quick succession, two workflow runs will overlap. The delete → create sequence is not atomic, so the following interleaving is possible:

  1. Run A (older SHA): gh release delete dev ← removes dev
  2. Run A: gh release create dev ← publishes commit-A binaries
  3. Run B (newer SHA): gh release delete dev ← removes commit-A binaries
  4. Run A (still running): nothing left to do, exits cleanly
  5. Run B: gh release create dev ← publishes commit-B binaries ✓ (correct final state)

…or, if scheduling flips steps 2 and 3:

  1. Run A: gh release delete dev
  2. Run B: gh release delete dev (already gone — silently ignored with || true)
  3. Run B: gh release create dev ← publishes commit-B binaries
  4. Run A: gh release create dev ← FAILS with "release already exists"

In step 4 of this second interleaving, gh release create exits non-zero and Run A's step fails. The newer commit's release (published by Run B) persists, so the final state is correct, but the workflow reports a noisy failure that invites a manual retry.

Add a job-level concurrency key to cancel in-progress runs:

  agent-dev-binaries:
    name: Agent Dev Binaries
    needs: check
    if: github.event_name == 'push' && !startsWith(github.ref, 'refs/tags/v')
    runs-on: ubuntu-latest
    concurrency:
      group: dev-release
      cancel-in-progress: true

With a shared concurrency group, at most one agent-dev-binaries job runs at a time, and with cancel-in-progress: true a newer push cancels the older run. The delete and create steps of two runs therefore cannot interleave.
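
The two interleavings described above can be replayed with a small in-memory model. This is only a sketch: `Store`, `deleteDev`, `createDev`, and `replay` are hypothetical stand-ins for the release store and the two `gh` commands, not part of the workflow or the gh CLI.

```typescript
// Hypothetical in-memory stand-in for the "dev" release; not the gh CLI.
type Store = { dev: string | null };

// Mirrors `gh release delete dev ... || true`: deleting a missing release is ignored.
function deleteDev(store: Store): void {
  store.dev = null;
}

// Mirrors `gh release create dev`: fails if the release already exists.
function createDev(store: Store, sha: string): void {
  if (store.dev !== null) {
    throw new Error(`release already exists (holds ${store.dev})`);
  }
  store.dev = sha;
}

type Step = { run: "A" | "B"; op: "delete" | "create" };

// Replays a fixed interleaving and reports the final release plus any failed runs.
function replay(steps: Step[]): { final: string | null; failed: string[] } {
  const store: Store = { dev: null };
  const failed: string[] = [];
  for (const { run, op } of steps) {
    try {
      if (op === "delete") deleteDev(store);
      else createDev(store, `commit-${run}`);
    } catch {
      failed.push(run);
    }
  }
  return { final: store.dev, failed };
}

// No overlap (what a concurrency group prevents): A finishes before B starts.
const serialized = replay([
  { run: "A", op: "delete" }, { run: "A", op: "create" },
  { run: "B", op: "delete" }, { run: "B", op: "create" },
]);

// Overlap: both deletes happen before either create.
const overlapping = replay([
  { run: "A", op: "delete" }, { run: "B", op: "delete" },
  { run: "B", op: "create" }, { run: "A", op: "create" },
]);

console.log(serialized);  // final release is commit-B, no failed runs
console.log(overlapping); // final release is commit-B, run A failed
```

In both cases the newer commit ends up published; the overlap only adds a spurious failure for Run A, which is the noise the concurrency group removes.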


38 changes: 28 additions & 10 deletions agent/install.sh
@@ -17,6 +17,7 @@ VECTOR_VERSION="0.44.0"
 VF_URL=""
 VF_TOKEN=""
 VERSION="latest"
+CHANNEL="stable"

 # ─────────────────────────────────────────────────
 # Helpers
@@ -39,6 +40,7 @@ Options:
   --url <url>        VectorFlow server URL (e.g. https://vectorflow.example.com)
   --token <token>    One-time enrollment token from the VectorFlow UI
   --version <tag>    Release version to install (default: latest)
+  --channel <name>   Release channel: stable or dev (default: stable)
   --help             Show this help message

 Examples:
@@ -50,6 +52,9 @@ Examples:

   # Install specific version
   curl -sSfL .../install.sh | sudo bash -s -- --version v0.3.0
+
+  # Install dev channel
+  curl -sSfL .../install.sh | sudo bash -s -- --channel dev --url https://vf.example.com --token abc123
 EOF
     exit 0
 }
@@ -63,11 +68,16 @@ while [ $# -gt 0 ]; do
         --url) VF_URL="$2"; shift 2 ;;
         --token) VF_TOKEN="$2"; shift 2 ;;
         --version) VERSION="$2"; shift 2 ;;
+        --channel) CHANNEL="$2"; shift 2 ;;
         --help) usage ;;
         *) fatal "Unknown option: $1 (use --help for usage)" ;;
     esac
 done

+if [ "${CHANNEL}" = "dev" ] && [ "${VERSION}" != "latest" ]; then
+    fatal "--channel dev and --version are mutually exclusive"
+fi
+
 # ─────────────────────────────────────────────────
 # Preflight checks
Comment on lines 68 to 82

Unknown --channel values silently fall through to stable

The --channel flag only rejects the combination of dev + --version, but does not validate that the channel value itself is one of the supported options (stable | dev). Passing --channel foo silently treats the install as a stable-channel install, which can confuse operators who mistype the channel name.

Suggested change
-        --url) VF_URL="$2"; shift 2 ;;
-        --token) VF_TOKEN="$2"; shift 2 ;;
-        --version) VERSION="$2"; shift 2 ;;
-        --channel) CHANNEL="$2"; shift 2 ;;
-        --help) usage ;;
-        *) fatal "Unknown option: $1 (use --help for usage)" ;;
-    esac
-done
-
-if [ "${CHANNEL}" = "dev" ] && [ "${VERSION}" != "latest" ]; then
-    fatal "--channel dev and --version are mutually exclusive"
-fi
-
-# ─────────────────────────────────────────────────
-# Preflight checks
+if [ "${CHANNEL}" = "dev" ] && [ "${VERSION}" != "latest" ]; then
+    fatal "--channel dev and --version are mutually exclusive"
+fi
+if [ "${CHANNEL}" != "stable" ] && [ "${CHANNEL}" != "dev" ]; then
+    fatal "Unknown channel '${CHANNEL}'. Valid values are: stable, dev"
+fi
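
For readers who prefer typed code, the two checks in the suggestion can be transcribed into TypeScript. This is purely illustrative: the installer itself is a shell script, and `Channel`, `isChannel`, and `validateOptions` are hypothetical names that do not exist in the repository. Errors are thrown where the script would call `fatal`.

```typescript
type Channel = "stable" | "dev";

interface InstallOptions {
  channel: Channel;
  version: string; // "latest" means: resolve the newest stable tag
}

// Type guard for the two supported channel names.
function isChannel(value: string): value is Channel {
  return value === "stable" || value === "dev";
}

// Applies the same two checks as the suggested shell snippet, in the same order.
function validateOptions(channel: string, version: string = "latest"): InstallOptions {
  if (channel === "dev" && version !== "latest") {
    throw new Error("--channel dev and --version are mutually exclusive");
  }
  if (!isChannel(channel)) {
    throw new Error(`Unknown channel '${channel}'. Valid values are: stable, dev`);
  }
  return { channel, version };
}

console.log(validateOptions("dev"));              // accepted
console.log(validateOptions("stable", "v0.3.0")); // accepted
try {
  validateOptions("foo"); // a mistyped channel is rejected instead of falling through
} catch (err) {
  console.log((err as Error).message);
}
```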

 # ─────────────────────────────────────────────────
@@ -96,22 +106,30 @@ info "Detected architecture: ${ARCH}"
 # Resolve version
 # ─────────────────────────────────────────────────

-if [ "${VERSION}" = "latest" ]; then
-    info "Resolving latest release..."
-    VERSION=$(curl -sSf "https://api.github.com/repos/${REPO}/releases/latest" \
-        | grep '"tag_name"' | sed -E 's/.*"([^"]+)".*/\1/')
-    [ -n "${VERSION}" ] || fatal "Could not determine latest release version"
+if [ "${CHANNEL}" = "dev" ]; then
+    info "Using dev channel..."
+    VERSION="dev"
+    BINARY_NAME="vf-agent-linux-${ARCH}"
+    DOWNLOAD_URL="https://github.com/${REPO}/releases/download/dev/${BINARY_NAME}"
+    CHECKSUM_URL="https://github.com/${REPO}/releases/download/dev/checksums.txt"
+else
+    if [ "${VERSION}" = "latest" ]; then
+        info "Resolving latest release..."
+        VERSION=$(curl -sSf "https://api.github.com/repos/${REPO}/releases/latest" \
+            | grep '"tag_name"' | sed -E 's/.*"([^"]+)".*/\1/')
+        [ -n "${VERSION}" ] || fatal "Could not determine latest release version"
+    fi
+    info "Target version: ${VERSION}"
+
+    BINARY_NAME="vf-agent-linux-${ARCH}"
+    DOWNLOAD_URL="https://github.com/${REPO}/releases/download/${VERSION}/${BINARY_NAME}"
+    CHECKSUM_URL="https://github.com/${REPO}/releases/download/${VERSION}/checksums.txt"
 fi
-info "Target version: ${VERSION}"

 # ─────────────────────────────────────────────────
 # Download and verify agent binary
 # ─────────────────────────────────────────────────

-BINARY_NAME="vf-agent-linux-${ARCH}"
-DOWNLOAD_URL="https://github.com/${REPO}/releases/download/${VERSION}/${BINARY_NAME}"
-CHECKSUM_URL="https://github.com/${REPO}/releases/download/${VERSION}/checksums.txt"
-
 TMPDIR=$(mktemp -d)
 trap 'rm -rf "${TMPDIR}"' EXIT

26 changes: 26 additions & 0 deletions agent/main.go
@@ -10,6 +10,32 @@ import (
 )

 func main() {
+	if len(os.Args) > 1 {
+		switch os.Args[1] {
+		case "--version", "-v":
+			fmt.Printf("vf-agent %s\n", agent.Version)
+			os.Exit(0)
+		case "--help", "-h":
+			fmt.Print(`VectorFlow Agent
+
+Usage: vf-agent [flags]
+
+Flags:
+  --version, -v   Print version and exit
+  --help, -h      Show this help
+
+Environment variables:
+  VF_URL            Server URL (required)
+  VF_TOKEN          Enrollment token
+  VF_DATA_DIR       Data directory (default: /var/lib/vf-agent)
+  VF_VECTOR_BIN     Path to Vector binary (default: vector)
+  VF_POLL_INTERVAL  Poll interval duration (default: 15s)
+  VF_LOG_LEVEL      Log level: debug|info|warn|error (default: info)
+`)
+			os.Exit(0)
+		}
+	}
+
 	cfg, err := config.Load()
 	if err != nil {
 		fmt.Fprintf(os.Stderr, "config error: %v\n", err)
@@ -0,0 +1,5 @@
+-- AlterTable
+ALTER TABLE "SystemSettings" ADD COLUMN "latestAgentChecksums" TEXT;
+ALTER TABLE "SystemSettings" ADD COLUMN "latestDevAgentRelease" TEXT;
+ALTER TABLE "SystemSettings" ADD COLUMN "latestDevAgentReleaseCheckedAt" TIMESTAMP(3);
+ALTER TABLE "SystemSettings" ADD COLUMN "latestDevAgentChecksums" TEXT;
12 changes: 8 additions & 4 deletions prisma/schema.prisma
@@ -411,10 +411,14 @@ model SystemSettings {
   lastBackupStatus String?
   lastBackupError String?

-  latestServerRelease          String?
-  latestServerReleaseCheckedAt DateTime?
-  latestAgentRelease           String?
-  latestAgentReleaseCheckedAt  DateTime?
+  latestServerRelease            String?
+  latestServerReleaseCheckedAt   DateTime?
+  latestAgentRelease             String?
+  latestAgentReleaseCheckedAt    DateTime?
+  latestAgentChecksums           String? // JSON: {"vf-agent-linux-amd64":"sha256..."}
+  latestDevAgentRelease          String?
+  latestDevAgentReleaseCheckedAt DateTime?
+  latestDevAgentChecksums        String? // JSON: {"vf-agent-linux-amd64":"sha256..."}

   updatedAt DateTime @updatedAt
 }
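
The schema comments describe the checksum columns as a JSON map from asset name to digest, while CI publishes a `checksums.txt` produced by `sha256sum`. A minimal sketch of the conversion between the two follows. It assumes the standard `sha256sum` line format (digest, whitespace, optional `*`, file name); `parseChecksums` is a hypothetical helper, and the diff does not show how the server's version-check service actually does this.

```typescript
// Parses `sha256sum` output ("<hex digest>  <file name>" per line) into a
// file-name -> digest map, the JSON shape the schema comment describes.
function parseChecksums(text: string): Record<string, string> {
  const checksums: Record<string, string> = {};
  for (const line of text.split("\n")) {
    // 64 hex chars, whitespace, optional "*" (binary-mode marker), then the name.
    const match = /^([0-9a-f]{64})\s+\*?(\S.*)$/.exec(line.trim());
    if (match) checksums[match[2]] = match[1];
  }
  return checksums;
}

// Placeholder digests for illustration only; real values come from checksums.txt.
const sample = [
  `${"a".repeat(64)}  vf-agent-linux-amd64`,
  `${"b".repeat(64)}  vf-agent-linux-arm64`,
].join("\n");

// The string that would be stored in a TEXT column such as latestDevAgentChecksums.
const stored = JSON.stringify(parseChecksums(sample));
console.log(stored);
```

Lines that do not match the expected format are skipped rather than treated as errors, which keeps the sketch tolerant of trailing newlines.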
28 changes: 19 additions & 9 deletions src/app/(dashboard)/fleet/page.tsx
@@ -69,6 +69,15 @@ export default function FleetPage() {
   );
   const latestAgentVersion = versionQuery.data?.agent.latestVersion ?? null;
   const agentChecksums = versionQuery.data?.agent.checksums ?? {};
+  const latestDevAgentVersion = versionQuery.data?.devAgent?.latestVersion ?? null;
+  const devAgentChecksums = versionQuery.data?.devAgent?.checksums ?? {};
+
+  const getNodeLatest = (node: { agentVersion: string | null }) => {
+    if (node.agentVersion?.startsWith("dev-")) {
+      return { version: latestDevAgentVersion, checksums: devAgentChecksums, tag: "dev" };
+    }
+    return { version: latestAgentVersion, checksums: agentChecksums, tag: latestAgentVersion ? `v${latestAgentVersion}` : null };
+  };

   const triggerUpdate = useMutation(
     trpc.fleet.triggerAgentUpdate.mutationOptions({
@@ -132,9 +141,9 @@ export default function FleetPage() {
                     <span className="font-mono text-sm text-muted-foreground">
                       {node.agentVersion ?? "—"}
                     </span>
-                    {latestAgentVersion &&
+                    {getNodeLatest(node).version &&
                       node.agentVersion &&
-                      isVersionOlder(node.agentVersion, latestAgentVersion) && (
+                      isVersionOlder(node.agentVersion, getNodeLatest(node).version ?? "") && (
                         <Badge variant="outline" className="text-amber-600">
                           Update available
                         </Badge>
@@ -165,9 +174,9 @@ export default function FleetPage() {
                       Update pending...
                     </Badge>
                   ) : node.deploymentMode === "DOCKER" ? (
-                    latestAgentVersion &&
+                    getNodeLatest(node).version &&
                     node.agentVersion &&
-                    isVersionOlder(node.agentVersion, latestAgentVersion) ? (
+                    isVersionOlder(node.agentVersion, getNodeLatest(node).version ?? "") ? (
                       <Tooltip>
                         <TooltipTrigger asChild>
                           <span>
@@ -179,20 +188,21 @@ export default function FleetPage() {
                         <TooltipContent>Update via Docker image pull</TooltipContent>
                       </Tooltip>
                     ) : null
-                  ) : latestAgentVersion &&
+                  ) : getNodeLatest(node).version &&
                     node.agentVersion &&
-                    isVersionOlder(node.agentVersion, latestAgentVersion) ? (
+                    isVersionOlder(node.agentVersion, getNodeLatest(node).version ?? "") ? (
                     <Button
                       variant="outline"
                       size="sm"
                       disabled={triggerUpdate.isPending}
                       onClick={(e) => {
                         e.preventDefault();
+                        const latest = getNodeLatest(node);
                         triggerUpdate.mutate({
                           nodeId: node.id,
-                          targetVersion: latestAgentVersion,
-                          downloadUrl: `https://github.com/${AGENT_REPO}/releases/download/v${latestAgentVersion}/vf-agent-linux-amd64`,
-                          checksum: `sha256:${agentChecksums["vf-agent-linux-amd64"] ?? ""}`,
+                          targetVersion: latest.version!,
+                          downloadUrl: `https://github.com/${AGENT_REPO}/releases/download/${latest.tag}/vf-agent-linux-amd64`,
+                          checksum: `sha256:${latest.checksums["vf-agent-linux-amd64"] ?? ""}`,
Comment on lines +204 to +205

Hardcoded amd64 architecture breaks arm64 agent updates

The download URL and checksum key both hardcode vf-agent-linux-amd64, but the CI workflow explicitly builds and publishes both vf-agent-linux-amd64 and vf-agent-linux-arm64 for the dev channel (.github/workflows/ci.yml lines 169–170, 188–189). An arm64 agent (e.g., running on AWS Graviton or Raspberry Pi) that is offered an update will download and attempt to execute the amd64 binary, causing the agent to crash or fail to restart.

The VectorNode model includes an os field but no architecture information. To fix this, either:

  1. Add an arch field to VectorNode (persisted from agent heartbeat metadata)
  2. Infer the architecture from agentVersion or metadata if available
  3. Default to amd64 with an override mechanism for known arm64 deployments

Then use the per-node architecture in the update command:

const arch = node.arch ?? "amd64"; // e.g. "amd64" | "arm64"
triggerUpdate.mutate({
  nodeId: node.id,
  targetVersion: latest.version!,
  downloadUrl: `https://github.com/${AGENT_REPO}/releases/download/${latest.tag}/vf-agent-linux-${arch}`,
  checksum: `sha256:${latest.checksums[`vf-agent-linux-${arch}`] ?? ""}`,
});
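
One way to package the per-architecture lookup is a small pure helper that also refuses to build a request when the checksum is missing, rather than sending a bare `sha256:` prefix. This is a sketch under assumptions: `Arch`, `LatestInfo`, `UpdateRequest`, and `buildUpdateRequest` are hypothetical names, the `LatestInfo` shape is modelled on what `getNodeLatest` returns in the diff, and `example/repo` is a placeholder for the real `AGENT_REPO` value.

```typescript
type Arch = "amd64" | "arm64";

// Modelled on the object returned by getNodeLatest in the diff.
interface LatestInfo {
  version: string | null;
  checksums: Record<string, string>;
  tag: string | null;
}

interface UpdateRequest {
  targetVersion: string;
  downloadUrl: string;
  checksum: string;
}

// Builds the update payload for one node, or returns null when the version,
// the release tag, or the checksum for this architecture is missing.
function buildUpdateRequest(repo: string, latest: LatestInfo, arch: Arch): UpdateRequest | null {
  const asset = `vf-agent-linux-${arch}`;
  const digest = latest.checksums[asset];
  if (!latest.version || !latest.tag || !digest) return null;
  return {
    targetVersion: latest.version,
    downloadUrl: `https://github.com/${repo}/releases/download/${latest.tag}/${asset}`,
    checksum: `sha256:${digest}`,
  };
}

// Placeholder values for illustration only.
const devLatest: LatestInfo = {
  version: "dev-abc1234",
  tag: "dev",
  checksums: { "vf-agent-linux-arm64": "0123abcd" },
};

console.log(buildUpdateRequest("example/repo", devLatest, "arm64")); // request for the arm64 asset
console.log(buildUpdateRequest("example/repo", devLatest, "amd64")); // null: no amd64 checksum known
```

Returning `null` lets the caller disable the update button instead of triggering a download that cannot be verified.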

                         });
                       }}
                     >
21 changes: 19 additions & 2 deletions src/lib/version.ts
@@ -1,9 +1,26 @@
/**
* Returns true if `current` is an older semver than `latest`.
* Handles multi-digit segments correctly (e.g., "0.9.0" < "0.10.0").
* Returns true if `current` is an older version than `latest`.
*
* For release versions: standard semver comparison.
* For dev versions: true if SHAs differ (any difference = update available).
* Cross-channel (dev vs release): always false.
*/
export function isVersionOlder(current: string, latest: string): boolean {
const currentIsDev = current.startsWith("dev-");
const latestIsDev = latest.startsWith("dev-");

// Plain "dev" (local build, no SHA) — not trackable
if (current === "dev" || latest === "dev") return false;

// Cross-channel: never suggest updates
if (currentIsDev !== latestIsDev) return false;

// Dev-to-dev: different SHA means update available
if (currentIsDev && latestIsDev) {
return current !== latest;
}

// Release-to-release: semver comparison
const a = current.split(".").map(Number);
const b = latest.split(".").map(Number);
for (let i = 0; i < Math.max(a.length, b.length); i++) {
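
The rules listed in the updated doc comment can be exercised with a self-contained sketch. The checks before the loop follow the hunk above; the loop body is not visible in this hunk, so the per-segment comparison below (missing segments count as 0) is an assumption, not the file's confirmed implementation.

```typescript
function isVersionOlder(current: string, latest: string): boolean {
  const currentIsDev = current.startsWith("dev-");
  const latestIsDev = latest.startsWith("dev-");

  // Plain "dev" (local build, no SHA) is not trackable.
  if (current === "dev" || latest === "dev") return false;

  // Cross-channel: never suggest updates.
  if (currentIsDev !== latestIsDev) return false;

  // Dev-to-dev: any SHA difference counts as an available update.
  if (currentIsDev && latestIsDev) return current !== latest;

  // Release-to-release: numeric comparison, segment by segment.
  const a = current.split(".").map(Number);
  const b = latest.split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    // Assumed loop body: treat a missing segment as 0.
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x < y;
  }
  return false;
}

console.log(isVersionOlder("0.9.0", "0.10.0"));            // true: 9 < 10 numerically
console.log(isVersionOlder("dev-abc1234", "dev-def5678")); // true: SHAs differ
console.log(isVersionOlder("dev-abc1234", "0.10.0"));      // false: cross-channel
console.log(isVersionOlder("dev", "dev-abc1234"));         // false: plain "dev" is untracked
console.log(isVersionOlder("0.10.0", "0.10.0"));           // false: same version
```

Because dev builds are compared by SHA inequality only, a node running a newer dev build than the published one would also be flagged; that follows from the rule as written in the doc comment.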
15 changes: 12 additions & 3 deletions src/server/routers/settings.ts
@@ -5,7 +5,7 @@ import { prisma } from "@/lib/prisma";
 import { encrypt, decrypt } from "@/server/services/crypto";
 import { withAudit } from "@/server/middleware/audit";
 import { invalidateAuthCache } from "@/auth";
-import { checkServerVersion, checkAgentVersion } from "@/server/services/version-check";
+import { checkServerVersion, checkAgentVersion, checkDevAgentVersion } from "@/server/services/version-check";
 import {
   createBackup,
   listBackups,
@@ -277,11 +277,20 @@ export const settingsRouter = router({
   checkVersion: protectedProcedure
     .input(z.object({ force: z.boolean().optional() }).optional())
     .query(async ({ input }) => {
-      const [server, agent] = await Promise.all([
+      const [server, agent, devAgent] = await Promise.all([
         checkServerVersion(input?.force),
         checkAgentVersion(input?.force),
+        checkDevAgentVersion(input?.force),
       ]);
-      return { server, agent };
+      return {
+        server,
+        agent,
+        devAgent: {
+          latestVersion: devAgent.latestVersion,
+          checksums: devAgent.checksums,
+          checkedAt: devAgent.checkedAt,
+        },
+      };
     }),

   // ─── Backup & Restore ─────────────────────────────────────────────────────