From d42fd09a8e130d9dfbad396d351f3ac808f01ce1 Mon Sep 17 00:00:00 2001
From: John Sell
Date: Tue, 28 Apr 2026 21:14:10 -0400
Subject: [PATCH] spec(control-plane): add Workspace Initialization section for repo cloning

Document the init container pattern for cloning repos specified by
session.RepoURL or session.Repos into /workspace/repos/ before the
runner starts. Covers trigger conditions, normalization, credential
injection for private repos, and current gap status. Also adds
REPOS_JSON to the environment variables table.

Co-Authored-By: Claude Sonnet 4.6
---
 docs/internal/design/control-plane.spec.md | 58 ++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/docs/internal/design/control-plane.spec.md b/docs/internal/design/control-plane.spec.md
index b4ab61058..d72dc8b39 100755
--- a/docs/internal/design/control-plane.spec.md
+++ b/docs/internal/design/control-plane.spec.md
@@ -100,6 +100,63 @@ The CP creates a Pod (not a Job) for each session. Key pod attributes:
 
 Each section is joined with `\n\n`. Empty sections are omitted. If all four are empty, `INITIAL_PROMPT` is not set and the runner waits for a user message via gRPC.
 
+### Workspace Initialization (Repo Cloning)
+
+When a session specifies repositories to clone (`session.RepoURL` or `session.Repos`), the CP adds an **init container** to the runner pod that clones the repositories into `/workspace/repos/` before the runner starts.
+
+#### Trigger
+
+The init container is added when either:
+- `session.RepoURL` is a non-empty string (single repo shorthand)
+- `session.Repos` is a non-empty JSON string (array of `{"url": "...", "branch": "..."}` objects)
+
+If neither is set, no init container is created and `/workspace` starts empty.
+
+#### Init Container Behavior
+
+The CP reuses the existing **state-sync** image (`quay.io/ambient_code/vteam_state_sync`) and its `hydrate.sh` script — the same init container the operator uses. No new images or scripts are needed.
+
+```
+Name:    init-hydrate
+Image:   quay.io/ambient_code/vteam_state_sync (same as operator)
+Command: /usr/local/bin/hydrate.sh
+Env:     REPOS_JSON, SESSION_NAME, NAMESPACE, PROJECT_NAME, BACKEND_API_URL
+Mount:   /workspace (shared with runner container via emptyDir volume)
+```
+
+`hydrate.sh` handles the full workspace initialization lifecycle:
+1. Creates workspace directory structure (`/workspace/repos/`, `/workspace/artifacts/`, etc.)
+2. Restores session state from S3 (if configured — skipped when S3 is not available)
+3. Installs a git credential helper that reads `GITHUB_TOKEN`/`GITLAB_TOKEN` from env
+4. Fetches git credentials from the backend API (if `BACKEND_API_URL` and `BOT_TOKEN` are set)
+5. Parses `REPOS_JSON` and clones each repo into `/workspace/repos/`
+6. Clones workflow repos (if `ACTIVE_WORKFLOW_GIT_URL` is set)
+7. Restores git branch/patch state from S3 backup (if available)
+8. Sets ownership and permissions for the runner user (UID 1001)
+
+#### Repo URL Normalization
+
+- `session.RepoURL` (single string) is converted to `REPOS_JSON`: `[{"url": ""}]`
+- `session.Repos` (JSON string) is passed through as-is to `REPOS_JSON`
+- If both are set, `Repos` takes precedence (it may contain `RepoURL` plus additional repos)
+
+#### Credential Injection for Private Repos
+
+When `CREDENTIAL_IDS` includes a `github` or `gitlab` credential, the init container receives the same credential environment so that `git clone` can authenticate to private repositories. The CP injects:
+
+- `GITHUB_TOKEN` — for `https://github.com/` URLs
+- `GITLAB_TOKEN` — for GitLab URLs
+
+These are fetched from the credential store at pod creation time (same as the runner's credential fetch, but injected into the init container env directly).
+
+Public repos require no credentials and clone over HTTPS without authentication.
+
+#### Status: 🔲 not implemented
+
+The CP currently sets `REPOS_JSON` as an env var on the runner container but does **not** create an init container. The runner creates the target directory (`/workspace/repos/`) but does not clone. Repos are present as empty directories.
+
+The operator (`components/operator`) implements this correctly in `reconcileSpecReposWithPatch` using the `state-sync` image's `hydrate.sh` script. The CP implementation should add the same `init-hydrate` container to the pod spec in `ensurePod` — the image and script already exist and are deployed.
+
 ### Environment Variables Injected into Runner Pod
 
 | Var | Value | Purpose |
@@ -116,6 +173,7 @@ Each section is joined with `\n\n`. Empty sections are omitted. If all four are
 | `USE_VERTEX` / `ANTHROPIC_VERTEX_PROJECT_ID` / `CLOUD_ML_REGION` | CP config | Vertex AI config (when enabled) |
 | `GOOGLE_APPLICATION_CREDENTIALS` | `/app/vertex/ambient-code-key.json` | Vertex service account path |
 | `LLM_MODEL` / `LLM_TEMPERATURE` / `LLM_MAX_TOKENS` | session fields | Per-session model config |
+| `REPOS_JSON` | JSON array of `{"url","branch"}` | Repos to clone into `/workspace/repos/` (set when `session.RepoURL` or `session.Repos` is non-empty) |
 | `CREDENTIAL_IDS` | JSON map `{provider: credential_id}` | Resolved credentials for this session; runner calls `/credentials/{id}/token` per provider |
 
 ---