From d42fd09a8e130d9dfbad396d351f3ac808f01ce1 Mon Sep 17 00:00:00 2001
From: John Sell <jsell@redhat.com>
Date: Tue, 28 Apr 2026 21:14:10 -0400
Subject: [PATCH] spec(control-plane): add Workspace Initialization section for
 repo cloning

Document the init container pattern for cloning repos specified by
session.RepoURL or session.Repos into /workspace/repos/ before the
runner starts. Covers trigger conditions, normalization, credential
injection for private repos, and current gap status.

Also adds REPOS_JSON to the environment variables table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 docs/internal/design/control-plane.spec.md | 58 ++++++++++++++++++++++
 1 file changed, 58 insertions(+)
diff --git a/docs/internal/design/control-plane.spec.md b/docs/internal/design/control-plane.spec.md
index b4ab61058..d72dc8b39 100755
--- a/docs/internal/design/control-plane.spec.md
+++ b/docs/internal/design/control-plane.spec.md
@@ -100,6 +100,63 @@ The CP creates a Pod (not a Job) for each session. Key pod attributes:
 
 Each section is joined with `\n\n`. Empty sections are omitted. If all four are empty, `INITIAL_PROMPT` is not set and the runner waits for a user message via gRPC.
 
+### Workspace Initialization (Repo Cloning)
+
+When a session specifies repositories to clone (`session.RepoURL` or `session.Repos`), the CP adds an **init container** to the runner pod that clones the repositories into `/workspace/repos/` before the runner starts.
+
+#### Trigger
+
+The init container is added when either:
+- `session.RepoURL` is a non-empty string (single repo shorthand)
+- `session.Repos` is a non-empty JSON string (array of `{"url": "...", "branch": "..."}` objects)
+
+If neither is set, no init container is created and `/workspace` starts empty.
+
+#### Init Container Behavior
+
+The CP reuses the existing **state-sync** image (`quay.io/ambient_code/vteam_state_sync`) and its `hydrate.sh` script — the same init container the operator uses. No new images or scripts are needed.
+
+```
+Name:    init-hydrate
+Image:   quay.io/ambient_code/vteam_state_sync (same as operator)
+Command: /usr/local/bin/hydrate.sh
+Env:     REPOS_JSON, SESSION_NAME, NAMESPACE, PROJECT_NAME, BACKEND_API_URL
+Mount:   /workspace (shared with runner container via emptyDir volume)
+```
+
+`hydrate.sh` handles the full workspace initialization lifecycle:
+1. Creates workspace directory structure (`/workspace/repos/`, `/workspace/artifacts/`, etc.)
+2. Restores session state from S3 (skipped when S3 is not configured or not available)
+3. Installs a git credential helper that reads `GITHUB_TOKEN`/`GITLAB_TOKEN` from env
+4. Fetches git credentials from the backend API (if `BACKEND_API_URL` and `BOT_TOKEN` are set)
+5. Parses `REPOS_JSON` and clones each repo into `/workspace/repos/<repo-name>`
+6. Clones workflow repos (if `ACTIVE_WORKFLOW_GIT_URL` is set)
+7. Restores git branch/patch state from S3 backup (if available)
+8. Sets ownership and permissions for the runner user (UID 1001)
+
+#### Repo URL Normalization
+
+- `session.RepoURL` (single string) is converted to `REPOS_JSON`: `[{"url": "<value>"}]`
+- `session.Repos` (JSON string) is passed through as-is to `REPOS_JSON`
+- If both are set, `Repos` takes precedence and `RepoURL` is ignored (`Repos` may already contain the `RepoURL` entry plus additional repos)
+
+#### Credential Injection for Private Repos
+
+When `CREDENTIAL_IDS` includes a `github` or `gitlab` credential, the init container receives the corresponding token as an environment variable so that `git clone` can authenticate to private repositories. The CP injects:
+
+- `GITHUB_TOKEN` — for `https://github.com/` URLs
+- `GITLAB_TOKEN` — for GitLab URLs
+
+The CP fetches these tokens from the credential store at pod creation time (the same fetch the runner performs, but the result is injected directly into the init container env).
+
+Public repos require no credentials and clone over HTTPS without authentication.
+
+#### Status: 🔲 not implemented
+
+The CP currently sets `REPOS_JSON` as an env var on the runner container but does **not** create an init container. The runner creates each target directory (`/workspace/repos/<name>`) but does not clone into it, so repos appear as empty directories.
+
+The operator (`components/operator`) implements this correctly in `reconcileSpecReposWithPatch` using the `state-sync` image's `hydrate.sh` script. The CP implementation should add the same `init-hydrate` container to the pod spec in `ensurePod` — the image and script already exist and are deployed.
+
 ### Environment Variables Injected into Runner Pod
 
 | Var | Value | Purpose |
@@ -116,6 +173,7 @@ Each section is joined with `\n\n`. Empty sections are omitted. If all four are
 | `USE_VERTEX` / `ANTHROPIC_VERTEX_PROJECT_ID` / `CLOUD_ML_REGION` | CP config | Vertex AI config (when enabled) |
 | `GOOGLE_APPLICATION_CREDENTIALS` | `/app/vertex/ambient-code-key.json` | Vertex service account path |
 | `LLM_MODEL` / `LLM_TEMPERATURE` / `LLM_MAX_TOKENS` | session fields | Per-session model config |
+| `REPOS_JSON` | JSON array of `{"url": "...", "branch": "..."}` objects | Repos to clone into `/workspace/repos/` (set when `session.RepoURL` or `session.Repos` is non-empty) |
 | `CREDENTIAL_IDS` | JSON map `{provider: credential_id}` | Resolved credentials for this session; runner calls `/credentials/{id}/token` per provider |
 
 ---