
Wire OPENAI_API_KEY through to deployed containers #165

@philmerrell

Description


Summary

OPENAI_API_KEY is referenced at runtime by both the App API and the Inference API (AgentCore Runtime), but it is not passed through to the deployed containers. Today it only works when a developer sets it in their local backend/src/.env. Deployed environments return a 500 from /admin/models/openai and any OpenAI-provider chat request fails in AgentFactory.

Current State

Consumer call sites (already implemented)

Infrastructure gap

Neither CDK stack (AppApiStack or InferenceApiStack) passes the value through to its deployed container.

Proposed Solution — Follow the existing OAuth/Auth-Provider Secrets pattern

The repo already has a clean pattern for runtime secrets (see OAUTH_CLIENT_SECRETS_ARN and AUTH_PROVIDER_SECRETS_ARN): create a Secrets Manager secret in InfrastructureStack, publish its ARN via SSM, import the ARN in consumer stacks, inject the ARN as an env var, grant secretsmanager:GetSecretValue on the task role, and fetch the plaintext at runtime.

Why ARN-at-runtime rather than raw env var injection:

  • Consistent with existing secrets in this repo.
  • AgentCore CfnRuntime.environmentVariables only accepts plain string values (no ValueFrom/secrets block like ECS has), so we'd need a fetch-at-runtime helper for the Inference API regardless. Using the same approach in both services keeps things symmetric.
  • The plaintext key never lands in CloudFormation templates, task definitions, or logs.

Instructions for the implementing agent

Scope boundaries: Do not change the two consumer call sites to use a different config key name. They already read OPENAI_API_KEY from the environment — keep it that way. Your job is to ensure that env var is populated (from Secrets Manager) inside both deployed containers, plus add a small helper that hydrates os.environ['OPENAI_API_KEY'] at startup from the secret ARN when running in AWS.

1. CDK — InfrastructureStack

In infrastructure/lib/infrastructure-stack.ts, near the existing OAuthClientSecretsSecret (around line 590):

  • Create new secretsmanager.Secret(this, "OpenAiApiKeySecret", { secretName: getResourceName(config, "openai-api-key"), description: "OpenAI API key for OpenAI provider models" }) with removalPolicy: getRemovalPolicy(config).
  • Do not generate the value via generateSecretString — this is a user-supplied key. The secret is created empty and populated out-of-band (documented in step 6).
  • Publish the ARN to SSM at /${config.projectPrefix}/llm/openai-api-key-secret-arn (mirror the OAuthClientSecretsArnParameter block exactly).

2. CDK — AppApiStack

In infrastructure/lib/app-api-stack.ts:

  • Import the ARN via ssm.StringParameter.valueForStringParameter(this, '/${config.projectPrefix}/llm/openai-api-key-secret-arn') near the other SSM imports.
  • Add OPENAI_API_KEY_SECRET_ARN: openAiApiKeySecretArn to the container environment: block (~line 403).
  • Add an IAM policy statement to taskDefinition.taskRole granting secretsmanager:GetSecretValue and secretsmanager:DescribeSecret on ${openAiApiKeySecretArn}* (wildcard to cover the random suffix), following the exact shape of the existing OAuthClientSecretsAccess statement (~line 912).

3. CDK — InferenceApiStack

In infrastructure/lib/inference-api-stack.ts:

  • Import the same SSM ARN alongside the other parameter imports.
  • Add OPENAI_API_KEY_SECRET_ARN: openAiApiKeySecretArn to the CfnRuntime.environmentVariables block (~line 904).
  • Add a secretsmanager:GetSecretValue / secretsmanager:DescribeSecret statement to the runtime execution role (not the task role — AgentCore uses runtimeExecutionRole, see around line 195 where existing SSM permissions are granted).

4. Backend — secret hydration helper

The two call sites read os.environ['OPENAI_API_KEY'] directly. Add a small bootstrap helper that populates it from Secrets Manager before the consumers run:

  • Create backend/src/apis/shared/secrets/openai_key.py (new file) with one function hydrate_openai_api_key() -> None that:
    • Short-circuits if OPENAI_API_KEY is already set (local dev path — .env wins).
    • Reads OPENAI_API_KEY_SECRET_ARN from env; returns silently if empty (OpenAI is optional).
    • Uses boto3.client('secretsmanager').get_secret_value(SecretId=arn) and sets os.environ['OPENAI_API_KEY'] from SecretString (treat as plaintext — the secret stores the raw key, not JSON).
    • Catches ClientError with ResourceNotFoundException / empty SecretString and logs a warning rather than raising — OpenAI is optional and the rest of the system must still boot.
    • Uses a module-level flag so it runs at most once per process.
  • Call it from both service entrypoints (the App API and the Inference API) at startup, before any consumer code runs.
  • Leave the consumer call sites untouched.

5. Tests

  • Add backend/tests/apis/shared/secrets/test_openai_key.py covering: (a) already-set env var is preserved, (b) missing ARN is a no-op, (c) successful fetch populates env, (d) ResourceNotFoundException logs and does not raise, (e) idempotent on second call.
  • Use moto or unittest.mock — match whatever pattern is already used in backend/tests/ (check the existing secrets-related tests first).
  • Do not modify the existing test_agent_factory.py tests around OPENAI_API_KEY — they correctly test the factory's behavior assuming the env var is set.
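As a sketch of the mocking pattern for case (c), a stubbed Secrets Manager client can be injected and asserted against. The `hydrate` function below is an inline stand-in for `hydrate_openai_api_key()` so the example is self-contained; a real test would import the helper from `backend/src/apis/shared/secrets/openai_key.py` and patch `boto3.client` instead:

```python
import os
from unittest import mock


def hydrate(client) -> None:
    """Inline stand-in for the real helper: fetch the secret and export it."""
    if os.environ.get("OPENAI_API_KEY"):
        return
    arn = os.environ.get("OPENAI_API_KEY_SECRET_ARN")
    if not arn:
        return
    resp = client.get_secret_value(SecretId=arn)
    if resp.get("SecretString"):
        os.environ["OPENAI_API_KEY"] = resp["SecretString"]


def test_successful_fetch_populates_env():
    os.environ.pop("OPENAI_API_KEY", None)
    os.environ["OPENAI_API_KEY_SECRET_ARN"] = "arn:aws:secretsmanager:placeholder"
    fake_client = mock.MagicMock()
    fake_client.get_secret_value.return_value = {"SecretString": "sk-test"}
    hydrate(fake_client)
    fake_client.get_secret_value.assert_called_once_with(
        SecretId="arn:aws:secretsmanager:placeholder"
    )
    assert os.environ["OPENAI_API_KEY"] == "sk-test"
```

Case (d) follows the same shape with `fake_client.get_secret_value.side_effect` set to a `ClientError` carrying code `ResourceNotFoundException`, asserting that no exception propagates.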

6. Documentation

  • Update backend/src/.env.example:418 to add a new OPENAI_API_KEY_SECRET_ARN= entry below OPENAI_API_KEY= with a comment explaining: local dev uses OPENAI_API_KEY, deployed environments use OPENAI_API_KEY_SECRET_ARN which CDK populates from SSM.
  • Add a short section to the appropriate deploy doc under .github/docs/deploy/ (check what exists for OAuth client secrets — likely step-02-aws-setup.md or similar) documenting the post-deploy step: aws secretsmanager put-secret-value --secret-id <name> --secret-string <your-openai-key>. Note that until this is populated, OpenAI models will not work but the rest of the system is unaffected.

7. Verification

  • cd infrastructure && npm run build && npx cdk synth must succeed.
  • cd backend && uv run python -m pytest tests/apis/shared/secrets/ tests/agents/main_agent/core/test_agent_factory.py -v must pass.
  • Manually trace the flow: a fresh cdk deploy + put-secret-value + redeploy of the App API service → GET /admin/models/openai returns the OpenAI model list instead of 500.
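For the manual trace, a small smoke check against the endpoint could look like the following. The base URL and bearer-token auth are assumptions about the deployment; adapt as needed:

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the admin models endpoint URL from a service base URL."""
    return f"{base_url.rstrip('/')}/admin/models/openai"


def check_openai_models(base_url: str, token: str) -> list:
    """GET /admin/models/openai; raises HTTPError on a 500, returns the model list."""
    req = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Before the secret is populated this should raise on the 500; after `put-secret-value` and a redeploy it should return the model list.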

Non-Goals

  • Do not add GOOGLE_GEMINI_API_KEY plumbing in this issue — same problem exists but track it separately to keep the diff reviewable.
  • Do not change the EnvVars.OPENAI_API_KEY constant or rename anything at the consumer call sites.
  • Do not make OpenAI a required dependency — the system must still boot when the secret is empty.
