
Wire OPENAI_API_KEY through to deployed containers #165

@philmerrell

Description


Summary

OPENAI_API_KEY is referenced at runtime by both the App API and the Inference API (AgentCore Runtime), but it is not passed through to the deployed containers. Today it only works when a developer sets it in their local backend/src/.env. Deployed environments return a 500 from /admin/models/openai and any OpenAI-provider chat request fails in AgentFactory.

Current State

Consumer call sites (already implemented)

Infrastructure gap

Neither CDK stack (AppApiStack or InferenceApiStack) passes the value through to its deployed container.

Proposed Solution — Follow the existing OAuth/Auth-Provider Secrets pattern

The repo already has a clean pattern for runtime secrets (see OAUTH_CLIENT_SECRETS_ARN and AUTH_PROVIDER_SECRETS_ARN): create a Secrets Manager secret in InfrastructureStack, publish its ARN via SSM, import the ARN in consumer stacks, inject the ARN as an env var, grant secretsmanager:GetSecretValue on the task role, and fetch the plaintext at runtime.

Why ARN-at-runtime rather than raw env var injection:

  • Consistent with existing secrets in this repo.
  • AgentCore CfnRuntime.environmentVariables only accepts plain string values (no ValueFrom/secrets block like ECS has), so we'd need a fetch-at-runtime helper for the Inference API regardless. Using the same approach in both services keeps things symmetric.
  • The plaintext key never lands in CloudFormation templates, task definitions, or logs.

Instructions for the implementing agent

Scope boundaries: Do not change the two consumer call sites to use a different config key name. They already read OPENAI_API_KEY from the environment — keep it that way. Your job is to ensure that env var is populated (from Secrets Manager) inside both deployed containers, plus add a small helper that hydrates os.environ['OPENAI_API_KEY'] at startup from the secret ARN when running in AWS.

1. CDK — InfrastructureStack

In infrastructure/lib/infrastructure-stack.ts, near the existing OAuthClientSecretsSecret (around line 590):

  • Create new secretsmanager.Secret(this, "OpenAiApiKeySecret", { secretName: getResourceName(config, "openai-api-key"), description: "OpenAI API key for OpenAI provider models" }) with removalPolicy: getRemovalPolicy(config).
  • Do not generate the value via generateSecretString — this is a user-supplied key. The secret is created empty and populated out-of-band (documented in step 6).
  • Publish the ARN to SSM at /${config.projectPrefix}/llm/openai-api-key-secret-arn (mirror the OAuthClientSecretsArnParameter block exactly).

2. CDK — AppApiStack

In infrastructure/lib/app-api-stack.ts:

  • Import the ARN via ssm.StringParameter.valueForStringParameter(this, '/${config.projectPrefix}/llm/openai-api-key-secret-arn') near the other SSM imports.
  • Add OPENAI_API_KEY_SECRET_ARN: openAiApiKeySecretArn to the container environment: block (~line 403).
  • Add an IAM policy statement to taskDefinition.taskRole granting secretsmanager:GetSecretValue and secretsmanager:DescribeSecret on ${openAiApiKeySecretArn}* (wildcard to cover the random suffix), following the exact shape of the existing OAuthClientSecretsAccess statement (~line 912).

3. CDK — InferenceApiStack

In infrastructure/lib/inference-api-stack.ts:

  • Import the same SSM ARN alongside the other parameter imports.
  • Add OPENAI_API_KEY_SECRET_ARN: openAiApiKeySecretArn to the CfnRuntime.environmentVariables block (~line 904).
  • Add a secretsmanager:GetSecretValue / secretsmanager:DescribeSecret statement to the runtime execution role (not the task role — AgentCore uses runtimeExecutionRole, see around line 195 where existing SSM permissions are granted).

4. Backend — secret hydration helper

The two call sites read os.environ['OPENAI_API_KEY'] directly. Add a small bootstrap helper that populates it from Secrets Manager before the consumers run:

  • Create backend/src/apis/shared/secrets/openai_key.py (new file) with one function hydrate_openai_api_key() -> None that:
    • Short-circuits if OPENAI_API_KEY is already set (local dev path — .env wins).
    • Reads OPENAI_API_KEY_SECRET_ARN from env; returns silently if empty (OpenAI is optional).
    • Uses boto3.client('secretsmanager').get_secret_value(SecretId=arn) and sets os.environ['OPENAI_API_KEY'] from SecretString (treat as plaintext — the secret stores the raw key, not JSON).
    • Catches ClientError with ResourceNotFoundException / empty SecretString and logs a warning rather than raising — OpenAI is optional and the rest of the system must still boot.
    • Uses a module-level flag so it runs at most once per process.
  • Call it from both service entrypoints (the App API and the Inference API) at startup, before any consumer code runs.
  • Leave the consumer call sites untouched.

5. Tests

  • Add backend/tests/apis/shared/secrets/test_openai_key.py covering: (a) already-set env var is preserved, (b) missing ARN is a no-op, (c) successful fetch populates env, (d) ResourceNotFoundException logs and does not raise, (e) idempotent on second call.
  • Use moto or unittest.mock — match whatever pattern is already used in backend/tests/ (check the existing secrets-related tests first).
  • Do not modify the existing test_agent_factory.py tests around OPENAI_API_KEY — they correctly test the factory's behavior assuming the env var is set.
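As a sketch of the mocking pattern for case (c), a stubbed Secrets Manager client can be injected and asserted against. The `hydrate` function below is an inline stand-in for `hydrate_openai_api_key()` so the example is self-contained; a real test would import the helper from `backend/src/apis/shared/secrets/openai_key.py` and patch `boto3.client` instead:

```python
import os
from unittest import mock


def hydrate(client) -> None:
    """Inline stand-in for the real helper: fetch the secret and export it."""
    if os.environ.get("OPENAI_API_KEY"):
        return
    arn = os.environ.get("OPENAI_API_KEY_SECRET_ARN")
    if not arn:
        return
    resp = client.get_secret_value(SecretId=arn)
    if resp.get("SecretString"):
        os.environ["OPENAI_API_KEY"] = resp["SecretString"]


def test_successful_fetch_populates_env():
    os.environ.pop("OPENAI_API_KEY", None)
    os.environ["OPENAI_API_KEY_SECRET_ARN"] = "arn:aws:secretsmanager:placeholder"
    fake_client = mock.MagicMock()
    fake_client.get_secret_value.return_value = {"SecretString": "sk-test"}
    hydrate(fake_client)
    fake_client.get_secret_value.assert_called_once_with(
        SecretId="arn:aws:secretsmanager:placeholder"
    )
    assert os.environ["OPENAI_API_KEY"] == "sk-test"
```

Case (d) follows the same shape with `fake_client.get_secret_value.side_effect` set to a `ClientError` carrying code `ResourceNotFoundException`, asserting that no exception propagates.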

6. Documentation

  • Update backend/src/.env.example:418 to add a new OPENAI_API_KEY_SECRET_ARN= entry below OPENAI_API_KEY= with a comment explaining: local dev uses OPENAI_API_KEY, deployed environments use OPENAI_API_KEY_SECRET_ARN which CDK populates from SSM.
  • Add a short section to the appropriate deploy doc under .github/docs/deploy/ (check what exists for OAuth client secrets — likely step-02-aws-setup.md or similar) documenting the post-deploy step: aws secretsmanager put-secret-value --secret-id <name> --secret-string <your-openai-key>. Note that until this is populated, OpenAI models will not work but the rest of the system is unaffected.

7. Verification

  • cd infrastructure && npm run build && npx cdk synth must succeed.
  • cd backend && uv run python -m pytest tests/apis/shared/secrets/ tests/agents/main_agent/core/test_agent_factory.py -v must pass.
  • Manually trace the flow: a fresh cdk deploy + put-secret-value + redeploy of the App API service → GET /admin/models/openai returns the OpenAI model list instead of 500.
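For the manual trace, a small smoke check against the endpoint could look like the following. The base URL and bearer-token auth are assumptions about the deployment; adapt as needed:

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the admin models endpoint URL from a service base URL."""
    return f"{base_url.rstrip('/')}/admin/models/openai"


def check_openai_models(base_url: str, token: str) -> list:
    """GET /admin/models/openai; raises HTTPError on a 500, returns the model list."""
    req = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Before the secret is populated this should raise on the 500; after `put-secret-value` and a redeploy it should return the model list.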

Non-Goals

  • Do not add GOOGLE_GEMINI_API_KEY plumbing in this issue — same problem exists but track it separately to keep the diff reviewable.
  • Do not change the EnvVars.OPENAI_API_KEY constant or rename anything at the consumer call sites.
  • Do not make OpenAI a required dependency — the system must still boot when the secret is empty.
