feat: add AWS_USE_IMDS support for ambient IAM credential resolution on Bedrock#2307
feat: add AWS_USE_IMDS support for ambient IAM credential resolution on Bedrock#2307ira-at-work wants to merge 15 commits intoThe-PR-Agent:mainfrom
Conversation
When AWS_USE_IMDS=true, PR-Agent resolves credentials via boto3's standard provider chain (EC2 IMDS, ECS task metadata, EKS IRSA, Lambda runtime) so operators running on AWS compute no longer need to store long-lived static IAM keys. Static keys in [aws] config become an optional fallback used automatically if the ambient credentials fail a Bedrock call. Also resolves AWS_REGION_NAME from the compute environment when not explicitly configured. Docs updated with IAM role workflow example and per-provider setup section in github.md.
- Add versioned bedrock/anthropic.claude-sonnet-4-6-v1:0 (bare versioned form was missing unlike the equivalent Opus 4.6 entry) - Add regional -v1:0 variants for Sonnet 4.6: us, au, eu, jp, apac, global - Add apac variant for Sonnet 4.6 bare name - Add missing bedrock/anthropic.claude-opus-4-5-20251101-v1:0 base entry - Complete Opus 4.5 regional coverage: eu, au, jp, apac - Complete Opus 4.6 regional coverage: eu, au, jp, apac -v1:0 variants
Three issues found in code review: 1. Wrap boto3 credential/region resolution in try/except so a timeout or missing endpoint falls through to static keys instead of crashing init. 2. Add _refresh_imds_credentials() called before each Bedrock chat_completion to avoid serving stale credentials from long-lived processes (EC2 instance role credentials rotate every ~6 hours). 3. Store AWS_SESSION_TOKEN in _aws_static_creds when present (STS-derived static creds) and restore/clear it correctly in _activate_static_aws_fallback, preventing silent auth failures when the fallback credentials require a token.
17 tests covering: - boto3 creds written to os.environ on init - AWS_SESSION_TOKEN set/cleared correctly from IMDS - Region auto-resolved and not overwritten when already set - Graceful handling when boto3 returns no creds or raises an exception - boto3 never called when AWS_USE_IMDS is absent - Static keys stashed for fallback (including session token) - _refresh_imds_credentials called before each Bedrock chat_completion - Fallback to static keys triggered on Bedrock APIError - Fallback not triggered when no static creds are configured - _activate_static_aws_fallback correctly restores/clears session token
|
@naorpeled |
|
/review |
Code Review by Qodo
|
- Replace AKIA-pattern example keys in tests with scanner-safe fakes - Wrap log strings exceeding 120-char line limit - Fix IMDS-miss bug: apply static keys to env when boto3 returns None/raises (previously _aws_imds_mode stayed False and static keys were never exported) - Use get_logger().exception() to preserve stack traces on boto3 failures - Add asyncio.Lock to serialize os.environ mutation + Bedrock call, preventing concurrent asyncio.gather calls from interleaving AWS credentials - Add two tests covering IMDS-miss + static fallback paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Critical: do IMDS fallback retry inside the lock, not via @Retry. Releasing the lock between attempts left a window for concurrent coroutines to overwrite os.environ before the retry ran. - High: stash _aws_static_creds even when IMDS fails so runtime fallback state is consistent regardless of which init path was taken. - Refactor: extract _static_aws_settings() test helper to eliminate 4 copies of the inline static-credential settings setup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review Summary by QodoAdd AWS_USE_IMDS ambient credential support and expand Bedrock model coverage
WalkthroughsDescription• Add AWS_USE_IMDS support for ambient IAM credential resolution on Bedrock - Resolves credentials via boto3's standard provider chain (EC2 IMDS, ECS task metadata, EKS IRSA, Lambda) - Auto-resolves AWS_REGION_NAME from compute environment when not configured - Static keys become optional fallback with automatic retry on Bedrock API failure • Add missing Bedrock model IDs for Sonnet 4.6 and Opus 4.5/4.6 - Add versioned bedrock/anthropic.claude-sonnet-4-6-v1:0 entry - Add regional -v1:0 variants for Sonnet 4.6 (us, au, eu, jp, apac, global) - Complete regional coverage for Opus 4.5 and Opus 4.6 (eu, au, jp, apac) • Harden IMDS credential handling with refresh and fallback logic - Wrap boto3 resolution in try/except to prevent init crashes - Call _refresh_imds_credentials() before each Bedrock call to avoid stale credentials - Correctly handle AWS_SESSION_TOKEN in fallback path for STS-derived credentials • Add comprehensive unit tests for AWS_USE_IMDS credential resolution and fallback behavior • Update documentation with IAM role workflow examples and Bedrock provider setup Diagramflowchart LR
A["AWS_USE_IMDS env var"] -->|"true"| B["boto3 credential resolution"]
B -->|"success"| C["IMDS credentials in env"]
B -->|"failure"| D["Fall back to static keys"]
C -->|"Bedrock call"| E["_refresh_imds_credentials"]
E -->|"success"| F["Invoke Bedrock"]
F -->|"auth failure"| G["_activate_static_aws_fallback"]
G -->|"retry"| H["Invoke Bedrock with static creds"]
A -->|"false/absent"| I["Use static keys only"]
I -->|"Bedrock call"| F
File Changes1. pr_agent/algo/ai_handlers/litellm_ai_handler.py
|
Code Review by Qodo
1. Invalid raise openai.APIError usage
|
…eholders TESTFAKEACCESSKEYID00 / TestFakeSecretKeyNotRealForTestingOnly00 match real AWS key length patterns and can trigger secret-scanning tools. Replace with clearly fake FAKE-KEY / FAKE-SECRET values. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Addressing the qodo bot findings: Issue 1 (credential-shaped test values): Fixed in 9420cdc. Issue 2 (over-120 line length): False positive. The longest line in the new IMDS block is 103 characters (line 82: Issue 3 (static creds not applied on IMDS miss): False positive. Lines 108–119 of Issue 4 ( |
|
Persistent review updated to latest commit 9420cdc |
1. IMDS credential refresh was creating a new boto3.Session() which would re-read the AWS_* env vars we set in __init__, returning the same stale credentials instead of going back to IMDS/task-role. Fix: store the original boto3 credentials object (_aws_boto3_creds) before writing to env; _refresh_imds_credentials now calls get_frozen_credentials() on that stored object, which honours boto3's own RefreshableCredentials TTL logic. 2. When aws.AWS_REGION_NAME was configured in settings but AWS_REGION_NAME was absent from the environment, the IMDS branch skipped auto-resolution but also never exported the configured region, leaving Bedrock calls without a region. Fix: if env region is absent and a settings region is present, export it immediately; only fall back to session.region_name when neither source has a region. Tests added for both bugs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Addressing the second pass of bot findings: ID:3037074690 (stale IMDS refresh): Valid. ID:3037074691 (configured region not exported): Valid. The ID:3037062418 ( |
…l; fail-fast on refresh error - Extract repeated frozen-creds-to-env-var writing into a static helper _write_frozen_creds_to_env(), used by both __init__ and _refresh_imds_credentials. - _refresh_imds_credentials now returns bool (True = success, False = failure) so callers can react without relying on stale env state. - chat_completion call site: if refresh returns False and static creds are available, activate static fallback immediately (before the Bedrock call) rather than making a doomed round-trip and waiting for the APIError handler. Tests added for: refresh returning False on None guard, refresh returning False on exception, and early static fallback activation on refresh failure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Persistent review updated to latest commit d4265cc |
get_logger().exception() captures the active traceback automatically; binding the variable triggers Ruff F841. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Third-pass bot findings: ID:3037085791 (unused ID:3037085793 (async nullcontext): False positive for this project. ID:3037074690, ID:3037074691 (stale IMDS refresh, region not exported): Already addressed in commits |
|
Persistent review updated to latest commit 5ddd51d |
|
@naorpeled fully tested on our runners, and worked nicely with a few Bedrock models I tried. תודה מראש! |
|
/agentic_review |
|
Persistent review updated to latest commit 5ddd51d |
naorpeled
left a comment
There was a problem hiding this comment.
LGTM 🙏
Once my comments are addressed I'll gladly merge this
Sorry for the delayed response and thanks for your patience!
| self.streaming_required_models = STREAMING_REQUIRED_MODELS | ||
|
|
||
| @staticmethod | ||
| def _write_frozen_creds_to_env(frozen) -> None: |
There was a problem hiding this comment.
nit:
let's rename to _write_frozen_aws_creds_to_env 🙏
| elif "AWS_SESSION_TOKEN" in os.environ: | ||
| del os.environ["AWS_SESSION_TOKEN"] | ||
|
|
||
| def _refresh_imds_credentials(self) -> bool: |
There was a problem hiding this comment.
nit:
let's rename to _refresh_aws_imds_credentials 🙏
_write_frozen_creds_to_env → _write_frozen_aws_creds_to_env _refresh_imds_credentials → _refresh_aws_imds_credentials Requested by maintainer review (naorpeled) on PR The-PR-Agent#2307. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Persistent review updated to latest commit 1a21ab2 |
| elif get_settings().get("aws.AWS_ACCESS_KEY_ID"): | ||
| assert get_settings().aws.AWS_SECRET_ACCESS_KEY and get_settings().aws.AWS_REGION_NAME, "AWS credentials are incomplete" | ||
| os.environ["AWS_ACCESS_KEY_ID"] = get_settings().aws.AWS_ACCESS_KEY_ID | ||
| os.environ["AWS_SECRET_ACCESS_KEY"] = get_settings().aws.AWS_SECRET_ACCESS_KEY |
There was a problem hiding this comment.
1. Static token not exported 🐞 Bug ≡ Correctness
When AWS_USE_IMDS is not set, the static-credentials path exports access key/secret/region but never exports aws.AWS_SESSION_TOKEN, so STS-derived static credentials (that require a session token) will fail.
Agent Prompt
### Issue description
In the non-IMDS path (static AWS keys), `AWS_SESSION_TOKEN` from settings is ignored, breaking temporary/STS static credentials.
### Issue Context
IMDS mode already handles session token for both ambient creds and static fallback; only the non-IMDS static branch is missing it.
### Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[119-123]
### Implementation notes
- Read `aws.AWS_SESSION_TOKEN` from settings in this branch and set `os.environ["AWS_SESSION_TOKEN"]` when present.
- If absent, consider clearing `AWS_SESSION_TOKEN` from env to avoid a stale token interfering with static long-lived keys (match `_write_frozen_aws_creds_to_env` / `_activate_static_aws_fallback` behavior).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Static keys configured without AWS_USE_IMDS never set or cleared AWS_SESSION_TOKEN, breaking STS-derived credentials that require a session token. Apply the same set/clear pattern used by _write_frozen_aws_creds_to_env and _activate_static_aws_fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When IMDS resolution fails and static credentials without a session token are activated, a session token previously written by IMDS would remain in the process environment. Add the missing elif clear to match the pattern in _write_frozen_aws_creds_to_env and _activate_static_aws_fallback. Also align the _static_aws_settings test fixture so AWS_SESSION_TOKEN appears on the .aws attribute object when provided, consistent with how the other three credential fields are exposed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Persistent review updated to latest commit f42f1d6 |
|
/agentic_review |
|
Persistent review updated to latest commit f42f1d6 |
…ed var and test exceptions - Replace bare `except Exception` with `except botocore.exceptions.BotoCoreError` in both the __init__ credential resolution block and _refresh_aws_imds_credentials, so unexpected programming errors surface instead of being swallowed - Update matching test side_effect values from plain Exception to CredentialRetrievalError so they exercise the now-narrowed catch clause - Fix _frozen_creds helper signature to hanging-indent style - Remove unused `mock_boto3` alias (ruff F841) Resolves review comments from The-PR-Agent#2307 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or tests botocore.exceptions.ClientError is not a BotoCoreError subclass (by design: it represents a successful HTTP roundtrip with a service-level error), so it escaped both IMDS catch sites. This matters for AssumeRole/IRSA credential providers which call STS and can raise AccessDenied or ExpiredTokenException. - Extend both catches to (BotoCoreError, ClientError) with explanatory comment - Add test_imds_sts_client_error_does_not_crash for __init__ path - Add test_refresh_returns_false_on_sts_client_error for refresh path Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Persistent review updated to latest commit 0acbbc7 |
Summary
AWS_USE_IMDS=trueis set in the environment, PR-Agent resolves AWS credentials via boto3's standard provider chain instead of requiring static IAM keys. This covers all AWS compute contexts (EC2 instance roles, ECS/Fargate task roles, EKS with IRSA, Lambda runtime credentials) transparently — no code-path changes needed per compute type.AWS_REGION_NAMEis auto-resolved from the compute environment when not explicitly configured.[aws]config become an optional fallback: if the ambient credentials fail a Bedrock call, PR-Agent retries with static keys and logs a warning.-v1:0versioned and regional variants) and completed regional coverage for Opus 4.5 and Opus 4.6.Motivation
Users running PR-Agent on AWS compute (self-hosted GitHub Actions runners on EC2/ECS/EKS) must currently create long-lived static IAM keys and store them as GitHub Secrets. This is unnecessary secret management overhead and a credential-leakage risk. The instance/task/pod already has an IAM role — this change lets PR-Agent use it directly.
Changes
pr_agent/algo/ai_handlers/litellm_ai_handler.py__init__: whenAWS_USE_IMDSis truthy, resolve credentials viaboto3.Session().get_credentials()(wrapped in try/except so IMDS timeouts fall through gracefully). Static keys in config are stashed as_aws_static_creds(includingAWS_SESSION_TOKENfor STS-derived static creds)._refresh_imds_credentials(): called before each Bedrockchat_completionto avoid stale credentials in long-lived processes (EC2 role tokens rotate every ~6 hours)._activate_static_aws_fallback(): swaps env vars to static credentials (with correct session token handling) on first Bedrock API failure; tenacity retries the call with the new creds.pr_agent/algo/__init__.pybedrock/anthropic.claude-sonnet-4-6-v1:0(versioned form was missing)-v1:0variants for Sonnet 4.6:us,au,eu,jp,apac,globalbedrock/anthropic.claude-opus-4-5-20251101-v1:0base entryeu,au,jp,apac)Docs
docs/docs/usage-guide/changing_a_model.md: new "Using IAM Role Credentials (Recommended on AWS Compute)" subsection with compute context table, IAM policy example, and minimal GitHub Actions workflow snippet.docs/docs/installation/github.md: new "Using Amazon Bedrock" provider subsection under "Switching Models" with both static-key and IMDS workflow examples.pr_agent/settings/.secrets_template.toml: comments clarifying when static keys are optional; addedAWS_SESSION_TOKENfield.Test plan
AWS_USE_IMDS=truewith no static keys on an EC2 instance with a Bedrock-enabled IAM role → review runs without secretsAWS_USE_IMDS=trueon a machine with no IMDS (e.g. GitHub-hosted runner, no AWS creds) → boto3 exception caught, falls through to static keys with an error logAWS_USE_IMDS=true+ valid static keys, with IMDS creds that lackbedrock:InvokeModel→ first call fails, warning logged, second call uses static keys and succeedsAWS_USE_IMDSwith static keys configured → behavior identical to before this PRAWS_USE_IMDSwith no keys → behavior identical to before this PR (no boto3 import)Notes
AWS_USE_IMDSis env-var-only (not aconfiguration.tomlkey) to keep it ergonomic in GitHub Actionsenv:blocks.os.environmutation pattern for credentials is preserved for consistency with the rest of the file; passing creds via per-call litellm kwargs would be a cleaner future improvement.🤖 Generated with Claude Code