Skip to content

Conversation

cheeseandcereal
Copy link

This PR adds a new proposed design doc for multi-role functionality within ACK.

Today, CARM provides some multi-role functionality, but has various fundamental limitations. This doc proposes a refactor of the functionality with some additional capabilities and extensibility.

Seeking comments/feedback

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ack-prow ack-prow bot requested review from a-hilaly and jlbutler September 16, 2025 01:46
Copy link

ack-prow bot commented Sep 16, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cheeseandcereal
Once this PR has been reviewed and has the lgtm label, please assign jlbutler for approval by writing /assign @jlbutler in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nabuskey
Copy link

Thanks for the doc! Definitely a needed feature. Seems like a very similar functionality to ProviderConfig and ProviderConfigUsages. I have some questions:

  1. How are orphaned resources handled? For example: When an IAMRoleSelector is updated from role1 in account1 to role2 in account2. are resources created in account1 orphaned? Out of scope for this?
  2. Similarly, what happens when an IAMRoleSelector is deleted while ACK resources depend on it?
  3. Do we want to support a reverse lookup mechanism for cluster administrators? Help them identify which resources depend on a specific IAMRoleSelector before deletion.
  4. How can we prohibit certain IAMRoleSelector from being used in a namespace or a resource type? I assume rule enforcement (OPA / Kyverno) are usually against spec fields, not status fields?
  5. Do we care about fairness for the initial implementation? E.g. a namespace has 10000 resources and another one has 10 resources. They are all referencing the same iamroleselector. You might get throttled at 5000 resources in the first namespace. The resources in the second namespace might experience delayed reconciliation.

@cheeseandcereal
Copy link
Author

Thanks for the questions!

How are orphaned resources handled? For example: When an IAMRoleSelector is updated from role1 in account1 to role2 in account2. are resources created in account1 orphaned? Out of scope for this?

Similarly, what happens when an IAMRoleSelector is deleted while ACK resources depend on it?

Coordination around updating (or deleting) IAMRoleSelectors are interesting. If a cluster administrator wants to restrict access to/from an IAM role, they need to ensure it's not mapped in an IAMRoleSelector. I think for these reasons, it doesn't make sense for us to try to do anything to a resource which may have been 'orphaned' due to its iam role changing. Removing a role shouldn't cause ACK to then do a bunch of actions with that removed role just because it orphaned a resource.

As for protections against resources changing accounts (or even regions), I would think this should be solved with a completely separate higher level mechanism. ACK already tracks the region, account, and arn of the resource it created on the kubernetes resource. If we want protections here, it should be implemented at a higher level. This would be an identical problem if the base role given to the controller changes, or the region annotation of a resource changes, etc. I don't think it makes sense to try to solve it here specifically for only IAMRoleSelector's.

Do we want to support a reverse lookup mechanism for cluster administrators? Help them identify which resources depend on a specific IAMRoleSelector before deletion.

Seems like a potentially useful feature. This gets into potentially conflicting writers on the shared IAMRoleSelector resource. Things like server-side-apply with field owners could be used to help solve this problem, but I think this is probably future feature scope we can better understand after some folks have been using it for a bit.

How can we prohibit certain IAMRoleSelector from being used in a namespace or a resource type? I assume rule enforcement (OPA / Kyverno) are usually against spec fields, not status fields?

The IAMRoleSelectors allow for matchExpressions which allow for a selector that can have exclusions. I.e.

matchExpressions:
    - {key: environment, operator: NotIn, values: [dev]}

But I don't think this initial implementation should propose any sort of separate "protection" mechanism or 'anti-selector' to start.

It would be the responsibility of the cluster administrator configuring the IAMRoleSelectors to either include or exclude the namespaces/resources desired.

Do we care about fairness for the initial implementation? E.g. a namespace has 10000 resources and another one has 10 resources. They are all referencing the same iamroleselector. You might get throttled at 5000 resources in the first namespace. The resources in the second namespace might experience delayed reconciliation.

This seems like a general fairness problem, not anything necessarily specific to these IAMRoleSelectors. Even with just a base role (and no CARM or IAMRoleSelectors), resource reconciliation can be throttled. I would argue that providing QoS/fairness mechanisms should be addressed at a higher level, and not within this feature.

@pires
Copy link

pires commented Sep 18, 2025

The proposal makes sense to me.

I too found myself wondering about removing a IAMRoleSelector that could encapsulate existing ACK resources. IIUC the IAMRoleSelector is (should be?) resolved by each ACK controller only for new resources. For existing resources, as the author clarifies in the comment above, each ACK controller already knows which account was used to sync the resource, so if a relevant IAMRoleSelector is updated/deleted, each ACK controller will only be affected if there are conflicts, eg multiple IAMRoleSelector instances match for a same resource, or if there's no IAMRoleSelector matching.

@cheeseandcereal
Copy link
Author

IIUC the IAMRoleSelector is (should be?) resolved by each ACK controller only for new resources.

I don't think so, actually. This could end up being very confusing because your selector can select things, but not actually influence them simply based on timing of resource creation which would be weird and I think could lead to some unexpected edge cases.

I think we can definitely be more clear about the protection mechanisms in case a resource is trying to change accounts due to an iam role change though. This is the same problem if a customer changes the region annotation for an existing resource too. [or current CARM users if they update CARM config. We just need ACK to generally gracefully handle this regardless. I think @a-hilaly is working on an update for this. I'll include those details once that change is in.

a-hilaly added a commit to a-hilaly/ack-runtime that referenced this pull request Sep 24, 2025
Implements aws-controllers-k8s/community#2628 (mostly)

Introduces a new IAMRoleSelector CRD that enables dynamic IAM role
assignment based on namespace and resource type selectors. This feature
provides an alternative to CARM for role selection and cannot be used
simultaneously with CARM (enforced by validation).

Key components:
- New IAMRoleSelector CRD with namespace and resource type selectors
- Selector matching logic with AND between selector types, OR within arrays
- Dynamic informer-based cache for IAMRoleSelector resources
- Integration into the reconciler to override CARM role selection
- Alpha feature gate (IAMRoleSelector) defaulting to disabled

Note: ResourceTypeSelector uses schema.GroupVersionKind in the API,
which differs from the separate fields approach in the original types.
This may need adjustment based on CRD generation requirements.
a-hilaly added a commit to a-hilaly/ack-runtime that referenced this pull request Sep 24, 2025
Implements aws-controllers-k8s/community#2628 (mostly)

Introduces a new IAMRoleSelector CRD that enables dynamic IAM role
assignment based on namespace and resource type selectors. This feature
provides an alternative to CARM for role selection and cannot be used
simultaneously with CARM (enforced by validation).

Key components:
- New IAMRoleSelector CRD with namespace and resource type selectors
- Selector matching logic with AND between selector types, OR within arrays
- Dynamic informer-based cache for IAMRoleSelector resources
- Integration into the reconciler to override CARM role selection
- Alpha feature gate (IAMRoleSelector) defaulting to disabled

Note: ResourceTypeSelector uses schema.GroupVersionKind in the API,
which differs from the separate fields approach in the original types.
This may need adjustment based on CRD generation requirements.
a-hilaly added a commit to a-hilaly/ack-runtime that referenced this pull request Sep 24, 2025
Implements aws-controllers-k8s/community#2628 (mostly)

Introduces a new IAMRoleSelector CRD that enables dynamic IAM role
assignment based on namespace and resource type selectors. This feature
provides an alternative to CARM for role selection and cannot be used
simultaneously with CARM (enforced by validation).

Key components:
- New IAMRoleSelector CRD with namespace and resource type selectors
- Selector matching logic with AND between selector types, OR within arrays
- Dynamic informer-based cache for IAMRoleSelector resources
- Integration into the reconciler to override CARM role selection
- Alpha feature gate (IAMRoleSelector) defaulting to disabled

Note: ResourceTypeSelector uses schema.GroupVersionKind in the API,
which differs from the separate fields approach in the original types.
This may need adjustment based on CRD generation requirements.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants