
docker-autoscaler: Missing permission for lifecycle hook update #1316

@blackillzone

Description


Describe the bug

When we stop a runner agent deployed with docker-autoscaler by updating the agent configuration, it takes a long time to release and stop the instance (it always reaches the lifecycle hook timeout).

Looking at the logs, the issue seems to be that the instance role doesn't have the autoscaling:DescribeLifecycleHooks permission in its IAM definition.

Aug 11 19:52:40 ip-172-21-252-42 monitor_runner.sh[31313]: An error occurred (AccessDenied) when calling the DescribeLifecycleHooks operation: User: arn:aws:sts::[ACCOUNT_ID]:assumed-role/runners-fleet-test-instance/i-00a91317465c7587a is not authorized to perform: autoscaling:DescribeLifecycleHooks because no identity-based policy allows the autoscaling:DescribeLifecycleHooks action

To Reproduce

Deploy a runner agent in docker-autoscaler mode, then update the agent configuration to trigger a deployment on the ASG. Check the logs in CloudWatch, or connect to the terminating agent instance and check the logs of the monitor-runner service.

Expected behavior

The role associated with the instance deployed by the ASG should have permission to check and update the lifecycle hook status on the ASG.

Additional context

From what I can see, in docker+machine mode these IAM permissions are configured here:

"autoscaling:CompleteLifecycleAction",

But they are missing from the policy for docker-autoscaler mode: https://github.com/cattle-ops/terraform-aws-gitlab-runner/blob/main/policies/instance-docker-autoscaler-policy.json
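A minimal sketch of the kind of statement that could be added to the docker-autoscaler instance policy, assuming these two actions are enough: the log above only confirms autoscaling:DescribeLifecycleHooks, and autoscaling:CompleteLifecycleAction is taken from the docker+machine policy. The Sid is a hypothetical name, and `"Resource": "*"` is an assumption; the real policy file may be templated or scoped differently. JSON allows no comments, so these caveats live here rather than in the block.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowLifecycleHookHandling",
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeLifecycleHooks",
        "autoscaling:CompleteLifecycleAction"
      ],
      "Resource": "*"
    }
  ]
}
```

Describe* actions in EC2 Auto Scaling generally do not support resource-level restrictions, hence the wildcard. CompleteLifecycleAction could instead be moved to its own statement limited to the runner ASG's ARN if tighter scoping is preferred.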

If this seems correct, I can open an MR to fix it, but I would like to test this assumption on my side first to validate the fix.
