feat: track allocations per fleet #4513

Open
nrwiersma wants to merge 10 commits into agones-dev:main from nrwiersma:allocations

Conversation

@nrwiersma
Contributor

What type of PR is this?

/kind feature

What this PR does / Why we need it:

This PR adds allocation tracking at the Fleet level to provide real allocation metrics to the Fleet Autoscaler. This gives custom autoscaling a better metric to build on, while also providing a useful system-level metric.

Which issue(s) this PR fixes:

Closes #4452

Special notes for your reviewer:

Initially I thought to use an RWMutex and atomic.Int64, but the locking logic got really verbose, and I don't think contention on a Mutex is likely to be a real issue, even in active clusters, given how briefly the lock is actually held. If this proves to no longer be true, it is simple to change.
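For reviewers who want a picture of the trade-off, here is a minimal sketch of the plain Mutex approach. The `allocCounter` type, its methods, and the fleet key are hypothetical names for illustration and are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"sync"
)

// allocCounter is a hypothetical stand-in for per-fleet allocation
// bookkeeping; the type and field names are assumptions.
type allocCounter struct {
	mu     sync.Mutex
	counts map[string]int64 // keyed by "namespace/name"
}

func newAllocCounter() *allocCounter {
	return &allocCounter{counts: make(map[string]int64)}
}

// inc records one observed allocation. The lock covers a single map write,
// so the critical section is very short.
func (a *allocCounter) inc(key string) {
	a.mu.Lock()
	a.counts[key]++
	a.mu.Unlock()
}

// get returns the current count for the key.
func (a *allocCounter) get(key string) int64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.counts[key]
}

func main() {
	c := newAllocCounter()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc("default/simple-fleet")
		}()
	}
	wg.Wait()
	fmt.Println(c.get("default/simple-fleet")) // prints 100
}
```

An RWMutex would only help if reads heavily outnumbered writes, which is not obviously the case for a counter that is mostly incremented.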

It also occurred to me that queuing the Fleet when an Allocation is observed is likely not needed, as this will always result in a change in AllocatedReplicas, ReadyReplicas, or Replicas on the GameServerSet, which will itself queue the Fleet. I will run some manual tests to prove this.

@github-actions bot added the kind/feature (New features for Agones) and size/M labels Apr 14, 2026
@nrwiersma force-pushed the allocations branch 2 times, most recently from 5dea128 to 38ae676 April 14, 2026 07:05
@nrwiersma marked this pull request as ready for review April 14, 2026 09:44
@markmandel
Member

/gcbrun

@agones-bot
Collaborator

Build Failed 😭

Build Id: 0200e9ed-b8cb-4f9c-82a3-22a40ca1032e

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Member

/gcbrun

@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: 52edf034-8467-427b-90f9-765fd733d9c6

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4513/head:pr_4513 && git checkout pr_4513
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.58.0-dev-30678d3

Member

@markmandel left a comment

Thanks for doing this -- sorry it took a bit to get to it.

We'll need some docs at https://agones.dev/site/docs/guides/metrics/ -- they will also need to be wrapped in feature shortcodes -- https://agones.dev/site/docs/contribute/documentation-editing-contribution/

Comment thread pkg/metrics/exporter_test.go Outdated
gs.Status.State = agonesv1.GameServerStateAllocated
ctrl.gsWatch.Modify(gs)

require.Eventually(t, func() bool {
Member

If you use https://pkg.go.dev/github.com/stretchr/testify@v1.11.1/require#EventuallyWithT you can check multiple assertions, and also get better error messages.

ReservedReplicas int32 `json:"reservedReplicas"`
// AllocatedReplicas are the number of Allocated GameServer replicas
AllocatedReplicas int32 `json:"allocatedReplicas"`
// Allocations is a counter of the number of allocations observed.
Member

Comment thread pkg/fleets/controller.go
},
})

_, _ = gsInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
Member

You'll want to sync GameServers at the following line, since you are now watching their events:

https://github.com/nrwiersma/agones/blob/30678d321529ca2fd3bdd27f58b8bd2a4b5c8e96/pkg/fleets/controller.go#L255

Contributor Author

I thought about that. I decided not to in the end as it does not stop the controller from working. We do not handle Add on the EventHandler (just Update), so the sync would basically block for no reason. Will gladly add it if you still want though.

Member

Yeah, I'd recommend it - I've seen some weirdness when not syncing -- it'll probably mean nothing much, but also means that if other people edit this later on, or do other controller things they aren't left wondering "when do I sync, or not sync" - better to always sync 👍🏻

Contributor Author

Fair. The global rule for me is, if it is in the processNextItem loop anywhere, it gets synced. But will add it in.

Comment thread pkg/fleets/controller.go Outdated
defer cancel()

c.allocsMu.Lock()
defer c.allocsMu.Unlock()
Member

Nice one on making sure it gets processed on shutdown.

Probably doesn't make a huge difference, but holding the mutex across the network calls may cause some long-lived locking during shutdown. Might be worth taking a snapshot, or at the very least moving to an RWMutex, to avoid some potential contention.
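A minimal sketch of the snapshot idea, assuming a map of pending counts guarded by a mutex: the lock is held only while the map is swapped out, and the slow calls run afterwards without it. The `allocTracker` type and the `send` callback are hypothetical placeholders for the controller and its network call, and error handling is left out to keep the shape visible.

```go
package main

import (
	"fmt"
	"sync"
)

// allocTracker is a hypothetical stand-in for the controller state.
type allocTracker struct {
	mu     sync.Mutex
	allocs map[string]int64 // pending allocations keyed by "namespace/name"
}

// record notes one observed allocation.
func (a *allocTracker) record(key string) {
	a.mu.Lock()
	a.allocs[key]++
	a.mu.Unlock()
}

// snapshot swaps in a fresh map and returns the old one, so the lock is held
// only for the swap and never across slow calls.
func (a *allocTracker) snapshot() map[string]int64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	snap := a.allocs
	a.allocs = make(map[string]int64)
	return snap
}

// flush passes each pending count to send, a placeholder for the network
// call. Error handling is omitted in this sketch.
func (a *allocTracker) flush(send func(key string, delta int64)) {
	for key, delta := range a.snapshot() {
		send(key, delta)
	}
}

func main() {
	tr := &allocTracker{allocs: make(map[string]int64)}
	tr.record("default/fleet-a")
	tr.record("default/fleet-a")
	tr.record("default/fleet-b")

	tr.flush(func(key string, delta int64) {
		// record can still run concurrently here; the mutex is not held.
		fmt.Println(key, delta)
	})
}
```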

Comment thread pkg/fleets/controller.go Outdated
Comment on lines +814 to +819
count, ok := c.allocs[key]
if !ok {
c.allocs[key] = 1
return
}
c.allocs[key] = count + 1
Member

Suggested change
- count, ok := c.allocs[key]
- if !ok {
- c.allocs[key] = 1
- return
- }
- c.allocs[key] = count + 1
+ c.allocs[key]++

Maps return the zero value for a missing key here, so the increment alone is enough. Just a suggestion.
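A small standalone check of that equivalence, using hypothetical helper names (`incVerbose`, `incShort`) rather than the controller's own method:

```go
package main

import "fmt"

// incVerbose mirrors the original form with an explicit existence check.
func incVerbose(m map[string]int64, key string) {
	count, ok := m[key]
	if !ok {
		m[key] = 1
		return
	}
	m[key] = count + 1
}

// incShort relies on a missing key reading as the zero value (0 for int64).
func incShort(m map[string]int64, key string) {
	m[key]++
}

func main() {
	a, b := map[string]int64{}, map[string]int64{}
	for i := 0; i < 3; i++ {
		incVerbose(a, "default/fleet")
		incShort(b, "default/fleet")
	}
	fmt.Println(a["default/fleet"], b["default/fleet"]) // prints 3 3
}
```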

Comment thread pkg/fleets/controller.go Outdated
}
}

fCopy.Status.Allocations += c.getAllocations(fleet.ObjectMeta.Namespace, fCopy.ObjectMeta.Name)
Member

If UpdateStatus fails, the worker queue re-enqueues the fleet, but on retry the counter is already 0 — those allocations are permanently lost.

My suggestion would be to capture the delta, attempt the update, and only zero out the counter on success
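One way this could look, as a sketch only: read the pending delta without clearing it, attempt the update, and subtract the delta once the update has succeeded. The `pendingAllocs` type, `syncAllocations`, and the `updateStatus` callback are hypothetical stand-ins for the controller's counter and its UpdateStatus call, not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errConflict simulates a failed status update in this sketch.
var errConflict = errors.New("conflict")

// pendingAllocs is a hypothetical in-memory counter of allocations that have
// not yet been written to the Fleet status.
type pendingAllocs struct {
	mu     sync.Mutex
	counts map[string]int64 // keyed by "namespace/name"
}

// peek returns the pending count without clearing it.
func (p *pendingAllocs) peek(key string) int64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.counts[key]
}

// commit subtracts a delta that has been persisted. Subtracting, rather than
// zeroing, keeps allocations recorded while the update was in flight.
func (p *pendingAllocs) commit(key string, delta int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.counts[key] -= delta
	if p.counts[key] <= 0 {
		delete(p.counts, key)
	}
}

// syncAllocations adds the pending delta to the stored total via updateStatus
// (a placeholder for the real status update) and commits only on success.
func syncAllocations(p *pendingAllocs, key string, stored int64, updateStatus func(total int64) error) (int64, error) {
	delta := p.peek(key)
	total := stored + delta
	if err := updateStatus(total); err != nil {
		return stored, err // counter untouched, so a retry sees the same delta
	}
	p.commit(key, delta)
	return total, nil
}

func main() {
	key := "default/simple-fleet"
	p := &pendingAllocs{counts: map[string]int64{key: 3}}

	total, err := syncAllocations(p, key, 10, func(int64) error { return errConflict })
	fmt.Println(total, err, p.peek(key)) // prints: 10 conflict 3

	total, err = syncAllocations(p, key, total, func(int64) error { return nil })
	fmt.Println(total, err, p.peek(key)) // prints: 13 <nil> 0
}
```

Subtracting the captured delta instead of resetting to zero is a design choice in this sketch: it avoids dropping allocations that arrive between the read and the successful update.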

Contributor Author

Good catch, had not seen that failure path.

Signed-off-by: Nicholas Wiersma <nick@wiersma.co.za>
@markmandel
Member

/gcbrun

@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: 6178bfd6-30e7-4060-ba43-0e3864f400b8

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4513/head:pr_4513 && git checkout pr_4513
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.58.0-dev-a8be28c

Signed-off-by: Nicholas Wiersma <nick@wiersma.co.za>
@markmandel
Member

/gcbrun

@markmandel
Member

/gcbrun

@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: c09603eb-c5f9-4bac-84cf-41668d315953

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4513/head:pr_4513 && git checkout pr_4513
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.58.0-dev-6a3b9cb

Labels

kind/feature New features for Agones size/L size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Proposal] - Total Allocations Tracking

3 participants