schmikei
Contributor

Updates the Varnish mixin to use more modern libraries and the signals API.

Overview
[two screenshots]

Logs Overview
[screenshot]

I have some questions about why we're using irate and would like to discuss them before merging.

@schmikei requested a review from a team as a code owner on September 15, 2025, 22:02
Comment on lines 133 to 137
signals.backend.backendConnectionsAccepted.asTarget(),
signals.backend.backendConnectionsRecycled.asTarget(),
signals.backend.backendConnectionsReused.asTarget(),
signals.backend.backendConnectionsBusy.asTarget(),
signals.backend.backendConnectionsUnhealthy.asTarget(),
Contributor Author

All these signals previously used the irate() function, and I'm not sure how useful a per-second rate is for connections. I've translated them as originally implemented for now, but it seems like we should display them using increase() instead. Happy to hear other thoughts; I just figured a whole-number connection count here would be more user-friendly and intuitive.

Member

Agreed, and thank you for calling this out! I believe the later best practices around the correct offset for increase actually stem from a discussion we had around a bug with the Varnish integration (or at least feedback related to it), so I'm good with converting these to use increase.

Comment on lines 151 to 153
signals.sessions.sessionsConnected.asTarget(),
signals.sessions.sessionsQueued.asTarget(),
signals.sessions.sessionsDropped.asTarget(),
Contributor Author

Same question about the use of irate here: it could be useful, but I feel the increase value may be more useful.

Member

+1, I agree with swapping them.

Member

@Dasomeone Dasomeone left a comment

A couple of comments. This is definitely a mixin that needs a few more things done to it. I remember it was what spawned the desire to establish the best practices doc, though I think it only had the first pass applied to it back then, so there's a bit more missing :)

sources: {
prometheus: {
expr: 'varnish_main_backend_recycle{%(queriesSelector)s}',
rangeFunction: 'irate',
Member

As you rightly pointed out, we can convert these to increase() using the proper offset, and it should be a little more reliable than irate().
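
For reference, a minimal sketch of what the converted source might look like. irate() reports a per-second rate computed from the last two samples in the range, while increase() reports how much the counter grew over the range, which is closer to a connection count. Only `expr` and the `rangeFunction` field come from the diff above. That the signals library accepts `'increase'` here, and applies the interval and offset itself, is an assumption to verify against the library.

```jsonnet
// Sketch only: a plain Jsonnet object mirroring the signal source shown in the diff.
{
  sources: {
    prometheus: {
      expr: 'varnish_main_backend_recycle{%(queriesSelector)s}',
      // Assumption: 'increase' is an accepted value here, replacing 'irate'.
      rangeFunction: 'increase',
    },
  },
}
```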

Comment on lines 11 to 20
// line of stat panels so width is generally small within the grid
this.grafana.panels.cacheHitRatePanel { gridPos+: { w: 3 } },
this.grafana.panels.frontendRequestsPanel { gridPos+: { w: 3 } },
this.grafana.panels.backendRequestsPanel { gridPos+: { w: 3 } },
this.grafana.panels.sessionsRatePanel { gridPos+: { w: 3 } },
this.grafana.panels.cacheHitsPanel { gridPos+: { w: 3 } },
this.grafana.panels.cacheHitPassPanel { gridPos+: { w: 3 } },
this.grafana.panels.sessionQueueLengthPanel { gridPos+: { w: 3 } },
this.grafana.panels.poolsPanel { gridPos+: { w: 3 } },

Member

Please split this into a separate row, as we deliberately wanted the stat panels to be 4 units tall, while the current approach ends up being 8 units tall.
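
One way to read this request, as a rough sketch: give the stat panels their own list and set the height explicitly alongside the width. The panel objects and the `asStat` helper below are hypothetical stand-ins so the snippet evaluates on its own. How the mixin actually assembles rows is not shown in this diff, so treat the structure as an assumption.

```jsonnet
// Hypothetical stand-ins for this.grafana.panels.*; only the gridPos handling matters here.
local panels = {
  cacheHitRatePanel: { type: 'stat', title: 'Cache hit rate' },
  frontendRequestsPanel: { type: 'stat', title: 'Frontend requests' },
};

// Apply the stat-row size: 3 units wide (from the diff), 4 units tall (as requested).
local asStat(panel) = panel { gridPos+: { w: 3, h: 4 } };

{
  // A list dedicated to the stat panels, kept apart from the taller panels.
  statRow: [
    asStat(panels.cacheHitRatePanel),
    asStat(panels.frontendRequestsPanel),
  ],
}
```

Using `gridPos+:` merges into any existing `gridPos` rather than replacing it, so other keys set on the panel are preserved.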

description='Rate of cache hits for pass objects (fulfilled requests that are not cached).'
)
+ g.panel.stat.standardOptions.withUnit('/ sec')
+ g.panel.stat.panelOptions.withTransparent(true),
Member

Please add this for consistency with the other stat panels

Suggested change
+ g.panel.stat.panelOptions.withTransparent(true),
+ g.panel.stat.options.withGraphMode('none')

+ g.panel.gauge.panelOptions.withTransparent(true),

// Frontend requests stat
frontendRequestsPanel:
Member

For all the stat panels, can you please modify them and the queries to be sums/stats across all instances, so that the stat panels do not split into one per instance? It gets cluttered pretty quickly with multiple instances.

Image

This was a complaint we saw with the old Varnish mixin as well, so it's easy to miss since it wasn't something we fixed.
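
To illustrate the aggregation being asked for, a small sketch that wraps a per-instance query in sum() so the stat panel renders a single value. The `summed` helper and the `job=~"$job"` selector are hypothetical; the metric name is borrowed from the diff earlier in this review purely as an example. The signals library may offer its own aggregation setting, which would be preferable if it exists.

```jsonnet
// Hypothetical helper: wrap a per-instance rate in sum() so a stat panel shows one value.
local summed(metric, selector) =
  'sum(rate(%s{%s}[$__rate_interval]))' % [metric, selector];

{
  // Metric name taken from the diff above; the selector is a placeholder.
  expr: summed('varnish_main_backend_recycle', 'job=~"$job"'),
}
```

Without the outer sum(), Prometheus returns one series per instance, and the stat panel draws one tile for each.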

description='Number of failed, created, limited, and current threads.'
)
+ g.panel.timeSeries.standardOptions.withUnit('none')
+ g.panel.timeSeries.fieldConfig.defaults.custom.withFillOpacity(0)
Member

Reminder here that we want at least withFillOpacity(10), though 20 is perfectly fine as well
