-
Notifications
You must be signed in to change notification settings - Fork 196
Expose job latency metric via ActiveSupport Notifications job middleware #362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -798,6 +798,32 @@ Que.job_middleware.push( | |
| ) | ||
| ``` | ||
|
|
||
| #### Existing Middleware | ||
|
|
||
| Que ships with middleware to expose job metrics using ActiveSupport notifications to subscribe to it you can implelent the following | ||
|
|
||
| ```ruby | ||
| ::ActiveSupport::Notifications.subscribe("que_job.worked") do |message, started, finished, labels| | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just looked this up, and it seems that
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, you are passing monotonic times from the middleware... I guess I don't understand how this works then 😅 |
||
| # do something with notification. | ||
| end | ||
| ``` | ||
|
|
||
| `started` and `finished` are numeric values representing a monotonic clock so can | ||
| be used for timing calculations without concerning ourselves with the system clock. | ||
|
|
||
| `labels` is a hash containing the following keys | ||
|
|
||
| * `job_class` - the class of the job. | ||
| * `queue` - the queue this job was queued into. | ||
| * `priority` - the priority of this job. | ||
| * `latency` - the amount of time this job was waiting in the queue for. | ||
|
|
||
| To use this middleware you will have to initialize it with Que | ||
|
|
||
| ```ruby | ||
| Que.job_middleware.push(Que::ActiveSupport::JobMiddleware) | ||
| ``` | ||
|
|
||
| ### Defining Middleware For SQL statements | ||
|
|
||
| SQL middleware wraps queries that Que executes, or which you might decide to execute via Que.execute(). You can use hook this into NewRelic or a similar service to instrument how long SQL queries take, for example. | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,26 @@ | ||||||
| # frozen_string_literal: true | ||||||
|
|
||||||
| module Que | ||||||
| module ActiveSupport | ||||||
| module JobMiddleware | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Some usage documentation would be in order, I reckon.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Have added to readme do you think its better here?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, nah, I just wanted to start a thread so it'd be able to be marked resolved 😆 Cheers for adding that; looks good |
||||||
| def self.call(job) | ||||||
| labels = { | ||||||
| job_class: job.que_attrs[:job_class], | ||||||
| priority: job.que_attrs[:priority], | ||||||
| queue: job.que_attrs[:queue], | ||||||
| latency: job.que_attrs[:latency], | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would be also useful to have a
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Execution duration can be calculated using start and end currently, we could make this more intuitive by doing that for consumers |
||||||
| } | ||||||
|
|
||||||
| started = Process.clock_gettime(Process::CLOCK_MONOTONIC) | ||||||
| yield | ||||||
| ensure | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd like a test for the exception case. But is publishing a
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good will add one shortly, I think so in terms of the metric were more concerned with how Que is performing rather than if the Jobs were executed correctly. |
||||||
| ::ActiveSupport::Notifications.publish( | ||||||
| "que_job.worked", | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Nitpick: what about prefixing the event with
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||||||
| started, | ||||||
| Process.clock_gettime(Process::CLOCK_MONOTONIC), | ||||||
| labels.merge(error: job.que_error.present?), | ||||||
| ) | ||||||
| end | ||||||
| end | ||||||
| end | ||||||
| end | ||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -63,6 +63,7 @@ class Poller | |
| SELECT | ||
| (j).*, | ||
| l.locked, | ||
| extract(epoch from (now() - (j).run_at)) as latency, | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@hlascelles This is a good point. I think the intention here is to permit ascertaining worker contention - by allowing you to analyse at the delay between when a job is desired to be run at (or re-run at), and when it's actually picked up to be run. In terms of that desire, I think this metric is more useful. But I'm sure there's also merit in #325. |
||
| l.remaining_priorities | ||
| FROM ( | ||
| SELECT j | ||
|
|
@@ -81,6 +82,7 @@ class Poller | |
| SELECT | ||
| (j).*, | ||
| l.locked, | ||
| extract(epoch from (now() - (j).run_at)) as latency, | ||
| l.remaining_priorities | ||
| FROM ( | ||
| SELECT | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| # frozen_string_literal: true | ||
|
|
||
| require 'spec_helper' | ||
|
|
||
| if defined?(::ActiveSupport) | ||
| require 'que/active_support/job_middleware' | ||
|
|
||
| describe Que::ActiveSupport::JobMiddleware do | ||
| let(:job) { Que::Job.new(**labels) } | ||
|
|
||
| let(:labels) do | ||
| { | ||
| job_class: "Foo", | ||
| priority: 100, | ||
| queue: "foo_queue", | ||
| latency: 100, | ||
| } | ||
| end | ||
|
|
||
| it "records metrics when job succeeds" do | ||
| called = false | ||
| subscriber = ::ActiveSupport::Notifications.subscribe("que_job.worked") do |message, started, finished, metric_labels| | ||
| assert_equal "que_job.worked", message | ||
| assert started != nil | ||
| assert finished != nil | ||
| assert_equal labels.merge(error: false), metric_labels | ||
| called = true | ||
| end | ||
|
|
||
| Que::ActiveSupport::JobMiddleware.call(job) { } | ||
|
|
||
| assert_equal true, called | ||
|
|
||
| ::ActiveSupport::Notifications.unsubscribe(subscriber) | ||
| end | ||
|
|
||
| it "records metrics when job fails" do | ||
| called = false | ||
| subscriber = ::ActiveSupport::Notifications.subscribe("que_job.worked") do |message, started, finished, metric_labels| | ||
| assert_equal "que_job.worked", message | ||
| assert started != nil | ||
| assert finished != nil | ||
| assert_equal labels.merge(error: true), metric_labels | ||
| called = true | ||
| end | ||
|
|
||
| Que::ActiveSupport::JobMiddleware.call(job) { job.que_error = "error" } | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I was thinking you'd need to raise within the block here, but you're right - an exception from a job does not bubble up to middleware. Testing job failure is good, but I was meaning that you test that it's in an For realistic data, I think Although.. I've just looked up the term |
||
|
|
||
| assert_equal true, called | ||
| ::ActiveSupport::Notifications.unsubscribe(subscriber) | ||
| end | ||
| end | ||
| end | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*implement ;)