- Add backend abstraction for ProcessRegistry (file/rabbitmq)
- Add RabbitMQ publisher backend with heartbeat messages
- Add web aggregator to subscribe and aggregate process state
- Add Management API client for queue/connection stats
- Update web API to use real data when available
- Add configuration options for backend selection and management API
…BBITMQ_URL
- Update config.ru to call Lepus::Web.start for real data
- Derive Management API credentials from rabbitmq_url when not configured
- Change default management_api_username/password to nil (derive from URL)
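Taken together, the configuration surface described in these two commits might look like the following in a host app's initializer. This is a sketch: the attribute names (`process_registry_backend`, `management_api_url`, `management_api_username`, `management_api_password`) are inferred from the commit text, not verified against the gem's API.

```ruby
# config/initializers/lepus.rb (sketch; attribute names are assumptions)
Lepus.configure do |config|
  # Backend for the process registry: :file or :rabbitmq.
  config.process_registry_backend = :rabbitmq

  # Management API endpoint for queue/connection stats. Username and
  # password default to nil and are derived from rabbitmq_url when
  # not set explicitly.
  config.management_api_url = "http://localhost:15672"
  config.management_api_username = nil
  config.management_api_password = nil
end
```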
Replace demo/fake data with a real metrics pipeline: per-consumer stats tracking (processed/rejected/errored) via atomic counters, metrics propagation through heartbeat messages, and real RabbitMQ Management API data for queues/connections/exchanges. The web module is fully isolated behind `require "lepus/web"`; the core has zero web overhead. Stats, handler extensions, worker metrics, and config attributes are only activated via prepend when the web module is explicitly loaded. Zeitwerk ignores the web files so they are never eager-loaded.
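The per-consumer counter idea can be sketched as below. This uses a Mutex-guarded hash rather than Lepus's actual internals; the class name and methods are hypothetical stand-ins.

```ruby
# Hypothetical sketch of per-consumer stat counters propagated via
# heartbeats; the real gem's storage and class names may differ.
class ConsumerStats
  KEYS = %i[processed rejected errored].freeze

  def initialize
    @mutex = Mutex.new
    @counters = Hash.new(0)
  end

  # Atomically bump one counter, e.g. after a delivery is handled.
  def increment(key)
    raise ArgumentError, "unknown stat #{key}" unless KEYS.include?(key)
    @mutex.synchronize { @counters[key] += 1 }
  end

  # Snapshot suitable for embedding in a heartbeat message.
  def to_h
    @mutex.synchronize { KEYS.to_h { |k| [k, @counters[k]] } }
  end
end

stats = ConsumerStats.new
4.times { stats.increment(:processed) }
stats.increment(:errored)
stats.to_h # => processed: 4, rejected: 0, errored: 1
```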
Inject <base href> into index.html from env["SCRIPT_NAME"] and switch all asset, API, and service worker URLs to relative paths so mounting Lepus::Web at a sub-path (e.g. /lepus) no longer 404s on assets. Replace the Rails constraints auth examples in the README, which do not actually authenticate, with working Rack::Auth::Basic and Devise patterns.
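The `<base href>` injection can be sketched as a string rewrite of the served index.html, deriving the base from the Rack `SCRIPT_NAME`. The helper name is hypothetical; Lepus's actual implementation may differ.

```ruby
# Hypothetical helper: inject a <base href> derived from the mount
# point so relative asset/API URLs resolve under a sub-path.
def inject_base_href(html, script_name)
  # Mounted at "/lepus" => base href "/lepus/"; mounted at root => "/".
  base = script_name.to_s.empty? ? "/" : "#{script_name.chomp("/")}/"
  html.sub("<head>", %(<head><base href="#{base}">))
end

html = "<html><head><title>Lepus</title></head></html>"
inject_base_href(html, "/lepus")
# => '<html><head><base href="/lepus/"><title>Lepus</title></head></html>'
```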
Adds lib/lepus/prometheus (not auto-required) that ships per-delivery counters and latency, publish counters, worker process RSS gauges, and optional RabbitMQ queue gauges to a prometheus_exporter server via PrometheusExporter::Client.default. The server side is bundled as a TypeCollector in lib/lepus/prometheus/collector.rb, loaded via:

    prometheus_exporter -a lepus/prometheus/collector

Ships a Grafana dashboard example covering every exposed metric.
Zeitwerk was eager-loading lib/lepus/prometheus.rb during Rails boot, which hard-required the prometheus_exporter gem and broke deploys that did not depend on it. Ignore the prometheus tree in the loader so the integration stays opt-in via an explicit require "lepus/prometheus". Pin prometheus_exporter to versions compatible with the CI matrix (2.1.0 on Ruby 2.7, < 2.3 on Ruby 3.0/3.1) and relax the JS MIME spec so it passes on both Rack 2 (application/javascript) and Rack 3 (text/javascript).
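The opt-in arrangement relies on Zeitwerk's `ignore`. A sketch of what the gem's loader setup might look like after this change (the exact paths are assumptions based on the commit text):

```ruby
# lib/lepus.rb (sketch): keep optional integrations out of the
# autoload/eager-load paths so they only load via explicit require.
require "zeitwerk"

loader = Zeitwerk::Loader.for_gem
loader.ignore("#{__dir__}/lepus/prometheus.rb")
loader.ignore("#{__dir__}/lepus/prometheus")
loader.ignore("#{__dir__}/lepus/web.rb")
loader.ignore("#{__dir__}/lepus/web")
loader.setup
```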
The prior fix only touched the rails-5.2/rails-6.1 gemfiles, so the root Gemfile still resolved prometheus_exporter 2.3.1 (ruby 3.2+) and blew up on the ruby-2.7/ruby-3.0 matrix entries. The rails-7.2/8.0 lockfiles were also stale versus the gemspec's new dev deps (rack, rack-test, prometheus_exporter), which tripped bundler --deployment. Pin prometheus_exporter to 2.1.0 in the root Gemfile, regenerate rails-7.2/8.0 lockfiles, and refresh rails-6.1's lockfile to replace the yanked nokogiri build. Also require "active_support" before "active_support/notifications" in the prometheus spec so it works on Rails 7+, which otherwise errors on IsolatedExecutionState.
Host Rails apps commonly set a Content Security Policy without 'unsafe-inline', which silently dropped the inline IIFE that wires up OfflineManager, ServiceWorkerManager, and triggers loading of app.js and the controllers. Result: the dashboard HTML and CSS loaded but no API calls were ever made, so the page sat empty. Move the bootstrap to web/assets/js/bootstrap.js (and replace the two inline onclick= handlers with data attributes bound from bootstrap.js) so everything runs as external scripts under a strict CSP.
…ollers after app.js
- service-worker-manager: drop `await navigator.serviceWorker.ready`. The SW is an offline-cache enhancement; if any asset in `cache.addAll` 401s (e.g. behind basic auth), the worker never activates and `ready` hangs forever, blocking the entire dashboard bootstrap.
- offline-manager: load local scripts serially instead of in parallel. Controllers reference `StimulusApp` defined by app.js, so parallel loading caused races where controllers executed before app.js and silently failed to register, leaving the UI inert (no data, theme toggle no-op).
… loaded

The :file backend writes to /tmp on the local filesystem, which silently breaks the dashboard when workers and the web app run in separate containers (the two most common deployment shapes). Requiring lepus/web now flips the default to :rabbitmq so the dashboard sees the same registry as the workers without any extra configuration. Users can still opt back into the file backend explicitly in their initializer.

Also handles the case where Lepus.config was memoized before lepus/web was required (e.g. loaded from routes.rb after an initializer touched the config) by retroactively flipping the backend and resetting the lazily-built ProcessRegistry.
Two bugs were exposed when lepus/web started flipping the default process_registry_backend to :rabbitmq:

1. Supervisor#boot called ProcessRegistry.start *before* loading the host app (config/environment), so the backend was built as FileBackend and started; the subsequent `require "lepus/web"` from routes.rb flipped the config, but the in-flight backend was the wrong class. The retroactive reset in lepus/web then left a fresh, unstarted RabbitmqBackend behind, and the next ProcessRegistry.add crashed with "ProcessRegistry not started." Move the start call to after the host app loads so the flip happens before the registry is instantiated.

2. RabbitmqBackend#stop only closed the channel and left the dedicated Bunny::Session's reader thread running, which kept the forked supervisor alive past SIGTERM and timed out the integration tests (and would have deferred shutdown in production). Track the session we opened and close it too, swallowing errors on each side independently: channel.close can hang on a broker CHANNEL_ERROR mid-recovery, but session.close still has to run so the process can exit.

The lepus/web retroactive logic is also simplified: just flip the config flag, never reset an already-memoized backend (with #1 fixed, the supervisor has no backend yet when the flip happens; web-side processes build the backend lazily after the flip).
…ls app

Mounting via `mount Lepus::Web => "/lepus"` in routes.rb does not invoke `Lepus::Web.start`; Rails just stores a reference to the module and dispatches to `.call` per request. As a result the aggregator (which subscribes to the `lepus.heartbeat` fanout and powers `/api/processes`) was never started, so the dashboard reported zero processes even with workers happily publishing heartbeats.

Lazily start web services on the first incoming request and memoize the built Rack app. Only processes that actually dispatch HTTP to the dashboard pay this cost; the supervisor loads routes.rb during boot but never calls `.call`, so it is unaffected.
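The lazy-start-and-memoize pattern can be sketched as below. The `start` and `build_rack_app` bodies are stand-ins, not Lepus's actual code; only the shape (nothing runs at boot, everything runs once on first `.call`) mirrors the commit.

```ruby
# Sketch: a mountable Rack endpoint that defers service startup and
# app construction until the first HTTP request actually arrives.
module WebEndpoint
  MUTEX = Mutex.new

  def self.start
    @started = true # stand-in for starting the heartbeat aggregator
  end

  def self.build_rack_app
    ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
  end

  # Rails' `mount` only ever calls this; nothing happens at boot,
  # so the supervisor can load routes.rb without paying the cost.
  def self.call(env)
    app = MUTEX.synchronize do
      @app ||= begin
        start          # subscribe to heartbeats etc. -- runs once
        build_rack_app # memoized for all subsequent requests
      end
    end
    app.call(env)
  end
end
```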
Copies tmp/lepus-web.png into docs/images/ so the gem ships the screenshot and both GitHub and the published docs site render it.
close_channel was closing the dedicated registry channel and the
Bunny::Session back-to-back. The explicit channel.close triggered a
broker CHANNEL_ERROR ("expected 'channel.open'"), which woke Bunny's
auto-recovery thread; the subsequent @connection.close then blocked
15s waiting for a close-ok that never arrived and timed out the
forked supervisor's SIGTERM path — the integration specs at
supervisor_spec.rb:62 and :74 exceeded their 10s termination window.
Drop the explicit channel close (session.close cascades to its
channels) and close the session with await_response: false so a
half-open connection can't keep the process alive past SIGTERM.
Bunny's Session#close still calls close_all_channels regardless of the await_response flag, and each channel.close blocks up to 15s waiting for a broker close-ok continuation. Forked supervisor shutdown sometimes never gets that reply, so the parent test's 10s SIGTERM budget expires and the integration specs fail (supervisor_spec.rb:62 and :74 in CI). Wrap the graceful close in Timeout.timeout(2) and, on timeout, close the underlying transport socket directly. The bunny reader loop observes the dead socket and the process can exit promptly.
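The bounded-close pattern can be sketched with stand-in objects; the real code operates on Bunny's session and transport, but the helper and the fake session below are hypothetical.

```ruby
require "timeout"

# Sketch: attempt a graceful close, and if the broker never answers
# within the deadline, sever the transport socket directly so the
# reader loop observes a dead socket and the process can exit.
def close_with_deadline(session, deadline: 2)
  Timeout.timeout(deadline) { session.close }
rescue StandardError
  # Graceful close hung or failed: kill the socket.
  session.transport_socket.close rescue nil
end
```

Closing the raw socket is deliberately crude: by this point the process is shutting down anyway, and a prompt exit matters more than a clean AMQP handshake.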
The Processes::Base#kind accessor returns a capitalized class name
("Supervisor", "Worker"), but the dashboard controller compared it
against lowercase literals, so the supervisor/worker counts stayed at
0 and the process tree never rendered.
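The bug boils down to a case-sensitive string comparison, and the fix is to normalize before bucketing. Illustrated here in Ruby, though the actual comparison lives in the dashboard controller; the helper name and data shape are hypothetical.

```ruby
# Processes::Base#kind returns capitalized class names ("Supervisor",
# "Worker"), so any comparison must normalize case before counting.
def count_by_kind(processes)
  counts = { supervisors: 0, workers: 0 }
  processes.each do |process|
    case process[:kind].to_s.downcase
    when "supervisor" then counts[:supervisors] += 1
    when "worker"     then counts[:workers] += 1
    end
  end
  counts
end

count_by_kind([{ kind: "Supervisor" }, { kind: "Worker" }, { kind: "Worker" }])
# => supervisors: 1, workers: 2
```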