Web Dashboard #3

Open

marcosgz wants to merge 28 commits into main from marcosgz/web

Conversation


marcosgz (Owner) commented Sep 4, 2025

lepus-web

marcosgz force-pushed the marcosgz/web branch 3 times, most recently from 83f21d2 to 8d3063d on September 25, 2025 17:54
marcosgz force-pushed the marcosgz/web branch 2 times, most recently from ffd1337 to 1beeacd on February 6, 2026 18:51
marcosgz force-pushed the marcosgz/web branch 2 times, most recently from 4ebdbe6 to db68f58 on March 14, 2026 07:07

marcosgz added 21 commits April 19, 2026 09:00
- Add backend abstraction for ProcessRegistry (file/rabbitmq)
- Add RabbitMQ publisher backend with heartbeat messages
- Add web aggregator to subscribe and aggregate process state
- Add Management API client for queue/connection stats
- Update web API to use real data when available
- Add configuration options for backend selection and management API
…BBITMQ_URL

- Update config.ru to call Lepus::Web.start for real data
- Derive Management API credentials from rabbitmq_url when not configured
- Change default management_api_username/password to nil (derive from URL)
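The credential derivation described above can be sketched in plain Ruby. This helper is illustrative, not the gem's actual API: when no management API username/password is configured, it falls back to the credentials embedded in `rabbitmq_url`, then to RabbitMQ's `guest`/`guest` default.

```ruby
require "uri"
require "cgi"

# Hypothetical helper mirroring the described behavior: explicit config
# wins, then URL-embedded credentials, then RabbitMQ's defaults.
def derive_management_credentials(rabbitmq_url, username: nil, password: nil)
  uri = URI.parse(rabbitmq_url)
  {
    username: username || (uri.user && CGI.unescape(uri.user)) || "guest",
    password: password || (uri.password && CGI.unescape(uri.password)) || "guest"
  }
end

derive_management_credentials("amqp://app:s3cret@rabbit.local:5672/%2F")
# => {:username=>"app", :password=>"s3cret"}
```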
Replace demo/fake data with real metrics pipeline: per-consumer stats
tracking (processed/rejected/errored) via atomic counters, metrics
propagation through heartbeat messages, and real RabbitMQ Management API
data for queues/connections/exchanges.

Web module is fully isolated behind require 'lepus/web' - core has zero
web overhead. Stats, handler extensions, worker metrics, and config
attributes are only activated via prepend when web is explicitly loaded.
Zeitwerk ignores web files so they are never eager-loaded.
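The prepend-based opt-in can be shown with a minimal self-contained sketch. `Core` and `WebStats` stand in for the consumer and the web extension; the point is that the core path runs unchanged and the metrics hook only exists once the web module has been loaded and prepended.

```ruby
# Stand-in for the core consumer: zero web overhead on its own.
class Core
  def process_delivery
    :handled
  end
end

# Stand-in for a web-only extension, activated via prepend.
module WebStats
  class << self
    attr_accessor :processed
  end
  self.processed = 0

  def process_delivery
    result = super            # core behavior runs unchanged
    WebStats.processed += 1   # web-only stats tracking
    result
  end
end

Core.prepend(WebStats)
```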

Inject <base href> into index.html from env["SCRIPT_NAME"] and switch
all asset, API, and service worker URLs to relative paths so mounting
Lepus::Web at a sub-path (e.g. /lepus) no longer 404s on assets.
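The injection itself is a small string transform. This sketch (helper name is illustrative) derives the base from `env["SCRIPT_NAME"]` so that relative asset, API, and service worker URLs resolve under the mount point:

```ruby
# Insert a <base href> right after <head>, derived from the Rack
# mount point, so relative URLs resolve under e.g. /lepus/.
def inject_base_href(html, script_name)
  base = script_name.to_s.empty? ? "/" : "#{script_name.chomp("/")}/"
  html.sub("<head>", %(<head><base href="#{base}">))
end

inject_base_href("<html><head></head></html>", "/lepus")
# => "<html><head><base href=\"/lepus/\"></head></html>"
```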

Replace the Rails constraints auth examples in the README, which do not
actually authenticate, with working Rack::Auth::Basic and Devise patterns.
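A working Rack::Auth::Basic pattern along the lines the README now documents might look like this in a host Rails app (credential sources and the mount path are placeholders):

```ruby
# config/routes.rb — wrap the dashboard in HTTP basic auth before mounting.
lepus_web = Rack::Builder.new do
  use Rack::Auth::Basic, "Lepus Dashboard" do |username, password|
    ActiveSupport::SecurityUtils.secure_compare(username, ENV.fetch("LEPUS_WEB_USER")) &
      ActiveSupport::SecurityUtils.secure_compare(password, ENV.fetch("LEPUS_WEB_PASSWORD"))
  end
  run Lepus::Web
end

Rails.application.routes.draw do
  mount lepus_web => "/lepus"
end
```

Unlike a routes `constraints` block, the middleware actually challenges the request with a 401 before it reaches the dashboard.
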

Adds lib/lepus/prometheus (not auto-required) that ships per-delivery
counters and latency, publish counters, worker process RSS gauges, and
optional RabbitMQ queue gauges to a prometheus_exporter server via
PrometheusExporter::Client.default.

The server side is bundled as a TypeCollector in
lib/lepus/prometheus/collector.rb, loaded via:

  prometheus_exporter -a lepus/prometheus/collector

Ships a Grafana dashboard example covering every exposed metric.

Zeitwerk was eager-loading lib/lepus/prometheus.rb during Rails boot,
which hard-required the prometheus_exporter gem and broke deploys that
did not depend on it. Ignore the prometheus tree in the loader so the
integration stays opt-in via an explicit require "lepus/prometheus".
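The loader change is roughly the following (paths are illustrative): ignoring the tree keeps Zeitwerk from autoloading it during Rails boot, so prometheus_exporter is only required when the host app opts in.

```ruby
# lib/lepus.rb — sketch: exclude the prometheus tree from autoloading
# so the integration only activates on require "lepus/prometheus".
loader = Zeitwerk::Loader.for_gem
loader.ignore("#{__dir__}/lepus/prometheus.rb", "#{__dir__}/lepus/prometheus")
loader.setup
```
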

Pin prometheus_exporter to versions compatible with the CI matrix
(2.1.0 on Ruby 2.7, < 2.3 on Ruby 3.0/3.1) and relax the JS MIME spec
so it passes on both Rack 2 (application/javascript) and Rack 3
(text/javascript).
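The relaxed check reduces to accepting either registered JavaScript MIME type, since Rack 2 maps `.js` to application/javascript while Rack 3 uses text/javascript:

```ruby
# Pattern accepting the JS MIME type registered by either Rack major.
JS_MIME = %r{\A(application|text)/javascript}

%w[application/javascript text/javascript].all? { |t| t.match?(JS_MIME) }
# => true
```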

The prior fix only touched the rails-5.2/rails-6.1 gemfiles, so the
root Gemfile still resolved prometheus_exporter 2.3.1 (ruby 3.2+) and
blew up on the ruby-2.7/ruby-3.0 matrix entries. The rails-7.2/8.0
lockfiles were also stale versus the gemspec's new dev deps (rack,
rack-test, prometheus_exporter), which tripped bundler --deployment.

Pin prometheus_exporter to 2.1.0 in the root Gemfile, regenerate
rails-7.2/8.0 lockfiles, and refresh rails-6.1's lockfile to replace
the yanked nokogiri build. Also require "active_support" before
"active_support/notifications" in the prometheus spec so it works on
Rails 7+, which otherwise errors on IsolatedExecutionState.

Host Rails apps commonly set a Content Security Policy without
'unsafe-inline', which silently dropped the inline IIFE that wires up
OfflineManager, ServiceWorkerManager, and triggers loading of app.js
and the controllers. Result: the dashboard HTML and CSS loaded but no
API calls were ever made, so the page sat empty.

Move the bootstrap to web/assets/js/bootstrap.js (and replace the two
inline onclick= handlers with data attributes bound from bootstrap.js)
so everything runs as external scripts under a strict CSP.
…ollers after app.js

- service-worker-manager: drop `await navigator.serviceWorker.ready`. The SW is
  an offline-cache enhancement; if any asset in `cache.addAll` 401s (e.g. behind
  basic auth), the worker never activates and `ready` hangs forever, blocking
  the entire dashboard bootstrap.
- offline-manager: load local scripts serially instead of in parallel.
  Controllers reference `StimulusApp` defined by app.js, so parallel loading
  caused races where controllers executed before app.js and silently failed to
  register, leaving the UI inert (no data, theme toggle no-op).
… loaded

The :file backend writes to /tmp on the local filesystem, which silently
breaks the dashboard when workers and the web app run in separate containers
(the two most common deployment shapes). Requiring lepus/web now flips the
default to :rabbitmq so the dashboard sees the same registry as the workers
without any extra configuration. Users can still opt back into the file
backend explicitly in their initializer.
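Opting back into the file backend would look something like this in a host app initializer (assuming a `Lepus.configure` block and the `process_registry_backend` option named in this PR):

```ruby
# config/initializers/lepus.rb — sketch of restoring the :file default
# after require "lepus/web" has flipped it to :rabbitmq.
require "lepus/web"

Lepus.configure do |config|
  # Only safe when workers and the web app share a filesystem.
  config.process_registry_backend = :file
end
```
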

Also handles the case where Lepus.config was memoized before lepus/web was
required (e.g. loaded from routes.rb after an initializer touched the
config) by retroactively flipping the backend and resetting the lazily-built
ProcessRegistry.

Two bugs were exposed when lepus/web started flipping the default
process_registry_backend to :rabbitmq:

1. Supervisor#boot called ProcessRegistry.start *before* loading the host
   app (config/environment), so the backend was built as FileBackend and
   started; the subsequent `require "lepus/web"` from routes.rb flipped the
   config but the in-flight backend was the wrong class. The retroactive
   reset in lepus/web then left a fresh unstarted RabbitmqBackend behind,
   and the next ProcessRegistry.add crashed with "ProcessRegistry not
   started." Move the start call to after the host app loads so the flip
   happens before the registry is instantiated.

2. RabbitmqBackend#stop only closed the channel and left the dedicated
   Bunny::Session's reader thread running, which kept the forked supervisor
   alive past SIGTERM and timed out the integration tests (and would have
   deferred shutdown in production). Track the session we opened and close
   it too, swallowing errors on each side independently — channel.close can
   hang on broker CHANNEL_ERROR mid-recovery, but session.close still has
   to run so the process can exit.

lepus/web retroactive logic is also simplified: just flip the config flag,
never reset an already-memoized backend (with #1 fixed, the supervisor has
no backend yet when the flip happens; web-side processes build the backend
lazily after the flip).
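The independent error swallowing in fix #2 can be sketched with a small self-contained class (names are illustrative, and the real backend tracks the Bunny session it opened itself):

```ruby
class RegistryShutdown
  def initialize(channel, session)
    @channel = channel
    @session = session
  end

  # Close the channel, then the session, swallowing errors on each side
  # independently: channel.close can raise on a broker CHANNEL_ERROR
  # mid-recovery, but session.close must still run so the forked
  # supervisor can exit on SIGTERM.
  def stop
    begin
      @channel&.close
    rescue StandardError
      nil
    end
    begin
      @session&.close
    rescue StandardError
      nil
    end
  end
end
```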
…ls app

Mounting via `mount Lepus::Web => "/lepus"` in routes.rb does not invoke
`Lepus::Web.start` — Rails just stores a reference to the module and
dispatches to `.call` per request. As a result the aggregator (which
subscribes to the `lepus.heartbeat` fanout and powers `/api/processes`)
was never started, so the dashboard reported zero processes even with
workers happily publishing heartbeats.

Lazily start web services on the first incoming request and memoize the
built Rack app. Only processes that actually dispatch HTTP to the
dashboard pay this cost; the supervisor loads routes.rb during boot but
never calls `.call`, so it is unaffected.
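The lazy-start pattern is roughly the following self-contained sketch, where `start_services` and the inner lambda stand in for the aggregator boot and the real dashboard app:

```ruby
module LazyWeb
  class << self
    attr_reader :starts
  end
  @starts = 0

  def self.start_services
    @starts += 1 # would subscribe to the heartbeat fanout, etc.
  end

  # Rails dispatches here per request; services boot exactly once, on
  # the first call, and the built Rack app is memoized afterwards.
  def self.call(env)
    @app ||= begin
      start_services
      ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
    end
    @app.call(env)
  end
end
```

A process that loads routes.rb but never receives a dashboard request never invokes `.call`, so it pays none of this cost.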

Copies tmp/lepus-web.png into docs/images/ so the gem ships the
screenshot and both GitHub and the published docs site render it.

close_channel was closing the dedicated registry channel and the
Bunny::Session back-to-back. The explicit channel.close triggered a
broker CHANNEL_ERROR ("expected 'channel.open'"), which woke Bunny's
auto-recovery thread; the subsequent @connection.close then blocked
15s waiting for a close-ok that never arrived and timed out the
forked supervisor's SIGTERM path — the integration specs at
supervisor_spec.rb:62 and :74 exceeded their 10s termination window.

Drop the explicit channel close (session.close cascades to its
channels) and close the session with await_response: false so a
half-open connection can't keep the process alive past SIGTERM.
Bunny's Session#close still calls close_all_channels regardless of the
await_response flag, and each channel.close blocks up to 15s waiting for
a broker close-ok continuation. Forked supervisor shutdown sometimes
never gets that reply, so the parent test's 10s SIGTERM budget expires
and the integration specs fail (supervisor_spec.rb:62 and :74 in CI).

Wrap the graceful close in Timeout.timeout(2) and, on timeout, close the
underlying transport socket directly. The bunny reader loop observes the
dead socket and the process can exit promptly.
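The bounded close can be sketched as follows. `close_socket!` is a stand-in for reaching into the transport layer, not a real Bunny API:

```ruby
require "timeout"

# Attempt a graceful close; if the broker's close-ok never arrives
# within the deadline, fall back to killing the raw socket so the
# reader loop observes EOF and the process can exit promptly.
def close_session_with_deadline(session, deadline: 2)
  Timeout.timeout(deadline) { session.close }
rescue Timeout::Error
  session.close_socket!
end
```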

The Processes::Base#kind accessor returns a capitalized class name
("Supervisor", "Worker"), but the dashboard controller compared it
against lowercase literals, so the supervisor/worker counts stayed at
0 and the process tree never rendered.
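The fix amounts to normalizing case before comparing. A minimal sketch (struct and helper names are illustrative):

```ruby
ProcessInfo = Struct.new(:kind)

# Group processes by a lowercased kind so "Supervisor"/"Worker" from
# the registry match the controller's lowercase literals.
def counts_by_kind(processes)
  processes.group_by { |p| p.kind.to_s.downcase }
           .transform_values(&:count)
end

counts_by_kind([ProcessInfo.new("Supervisor"), ProcessInfo.new("Worker")])
# => {"supervisor"=>1, "worker"=>1}
```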