
feat(EM-41): Upgrade Micrometer/Prometheus Metrics and Add Service-Level Dashboards #164

Open
devin-ai-integration[bot] wants to merge 4 commits into feat/microservices-migration-v5 from devin/1773765319-upgrade-micrometer-observability



devin-ai-integration Bot commented Mar 17, 2026

Summary

Adds a full observability stack (Micrometer 1.12.2 + Prometheus + Grafana) to the FTGO microservices platform. This includes:

  • ftgo-observability shared library (shared-libraries/ftgo-observability/) with ObservabilityAutoConfiguration (common metric tags) and per-service business metric helper classes (OrderMetrics, ConsumerMetrics, RestaurantMetrics, CourierMetrics)
  • ftgo.observability-conventions Gradle plugin that bundles Actuator + Micrometer Core + Prometheus registry as a single dependency
  • Prometheus config (prometheus/) with scrape targets for all 4 services and alerting rules (error rate >5%, p99 >2s, heap >90%, GC, DB pool, service down)
  • 4 Grafana dashboards (grafana/dashboards/): Service Health (RED), JVM Metrics, Business Metrics, Database Connection Pool
  • Docker Compose additions: Prometheus (v2.49.1) and Grafana (v10.3.1) services with provisioned datasources and dashboards
  • application.yml updates for all 4 services: management port on 8081, percentile histograms, SLO buckets, plus merged security/CORS/OpenAPI config from base branch

The version catalog already had Micrometer at 1.12.2; this PR adds an observability bundle grouping Actuator + Micrometer Core + Prometheus registry.
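
For reviewers checking the "p99 >2s" alert against the percentile-histogram settings, the sketch below shows how a quantile is typically estimated from cumulative histogram buckets, in the style of Prometheus's histogram_quantile (linear interpolation inside the bucket that contains the target rank). It is a plain-JDK illustration, not code from this PR: the class name, the bucket bounds, and the sample counts are all hypothetical, and it assumes the alert rule derives p99 from histogram buckets.

```java
/** Illustrative only: estimates a quantile from cumulative histogram buckets. */
final class HistogramQuantileSketch {

    /**
     * @param q           quantile in (0, 1], e.g. 0.99
     * @param upperBounds ascending finite bucket upper bounds in seconds (the "le" values)
     * @param cumulative  cumulative counts: one per finite bound, plus a final +Inf bucket
     * @return estimated quantile in seconds, or NaN if there are no observations
     */
    static double estimate(double q, double[] upperBounds, long[] cumulative) {
        if (q <= 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        if (upperBounds.length == 0 || cumulative.length != upperBounds.length + 1) {
            throw new IllegalArgumentException("need one count per bound plus +Inf");
        }
        long total = cumulative[cumulative.length - 1];
        if (total == 0) {
            return Double.NaN;
        }
        double rank = q * total;

        // Find the first bucket whose cumulative count reaches the target rank.
        int i = 0;
        while (i < cumulative.length - 1 && cumulative[i] < rank) {
            i++;
        }
        if (i == upperBounds.length) {
            // Rank falls in the +Inf bucket: report the highest finite bound.
            return upperBounds[upperBounds.length - 1];
        }

        // Interpolate linearly between the bucket's lower and upper bound.
        double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
        long below = (i == 0) ? 0L : cumulative[i - 1];
        long inBucket = cumulative[i] - below;
        return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
    }
}
```

With hypothetical bounds of 0.5, 1, 2, and 5 seconds and cumulative counts of 900, 950, 980, 998, and 1000, the estimate for q = 0.99 is about 3.67 s, which would exceed a 2 s threshold. The estimate can only be as precise as the bucket layout, which is why having a bucket boundary at the alert threshold matters.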

Updates since last revision

  • Resolved merge conflicts with feat/microservices-migration-v5 (base branch had an earlier version of observability infrastructure)
  • Fixed duplicate observability bundle in libs.versions.toml caused by the merge auto-resolution
  • Merged base branch additions into service application.yml files: spring.security.user, ftgo.security.cors, ftgo.openapi, springdoc config now coexist with our observability settings
  • Docker Compose now inherits base branch conventions (container_name, restart: unless-stopped, console libraries) with upgraded image versions (Prometheus v2.49.1, Grafana 10.3.1)
  • ObservabilityAutoConfiguration uses @AutoConfiguration (Spring Boot 3.2 idiomatic) instead of @Configuration
  • Grafana datasource provisioning sets explicit uid: prometheus to match dashboard references
  • Replaced deprecated management.metrics.distribution.sla with service-level-objectives in all 4 service application.yml files

Review & Testing Checklist for Human

  • Business metrics classes diverge from base branch pattern — verify this is intentional. The base branch had @Component + @ConditionalOnProperty on each metrics class (auto-wired per service). This PR removes those annotations, making the classes plain POJOs (scaffolding only, not wired as beans). If the base branch's auto-wiring was intentional, this is a regression. Decide whether to restore @Component/@ConditionalOnProperty or keep as scaffolding.
  • Grafana dashboards use hardcoded uid: "prometheus" instead of ${DS_PROMETHEUS} variable. The base branch dashboards used a templated datasource variable; this PR hardcodes the UID. This works with the provisioned datasource (which sets uid: prometheus) but won't work if dashboards are imported into a Grafana instance with a differently-named datasource. Verify this tradeoff is acceptable.
  • OrderMetrics.setOrdersByState API change. Changed from recordOrdersByState(String state, int count), which manages the gauge lifecycle internally, to setOrdersByState(String state, AtomicInteger gauge), where the caller owns the AtomicInteger. This pushes complexity to the caller. Verify that the thinner API is worth that tradeoff.
  • Prometheus scrape targets reference hostnames not yet in docker-compose. prometheus.yml targets (order-service:8081, etc.) won't resolve until individual microservices are added to docker-compose. Prometheus will show these targets as DOWN. Confirm this is intentional (infrastructure-ready ahead of service deployment).
  • Merge conflict resolution favored "ours" for most files. Verify the base branch versions of dashboards, alert rules, and Prometheus config didn't contain improvements that were overwritten.
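
To make the setOrdersByState tradeoff in the checklist concrete, here is a minimal plain-JDK sketch of what "internally manages gauge lifecycle" could look like. It is not the base branch's code: the class name, the callback, and the state names are hypothetical. The registration callback stands in for a Micrometer call such as Gauge.builder(name, holder, AtomicInteger::get).tag("state", state).register(registry). Micrometer gauges hold only a weak reference to the observed object by default, so something has to keep each AtomicInteger strongly reachable; in this sketch the map does that.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

/** Illustrative only: one gauge holder per order state, registered lazily. */
final class OrdersByStateSketch {

    // Strong references to the holders; a gauge would read from these.
    private final Map<String, AtomicInteger> holders = new ConcurrentHashMap<>();

    // Hypothetical stand-in for registering a gauge with a meter registry.
    private final BiConsumer<String, AtomicInteger> registerGauge;

    OrdersByStateSketch(BiConsumer<String, AtomicInteger> registerGauge) {
        this.registerGauge = registerGauge;
    }

    /** Registers the gauge on first use of a state, then updates its value. */
    void recordOrdersByState(String state, int count) {
        holders.computeIfAbsent(state, s -> {
            AtomicInteger holder = new AtomicInteger();
            registerGauge.accept(s, holder); // runs once per distinct state
            return holder;
        }).set(count);
    }

    /** Current value for a state, or 0 if the state was never recorded. */
    int current(String state) {
        AtomicInteger holder = holders.get(state);
        return holder == null ? 0 : holder.get();
    }
}
```

With the setOrdersByState(String, AtomicInteger) shape described above, this map and the register-once logic would move into each calling service instead.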

Suggested test plan:

  1. Run docker-compose up prometheus grafana and verify both containers start
  2. Open Grafana at localhost:3000 (admin/admin), confirm the Prometheus datasource is connected and all 4 dashboards load without errors
  3. Verify ./gradlew :shared-libraries:ftgo-observability:compileJava succeeds
  4. When a service is eventually running, hit /actuator/prometheus on port 8081 and verify metrics are emitted with application and env tags
  5. Diff the base branch's Grafana dashboards and Prometheus configs against this PR's versions to ensure no important features were lost in conflict resolution

Notes

  • CI "Build & Test" checks are failing due to pre-existing ftgo-common compilation errors (javax.persistence, commons-lang missing) — unrelated to this PR. These checks are not marked as required.
  • The ftgo-end-to-end-tests build failure (eventuate-util-test not found) is also pre-existing.
  • Micrometer was already at 1.12.2 in the version catalog from a prior batch — no version bump was needed.
  • Service application.yml files now have both service and application metric tags; this is intentional for flexibility in Prometheus queries.

Link to Devin session: https://app.devin.ai/sessions/edf9650205c147cd891d03308b7d8cc0
Requested by: @mbatchelor81

…vel dashboards

- Micrometer 1.12.2 already in version catalog; added micrometer-core library and observability bundle
- Created ftgo-observability shared library with auto-configuration and per-service business metrics
- Created ftgo.observability-conventions Gradle convention plugin
- Added observability plugin + ftgo-observability dependency to all 4 services
- Defined custom business metrics: OrderMetrics, ConsumerMetrics, RestaurantMetrics, CourierMetrics
- Configured Prometheus scrape targets for all services (prometheus/prometheus.yml)
- Created alerting rules: error rate >5%, latency p99 >2s, heap >90%, GC pause, DB pool (prometheus/alert_rules.yml)
- Created 4 Grafana dashboards: Service Health (RED), JVM Metrics, Business Metrics, DB Connection Pool
- Added Prometheus + Grafana to docker-compose.yml for local development
- Configured metrics endpoints on separate management port (8081) for security
- Enhanced application.yml with percentile histograms and SLA buckets

Co-Authored-By: mason.batchelor <masonbatchelor81@gmail.com>
@devin-ai-integration
Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

devin-ai-integration Bot and others added 3 commits March 17, 2026 16:45
…tasource UID, update deprecated sla property

- Use @AutoConfiguration instead of @Configuration for Spring Boot 3.2 idiomatic auto-config
- Add explicit uid: prometheus to Grafana datasource provisioning to match dashboard references
- Replace deprecated management.metrics.distribution.sla with service-level-objectives in all services

Co-Authored-By: mason.batchelor <masonbatchelor81@gmail.com>

Merged upstream changes including security config, CORS, OpenAPI/springdoc,
CI workflows, Docker configs, and ftgo-security library. Combined with our
observability additions: SLO buckets, management port 8081, Prometheus export,
upgraded Prometheus/Grafana versions, and Grafana datasource UID fix.

Co-Authored-By: mason.batchelor <masonbatchelor81@gmail.com>

The merge auto-resolution duplicated the observability bundle definition
in libs.versions.toml, causing Gradle TOML parsing failure.

Co-Authored-By: mason.batchelor <masonbatchelor81@gmail.com>