Observability Integration #66
Merged
- Add OpenTelemetry dependencies and configuration
- Create EngineMetrics service for centralized metrics management
- Add observability configuration classes
- Create comprehensive observability documentation
- Configure logging with trace correlation
- Set up test and development configurations

Note: Manual instrumentation temporarily disabled due to dependency issues. Will be re-enabled once proper OpenTelemetry Spring Boot starter is configured.

fixed some compile errors

- Add mock dependencies for EngineMetrics and Tracer
- Setup proper timer samples and span mocking in BeforeEach
- Remove direct TaskManager constructor calls from test methods
- Add proper OpenTelemetry span and timer lifecycle mocking

- Rename observability.md to observability-reference-guide.md (technical spec)
- Add observability-implementation.md for current monitoring setup
- Separate concerns between reference and implementation docs
- Document Grafana dashboards and testing practices

- Add Grafana dashboards for process and task monitoring
- Update test classes with OpenTelemetry instrumentation
- Add comprehensive metrics for process execution monitoring
- Configure metrics collection for all core components

This completes the observability implementation with:
- Metrics collection and export
- Grafana dashboards
- OpenTelemetry tracing
- Test coverage for all components

- Update OpenTelemetry and Micrometer dependencies to latest versions
- Separate observability configuration into dedicated profile
- Simplify metrics configuration for testing environment
- Add test-specific observability configuration
- Enhance metric cleanup to prevent memory leaks
- Add comprehensive tests for metric lifecycle management
- Update logging configuration with trace correlation
- Improve task and process metric tracking

- Update application-observability.yaml according to reference guide
- Add OpenTelemetry test configuration and verification
- Update metric names to match specification
- Ensure proper resource attributes for OpenTelemetry

- Update OpenTelemetry version to 1.41.0
- Update Micrometer version to 1.14.4
- Update Micrometer Tracing version to 1.3.4
- Simplify dependencies to core requirements
- Add proper version management through BOMs
// Note: Active process instances are automatically tracked by engineMetrics.recordProcessStarted/Completed/Faile
…cts between profiles
…into telemetry
- Add missing event metrics integration in EventManager
  - Add recordEventConsumed() calls in correlateMessage() and broadcastSignal()
  - Add recordEventPublished() calls in new publishMessage() and publishSignal() methods
  - Create proper event publishing API with full tracing and metrics
- Implement missing event metrics in EngineMetrics
  - Add event queue size gauge (abada.events.queue_size)
  - Add event processing latency timer (abada.event.processing_latency)
  - Add queue size management methods (increment/decrement/get)
  - Add processing latency timing methods with tagged metrics
- Enhance EventManager with comprehensive observability
  - Add publishMessage() method for external message publishing
  - Add publishSignal() method for external signal publishing
  - Integrate all event metrics (published, consumed, correlated, latency, queue size)
  - Add comprehensive tracing with proper span attributes
  - Fix linter warning for missing TIMER case in switch statement
- Update documentation and testing
  - Mark all telemetry implementation tasks as completed (100%)
  - Add comprehensive EventMetricsIntegrationTest
  - Update telemetry-implementation-plan.md with final status

This completes the telemetry implementation with full observability across process, task, and event metrics with comprehensive tracing.
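
For readers unfamiliar with the queue size bookkeeping mentioned in this commit, the following is a minimal sketch of increment/decrement/get methods backed by an atomic counter. The class name `EventQueueSizeSketch` and the clamp-at-zero behaviour are assumptions made for illustration; the actual EngineMetrics code is not shown on this page and may differ.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the queue size tracking behind abada.events.queue_size. */
final class EventQueueSizeSketch {
    // A single atomic counter keeps concurrent publishers and consumers consistent.
    private final AtomicLong queueSize = new AtomicLong();

    /** Call when an event is enqueued; returns the new size. */
    long increment() {
        return queueSize.incrementAndGet();
    }

    /** Call when an event is dequeued; never drops below zero (an assumption of this sketch). */
    long decrement() {
        return queueSize.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    /** Current size, which a gauge would read on each scrape. */
    long get() {
        return queueSize.get();
    }

    public static void main(String[] args) {
        EventQueueSizeSketch queue = new EventQueueSizeSketch();
        queue.increment();
        queue.increment();
        queue.decrement();
        System.out.println("queue size = " + queue.get());
    }
}
```

With Micrometer, a counter like this would typically be exposed by registering a gauge that reads `get()`; how the PR wires that up is not visible here.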
- Fix recordEventProcessingLatency method to properly record duration to both global and tagged timers
- Use Timer.record() instead of sample.stop() for tagged timer to avoid double-stopping the sample
- Fix EventMetricsIntegrationTest to create timer before accessing it
- Reorder test operations to record latency first, then verify timer values

This resolves the MeterNotFoundException when accessing tagged event processing latency timers.
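
The idea behind this fix can be sketched without Micrometer: measure the elapsed time once, then record that same duration into both timers, rather than stopping one sample twice. `SimpleTimer` and `recordLatency` below are hypothetical stand-ins written for this illustration and are not the project's or Micrometer's API.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

final class DualTimerSketch {
    /** Minimal stand-in for a metrics timer: it only accumulates recorded durations. */
    static final class SimpleTimer {
        private final List<Duration> samples = new ArrayList<>();

        void record(Duration d) {
            samples.add(d);
        }

        long count() {
            return samples.size();
        }

        Duration total() {
            return samples.stream().reduce(Duration.ZERO, Duration::plus);
        }
    }

    /**
     * Computes the elapsed time exactly once and records it in both timers,
     * so neither timer depends on stopping a shared sample a second time.
     */
    static Duration recordLatency(long startNanos, long endNanos,
                                  SimpleTimer global, SimpleTimer tagged) {
        Duration elapsed = Duration.ofNanos(endNanos - startNanos);
        global.record(elapsed);
        tagged.record(elapsed);
        return elapsed;
    }

    public static void main(String[] args) {
        SimpleTimer global = new SimpleTimer();
        SimpleTimer tagged = new SimpleTimer();
        long start = System.nanoTime();
        long end = System.nanoTime();
        System.out.println("recorded " + recordLatency(start, end, global, tagged));
    }
}
```

In Micrometer terms this would presumably correspond to stopping the sample against the global timer and passing the resulting duration to the tagged timer's `record` method, which matches the commit's description.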
- Add comprehensive Docker Compose setup for dev, test, and prod environments
- Implement OTEL Collector configuration for metrics and traces
- Add Prometheus and Grafana with auto-provisioning
- Configure Traefik load balancer for production scaling
- Update application configs to use Docker service names
- Add PostgreSQL support for production environment
- Create comprehensive architecture and deployment documentation
- Support horizontal scaling with stateless engine instances
- Include complete observability stack (Jaeger, Prometheus, Grafana)
- Add environment-specific configurations and resource limits

- Link to architecture-and-deployment-guide.md
- Replace inline Docker snippet with deployment doc links
- Remove API examples; point to api-documentation.md
- Condense frontend section
- Update Engine Status to 0.8.2-alpha with observability & deployment maturity

- Replace generic GitHub Issues reference with direct link to docs/roadmap-to-beta.md
- Shows staged milestones from 0.8.2-alpha to 1.0.0-beta

- Add docker-entrypoint.sh with intelligent permission fixing
- Handles both regular file systems and SELinux environments
- Gracefully falls back when permission changes fail
- Switches to non-root user (appuser) before running application
- Creates necessary directories for data and logs persistence

- Install su-exec for secure user switching in entrypoint
- Create logs and data directories with proper ownership
- Use entrypoint script for permission handling
- Add maven-jar-plugin configuration with proper manifest
- Remove -DskipTests from Maven build for better CI practices
- Stay as root initially to handle volume permissions, then switch to appuser

- Add explicit image tags for better image management
- Use SELinux-compatible volume mount flags (:z)
- Fix depends_on syntax with proper condition checking
- Standardize container naming and image references
- Enhance volume persistence with proper mount options

- Update OTEL Collector to use OTLP HTTP for Jaeger integration
- Replace deprecated Jaeger exporter with otlphttp/jaeger
- Add check_interval to memory_limiter for better resource management
- Reorganize service pipeline configuration for clarity
- Update Traefik to use OTLP tracing instead of legacy Jaeger integration
- Add both gRPC and HTTP endpoints for OTLP tracing

- Add network aliases for better service discovery
- Use SELinux-compatible volume mounts with :ro,z flags
- Disable problematic OTEL Collector healthcheck (service runs correctly)
- Improve container networking with explicit aliases
- Enhance volume mount security and compatibility

- Fix OTLP metrics endpoint to use otel-collector service name
- Add proper duration parsing for metric export intervals
- Implement conditional bean creation for OTLP meter registry
- Remove unused MeterRegistry import
- Add fallback configuration for auto-configuration scenarios
- Improve error handling for duration parsing with sensible defaults
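
As an illustration of duration parsing that falls back to a default, here is a small sketch for an export interval setting. The accepted formats (ISO-8601 such as `PT30S`, plus simple `ms`, `s` and `m` suffixes), the 60 second default, and the names `ExportIntervalSketch` and `parseOrDefault` are all assumptions; the PR's actual parsing rules are not shown on this page.

```java
import java.time.Duration;
import java.time.format.DateTimeParseException;
import java.util.Locale;

final class ExportIntervalSketch {
    // Hypothetical default; the value used by the project is not stated in the PR.
    static final Duration DEFAULT_INTERVAL = Duration.ofSeconds(60);

    /** Parses an interval string, returning DEFAULT_INTERVAL for null, blank, or malformed input. */
    static Duration parseOrDefault(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_INTERVAL;
        }
        String value = raw.trim().toLowerCase(Locale.ROOT);
        try {
            if (value.startsWith("p")) {
                return Duration.parse(value); // ISO-8601, e.g. PT30S
            }
            if (value.endsWith("ms")) {       // check "ms" before "s" and "m"
                return Duration.ofMillis(number(value, 2));
            }
            if (value.endsWith("s")) {
                return Duration.ofSeconds(number(value, 1));
            }
            if (value.endsWith("m")) {
                return Duration.ofMinutes(number(value, 1));
            }
            return DEFAULT_INTERVAL;          // unknown format
        } catch (DateTimeParseException | NumberFormatException e) {
            return DEFAULT_INTERVAL;          // malformed value: fall back instead of failing startup
        }
    }

    private static long number(String value, int suffixLength) {
        return Long.parseLong(value.substring(0, value.length() - suffixLength).trim());
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("PT30S"));
        System.out.println(parseOrDefault("not-a-duration"));
    }
}
```

Falling back silently keeps the application starting with a usable interval; logging the rejected value would be a reasonable addition in real configuration code.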
…tion

- Add comprehensive OTEL Collector configuration documentation
- Document updated metrics naming conventions (dot vs underscore notation)
- Include new Traefik OTLP tracing configuration
- Add service dependency documentation for Docker Compose
- Document enhanced Prometheus alerting rules with proper rate functions
- Include troubleshooting notes for OTEL Collector healthcheck
- Add detailed metrics descriptions with both internal and Prometheus names
- Document resource processor and debug exporter configurations

- Add Loki service to docker-compose.yml and docker-compose.dev.yml
- Add Consul service to docker-compose.yml for production coordination
- Create Loki configuration files for dev and prod environments (loki-config-dev.yaml, loki-config-prod.yaml, loki-config.yaml)
- Configure Loki healthchecks and volumes
- Minor formatting updates in docker-compose files
…e three pillars of observability.
- Loki integration walkthrough with setup and configuration
- Observability implementation details and best practices
- Reference guide for metrics, traces, and logs
- Telemetry implementation plan

- Updated architecture documentation with observability stack
- Enhanced deployment guide with monitoring setup
- Docker deployment updates for Loki, Tempo, and Prometheus

- Updated API documentation with observability endpoints
- Enhanced service task design and implementation docs
- Updated SPI design documentation
- Auth service architecture updates

- Event-based gateways design and mechanics
- Exclusive gateway documentation
- Persistence layer updates
- Process variables documentation
- Kitchen sink process examples

- Updated release notes for 0.7.0 and 0.8.2 alpha versions
- Frontend development prompt updates
- Project configuration review updates
No description provided.