Skip to content

Change runner to connect to otel collector on pre-job#781

Merged
yhaliaw merged 22 commits intomainfrom
feat/vm-metrics
Apr 24, 2026
Merged

Change runner to connect to otel collector on pre-job#781
yhaliaw merged 22 commits intomainfrom
feat/vm-metrics

Conversation

@yhaliaw
Copy link
Copy Markdown
Collaborator

@yhaliaw yhaliaw commented Apr 14, 2026

Applicable spec:

Overview

This pull request adds support for configuring and enabling OpenTelemetry (OTel) metrics collection for GitHub Runner Manager deployments. It introduces a new configuration option for specifying an OTel collector endpoint, ensures this configuration is validated and propagated through the system, and updates the runner setup to generate and apply the necessary OTel configuration at runtime. Comprehensive tests are included to validate the new logic.

OpenTelemetry Collector Integration:

  • Added a new otel-collector-endpoint configuration option to charmcraft.yaml and integrated it throughout the charm state and service configuration, allowing users to specify an OTel collector endpoint in the format host:port. [1] [2] [3] [4] [5]
  • Introduced the OtelCollectorConfig model to validate and store OTel collector configuration, with robust parsing and validation logic to ensure only valid host:port values are accepted. [1] [2] [3] [4]

Runner Setup and Cloud-Init Updates:

  • Modified the runner pre-job template to generate and apply an OTel configuration file if the endpoint is set, and to enable/start the OTel collector service on the runner VM. [1] [2]

Testing and Validation:

  • Added unit tests to verify correct parsing, validation, and error handling for the OTel collector endpoint, including valid and invalid formats. [1] [2]
  • Updated and extended test fixtures and test cases to cover the new configuration option and its propagation through the system. [1] [2] [3] [4]

Rationale

These changes allow users to easily enable and configure OpenTelemetry metrics collection for their self-hosted runners, improving observability and integration with monitoring systems.

Checklist

  • The charm style guide was applied.
  • The contributing guide was applied.
  • The changes are compliant with ISD054 - Managing Charm Complexity
  • The documentation for charmhub is updated.
  • The PR is tagged with appropriate label (urgent, trivial, complex).
  • The changelog is updated with changes that affects the users of the charm.
  • The application version number is updated in github-runner-manager/pyproject.toml.

@yhaliaw yhaliaw marked this pull request as ready for review April 22, 2026 00:21
Copy link
Copy Markdown
Member

@yanksyoon yanksyoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an otel-collector-endpoint charm config to enable runner-side OpenTelemetry host-metrics collection by generating an OTel Collector config during the runner pre-job phase and propagating this setting through charm state into runner VM provisioning.

Changes:

  • Introduces otel-collector-endpoint config and threads it through CharmStateApplicationConfiguration → runner cloud-init templating.
  • Generates an OTel Collector YAML config in the runner pre-job script and attempts to enable/start the opentelemetry-collector snap.
  • Adds unit/integration tests for parsing/propagation and verifies the config file is written on the runner.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
charmcraft.yaml Adds the new charm config option for OTel collector endpoint.
src/charm_state.py Parses/validates the endpoint and persists it in stored charm state.
src/factories.py Propagates OTel config into the application/service configuration model.
github-runner-manager/src/github_runner_manager/configuration/base.py Adds OtelCollectorConfig and wires it into SupportServiceConfig.
github-runner-manager/src/github_runner_manager/openstack_cloud/openstack_runner_manager.py Passes endpoint into Jinja context for pre-job generation.
github-runner-manager/src/github_runner_manager/templates/pre-job.j2 Writes OTel Collector config and enables/starts the collector service.
tests/unit/conftest.py Extends fixture to include otel_collector_config.
tests/unit/test_factories.py Unit test that OTel config is propagated into service config.
tests/unit/test_charm_state.py Unit tests for endpoint parsing/validation helper.
tests/integration/test_charm_runner.py Integration test verifying config file content is written on the runner VM.

Comment thread src/charm_state.py
Comment thread github-runner-manager/src/github_runner_manager/configuration/base.py Outdated
Comment thread charmcraft.yaml Outdated
Comment thread github-runner-manager/src/github_runner_manager/templates/pre-job.j2 Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Comment thread github-runner-manager/src/github_runner_manager/templates/env.j2 Outdated
@yhaliaw yhaliaw enabled auto-merge (squash) April 24, 2026 02:30
@yhaliaw yhaliaw merged commit 93b7c71 into main Apr 24, 2026
114 of 127 checks passed
@yhaliaw yhaliaw deleted the feat/vm-metrics branch April 24, 2026 07:03
cbartz added a commit to canonical/github-runner-operators that referenced this pull request Apr 24, 2026
Replace github_job_id with github_job and instance with github_runner
to match the actual attribute labels set by the pre-job OTel config
(see canonical/github-runner-operator#781).

Add github_repository and github_workflow template variables so the
dashboard can be filtered the same way as the existing PS6 hostmetrics
dashboard.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants