feat: enable log forwarding through otel collector#186
feat: enable log forwarding through otel collector#186florentianayuwono wants to merge 22 commits intomainfrom
Conversation
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…v v8 (#182) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* fix(dashboard): remove dead override hiding job queue time series The "Job queue time" panel had a leftover hideSeriesFrom override that excluded every series except one specific named expression. Combined with the panel's current query (which already aggregates with sum by (le)), the override hid all data, leaving an empty chart. The override is obsolete now that the query collapses every label except le into a single combined histogram, so removing it restores visualisation without any other change. * ci: pin astral-sh/setup-uv to v8.1.0 The astral-sh/setup-uv repository does not publish a floating v8 major tag (only v8.0.0 and v8.1.0 specific tags exist), so referencing @v8 fails to resolve and breaks the workflow on every PR.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Christopher Bartz <christopher.bartz@canonical.com>
* chore(docs): update contributing guidelines (charmkeeper) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(docs): build errors * chore: clarify PR checklist and remove unnecessary item * chore(docs): revert changes in CONTRIBUTING.md * fix(docs): whoops we're ignoring the changelog * fix: update pr checklist to incorporate previous items, remove duplicate items --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds a new composite GitHub Action that can opt workflows into forwarding selected log files from self-hosted runners to Loki via an OpenTelemetry Collector snap.
Changes:
- Introduces
actions/enable-log-forwardingcomposite action that writes an otelcol config fragment and restarts the collector. - Adds unit tests and a CI workflow (including a self-hosted smoke test) for the new action.
- Documents usage in a new how-to guide and records the change in the docs changelog.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/how-to/enable-log-forwarding.md | New how-to page describing how to use the log-forwarding action and query hints. |
| docs/changelog.md | Adds a 2026-04-22 entry documenting the new action. |
| actions/enable-log-forwarding/tests/test_enable_log_forwarding.py | Unit tests for parsing inputs, endpoint resolution, exporter detection, and config generation. |
| actions/enable-log-forwarding/enable-log-forwarding.py | Action implementation that generates an otelcol config fragment, installs it, and restarts the collector. |
| actions/enable-log-forwarding/action.yaml | Composite action definition and inputs wiring to the Python script. |
| .github/workflows/enable_log_forwarding_action_tests.yaml | CI workflow to run unit tests and a self-hosted smoke test for the action. |
erinecon
left a comment
There was a problem hiding this comment.
Thanks for adding a how-to guide with this PR! Some initial comments and questions
yanksyoon
left a comment
There was a problem hiding this comment.
Initial review done, thank you!
Co-authored-by: Erin Conley <erin.conley@canonical.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Erin Conley <erin.conley@canonical.com>
There was a problem hiding this comment.
Should we also update the README?
Yes we should update the README, but my question would be whether those updates fall under the scope
of this PR or if we should update the README in a follow-up PR.
I know the current state of the README is not up to date, it still marks the repo as WIP.
Is the repo still WIP? We should continue to clearly indicate that, but regardless, I still think we should update the README.
There was a problem hiding this comment.
i think readme should be done in a separate pr / jira ticket
There was a problem hiding this comment.
I'm fine with this, would it be helpful if I opened an issue into the repo, or would you just prefer the Jira ticket?
|
|
||
|
|
||
| def build_config(files, resolved_endpoint, exporter_already_exists): | ||
| attrs = [ |
There was a problem hiding this comment.
could we use a templated string ? so that we can more quickly understand the generated config and avoid the requirement to read the python code to get a bigger picture
There was a problem hiding this comment.
i added a j2 file for this :)
There was a problem hiding this comment.
I think using a template for JSON is not a very good idea, as you need to consider escape characters if the input contains something like " or \n when using a template to compose a JSON file. It's much cleaner and more error-proof to use Python's json library. This also saves us from installing the jinja2 package.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
yanksyoon
left a comment
There was a problem hiding this comment.
A few more nits, almost there! Thanks @florentianayuwono :)
| MODULE_PATH = ( | ||
| pathlib.Path(__file__).resolve().parent.parent / "enable_log_forwarding.py" | ||
| ) |
There was a problem hiding this comment.
Would there be a better way to load the module? I'm wondering if we can safely assume that the test will be run from project root and import using the path relative to project root.
There was a problem hiding this comment.
Since load_module function for a test seems like a bit of an overkill.
| MODULE = load_module(MODULE_PATH) | ||
|
|
||
|
|
||
| class TestEnableLogForwarding(unittest.TestCase): |
There was a problem hiding this comment.
Could we approach our tests with pytest and in functional testing style please? I think this is probably to align with our team's style guidelines :)
| class TestEnableLogForwarding(unittest.TestCase): | ||
| """Test suite for parsing, endpoint resolution, exporter detection, and config generation.""" | ||
|
|
||
| def test_parse_files_into_list(self): |
There was a problem hiding this comment.
Could the tests have arrange/act/assert pattern in functional testing paradigm? This is to align with our team's style guidelines :)
| SNAP_CMD = shutil.which("snap") | ||
| SUDO_CMD = shutil.which("sudo") |
There was a problem hiding this comment.
I think we should hardcode the paths for these - so that the executables are not dynamically loaded, for security purposes.
| mkdir_result = run_as_root( | ||
| "mkdir", "-p", CONFIG_DIR # create directory if it doesn't exist | ||
| ) |
| install_result = run_as_root( | ||
| "install", "-m", "0644", tmp_path, config_path # rw-r--r-- permissions | ||
| ) |
There was a problem hiding this comment.
Should we prefer Pythonic way to install (copy and chmod) rather than calling install command?
florentianayuwono
left a comment
There was a problem hiding this comment.
review session from weii
| @@ -0,0 +1,115 @@ | |||
| # Copyright 2026 Canonical Ltd. | |||
There was a problem hiding this comment.
for the folder name, we should use - instead of _
| "start_at": "end" | ||
| } | ||
| }, | ||
| {{ exporters_section }} |
There was a problem hiding this comment.
do a test to see if it won't cause a conflict if the exporter is not optional and there are two config files has the same exporter
| def check_exporter_exists() -> bool: | ||
| """Return whether the configured exporter is already defined in collector config files.""" | ||
| # Check for " otlp_grpc:" exporter definition | ||
| pattern = f"^ {re.escape(EXPORTER_NAME)}:[ \t]*$" |
There was a problem hiding this comment.
currently this way of checking is not very reliable, it's either:
- not needed, if there is no problem with same exporters exist in multiple config files
- use a python yaml library to analyze the yaml file content (ex: PyYAML)
| ("github.repository", os.getenv("GITHUB_REPOSITORY", "unknown")), | ||
| ("github.runner.name", os.getenv("RUNNER_NAME", "unknown")), | ||
| ("github.workflow", os.getenv("GITHUB_WORKFLOW", "unknown")), | ||
| ("github.job.id", os.getenv("GITHUB_JOB", "unknown")), |
There was a problem hiding this comment.
align the key name with https://github.com/canonical/github-runner-operator/pull/781/changes#diff-18077814ced5abb8b45d89d7940585ae954213dca8585ef4d48e693256da18e9 for the same ones
| finally: | ||
| os.unlink(tmp_path) | ||
|
|
||
| logger.info("Wrote log-forwarding collector config to: %s", config_path) |
There was a problem hiding this comment.
log the final content of the config file, if wanna be fancy can use github action logging format https://docs.github.com/en/enterprise-server@3.18/actions/reference/workflows-and-actions/workflow-commands#grouping-log-lines
| capture_output=True, | ||
| check=False, | ||
| ) | ||
| if snap_list_result.returncode != 0: |
There was a problem hiding this comment.
in case it's not installed just install it
| ## Prerequisites | ||
|
|
||
| - Use a self-hosted Linux runner. | ||
| - Install the `opentelemetry-collector` snap on the runner. |
There was a problem hiding this comment.
won't be needed if we install it for the runner in the action steps
| - Ensure the workflow can update `/etc/otelcol/config.d` with root privileges. | ||
|
|
||
| ## Provide inputs | ||
|
|
There was a problem hiding this comment.
ensure that you do not include any files that may contain sensitive information so that these info are not forwarded
erinecon
left a comment
There was a problem hiding this comment.
I approve the documentation changes 👍 since there's code-based changes in this PR, I didn't provide an official approval.
Spotted a couple more nits + responded to a Copilot comment
There was a problem hiding this comment.
I'm fine with this, would it be helpful if I opened an issue into the repo, or would you just prefer the Jira ticket?
Co-authored-by: Erin Conley <erin.conley@canonical.com>
What this PR does
Add action to allow workflow authors to opt in to forwarding specific log files from self-hosted GitHub runners to Loki through the OpenTelemetry Collector snap.
Why we need it
So that workflow author can opt-in to forwarding specific log files from their job to Grafana, and view application logs alongside system metrics without changing their existing logging approach.
Checklist
CONTRIBUTING.mdhas been updated upon changes to the contribution/development process (e.g. changes to the way tests are run)docs/changelog.mdwith user-relevant changes(e.g., in
.github/workflows/integration_tests.yaml, ensure themoduleslist is correct)terraform fmtpasses andtflintreports no errors