Skip to content

add OpenTelemetry backend for work unit reporting#23284

Open
tdyas wants to merge 18 commits intomainfrom
tdyas/observability/opentelemetry
Open

add OpenTelemetry backend for work unit reporting#23284
tdyas wants to merge 18 commits intomainfrom
tdyas/observability/opentelemetry

Conversation

@tdyas
Copy link
Copy Markdown
Contributor

@tdyas tdyas commented Apr 22, 2026

Overview

Add a new pants.backend.observability.opentelemetry backend to report work unit tracing to OpenTelemetry. The backend is based on shoalsoft-pants-opentelemetry-plugin with unnecessary compatibility code and "shoalsoft" branding removed.

Notes:

Lockfile

    Lockfile diff: 3rdparty/python/user_reqs.lock [python-default]

    ==                    Upgraded dependencies                     ==

      anyio                          4.12.1       -->   4.13.0
      certifi                        2026.1.4     -->   2026.4.22
      charset-normalizer             3.4.4        -->   3.4.7
      click                          8.3.1        -->   8.3.2
      cross-web                      0.4.1        -->   0.6.0
      cryptography                   46.0.5       -->   46.0.7
      graphql-core                   3.2.7        -->   3.2.8
      idna                           3.11         -->   3.12
      librt                          0.8.1        -->   0.9.0
      pydantic                       2.12.5       -->   2.13.3
      pydantic-core                  2.41.5       -->   2.46.3
      pygments                       2.19.2       -->   2.20.0
      pyjwt                          2.11.0       -->   2.12.1
      python-dotenv                  1.2.1        -->   1.2.2
      python-multipart               0.0.22       -->   0.0.26
      ujson                          5.11.0       -->   5.12.0

    ==                      Added dependencies                      ==

      googleapis-common-protos       1.74.0
      importlib-metadata             8.7.1
      opentelemetry-api              1.41.0
      opentelemetry-exporter-otlp-proto-common 1.41.0
      opentelemetry-exporter-otlp-proto-http 1.41.0
      opentelemetry-proto            1.41.0
      opentelemetry-sdk              1.41.0
      opentelemetry-semantic-conventions 0.62b0
      protobuf                       6.33.6
      zipp                           3.23.1

tdyas added 12 commits April 22, 2026 17:08
```
Lockfile diff: 3rdparty/python/user_reqs.lock [python-default]

==                    Upgraded dependencies                     ==

  anyio                          4.12.1       -->   4.13.0
  certifi                        2026.1.4     -->   2026.4.22
  charset-normalizer             3.4.4        -->   3.4.7
  click                          8.3.1        -->   8.3.2
  cross-web                      0.4.1        -->   0.6.0
  cryptography                   46.0.5       -->   46.0.7
  graphql-core                   3.2.7        -->   3.2.8
  idna                           3.11         -->   3.12
  librt                          0.8.1        -->   0.9.0
  pydantic                       2.12.5       -->   2.13.3
  pydantic-core                  2.41.5       -->   2.46.3
  pygments                       2.19.2       -->   2.20.0
  pyjwt                          2.11.0       -->   2.12.1
  python-dotenv                  1.2.1        -->   1.2.2
  python-multipart               0.0.22       -->   0.0.26
  ujson                          5.11.0       -->   5.12.0

==                      Added dependencies                      ==

  googleapis-common-protos       1.74.0
  importlib-metadata             8.7.1
  opentelemetry-api              1.41.0
  opentelemetry-exporter-otlp-proto-common 1.41.0
  opentelemetry-exporter-otlp-proto-http 1.41.0
  opentelemetry-proto            1.41.0
  opentelemetry-sdk              1.41.0
  opentelemetry-semantic-conventions 0.62b0
  protobuf                       6.33.6
  zipp                           3.23.1
```
@tdyas
Copy link
Copy Markdown
Contributor Author

tdyas commented Apr 22, 2026

AI disclosure: AI had been in used parts of the original shoalsoft-pants-opentelemetry-plugin, but no AI was used in the migration of the existing sources into Pants in this PR.

@tdyas
Copy link
Copy Markdown
Contributor Author

tdyas commented Apr 22, 2026

cc @wisechengyi

@tdyas
Copy link
Copy Markdown
Contributor Author

tdyas commented Apr 22, 2026

src/python/pants/init/plugin_resolver_test.py is failing even though this PR does nothing to change the plugin resolution system due to a mismatch on the requests version.. https://github.com/pantsbuild/pants/actions/runs/24785703939/job/72530152547#step:12:461

@tdyas
Copy link
Copy Markdown
Contributor Author

tdyas commented Apr 22, 2026

Lint failure where ./pants fix and ./pants fmt make no changes despite there being an import sorting issue highlighted. https://github.com/pantsbuild/pants/actions/runs/24785703939/job/72530152642?pr=23284#step:7:482

Copy link
Copy Markdown
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly skimmed this, as it's pretty isolated from the rest of the codebase and we know it works.

I wish we didn't have to have Pants depend on the opentelemetry requirements unless you opt in to the plugin, but that's for another day. At least those requirements are robust and well-maintained, so I'm not more concerned about them than I am about other 3rd-party deps, although every external requirement is a potential supply chain attack vector.

@sureshjoshi , I know you have opinions here.

Comment thread docs/notes/2.33.x.md Outdated
Comment thread src/python/pants/backend/observability/opentelemetry/BUILD Outdated
@benjyw
Copy link
Copy Markdown
Contributor

benjyw commented Apr 22, 2026

The one dep I am leery of is zipp. I'm sad that otel relies on it.

@tdyas
Copy link
Copy Markdown
Contributor Author

tdyas commented Apr 22, 2026

The one dep I am leery of is zipp. I'm sad that otel relies on it.

Seems to be a test requirement based on these search results: https://github.com/search?q=repo%3Aopen-telemetry%2Fopentelemetry-python%20zipp&type=code

Maybe we should just exclude it from the Pants resolve?

@sureshjoshi
Copy link
Copy Markdown
Member

sureshjoshi commented Apr 22, 2026

@sureshjoshi , I know you have opinions here.

Yes. [...rage builds...]

So, I just glanced over the code - I'm trusting it's just a copy/paste basically of the existing plugin, so whatever, it's already been running for quite some time.

Kinda sucks to have requirements be a dumping ground, whether or not a backend is in use. Would this plugin still work if we just vendored it as a pantsbuild/...plugin..?

Essentially, I'm just wondering what "opt-in" we have. As a practical usage, I run pants on one of my older/slower computers (which does a pretty good job emulating a github action), and the network side is already pretty slow.

Also, just a note. I ran into this, and was literally looking at a GH ticket earlier this week, but pants bootstrapping bypasses lockfiles, so I'd argue we have a higher than normal supply chain risk. That's not directly related to this PR - just to the supply chain comment.

@tdyas
Copy link
Copy Markdown
Contributor Author

tdyas commented Apr 22, 2026

Kinda sucks to have requirements be a dumping ground, whether or not a backend is in use. Would this plugin still work if we just vendored it as a pantsbuild/...plugin..?

I believe built-in backends can have a requirements.txt which is applied when plugin resolution occurs?

@tdyas
Copy link
Copy Markdown
Contributor Author

tdyas commented Apr 22, 2026

So, I just glanced over the code - I'm trusting it's just a copy/paste basically of the existing plugin, so whatever, it's already been running for quite some time.

It is mostly a straight copy/paste. The main changes are removing compatibility code needed to have this plugin work across a range of Pants versions, simplifying the integration test to not use my fork of the integration test helper (which was done to support testing against multiple Pants versions), and renaming the package given its new location.

@tdyas
Copy link
Copy Markdown
Contributor Author

tdyas commented Apr 22, 2026

@sureshjoshi: You can also review the individual commits to see the changes. The original plugin is copied as-is in the first commit, then edited by the remaining commits.

@sureshjoshi
Copy link
Copy Markdown
Member

@sureshjoshi: You can also review the individual commits to see the changes. The original plugin is copied as-is in the first commit, then edited by the remaining commits.

Yeah, I might do that, but 🤷🏽 - more interested in the surface area of dependencies, in general anyways.

@sureshjoshi
Copy link
Copy Markdown
Member

Is this PR alone intended to remove PANTS_SHOALSOFT_OPENTELEMETRY_ENABLED from CI, or subsequent PR?

@tdyas
Copy link
Copy Markdown
Contributor Author

tdyas commented Apr 23, 2026

Is this PR alone intended to remove PANTS_SHOALSOFT_OPENTELEMETRY_ENABLED from CI, or subsequent PR?

This PR doesn't touch the CI configuration to send work units to Honeycomb. That'll be a subsequent PR since messing with CI configuration should be done in a distinct change.

Copy link
Copy Markdown
Contributor

@cburroughs cburroughs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few very minor comments that can be followup if you are trying to minimize drift on initial import.

Thank you for contributing this. I think it is going to be a great capability. Is the spooky lint failure the last blocker before merging?

(_MessageType.FINISH, _FinishDetails(timeout=timeout, context=context), context)
)
if not self._finish_completed_event.wait(
timeout=timeout.total_seconds() * 1000.0 if timeout is not None else None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'm following the conversion correctly. Doesn't https://docs.python.org/3/library/threading.html#threading.Event.wait take seconds already? (Why the *1000)

"""
The key-value pairs to be used as headers for spans associated with gRPC or HTTP requests.

This option is consumed by both the `{TracingExporterId.HTTP.value}` and `{TracingExporterId.GRPC.value}`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be an f string for the {} formatting?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants