Fix: Reinitialize gRPC channel on UNAVAILABLE error (Fixes #4517) (Fixes #4529) #4825
I understand this issue is related to the upstream gRPC bug (grpc/grpc#38290). I've analyzed that issue in depth, and the root cause appears to be a regression in the gRPC 'backup poller' (introduced in grpcio>=1.68.0), which fails to recover connections when the primary EventEngine is disabled (common in Python for fork safety). While upstream fixes are being explored (e.g., grpc/grpc#38480), the issue has persisted for months, leaving exporters stuck in an UNAVAILABLE state indefinitely after collector restarts.

This PR implements a mitigation: detecting the persistent UNAVAILABLE state and forcing a channel re-initialization. This effectively resets the underlying poller state, allowing the exporter to recover immediately without requiring a full application restart. This approach provides stability for users while the complex upstream fix is finalized.
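To make the mitigation concrete, here is a minimal sketch of the "re-create the channel on UNAVAILABLE" idea. It is not the code from this PR: `StatusCode`, `ExportError`, `FakeChannel`, `ReconnectingExporter`, and `make_flaky_factory` are hypothetical stand-ins (for `grpc.StatusCode`, `grpc.RpcError`, and a real gRPC channel) so the example runs without grpcio installed.

```python
import enum
from typing import Callable, List, Tuple


class StatusCode(enum.Enum):
    """Hypothetical stand-in for grpc.StatusCode (only the codes used here)."""
    INVALID_ARGUMENT = 3
    UNAVAILABLE = 14


class ExportError(Exception):
    """Hypothetical stand-in for grpc.RpcError; exposes code() like the real one."""

    def __init__(self, status: StatusCode) -> None:
        super().__init__(status.name)
        self._status = status

    def code(self) -> StatusCode:
        return self._status


class FakeChannel:
    """Hypothetical channel: an unhealthy one never recovers, as in the bug report."""

    def __init__(self, healthy: bool) -> None:
        self.healthy = healthy
        self.closed = False

    def send(self, payload: bytes) -> None:
        if not self.healthy:
            raise ExportError(StatusCode.UNAVAILABLE)

    def close(self) -> None:
        self.closed = True


class ReconnectingExporter:
    """Assumed shape of the mitigation: replace the channel when it is UNAVAILABLE."""

    def __init__(self, channel_factory: Callable[[], FakeChannel], max_attempts: int = 3) -> None:
        self._channel_factory = channel_factory
        self._max_attempts = max_attempts
        self._channel = channel_factory()

    def export(self, payload: bytes) -> bool:
        for _ in range(self._max_attempts):
            try:
                self._channel.send(payload)
                return True
            except ExportError as error:
                if error.code() is not StatusCode.UNAVAILABLE:
                    return False  # other errors are not handled by this sketch
                # Drop the stuck channel and build a fresh one before retrying.
                self._channel.close()
                self._channel = self._channel_factory()
        return False


def make_flaky_factory() -> Tuple[Callable[[], FakeChannel], List[FakeChannel]]:
    """Factory whose first channel is stuck and whose later channels work."""
    created: List[FakeChannel] = []

    def factory() -> FakeChannel:
        channel = FakeChannel(healthy=len(created) > 0)
        created.append(channel)
        return channel

    return factory, created
```

With the flaky factory, the first export attempt fails on the stuck channel, the exporter closes it, and the retry succeeds on the replacement. A real implementation would also need to rebuild the stub bound to the channel and respect the exporter's existing backoff and shutdown handling, which this sketch leaves out.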
…mments
- Remove aggressive gRPC keepalive and retry settings to rely on defaults.
- Fix compression precedence logic to correctly handle NoCompression (0).
- Refactor channel initialization to be stateless (remove _channel_reconnection_enabled).
- Update documentation to refer to 'OTLP-compatible receiver'
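The compression item is easy to get wrong because an enum member with value 0 is falsy in Python. The sketch below illustrates that class of bug and one way to avoid it. The `Compression` enum is a stand-in that mirrors the `NoCompression (0)` value mentioned above, and both `resolve_compression` functions are hypothetical names, not the exporter's actual code.

```python
import enum
from typing import Optional


class Compression(enum.IntEnum):
    """Stand-in enum; NoCompression is 0, as noted in the commit message."""
    NoCompression = 0
    Deflate = 1
    Gzip = 2


def resolve_compression_buggy(
    explicit: Optional[Compression], from_env: Optional[Compression]
) -> Compression:
    # `or` skips any falsy value, so an explicit NoCompression (0) is ignored
    # and the environment setting wins by accident.
    return explicit or from_env or Compression.NoCompression


def resolve_compression(
    explicit: Optional[Compression], from_env: Optional[Compression]
) -> Compression:
    # Compare against None so that 0 counts as a deliberate choice.
    if explicit is not None:
        return explicit
    if from_env is not None:
        return from_env
    return Compression.NoCompression
```

With an explicit `NoCompression` argument and `Gzip` from the environment, the buggy version returns `Gzip` while the `is not None` version returns `NoCompression`.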
Description
This PR fixes issue #4517, where the OTLP gRPC exporter fails to reconnect to the collector after a restart (returning UNAVAILABLE).

Changes:
- … StatusCode.UNAVAILABLE in the export loop.

Fixes #4517
Type of change
How Has This Been Tested?
I added a new regression test case, test_unavailable_reconnects, in exporter/opentelemetry-exporter-otlp-proto-grpc/tests/test_otlp_exporter_mixin.py.
- … StatusCode.UNAVAILABLE.

Does This PR Require a Contrib Repo Change?
Checklist: