
feat(csharp): implement TelemetryClientManager (WI-3.2)#172

Draft
jadewang-db wants to merge 17 commits into main from stack/telemetry-client-manager-wi-3.2

jadewang-db force-pushed the stack/telemetry-client-manager-wi-3.2 branch from 9930723 to 5e956d8 on January 22, 2026 22:24
jadewang-db force-pushed the stack/telemetry-client-manager-wi-3.2 branch from 5e956d8 to 94b6786 on January 23, 2026 21:14
jadewang-db added a commit that referenced this pull request Jan 23, 2026
## 🥞 Stacked PR
Use this [link](https://github.com/adbc-drivers/databricks/pull/161/files) to review incremental changes.
- [**stack/wi-1.2-tag-definition-system**](#161) [[Files changed](https://github.com/adbc-drivers/databricks/pull/161/files)]
- [stack/wi-2.1-telemetry-data-models](#162) [[Files changed](https://github.com/adbc-drivers/databricks/pull/162/files/ab7fa964ff62f3fc9884034e17a7e57630fa8037..a566292aec78d19717c92e28f135535b09f25c80)]
- [stack/wi-2.1-exception-classifier](#163) [[Files changed](https://github.com/adbc-drivers/databricks/pull/163/files/a566292aec78d19717c92e28f135535b09f25c80..baa7a2ae32662fddc65272e0264e8bb7d1644716)]
- [stack/wi-3.1-circuit-breaker](#164) [[Files changed](https://github.com/adbc-drivers/databricks/pull/164/files/baa7a2ae32662fddc65272e0264e8bb7d1644716..03f7027e6731efe032c15555afe517ba49de3651)]
- [stack/wi-3.1-feature-flag-cache](#165) [[Files changed](https://github.com/adbc-drivers/databricks/pull/165/files/03f7027e6731efe032c15555afe517ba49de3651..1d6e3d5b1c4c31ec91361337e574e6e5411fbbb6)]
- [stack/wi-3.4-databricks-telemetry-exporter](#166) [[Files changed](https://github.com/adbc-drivers/databricks/pull/166/files/1d6e3d5b1c4c31ec91361337e574e6e5411fbbb6..eb382cb291c120a5f3cc3a1c38e0975b99c1369f)]
- [stack/wi-3.5-metrics-aggregator](#167) [[Files changed](https://github.com/adbc-drivers/databricks/pull/167/files/eb382cb291c120a5f3cc3a1c38e0975b99c1369f..67723fabe6f62d7ed16591c3e88e96aa269daddd)]
- [stack/wi-3.5-circuit-breaker-manager](#168) [[Files changed](https://github.com/adbc-drivers/databricks/pull/168/files/67723fabe6f62d7ed16591c3e88e96aa269daddd..6b66d37e9d97ca621d88c48a58ac60b2487425ea)]
- [stack/e2e-feature-flag-cache-tests](#169) [[Files changed](https://github.com/adbc-drivers/databricks/pull/169/files/6b66d37e9d97ca621d88c48a58ac60b2487425ea..2a6fff2b9b91c7fd6cff7558d1d3b3596c0fa3c2)]
- [stack/databricks-activity-listener](#170) [[Files changed](https://github.com/adbc-drivers/databricks/pull/170/files/2a6fff2b9b91c7fd6cff7558d1d3b3596c0fa3c2..39f6aed55278a533390e9aadf655f80dc11159c2)]
- [stack/circuit-breaker-telemetry-exporter](#171) [[Files changed](https://github.com/adbc-drivers/databricks/pull/171/files/39f6aed55278a533390e9aadf655f80dc11159c2..4473de5ca3cfca8579818e6d58f8a2b12e869a47)]
- [stack/telemetry-client-manager-wi-3.2](#172) [[Files changed](https://github.com/adbc-drivers/databricks/pull/172/files/4473de5ca3cfca8579818e6d58f8a2b12e869a47..94b678636d76a6d41a6612f76d00b4caccdab48a)]
- [stack/telemetry-client-wi-5.5](#173) [[Files changed](https://github.com/adbc-drivers/databricks/pull/173/files/94b678636d76a6d41a6612f76d00b4caccdab48a..ce00998cbd0372d94303ad1d69e9711e4489fe96)]
- [stack/telemetry-client-manager-e2e-wi-7](#174) [[Files changed](https://github.com/adbc-drivers/databricks/pull/174/files/ce00998cbd0372d94303ad1d69e9711e4489fe96..2646e86223ff1e7706b20d5970e556ec2f17867b)]
- [stack/telemetry-client-e2e-tests-wi-7-standalone](#175) [[Files changed](https://github.com/adbc-drivers/databricks/pull/175/files/2646e86223ff1e7706b20d5970e556ec2f17867b..0b9ebd3867250d92d0d8007cb17d6ce471d5560a)]
- [stack/wi-6.1-databricks-connection-telemetry-integration](#176) [[Files changed](https://github.com/adbc-drivers/databricks/pull/176/files/0b9ebd3867250d92d0d8007cb17d6ce471d5560a..4f553284c30eb7efcf67369c58dddd56675cd0be)]
- [stack/wi-6.2-telemetry-tags-driver-activities](#177) [[Files changed](https://github.com/adbc-drivers/databricks/pull/177/files/4f553284c30eb7efcf67369c58dddd56675cd0be..1f7cde0c5642072b06588665b16ee3a30a90d256)]
- [stack/wi-9-full-integration-e2e-tests](#178) [[Files changed](https://github.com/adbc-drivers/databricks/pull/178/files/1f7cde0c5642072b06588665b16ee3a30a90d256..c65e9fea7c65fa456f0114e95c867ee15f21bd87)]

---------

---------

Co-authored-by: Jade Wang <jade.wang+data@databricks.com>
Co-authored-by: Claude <noreply@anthropic.com>
jadewang-db force-pushed the stack/telemetry-client-manager-wi-3.2 branch from 94b6786 to 75039c6 on January 23, 2026 21:43
jadewang-db added a commit that referenced this pull request Jan 23, 2026
## 🥞 Stacked PR
Use this [link](https://github.com/adbc-drivers/databricks/pull/162/files) to review incremental changes.
- [**stack/wi-2.1-telemetry-data-models**](#162) [[Files changed](https://github.com/adbc-drivers/databricks/pull/162/files)]
- [stack/wi-2.1-exception-classifier](#163) [[Files changed](https://github.com/adbc-drivers/databricks/pull/163/files/1e58d3c3785fa7ec1b83da01f80ddea1f6167851..0dac01831e7d9d313c67dc31e4aacceb17e74298)]
- [stack/wi-3.1-circuit-breaker](#164) [[Files changed](https://github.com/adbc-drivers/databricks/pull/164/files/0dac01831e7d9d313c67dc31e4aacceb17e74298..59b0221cb4c9262d80a35041a2f1098376f6e19e)]
- [stack/wi-3.1-feature-flag-cache](#165) [[Files changed](https://github.com/adbc-drivers/databricks/pull/165/files/59b0221cb4c9262d80a35041a2f1098376f6e19e..8c30fc0649b09bc38e09cfd4d6875d66963ff6c0)]
- [stack/wi-3.4-databricks-telemetry-exporter](#166) [[Files changed](https://github.com/adbc-drivers/databricks/pull/166/files/8c30fc0649b09bc38e09cfd4d6875d66963ff6c0..a6e926c8017e9a3b3b6de31bbbafb367adaba884)]
- [stack/wi-3.5-metrics-aggregator](#167) [[Files changed](https://github.com/adbc-drivers/databricks/pull/167/files/a6e926c8017e9a3b3b6de31bbbafb367adaba884..c53df5d3c0124c490b920e1e1a611dd9c24e02a4)]
- [stack/wi-3.5-circuit-breaker-manager](#168) [[Files changed](https://github.com/adbc-drivers/databricks/pull/168/files/c53df5d3c0124c490b920e1e1a611dd9c24e02a4..de8757a697dd023628011d1aff9961896560bc95)]
- [stack/e2e-feature-flag-cache-tests](#169) [[Files changed](https://github.com/adbc-drivers/databricks/pull/169/files/de8757a697dd023628011d1aff9961896560bc95..0b77f8373958342da429c20f7e30c02105402331)]
- [stack/databricks-activity-listener](#170) [[Files changed](https://github.com/adbc-drivers/databricks/pull/170/files/0b77f8373958342da429c20f7e30c02105402331..9090bdefba63d6c7fbff45bf60c2c63668f3884e)]
- [stack/circuit-breaker-telemetry-exporter](#171) [[Files changed](https://github.com/adbc-drivers/databricks/pull/171/files/9090bdefba63d6c7fbff45bf60c2c63668f3884e..0a0159524a429726078bd7340057672d6927d1cd)]
- [stack/telemetry-client-manager-wi-3.2](#172) [[Files changed](https://github.com/adbc-drivers/databricks/pull/172/files/0a0159524a429726078bd7340057672d6927d1cd..75039c6574c2dc437f5d670e71b938b98719c06f)]
- [stack/telemetry-client-wi-5.5](#173) [[Files changed](https://github.com/adbc-drivers/databricks/pull/173/files/75039c6574c2dc437f5d670e71b938b98719c06f..254cdc75487f3e9344d3df6fb9b9cbf49fd03228)]
- [stack/telemetry-client-manager-e2e-wi-7](#174) [[Files changed](https://github.com/adbc-drivers/databricks/pull/174/files/254cdc75487f3e9344d3df6fb9b9cbf49fd03228..7371da59309d109e8d457f4c27edd13adfa38a2c)]
- [stack/telemetry-client-e2e-tests-wi-7-standalone](#175) [[Files changed](https://github.com/adbc-drivers/databricks/pull/175/files/7371da59309d109e8d457f4c27edd13adfa38a2c..5ff7e96827faa69e8bae1d5b5da06a9f95b91a8c)]
- [stack/wi-6.1-databricks-connection-telemetry-integration](#176) [[Files changed](https://github.com/adbc-drivers/databricks/pull/176/files/5ff7e96827faa69e8bae1d5b5da06a9f95b91a8c..7757345889dbfd0b1dcb22556e2e6c746d7fa0f0)]
- [stack/wi-6.2-telemetry-tags-driver-activities](#177) [[Files changed](https://github.com/adbc-drivers/databricks/pull/177/files/7757345889dbfd0b1dcb22556e2e6c746d7fa0f0..2364122ad5402c9205008f39acaec6a400a4db98)]
- [stack/wi-9-full-integration-e2e-tests](#178) [[Files changed](https://github.com/adbc-drivers/databricks/pull/178/files/2364122ad5402c9205008f39acaec6a400a4db98..698f3ea13f65a17b62385be8e8e4032497f88993)]

---------

---------

Co-authored-by: Jade Wang <jade.wang+data@databricks.com>
Co-authored-by: Claude <noreply@anthropic.com>
jadewang-db force-pushed the stack/telemetry-client-manager-wi-3.2 branch from 75039c6 to 4e3aeb0 on January 26, 2026 22:43
jadewang-db force-pushed the stack/telemetry-client-manager-wi-3.2 branch from 4e3aeb0 to f47d08c on January 26, 2026 23:01
Jade Wang and others added 9 commits February 3, 2026 18:10
Implements per-host feature flag caching with reference counting to avoid
repeated API calls and rate limiting. Key features:

- FeatureFlagContext: Holds cached telemetry enabled state, last fetched
  timestamp, reference count, and configurable cache duration (default 15 min)
- FeatureFlagCache: Singleton managing per-host contexts with thread-safe
  ConcurrentDictionary storage

API:
- GetInstance(): Returns the singleton instance
- GetOrCreateContext(host): Creates/returns context and increments RefCount
- ReleaseContext(host): Decrements RefCount, removes context when zero
- IsTelemetryEnabledAsync(): Returns cached value if valid, otherwise fetches

Thread safety ensured via ConcurrentDictionary and Interlocked operations.
Includes 46 comprehensive unit tests covering all exit criteria.

Co-Authored-By: Claude <noreply@anthropic.com>
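The "returns cached value if valid, otherwise fetches" rule for IsTelemetryEnabledAsync() can be illustrated with a small TTL check. This is a minimal sketch, not the PR's FeatureFlagContext: `TtlFlagContextSketch`, the injected `fetch` delegate, and the injectable clock are hypothetical; only the 15-minute default comes from the commit message.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of a cached "telemetry enabled" flag with a TTL.
// Not the PR's FeatureFlagContext: the fetch delegate and clock are assumptions.
public sealed class TtlFlagContextSketch
{
    private readonly Func<CancellationToken, Task<bool>> _fetch;
    private readonly Func<DateTimeOffset> _clock;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private bool _cachedValue;
    private DateTimeOffset? _lastFetched;

    public TtlFlagContextSketch(
        Func<CancellationToken, Task<bool>> fetch,
        TimeSpan? cacheDuration = null,
        Func<DateTimeOffset>? clock = null)
    {
        _fetch = fetch;
        CacheDuration = cacheDuration ?? TimeSpan.FromMinutes(15); // default named in the commit message
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    public TimeSpan CacheDuration { get; }

    public async Task<bool> IsTelemetryEnabledAsync(CancellationToken ct = default)
    {
        // Serialize refreshes so concurrent callers trigger at most one fetch.
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            if (_lastFetched is DateTimeOffset fetched && _clock() - fetched < CacheDuration)
            {
                return _cachedValue; // still fresh: no API call
            }

            // Expired or never fetched. Fail-safe error handling is omitted in this sketch.
            _cachedValue = await _fetch(ct).ConfigureAwait(false);
            _lastFetched = _clock();
            return _cachedValue;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Holding a semaphore across the fetch means a burst of connections to the same host shares one API call, which is the rate-limiting concern the commit describes.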
…(WI-3.1)

Refactored FeatureFlagCache based on updated design doc requirements:

- Moved from Telemetry namespace to root namespace (AdbcDrivers.Databricks)
  to make it a generic, reusable component
- Added HTTP API integration to fetch flags from
  /api/2.0/connector-service/feature-flags/OSS_JDBC/{version}
- Implemented background refresh scheduler with server-provided TTL
- Added FeatureFlagsResponse model for API response parsing
- Updated FeatureFlagContext interface:
  - GetFlagValue(string) - get individual flag value
  - GetAllFlags() - get all cached flags as dictionary
  - IsFeatureEnabled(string) - check if flag is "true"
  - Shutdown() - stop background refresh scheduler
  - IDisposable for proper cleanup
- Updated FeatureFlagCache.GetOrCreateContext() to accept HttpClient
  and driver version parameters
- Updated all unit tests for new interface

Co-Authored-By: Claude <noreply@anthropic.com>
…I-3.1)

Integrated feature flag cache into the connection lifecycle:

- Fetch feature flags from server during connection initialization
- Merge flags into Properties dictionary with proper priority:
  User Properties > Feature Flags > Driver Defaults
- Track host for proper context cleanup on Dispose
- Release feature flag context when connection is disposed
- All feature flag operations are fail-safe (errors logged, not thrown)

The feature flag endpoint used is:
GET /api/2.0/connector-service/feature-flags/OSS_JDBC/{driver_version}

Co-Authored-By: Claude <noreply@anthropic.com>
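The precedence rule above (User Properties > Feature Flags > Driver Defaults) amounts to applying the three sources from lowest to highest priority so that later writes win. A minimal sketch, assuming plain string dictionaries and case-insensitive keys; `PropertyMergeSketch.Merge` is hypothetical and is not the PR's MergePropertiesWithFeatureFlags.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the documented precedence:
// User Properties > Feature Flags > Driver Defaults.
public static class PropertyMergeSketch
{
    public static Dictionary<string, string> Merge(
        IReadOnlyDictionary<string, string> driverDefaults,
        IReadOnlyDictionary<string, string> featureFlags,
        IReadOnlyDictionary<string, string> userProperties)
    {
        // Case-insensitive keys are an assumption of this sketch.
        var merged = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Apply sources from lowest to highest priority; later assignments overwrite earlier ones.
        foreach (var source in new[] { driverDefaults, featureFlags, userProperties })
        {
            foreach (var kv in source)
            {
                merged[kv.Key] = kv.Value;
            }
        }
        return merged;
    }
}
```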
…-3.1)

- Add EnsureSuccessStatusCode pattern for HTTP response handling
- Extract common HTTP fetch code into single FetchFeatureFlags method
- Make feature flag endpoint configurable via optional parameter
- Replace Debug.WriteLine with Activity trace pattern
- Add E2E tests for FeatureFlagCache using real Databricks instance

Co-Authored-By: Claude (databricks-claude-opus-4-5) <noreply@anthropic.com>
…WI-3.1)

- Move MergePropertiesWithFeatureFlags, TryGetHost, CreateFeatureFlagHttpClient,
  and MergeProperties helper methods from DatabricksConnection to FeatureFlagCache
- Replace Debug.WriteLine with ActivitySource tracing for structured events
- DatabricksConnection now delegates to FeatureFlagCache.GetInstance().MergePropertiesWithFeatureFlags()

Co-Authored-By: Claude (databricks-claude-opus-4-5) <noreply@anthropic.com>
…ructor (WI-3.1)

Replace hardcoded "1.0.0" with ApacheUtility.GetAssemblyVersion() to use the
actual driver version in the test constructor.

Co-Authored-By: Claude (databricks-claude-opus-4-5) <noreply@anthropic.com>
…-3.1)

- Add proxy support using HiveServer2ProxyConfigurator
- Handle protocol prefix in host (e.g., "https://myhost.databricks.com")
- Add configurable timeout via FeatureFlagTimeoutSeconds parameter
- Use consistent User-Agent format: DatabricksJDBCDriverOSS/{version} (ADBC)
- Rename variables to localProperties/remoteProperties for clarity
- Remove IsFeatureEnabled method from FeatureFlagContext
- Use EnsureSuccessOrThrow extension method for HTTP error handling
- Enhance E2E tests to verify flags fetched and cache cleanup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
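Handling the optional protocol prefix and building the feature flag endpoint and User-Agent could look like the following. The endpoint path and the `DatabricksJDBCDriverOSS/{version} (ADBC)` format are taken from the commit messages above; the helper names, the `https://` default, and escaping the version segment are assumptions of this sketch.

```csharp
using System;

// Hypothetical helpers; only the path and User-Agent format come from the commits.
public static class FeatureFlagEndpointSketch
{
    public const string PathPrefix = "/api/2.0/connector-service/feature-flags/OSS_JDBC/";

    public static Uri BuildUri(string host, string driverVersion)
    {
        if (string.IsNullOrWhiteSpace(host))
        {
            throw new ArgumentException("Host is required.", nameof(host));
        }

        // Accept both "myhost.databricks.com" and "https://myhost.databricks.com".
        string trimmed = host.Trim().TrimEnd('/');
        string withScheme = trimmed.Contains("://") ? trimmed : "https://" + trimmed;
        var baseUri = new Uri(withScheme, UriKind.Absolute);

        // A rooted relative path replaces any path that was present on the host string.
        return new Uri(baseUri, PathPrefix + Uri.EscapeDataString(driverVersion));
    }

    public static string BuildUserAgent(string driverVersion) =>
        $"DatabricksJDBCDriverOSS/{driverVersion} (ADBC)";
}
```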
Add support for OAuth client_credentials (M2M) authentication in addition
to token-based (PAT) auth for feature flag API calls. This ensures feature
flags work with all supported authentication methods.

- Add AuthHelper class with shared token extraction methods
- Update FeatureFlagCache to use AuthHelper.GetAccessToken
- Update HttpHandlerFactory to use AuthHelper.GetTokenFromProperties

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jadewang-db force-pushed the stack/telemetry-client-manager-wi-3.2 branch from 5554e76 to 5b4756a on February 3, 2026 18:24
Jade Wang and others added 8 commits February 3, 2026 18:31
Move the test factory method from production to test code:
- Make FeatureFlagContext constructor internal instead of private
- Make Ttl setter internal to allow tests to configure TTL
- Remove CreateForTesting from FeatureFlagContext.cs
- Add CreateTestContext helper method in FeatureFlagCacheTests.cs

This addresses the PR review feedback that test-only code should
not be in production source files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement the HTTP exporter that sends telemetry events to Databricks service.

Key features:
- ITelemetryExporter interface with ExportAsync method
- Creates TelemetryRequest wrapper with uploadTime and protoLogs
- Uses /telemetry-ext for authenticated requests
- Uses /telemetry-unauth for unauthenticated requests
- Implements retry logic for transient failures
- Uses ExceptionClassifier for terminal vs retryable errors
- Never throws exceptions (all swallowed and logged at TRACE level)
- Cancellation is propagated (not swallowed)

Files added:
- src/Telemetry/ITelemetryExporter.cs
- src/Telemetry/DatabricksTelemetryExporter.cs
- test/Unit/Telemetry/DatabricksTelemetryExporterTests.cs

Co-Authored-By: Claude <noreply@anthropic.com>
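The exporter's error policy (retry transient failures, stop on terminal ones, swallow everything except cancellation) can be sketched as a small helper. This is an illustration under assumptions: `isTerminal` stands in for ExceptionClassifier, and `maxAttempts` and the exponential backoff are made-up defaults, not values from the PR.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: returns true if the send succeeded, false if it gave up.
public static class RetryingExportSketch
{
    public static async Task<bool> TryExportAsync(
        Func<CancellationToken, Task> send,
        Func<Exception, bool> isTerminal,
        CancellationToken ct,
        int maxAttempts = 3,
        TimeSpan? baseDelay = null)
    {
        TimeSpan delay = baseDelay ?? TimeSpan.FromMilliseconds(100);
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await send(ct).ConfigureAwait(false);
                return true;
            }
            catch (OperationCanceledException) when (ct.IsCancellationRequested)
            {
                throw; // cancellation is propagated, not swallowed
            }
            catch (Exception ex)
            {
                // Terminal errors, or the final attempt, end the loop; the error is swallowed
                // (logging omitted in this sketch).
                if (isTerminal(ex) || attempt == maxAttempts)
                {
                    return false;
                }
            }

            // Exponential backoff: delay, 2 * delay, 4 * delay, ...
            await Task.Delay(TimeSpan.FromTicks(delay.Ticks * (1L << (attempt - 1))), ct).ConfigureAwait(false);
        }
        return false;
    }
}
```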
Implement MetricsAggregator that aggregates Activity data by statement_id
and handles exception buffering with terminal vs retryable classification.

Key features:
- ProcessActivity extracts tags and aggregates by statement_id using
  ConcurrentDictionary<string, StatementTelemetryContext>
- CompleteStatement emits aggregated TelemetryEvent
- RecordException flushes terminal exceptions immediately
- RecordException buffers retryable exceptions until CompleteStatement
- FlushAsync exports when batch size or time interval reached
- Uses TelemetryTagRegistry to filter tags
- Creates TelemetryFrontendLog wrapper with workspace_id
- All exceptions swallowed and logged at TRACE level

Implementation details:
- Connection events emit immediately (no aggregation needed)
- Statement events aggregate until CompleteStatement is called
- Timer-based periodic flush using System.Threading.Timer
- Thread-safe aggregation using ConcurrentDictionary
- Nested StatementTelemetryContext holds aggregated metrics and
  buffered exceptions per statement

Test coverage:
- 29 unit tests covering all exit criteria
- Tests for exception handling, tag filtering, frontend log wrapping
- End-to-end statement lifecycle tests

Co-Authored-By: Claude <noreply@anthropic.com>
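The aggregation and exception-buffering rules above can be reduced to a small per-statement sketch. `StatementAggregatorSketch` and `StatementSummary` are hypothetical; the real TelemetryEvent, tag filtering, batching, and timer-based flush are left out, and latency is used as a stand-in metric.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical aggregated event; the real TelemetryEvent has a different, richer shape.
public sealed record StatementSummary(
    string StatementId, long TotalLatencyMs, int ActivityCount, IReadOnlyList<string> Errors);

public sealed class StatementAggregatorSketch
{
    private sealed class Context
    {
        public long LatencyMs;
        public int Activities;
        public readonly List<string> BufferedErrors = new List<string>();
    }

    private readonly ConcurrentDictionary<string, Context> _statements =
        new ConcurrentDictionary<string, Context>();
    private readonly Action<StatementSummary> _emit;

    public StatementAggregatorSketch(Action<StatementSummary> emit) => _emit = emit;

    // Fold one finished activity's metrics into its statement's context.
    public void ProcessActivity(string statementId, long latencyMs)
    {
        var ctx = _statements.GetOrAdd(statementId, _ => new Context());
        lock (ctx)
        {
            ctx.LatencyMs += latencyMs;
            ctx.Activities++;
        }
    }

    public void RecordException(string statementId, string error, bool isTerminal)
    {
        if (isTerminal)
        {
            // Terminal: emit right away instead of waiting for CompleteStatement.
            _emit(new StatementSummary(statementId, 0, 0, new[] { error }));
            return;
        }

        // Retryable: buffer until CompleteStatement emits the aggregated event.
        var ctx = _statements.GetOrAdd(statementId, _ => new Context());
        lock (ctx) { ctx.BufferedErrors.Add(error); }
    }

    public void CompleteStatement(string statementId)
    {
        if (_statements.TryRemove(statementId, out var ctx))
        {
            StatementSummary summary;
            lock (ctx)
            {
                summary = new StatementSummary(
                    statementId, ctx.LatencyMs, ctx.Activities, ctx.BufferedErrors.ToArray());
            }
            _emit(summary);
        }
    }
}
```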
Implement CircuitBreakerManager as a singleton that manages circuit
breakers per host. Each host gets its own circuit breaker instance
for isolation, preventing one failing endpoint from affecting others.

Key features:
- Singleton pattern with GetInstance() method
- Per-host circuit breaker isolation using ConcurrentDictionary
- Thread-safe concurrent access
- Case-insensitive host matching
- Support for both default and custom configurations

This follows the JDBC driver pattern in CircuitBreakerManager.java.

Co-Authored-By: Claude <noreply@anthropic.com>
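Per-host isolation with case-insensitive matching maps naturally onto a `ConcurrentDictionary` keyed with `StringComparer.OrdinalIgnoreCase`. The sketch below is not the PR's CircuitBreaker: the breaker policy (open after 5 consecutive failures, retry after 30 seconds) and all type names are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical minimal breaker: opens after N consecutive failures and lets calls
// through again once a cooldown has elapsed (a simplified half-open state).
public sealed class SimpleCircuitBreaker
{
    private readonly object _lock = new object();
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTimeOffset> _clock;
    private int _consecutiveFailures;
    private DateTimeOffset? _openedAt;

    public SimpleCircuitBreaker(int failureThreshold = 5, TimeSpan? openDuration = null, Func<DateTimeOffset>? clock = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration ?? TimeSpan.FromSeconds(30);
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    public bool IsOpen
    {
        get
        {
            lock (_lock)
            {
                return _openedAt is DateTimeOffset opened && _clock() - opened < _openDuration;
            }
        }
    }

    public void RecordSuccess()
    {
        lock (_lock) { _consecutiveFailures = 0; _openedAt = null; }
    }

    public void RecordFailure()
    {
        lock (_lock)
        {
            // Past the threshold, any further failure (e.g. a trial call after cooldown) re-opens the circuit.
            if (++_consecutiveFailures >= _failureThreshold) { _openedAt = _clock(); }
        }
    }
}

// Hypothetical singleton mirroring the GetInstance()/per-host shape described above.
public sealed class CircuitBreakerManagerSketch
{
    private static readonly Lazy<CircuitBreakerManagerSketch> s_instance =
        new Lazy<CircuitBreakerManagerSketch>(() => new CircuitBreakerManagerSketch());

    // OrdinalIgnoreCase gives the case-insensitive host matching listed in the commit.
    private readonly ConcurrentDictionary<string, SimpleCircuitBreaker> _breakers =
        new ConcurrentDictionary<string, SimpleCircuitBreaker>(StringComparer.OrdinalIgnoreCase);

    private CircuitBreakerManagerSketch() { }

    public static CircuitBreakerManagerSketch GetInstance() => s_instance.Value;

    public SimpleCircuitBreaker GetCircuitBreaker(string host) =>
        _breakers.GetOrAdd(host, _ => new SimpleCircuitBreaker());

    // A custom configuration only takes effect if the host has no breaker yet.
    public SimpleCircuitBreaker GetCircuitBreaker(string host, int failureThreshold, TimeSpan openDuration) =>
        _breakers.GetOrAdd(host, _ => new SimpleCircuitBreaker(failureThreshold, openDuration));
}
```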
Add comprehensive E2E tests for feature flag fetching from real Databricks
endpoints and validate caching and reference counting behavior:

- FeatureFlagCache_FetchFromRealEndpoint_ReturnsBoolean: Tests real endpoint
- FeatureFlagCache_CachesValue_DoesNotRefetchWithinTTL: Validates caching
- FeatureFlagCache_InvalidHost_ReturnsDefaultFalse: Tests error handling
- FeatureFlagCache_RefCountingWorks_CleanupAfterRelease: Tests ref counting

Additional tests cover:
- Cache expiry and refetch behavior
- Null/empty host handling
- Unknown host behavior
- Multiple hosts with independent ref counts
- Concurrent reference counting thread safety
- False value caching
- Cancellation propagation

Co-Authored-By: Claude <noreply@anthropic.com>
Add DatabricksActivityListener that listens to 'Databricks.Adbc.Driver'
ActivitySource, extracts metrics from activities, and delegates to
MetricsAggregator. This implements Phase 5 of the telemetry design.

Key features:
- ShouldListenTo returns true for 'Databricks.Adbc.Driver' source
- Sample callback respects feature flag (AllDataAndRecorded when enabled,
  None when disabled)
- ActivityStopped callback delegates to MetricsAggregator.ProcessActivity
- All callbacks wrapped in try-catch with TRACE logging
- StopAsync flushes pending metrics via MetricsAggregator.FlushAsync
- Supports dynamic feature flag checking via optional Func<bool>

Co-Authored-By: Claude <noreply@anthropic.com>
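The listener behavior above maps directly onto .NET's `System.Diagnostics.ActivityListener` (`ShouldListenTo`, `Sample`, `ActivityStopped`). The sketch uses only that public API; `DriverActivityListenerSketch` is a hypothetical name, and the `onStopped` callback stands in for MetricsAggregator.ProcessActivity.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch; only the source name comes from the commit message.
public static class DriverActivityListenerSketch
{
    public const string SourceName = "Databricks.Adbc.Driver";

    public static ActivityListener Create(Func<bool> isTelemetryEnabled, Action<Activity> onStopped)
    {
        var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == SourceName,

            // Record everything while the flag is on; with None, StartActivity returns null.
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                SafeIsEnabled(isTelemetryEnabled)
                    ? ActivitySamplingResult.AllDataAndRecorded
                    : ActivitySamplingResult.None,

            ActivityStopped = activity =>
            {
                try { onStopped(activity); }
                catch (Exception ex) { Trace.WriteLine($"Telemetry listener error: {ex.Message}"); } // never throw into the driver
            },
        };
        ActivitySource.AddActivityListener(listener);
        return listener; // dispose it to unregister
    }

    private static bool SafeIsEnabled(Func<bool> check)
    {
        try { return check(); }
        catch { return false; } // treat a failing flag check as "disabled"
    }
}
```

Because `Sample` is evaluated on every `StartActivity` call, a `Func<bool>` that reads a cached flag lets telemetry be switched on or off without re-registering the listener.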
Implement wrapper exporter that protects inner telemetry exporter with
circuit breaker pattern.

Key features:
- Wraps ITelemetryExporter with circuit breaker protection
- Uses CircuitBreakerManager.GetCircuitBreaker(host) for per-host isolation
- Exports events when circuit is closed
- Drops events silently when circuit is open (logged at DEBUG level)
- Circuit breaker tracks failures BEFORE exceptions are swallowed

This follows the design in Section 3.3 of the telemetry design document.

Co-Authored-By: Claude <noreply@anthropic.com>
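The key ordering point, recording the failure with the breaker before the exception is swallowed, is easiest to see in a small wrapper. In this sketch the inner delegate stands in for ITelemetryExporter.ExportAsync and must throw on failure; `ConsecutiveFailureBreaker` and `CircuitBreakerExporterSketch` are hypothetical, and the breaker has no cooldown.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical breaker with no cooldown, just enough to show the wrapper's behavior.
public sealed class ConsecutiveFailureBreaker
{
    private readonly int _threshold;
    private int _failures;

    public ConsecutiveFailureBreaker(int threshold) => _threshold = threshold;

    public bool IsOpen => Volatile.Read(ref _failures) >= _threshold;
    public void RecordSuccess() => Interlocked.Exchange(ref _failures, 0);
    public void RecordFailure() => Interlocked.Increment(ref _failures);
}

// Hypothetical wrapper: the inner delegate is expected to THROW on failure
// so that the breaker can observe it.
public sealed class CircuitBreakerExporterSketch
{
    private readonly Func<IReadOnlyList<string>, CancellationToken, Task> _inner;
    private readonly ConsecutiveFailureBreaker _breaker;
    private int _dropped;

    public CircuitBreakerExporterSketch(
        Func<IReadOnlyList<string>, CancellationToken, Task> inner,
        ConsecutiveFailureBreaker breaker)
    {
        _inner = inner;
        _breaker = breaker;
    }

    public int DroppedBatches => Volatile.Read(ref _dropped);

    public async Task ExportAsync(IReadOnlyList<string> events, CancellationToken ct)
    {
        if (_breaker.IsOpen)
        {
            // Circuit open: drop the batch without calling the endpoint (the PR logs this at DEBUG).
            Interlocked.Increment(ref _dropped);
            return;
        }

        try
        {
            await _inner(events, ct).ConfigureAwait(false);
            _breaker.RecordSuccess();
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            throw; // cancellation is not a failure of the endpoint
        }
        catch (Exception)
        {
            // Record the failure first, then swallow it: telemetry must not throw.
            _breaker.RecordFailure();
        }
    }
}
```

If the inner exporter swallowed its own errors, the wrapper would only ever see successes and the circuit could never open, which is why the ordering matters.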
Implement per-host telemetry client management with reference counting
to prevent rate limiting from concurrent connections.

- ITelemetryClient: Interface for telemetry clients with ExportAsync
  and CloseAsync methods
- TelemetryClientHolder: Holds client and reference count with atomic
  operations using Interlocked
- TelemetryClientManager: Singleton factory managing one client per
  host using ConcurrentDictionary for thread-safety
- TelemetryClientAdapter: Adapter bridging ITelemetryExporter to
  ITelemetryClient interface

Key features:
- GetInstance() returns singleton
- GetOrCreateClient() creates/returns client and increments RefCount
- ReleaseClientAsync() decrements RefCount, closes client when zero
- Same host returns same client instance (case-insensitive)
- Thread-safe with ConcurrentDictionary and atomic ref counting
- All exceptions swallowed per telemetry design requirement

Co-Authored-By: Claude <noreply@anthropic.com>
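The GetOrCreateClient/ReleaseClientAsync contract (one client per host, closed when the last reference is released) can be sketched as follows. All names here are hypothetical stand-ins for ITelemetryClient, TelemetryClientHolder, and TelemetryClientManager, and the client factory parameter is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for ITelemetryClient (only the part the manager needs).
public interface ITelemetryClientSketch
{
    Task CloseAsync();
}

// Illustrative client that counts how often it was closed.
public sealed class CountingClientSketch : ITelemetryClientSketch
{
    private int _closeCount;
    public int CloseCount => Volatile.Read(ref _closeCount);

    public Task CloseAsync()
    {
        Interlocked.Increment(ref _closeCount);
        return Task.CompletedTask;
    }
}

// Hypothetical per-host, reference-counted client registry.
public sealed class TelemetryClientManagerSketch
{
    private sealed class Holder
    {
        public Holder(ITelemetryClientSketch client) => Client = client;
        public ITelemetryClientSketch Client { get; }
        public int RefCount; // guarded by _lock
    }

    private static readonly TelemetryClientManagerSketch s_instance = new TelemetryClientManagerSketch();
    private readonly object _lock = new object();
    private readonly Dictionary<string, Holder> _clients =
        new Dictionary<string, Holder>(StringComparer.OrdinalIgnoreCase); // case-insensitive hosts

    private TelemetryClientManagerSketch() { }

    public static TelemetryClientManagerSketch GetInstance() => s_instance;

    public int ActiveHostCount
    {
        get { lock (_lock) { return _clients.Count; } }
    }

    public ITelemetryClientSketch GetOrCreateClient(string host, Func<ITelemetryClientSketch> factory)
    {
        lock (_lock)
        {
            if (!_clients.TryGetValue(host, out var holder))
            {
                holder = new Holder(factory());
                _clients[host] = holder;
            }
            holder.RefCount++;
            return holder.Client;
        }
    }

    public async Task ReleaseClientAsync(string host)
    {
        ITelemetryClientSketch? toClose = null;
        lock (_lock)
        {
            // Decrement and remove under one lock so no caller can pick up a client that is being closed.
            if (_clients.TryGetValue(host, out var holder) && --holder.RefCount == 0)
            {
                _clients.Remove(host);
                toClose = holder.Client;
            }
        }

        if (toClose != null)
        {
            try
            {
                await toClose.CloseAsync().ConfigureAwait(false); // close outside the lock
            }
            catch (Exception)
            {
                // Swallowed per the telemetry design; logging omitted in this sketch.
            }
        }
    }
}
```

This sketch uses a single lock rather than ConcurrentDictionary plus Interlocked. With two independent atomic operations, one thread can drop the count to zero and remove the entry while another thread has already fetched the same holder and is about to increment it, which would hand out a client that is being closed; an implementation built on ConcurrentDictionary needs to guard that window explicitly.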
jadewang-db force-pushed the stack/telemetry-client-manager-wi-3.2 branch from 5b4756a to 249f1db on February 3, 2026 18:32