Skip to content

Conversation

@samikshya-db
Copy link
Contributor

@samikshya-db samikshya-db commented Dec 1, 2025

There were 2 gaps in telemetry python before:

  1. Per host batching was not present, which caused our pre-release steps to fail [Link]
  2. log levels could be misleading : all telemetry log levels should be as low as possible.

Key changes:

  • Add _TelemetryClientHolder with reference counting for shared clients
  • Change TelemetryClientFactory to key clients by host_url instead of session_id
  • Add getHostUrlSafely() helper for defensive null handling
  • Update all callers (client.py, exc.py, latency_logger.py) to pass host_url

Before: 100 connections to same host = 100 separate TelemetryClients
After: 100 connections to same host = 1 shared TelemetryClient (refcount=100)

This fixes rate limiting issues seen in e2e tests where 300+ parallel connections were overwhelming the telemetry endpoint with 429 errors.

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • Other

Description

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually
  • N/A

Related Tickets & Documents

Changes telemetry client architecture from per-session to per-host batching,
matching the JDBC driver implementation. This reduces the number of HTTP
requests to the telemetry endpoint and prevents rate limiting in test
environments.

Key changes:
- Add _TelemetryClientHolder with reference counting for shared clients
- Change TelemetryClientFactory to key clients by host_url instead of session_id
- Add getHostUrlSafely() helper for defensive null handling
- Update all callers (client.py, exc.py, latency_logger.py) to pass host_url

Before: 100 connections to same host = 100 separate TelemetryClients
After:  100 connections to same host = 1 shared TelemetryClient (refcount=100)

This fixes rate limiting issues seen in e2e tests where 300+ parallel
connections were overwhelming the telemetry endpoint with 429 errors.
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Reduces log noise by changing all telemetry-related log statements
(info, warning, error) to debug level. Telemetry operations are
background tasks and should not clutter logs with operational messages.

Changes:
- Circuit breaker state changes: info/warning -> debug
- Telemetry send failures: error -> debug
- All telemetry operations now consistently use debug level
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Changes remaining logger.warning in telemetry_push_client.py to debug level
for consistency with other telemetry logging.
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

- Update circuit breaker test to check logger.debug instead of logger.info
- Replace all session_id_hex test parameters with host_url
- Apply Black formatting to exc.py and telemetry_client.py

This fixes test failures caused by the signature change from session_id_hex
to host_url in the Error class and TelemetryClientFactory.
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Only Error classes changed from session_id_hex to host_url.
Other classes (TelemetryClient, ResultSetDownloadHandler, etc.) still use session_id_hex.

Reverted:
- test_telemetry.py: TelemetryClient and initialize_telemetry_client
- test_downloader.py: ResultSetDownloadHandler
- test_download_manager.py: ResultFileDownloadManager

Kept as host_url:
- test_client.py: Error class instantiation
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Changes:
1. client.py: Changed all error raises from session_id_hex to host_url
   - Connection class: session_id_hex=self.get_session_id_hex() -> host_url=self.session.host
   - Cursor class: session_id_hex=self.connection.get_session_id_hex() -> host_url=self.connection.session.host

2. test_telemetry.py: Updated get_telemetry_client() and close() calls
   - get_telemetry_client(session_id) -> get_telemetry_client(host_url)
   - close(session_id) -> close(host_url=host_url)

3. test_telemetry_push_client.py: Changed logger.warning to logger.debug
   - Updated test assertion to match debug logging level

These changes complete the migration from session-level to host-level
telemetry client management.
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Changes:
1. Added self._host attribute to store server_hostname
2. Updated all error raises to use host_url=self._host
3. Changed method signatures from session_id_hex to host_url:
   - _check_response_for_error
   - _hive_schema_to_arrow_schema
   - _col_to_description
   - _hive_schema_to_description
   - _check_direct_results_for_error
4. Updated all method calls to pass self._host instead of self._session_id_hex

This completes the migration from session-level to host-level error reporting.
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Moved the `# fmt: on` directive to the except block level instead
of inside the if statement to resolve Black parsing confusion.
@github-actions
Copy link

github-actions bot commented Dec 2, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@github-actions
Copy link

github-actions bot commented Dec 2, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@github-actions
Copy link

github-actions bot commented Dec 2, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@github-actions
Copy link

github-actions bot commented Dec 2, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@github-actions
Copy link

github-actions bot commented Dec 2, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@github-actions
Copy link

github-actions bot commented Dec 2, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link
Contributor

@nikhilsuri-db nikhilsuri-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

  1. please manually run Daily Telemetry E2E Tests
  2. Can you also fix [daily-telemetry-e2e.yml ] to run daily not evey sunday [ seems gap there ]

@samikshya-db
Copy link
Contributor Author

Thanks for your review @nikhilsuri-db

  1. Ran it now, https://github.com/databricks/databricks-sql-python/actions/runs/19883810489 : the run passes.
  2. I'll fix it in a later PR if that is ok?

@samikshya-db samikshya-db marked this pull request as ready for review December 3, 2025 07:03
@samikshya-db samikshya-db merged commit ebe4b07 into main Dec 3, 2025
29 of 32 checks passed
@samikshya-db samikshya-db deleted the fix-telemetry-host-level-batching branch December 3, 2025 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants