
Add adaptive metadata upload batch sizing #659

Open
guysmoilov wants to merge 11 commits into main from feature/adaptive-upload

Conversation

@guysmoilov
Member

Summary

  • make metadata uploads use adaptive batch sizing instead of a fixed batch size
  • add new config knobs for minimum/initial batch size and target per-batch upload time
  • shrink batch size on failures and grow it again after successful/fast uploads
  • add datasource tests for growth and retry-with-smaller-batch behavior

Testing

  • .venv/bin/pytest tests/data_engine/test_datasource.py -q

@coderabbitai

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

Adds an adaptive batching system for metadata uploads: new config keys for min/initial/max batch sizes and target batch time, a new upload_batching module with growth/backoff heuristics, and updated upload loop and tests to use dynamic batch sizing with retry/backoff behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration: `dagshub/common/config.py` | Adds environment keys and runtime values for adaptive batching: DATAENGINE_METADATA_UPLOAD_BATCH_SIZE_MAX_KEY, DATAENGINE_METADATA_UPLOAD_BATCH_SIZE_MIN_KEY, DATAENGINE_METADATA_UPLOAD_BATCH_SIZE_INITIAL_KEY, DATAENGINE_METADATA_UPLOAD_TARGET_BATCH_TIME_SECONDS_KEY (with a backward-compatible alias), and corresponding public variables. |
| Adaptive batching module: `dagshub/data_engine/model/metadata/upload_batching.py` | New module defining AdaptiveUploadBatchConfig, normalization (from_values), growth/shrink strategies (next_batch_after_success, next_batch_after_retryable_failure), a retry delay strategy, and is_retryable_metadata_upload_error. |
| Datasource upload logic: `dagshub/data_engine/model/datasource.py` | Replaces the fixed-size upload loop with adaptive batching using the new config and helpers; adds per-batch timing, retry/backoff handling, last_good/last_bad tracking, and retry classification imports. |
| Tests: `tests/data_engine/test_datasource.py` | Adds helpers and multiple tests validating adaptive growth, shrinking on slow success, retry behavior on retryable errors, backoff/reset behavior, and interactions with configured min/initial/max values. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Uploader as Upload Loop
    participant API as Upload API
    participant Evaluator as Outcome Evaluator
    participant Calculator as Batch Size Calculator

    Uploader->>Calculator: request current_batch_size
    Calculator-->>Uploader: return current_batch_size

    Uploader->>API: upload_metadata(batch)
    API-->>Uploader: success or error

    alt success
        Uploader->>Evaluator: report elapsed_time
        alt elapsed_time < target_batch_time
            Evaluator-->>Calculator: mark_good_batch
            Calculator-->>Uploader: increase_batch_size
        else elapsed_time >= target_batch_time
            Evaluator-->>Calculator: mark_bad_batch
            Calculator-->>Uploader: decrease_batch_size
        end
        Uploader->>Uploader: continue with next batch
    else error
        Uploader->>Evaluator: classify_error
        alt error retryable and batch_size > min
            Evaluator-->>Calculator: request_smaller_batch
            Calculator-->>Uploader: provide_smaller_batch_size
            Uploader->>API: retry_with_smaller_batch (with backoff)
        else
            Uploader-->>Uploader: abort and surface error
        end
    end
```
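The control flow in the diagram condenses to a short loop. The sketch below is an assumption-laden paraphrase (plain config dict, a stub `is_retryable` classifier, no backoff sleep), not the actual code in `datasource.py`:

```python
import time


def is_retryable(exc: Exception) -> bool:
    # Stand-in classifier; the real module inspects transport errors.
    return isinstance(exc, ConnectionError)


def upload_all(entries, upload_fn, cfg):
    """Upload entries in adaptively sized batches, per the diagram above."""
    batch_size = cfg["initial"]
    i = 0
    while i < len(entries):
        batch = entries[i:i + batch_size]
        start = time.monotonic()
        try:
            upload_fn(batch)
        except Exception as exc:
            if is_retryable(exc) and batch_size > cfg["min"]:
                # Shrink and retry the same entries (backoff omitted here).
                batch_size = max(batch_size // 2, cfg["min"])
                continue
            raise  # non-retryable, or already at minimum: surface the error
        elapsed = time.monotonic() - start
        if elapsed < cfg["target_time"]:
            batch_size = min(batch_size * 2, cfg["max"])  # fast: grow
        else:
            batch_size = max(batch_size // 2, cfg["min"])  # slow: shrink
        i += len(batch)
```

For example, with `initial=2`, `max=8`, and a generous target time, ten entries go out as batches of 2, 4, and 4; a transient `ConnectionError` on the first batch instead triggers a retry at half the size.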

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I start my batches meek and small,
I time each hop and heed each fall.
When hops are swift I stretch my stride,
A stumble makes me step aside.
Metadata hops, adaptive and tall.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: introducing adaptive batch sizing for metadata uploads instead of fixed sizing. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing clear details about adaptive batch sizing implementation, configuration changes, and testing approach. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/data_engine/test_datasource.py (1)

150-188: Add one test for the slow-success downshift path.

These tests validate grow-on-success and shrink-on-exception, but they do not cover the elapsed > target_batch_time branch in dagshub/data_engine/model/datasource.py (Lines 850-855). A dedicated test here would protect the time-based adaptation behavior from regressions.


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65e9f6a and 84632f2.

📒 Files selected for processing (3)
  • dagshub/common/config.py
  • dagshub/data_engine/model/datasource.py
  • tests/data_engine/test_datasource.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.13)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.9)
🧰 Additional context used
🧬 Code graph analysis (3)
dagshub/common/config.py (2)
dagshub/data_engine/model/datapoint.py (1)
  • get (325-334)
dagshub/data_engine/client/loaders/base.py (1)
  • get (102-124)
tests/data_engine/test_datasource.py (1)
dagshub/data_engine/model/datasource.py (3)
  • source (180-181)
  • update_metadata (1822-1857)
  • _upload_metadata (751-859)
dagshub/data_engine/model/datasource.py (4)
dagshub/data_engine/model/query_result.py (2)
  • entries (110-114)
  • entries (117-119)
dagshub/data_engine/client/data_client.py (1)
  • update_metadata (228-248)
dagshub/data_engine/client/gql_mutations.py (1)
  • update_metadata (32-47)
dagshub/data_engine/voxel_plugin_server/routes/datasource.py (1)
  • update_metadata (32-45)
🪛 Ruff (0.15.2)
tests/data_engine/test_datasource.py

[warning] 180-180: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (3)
dagshub/common/config.py (1)

63-74: Config knobs are integrated cleanly.

The new min/initial/target metadata batch settings are wired consistently for adaptive upload behavior.

tests/data_engine/test_datasource.py (1)

11-24: Test scaffolding looks good.

The direct config import plus _uploaded_batch_sizes helper make the adaptive-batching assertions clear and robust.

dagshub/data_engine/model/datasource.py (1)

758-805: Adaptive sizing setup is well-structured.

The min/initial/max normalization and bounded growth/shrink helpers are clean and easy to reason about.

Contributor

Copilot AI left a comment

Pull request overview

This PR updates Data Engine metadata uploads to use an adaptive batch size strategy (instead of a fixed batch size), with new configuration knobs and tests validating the growth and failure-retry behavior.

Changes:

  • Added adaptive batch sizing logic to Datasource._upload_metadata, growing/shrinking based on success/failure and per-batch elapsed time.
  • Introduced new config/env knobs for min/initial batch size and a target upload time per batch.
  • Added tests covering batch-size growth and retry-with-smaller-batch on failure.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
dagshub/data_engine/model/datasource.py Implements adaptive upload batching with growth/shrink heuristics and time-based targeting.
dagshub/common/config.py Adds config variables/env keys for min/initial batch sizes and target batch time.
tests/data_engine/test_datasource.py Adds tests asserting batch growth and failure retry behavior with reduced batch sizes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/data_engine/test_datasource.py (2)

155-158: Factor repeated config patch setup into a helper/fixture.

The same patch block is repeated in multiple tests, which makes future tuning harder. A small fixture (e.g., set_upload_batch_config(...)) would reduce drift.

Also applies to: 170-174, 195-199, 211-214, 228-231


199-199: Make monotonic mocking less brittle.

Line 199 hardcodes exactly four time.monotonic() values; minor internal instrumentation changes can break this test without behavior changes. Consider a generator/helper that safely serves extra values.
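One way to implement the suggested safer mock is a small callable that keeps serving its last reading once the scripted values run out. This is a hypothetical helper, not something in the test suite:

```python
def monotonic_stub(*values):
    """Return a callable serving the given readings, then repeating the last.

    Unlike `side_effect=[...]`, extra internal time.monotonic() calls
    never exhaust the mock and raise StopIteration.
    """
    readings = list(values)
    state = {"i": 0}

    def _monotonic():
        i = min(state["i"], len(readings) - 1)
        state["i"] += 1
        return readings[i]

    return _monotonic
```

A test could then do `mocker.patch("time.monotonic", monotonic_stub(0.0, 6.0))` and stay green even if the upload loop adds another timing call.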


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ded5007 and 567c830.

📒 Files selected for processing (2)
  • dagshub/data_engine/model/datasource.py
  • tests/data_engine/test_datasource.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.13)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.12)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/data_engine/test_datasource.py (4)
tests/data_engine/conftest.py (1)
  • ds (18-19)
dagshub/data_engine/client/data_client.py (1)
  • update_metadata (228-248)
dagshub/data_engine/model/metadata/dtypes.py (1)
  • DatapointMetadataUpdateEntry (23-31)
dagshub/data_engine/dtypes.py (1)
  • MetadataFieldType (20-36)
🪛 Ruff (0.15.2)
tests/data_engine/test_datasource.py

[warning] 180-180: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 238-238: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
tests/data_engine/test_datasource.py (1)

150-246: Great adaptive-batching test coverage.

These cases validate the key behavior transitions (grow, shrink-on-failure, slow-success downshift, retryability classification, and partial-batch edge handling) and are well-targeted.

dagshub/data_engine/model/datasource.py (1)

807-821: Retryability classification is a solid improvement.

The explicit non-retryable fail-fast path and narrowed retryable exception handling reduce noisy retries and surface real failures sooner.

Also applies to: 847-850

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/data_engine/test_datasource.py (1)

223-246: Add a regression test for failure at exact min_batch_size.

Current coverage validates partial-batch-below-min retries, but it doesn’t assert behavior when a retryable error occurs at exactly the configured minimum. Please add that case so min-floor behavior is locked in.


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 567c830 and 4a767ab.

📒 Files selected for processing (2)
  • dagshub/data_engine/model/datasource.py
  • tests/data_engine/test_datasource.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.13)
🧰 Additional context used
🧬 Code graph analysis (1)
dagshub/data_engine/model/datasource.py (1)
dagshub/data_engine/model/errors.py (1)
  • DataEngineGqlError (35-45)
🪛 Ruff (0.15.2)
tests/data_engine/test_datasource.py

[warning] 180-180: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 238-238: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 264-264: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
dagshub/data_engine/model/datasource.py (1)

807-821: Retryable error classification + bounded backoff integration looks solid.

Good call separating retryable/non-retryable failures and resetting backoff after successful uploads.

Also applies to: 860-863, 875-875

tests/data_engine/test_datasource.py (1)

150-271: Great adaptive batching test coverage expansion.

The added scenarios (growth, retry downsizing, slow-batch reduction, non-retryable abort, and backoff reset) materially improve confidence in the new upload loop.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
dagshub/data_engine/model/datasource.py (1)

860-862: Backoff cap is effectively 4s, not 5s.

The current exponent clamp prevents the expression from ever reaching the configured 5s ceiling. Consider adjusting the exponent cap so the 5s max is reachable.

Patch

```diff
-                    retry_delay_sec = min(5.0, 0.25 * (2 ** min(consecutive_retryable_failures - 1, 4)))
+                    retry_delay_sec = min(5.0, 0.25 * (2 ** min(consecutive_retryable_failures - 1, 5)))
```
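Tabulating the delay schedule shows the difference: with the exponent clamped at 4 the delay tops out at 0.25 * 2**4 = 4 s, while a clamp of 5 lets the outer min() enforce the intended 5 s ceiling.

```python
# Backoff delays for consecutive retryable failures n = 1..7,
# before and after raising the exponent clamp.
before = [min(5.0, 0.25 * (2 ** min(n - 1, 4))) for n in range(1, 8)]
after = [min(5.0, 0.25 * (2 ** min(n - 1, 5))) for n in range(1, 8)]
print(before)  # [0.25, 0.5, 1.0, 2.0, 4.0, 4.0, 4.0] -- never reaches 5s
print(after)   # [0.25, 0.5, 1.0, 2.0, 4.0, 5.0, 5.0] -- ceiling now reachable
```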
tests/data_engine/test_datasource.py (1)

165-188: Avoid real backoff sleeps in retry tests.

These retry-path tests can incur actual sleep delays, which slows the suite. Mocking time.sleep here keeps tests fast and deterministic.

Patch

```diff
 def test_upload_metadata_retries_with_smaller_batch_after_failure(ds, mocker):
+    mocker.patch("dagshub.data_engine.model.datasource.time.sleep")
     entries = [
         DatapointMetadataUpdateEntry(f"dp-{i}", "field", str(i), MetadataFieldType.INTEGER) for i in range(10)
     ]
@@
 def test_upload_metadata_retries_partial_batch_below_min(ds, mocker):
+    mocker.patch("dagshub.data_engine.model.datasource.time.sleep")
     entries = [
         DatapointMetadataUpdateEntry(f"dp-{i}", "field", str(i), MetadataFieldType.INTEGER) for i in range(10)
     ]
```

Also applies to: 223-246


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a767ab and 0f5e9d7.

📒 Files selected for processing (2)
  • dagshub/data_engine/model/datasource.py
  • tests/data_engine/test_datasource.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.13)
🧰 Additional context used
🧬 Code graph analysis (1)
dagshub/data_engine/model/datasource.py (3)
dagshub/data_engine/model/errors.py (1)
  • DataEngineGqlError (35-45)
dagshub/data_engine/client/data_client.py (1)
  • update_metadata (228-248)
dagshub/data_engine/client/gql_mutations.py (1)
  • update_metadata (32-47)
🪛 Ruff (0.15.2)
tests/data_engine/test_datasource.py

[warning] 180-180: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 238-238: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 264-264: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
dagshub/data_engine/model/datasource.py (1)

807-821: Good retryability split and failure handling path.

The non-retryable fast-fail branch plus targeted retryable exception filtering is a strong reliability improvement here.

Also applies to: 848-872

tests/data_engine/test_datasource.py (1)

150-290: Excellent adaptive-batching test coverage.

These scenarios exercise core growth/shrink behavior, retry classification, min-bound handling, and backoff reset semantics well.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@guysmoilov guysmoilov requested a review from Copilot March 3, 2026 23:00
@guysmoilov guysmoilov self-assigned this Mar 3, 2026
@guysmoilov guysmoilov added the enhancement New feature or request label Mar 3, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@guysmoilov guysmoilov added the auto-review-done Review loop by AI finished label Mar 4, 2026
```python
next_batch_size = batch_size + max(1, (bad_batch_size - batch_size) // 2)
else:
    next_batch_size = batch_size * 2
```

Member

It will take at least 6 upload batches before things scale up from 150 (~2**7) to 15k (~2**13).
Considering how every batch will be a "checkpoint" in the metadata history, I think an "ask for forgiveness rather than permission" approach is better: scaling up much faster (say, x10) and then scaling down by dividing by two will get us to a good bound more quickly.

However, I honestly think this should be implemented in a way that actually checks the byte amount being sent over, instead of the number of metadata values being uploaded.
We always hit problems whenever users upload blobs, so if someone puts a megabyte blob in 150 files, things will still break.
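A byte-budget batching helper along those lines could look like the sketch below; `batch_by_bytes` and its JSON-based size estimate are hypothetical illustrations, not part of this PR:

```python
import json


def batch_by_bytes(entries, max_bytes):
    """Yield batches whose serialized size stays under max_bytes.

    Entries are assumed to be JSON-serializable; a single oversized
    entry still gets its own batch rather than being dropped.
    """
    batch, size = [], 0
    for entry in entries:
        entry_size = len(json.dumps(entry).encode("utf-8"))
        if batch and size + entry_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(entry)
        size += entry_size
    if batch:
        yield batch
```

With this shape, 150 entries each carrying a megabyte blob would naturally split into many small batches, while 150 tiny values would go out in one request.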

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
dagshub/common/config.py (1)

61-67: Clarify the max batch variable name for symmetry and readability.

dataengine_metadata_upload_batch_size now represents the max bound. Adding an explicit ..._max variable (while keeping the old name as alias) will make config intent much clearer.

Proposed refactor

```diff
 DATAENGINE_METADATA_UPLOAD_BATCH_SIZE_MAX_KEY = "DAGSHUB_DE_METADATA_UPLOAD_BATCH_SIZE_MAX"
-dataengine_metadata_upload_batch_size = int(
+dataengine_metadata_upload_batch_size_max = int(
     os.environ.get(
         DATAENGINE_METADATA_UPLOAD_BATCH_SIZE_MAX_KEY,
         os.environ.get(DATAENGINE_METADATA_UPLOAD_BATCH_SIZE_KEY, 15000),
     )
 )
+# Backwards compatibility alias
+dataengine_metadata_upload_batch_size = dataengine_metadata_upload_batch_size_max
```

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 515750c5-a1a0-4151-85b8-d61f5361d99e

📥 Commits

Reviewing files that changed from the base of the PR and between 0f5e9d7 and 6ae0a58.

📒 Files selected for processing (4)
  • dagshub/common/config.py
  • dagshub/data_engine/model/datasource.py
  • dagshub/data_engine/model/metadata/upload_batching.py
  • tests/data_engine/test_datasource.py
📜 Review details
🧰 Additional context used
🧬 Code graph analysis (3)
dagshub/data_engine/model/metadata/upload_batching.py (1)
dagshub/data_engine/model/errors.py (1)
  • DataEngineGqlError (35-45)
tests/data_engine/test_datasource.py (3)
tests/data_engine/conftest.py (1)
  • ds (18-19)
dagshub/data_engine/client/data_client.py (1)
  • update_metadata (228-248)
dagshub/data_engine/dtypes.py (1)
  • MetadataFieldType (20-36)
dagshub/data_engine/model/datasource.py (1)
dagshub/data_engine/model/metadata/upload_batching.py (6)
  • AdaptiveUploadBatchConfig (25-51)
  • get_retry_delay_seconds (114-116)
  • is_retryable_metadata_upload_error (97-111)
  • next_batch_after_retryable_failure (78-94)
  • next_batch_after_success (58-75)
  • from_values (32-51)
🪛 Ruff (0.15.2)
tests/data_engine/test_datasource.py

[warning] 181-181: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 207-207: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 266-266: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 292-292: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (3)
dagshub/data_engine/model/datasource.py (2)

56-62: Nice modularization of adaptive upload policy.

Moving batching/retry policy into dagshub.data_engine.model.metadata.upload_batching keeps _upload_metadata focused and easier to maintain.


775-775: ⚠️ Potential issue | 🟡 Minor

Progress label is inaccurate when retries go below configured minimum.

Line 775 advertises min-max, but retry logic can still drop below min (down to 1). Please adjust wording so users aren’t misled during uploads.

Suggested text tweak

```diff
-            f"Uploading metadata (adaptive batch {batch_config.min_batch_size}-{batch_config.max_batch_size})...",
+            (
+                "Uploading metadata "
+                f"(adaptive target {batch_config.min_batch_size}-{batch_config.max_batch_size}, "
+                "retries may go lower)..."
+            ),
```

> Likely an incorrect or invalid review comment.
tests/data_engine/test_datasource.py (1)

150-318: Great coverage for adaptive upload behavior.

These tests exercise growth, retry shrink, known-bad-size avoidance, non-retryable aborts, and backoff reset paths well.

Comment on lines +97 to +100

```python
def is_retryable_metadata_upload_error(exc: Exception) -> bool:
    if isinstance(exc, DataEngineGqlError):
        return isinstance(exc.original_exception, (TransportServerError, TransportConnectionFailed))
```

⚠️ Potential issue | 🟠 Major

Wrapped retryable errors can be misclassified and skip retries.

When exc is DataEngineGqlError, only two wrapped types are checked. If original_exception is another retryable type already recognized below (e.g., timeout/connection), upload aborts instead of retrying.

Proposed fix

```diff
 def is_retryable_metadata_upload_error(exc: Exception) -> bool:
     if isinstance(exc, DataEngineGqlError):
-        return isinstance(exc.original_exception, (TransportServerError, TransportConnectionFailed))
+        original_exception = exc.original_exception
+        return isinstance(original_exception, Exception) and is_retryable_metadata_upload_error(original_exception)

     return isinstance(
         exc,
         (
             TransportServerError,
```

Labels

auto-review-done (Review loop by AI finished), enhancement (New feature or request)

3 participants