Use curl as optional client #77

Draft
wants to merge 49 commits into
base: main

carlopi (Collaborator) commented Jul 3, 2025

This integrates #58 and #76, since their interactions needed some care.

Now that duckdb/duckdb#18107 has landed in duckdb/duckdb, and with the duckdb submodule moved to a recent commit on v1.3-ossivalis, this PR allows switching the HTTP client at runtime through the newly added httpfs config option httpfs_client_implementation:

D SET logging_storage=stdout;
D PRAGMA enable_logging('HTTP');
D SET httpfs_client_implementation='default';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:06:18.479, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': OK, 'headers': {X-Timer='S1751537178.169255,VS0,VE1', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, X-Served-By='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290029-RTM', Fastly-Restarts=1, x-ms-lease-status=unlocked, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-type=BlockBlob, x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', X-Cache-Hits='3730, 1', Via='1.1 varnish, 1.1 varnish', Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-version=2025-05-05, X-Cache='HIT, HIT', ETag='"0x8DAF8D1CD43CA79"', Connection=keep-alive, x-ms-lease-state=available, Last-Modified='Tue, 17 Jan 2023 21:28:40 GMT', Date='Thu, 03 Jul 2025 10:06:18 GMT', Content-Length=21916382, Content-Type=application/octet-stream, Content-Disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-ms-server-encrypted=true, Age=806, Accept-Ranges=bytes}}}, CONNECTION, 2, 11, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='curl';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:06:30.247, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': '', 'headers': {content-type=application/octet-stream, x-ms-lease-state=available, last-modified='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, accept-ranges=bytes, x-ms-version=2025-05-05, server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', x-cache='HIT, HIT', __RESPONSE_STATUS__='HTTP/2 200 ', etag='"0x8DAF8D1CD43CA79"', x-ms-blob-type=BlockBlob, x-ms-server-encrypted=true, age=818, x-ms-lease-status=unlocked, x-served-by='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290021-RTM', fastly-restarts=1, via='1.1 varnish, 1.1 varnish', date='Thu, 03 Jul 2025 10:06:30 GMT', x-cache-hits='3730, 1', content-disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-timer='S1751537190.940711,VS0,VE1', content-length=21916382}}}, CONNECTION, 2, 13, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='httplib';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:07:45.552, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': OK, 'headers': {X-Timer='S1751537265.144944,VS0,VE0', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, X-Served-By='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290047-RTM', Fastly-Restarts=1, x-ms-lease-status=unlocked, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-type=BlockBlob, x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', X-Cache-Hits='3730, 0', Via='1.1 varnish, 1.1 varnish', Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-version=2025-05-05, X-Cache='HIT, HIT', ETag='"0x8DAF8D1CD43CA79"', Connection=keep-alive, x-ms-lease-state=available, Last-Modified='Tue, 17 Jan 2023 21:28:40 GMT', Date='Thu, 03 Jul 2025 10:07:45 GMT', Content-Length=21916382, Content-Type=application/octet-stream, Content-Disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-ms-server-encrypted=true, Age=893, Accept-Ranges=bytes}}}, CONNECTION, 2, 15, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='something_else';
Invalid Input Error:
Unsupported option for httpfs_client_implementation, only `curl`, `httplib` and `default` are currently supported

The logged headers confirm that different implementations are in use: header names are cased differently (ETag vs etag), the curl response has an empty reason and an extra __RESPONSE_STATUS__ entry, and only the default and httplib responses include a Connection header.
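
One way to compare the responses across clients is to normalize header names before diffing, since HTTP header names are case-insensitive. The following Python sketch is illustrative only and not part of the extension: the function names and the set of ignored headers are assumptions, and the sample dictionaries are abbreviated from the log lines above.

```python
# Headers whose values legitimately vary between requests (timestamps, cache
# counters), plus the connection header and the curl-only status entry seen
# in the logs. This ignore list is an assumption chosen for illustration.
IGNORED = {"date", "age", "x-timer", "x-served-by", "x-cache-hits",
           "connection", "__response_status__"}


def normalize(headers):
    """Lower-case header names and drop the ignored ones."""
    return {name.lower(): value
            for name, value in headers.items()
            if name.lower() not in IGNORED}


def header_diff(a, b):
    """Return {name: (value_in_a, value_in_b)} for headers that differ."""
    na, nb = normalize(a), normalize(b)
    return {name: (na.get(name), nb.get(name))
            for name in sorted(na.keys() | nb.keys())
            if na.get(name) != nb.get(name)}


# Abbreviated samples taken from the httplib and curl log lines.
httplib_headers = {
    "ETag": '"0x8DAF8D1CD43CA79"',
    "Content-Length": "21916382",
    "Accept-Ranges": "bytes",
    "Connection": "keep-alive",
    "Age": "893",
}
curl_headers = {
    "etag": '"0x8DAF8D1CD43CA79"',
    "content-length": "21916382",
    "accept-ranges": "bytes",
    "__RESPONSE_STATUS__": "HTTP/2 200 ",
    "age": "818",
}

print(header_diff(httplib_headers, curl_headers))
```

An empty result means the two clients returned the same headers once casing and volatile fields are ignored.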

Please check the original PR from @Tmonster, #58, which has all the relevant details; this PR only adds a setting and resolves conflicts with ongoing work.
Probably the best path is cherry-picking the commits back into the original PR; in any case, this can be discussed on the side.

carlopi (Collaborator, Author) commented Jul 3, 2025

As a note, I gave this some minor testing and found a difference in behaviour when running:

FORCE INSTALL non_existing_extension;

which returns an empty std::exception instead of the relevant error message.

Also, switching the setting at runtime is currently basically untested.
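
A switching test could take roughly the following shape. This is a hedged Python sketch, not a test from this PR: it assumes a connection object exposing `execute(sql).fetchone()` (as the DuckDB Python client does) and an httpfs build that includes this setting; `check_switching` and the stand-in `FakeConnection` are hypothetical names used only for illustration.

```python
# The three values accepted by the setting, per the error message above.
IMPLEMENTATIONS = ["default", "curl", "httplib"]


def check_switching(con, url, implementations=IMPLEMENTATIONS):
    """Run the same count query under each client and require equal results."""
    counts = {}
    for impl in implementations:
        con.execute(f"SET httpfs_client_implementation='{impl}'")
        row = con.execute(f"SELECT count(*) FROM '{url}'").fetchone()
        counts[impl] = row[0]
    if len(set(counts.values())) != 1:
        raise AssertionError(f"clients disagree: {counts}")
    return counts


class FakeConnection:
    """Hypothetical stand-in so the sketch runs without network access."""

    def __init__(self, count):
        self.count = count
        self.statements = []  # records every SQL string it receives

    def execute(self, sql):
        self.statements.append(sql)
        return self

    def fetchone(self):
        return (self.count,)


con = FakeConnection(9388600)
print(check_switching(con, "https://example.com/data.parquet"))
```

Against a real database, the fake would be replaced by an actual connection and the URL by the parquet file used in the description; the fake only demonstrates the control flow.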
