Use curl as optional client #86

Open, wants to merge 48 commits into base: main

Tmonster
Contributor

Picking up #77 to merge #58 and add a way to optionally set the client_implementation.

This also integrates #76, since the changes interact and needed some care.

After duckdb/duckdb#18107 landed in duckdb/duckdb and the duckdb submodule was moved to a recent commit on v1.3-ossivalis, this PR allows switching the HTTP client at runtime via the newly added httpfs config option httpfs_client_implementation:

D SET logging_storage=stdout;
D PRAGMA enable_logging('HTTP');
D SET httpfs_client_implementation='default';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:06:18.479, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': OK, 'headers': {X-Timer='S1751537178.169255,VS0,VE1', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, X-Served-By='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290029-RTM', Fastly-Restarts=1, x-ms-lease-status=unlocked, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-type=BlockBlob, x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', X-Cache-Hits='3730, 1', Via='1.1 varnish, 1.1 varnish', Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-version=2025-05-05, X-Cache='HIT, HIT', ETag='"0x8DAF8D1CD43CA79"', Connection=keep-alive, x-ms-lease-state=available, Last-Modified='Tue, 17 Jan 2023 21:28:40 GMT', Date='Thu, 03 Jul 2025 10:06:18 GMT', Content-Length=21916382, Content-Type=application/octet-stream, Content-Disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-ms-server-encrypted=true, Age=806, Accept-Ranges=bytes}}}, CONNECTION, 2, 11, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='curl';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:06:30.247, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': '', 'headers': {content-type=application/octet-stream, x-ms-lease-state=available, last-modified='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, accept-ranges=bytes, x-ms-version=2025-05-05, server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', x-cache='HIT, HIT', __RESPONSE_STATUS__='HTTP/2 200 ', etag='"0x8DAF8D1CD43CA79"', x-ms-blob-type=BlockBlob, x-ms-server-encrypted=true, age=818, x-ms-lease-status=unlocked, x-served-by='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290021-RTM', fastly-restarts=1, via='1.1 varnish, 1.1 varnish', date='Thu, 03 Jul 2025 10:06:30 GMT', x-cache-hits='3730, 1', content-disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-timer='S1751537190.940711,VS0,VE1', content-length=21916382}}}, CONNECTION, 2, 13, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='httplib';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:07:45.552, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': OK, 'headers': {X-Timer='S1751537265.144944,VS0,VE0', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, X-Served-By='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290047-RTM', Fastly-Restarts=1, x-ms-lease-status=unlocked, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-type=BlockBlob, x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', X-Cache-Hits='3730, 0', Via='1.1 varnish, 1.1 varnish', Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-version=2025-05-05, X-Cache='HIT, HIT', ETag='"0x8DAF8D1CD43CA79"', Connection=keep-alive, x-ms-lease-state=available, Last-Modified='Tue, 17 Jan 2023 21:28:40 GMT', Date='Thu, 03 Jul 2025 10:07:45 GMT', Content-Length=21916382, Content-Type=application/octet-stream, Content-Disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-ms-server-encrypted=true, Age=893, Accept-Ranges=bytes}}}, CONNECTION, 2, 15, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='something_else';
Invalid Input Error:
Unsupported option for httpfs_client_implementation, only `curl`, `httplib` and `default` are currently supported
The logged headers confirm that different implementations are in use: for example, the default and httplib runs report ETag while the curl run reports etag, along with similar implementation details such as the empty reason string.
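Since HTTP header field names are case-insensitive, code that consumes these headers should not depend on the casing a given client reports. Below is a minimal sketch of a case-insensitive comparison, not part of this PR. The helper names normalize_headers and same_headers are hypothetical, and the dictionaries hold a hand-copied subset of the headers logged above.

```python
def normalize_headers(headers):
    """Return a copy of `headers` with lower-cased field names."""
    return {name.lower(): value for name, value in headers.items()}


def same_headers(a, b):
    """Compare two header mappings while ignoring field-name casing."""
    return normalize_headers(a) == normalize_headers(b)


# Subset of the headers from the httplib log line above.
httplib_headers = {
    "ETag": '"0x8DAF8D1CD43CA79"',
    "Content-Length": "21916382",
    "Accept-Ranges": "bytes",
}

# The same subset from the curl log line above (lower-cased names).
# curl also logs a __RESPONSE_STATUS__ entry, which is omitted here.
curl_headers = {
    "etag": '"0x8DAF8D1CD43CA79"',
    "content-length": "21916382",
    "accept-ranges": "bytes",
}

print(httplib_headers == curl_headers)              # False
print(same_headers(httplib_headers, curl_headers))  # True
```

A plain dictionary comparison treats the two as different, while the normalized comparison treats them as equal, which is the behavior callers would want when switching between clients.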

Please check the original PR from @Tmonster, #58, for all relevant details; this PR only adds a setting and resolves conflicts with ongoing work.
The best path is probably to cherry-pick the commit back into the original PR, but that can be discussed separately.
