Skip to content

Client modifications needed for the local cache#3175

Open
bbockelm wants to merge 2 commits intoPelicanPlatform:mainfrom
bbockelm:http_client_improvements
Open

Client modifications needed for the local cache#3175
bbockelm wants to merge 2 commits intoPelicanPlatform:mainfrom
bbockelm:http_client_improvements

Conversation

@bbockelm
Copy link
Collaborator

The refactored local cache work uses the client for all its "heavy lifting"; the goal is to keep all download, upload, and stat work centralized into a single location so any bugs (or improvements) only need to be written once.

Accordingly, we really had to "bulk up" some of the functionality of the client to make it work better as a library and not just a download/upload tool.

The two potentially-breaking changes in this work are:

  • Director responses are cached by the transfer engine in a TTLCache. Previously, each separate item transferred incurred the cost of a director lookup.
  • The internal "stat" function prefers to "stat" against a cache in cases where we will be downloading from the cache. If the object turns out to be a collection, it'll redo the stat with the collections URL.

Other important changes include:

  • Makes the ETag and other headers evailable to the client even before the data transfer starts. This will be used by the cache to determine where / how to store the object.
  • Some ETag awareness in the download paths: if different caches provide the same object with different ETags, abort the transfer.
  • Downloads can be done for a given byte-range instead of a whole object.
  • If the PUT response includes an ETag, then this is included in the transfer results returned to the client.
  • Uploads and downloads can be done against a Reader/Writer instead of a file on the filesystem.
  • The client can be put in "cache mode", contacting origins directly even for origins that refuse direct reads from end-users. For obvious reasons, this is not exposed to the CLI.
  • The client can provide a second, "federation token", that the cache uses to provide it's not an end-user to origins that reject.
  • Token generation is now an interface, TokenProvider, in case the caller has a separate implementation of the logic.

Further, this dulls the edge on - but doesn't fix - a medium-sized design flaw in the client: both the transfer client and transfer engine try to close the results channel as if they own it. So, if you are careless about the ordering of shutdown, you can trigger a panic on close. This fixes the panic but not the underlying "shared ownership" problem.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands the Pelican client’s library capabilities to support the refactored local cache implementation by centralizing transfer/stat logic in the client and adding new features (director response caching, early metadata, ETag handling, byte-range downloads, and token-provider abstractions).

Changes:

  • Add a prefix-matching TTL cache for director responses and integrate it into transfer job setup.
  • Introduce early-transfer metadata and ETag propagation/validation in download/upload paths, plus byte-range download support.
  • Add TokenProvider (including federation-token support) and thread the new token options through client APIs, tests, and director query logic (incl. cache-embedded mode).

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
client/sharing_url.go Update director query call signature to include cache-mode flag.
client/pelican_fs.go Switch to internal director lookup + new token generator; update stat signature.
client/main_test.go Add test coverage around SciTokens acceptability logic (new scenario).
client/main.go Add ETag to FileInfo and introduce TransferMetadata struct for early headers.
client/handle_http_test.go Update downloadHTTP call sites for new signature; add tests for ETag, metadata channel, byte-range, and upload ETag.
client/handle_http.go Core implementation: director caching integration, fed token support, cache-embedded mode, ETag handling, byte-range support, metadata channel, and close panic mitigation.
client/director_test.go Update queryDirector signature in tests.
client/director.go Add cache-embedded director routing and internal getDirectorInfoForPath API.
client/dir_resp_cache_test.go New tests for TTL/prefix matching and path reconstitution behavior.
client/dir_resp_cache.go New implementation of prefix-matching director response cache with TTL + in-flight coalescing.
client/bearer_auth_test.go Update token generator construction in tests.
client/acquire_token.go Introduce TokenProvider and static provider; adjust token acceptability parsing and token-source tracking behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The refactored local cache work uses the client for all its
"heavy lifting"; the goal is to keep all download, upload, and stat
work centralized into a single location so any bugs (or improvements)
only need to be written once.

Accordingly, we really had to "bulk up" some of the functionality
of the client to make it work better as a library and not just a
download/upload tool.

The two potentially-breaking changes in this work are:

- Director responses are cached by the transfer engine in a TTLCache.
  Previously, each separate item transferred incurred the cost of a
  director lookup.
- The internal "stat" function prefers to "stat" against a cache in
  cases where we will be downloading from the cache.  If the object
  turns out to be a collection, it'll redo the stat with the collections
  URL.

Other important changes include:

- Makes the `ETag` and other headers evailable to the client even before
  the data transfer starts.  This will be used by the cache to determine
  where / how to store the object.
- Some `ETag` awareness in the download paths: if different caches provide
  the same object with different `ETag`s, abort the transfer.
- Downloads can be done for a given byte-range instead of a whole object.
- If the PUT response includes an ETag, then this is included in the
  transfer results returned to the client.
- Uploads and downloads can be done against a Reader/Writer instead of
  a file on the filesystem.
- The client can be put in "cache mode", contacting origins directly
  even for origins that refuse direct reads from end-users.  For
  obvious reasons, this is not exposed to the CLI.
- The client can provide a second, "federation token", that the cache
  uses to provide it's not an end-user to origins that reject.
- Token generation is now an interface, `TokenProvider`, in case the
  caller has a separate implementation of the logic.

Further, this dulls the edge on - but doesn't fix - a medium-sized
design flaw in the client: both the transfer client and transfer engine
try to close the results channel as if they own it.  So, if you are
careless about the ordering of shutdown, you can trigger a panic on
close.  This fixes the panic but not the underlying "shared ownership"
problem.
@bbockelm bbockelm force-pushed the http_client_improvements branch from edebd7f to 8fe6c81 Compare February 28, 2026 13:45
@bbockelm bbockelm linked an issue Feb 28, 2026 that may be closed by this pull request
@bbockelm bbockelm added enhancement New feature or request internal Internal code improvements, not user-facing client Issue affecting the OSDF client labels Feb 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client Issue affecting the OSDF client enhancement New feature or request internal Internal code improvements, not user-facing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

(Local cache rewrite preparation)

2 participants