Add bulkFetch API for synchronous multi-document retrieval #745

Merged
stevevanhooser merged 3 commits into main from claude/add-bulk-fetch-endpoint-um3gD
Apr 16, 2026

@stevevanhooser
Contributor

Summary

This PR adds a MATLAB client for the new bulkFetch API endpoint, which synchronously fetches multiple documents by ID from NDI Cloud in a single call. It complements the existing asynchronous bulk download pipeline and is intended for small document sets, where the async zip/S3 round-trip would be overkill.

Key Changes

  • New API Implementation: Added ndi.cloud.api.implementation.documents.BulkFetch class that handles the HTTP POST request to the bulk fetch endpoint with proper authentication and error handling
  • User-Facing Wrapper: Added ndi.cloud.api.documents.bulkFetch() function that provides a clean interface for calling the bulk fetch API
  • URL Endpoint: Updated ndi.cloud.api.url() to include the new bulk_fetch_documents endpoint mapping
  • Comprehensive Tests: Added four test cases covering:
    • testBulkFetchRoundTrip: Validates successful bulk fetch of multiple documents with correct content
    • testBulkFetchSilentlyOmitsUnknownIDs: Verifies that nonexistent document IDs are silently omitted (not an error)
    • testBulkFetchRejectsMalformedIDs: Ensures malformed IDs are rejected with HTTP 400
    • testBulkFetchRejectsEmptyList: Confirms empty document ID arrays are rejected with HTTP 400

Implementation Details

  • Supports fetching up to 500 documents per call
  • Requires all document IDs to be 24-character hex strings
  • Returns documents with full data payload (id, ndiId, name, className, datasetId, data)
  • Silently omits documents that don't exist, are soft-deleted, or don't belong to the dataset
  • Order of returned documents is not guaranteed to match request order
  • Includes the JSON encoding workaround for scalar string arrays (the same scalar-string duplication used by BulkDeleteDocuments and GetBulkDownloadURL) so that the request body has the correct format
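
The constraints above (non-empty list, 24-character hex IDs, at most 500 documents per call) can also be checked on the client before any request is sent. The following is a minimal sketch, not code from this PR: `splitDocumentIds` and its error identifiers are hypothetical names, and accepting upper-case hex digits is an assumption, since the PR does not say how the server treats case.

```matlab
function batches = splitDocumentIds(documentIds, maxPerCall)
% splitDocumentIds  Hypothetical helper, not part of this PR.
% Validates document IDs locally and splits them into batches no larger
% than maxPerCall (default 500, the per-call limit stated above).
    if nargin < 2
        maxPerCall = 500;
    end
    documentIds = cellstr(documentIds);   % accept char, string, or cellstr

    % The endpoint rejects an empty list with HTTP 400, so fail early.
    if isempty(documentIds)
        error('bulkFetchSketch:emptyList', ...
            'At least one document ID is required.');
    end

    % Each ID must be exactly 24 hex characters. Accepting upper-case
    % digits is an assumption about the server, not a documented rule.
    matches = regexp(documentIds, '^[0-9a-fA-F]{24}$', 'once');
    isValid = ~cellfun(@isempty, matches);
    if ~all(isValid)
        error('bulkFetchSketch:malformedId', ...
            '%d malformed document ID(s).', sum(~isValid));
    end

    % Split the validated IDs into consecutive batches.
    n = numel(documentIds);
    numBatches = ceil(n / maxPerCall);
    batches = cell(1, numBatches);
    for k = 1:numBatches
        first = (k - 1) * maxPerCall + 1;
        last = min(k * maxPerCall, n);
        batches{k} = documentIds(first:last);
    end
end
```

Each element of `batches` could then be passed to `ndi.cloud.api.documents.bulkFetch` in turn. The wrapper's exact argument list is not shown in this description, so check its help text before wiring this in.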

https://claude.ai/code/session_014buyoUmSXQ8HyGZgzNQ8hZ

Thin MATLAB client for the new POST /datasets/:id/documents/bulk-fetch
cloud endpoint: a synchronous companion to the async bulk-download
pipeline that fetches up to 500 documents (with full data) in a single
call. Intended for small sets (e.g. a subset of IDs returned by
/ndiquery) where the async zip/S3 round-trip would be overkill.

Adds:
  - ndi.cloud.api.documents.bulkFetch (user-facing wrapper)
  - ndi.cloud.api.implementation.documents.BulkFetch (implementation
    class, following the same pattern as BulkDeleteDocuments and
    GetBulkDownloadURL, including the scalar-string duplication
    workaround for JSON array encoding).
  - new 'bulk_fetch_documents' entry in ndi.cloud.api.url.
  - four tests appended to ndi.unittest.cloud.DocumentsTest:
      * round-trip fetch of N documents, verifying IDs/names
        round-trip correctly (order-insensitive).
      * when a real ID is mixed with a bogus-but-syntactically-valid
        24-hex ID, the bogus ID is silently omitted, not an error.
      * a malformed (non-hex) ID in the list is rejected with HTTP 400.
      * an empty documentIds array is rejected with HTTP 400.
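
Because unknown IDs are silently omitted and the response order is not guaranteed, a caller may want to line the returned documents up with the requested IDs. The sketch below shows one way to do that. It assumes the response has already been decoded into a struct array with an `id` field, and that `id` carries the same 24-character ID used in the request. `reconcileFetched` is a hypothetical name, not part of this PR.

```matlab
function [ordered, missingIds] = reconcileFetched(requestedIds, docs)
% reconcileFetched  Hypothetical helper, not part of this PR.
% Reorders docs to follow requestedIds and reports the IDs that did not
% come back. Assumes docs is a struct array with an 'id' field.
    requestedIds = cellstr(requestedIds);
    returnedIds = arrayfun(@(d) char(d.id), docs, 'UniformOutput', false);

    % found(k) is true when requestedIds{k} was returned;
    % loc(k) is its position in docs (0 when it was not returned).
    [found, loc] = ismember(requestedIds, returnedIds);

    ordered = docs(loc(found));          % returned documents, request order
    missingIds = requestedIds(~found);   % omitted by the server
end
```

An ID in `missingIds` may be nonexistent, soft-deleted, or in a different dataset. The response does not distinguish these cases.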

% Initialize outputs
b = false;
answer = [];
@stevevanhooser merged commit 3d08f9d into main on Apr 16, 2026
1 of 2 checks passed
@stevevanhooser deleted the claude/add-bulk-fetch-endpoint-um3gD branch on April 16, 2026 00:33

4 participants