Skip to content

(im) support im oapi range download large file#283

Open
chenxingtong-bytedance wants to merge 1 commit intolarksuite:mainfrom
chenxingtong-bytedance:feat/support_download_large_file
Open

(im) support im oapi range download large file#283
chenxingtong-bytedance wants to merge 1 commit intolarksuite:mainfrom
chenxingtong-bytedance:feat/support_download_large_file

Conversation

@chenxingtong-bytedance
Copy link
Copy Markdown

@chenxingtong-bytedance chenxingtong-bytedance commented Apr 7, 2026

Change-Id: I38e6f6f9cf8b8711dc40650d19c77503f4e44989

Summary

Changes

  • Change 1
  • Change 2

Test Plan

  • Unit tests pass
  • Manual local verification confirms the lark xxx command works as expected

Related Issues

  • None

Summary by CodeRabbit

  • New Features

    • Automatic chunked downloading for large files: initial probe request + 8MB chunks, and auto-append of file extension from Content-Type when missing.
  • Bug Fixes

    • More robust downloads with per-chunk retries and exponential backoff, standardized HTTP error handling, and integrity checks to detect file size mismatches and malformed Content-Range.
  • Tests

    • Expanded unit tests for range requests, retry scenarios, Content-Range parsing, and download recovery cases.
  • Documentation

    • Updated shortcut and skill docs to describe chunked downloads, probe behavior, auto-extension, retry strategy, and new error guidance.

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions github-actions bot added domain/im PR touches the im domain size/L Large or sensitive change across domains or core paths labels Apr 7, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

Refactors IM resource download to use an initial ranged probe and subsequent chunked HTTP Range downloads with per-request retries/backoff, temp-file write+rename finalization, Content-Range parsing/validation, plus a header-merge change in shortcut streaming requests and expanded tests/docs.

Changes

Cohort / File(s) Summary
API Header Handling
shortcuts/common/runner.go
Apply shortcut-derived headers by building a separate option and appending each header entry into the main request headers instead of mutating the local option in-place.
IM Download Implementation
shortcuts/im/im_messages_resources_download.go
Switch from single-pass atomic write to probe (Range=0-(probe-1)) + chunked range downloads, parse/validate Content-Range, build chunk tasks, write chunks into a temp file with sync/close/rename, and add per-request retries/backoff and standardized HTTP error wrapping.
IM Download Tests
shortcuts/im/helpers_network_test.go
Add tests asserting initial probe Range header, range sequence coverage, retry behaviors (network errors, eventual success, context cancel), invalid Content-Range handling, and include a local parseRangeHeader test helper.
Content-Range Parsing Tests
shortcuts/im/helpers_test.go
Add unit tests for parseTotalSize covering valid Content-Range values and multiple invalid cases (empty, wrong prefix, missing/wildcard/non-numeric totals).
Docs / Skill Description
skills/lark-im/SKILL.md, skills/lark-im/references/lark-im-messages-resources-download.md
Document automatic chunked downloading for large files (probe size, 8MB chunks, single-worker sequential download), automatic output extension from Content-Type when needed, retry/backoff behavior, and new error entries (file size mismatch, invalid Content-Range).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Runner as Shortcut Runner
    participant Remote as Remote Server
    participant FS as File System

    Client->>Runner: Request resource download
    Runner->>Remote: Probe GET with Range: bytes=0-(probe-1)
    Note over Runner,Remote: Probe may be retried with backoff
    Remote-->>Runner: 200 OK or 206 Partial Content + Content-Range
    alt 200 OK
        Runner->>FS: Stream body -> temp file
    else 206 Partial Content
        Runner->>Runner: Parse Content-Range -> total size
        Runner->>FS: Create temp file, write probe bytes at offset 0
        loop For each remaining chunk
            Runner->>Remote: GET chunk with Range header
            Note over Runner,Remote: Per-chunk retries with backoff
            Remote-->>Runner: 206 Partial Content
            Runner->>FS: Write chunk at offset
        end
        Runner->>Runner: Validate temp file size == total
    end
    Runner->>FS: Sync, Close, Rename temp -> final path
    Runner-->>Client: Return success or error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • liangshuo-1

Poem

🐰 I nibble bytes by probe and chunk,

hopping retries when networks dunk.
Temp files tidy, ranges align,
headers stitched and downloads fine.
A rabbit’s cheer for chunked design!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description uses the template structure but provides no substantive content—all sections contain only placeholder text or remain empty with unchecked checkboxes. Fill in the Summary with 1-3 sentences explaining motivation; replace 'Change 1/2' with actual changes; check test boxes and describe verification; provide any related issue links.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: adding range download support for large files in the IM module, which aligns with the substantial implementation in the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Apr 7, 2026

Greptile Summary

Adds HTTP Range-request based chunked downloading for large IM resources: a 128KB probe determines total file size via Content-Range, then 8MB chunks fill the rest using WriteAt into a temp file, with per-request retry/backoff. A companion fix in runner.go prevents the shortcut X-Cli-* headers from overwriting the Range header set by the caller.

Confidence Score: 4/5

Functionally sound; two prior P1 threads (probe byte-count not asserted, no-op semaphore) remain unresolved.

The chunked download implementation is well-structured with proper temp-file cleanup, integrity checks, and solid test coverage. The runner.go header-merge fix is correct and the new Range-header test confirms the regression is caught. Score is 4 rather than 5 because the prior-thread P1 finding — probe chunk written-byte count is silently discarded, leaving a potential sparse-file gap when the server delivers fewer bytes than requested — has not been addressed in this revision.

shortcuts/im/im_messages_resources_download.go — probe chunk write validation (line 194) and no-op semaphore (lines 199-206) flagged in prior review threads remain open.

Important Files Changed

Filename Overview
shortcuts/im/im_messages_resources_download.go New chunked download logic: 128KB probe + 8MB chunks with retries; probe written-byte count silently discarded (prior thread); no-op semaphore with downloadWorkers=1 (prior thread)
shortcuts/common/runner.go Fixes Range header being overwritten by shortcut headers by merging them additively instead of replacing
shortcuts/im/helpers_network_test.go Adds tests for range download, retries, context cancellation, and invalid Content-Range; uses md5.Sum array comparison which is valid in Go
shortcuts/im/helpers_test.go Adds parseTotalSize and normalizeDownloadOutputPath unit tests with good edge-case coverage
skills/lark-im/SKILL.md Updates description to mention large-file chunked download support
skills/lark-im/references/lark-im-messages-resources-download.md Documents chunked download, auto-extension, retry strategy, and new error types

Sequence Diagram

sequenceDiagram
    participant CLI as CLI (downloadIMResourceToPath)
    participant API as Lark IM API
    participant FS as Filesystem

    CLI->>API: GET /resources/:file_key\n(Range: bytes=0-131071)
    note over CLI,API: doIMResourceDownloadRequest\nretries up to 2× with backoff

    alt 200 OK (server ignores Range)
        API-->>CLI: 200 + full body
        CLI->>FS: io.Copy → tmpFile
    else 206 Partial Content
        API-->>CLI: 206 + Content-Range: bytes 0-N/totalSize
        CLI->>FS: WriteAt(offset=0) probe chunk
        loop each 8MB chunk [probeChunkSize..totalSize)
            CLI->>API: GET (Range: bytes=start-end)
            note over CLI,API: retries up to 2× with backoff
            API-->>CLI: 206 + chunk body
            CLI->>FS: WriteAt(start) chunk\nvalidate written == expected
        end
        CLI->>FS: Stat() → assert size == totalSize
    end

    CLI->>FS: Sync + Close\nRename tmpFile → finalPath
Loading

Reviews (3): Last reviewed commit: "(im) support im oapi range download larg..." | Re-trigger Greptile

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@shortcuts/im/im_messages_resources_download.go`:
- Around line 272-284: The retry loop can start another DoAPIStream call after
sleepIMDownloadRetry observes cancellation; update the loops that call
runtime.DoAPIStream with imDownloadRequestRetries to check ctx.Err() immediately
after sleepIMDownloadRetry and return nil, ctx.Err() if canceled so no extra
attempt runs; specifically modify the loops that use runtime.DoAPIStream,
imDownloadRequestRetries and sleepIMDownloadRetry (also the similar loop at the
later block around lines 289–297) to return on ctx.Err() before the next
iteration.
- Around line 189-215: The 206 handling currently only calls parseTotalSize and
doesn't verify the Content-Range start/end or check the number of bytes actually
written; update the logic that handles http.StatusPartialContent (and the
similar blocks referenced by downloadAndWriteChunk and the other 206 handlers)
to parse start, end, and total from downloadResp.Header.Get("Content-Range")
(not just total), verify the returned start/end exactly match the requested task
range, and ensure the number of bytes written by writeChunkAt equals
(end-start+1) before accepting the chunk; if mismatched, return a network/error
immediately. Modify or overload writeChunkAt (or the call sites) so it can
validate and return the written length for this check, and keep using
buildChunkTasks, sem, and downloadAndWriteChunk to locate locations to change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 74c0c338-e7d6-4dc7-85c6-bcb98affb66a

📥 Commits

Reviewing files that changed from the base of the PR and between 30dba35 and 5faf274.

📒 Files selected for processing (4)
  • shortcuts/common/runner.go
  • shortcuts/im/helpers_network_test.go
  • shortcuts/im/helpers_test.go
  • shortcuts/im/im_messages_resources_download.go

@chenxingtong-bytedance chenxingtong-bytedance force-pushed the feat/support_download_large_file branch from 5faf274 to ea10aa6 Compare April 7, 2026 05:27
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
shortcuts/im/im_messages_resources_download.go (3)

369-393: ⚠️ Potential issue | 🟠 Major

Chunk download validates written bytes but not the response range.

downloadAndWriteChunk correctly validates that written == expected (line 390-392), which catches short reads. However, similar to the probe, it doesn't validate that the Content-Range header in the 206 response matches the requested start-end. If the server returns a different range, the data would be written at the wrong offset.

🛠️ Suggested validation
 if downloadResp.StatusCode != http.StatusPartialContent {
     return output.ErrNetwork("unexpected status code: %d", downloadResp.StatusCode)
 }
+
+// Validate Content-Range matches requested range
+respStart, respEnd, _, err := parseContentRange(downloadResp.Header.Get("Content-Range"))
+if err != nil {
+    return output.ErrNetwork("invalid Content-Range: %s", err)
+}
+if respStart != start || respEnd != end {
+    return output.ErrNetwork("response range %d-%d does not match requested %d-%d", respStart, respEnd, start, end)
+}

 written, err := writeChunkAt(file, downloadResp.Body, start)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/im/im_messages_resources_download.go` around lines 369 - 393, In
downloadAndWriteChunk, validate the HTTP 206 response's Content-Range header
before writing: parse downloadResp.Header.Get("Content-Range") (in function
downloadAndWriteChunk) to ensure the returned range start and end match the
requested start and end and that the reported length equals end-start+1; if
parsing fails or the range doesn't match, return an error (similar to
downloadResponseError/output.ErrNetwork) and avoid calling writeChunkAt. This
prevents writing data at the wrong offset when the server returns a different
range.

272-284: ⚠️ Potential issue | 🟡 Minor

Return immediately once backoff observes cancellation.

After sleepIMDownloadRetry() returns due to ctx.Done(), the loop still proceeds to the next iteration and calls DoAPIStream again. Add a context check after the sleep to avoid the extra attempt.

🔧 Proposed fix
 	sleepIMDownloadRetry(ctx, attempt)
+	if ctx.Err() != nil {
+		return nil, ctx.Err()
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/im/im_messages_resources_download.go` around lines 272 - 284, The
loop in the download retry logic (using imDownloadRequestRetries,
runtime.DoAPIStream and sleepIMDownloadRetry) can perform an extra request after
sleep if the context was canceled; add a context cancellation check immediately
after sleepIMDownloadRetry(ctx, attempt) and return nil, ctx.Err() (or propagate
ctx.Err()) when ctx.Err() != nil to avoid calling runtime.DoAPIStream again with
a canceled context.

189-216: ⚠️ Potential issue | 🟠 Major

Validate the full Content-Range before accepting a partial response.

The probe chunk handling at line 194 writes the response body to offset 0 without validating that the server actually returned bytes 0-N. If the server returns a misaligned range (e.g., bytes 1000-2000/total), the data will be written at offset 0, corrupting the file. The parseTotalSize() function only extracts the total, not the start/end values.

Consider parsing and validating the full Content-Range header to ensure the returned range matches the requested range before writing.

🛠️ Suggested approach

Create a parseContentRange function that returns start, end, and total, then validate:

start, end, total, err := parseContentRange(downloadResp.Header.Get("Content-Range"))
if err != nil {
    return "", 0, output.ErrNetwork("invalid Content-Range header: %s", err)
}
if start != 0 || end > probeChunkSize-1 {
    return "", 0, output.ErrNetwork("unexpected range in response: %d-%d", start, end)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/im/im_messages_resources_download.go` around lines 189 - 216, The
probe chunk handling currently calls parseTotalSize and writes downloadResp.Body
at offset 0 via writeChunkAt without validating the returned byte range; change
this to parse the full Content-Range (create/use a parseContentRange that
returns start, end, total) and validate that start == 0 and end is within the
expected probe chunk size before calling writeChunkAt on tmpFile; if validation
fails, return an output.ErrNetwork with a clear "unexpected range" message,
otherwise use the parsed total for sizeBytes and proceed to
buildChunkTasks/downloadAndWriteChunk as before.
🧹 Nitpick comments (1)
shortcuts/im/helpers_network_test.go (1)

321-354: Retry tests incur real wall-clock delays (~900ms total).

The retry logic uses sleepIMDownloadRetry which is not mocked, causing these tests to wait through actual exponential backoff delays (300ms + 600ms for 2 failed attempts). While this tests the real behavior, it slows down the test suite.

Consider extracting the sleep function as a package-level variable that can be overridden in tests, or accept the delay as acceptable for integration-style testing.

💡 Optional: Make sleep mockable

In the implementation file:

var sleepFunc = sleepIMDownloadRetry

func sleepIMDownloadRetry(ctx context.Context, attempt int) {
    // ... existing implementation
}

In tests:

func init() {
    sleepFunc = func(ctx context.Context, attempt int) {} // no-op for tests
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/im/helpers_network_test.go` around lines 321 - 354, The tests are
incurring real backoff delays because downloadIMResourceToPath calls
sleepIMDownloadRetry directly; refactor by introducing a package-level variable
(e.g., sleepFunc) initialized to sleepIMDownloadRetry and change
downloadIMResourceToPath (and any callers) to call sleepFunc(ctx, attempt)
instead of sleepIMDownloadRetry; in tests override sleepFunc to a no-op (and
optionally restore it) so retries run instantly. Ensure you reference and update
the symbols sleepIMDownloadRetry, sleepFunc, and downloadIMResourceToPath so the
test can set sleepFunc = func(ctx context.Context, attempt int) {} to avoid real
sleeps.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@shortcuts/im/im_messages_resources_download.go`:
- Around line 369-393: In downloadAndWriteChunk, validate the HTTP 206
response's Content-Range header before writing: parse
downloadResp.Header.Get("Content-Range") (in function downloadAndWriteChunk) to
ensure the returned range start and end match the requested start and end and
that the reported length equals end-start+1; if parsing fails or the range
doesn't match, return an error (similar to
downloadResponseError/output.ErrNetwork) and avoid calling writeChunkAt. This
prevents writing data at the wrong offset when the server returns a different
range.
- Around line 272-284: The loop in the download retry logic (using
imDownloadRequestRetries, runtime.DoAPIStream and sleepIMDownloadRetry) can
perform an extra request after sleep if the context was canceled; add a context
cancellation check immediately after sleepIMDownloadRetry(ctx, attempt) and
return nil, ctx.Err() (or propagate ctx.Err()) when ctx.Err() != nil to avoid
calling runtime.DoAPIStream again with a canceled context.
- Around line 189-216: The probe chunk handling currently calls parseTotalSize
and writes downloadResp.Body at offset 0 via writeChunkAt without validating the
returned byte range; change this to parse the full Content-Range (create/use a
parseContentRange that returns start, end, total) and validate that start == 0
and end is within the expected probe chunk size before calling writeChunkAt on
tmpFile; if validation fails, return an output.ErrNetwork with a clear
"unexpected range" message, otherwise use the parsed total for sizeBytes and
proceed to buildChunkTasks/downloadAndWriteChunk as before.

---

Nitpick comments:
In `@shortcuts/im/helpers_network_test.go`:
- Around line 321-354: The tests are incurring real backoff delays because
downloadIMResourceToPath calls sleepIMDownloadRetry directly; refactor by
introducing a package-level variable (e.g., sleepFunc) initialized to
sleepIMDownloadRetry and change downloadIMResourceToPath (and any callers) to
call sleepFunc(ctx, attempt) instead of sleepIMDownloadRetry; in tests override
sleepFunc to a no-op (and optionally restore it) so retries run instantly.
Ensure you reference and update the symbols sleepIMDownloadRetry, sleepFunc, and
downloadIMResourceToPath so the test can set sleepFunc = func(ctx
context.Context, attempt int) {} to avoid real sleeps.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 04629503-8d6f-410e-8ee4-e540a022fef7

📥 Commits

Reviewing files that changed from the base of the PR and between 5faf274 and ea10aa6.

📒 Files selected for processing (6)
  • shortcuts/common/runner.go
  • shortcuts/im/helpers_network_test.go
  • shortcuts/im/helpers_test.go
  • shortcuts/im/im_messages_resources_download.go
  • skills/lark-im/SKILL.md
  • skills/lark-im/references/lark-im-messages-resources-download.md
✅ Files skipped from review due to trivial changes (3)
  • shortcuts/common/runner.go
  • skills/lark-im/references/lark-im-messages-resources-download.md
  • shortcuts/im/helpers_test.go

Change-Id: I38e6f6f9cf8b8711dc40650d19c77503f4e44989
@chenxingtong-bytedance chenxingtong-bytedance force-pushed the feat/support_download_large_file branch from ea10aa6 to baa8261 Compare April 7, 2026 05:38
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
shortcuts/im/helpers_test.go (1)

492-527: Well-structured test with good coverage.

The table-driven test comprehensively covers the parseTotalSize function, including success cases and all major error paths. The substring matching for error assertions is appropriate for maintainability.

One minor gap: the implementation checks for totalSize < 0 (line 355-356 in im_messages_resources_download.go), but there's no test case for a negative total like "bytes 0-15/-100". Consider adding this edge case for completeness.

🧪 Optional: Add test case for negative total
 		{name: "unknown total size", contentRange: "bytes 0-99/*", wantErr: `unknown total size in content-range: "bytes 0-99/*"`},
 		{name: "invalid total", contentRange: "bytes 0-15/not-a-number", wantErr: "parse total size:"},
+		{name: "negative total", contentRange: "bytes 0-15/-100", wantErr: "invalid total size:"},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/im/helpers_test.go` around lines 492 - 527, Add a table-driven test
case to TestParseTotalSize that covers the negative-total branch in
parseTotalSize: insert a test entry like {name: "negative total", contentRange:
"bytes 0-15/-100", wantErr: "total size"} so the test calls parseTotalSize and
asserts the returned error contains the "total size" substring (matching the
implementation's totalSize < 0 check in im_messages_resources_download.go); this
ensures the negative total-size path in parseTotalSize is exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@shortcuts/im/helpers_test.go`:
- Around line 492-527: Add a table-driven test case to TestParseTotalSize that
covers the negative-total branch in parseTotalSize: insert a test entry like
{name: "negative total", contentRange: "bytes 0-15/-100", wantErr: "total size"}
so the test calls parseTotalSize and asserts the returned error contains the
"total size" substring (matching the implementation's totalSize < 0 check in
im_messages_resources_download.go); this ensures the negative total-size path in
parseTotalSize is exercised.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 028b515c-51cf-41b2-a7ef-7fb09176f88f

📥 Commits

Reviewing files that changed from the base of the PR and between ea10aa6 and baa8261.

📒 Files selected for processing (6)
  • shortcuts/common/runner.go
  • shortcuts/im/helpers_network_test.go
  • shortcuts/im/helpers_test.go
  • shortcuts/im/im_messages_resources_download.go
  • skills/lark-im/SKILL.md
  • skills/lark-im/references/lark-im-messages-resources-download.md
✅ Files skipped from review due to trivial changes (1)
  • skills/lark-im/references/lark-im-messages-resources-download.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • skills/lark-im/SKILL.md
  • shortcuts/im/im_messages_resources_download.go
  • shortcuts/common/runner.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain/im PR touches the im domain size/L Large or sensitive change across domains or core paths

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants