@tomlau10 commented Apr 30, 2025

mcurl is truly a great tool! 👍

In one of my project workflows, I needed to download from a website protected by Digest Auth, on a VM running an old Linux.
But most multi-threaded download tools such as aria2 don't support it, and wget2 couldn't be installed on that VM, which left only wget and curl with support: https://curl.se/docs/comparison-table.html

So I turned to looking for a way to make wget / curl download with multiple threads, and eventually found this mcurl!
However, the current version of mcurl has no option for passing custom curl options such as --digest -u "${USER}:${PASSWD}".
At first I hard-coded my changes into the script for my own use, but that approach wasn't generalizable.
So I went on to make more changes to the script, and fixed a few bugs along the way while testing.

The changes kept piling up, so I figured I might as well open a PR 😃
There are quite a few modifications; I've tried to split the commits up as carefully and clearly as possible.

New Features

  1. Passing curl options
    • All arguments after the url are treated as curl options (see the first sketch after this list)
    ./mcurl.sh [options] url [curl options]
    
    • Example: ./mcurl.sh -s4 https://some.url -L --digest -u "${USER}:${PASSWD}"
    • Here -L --digest -u "${USER}:${PASSWD}" are all passed through to the underlying curl invocations
  2. Clean up when the script is interrupted with ctrl+c (see the cleanup sketch after this list)
    • Added a trap for the relevant signals (SIGKILL, of course, can't be trapped...)
    • Uses kill -- -$$ to kill the background curl processes together with the script
  3. Prompt first if the target file already exists (see the prompt sketch after this list)
    • Uses rm -i to ask for confirmation; any answer other than y cancels the operation
    • Also adds a -f|--force option to the script, which switches to rm -f and skips the prompt
  4. Display download progress and total elapsed time
    • Computes downloaded file size * 100 / total size to show a percentage
    • Also prints the total elapsed time at the end
  5. Support for Git Bash
    • Git Bash has no built-in pgrep; at an AI's suggestion I switched to jobs, which should be portable
  6. A simple error checking mechanism (see the cleanup sketch after this list)
    • Records the pid of every spawned task and checks the exit code of each completed pid one by one
    • If any exit code is non-zero, it is treated as an error: the download is aborted and cleanup runs
    • You can test this mid-download with pkill "^curl"
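
Feature 1 boils down to splitting argv at the url. Below is a minimal sketch of how that split can work; the variable names (SPLIT, FORCE, CURL_OPTS) and the exact flags handled are illustrative, not mcurl.sh's actual parser:

```bash
#!/usr/bin/env bash
# Illustrative parsing: script options first, then the url, then curl options.
SPLIT=4 FORCE=""
while [[ $# -gt 0 ]]; do
    case "$1" in
        -s[0-9]*)   SPLIT="${1#-s}"; shift ;;    # e.g. -s4 => 4 slices
        -f|--force) FORCE=1; shift ;;
        *)          break ;;                     # first non-option is the url
    esac
done
URL="$1"; shift
CURL_OPTS=("$@")    # everything after the url goes to curl verbatim

# each slice invocation then appends the pass-through options, roughly:
# curl -r "$RANGE" -o "$PART" "${CURL_OPTS[@]}" "$URL"
```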
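
And a cleanup sketch of how features 2 and 6 can fit together: a trap that tears down the whole process group, plus a check that reaps finished slice pids and aborts on the first non-zero exit code. The helper names (spawn_slice, check_errors) are hypothetical:

```bash
PIDS=()    # pids of all spawned background curl tasks

cleanup() {
    trap - INT TERM    # don't re-enter the handler
    kill -- -$$        # kill the whole process group, background curl included
}
trap cleanup INT TERM  # SIGKILL can never be trapped

spawn_slice() {        # hypothetical helper: $1 = byte range, $2 = part file
    curl -r "$1" -o "$2" "${CURL_OPTS[@]}" "$URL" &
    PIDS+=("$!")       # record every spawned pid
}

check_errors() {       # called repeatedly from the main loop (sketched under Fixes)
    local pid remaining=()
    for pid in "${PIDS[@]}"; do
        if kill -0 "$pid" 2>/dev/null; then
            remaining+=("$pid")          # still running, check again later
        elif ! wait "$pid"; then         # reap once and read the exit code
            echo "a slice failed, aborting" >&2
            cleanup
            exit 1
        fi
    done
    PIDS=("${remaining[@]}")             # only re-check pids still alive
}
```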
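
The prompt sketch for feature 3 can be as small as the following; $OUT and $FORCE are the same illustrative names as above:

```bash
if [[ -e "$OUT" ]]; then
    if [[ -n "$FORCE" ]]; then
        rm -f -- "$OUT"               # -f|--force: overwrite silently
    else
        rm -i -- "$OUT"               # rm -i asks before deleting
        [[ -e "$OUT" ]] && exit 0     # anything but y keeps the file: cancel
    fi
fi
```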

Optimizations

  1. When merging the files, the first slice can simply be mv'd into place, saving one cat (sketch below)
  2. Show speed statistics as soon as any part file exists, instead of waiting for part1 to exist
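
A sketch of the merge step, assuming the slices are named "$OUT.part1" through "$OUT.partN" (the real naming may differ):

```bash
mv -- "$OUT.part1" "$OUT"            # first slice: rename, no copy needed
for (( i = 2; i <= SPLIT; i++ )); do
    cat "$OUT.part$i" >> "$OUT"      # append the remaining slices in order
    rm -- "$OUT.part$i"
done
```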

Fixes

  1. For small files, computing the size with du and a 1024 blocksize is inaccurate (see the first sketch after this list)
  2. When a small file is downloaded with multiple slices, the callback firing seems to have a race condition (second sketch below)
    • Specifically, an earlier curl can finish and fire the callback before the later curl processes have even been created
    • This breaks the running jobs == 0 check: the count really is 0 at that moment, but some jobs haven't been spawned yet
    • The problem is very noticeable under Git Bash
    • Fixed by dropping the signal-callback mechanism and simply checking running jobs == 0 on every pass of the main loop
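
For fix 1, the point is that du reports allocated blocks rounded up, not bytes, so small files are overstated; wc -c is a portable byte-exact alternative. $part and $total here are illustrative names:

```bash
# du with a 1024 blocksize rounds files up to whole blocks; wc -c is exact
size=$(wc -c < "$part")
pct=$(( size * 100 / total ))    # keeps the feature 4 percentage accurate
```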
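
And a sketch of the polled main loop from fix 2, which also uses the jobs-based counting from feature 5; the structure is illustrative, not the script's exact loop:

```bash
all_spawned=0
while :; do
    check_errors                       # from the cleanup sketch above
    running=$(jobs -rp | wc -l)        # count running background jobs, no pgrep
    # ...spawn pending slices here; set all_spawned=1 after the last one...
    # "0 running" only means "finished" once everything was actually spawned —
    # exactly the race the old signal callback tripped over
    if (( all_spawned && running == 0 )); then
        break
    fi
    sleep 1
done
```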

Tested Environments

  • win10 wsl2 ubuntu 22.04: GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
  • win10 gitbash 2.47: GNU bash, version 5.2.37(1)-release (x86_64-pc-msys)
  • win10 cygwin 3.6.1: GNU bash, version 5.2.21(1)-release (x86_64-pc-cygwin)
  • macos 11.6: GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin20)
  • gcp linux vm: GNU bash, version 4.2.46(1)-release (x86_64-redhat-linux-gnu)

Also, this site has test files of various sizes, in case more testing is needed: https://www.thinkbroadband.com/download


