Skip to content

Conversation

@angt
Copy link
Collaborator

@angt angt commented Nov 29, 2025

I intentionally kept the bar simple without specifying part numbers (which ultimately don't matter much) the only thing we care about is tracking progress

Copy link
Collaborator

@ngxson ngxson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I missed something, I can't find any references to std::thread in download.cpp - do we actually using more than 1 thread here?

And AFAIK, httplib::Client is single-threaded.

Edit: nevermind, see next comment

Comment on lines 491 to 492
static std::mutex mutex;
static std::map<std::thread::id, int> lines;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a local variable at first glance, but because it's static, that effectively make it a global variable.

IMO we should avoid global variable if possible. Probably a better way is to make a dedicated class to manage threads and implement the mutex as a member of the class.

@ngxson
Copy link
Collaborator

ngxson commented Nov 29, 2025

Hmm, nevermind, I found it: the multi-threads is launched by std::async which uses a thread pool under the hood.

In this case, I think it's better to introduce a mutex at common_download_file_multiple level (not global/static level)

This is the code where threads are created:

    std::vector<std::future<bool>> futures_download;
    for (auto const & item : urls) {
        futures_download.push_back(std::async(std::launch::async, [bearer_token, offline](const std::pair<std::string, std::string> & it) -> bool {
            return common_download_file_single(it.first, it.second, bearer_token, offline);
        }, item));
    }

The cleaner idea looks like this (pseudo-code):

    std::mutex mutex_progress;
    std::vector<float> progress(urls.size(), 0); // from 0.0 to 100.0
    auto update_progress = [&mutex_progress, &progress](size_t index, float value) {
        std::lock_guard<std::mutex> lock(mutex_progress);
        progress[index] = value;
        // print all progress here
    };

    std::vector<std::future<bool>> futures_download;
    for (auto const & item : urls) {
        futures_download.push_back(std::async(std::launch::async, [bearer_token, offline, update_progress](const std::pair<std::string, std::string> & it) -> bool {
            return common_download_file_single(it.first, it.second, bearer_token, offline, update_progress);
        }, item));
    }

@angt
Copy link
Collaborator Author

angt commented Nov 30, 2025

The cleaner idea looks like this (pseudo-code):

    std::mutex mutex_progress;
    std::vector<float> progress(urls.size(), 0); // from 0.0 to 100.0
    auto update_progress = [&mutex_progress, &progress](size_t index, float value) {
        std::lock_guard<std::mutex> lock(mutex_progress);
        progress[index] = value;
        // print all progress here
    };

    std::vector<std::future<bool>> futures_download;
    for (auto const & item : urls) {
        futures_download.push_back(std::async(std::launch::async, [bearer_token, offline, update_progress](const std::pair<std::string, std::string> & it) -> bool {
            return common_download_file_single(it.first, it.second, bearer_token, offline, update_progress);
        }, item));
    }

I disagree on that, moving the lock to the caller is error-prone. Statics are generally avoided for good reasons, but in this case it fits perfectly as there is physically only one terminal to share.

The only bad thing is the static map, but it keeps the code extremely simple and works for the current usage (avoiding the need to change all prototypes). When wefully remove curl, we can refactor the entire download code for better structure and address this properly.

@angt
Copy link
Collaborator Author

angt commented Nov 30, 2025

@ngxson If you prefer, we can wait for #17427 to be merged. I can then propose a more intrusive refactor with a proper class and RAII to handle the map cleanup

@ngxson
Copy link
Collaborator

ngxson commented Nov 30, 2025

I disagree on that, moving the lock to the caller is error-prone.

Which kind of error can we get?

Statics are generally avoided for good reasons, but in this case it fits perfectly as there is physically only one terminal to share.

You can't assume this function will always be called in terminal application unless you explicit specify it in the function name or documentation.

Currently, there is nothing prevent someone from calling common_download_file_multiple from one of the tool/example. When server implement the /download endpoint to add model dynamically from the webui, without seeing this discussion, I will end up calling it from multiple thread, and didn't expect it to have a global mutex. This is a real example of why your code can be error-prone.


Edit: the given example does not directly related to the placement of mutex, but I just want to highlight the fact that functions should be well-documented especially in term of thread safety

@angt
Copy link
Collaborator Author

angt commented Nov 30, 2025

Which kind of error can we get?

We need a function to print the progress bar, as it could be called from many places. By moving the mutex outside of the function, you are literally asking for a non thread-safe function to be synchronized by the caller, which is error-prone by design. As this function shares a unique tty for the whole lifetime of the process, a static mutex is the perfect scenario here. There are no instances to handle.

You can't assume this function will always be called in terminal application unless you explicit specify it in the function name or documentation.

The function is designed to be safe regardless of the context, whether it's running in a terminal or not.

Currently, there is nothing prevent someone from calling common_download_file_multiple from one of the tool/example. When server implement the /download endpoint to add model dynamically from the webui, without seeing this discussion, I will end up calling it from multiple thread, and didn't expect it to have a global mutex. This is a real example of why your code can be error-prone.

It will work in that case as well, you just call the function from anywhere. If multiple threads call it, you will see multiple progress bars, which is the expected behavior :)

@angt
Copy link
Collaborator Author

angt commented Nov 30, 2025

Since #17427 is merged, I'll update the implementation to properly cleanup the map in all cases.

@ngxson
Copy link
Collaborator

ngxson commented Nov 30, 2025

It will work in that case as well, you just call the function from anywhere. If multiple threads call it, you will see multiple progress bars, which is the expected behavior :)

In server's use case, I would expect to have an abstracted implementation of the progress bar (whose the state can be reported into webui). Each progress instance of the download should have their own progress bar and they none depends on the other.

Probably this implementation is OK for just terminal and we can improve it later. But still, having a static mutex inside a function is obviously an anti-pattern (it should ideally in global scope)

Probably better to move it to log.cpp and reuse the already existing common_log::mtx (ofc, we also need to initialize log system prior to model downloading, which is currently not the case - but this is trivial to do)

@angt angt force-pushed the common-add-minimalist-multi-thread-progress-bar branch from c180d51 to 09717b6 Compare November 30, 2025 18:28
@angt
Copy link
Collaborator Author

angt commented Nov 30, 2025

Totally agree, for the WebUI we'll need a better abstraction of the downloader, to completly remove all the tty code.

Probably better to move it to log.cpp and reuse the already existing common_log::mtx (ofc, we also need to initialize log system prior to model downloading, which is currently not the case - but this is trivial to do)

That wouldn't be good. The progress bar needs a tty fd, while the log goes to stderr (for a modern process) or a dedicated fd. They shouldn't be mixed (you don't want a progress bar in your logs anyway).

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
@angt angt force-pushed the common-add-minimalist-multi-thread-progress-bar branch from 09717b6 to 4387ab2 Compare November 30, 2025 18:51
@angt
Copy link
Collaborator Author

angt commented Nov 30, 2025

The new implementation works like the old one but cleanup the map in all cases thanks to RAII. Also, instead of the thread id, we now use the instance itself to track the bar, which is more flexible.

@ngxson
Copy link
Collaborator

ngxson commented Nov 30, 2025

That wouldn't be good. The progress bar needs a tty fd, while the log goes to stderr (for a modern process) or a dedicated fd. They shouldn't be mixed (you don't want a progress bar in your logs anyway).

I don't mean the progress bar must use the same tty as log, but it should use the same mutex as log. In theory, that should be the only mutex controlling output to console/tty/file

@angt
Copy link
Collaborator Author

angt commented Dec 1, 2025

That wouldn't be good. The progress bar needs a tty fd, while the log goes to stderr (for a modern process) or a dedicated fd. They shouldn't be mixed (you don't want a progress bar in your logs anyway).

I don't mean the progress bar must use the same tty as log, but it should use the same mutex as log. In theory, that should be the only mutex controlling output to console/tty/file

Actually, this is a known anti-pattern we need to avoid. Since the download is a long process exposed to interrupts (signals, network loss, disk full...) sharing the mutex creates a guaranteed deadlock if we need to log something.

The mutex is absolutely not here to protect the fd, it's purely cosmetic to ensure a clean display when everything goes well as we share the same tty.

@ngxson
Copy link
Collaborator

ngxson commented Dec 1, 2025

I cannot think of any cases that it can result in a deadlock:

  • The progress bar only acquires a lock if it wants to display something, then release it
  • The log system also acquires a lock if it wants to display something:
    • If the lock cannot be acquired, there is another on-going log line or a progress bar on-going. In this case, we wait until the log line or progress bar finishes its work

At least, that's how I envision the system to work.

Also, I think we try to reduce the number of std::mutex across the code base whenever possible. Probably @ggerganov can you give some opinions here?

@ngxson
Copy link
Collaborator

ngxson commented Dec 1, 2025

Since the download is a long process exposed to interrupts (signals, network loss, disk full...)

Just to note that, download is a long process, but print_progress is not - it is more or less an enhanced version of LOG(...) with the ability to update the last line.

Edit: even better, I think we can just implement a new log macro: LOG_PROG(...), meaning "log using progress bar"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants