@ngxson
Collaborator

@ngxson ngxson commented Nov 17, 2025

Tested on my MacBook M3 Max: user time for building the common target dropped from ~13.46s to ~11.33s.

master

time cmake --build build --target common
[  0%] Built target build_info
[  6%] Built target ggml-base
[ 10%] Built target ggml-metal
[ 20%] Built target ggml-cpu
[ 20%] Built target ggml-blas
[ 20%] Built target ggml
[ 91%] Built target llama
[ 91%] Building CXX object common/CMakeFiles/common.dir/arg.cpp.o
[ 91%] Linking CXX static library libcommon.a
[100%] Built target common
cmake --build build --target common  13.46s user 0.53s system 97% cpu 14.291 total

Repeated 3 more times:
cmake --build build --target common  13.39s user 0.50s system 98% cpu 14.143 total
cmake --build build --target common  13.63s user 0.50s system 98% cpu 14.357 total
cmake --build build --target common  13.49s user 0.53s system 97% cpu 14.388 total

PR

time cmake --build build --target common
[  0%] Built target build_info
[  6%] Built target ggml-base
[ 10%] Built target ggml-metal
[ 20%] Built target ggml-cpu
[ 20%] Built target ggml-blas
[ 20%] Built target ggml
[ 91%] Built target llama
[ 91%] Building CXX object common/CMakeFiles/common.dir/arg.cpp.o
[ 91%] Linking CXX static library libcommon.a
[100%] Built target common
cmake --build build --target common  11.33s user 0.52s system 97% cpu 12.145 total

Repeated 3 more times:
cmake --build build --target common  11.16s user 0.50s system 97% cpu 11.984 total
cmake --build build --target common  11.26s user 0.50s system 97% cpu 12.048 total
cmake --build build --target common  11.65s user 0.50s system 97% cpu 12.438 total

@ngxson ngxson requested a review from ggerganov as a code owner November 17, 2025 15:08
@ngxson
Collaborator Author

ngxson commented Nov 17, 2025

Interestingly, here is the breakdown of compilation time for each object file: https://gist.github.com/ngxson/5569a8ec8f380a2c51df029feab260da

Summarized by AI:

  1. common/CMakeFiles/common.dir/chat.cpp.o - 20.8244 seconds
  2. tools/server/CMakeFiles/llama-server.dir/server.cpp.o - 15.4134 seconds
  3. vendor/cpp-httplib/CMakeFiles/cpp-httplib.dir/httplib.cpp.o - 11.5905 seconds
  4. tests/CMakeFiles/test-backend-ops.dir/test-backend-ops.cpp.o - 11.353 seconds
  5. common/CMakeFiles/common.dir/arg.cpp.o - 10.7781 seconds
  6. src/CMakeFiles/llama.dir/llama-model.cpp.o - 10.6133 seconds
  7. tests/CMakeFiles/test-chat.dir/test-chat.cpp.o - 10.0094 seconds
  8. src/CMakeFiles/llama.dir/unicode.cpp.o - 4.31515 seconds
  9. src/CMakeFiles/llama.dir/llama-quant.cpp.o - 3.4398 seconds
  10. src/CMakeFiles/llama.dir/llama-sampling.cpp.o - 3.07033 seconds

@jeffbolznv
Collaborator

Have you considered using precompiled headers? I've had good luck with them in other projects, at least with msvc.

@ngxson
Collaborator Author

ngxson commented Nov 17, 2025

> Have you considered using precompiled headers? I've had good luck with them in other projects, at least with msvc.

I tried to address #17329 using precompiled headers, but that didn't resolve the problem. Feel free to give it a try on your side.

For the current PR, I'm not sure precompiled headers would help, since much of the time is spent on creating the copy constructor of common_arg. In this PR I try to avoid as many copies as possible (which also improves runtime performance, though only negligibly).

Member

@ggerganov ggerganov left a comment

I don't observe a significant compile-time reduction with this change on M2 Ultra. The reported 15% speed-up is good, though it seems a bit too large for such a change. If you can confirm it then I guess it's ok. Otherwise, if the speedup is small, I'd recommend keeping the old idiomatic l-value reference implementation.
