[Draft] GPT4all supported LLM architectures #53
Replies: 4 comments 12 replies
-
Supported quant types (from modellist.cpp@6518b33):
-
Back to this topic again: XLlamaCPP is a Cython implementation that already provides builds for CUDA and Metal, with the latter enabled by default (see also setup.py#L74). What is missing is Vulkan for broad GPU support (xllamacpp#61). You could consider contributing your earlier Vulkan attempts to XLlamaCPP, since they already provide Metal builds.
-
This is an interesting project. I think the problem people are having with getting this up and running is that llama.cpp split its build up in a way that makes a single binary per major platform, one that works for everyone the way gpt4all was building it (with baked-in CUDA + Vulkan + Kompute, etc.), no longer tenable. So extensive use of --extra-index-url is needed for these backends. Most projects I have seen do not cover all possible backends, though hosting the index on GitHub is interesting.
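To make the --extra-index-url point concrete, here is a hedged sketch (the package name, backend keys, and index URLs below are placeholders, not real indexes) of how a project might map an accelerator backend to the pip invocation a user would run:

```python
import sys

# Placeholder index URLs -- real projects would host backend-specific
# wheel indexes (e.g. on GitHub Pages), one per accelerator build.
EXTRA_INDEXES = {
    "cuda": "https://example.com/whl/cuda",      # hypothetical URL
    "vulkan": "https://example.com/whl/vulkan",  # hypothetical URL
}

def pip_command(backend: str, package: str = "xllamacpp") -> str:
    """Build the pip command line for the given backend.

    Plain CPU wheels come from PyPI directly, so no extra index is
    needed for backends not in EXTRA_INDEXES.
    """
    cmd = f"pip install {package}"
    index = EXTRA_INDEXES.get(backend)
    if index:
        cmd += f" --extra-index-url {index}"
    return cmd

print(pip_command("cuda"))
print(pip_command("cpu"))
```

This is why a single `pip install` cannot cover every backend: each accelerated build needs its own wheel index, and the user has to pick the right one for their hardware.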
-
Looks like we might be getting XLlamaCPP for Vulkan (xllamacpp#61 (comment)), which would make it usable on non-NVIDIA Windows and Linux devices, supplementing the existing CUDA and MPS support.
-
From llamamodel.cpp@6518b33:
Additionally, Mistral models work fine, since they use the same Llama architecture.