49 changes: 49 additions & 0 deletions README.md
@@ -1,3 +1,52 @@
# llama.cpp with sentencepiece

This fork aims to enhance llama.cpp by integrating the SentencePiece library as a tokenizer, enabling more flexible and language-agnostic tokenization for LLM inference. By using SentencePiece, this project supports advanced tokenization strategies, improves compatibility with a wider range of models, and simplifies workflows for users who require custom or multilingual tokenization. The scope of the fork includes adding a new tokenizer option, supporting SentencePiece model blobs, and implementing chat templates for specific models such as "Teuken 7B".

It introduces a new tokenizer named `sentencepiece` for the key `tokenizer.ggml.model`. In this case, the key `tokenizer.ggml.sentencepiece_model` must contain the binary blob with the SentencePiece model. Tokenization is then performed by the SentencePiece library instead of the built-in algorithms in `llama.cpp`.
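
As a rough illustration of what consuming such a blob involves, here is a minimal, self-contained sketch using the SentencePiece C++ API. This is not the fork's actual loading code (that lives in `llama-vocab-sentencepiece.cpp`); reading the blob from a file on disk and the file name are assumptions made purely for the example.

```cpp
// Sketch: load a serialized SentencePiece model (the kind of blob stored under
// tokenizer.ggml.sentencepiece_model) and tokenize a string with it.
#include <sentencepiece_processor.h>

#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main() {
    // Assumption: the blob has already been extracted from the GGUF file to disk.
    std::ifstream in("sentencepiece.model", std::ios::binary);
    const std::string blob((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());

    sentencepiece::SentencePieceProcessor sp;
    if (!sp.LoadFromSerializedProto(blob).ok()) {
        std::cerr << "failed to load SentencePiece model blob\n";
        return 1;
    }

    // Text -> token ids, performed by SentencePiece instead of llama.cpp's built-in tokenizers.
    const std::vector<int> ids = sp.EncodeAsIds("Hello world");
    for (const int id : ids) {
        std::cout << id << ' ';
    }
    std::cout << '\n';
    return 0;
}
```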

Additionally, this fork implements the chat template used by the LLM "Teuken 7B".
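
The template can be exercised through the normal public API; the sketch below is only an illustration (the buffer size and the example messages are arbitrary). Note that, as implemented in this fork, the content of the system message is treated as a language code such as "EN" or "DE", which selects a predefined Teuken system prompt; unknown codes fall back to English.

```cpp
// Sketch: render the Teuken prompt format with llama_chat_apply_template.
#include "llama.h"

#include <cstdio>
#include <vector>

int main() {
    const llama_chat_message chat[] = {
        { "system", "DE" },  // interpreted as a language code, not as free-form text
        { "user",   "Wie funktioniert ein Tokenizer?" },
    };

    std::vector<char> buf(4096);
    const int32_t n = llama_chat_apply_template("teuken", chat, 2, /*add_ass=*/true,
                                                buf.data(), (int32_t) buf.size());
    if (n > 0 && n <= (int32_t) buf.size()) {
        // Produces roughly:
        //   System: Ein Gespräch zwischen einem Menschen und einem Assistenten ...
        //   User: Wie funktioniert ein Tokenizer?
        //   Assistant:
        printf("%.*s\n", n, buf.data());
    }
    return 0;
}
```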

## Using llama.cpp with sentencepiece

### Setup for Windows

In the release section, you will find a Windows installer containing `llama.cpp` with SentencePiece support and "Teuken 7B".
After installation, a browser window opens with the `llama.cpp` chat UI and "Teuken 7B" as the LLM.

### Installing from source

First, install the SentencePiece static library with headers for your OS and compiler (see the SentencePiece documentation).

Clone the llama.cpp with sentencepiece repository:

```sh
git clone https://github.com/awenzel67/llama.cpp.git
cd llama.cpp
git switch teuken
```

Configure the build (see the SentencePiece documentation for details). The example below is for Windows with CUDA; adjust the paths to your environment:
```sh
cmake -B buildFullCuda -DCURL_INCLUDE_DIR=C:/Del/vcpkg/installed/x64-windows/include -DCURL_LIBRARY=C:/Del/vcpkg/installed/x64-windows/lib/libcurl.lib -DSPIE_INCLUDE_DIR=C:\NHKI\llama\sentencepiece\src -DSPIE_LIBRARY=C:\NHKI\llama\sentencepiece\build\src\Release\sentencepiece.lib -DGGML_CUDA=ON
```

The CMake command contains a variable that specifies the include directory of the SentencePiece library:
```sh
-DSPIE_INCLUDE_DIR=C:\NHKI\llama\sentencepiece\src
```

The CMake command contains a variable that specifies the path to the static SentencePiece library:
```sh
-DSPIE_LIBRARY=C:\NHKI\llama\sentencepiece\build\src\Release\sentencepiece.lib
```
Now you can use the common llama.cpp tools such as `llama-cli` or `llama-server`.

You can use all models supported by llama.cpp. In addition, the following Teuken 7B GGUF files can be used:

- Teuken-7.5B-BF16-CM.gguf
- Teuken-7.5B-Q4_K_M.gguf


# llama.cpp

![llama](https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png)
1 change: 1 addition & 0 deletions include/llama.h
@@ -76,6 +76,7 @@ extern "C" {
LLAMA_VOCAB_TYPE_UGM = 4, // T5 tokenizer based on Unigram
LLAMA_VOCAB_TYPE_RWKV = 5, // RWKV tokenizer based on greedy tokenization
LLAMA_VOCAB_TYPE_PLAMO2 = 6, // PLaMo-2 tokenizer based on Aho-Corasick with dynamic programming
LLAMA_VOCAB_TYPE_SPIE = 7, // TEUKEN tokenizer based on SentencePiece
};

enum llama_rope_type {
77 changes: 77 additions & 0 deletions innosetup.iss
@@ -0,0 +1,77 @@
; Script generated by the Inno Setup Script Wizard.
; SEE THE DOCUMENTATION FOR DETAILS ON CREATING INNO SETUP SCRIPT FILES!
; Non-commercial use only

#define MyAppName "llama.cpp-teuken"
#define MyAppVersion "1.5"
#define MyAppPublisher "awenzel67"
#define MyAppURL "https://github.com/awenzel67/llama.cpp"
#define MyAppExeName "runteuken.bat"
#define MyAppAssocName MyAppName + " File"
#define MyAppAssocExt ".myp"
#define MyAppAssocKey StringChange(MyAppAssocName, " ", "") + MyAppAssocExt

[Setup]
; NOTE: The value of AppId uniquely identifies this application. Do not use the same AppId value in installers for other applications.
; (To generate a new GUID, click Tools | Generate GUID inside the IDE.)
AppId={{56F7611F-2A10-49B9-B3A8-B627CA731E2B}
AppName={#MyAppName}
AppVersion={#MyAppVersion}
;AppVerName={#MyAppName} {#MyAppVersion}
AppPublisher={#MyAppPublisher}
AppPublisherURL={#MyAppURL}
AppSupportURL={#MyAppURL}
AppUpdatesURL={#MyAppURL}
DefaultDirName={autopf}\{#MyAppName}
UninstallDisplayIcon={app}\{#MyAppExeName}
; "ArchitecturesAllowed=x64compatible" specifies that Setup cannot run
; on anything but x64 and Windows 11 on Arm.
ArchitecturesAllowed=x64compatible
; "ArchitecturesInstallIn64BitMode=x64compatible" requests that the
; install be done in "64-bit mode" on x64 or Windows 11 on Arm,
; meaning it should use the native 64-bit Program Files directory and
; the 64-bit view of the registry.
ArchitecturesInstallIn64BitMode=x64compatible
ChangesAssociations=yes
DisableProgramGroupPage=yes
; Uncomment the following line to run in non administrative install mode (install for current user only).
;PrivilegesRequired=lowest
OutputDir=C:\Del\tk
OutputBaseFilename=llama.cpp-teuken
;SolidCompression=yes
WizardStyle=modern dynamic
DiskSpanning=yes
[Languages]
Name: "english"; MessagesFile: "compiler:Default.isl"
Name: "german"; MessagesFile: "compiler:Languages\German.isl"

[Tasks]
Name: "desktopicon"; Description: "{cm:CreateDesktopIcon}"; GroupDescription: "{cm:AdditionalIcons}"; Flags: unchecked

[Files]
Source: "C:\NHKI\mymodell\kilocal\{#MyAppExeName}"; DestDir: "{app}"; Flags: ignoreversion
Source: "C:\NHKI\llama\llama.cpp\buildFullCuda\bin\Release\llama-server.exe"; DestDir: "{app}"; Flags: ignoreversion
Source: "C:\NHKI\llama\llama.cpp\buildFullCuda\bin\Release\ggml.dll"; DestDir: "{app}"; Flags: ignoreversion
Source: "C:\NHKI\llama\llama.cpp\buildFullCuda\bin\Release\ggml-base.dll"; DestDir: "{app}"; Flags: ignoreversion
Source: "C:\NHKI\llama\llama.cpp\buildFullCuda\bin\Release\ggml-cpu.dll"; DestDir: "{app}"; Flags: ignoreversion
Source: "C:\NHKI\llama\llama.cpp\buildFullCuda\bin\Release\ggml-cuda.dll"; DestDir: "{app}"; Flags: ignoreversion
Source: "C:\NHKI\llama\llama.cpp\buildFullCuda\bin\Release\libcurl.dll"; DestDir: "{app}"; Flags: ignoreversion
Source: "C:\NHKI\llama\llama.cpp\buildFullCuda\bin\Release\llama.dll"; DestDir: "{app}"; Flags: ignoreversion
Source: "C:\NHKI\llama\llama.cpp\buildFullCuda\bin\Release\mtmd.dll"; DestDir: "{app}"; Flags: ignoreversion
Source: "C:\NHKI\llama\llama.cpp\buildFullCuda\bin\Release\zlib1.dll"; DestDir: "{app}"; Flags: ignoreversion
Source: "C:\NHKI\mymodell\kilocal\Teuken-7.5B-Q4_K_M.gguf"; DestDir: "{app}"; Flags: ignoreversion
; NOTE: Don't use "Flags: ignoreversion" on any shared system files

[Registry]
Root: HKA; Subkey: "Software\Classes\{#MyAppAssocExt}\OpenWithProgids"; ValueType: string; ValueName: "{#MyAppAssocKey}"; ValueData: ""; Flags: uninsdeletevalue
Root: HKA; Subkey: "Software\Classes\{#MyAppAssocKey}"; ValueType: string; ValueName: ""; ValueData: "{#MyAppAssocName}"; Flags: uninsdeletekey
Root: HKA; Subkey: "Software\Classes\{#MyAppAssocKey}\DefaultIcon"; ValueType: string; ValueName: ""; ValueData: "{app}\{#MyAppExeName},0"
Root: HKA; Subkey: "Software\Classes\{#MyAppAssocKey}\shell\open\command"; ValueType: string; ValueName: ""; ValueData: """{app}\{#MyAppExeName}"" ""%1"""

[Icons]
Name: "{autoprograms}\{#MyAppName}"; Filename: "{app}\{#MyAppExeName}"
Name: "{autodesktop}\{#MyAppName}"; Filename: "{app}\{#MyAppExeName}"; Tasks: desktopicon

;[Run]
;Filename: "{app}\{#MyAppExeName}"; Description: "{cm:LaunchProgram,{#StringChange(MyAppName, '&', '&&')}}"; Flags: postinstall skipifsilent

11 changes: 8 additions & 3 deletions src/CMakeLists.txt
@@ -32,16 +32,21 @@ add_library(llama
llama-quant.cpp
llama-sampling.cpp
llama-vocab.cpp
llama-vocab-sentencepiece.cpp
unicode-data.cpp
unicode.cpp
unicode.h
)

target_include_directories(llama PRIVATE .)
target_include_directories(llama PUBLIC ../include)
target_compile_features (llama PRIVATE cxx_std_17) # don't bump
target_include_directories(llama PUBLIC ../include ${SPIE_INCLUDE_DIR} ${SPIE_INCLUDE_DIR}/..)
#target_link_directories(${TEST_TARGET} PRIVATE "C:/NHKI/llama/sentencepiece/build/src/Debug")

target_link_libraries(llama PUBLIC ggml)
target_compile_features (llama PRIVATE cxx_std_17) # don't bump
message(SPIE_LIBRARY="${SPIE_LIBRARY}")
message(SPIE_INCLUDE_DIR="${SPIE_INCLUDE_DIR}")
#target_link_libraries(llama PUBLIC ggml "C:/NHKI/llama/sentencepiece/build/src/Release/sentencepiece.lib")
target_link_libraries(llama PUBLIC ggml ${SPIE_LIBRARY})

if (BUILD_SHARED_LIBS)
set_target_properties(llama PROPERTIES POSITION_INDEPENDENT_CODE ON)
1 change: 1 addition & 0 deletions src/llama-arch.cpp
@@ -251,6 +251,7 @@ static const std::map<llm_kv, const char *> LLM_KV_NAMES = {
{ LLM_KV_TOKENIZER_ADD_PREFIX, "tokenizer.ggml.add_space_prefix" },
{ LLM_KV_TOKENIZER_REMOVE_EXTRA_WS, "tokenizer.ggml.remove_extra_whitespaces" },
{ LLM_KV_TOKENIZER_PRECOMPILED_CHARSMAP, "tokenizer.ggml.precompiled_charsmap" },
{ LLM_KV_TOKENIZER_SENTENCEPIECE_MODEL, "tokenizer.ggml.sentencepiece_model" },
{ LLM_KV_TOKENIZER_HF_JSON, "tokenizer.huggingface.json" },
{ LLM_KV_TOKENIZER_RWKV, "tokenizer.rwkv.world" },
{ LLM_KV_TOKENIZER_CHAT_TEMPLATE, "tokenizer.chat_template" },
1 change: 1 addition & 0 deletions src/llama-arch.h
@@ -240,6 +240,7 @@ enum llm_kv {
LLM_KV_TOKENIZER_ADD_PREFIX,
LLM_KV_TOKENIZER_REMOVE_EXTRA_WS,
LLM_KV_TOKENIZER_PRECOMPILED_CHARSMAP,
LLM_KV_TOKENIZER_SENTENCEPIECE_MODEL,
LLM_KV_TOKENIZER_HF_JSON,
LLM_KV_TOKENIZER_RWKV,
LLM_KV_TOKENIZER_CHAT_TEMPLATE,
101 changes: 66 additions & 35 deletions src/llama-chat.cpp
@@ -63,8 +63,6 @@ static const std::map<std::string, llm_chat_template> LLM_CHAT_TEMPLATES = {
{ "megrez", LLM_CHAT_TEMPLATE_MEGREZ },
{ "yandex", LLM_CHAT_TEMPLATE_YANDEX },
{ "bailing", LLM_CHAT_TEMPLATE_BAILING },
{ "bailing-think", LLM_CHAT_TEMPLATE_BAILING_THINK },
{ "bailing2", LLM_CHAT_TEMPLATE_BAILING2 },
{ "llama4", LLM_CHAT_TEMPLATE_LLAMA4 },
{ "smolvlm", LLM_CHAT_TEMPLATE_SMOLVLM },
{ "hunyuan-moe", LLM_CHAT_TEMPLATE_HUNYUAN_MOE },
@@ -73,6 +71,35 @@ static const std::map<std::string, llm_chat_template> LLM_CHAT_TEMPLATES = {
{ "kimi-k2", LLM_CHAT_TEMPLATE_KIMI_K2 },
{ "seed_oss", LLM_CHAT_TEMPLATE_SEED_OSS },
{ "grok-2", LLM_CHAT_TEMPLATE_GROK_2 },
{ "teuken", LLM_CHAT_TEMPLATE_TEUKEN },
};


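// Teuken default system prompts, selected via the (upper-case) language code passed as the system message; see LLM_CHAT_TEMPLATE_TEUKEN below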
static const std::map<std::string,std::string> LLM_TEUKEN_SYSTEM = {
{"BG", "Чат между човек и асистент с изкуствен интелект. Асистентът дава полезни и учтиви отговори на въпросите на човека."},
{"CS", "Chat mezi člověkem a asistentem s umělou inteligencí. Asistent poskytuje vstřícné a zdvořilé odpovědi na otázky člověka."},
{"DA", "En chat mellem et menneske og en assistent med kunstig intelligens, som giver hjælpsomme og høflige svar på menneskets spørgsmål."},
{"DE", "Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz. Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen."},
{"EL", "Μια συνομιλία μεταξύ ενός ανθρώπου και ενός βοηθού τεχνητής νοημοσύνης. Ο βοηθός δίνει χρήσιμες και ευγενικές απαντήσεις στις ερωτήσεις του ανθρώπου."},
{"EN", "A chat between a human and an artificial intelligence assistant.The assistant gives helpful and polite answers to the human's questions."},
{"ES", "Una conversación entre un humano y un asistente de inteligencia artificial. El asistente da respuestas útiles y amables a las preguntas del humano."},
{"ET", "Inimese ja tehisintellekti assistendi vaheline vestlus. Assistent annab inimese küsimustele abivalmis ja viisakaid vastuseid."},
{"FI", "Ihmisen ja tekoälyavustajan välinen keskustelu. Avustaja antaa avuliaita ja kohteliaita vastauksia ihmisen kysymyksiin."},
{"FR", "Conversation entre un humain et un assistant doté d'une intelligence artificielle. L'assistant donne des réponses utiles et polies aux questions de l'homme."},
{"GA", "Comhrá idir duine agus cúntóir hintleachta saorga. Tugann an cúntóir freagraí cabhracha dea-bhéasacha ar cheisteanna an duine."},
{"HR", "Razgovor između čovjeka i pomoćnika umjetne inteligencije. Pomoćnik daje korisne i ljubazne odgovore na ljudska pitanja."},
{"HU", "Egy ember és egy mesterséges intelligencia asszisztens közötti beszélgetés. Az asszisztens segítőkész és udvarias válaszokat ad az ember kérdéseire."},
{"IT", "Una chat tra un umano e un assistente di intelligenza artificiale. L'assistente fornisce risposte utili ed educate alle domande dell'uomo."},
{"LT", "Žmogaus ir dirbtinio intelekto asistento pokalbis. Asistentas naudingai ir mandagiai atsako į žmogaus klausimus."},
{"LV", "Cilvēka un mākslīgā intelekta asistenta tērzēšana. Asistents sniedz noderīgas un pieklājīgas atbildes uz cilvēka jautājumiem."},
{"MT", "Chat bejn bniedem u assistent ta' intelliġenza artifiċjali. L-assistent jagħti tweġibiet ta' għajnuna u edukat għall-mistoqsijiet tal-bniedem."},
{"NL", "Een chat tussen een mens en een assistent met kunstmatige intelligentie. De assistent geeft behulpzame en beleefde antwoorden op de vragen van de mens."},
{"PL", "Czat między człowiekiem a asystentem sztucznej inteligencji. Asystent udziela pomocnych i uprzejmych odpowiedzi na pytania człowieka."},
{"PT", "Uma conversa entre um ser humano e um assistente de inteligência artificial. O assistente dá respostas úteis e educadas às perguntas do utilizador."},
{"RO", "O conversație între un om și un asistent cu inteligență artificială. Asistentul oferă răspunsuri utile și politicoase la întrebările omului."},
{"SK", "Rozhovor medzi človekom a asistentom s umelou inteligenciou. Asistent poskytuje užitočné a zdvorilé odpovede na otázky človeka."},
{"SL", "Pogovor med človekom in pomočnikom z umetno inteligenco. Pomočnik človeku prijazno in vljudno odgovarja na njegova vprašanja."},
{"SV", "En chatt mellan en människa och en assistent med artificiell intelligens. Assistenten ger hjälpsamma och artiga svar på människans frågor."}
};

llm_chat_template llm_chat_template_from_str(const std::string & name) {
@@ -156,6 +183,8 @@ llm_chat_template llm_chat_detect_template(const std::string & tmpl) {
return LLM_CHAT_TEMPLATE_VICUNA_ORCA;
}
return LLM_CHAT_TEMPLATE_VICUNA;
} else if (tmpl_contains("User: ") && tmpl_contains("Assistant: ") && tmpl_contains("System: ")) {
return LLM_CHAT_TEMPLATE_TEUKEN;
} else if (tmpl_contains("### Instruction:") && tmpl_contains("<|EOT|>")) {
// deepseek-ai/deepseek-coder-33b-instruct
return LLM_CHAT_TEMPLATE_DEEPSEEK;
@@ -193,10 +222,6 @@ llm_chat_template llm_chat_detect_template(const std::string & tmpl) {
return LLM_CHAT_TEMPLATE_YANDEX;
} else if (tmpl_contains("<role>ASSISTANT</role>") && tmpl_contains("'HUMAN'")) {
return LLM_CHAT_TEMPLATE_BAILING;
} else if (tmpl_contains("<role>ASSISTANT</role>") && tmpl_contains("\"HUMAN\"") && tmpl_contains("<think>")) {
return LLM_CHAT_TEMPLATE_BAILING_THINK;
} else if (tmpl_contains("<role>ASSISTANT</role>") && tmpl_contains("<role>HUMAN</role>") && tmpl_contains("<|role_end|>")) {
return LLM_CHAT_TEMPLATE_BAILING2;
} else if (tmpl_contains("<|header_start|>") && tmpl_contains("<|header_end|>")) {
return LLM_CHAT_TEMPLATE_LLAMA4;
} else if (tmpl_contains("<|endofuserprompt|>")) {
@@ -430,6 +455,39 @@ int32_t llm_chat_apply_template(
if (add_ass) {
ss << "ASSISTANT:";
}
} else if (tmpl == LLM_CHAT_TEMPLATE_TEUKEN) {
// Teuken 7B: the system message carries a language code (e.g. "DE", "EN") that
// selects one of the predefined system prompts; unknown codes fall back to English
bool isSysOut = false;
for (auto message : chat) {
std::string role(message->role);
if (role == "system") {
const std::string lang = trim(message->content);
auto it = LLM_TEUKEN_SYSTEM.find(lang);
if (it == LLM_TEUKEN_SYSTEM.end()) {
it = LLM_TEUKEN_SYSTEM.find("EN");
}
ss << "System: " << it->second << "\n";
isSysOut = true;
} else if (role == "user") {
if (!isSysOut) {
ss << "System: " << LLM_TEUKEN_SYSTEM.at("EN") << "\n";
isSysOut = true;
}
ss << "User: " << message->content << "\n";
} else if (role == "assistant") {
ss << "Assistant: " << message->content << "</s>\n";
}
}
if (add_ass) {
ss << "Assistant: ";
}
} else if (tmpl == LLM_CHAT_TEMPLATE_DEEPSEEK) {
// deepseek-ai/deepseek-coder-33b-instruct
for (auto message : chat) {
@@ -650,8 +708,8 @@ int32_t llm_chat_apply_template(
if (add_ass) {
ss << " Ассистент:[SEP]";
}
} else if (tmpl == LLM_CHAT_TEMPLATE_BAILING || tmpl == LLM_CHAT_TEMPLATE_BAILING_THINK) {
// Bailing (Ling/Ring) template
} else if (tmpl == LLM_CHAT_TEMPLATE_BAILING) {
// Bailing (Ling) template
for (auto message : chat) {
std::string role(message->role);

@@ -664,33 +722,6 @@ int32_t llm_chat_apply_template(
ss << "<role>" << role << "</role>" << message->content;
}

if (add_ass) {
ss << "<role>ASSISTANT</role>";

if (tmpl == LLM_CHAT_TEMPLATE_BAILING_THINK) {
ss << "<think>";
}
}
} else if (tmpl == LLM_CHAT_TEMPLATE_BAILING2) {
// Bailing2 (Ling 2.0) template
bool has_system = !chat.empty() && std::string(chat[0]->role) == "system";

if (!has_system) {
ss << "<role>SYSTEM</role>detailed thinking off<|role_end|>";
}

for (auto message : chat) {
std::string role(message->role);

if (role == "user") {
role = "HUMAN";
} else {
std::transform(role.begin(), role.end(), role.begin(), ::toupper);
}

ss << "<role>" << role << "</role>" << message->content << "<|role_end|>";
}

if (add_ass) {
ss << "<role>ASSISTANT</role>";
}
3 changes: 1 addition & 2 deletions src/llama-chat.h
@@ -42,8 +42,6 @@ enum llm_chat_template {
LLM_CHAT_TEMPLATE_MEGREZ,
LLM_CHAT_TEMPLATE_YANDEX,
LLM_CHAT_TEMPLATE_BAILING,
LLM_CHAT_TEMPLATE_BAILING_THINK,
LLM_CHAT_TEMPLATE_BAILING2,
LLM_CHAT_TEMPLATE_LLAMA4,
LLM_CHAT_TEMPLATE_SMOLVLM,
LLM_CHAT_TEMPLATE_DOTS1,
Expand All @@ -53,6 +51,7 @@ enum llm_chat_template {
LLM_CHAT_TEMPLATE_KIMI_K2,
LLM_CHAT_TEMPLATE_SEED_OSS,
LLM_CHAT_TEMPLATE_GROK_2,
LLM_CHAT_TEMPLATE_TEUKEN,
LLM_CHAT_TEMPLATE_UNKNOWN,
};
