Merged
67 commits
d91fd7c
fixed issue: crashed on deleting chat
rifkybujana Jan 15, 2025
e5975a2
- Added persistent KV Cache method
rifkybujana Jan 22, 2025
4728dbf
added qwen 2.5 code 0.5b - 14b, and added qwen 2.5 14b models
rifkybujana Jan 22, 2025
6a78b30
Merge branch 'main' into dev
rifkybujana Jan 22, 2025
74a586f
merge with main
rifkybujana Jan 22, 2025
8daed41
fix model won't generate if we switch the model on a non empty chat
rifkybujana Jan 22, 2025
ba9446c
added deepseek r1
rifkybujana Jan 23, 2025
535bd81
removed deepseek unsupported
rifkybujana Jan 23, 2025
812cd45
Revert "added deepseek r1"
rifkybujana Jan 23, 2025
ccfc310
Revert "removed deepseek unsupported"
rifkybujana Jan 23, 2025
69268ad
added deepseek r1 support
rifkybujana Jan 23, 2025
517bd1f
added base markdown rendering
rifkybujana Jan 26, 2025
4135ddf
added ImGuiColorTextEdit for handling code rendering on markdown
rifkybujana Jan 26, 2025
847c726
added markdown renderer
rifkybujana Jan 28, 2025
e9e827a
added modified imgui_md
rifkybujana Jan 28, 2025
66c18f2
added cancel button and fix the model card duplication issue
rifkybujana Feb 1, 2025
e8bde60
fix last selected model issue
rifkybujana Feb 1, 2025
60b686e
added tps
rifkybujana Feb 1, 2025
c1ba124
don't show empty thought
rifkybujana Feb 1, 2025
899a8c2
add automation to detect number of thread to use
rifkybujana Feb 1, 2025
67c2a3a
add fallback to failed model loading
rifkybujana Feb 1, 2025
c813e6b
Merge branch 'main' into dev
rifkybujana Feb 1, 2025
42fb04a
fix model loaded synchronization on start and code rendering block glitch
rifkybujana Feb 1, 2025
d73ab5e
remove text debugging
rifkybujana Feb 1, 2025
dbfb365
refactored history sidebar
rifkybujana Feb 5, 2025
65f45b2
refactored preset sidebar
rifkybujana Feb 5, 2025
5d53b90
fix preset selection
rifkybujana Feb 7, 2025
0ea6802
re-refactor chat history sidebar
rifkybujana Feb 7, 2025
c00e99f
refactor model manager
rifkybujana Feb 7, 2025
83c12cd
rename constant namespace
rifkybujana Feb 7, 2025
4a25059
refactor chat history render
rifkybujana Feb 7, 2025
c00d90a
refactor chat_section
rifkybujana Feb 7, 2025
b58f1db
refactor main code and fixed preset selection duplication bug
rifkybujana Feb 8, 2025
90ebc27
fixed kv cache deletion and renaming
rifkybujana Feb 8, 2025
090a986
moved tab manager to ui/tab_manager.hpp
rifkybujana Feb 8, 2025
36af3a1
added stop generation button
rifkybujana Feb 10, 2025
ea59aaa
refactor input field rendering
rifkybujana Feb 10, 2025
73d2e71
added regenerate button
rifkybujana Feb 10, 2025
141aaf6
added regenerate functionality
rifkybujana Feb 10, 2025
88f0b90
stop all jobs on exit
rifkybujana Feb 10, 2025
8a20091
track job ids within model manager
rifkybujana Feb 10, 2025
d564491
restyle loading bar
rifkybujana Feb 10, 2025
9a5bfa8
[workaround] fixed the delete model button didn't work
rifkybujana Feb 10, 2025
19d35c1
handle model loading asynchronously
rifkybujana Feb 11, 2025
d1b6d30
refactored progress bar widget
rifkybujana Feb 11, 2025
8d31c98
fixed can't find IndeterminedProgressBar error, and progress bar pos…
rifkybujana Feb 11, 2025
abeda36
add unload model functionality
rifkybujana Feb 12, 2025
c1d4967
fixed chat code block ui glitch
rifkybujana Feb 12, 2025
88f38f5
fixed bug model trying to regenerate even if no model loaded, and add…
rifkybujana Feb 12, 2025
7c9c368
added context shifting on the engine
rifkybujana Feb 13, 2025
67bcfc7
if max_new_token set to be 0, don't stop until eos (infinitely generate)
rifkybujana Feb 13, 2025
8df4d2c
update engine
rifkybujana Feb 13, 2025
626967e
update the kv cache loading for context shifting
rifkybujana Feb 13, 2025
ac70696
Merge branch 'main' into dev
rifkybujana Feb 13, 2025
c09aa85
fix delete chat doesn't have to pass model name and variant name
rifkybujana Feb 13, 2025
9b6f5f0
Merge branch 'dev' of https://github.com/genta-technology/kolosal int…
rifkybujana Feb 13, 2025
bdd81f3
fixed merge with main branch
rifkybujana Feb 13, 2025
9139b34
Merge branch 'main' into dev
rifkybujana Feb 13, 2025
5eb446d
refactor model generation callback
rifkybujana Mar 2, 2025
959fcd1
added kolosal server library
rifkybujana Mar 3, 2025
6456ce7
model server tab
rifkybujana Mar 5, 2025
c49bb59
added server logs, model selection, tab selection buttons, and loadin…
rifkybujana Mar 6, 2025
f975644
added server functionality
rifkybujana Mar 7, 2025
96ce0d4
added reload model buttons
rifkybujana Mar 7, 2025
0b6eb1c
update installer version
rifkybujana Mar 7, 2025
806d6dd
fixed kv cache sequence id
rifkybujana Mar 8, 2025
8e5d4c4
merge with main
rifkybujana Mar 8, 2025
Binary file modified external/genta-personal/bin/InferenceEngineLib.dll
Binary file modified external/genta-personal/bin/InferenceEngineLibVulkan.dll
2 changes: 2 additions & 0 deletions external/genta-personal/include/job.h
@@ -27,6 +27,8 @@ struct Job {
std::atomic<bool> cancelRequested{ false };
CompletionParameters params;

int seqId;

bool isDecodingPrompt = true;

int n_past;
2 changes: 2 additions & 0 deletions external/genta-personal/include/types.h
@@ -17,6 +17,7 @@ struct CompletionParameters
float topP = 0.5f;
bool streaming = false;
std::string kvCacheFilePath = "";
int seqId = -1;

bool isValid() const;
};
@@ -43,6 +44,7 @@ struct ChatCompletionParameters
float topP = 0.5f;
bool streaming = false;
std::string kvCacheFilePath = "";
int seqId = -1;

bool isValid() const;
};
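The `seqId` field added here defaults to `-1` (no dedicated KV-cache sequence) and is set to the chat's id wherever a KV-cache path is resolved, so each chat maps to its own sequence in the engine. A minimal sketch of that pattern, using a simplified stand-in for `CompletionParameters` (the real struct also carries sampling fields like `topP` and `streaming`) and a hypothetical `bindToChat` helper:

```cpp
#include <cassert>
#include <string>

// Simplified mirror of the CompletionParameters fields touched by this PR.
struct CompletionParams {
    std::string kvCacheFilePath = "";
    int seqId = -1;  // -1 means "no dedicated KV-cache sequence"
};

// Hypothetical helper mirroring what model_manager.hpp does once a
// KV-cache path is resolved: bind the completion to the chat's cache
// file and use the chat id as the engine sequence id.
CompletionParams bindToChat(int chatId, const std::string& cachePath) {
    CompletionParams p;
    p.kvCacheFilePath = cachePath;
    p.seqId = chatId;  // the diff assigns currentChat.id here
    return p;
}
```

Keeping `-1` as the sentinel lets the engine distinguish one-off completions from chat-bound jobs without an extra flag.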
Binary file modified external/genta-personal/lib/InferenceEngineLib.lib
Binary file modified external/genta-personal/lib/InferenceEngineLibVulkan.lib
4 changes: 4 additions & 0 deletions include/chat/chat_manager.hpp
@@ -94,24 +94,28 @@ namespace Chat
return std::async(std::launch::async, [this, newName]() {
if (!validateChatName(newName))
{
std::cerr << "[ChatManager] [ERROR] " << newName << " is not valid" << std::endl;
return false;
}

std::unique_lock<std::shared_mutex> lock(m_mutex);

if (!m_currentChatName)
{
std::cerr << "[ChatManager] No current chat selected.\n";
return false;
}

if (m_chatNameToIndex.find(newName) != m_chatNameToIndex.end())
{
std::cerr << "[ChatManager] Chat with name " << newName << " already exists.\n";
return false;
}

size_t currentIdx = m_currentChatIndex;
if (currentIdx >= m_chats.size())
{
std::cerr << "[ChatManager] Invalid chat index: " << currentIdx << std::endl;
return false;
}

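The rename path above bails out early when `validateChatName` rejects the new name, now logging why. The body of `validateChatName` is not part of this diff; purely as an illustration, a name that doubles as a save-file name might be checked along these lines:

```cpp
#include <cassert>
#include <string>

// Illustrative only: the real validateChatName in chat_manager.hpp is not
// shown in this PR. A plausible set of checks for a chat name that is also
// used as a filename on disk.
bool validateChatName(const std::string& name) {
    if (name.empty() || name.size() > 128) return false;
    for (char c : name) {
        // Reject path separators and control characters.
        if (c == '/' || c == '\\' || static_cast<unsigned char>(c) < 0x20)
            return false;
    }
    return true;
}
```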
69 changes: 39 additions & 30 deletions include/model/model_manager.hpp
@@ -321,6 +321,7 @@ namespace Model
);
if (kvCachePathOpt.has_value()) {
completionParams.kvCacheFilePath = kvCachePathOpt.value().string();
completionParams.seqId = currentChat.id;
}

return completionParams;
@@ -363,6 +364,7 @@ namespace Model
);
if (kvCachePathOpt.has_value()) {
completionParams.kvCacheFilePath = kvCachePathOpt.value().string();
completionParams.seqId = currentChat.id;
}

return completionParams;
@@ -438,7 +440,7 @@ namespace Model
return result;
}

CompletionResult chatCompleteSync(const ChatCompletionParameters& params)
CompletionResult chatCompleteSync(const ChatCompletionParameters& params, const bool saveChat = true)
{
{
std::shared_lock<std::shared_mutex> lock(m_mutex);
@@ -475,8 +477,6 @@ namespace Model
m_jobIds.push_back(jobId);
}

auto& chatManager = Chat::ChatManager::getInstance();

// Wait for the job to complete
m_inferenceEngine->waitForJob(jobId);

@@ -496,22 +496,26 @@
}

// Save the chat history
auto chatName = chatManager.getChatNameByJobId(jobId);
if (!chatManager.saveChat(chatName))
if (saveChat)
{
std::cerr << "[ModelManager] Failed to save chat: " << chatName << std::endl;
}
auto& chatManager = Chat::ChatManager::getInstance();
auto chatName = chatManager.getChatNameByJobId(jobId);
if (!chatManager.saveChat(chatName))
{
std::cerr << "[ModelManager] Failed to save chat: " << chatName << std::endl;
}

// Reset jobid tracking on chat manager
if (!chatManager.removeJobId(jobId))
{
std::cerr << "[ModelManager] Failed to remove job id from chat manager.\n";
// Reset jobid tracking on chat manager
if (!chatManager.removeJobId(jobId))
{
std::cerr << "[ModelManager] Failed to remove job id from chat manager.\n";
}
}

return result;
}

int startCompletionJob(const CompletionParameters& params, std::function<void(const std::string&, const float, const int, const bool)> streamingCallback)
int startCompletionJob(const CompletionParameters& params, std::function<void(const std::string&, const float, const int, const bool)> streamingCallback, const bool saveChat = true)
{
{
std::shared_lock<std::shared_mutex> lock(m_mutex);
@@ -539,7 +543,7 @@ namespace Model
m_jobIds.push_back(jobId);
}

std::thread([this, jobId, streamingCallback]() {
std::thread([this, jobId, streamingCallback, saveChat]() {
// Poll while job is running or until the engine says it's done
while (true)
{
@@ -569,17 +573,20 @@ namespace Model

// Reset jobid tracking on chat manager
{
if (!Chat::ChatManager::getInstance().removeJobId(jobId))
if (saveChat)
{
std::cerr << "[ModelManager] Failed to remove job id from chat manager.\n";
if (!Chat::ChatManager::getInstance().removeJobId(jobId))
{
std::cerr << "[ModelManager] Failed to remove job id from chat manager.\n";
}
}
}
}).detach();

return jobId;
}

int startChatCompletionJob(const ChatCompletionParameters& params, std::function<void(const std::string&, const float, const int, const bool)> streamingCallback)
int startChatCompletionJob(const ChatCompletionParameters& params, std::function<void(const std::string&, const float, const int, const bool)> streamingCallback, const bool saveChat = true)
{
{
std::shared_lock<std::shared_mutex> lock(m_mutex);
@@ -607,10 +614,7 @@ namespace Model
m_jobIds.push_back(jobId);
}

std::thread([this, jobId, streamingCallback]() {
// Poll while job is running or until the engine says it's done
auto& chatManager = Chat::ChatManager::getInstance();

std::thread([this, jobId, streamingCallback, saveChat]() {
while (true)
{
if (this->m_inferenceEngine->hasJobError(jobId)) break;
@@ -637,20 +641,25 @@
m_jobIds.erase(std::remove(m_jobIds.begin(), m_jobIds.end(), jobId), m_jobIds.end());
}

// Save the chat history
if (saveChat)
{
auto chatName = chatManager.getChatNameByJobId(jobId);
if (!chatManager.saveChat(chatName))
auto& chatManager = Chat::ChatManager::getInstance();

// Save the chat history
{
std::cerr << "[ModelManager] Failed to save chat: " << chatName << std::endl;
auto chatName = chatManager.getChatNameByJobId(jobId);
if (!chatManager.saveChat(chatName))
{
std::cerr << "[ModelManager] Failed to save chat: " << chatName << std::endl;
}
}
}

// Reset jobid tracking on chat manager
{
if (!chatManager.removeJobId(jobId))
// Reset jobid tracking on chat manager
{
std::cerr << "[ModelManager] Failed to remove job id from chat manager.\n";
if (!chatManager.removeJobId(jobId))
{
std::cerr << "[ModelManager] Failed to remove job id from chat manager.\n";
}
}
}
}).detach();
@@ -753,7 +762,7 @@ namespace Model
params.streaming = false;

// Invoke the synchronous chat completion method.
CompletionResult result = chatCompleteSync(params);
CompletionResult result = chatCompleteSync(params, false);

// Map the engine’s result to our ChatCompletionResponse.
ChatCompletionResponse response = convertToChatResponse(request, result);
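The new `saveChat` parameter threaded through `chatCompleteSync`, `startCompletionJob`, and `startChatCompletionJob` lets internal generations (the server path above passes `false`, as does title generation) skip chat persistence and job-id bookkeeping entirely. A sketch of the post-job control flow this PR introduces, with `std::function` callbacks standing in for the `Chat::ChatManager` calls:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Sketch of the post-completion bookkeeping added in this PR: persisting the
// chat and clearing the job-id mapping now happen only when saveChat is true,
// so auxiliary generations never touch chat history.
// `save` and `removeJobId` stand in for the Chat::ChatManager calls.
std::string finishJob(int jobId, bool saveChat,
                      const std::function<bool(int)>& save,
                      const std::function<bool(int)>& removeJobId) {
    std::string log;
    if (saveChat) {
        if (!save(jobId))        log += "save-failed;";
        if (!removeJobId(jobId)) log += "remove-failed;";
    }
    return log.empty() ? "ok" : log;
}
```

Guarding both calls behind one flag keeps a title-generation job from overwriting the history of the chat it is naming.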
75 changes: 75 additions & 0 deletions include/ui/chat/chat_window.hpp
@@ -279,6 +279,73 @@ class ChatWindow {
}
}

void generateChatTitle(const std::string& firstUserMessage) {
auto& modelManager = Model::ModelManager::getInstance();
auto& chatManager = Chat::ChatManager::getInstance();

// Create parameters for title generation
ChatCompletionParameters titleParams;

// Add a system prompt instructing the model to generate a short, descriptive title
const std::string titlePrompt = firstUserMessage +
"\n-----\n"
"Ignore all previous instructions. The preceding text is a conversation thread that needs a concise but descriptive 3 to 5 word title in natural English so that readers will be able to easily find it again. Do not add any quotation marks, formatting, or any symbol to the title. Respond only with the title text.";

// Add the title prompt as a user message
titleParams.messages.push_back({ "user", titlePrompt });

// Configure title generation parameters
titleParams.maxNewTokens = 20; // Short title only needs few tokens
titleParams.temperature = 0.7; // Slightly creative but not too random
titleParams.streaming = false; // No need for streaming for a quick title

// Use a separate thread to avoid blocking UI
std::thread([titleParams]() {
auto& modelManager = Model::ModelManager::getInstance();
auto& chatManager = Chat::ChatManager::getInstance();

// Generate the title (synchronous call)
CompletionResult titleResult = modelManager.chatCompleteSync(titleParams, false);

if (!titleResult.text.empty()) {
// Clean up the generated title
std::string newTitle = titleResult.text;

// Trim whitespace and quotes
// Remove symbols and trim whitespace; if the title contains the text "Title:", remove it
auto trim = [](std::string& s) {
// Remove "Title:" if present
const std::string titlePrefix = "Title:";
size_t pos = s.find(titlePrefix);
if (pos != std::string::npos) {
s.erase(pos, titlePrefix.length());
}

// Remove symbols except '+' and '-'
s.erase(std::remove_if(s.begin(), s.end(), [](char c) {
return std::ispunct(static_cast<unsigned char>(c)) && c != '+' && c != '-';
}), s.end());

// Trim whitespace
s.erase(0, s.find_first_not_of(" \t\n\r"));
if (!s.empty()) {
s.erase(s.find_last_not_of(" \t\n\r") + 1);
}
};

trim(newTitle);

// Apply the new title if it's valid
if (!newTitle.empty()) {
if (!chatManager.renameCurrentChat(newTitle).get())
{
std::cerr << "[ChatSection] Failed to rename chat to: " << newTitle << "\n";
}
}
}
}).detach();
}

// Render the row of buttons that allow the user to switch models or clear chat.
void renderChatFeatureButtons(float baseX, float baseY) {
Model::ModelManager& modelManager = Model::ModelManager::getInstance();
@@ -321,6 +388,9 @@ class ChatWindow {

auto& currentChat = currentChatOpt.value();

// Check if this is the first message in the chat
bool isFirstMessage = currentChat.messages.empty();

// Append the user message.
Chat::Message userMessage;
userMessage.id = static_cast<int>(currentChat.messages.size()) + 1;
@@ -339,6 +409,11 @@
}

modelManager.setModelGenerationInProgress(true);

// If this is the first message, generate a title for the chat
if (isFirstMessage) {
generateChatTitle(message);
}
}

InputFieldConfig createInputFieldConfig(
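The `trim` lambda inside `generateChatTitle` does three things: drop a leading `Title:` the model sometimes emits, strip punctuation except `+` and `-`, and trim surrounding whitespace. The same logic as a standalone function (a restatement of the lambda above, not new behavior):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Standalone version of the title-cleanup lambda in generateChatTitle:
// remove a "Title:" prefix, strip punctuation except '+' and '-', and
// trim leading/trailing whitespace.
std::string cleanTitle(std::string s) {
    const std::string prefix = "Title:";
    if (auto pos = s.find(prefix); pos != std::string::npos)
        s.erase(pos, prefix.length());

    // Cast to unsigned char before std::ispunct to avoid UB on negative chars.
    s.erase(std::remove_if(s.begin(), s.end(), [](char c) {
        return std::ispunct(static_cast<unsigned char>(c)) && c != '+' && c != '-';
    }), s.end());

    s.erase(0, s.find_first_not_of(" \t\n\r"));
    if (!s.empty())
        s.erase(s.find_last_not_of(" \t\n\r") + 1);
    return s;
}
```

Keeping `+` and `-` preserves names like "C++ tips" while still scrubbing stray quotes and markdown the 0.7-temperature generation can produce.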
28 changes: 28 additions & 0 deletions models/phi-4-14b.json
@@ -0,0 +1,28 @@
{
"name": "Phi 4 14B",
"author": "Microsoft",
"fullPrecision": {
"type": "Full Precision",
"path": "models/phi-4-14b/fp16/phi-4-F16.gguf",
"downloadLink": "https://huggingface.co/kolosal/phi-4/resolve/main/phi-4-F16.gguf",
"isDownloaded": false,
"downloadProgress": 0.0,
"lastSelected": 0
},
"quantized8Bit": {
"type": "8-bit Quantized",
"path": "models/phi-4-14b/int8/phi-4-Q8_0.gguf",
"downloadLink": "https://huggingface.co/kolosal/phi-4/resolve/main/phi-4-Q8_0.gguf",
"isDownloaded": false,
"downloadProgress": 0.0,
"lastSelected": 0
},
"quantized4Bit": {
"type": "4-bit Quantized",
"path": "models/phi-4-14b/int4/phi-4-Q4_K_M.gguf",
"downloadLink": "https://huggingface.co/kolosal/phi-4/resolve/main/phi-4-Q4_K_M.gguf",
"isDownloaded": false,
"downloadProgress": 0.0,
"lastSelected": 0
}
}
28 changes: 28 additions & 0 deletions models/phi-4-mini-3.8b.json
@@ -0,0 +1,28 @@
{
"name": "Phi 4 Mini 3.8B",
"author": "Microsoft",
"fullPrecision": {
"type": "Full Precision",
"path": "models/phi-4-mini-3.8b/fp16/Phi-4-mini-instruct.BF16.gguf",
"downloadLink": "https://huggingface.co/kolosal/phi-4-mini/resolve/main/Phi-4-mini-instruct.BF16.gguf",
"isDownloaded": false,
"downloadProgress": 0.0,
"lastSelected": 0
},
"quantized8Bit": {
"type": "8-bit Quantized",
"path": "models/phi-4-mini-3.8b/int8/Phi-4-mini-instruct.Q8_0.gguf",
"downloadLink": "https://huggingface.co/kolosal/phi-4-mini/resolve/main/Phi-4-mini-instruct.Q8_0.gguf",
"isDownloaded": false,
"downloadProgress": 0.0,
"lastSelected": 0
},
"quantized4Bit": {
"type": "4-bit Quantized",
"path": "models/phi-4-mini-3.8b/int4/Phi-4-mini-instruct-Q4_K_M.gguf",
"downloadLink": "https://huggingface.co/kolosal/phi-4-mini/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf",
"isDownloaded": false,
"downloadProgress": 0.0,
"lastSelected": 0
}
}