Tool call support added by whyisitworking · Pull Request #2 · whyisitworking/llama-bro

whyisitworking · 2026-03-24T15:13:53Z

No description provided.

* Updated llama.cpp to `cea560f` and enabled `GGML_BACKEND_DL` with `GGML_CPU` variant support.
* Refactored C++ codebase into `engine`, `session`, `parsers`, and `jni` namespaces with corresponding directory reorganization.
* Introduced `TagParser` and `TokenParser` to handle streaming content and multi-byte UTF-8 sequences more robustly.
* Simplified JNI error handling by moving from custom `LlamaException` throwers to standard exception propagation.
* Renamed `ingestPrompt` to `addUserPrompt` and simplified session API by removing explicit `addSpecial` flags in favor of internal logic.
* Added support for the Nemotron prompt format and updated existing formats (ChatML, Llama3, Gemma3) with corrected suffixes.
* Optimized KV cache management and overflow strategies (Halt, Clear History, Rolling Window).
* Adjusted default `repeatPenalty` to 1.0 in `InferenceConfig`.

…inference pipeline and added support for tool calling.

- **Refactored Core API**: Replaced `LlamaSession` and `LlamaChatSession` implementations with a new architecture centered around `ModelDefinition`, `SessionConfig`, and a parts-based `ChatEvent` hierarchy.
- **New Streaming Pipeline**: Introduced a multi-layered stream processor using `AllocationOptimizedScanner` (DFA-based lexer) to efficiently extract text, thinking blocks, and tool calls from raw token streams.
- **Tool Calling Support**: Added `ToolCallDefinition` and `ToolCallDecorator` to support function calling, including automated tool result injection into the conversation context.
- **Enhanced Prompt Formatting**: Migrated to a decorator-based `PromptFormatter` for decoupled handling of system instructions, thinking blocks, and tool definitions across different model families (ChatML, Llama 3, Mistral, Gemma, Nemotron).
- **Native & Performance**: Updated `llama.cpp` and optimized the GGML backend chooser for ARMv8.2+ (DotProd, FP16) and ARMv9 (SVE/SME) using KleidiAI kernels.
- **Improved Testing**: Added extensive unit tests for the lexing scanner, prompt formatting, and end-to-end chat session logic.
- **Dependency Updates**: Integrated Firebase Analytics and Crashlytics into the demo app and updated Kotlin/Serialization dependencies.
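The streaming pipeline item above describes a lexer that separates text from thinking blocks while tokens are still arriving, which means a tag may be split across chunks. The following is a minimal sketch of that idea as a small state machine, written in C++ to match the project's native layer. `ThinkTagScanner`, `Part`, and `PartKind` are hypothetical names, and the `<think>`/`</think>` tag pair is an assumption; this is not the PR's `AllocationOptimizedScanner` and makes no attempt to avoid allocations.

```cpp
#include <string>
#include <string_view>
#include <vector>

enum class PartKind { Text, Thinking };

struct Part {
    PartKind kind;
    std::string content;
};

// Hypothetical scanner: splits a chunked stream into text and thinking parts.
class ThinkTagScanner {
public:
    // Consumes one chunk; completed output is appended to `out`.
    void feed(std::string_view chunk, std::vector<Part>& out) {
        for (char c : chunk) step(c, out);
    }

    // Emits a held-back partial tag as plain content at end of stream.
    void finish(std::vector<Part>& out) {
        emit(held_, out);
        held_.clear();
    }

private:
    static constexpr std::string_view kOpen = "<think>";
    static constexpr std::string_view kClose = "</think>";

    void step(char c, std::vector<Part>& out) {
        const std::string_view tag = inThinking_ ? kClose : kOpen;
        held_.push_back(c);
        if (tag.substr(0, held_.size()) == held_) {
            // Still a prefix of the expected tag: keep holding.
            if (held_.size() == tag.size()) {
                inThinking_ = !inThinking_;  // full tag seen: switch state
                held_.clear();
            }
            return;
        }
        // Mismatch: release the held prefix as content, then reconsider `c`.
        // '<' occurs only at position 0 of both tags, so restarting from `c`
        // is sufficient and no general failure function is needed.
        held_.pop_back();
        emit(held_, out);
        held_.clear();
        if (c == tag.front()) held_.push_back(c);
        else emit(std::string_view(&c, 1), out);
    }

    void emit(std::string_view text, std::vector<Part>& out) const {
        if (text.empty()) return;
        const PartKind kind = inThinking_ ? PartKind::Thinking : PartKind::Text;
        if (!out.empty() && out.back().kind == kind) out.back().content.append(text);
        else out.push_back(Part{kind, std::string(text)});
    }

    bool inThinking_ = false;
    std::string held_;  // possible tag prefix not yet confirmed or rejected
};
```

Feeding the chunks `"Hi <th"`, `"ink>plan</thi"`, and `"nk>done"` produces three parts: text `"Hi "`, thinking `"plan"`, and text `"done"`. Tool-call extraction could follow the same pattern with an additional tag pair and state, though the delimiters each prompt format actually uses are not stated in this PR summary.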

…overview

- Added a "Declarative Inference Pipeline" section explaining the reactive token processing engine and DFA lexer.
- Updated the "Quick Start" and code examples to reflect the new `flatMapResource` and `filterSuccess` flow composition patterns.
- Expanded the "Built-In Prompt Formats" table with specific protocols and recommended models (Gemma, Llama 3, ChatML, DeepSeek-R1, etc.).
- Improved documentation for `SessionConfig` and `InferenceConfig`, detailing sampling parameters like `minP` and `repeatPenalty`.
- Added a dedicated section for "Thinking Blocks" to explain reasoning model support (DeepSeek-R1/QwQ).
- Refined the "Architecture" visualization and "Roadmap" items.
- Updated installation instructions and JitPack repository configuration.
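For readers unfamiliar with the two sampling parameters named above, here is a rough sketch of what they conventionally mean in llama.cpp-style samplers. The function names `minPFilter` and `applyRepeatPenalty` are hypothetical, and this is neither the library's code nor the project's. It assumes min-p keeps tokens whose probability is at least `minP` times the top probability, and that a repeat penalty divides positive logits and multiplies non-positive ones for tokens already seen.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Returns the indices of tokens that survive min-p filtering:
// those with probability >= minP * (highest probability).
std::vector<std::size_t> minPFilter(const std::vector<float>& probs, float minP) {
    std::vector<std::size_t> kept;
    if (probs.empty()) return kept;
    const float threshold = minP * *std::max_element(probs.begin(), probs.end());
    for (std::size_t i = 0; i < probs.size(); ++i) {
        if (probs[i] >= threshold) kept.push_back(i);
    }
    return kept;
}

// Lowers the logits of previously seen tokens in place.
// A penalty of 1.0 leaves every logit unchanged.
void applyRepeatPenalty(std::vector<float>& logits,
                        const std::unordered_set<std::size_t>& seen,
                        float penalty) {
    for (std::size_t id : seen) {
        if (id >= logits.size()) continue;  // ignore out-of-range token ids
        logits[id] = logits[id] > 0.0f ? logits[id] / penalty
                                       : logits[id] * penalty;
    }
}
```

Under this definition a `repeatPenalty` of 1.0 is a no-op, so the new default in `InferenceConfig` would effectively turn the penalty off unless a caller raises it.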

whyisitworking added 3 commits March 21, 2026 19:49

whyisitworking self-assigned this Mar 24, 2026

whyisitworking merged commit 6ee0a87 into main Mar 24, 2026
1 check passed

whyisitworking deleted the feat/tool-call branch March 24, 2026 15:14
