
Tool call support added #2

Merged
whyisitworking merged 3 commits into main from feat/tool-call on Mar 24, 2026

Conversation

@whyisitworking
Owner

No description provided.

* Updated llama.cpp to `cea560f` and enabled `GGML_BACKEND_DL` with `GGML_CPU` variant support.
* Refactored C++ codebase into `engine`, `session`, `parsers`, and `jni` namespaces with corresponding directory reorganization.
* Introduced `TagParser` and `TokenParser` to handle streaming content and multi-byte UTF-8 sequences more robustly.
* Simplified JNI error handling by moving from custom `LlamaException` throwers to standard exception propagation.
* Renamed `ingestPrompt` to `addUserPrompt` and simplified session API by removing explicit `addSpecial` flags in favor of internal logic.
* Added support for the Nemotron prompt format and updated existing formats (ChatML, Llama3, Gemma3) with corrected suffixes.
* Optimized KV cache management and overflow strategies (Halt, Clear History, Rolling Window).
* Adjusted default `repeatPenalty` to 1.0 in `InferenceConfig`.
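The multi-byte UTF-8 handling above addresses a common streaming pitfall: tokens detokenize to byte sequences that can end mid-codepoint, so emitting each token's bytes directly can split a character across two chunks. A minimal C++ sketch of the hold-back technique (the class name and interface are illustrative, not the PR's actual `TokenParser`):

```cpp
#include <cstddef>
#include <string>

// Illustrative sketch: buffer incomplete trailing UTF-8 bytes between
// streamed chunks, emitting only prefixes that end on a codepoint boundary.
class Utf8StreamAssembler {
public:
    // Feed raw bytes; returns the longest prefix ending on a codepoint boundary.
    std::string feed(const std::string& bytes) {
        pending_ += bytes;
        std::size_t complete = completePrefixLength(pending_);
        std::string out = pending_.substr(0, complete);
        pending_.erase(0, complete);
        return out;
    }
    const std::string& pending() const { return pending_; }

private:
    // Expected sequence length from the UTF-8 lead byte.
    static int sequenceLength(unsigned char lead) {
        if (lead < 0x80) return 1;           // ASCII
        if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
        return 1;                            // invalid lead byte: pass through
    }
    // Length of the prefix consisting only of complete sequences.
    static std::size_t completePrefixLength(const std::string& s) {
        std::size_t i = 0;
        while (i < s.size()) {
            int len = sequenceLength(static_cast<unsigned char>(s[i]));
            if (i + len > s.size()) break; // sequence continues in next chunk
            i += len;
        }
        return i;
    }
    std::string pending_;
};
```

A two-byte character such as "é" (0xC3 0xA9) split across two token boundaries is held back after the first chunk and emitted whole with the second.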
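Of the overflow strategies listed above, Rolling Window is the one that keeps generation going: when the context fills, the oldest unprotected entries are evicted while a prefix (e.g. the system prompt) is preserved. A hedged sketch under those assumptions (all names here are hypothetical, not the library's native implementation; Halt would stop instead, and Clear History would drop everything past the prefix):

```cpp
#include <cstddef>
#include <deque>

// Illustrative Rolling Window overflow: evict the oldest token after a
// protected prefix once capacity is reached, keeping recent context intact.
class RollingWindowCache {
public:
    RollingWindowCache(std::size_t capacity, std::size_t protectedPrefix)
        : capacity_(capacity), protectedPrefix_(protectedPrefix) {}

    // Append a token; returns the number of tokens evicted to make room.
    std::size_t push(int token) {
        std::size_t evicted = 0;
        if (tokens_.size() == capacity_) {
            // Drop the oldest token past the protected prefix.
            tokens_.erase(tokens_.begin() + protectedPrefix_);
            evicted = 1;
        }
        tokens_.push_back(token);
        return evicted;
    }
    std::size_t size() const { return tokens_.size(); }
    int at(std::size_t i) const { return tokens_[i]; }

private:
    std::size_t capacity_, protectedPrefix_;
    std::deque<int> tokens_;
};
```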
…inference pipeline and added support for tool calling.

- **Refactored Core API**: Replaced `LlamaSession` and `LlamaChatSession` implementations with a new architecture centered around `ModelDefinition`, `SessionConfig`, and a parts-based `ChatEvent` hierarchy.
- **New Streaming Pipeline**: Introduced a multi-layered stream processor using `AllocationOptimizedScanner` (DFA-based lexer) to efficiently extract text, thinking blocks, and tool calls from raw token streams.
- **Tool Calling Support**: Added `ToolCallDefinition` and `ToolCallDecorator` to support function calling, including automated tool result injection into the conversation context.
- **Enhanced Prompt Formatting**: Migrated to a decorator-based `PromptFormatter` for decoupled handling of system instructions, thinking blocks, and tool definitions across different model families (ChatML, Llama 3, Mistral, Gemma, Nemotron).
- **Native & Performance**: Updated `llama.cpp` and optimized the GGML backend chooser for ARMv8.2+ (DotProd, FP16) and ARMv9 (SVE/SME) using KleidiAI kernels.
- **Improved Testing**: Added extensive unit tests for the lexing scanner, prompt formatting, and end-to-end chat session logic.
- **Dependency Updates**: Integrated Firebase Analytics and Crashlytics into the demo app and updated Kotlin/Serialization dependencies.
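The streaming extraction described above can be sketched as a small character-level state machine. This is an illustrative stand-in for the `AllocationOptimizedScanner` (whose real interface is not shown in the PR), using `<think>…</think>` delimiters as the example: it classifies streamed text as plain or "thinking" content and buffers partial delimiter matches across chunk boundaries instead of emitting them prematurely.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative DFA-style scanner: splits a character stream into
// (insideThinkBlock, text) segments, matching delimiters incrementally.
class ThinkTagScanner {
public:
    using Segment = std::pair<bool, std::string>; // {inside <think>, text}

    std::vector<Segment> feed(const std::string& chunk) {
        std::vector<Segment> out;
        for (char c : chunk) step(c, out);
        flushText(out); // emit completed text; partial tag matches stay buffered
        return out;
    }

private:
    void step(char c, std::vector<Segment>& out) {
        const std::string& tag = inside_ ? kClose : kOpen;
        if (c == tag[matched_]) {
            ++matched_;
            if (matched_ == tag.size()) { // full delimiter seen
                flushText(out);
                inside_ = !inside_;
                matched_ = 0;
            }
            return;
        }
        // Mismatch: the partially matched prefix was ordinary text after all.
        text_.append(tag, 0, matched_);
        matched_ = 0;
        if (c == tag[0]) { matched_ = 1; } else { text_.push_back(c); }
    }
    void flushText(std::vector<Segment>& out) {
        if (!text_.empty()) { out.push_back({inside_, std::move(text_)}); text_.clear(); }
    }
    static const std::string kOpen, kClose;
    bool inside_ = false;
    std::size_t matched_ = 0; // chars of the current delimiter matched so far
    std::string text_;
};
const std::string ThinkTagScanner::kOpen = "<think>";
const std::string ThinkTagScanner::kClose = "</think>";
```

A production scanner would track several delimiter kinds (thinking blocks, tool-call tags) in one transition table, but the buffering principle is the same: never emit bytes that might still be the start of a delimiter.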
…overview

- Added a "Declarative Inference Pipeline" section explaining the reactive token processing engine and DFA lexer.
- Updated the "Quick Start" and code examples to reflect the new `flatMapResource` and `filterSuccess` flow composition patterns.
- Expanded the "Built-In Prompt Formats" table with specific protocols and recommended models (Gemma, Llama 3, ChatML, DeepSeek-R1, etc.).
- Improved documentation for `SessionConfig` and `InferenceConfig`, detailing sampling parameters like `minP` and `repeatPenalty`.
- Added a dedicated section for "Thinking Blocks" to explain reasoning model support (DeepSeek-R1/QwQ).
- Refined the "Architecture" visualization and "Roadmap" items.
- Updated installation instructions and JitPack repository configuration.
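For the `minP` parameter documented above: min-p sampling keeps only tokens whose probability is at least `minP` times that of the most likely token, then renormalizes. A minimal sketch of that filter (`minPFilter` is an illustrative helper, not the library's code). As an aside, the `repeatPenalty` default of 1.0 mentioned earlier effectively disables the repetition penalty, since penalizing by a factor of 1 leaves logits unchanged.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative min-p filter: zero out tokens below minP * max(probs)
// and renormalize the surviving probability mass.
std::vector<double> minPFilter(const std::vector<double>& probs, double minP) {
    double maxProb = *std::max_element(probs.begin(), probs.end());
    double threshold = minP * maxProb;
    std::vector<double> kept(probs.size(), 0.0);
    double total = 0.0;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        if (probs[i] >= threshold) {
            kept[i] = probs[i];
            total += probs[i];
        }
    }
    for (double& p : kept) p /= total; // renormalize surviving mass
    return kept;
}
```

Unlike top-p, the cutoff scales with the model's confidence: a sharply peaked distribution prunes aggressively, while a flat one keeps more candidates.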
@whyisitworking whyisitworking self-assigned this Mar 24, 2026
@whyisitworking whyisitworking merged commit 6ee0a87 into main Mar 24, 2026
1 check passed
@whyisitworking whyisitworking deleted the feat/tool-call branch March 24, 2026 15:14
