Merged
* Updated llama.cpp to `cea560f` and enabled `GGML_BACKEND_DL` with `GGML_CPU` variant support.
* Refactored C++ codebase into `engine`, `session`, `parsers`, and `jni` namespaces with corresponding directory reorganization.
* Introduced `TagParser` and `TokenParser` to handle streaming content and multi-byte UTF-8 sequences more robustly.
* Simplified JNI error handling by moving from custom `LlamaException` throwers to standard exception propagation.
* Renamed `ingestPrompt` to `addUserPrompt` and simplified session API by removing explicit `addSpecial` flags in favor of internal logic.
* Added support for the Nemotron prompt format and updated existing formats (ChatML, Llama3, Gemma3) with corrected suffixes.
* Optimized KV cache management and overflow strategies (Halt, Clear History, Rolling Window).
* Adjusted default `repeatPenalty` to 1.0 in `InferenceConfig`.
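The multi-byte UTF-8 handling mentioned for `TokenParser` can be illustrated with a minimal sketch. This is not the PR's implementation: `completeUtf8Prefix` and `Utf8StreamBuffer` are hypothetical names chosen for illustration, and the code assumes C++17. It shows the general idea that a token's bytes may end in the middle of a code point, so the incomplete tail is held back until the next token arrives.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not from the PR): returns how many leading bytes of
// `s` end on a complete UTF-8 sequence. Only the trailing sequence is
// inspected; malformed input is passed through unchanged.
inline std::size_t completeUtf8Prefix(std::string_view s) {
    const std::size_t n = s.size();
    std::size_t i = n;
    std::size_t back = 0;
    // Walk back over at most three continuation bytes (10xxxxxx).
    while (i > 0 && back < 3 &&
           (static_cast<unsigned char>(s[i - 1]) & 0xC0) == 0x80) {
        --i;
        ++back;
    }
    if (i == 0) return n;  // empty, or continuation bytes only: pass through

    const unsigned char lead = static_cast<unsigned char>(s[i - 1]);
    std::size_t need = 0;
    if ((lead & 0x80) == 0x00) need = 1;       // 0xxxxxxx
    else if ((lead & 0xE0) == 0xC0) need = 2;  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) need = 3;  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) need = 4;  // 11110xxx
    else return n;                             // invalid lead: pass through

    const std::size_t have = n - (i - 1);      // bytes present in last sequence
    return have >= need ? n : i - 1;
}

// Hypothetical streaming buffer: emits only complete code points and keeps
// a truncated trailing sequence until more bytes arrive.
class Utf8StreamBuffer {
public:
    std::string push(std::string_view bytes) {
        pending_.append(bytes);
        const std::size_t cut = completeUtf8Prefix(pending_);
        std::string out = pending_.substr(0, cut);
        pending_.erase(0, cut);
        return out;
    }

private:
    std::string pending_;
};
```

Feeding the three bytes of `€` (`E2 82 AC`) across two `push` calls would return the character only on the second call, so a consumer such as a JNI string conversion never sees a partial sequence.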
…inference pipeline and added support for tool calling.
- **Refactored Core API**: Replaced `LlamaSession` and `LlamaChatSession` implementations with a new architecture centered around `ModelDefinition`, `SessionConfig`, and a parts-based `ChatEvent` hierarchy.
- **New Streaming Pipeline**: Introduced a multi-layered stream processor using `AllocationOptimizedScanner` (DFA-based lexer) to efficiently extract text, thinking blocks, and tool calls from raw token streams.
- **Tool Calling Support**: Added `ToolCallDefinition` and `ToolCallDecorator` to support function calling, including automated tool result injection into the conversation context.
- **Enhanced Prompt Formatting**: Migrated to a decorator-based `PromptFormatter` for decoupled handling of system instructions, thinking blocks, and tool definitions across different model families (ChatML, Llama 3, Mistral, Gemma, Nemotron).
- **Native & Performance**: Updated `llama.cpp` and optimized the GGML backend chooser for ARMv8.2+ (DotProd, FP16) and ARMv9 (SVE/SME) using KleidiAI kernels.
- **Improved Testing**: Added extensive unit tests for the lexing scanner, prompt formatting, and end-to-end chat session logic.
- **Dependency Updates**: Integrated Firebase Analytics and Crashlytics into the demo app and updated Kotlin/Serialization dependencies.
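As a rough illustration of what a streaming lexer for thinking blocks has to cope with, the sketch below separates `<think>…</think>` content from visible text even when a tag is split across token chunks. It is a deliberately simplified state machine, not the `AllocationOptimizedScanner` from this PR: the class name `ThinkTagSplitter`, the fixed tag strings, and the accumulate-into-strings interface are all assumptions made for the example, and C++17 is assumed.

```cpp
#include <string>
#include <string_view>

// Hypothetical, simplified splitter (not the PR's scanner). Characters that
// could still turn out to be part of the expected tag are held back; once
// they can no longer match, they are released to the current output.
class ThinkTagSplitter {
public:
    void feed(std::string_view chunk) {
        for (char c : chunk) {
            held_.push_back(c);
            const std::string_view tag = inside_ ? kClose : kOpen;
            if (held_ == tag) {          // full tag seen: switch state
                inside_ = !inside_;
                held_.clear();
                continue;
            }
            // Release leading characters until the remainder is again a
            // prefix of the expected tag (or nothing is held).
            while (!held_.empty() && tag.substr(0, held_.size()) != held_) {
                sink().push_back(held_.front());
                held_.erase(0, 1);
            }
        }
    }

    // Call at end of stream: a dangling partial tag is treated as content.
    void finish() {
        sink().append(held_);
        held_.clear();
    }

    const std::string& text() const { return text_; }
    const std::string& thinking() const { return thinking_; }

private:
    static constexpr std::string_view kOpen = "<think>";
    static constexpr std::string_view kClose = "</think>";

    std::string& sink() { return inside_ ? thinking_ : text_; }

    bool inside_ = false;   // true while between the open and close tags
    std::string held_;      // possible partial tag awaiting more input
    std::string text_;
    std::string thinking_;
};
```

Feeding the chunks `"Hi <thi"`, `"nk>plan</th"`, and `"ink> there"` would leave `plan` in `thinking()` and the remaining characters in `text()`. A production lexer would typically emit events rather than accumulate strings and would handle several tag kinds (for example tool calls) in one pass.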
…overview
- Added a "Declarative Inference Pipeline" section explaining the reactive token processing engine and DFA lexer.
- Updated the "Quick Start" and code examples to reflect the new `flatMapResource` and `filterSuccess` flow composition patterns.
- Expanded the "Built-In Prompt Formats" table with specific protocols and recommended models (Gemma, Llama 3, ChatML, DeepSeek-R1, etc.).
- Improved documentation for `SessionConfig` and `InferenceConfig`, detailing sampling parameters like `minP` and `repeatPenalty`.
- Added a dedicated section for "Thinking Blocks" to explain reasoning model support (DeepSeek-R1/QwQ).
- Refined the "Architecture" visualization and "Roadmap" items.
- Updated installation instructions and JitPack repository configuration.
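For readers unfamiliar with the two sampling parameters named above, the sketch below shows how min-p filtering and a repeat penalty are commonly defined. It is an illustration under stated assumptions, not this library's code: the function names `applyRepeatPenalty` and `minPKeep` are hypothetical, `recent` is assumed to hold unique token ids, and the actual sampling here is delegated to llama.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: penalize logits of recently generated tokens.
// Positive logits are divided by `penalty`, non-positive ones multiplied,
// so a penalty of 1.0 leaves every logit unchanged.
// Assumes `recent` contains each token id at most once.
inline void applyRepeatPenalty(std::vector<float>& logits,
                               const std::vector<int>& recent,
                               float penalty) {
    for (int id : recent) {
        if (id < 0 || static_cast<std::size_t>(id) >= logits.size()) continue;
        float& l = logits[static_cast<std::size_t>(id)];
        l = (l > 0.0f) ? l / penalty : l * penalty;
    }
}

// Hypothetical helper: min-p filtering. Keeps the indices of tokens whose
// probability is at least `minP` times the highest probability.
inline std::vector<std::size_t> minPKeep(const std::vector<float>& probs,
                                         float minP) {
    std::vector<std::size_t> keep;
    if (probs.empty()) return keep;
    const float threshold =
        minP * *std::max_element(probs.begin(), probs.end());
    for (std::size_t i = 0; i < probs.size(); ++i) {
        if (probs[i] >= threshold) keep.push_back(i);
    }
    return keep;
}
```

Under this definition a `repeatPenalty` of 1.0 is a no-op, which is why that value effectively disables the penalty.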
No description provided.