Local model integration #10
Merged
Description
This PR implements complete local AI model integration for Browser-LLM using Ollama. Users can now download, manage, and chat with local LLMs directly within the browser, with full support for tool calling, vision models, and context-aware interactions.
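For orientation, a minimal sketch of what a single chat turn against a local Ollama server looks like over its documented REST API. The model name is illustrative, and the sketch uses a non-streaming call for simplicity (the PR itself streams tokens):

```ts
// Minimal sketch: one-shot chat against a local Ollama server.
// Assumes Ollama is running on its default port; "llama3.2" is an example model.
async function chatOnce(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2",
      messages: [{ role: "user", content: prompt }],
      stream: false, // the PR streams tokens; this sketch keeps it simple
    }),
  });
  const data = await res.json();
  return data.message.content;
}
```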
Type of Change
Changes Made
Core Ollama Integration
AI Tool Calling System
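A hedged sketch of how one of the tools listed below might be declared, using the OpenAI-style function schema that Ollama's `/api/chat` accepts. The description and parameters are illustrative, not the PR's exact definitions:

```ts
// Sketch of one tool definition in the OpenAI-style schema Ollama accepts.
// The description and parameters shown here are assumptions for illustration.
const searchHistoryTool = {
  type: "function",
  function: {
    name: "search_history",
    description: "Search the user's browsing history for matching entries.",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "Search terms" },
        limit: { type: "number", description: "Maximum results to return" },
      },
      required: ["query"],
    },
  },
};
```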
- `search_history` - Search browsing history
- `get_bookmarks` - Access saved bookmarks
- `analyze_page_content` - Extract and analyze webpage content
- `capture_screenshot` - Take screenshots for vision models
- `get_page_metadata` - Retrieve page metadata
- `web_search` - Perform Google searches

Context Management
Model Management UI
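A rough sketch of the two operations such a UI sits on top of, using Ollama's documented `/api/tags` and `/api/pull` endpoints (error handling and progress reporting omitted):

```ts
// Sketch: list installed models and pull a new one via Ollama's REST API.
async function listModels(): Promise<string[]> {
  const res = await fetch("http://localhost:11434/api/tags");
  const data = await res.json();
  return data.models.map((m: { name: string }) => m.name);
}

async function pullModel(name: string): Promise<void> {
  // /api/pull normally streams newline-delimited JSON progress objects;
  // stream: false keeps this sketch short.
  await fetch("http://localhost:11434/api/pull", {
    method: "POST",
    body: JSON.stringify({ name, stream: false }),
  });
}
```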
Chat Interface
System Prompt Configuration
Browser Integration
IPC Communication
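A minimal sketch of renderer-side listeners for the new channels listed below, assuming Electron's `ipcRenderer` is reachable from the renderer (handler bodies are placeholders):

```ts
import { ipcRenderer } from "electron";

ipcRenderer.on("ollama:chatToken", (_e, token: string) => {
  // Append the streamed token to the currently active chat message.
  console.log("token", token);
});
ipcRenderer.on("ollama:reasoning", (_e, text: string) => {
  // Route chain-of-thought text into the collapsible "thinking" section.
  console.log("reasoning", text);
});
ipcRenderer.on("ollama:toolCalls", (_e, calls: unknown[]) => {
  // Execute the requested tools, then continue the chat with their results.
  console.log("toolCalls", calls);
});
```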
New channels: `ollama:chatToken`, `ollama:reasoning`, `ollama:toolCalls`

Process Management Fixes
Changed `taskkill /PID` to `taskkill /IM ollama.exe` to kill all instances (vision models spawn detached workers); a sketch follows.
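A minimal sketch of that change, assuming Node's `child_process` (the `/F` force flag is an assumption, not confirmed by the PR):

```ts
// Kill every Ollama instance by image name (Windows) so that detached
// vision-model workers die along with the main server process.
import { execFile } from "node:child_process";

function killAllOllama(): void {
  // "/F" forces termination; whether the PR uses it is an assumption.
  execFile("taskkill", ["/IM", "ollama.exe", "/F"], (err) => {
    if (err) console.warn("taskkill failed; Ollama may not be running");
  });
}
```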
Streaming & Parsing

- Handle `}{` boundaries in concatenated JSON
- Parse the `thinking` field (chain-of-thought reasoning)

State Management
UI Improvements
Testing
Checklist
Additional Notes
Key Technical Decisions
**Process Management Strategy:** Using `taskkill /IM` instead of `/PID` ensures all Ollama processes are terminated, including detached workers spawned by vision models.

**Streaming Parser:** Implemented three parsing strategies to handle different model output formats. Qwen models send tiny chunks (~140 bytes) that don't trigger newline-based parsers, requiring complete JSON parsing.
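For illustration, a sketch of a tolerant parser in this spirit; the PR's exact three strategies aren't shown, so the combination below is an assumption:

```ts
// Buffer incoming chunks, then try (1) newline-delimited JSON, (2) objects
// fused as "...}{..." with no separator, and (3) whole-buffer accumulation
// for tiny chunks that never complete a line on their own.
function parseChunks(buffer: string): { objects: unknown[]; rest: string } {
  const objects: unknown[] = [];
  // Naive split: a literal "}{" inside a JSON string would mis-split.
  const parts = buffer
    .split("\n")
    .flatMap((line) => line.replace(/\}\s*\{/g, "}\n{").split("\n"));
  let rest = "";
  for (const part of parts) {
    const candidate = rest + part;
    if (!candidate.trim()) continue;
    try {
      objects.push(JSON.parse(candidate));
      rest = "";
    } catch {
      rest = candidate; // incomplete: keep accumulating until it parses whole
    }
  }
  return { objects, rest };
}
```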
**Thinking Token Separation:** Qwen models send internal reasoning in a `thinking` field before the actual content. This is now captured separately and displayed in a collapsible UI section, preventing it from cluttering the main response.

**Dynamic Message ID Tracking:** Tool execution creates follow-up requests that need separate messages. Listeners now track `currentMessageId` dynamically to append tokens to the correct message.

**System Prompt Architecture:** The base prompt is always present, with clear instructions about context usage and tool calling. User additions are appended rather than replacing the base, preventing users from accidentally breaking the AI.
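A minimal sketch of that append-only composition (the base prompt wording is illustrative, not the PR's actual text):

```ts
// The base prompt always ships; user text is appended, never substituted.
const BASE_SYSTEM_PROMPT = `You are a browser assistant. Use the provided page
context when relevant, and call tools instead of guessing at history,
bookmarks, or page content.`;

function buildSystemPrompt(userAdditions: string): string {
  const extra = userAdditions.trim();
  return extra ? `${BASE_SYSTEM_PROMPT}\n\n${extra}` : BASE_SYSTEM_PROMPT;
}
```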
Performance Metrics
Known Limitations
Future Enhancements