jasielmacedo (Owner) commented Nov 6, 2025

Description

This PR implements complete local AI model integration for Browser-LLM using Ollama. Users can now download, manage, and chat with local LLMs directly within the browser, with full support for tool calling, vision models, and context-aware interactions.

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Code quality improvement (refactoring, formatting, etc.)

Changes Made

Core Ollama Integration

  • Ollama Service Management (src/main/services/ollama.ts)
    • Automatic process lifecycle management (start, stop, restart, force kill)
    • Process monitoring with PID, memory, CPU, and uptime tracking
    • Fixed race conditions in health checks with concurrency guards (see the sketch after this list)
    • Proper cleanup of orphan processes on app start
    • Support for both text and vision models (Qwen-VL spawns multiple worker processes)
    • Advanced streaming parser with three strategies for handling different model output formats, especially Qwen's tiny chunks (sketched under Streaming & Parsing below)
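
A minimal sketch of the health-check concurrency guard, assuming the service is probed over Ollama's standard `/api/tags` endpoint; the names (`checkHealth`, `inFlight`) are illustrative, not the exact ones in src/main/services/ollama.ts:

```typescript
// Sketch of a health-check concurrency guard (assumed names; not the
// exact implementation in src/main/services/ollama.ts).
let inFlight: Promise<boolean> | null = null;

async function checkHealth(baseUrl = "http://127.0.0.1:11434"): Promise<boolean> {
  // Reuse the pending check instead of starting a second, racing one.
  if (inFlight) return inFlight;
  inFlight = (async () => {
    try {
      const res = await fetch(`${baseUrl}/api/tags`, { signal: AbortSignal.timeout(2000) });
      return res.ok;
    } catch {
      return false; // unreachable or timed out -> treat as unhealthy
    } finally {
      inFlight = null; // allow the next caller to start a fresh check
    }
  })();
  return inFlight;
}
```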

AI Tool Calling System

  • Tool Framework (src/shared/tools.ts)
    • search_history - Search browsing history
    • get_bookmarks - Access saved bookmarks
    • analyze_page_content - Extract and analyze webpage content
    • capture_screenshot - Take screenshots for vision models
    • get_page_metadata - Retrieve page metadata
    • web_search - Perform Google searches
    • Automatic tool result handling and error management (a registry sketch follows this list)
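
A rough sketch of what a registry like src/shared/tools.ts could look like; the `ToolDefinition` shape and `runTool` helper are assumptions, with only `search_history` filled in:

```typescript
// Sketch of a tool registry shape (hypothetical names; the real
// definitions live in src/shared/tools.ts).
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
  execute: (args: Record<string, unknown>) => Promise<unknown>;
}

const tools: Record<string, ToolDefinition> = {
  search_history: {
    name: "search_history",
    description: "Search browsing history for matching entries",
    parameters: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
    execute: async (args) => {
      /* query the history store here */
      return { results: [], query: args.query };
    },
  },
  // get_bookmarks, analyze_page_content, capture_screenshot, ... follow the same shape
};

// Uniform result/error handling so one failing tool cannot break the chat loop.
async function runTool(name: string, args: Record<string, unknown>) {
  const tool = tools[name];
  if (!tool) return { error: `Unknown tool: ${name}` };
  try {
    return { result: await tool.execute(args) };
  } catch (err) {
    return { error: err instanceof Error ? err.message : String(err) };
  }
}
```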

Context Management

  • Intelligent Context System (src/shared/contextManager.ts)
    • Dynamic token estimation and budget management (sketched after this list)
    • Automatic content optimization based on model type and available tokens
    • Support for page content, screenshots, browsing history, and bookmarks
    • Vision model detection and screenshot handling
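
A minimal sketch of the budgeting idea, assuming the common ~4-characters-per-token heuristic; the helper names and trimming order are illustrative, not the exact logic in src/shared/contextManager.ts:

```typescript
// Rough token estimate: ~4 characters per token is a common heuristic.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface ContextPieces {
  pageContent?: string;
  history?: string;
  bookmarks?: string;
}

// Trim optional context so the prompt stays inside the model's window.
function fitContext(pieces: ContextPieces, budgetTokens: number): string {
  const ordered: [string, string][] = [
    ["Page content", pieces.pageContent ?? ""],
    ["History", pieces.history ?? ""],
    ["Bookmarks", pieces.bookmarks ?? ""],
  ];
  let remaining = budgetTokens;
  const parts: string[] = [];
  for (const [label, text] of ordered) {
    if (!text || remaining <= 0) continue;
    const maxChars = remaining * 4;
    const clipped = text.length > maxChars ? text.slice(0, maxChars) : text;
    remaining -= estimateTokens(clipped);
    parts.push(`${label}:\n${clipped}`);
  }
  return parts.join("\n\n");
}
```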

Model Management UI

  • Enhanced Model Manager (src/renderer/components/Models/)
    • Service status monitoring with expandable process details
    • Download progress tracking with real-time updates
    • Model installation and deletion
    • Expanded model registry with 15+ pre-configured models
    • Service control buttons (restart, stop, force kill)

Chat Interface

  • Advanced Chat System (src/renderer/store/chat.ts, src/renderer/components/Chat/ChatSidebar.tsx)
    • Streaming responses with token-by-token display
    • Planning Mode toggle for agentic tool use
    • Page Context toggle for automatic context injection
    • Tool execution visualization (shows tool calls and results)
    • Chain-of-Thought Display - Collapsible "AI Reasoning Process" section for Qwen models
    • Performance metrics (TTFT, total time)
    • Context info badges (page content, screenshot, history, bookmarks, token count)
    • Cancel generation support

System Prompt Configuration

  • System Prompt Settings (src/renderer/components/Settings/SystemPromptSettings.tsx)
    • Comprehensive base system prompt (463 words)
    • Users can add custom instructions on top of the base prompt (additions never replace it)
    • User information field for personalization
    • Custom instructions for preferences
    • Automatic date/time injection

Browser Integration

  • Navigation Bar Enhancements (src/renderer/components/Browser/NavigationBar.tsx)
    • "Ask AI about this page" quick actions
    • "Explain selected text" context menu
    • "Summarize page" functionality
    • Fixed duplicate key warning in suggestions dropdown

IPC Communication

  • Secure IPC Handlers (src/main/ipc/handlers.ts, src/main/preload.ts)
    • Channel whitelisting for security (see the preload sketch after this list)
    • Handlers for model management, tool execution, settings
    • Support for streaming events: ollama:chatToken, ollama:reasoning, ollama:toolCalls
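
A minimal preload sketch of channel whitelisting; the streaming event names come from this PR, while the invoke channel names and the exposed `api` surface are assumptions:

```typescript
// Sketch of channel whitelisting in a preload script (event names from the
// PR; invoke channels and the "api" surface are assumptions).
import { contextBridge, ipcRenderer, IpcRendererEvent } from "electron";

const ALLOWED_INVOKE = new Set(["ollama:chat", "ollama:listModels"]);
const ALLOWED_EVENTS = new Set(["ollama:chatToken", "ollama:reasoning", "ollama:toolCalls"]);

contextBridge.exposeInMainWorld("api", {
  invoke: (channel: string, ...args: unknown[]) => {
    if (!ALLOWED_INVOKE.has(channel)) {
      throw new Error(`Blocked IPC channel: ${channel}`);
    }
    return ipcRenderer.invoke(channel, ...args);
  },
  on: (channel: string, listener: (...args: unknown[]) => void) => {
    if (!ALLOWED_EVENTS.has(channel)) {
      throw new Error(`Blocked IPC event: ${channel}`);
    }
    const wrapped = (_e: IpcRendererEvent, ...args: unknown[]) => listener(...args);
    ipcRenderer.on(channel, wrapped);
    // Return an unsubscribe so the renderer can clean up its listeners.
    return () => ipcRenderer.removeListener(channel, wrapped);
  },
});
```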

Process Management Fixes

  • Multi-Process Handling
    • Changed taskkill /PID to taskkill /IM ollama.exe so every instance is killed, since vision models spawn detached workers (see the sketch after this list)
    • Added process finding by name when PID reference is lost
    • Proper cleanup on app close
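
A sketch of the kill-by-image-name fix; the taskkill command mirrors the change described above, while the wrapper function and the POSIX fallback are assumptions:

```typescript
// Sketch of the Windows kill-by-image-name fix (command from the PR; the
// surrounding wrapper is an assumption).
import { exec } from "node:child_process";
import { promisify } from "node:util";

const execAsync = promisify(exec);

async function killAllOllama(): Promise<void> {
  if (process.platform === "win32") {
    // /IM matches every ollama.exe instance, including detached vision
    // workers whose PIDs were never recorded; /F forces, /T kills children.
    await execAsync("taskkill /IM ollama.exe /F /T").catch(() => {});
  } else {
    // POSIX fallback: also match by process name.
    await execAsync("pkill -f ollama").catch(() => {});
  }
}
```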

Streaming & Parsing

  • Qwen Model Support
    • Strategy 1: Parse complete JSON for small chunks (<500 bytes)
    • Strategy 2: Split on }{ for concatenated JSON
    • Strategy 3: Line-based parsing with newlines
    • Separate handling for thinking field (chain-of-thought reasoning)
    • Stream monitoring with diagnostic logging (all three strategies are sketched below)
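
A combined sketch of the three strategies plus the thinking/content split; the `OllamaChunk` shape and helper names are assumptions:

```typescript
// Sketch of the layered chunk parser (strategy order from the PR text;
// the chunk shape and helper names are assumptions).
interface OllamaChunk {
  message?: { content?: string; thinking?: string };
  done?: boolean;
}

const tryParse = (s: string): OllamaChunk[] => {
  const t = s.trim();
  if (!t) return [];
  try {
    return [JSON.parse(t) as OllamaChunk];
  } catch {
    return [];
  }
};

function parseChunks(raw: string): OllamaChunk[] {
  // Strategy 1: Qwen-style tiny chunks (<500 bytes) are usually one complete object.
  if (raw.length < 500) {
    const direct = tryParse(raw);
    if (direct.length) return direct;
  }
  // Strategy 2: concatenated objects like {...}{...} -- re-insert a seam, then split.
  if (/\}\s*\{/.test(raw)) {
    return raw.replace(/\}\s*\{/g, "}\n{").split("\n").flatMap(tryParse);
  }
  // Strategy 3: classic newline-delimited JSON.
  return raw.split("\n").flatMap(tryParse);
}

// Thinking tokens are routed separately from visible content so chain-of-thought
// reasoning lands in the collapsible UI section rather than the main response.
function routeChunk(chunk: OllamaChunk, emit: (channel: string, text: string) => void) {
  if (chunk.message?.thinking) emit("ollama:reasoning", chunk.message.thinking);
  if (chunk.message?.content) emit("ollama:chatToken", chunk.message.content);
}
```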

State Management

  • Fixed Tool Execution Flow
    • Dynamic message ID tracking for multi-turn tool conversations (sketched after this list)
    • Proper streaming state reset after tool follow-up requests
    • Token and thinking listeners correctly target current message
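
A minimal sketch of the fix, assuming a module-level `currentMessageId` that stream listeners read at event time; the real store in src/renderer/store/chat.ts is more involved:

```typescript
// Sketch of dynamic message-ID tracking (store shape assumed).
let currentMessageId: string | null = null;

function beginAssistantMessage(id: string) {
  // Each model turn (including tool follow-ups) gets a fresh target message.
  currentMessageId = id;
}

function onToken(token: string, appendToMessage: (id: string, t: string) => void) {
  // Read the ID at event time, not at subscription time, so tokens from a
  // tool follow-up land in the new message rather than the stale one.
  if (currentMessageId) appendToMessage(currentMessageId, token);
}
```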

UI Improvements

  • Downloads status bar component with progress tracking
  • Model capabilities badges (vision, tool calling)
  • Collapsible thinking display with character count
  • Performance timing indicators
  • Context information badges

Testing

  • Tested locally in development mode
  • Tested production build
  • Manually tested affected features
    • Model download and installation (Llama 3.2, Qwen2.5-VL, DeepSeek, etc.)
    • Chat with streaming responses
    • Tool calling (analyze_page_content, search_history, web_search)
    • Vision model with screenshot analysis
    • Planning Mode with multi-turn tool conversations
    • System prompt customization
    • Service monitoring and control
    • Process cleanup on app close

Checklist

  • My code follows the project's code style (ESLint and Prettier)
  • I have performed a self-review of my code
  • I have commented my code where necessary
  • My changes generate no new warnings or errors
  • I have tested my changes locally

Additional Notes

Key Technical Decisions

  1. Process Management Strategy: Using taskkill /IM instead of /PID ensures all Ollama processes are terminated, including detached workers spawned by vision models.

  2. Streaming Parser: Implemented three parsing strategies to handle different model output formats. Qwen models send tiny chunks (~140 bytes) that don't trigger newline-based parsers, requiring complete JSON parsing.

  3. Thinking Token Separation: Qwen models send internal reasoning in a thinking field before actual content. This is now captured separately and displayed in a collapsible UI section, preventing it from cluttering the main response.

  4. Dynamic Message ID Tracking: Tool execution creates follow-up requests that need separate messages. Listeners now track currentMessageId dynamically to append tokens to the correct message.

  5. System Prompt Architecture: The base prompt is always present, with clear instructions about context usage and tool calling. User additions are appended rather than replacing the base, preventing users from accidentally breaking the AI (assembly sketched below).
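
A minimal sketch of that append-only assembly; the helper name and section labels are assumptions:

```typescript
// Sketch of append-only prompt assembly (assumed helper; the actual base
// prompt lives in SystemPromptSettings.tsx).
function buildSystemPrompt(base: string, userInfo?: string, custom?: string): string {
  const parts = [base]; // base instructions are always present
  if (userInfo?.trim()) parts.push(`About the user:\n${userInfo.trim()}`);
  if (custom?.trim()) parts.push(`Additional instructions:\n${custom.trim()}`);
  parts.push(`Current date/time: ${new Date().toISOString()}`); // automatic injection
  return parts.join("\n\n");
}
```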

Performance Metrics

  • Model Support: 15+ models in registry (Llama 3.2, Qwen2.5-VL, DeepSeek R1, Phi-4, etc.)
  • Tool Count: 6 available tools for AI agents
  • Code Changes: 5,847 insertions across 30 files
  • Token Estimation: Automatic context budget management with vision/text model optimization

Known Limitations

  • Ollama service must be bundled with the app or installed separately
  • Vision models require more memory (they spawn three worker processes)
  • Streaming performance depends on model size and hardware

Future Enhancements

  • Model performance benchmarking
  • Custom tool creation interface
  • Multi-modal input (audio, video)
  • Model fine-tuning support

jasielmacedo merged commit af330c3 into main on Nov 6, 2025 (1 check passed) and deleted the local-model-integration branch on November 6, 2025 at 01:47.