Releases: 7ZoneSystems/NativeLab
Windows_x86_64_
Native Lab Pro v2 — Windows Release
Native Lab Pro v2 is the first public Windows release of Native Lab Pro — a fully local, privacy-first desktop application for running large language models directly on your machine using llama.cpp.
No API keys.
No cloud.
No telemetry.
Your models and data stay entirely on your system.
🚀 Key Features
Fully Local LLM Chat
Run GGUF models directly on your machine using llama.cpp with a native PyQt6 desktop interface.
Multi-Model Architecture
Load multiple models simultaneously and assign them specialized roles:
- General — main chat model
- Reasoning — architectural reasoning and analysis
- Summarization — document summarization
- Coding — code generation tasks
- Secondary — additional pipeline insight engine
Pipeline Mode
Coding prompts can run through a multi-stage reasoning pipeline:
- Non-coding models produce architectural insights
- The coding model receives those insights as context
- Final structured code is generated
This produces more structured and reliable code output.
Document Reference Engine
Attach documents or source code files to a session and ask questions about them.
Supported reference types:
- PDFs
- Text files
- Source code files
The engine automatically retrieves the most relevant excerpts and injects them into prompts.
Structured Script Parsing
Source code files are parsed to extract:
- imports
- functions
- classes
- constants
- type definitions
The model receives structured context instead of raw text chunks.
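As an illustration of how such an outline can be produced, here is a minimal sketch using Python's standard `ast` module — the app's actual parser is not included in these notes, and type-definition extraction is omitted for brevity:

```python
# Hedged sketch: extracts imports, functions, classes, and constants from
# Python source using the standard-library ast module (Python 3.9+).
import ast

def parse_script(source: str) -> dict:
    tree = ast.parse(source)
    outline = {"imports": [], "functions": [], "classes": [], "constants": []}
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            outline["imports"].append(ast.unparse(node))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            outline["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            outline["classes"].append(node.name)
        elif isinstance(node, ast.Assign):
            # treat top-level UPPER_CASE assignments as constants
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id.isupper():
                    outline["constants"].append(target.id)
    return outline
```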
Long Document Summarization
Built-in pipeline for summarizing large documents using chunked processing with context carryover.
Features:
- pause / resume long jobs
- automatic state saving
- multi-PDF cross-document summarization
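The chunked carryover loop can be pictured with a short sketch (illustrative only — the prompts and consolidation step are simplified, and `llm` stands for any prompt-to-completion callable):

```python
# Conceptual sketch of chunked summarisation with context carryover.
def summarize_long_document(chunks: list[str], llm) -> str:
    carry = ""                                    # summary-so-far context
    sections = []
    for i, chunk in enumerate(chunks, 1):
        prompt = (f"Context so far:\n{carry}\n\n"
                  f"Summarise section {i}:\n{chunk}")
        section = llm(prompt)
        sections.append(section)
        carry = section                           # carried into the next chunk
    return llm("Consolidate these section summaries:\n\n" + "\n\n".join(sections))
```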
Parallel Model Loading
Multiple models can run simultaneously through separate llama-server instances.
⚠ Each model consumes its full RAM allocation.
Quantization Detection
Automatic detection of GGUF quantization formats including:
- K-Quants (Q2_K → Q6_K)
- imatrix quants (IQ series)
- legacy quants (Q4_0, Q8_0)
- float formats (F16, BF16)
Models are labeled with human-readable quality tiers.
Prompt Template Auto-Detection
Correct prompt templates are automatically selected based on model filename.
Supported families include:
- LLaMA-2 / LLaMA-3
- Mistral / Mixtral
- DeepSeek / DeepSeek-R1
- Phi-3
- Qwen
- Gemma
- Falcon
- Vicuna
- Yi
- Zephyr
- Starling
- CodeLlama
- Orca
- Command-R
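A minimal sketch of the filename-matching idea follows; the pattern table here is hypothetical and much shorter than the app's real rules (note that order matters — more specific names must be checked first):

```python
# Hypothetical filename -> prompt-template matching; first hit wins.
TEMPLATE_PATTERNS = [
    ("deepseek-r1", "deepseek-r1"),
    ("deepseek",    "deepseek"),
    ("codellama",   "llama-2"),   # assumption: CodeLlama uses the Llama-2 format
    ("llama-3",     "llama-3"),
    ("llama-2",     "llama-2"),
    ("mixtral",     "mistral"),
    ("mistral",     "mistral"),
    ("qwen",        "chatml"),
    ("phi-3",       "phi-3"),
]

def detect_template(filename: str) -> str:
    name = filename.lower()
    for needle, template in TEMPLATE_PATTERNS:
        if needle in name:
            return template
    return "generic"              # fall back to a plain prompt

print(detect_template("Mistral-7B-Instruct-v0.2.Q4_K_M.gguf"))  # -> mistral
```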
Smart Memory Management
A RAM watchdog prevents crashes during large document processing by automatically spilling reference caches to disk when memory pressure is detected.
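A minimal sketch of the watchdog idea, assuming `psutil` is installed (the threshold and cache layout are illustrative, not the app's actual values):

```python
# Spill the in-memory reference cache to disk under memory pressure.
import json, pathlib, psutil

SPILL_DIR = pathlib.Path("ref_cache")   # on-disk cache folder (see Data Storage)
PRESSURE_THRESHOLD = 90.0               # assumed: % RAM in use before spilling

def check_memory_and_spill(ref_cache: dict) -> None:
    if psutil.virtual_memory().percent < PRESSURE_THRESHOLD:
        return
    SPILL_DIR.mkdir(exist_ok=True)
    for key, text in list(ref_cache.items()):
        (SPILL_DIR / f"{key}.json").write_text(json.dumps({"text": text}))
        del ref_cache[key]              # free RAM; entries reload lazily on demand
```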
🖥 System Requirements
Linux (primary supported platform)
Windows (current release)
Minimum RAM depends on the model used.
Typical requirements:
| Model | RAM Required |
|---|---|
| 7B Q4 | ~4-5 GB |
| 13B Q5 | ~9-10 GB |
| 70B Q4 | ~38-40 GB |
📦 Dependencies
Python 3.10+
Required:
PyQt6
Optional:
psutil # RAM monitoring
PyPDF2 # PDF loading and summarization
Install with:
pip install PyQt6 psutil PyPDF2
llama.cpp Requirement
Native Lab Pro requires llama.cpp.
Compile or download it and configure the binary paths inside the application.
Default paths used by the application:
LLAMA_CLI = /home/hrirake/llama.cpp/build/bin/llama-cli
LLAMA_SERVER = /home/hrirake/llama.cpp/build/bin/llama-server
You can modify these paths in the source if needed.
Model Directory
The default directory scanned for models is:
localllm
You can also add models manually through the Models tab.
Supported format:
*.gguf
▶ Launching the Application
After extracting the release, start Native Lab Pro using:
run.bat (as administrator), then launch it from your Start menu,
or directly run the Python file:
python native_lab_pro_v2.py
💾 Data Storage
Native Lab Pro stores data locally in the application directory:
| Folder | Purpose |
|---|---|
| sessions/ | chat history |
| paused_jobs/ | paused summarization jobs |
| ref_cache/ | reference text cache |
| ref_index/ | reference metadata |
| model_configs.json | per-model settings |
| app_config.json | global configuration |
No data is sent outside your system.
⌨ Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl + N | New session |
| Ctrl + Q | Quit |
| Ctrl + B | Toggle sidebar |
| Ctrl + L | Logs tab |
| Ctrl + M | Models tab |
| Enter | Send message |
| Shift + Enter | New line |
⚠ Notes
- This is the first Windows release of Native Lab Pro.
- GPU acceleration depends on your llama.cpp build configuration.
- Running multiple models simultaneously requires significant RAM.
- Place your model's .gguf file in the app's localllm folder.
🔒 Privacy
Native Lab Pro runs entirely offline.
- No telemetry
- No external APIs
- No cloud services
All computation happens locally on your machine.
Linux_x86_64_cpu_only_Ubuntu_ServerConfig_update
Native Lab Pro — Release Notes
v2.1.0 · Server & Binary Configuration
March 2026 · Additive, non-breaking change · 1 new tab · 9 change sets
Overview
v2.1.0 introduces a dedicated 🖥️ Server & Binary Configuration tab, giving users full control over llama-cli and llama-server binary paths without hardcoding. Previously, binary locations were resolved at startup from a fixed directory structure. This update makes those paths user-configurable, adds per-OS detection, persists settings to disk, and plumbs the configuration through every inference code path.
Motivation
The previous approach had several limitations:
- Binaries had to live in a fixed relative path (`./llama/bin/`) — no flexibility for system-wide installs or custom builds
- Windows, macOS, and Linux users needed to manually edit source to point to the correct executable
- The `llama-server` host and port range were hardcoded to `127.0.0.1:8600–8700` with no way to change them
- Extra launch flags (e.g. `--numa`, `--flash-attn`, `--no-mmap`) required source edits
- There was no in-app way to verify a binary was present and functional before loading a model
What's New
🖥️ Server & Binary Configuration Tab
A new tab added to the main interface alongside Models, Config, Logs, and Appearance. It contains four sections:
- Binary Paths — browse for `llama-cli` and `llama-server` executables with live ✅/❌ file-exists indicators
- Server Settings — configure bind host and port scan range used when starting `llama-server` instances
- Extra Launch Flags — append arbitrary flags to every CLI or server launch without touching source code
- Binary Test — run `--version` against either binary and see the output inline to verify it works
ServerConfig Dataclass & Persistence
A new ServerConfig dataclass stores all settings and serialises them to localllm/server_config.json. Settings survive restarts and are loaded at app startup. The class also exposes detected_os, default_cli_name, and default_server_name properties that adapt to Windows, macOS, and Linux automatically.
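A hedged sketch of what such a dataclass could look like — field names beyond those mentioned in these notes are assumptions:

```python
# Illustrative ServerConfig sketch; defaults mirror the documented behaviour.
import json, platform
from dataclasses import dataclass, asdict

CONFIG_PATH = "localllm/server_config.json"

@dataclass
class ServerConfig:
    cli_path: str = ""                  # empty -> fall back to built-in default
    server_path: str = ""
    host: str = "127.0.0.1"
    port_range_lo: int = 8600
    port_range_hi: int = 8700
    extra_cli_args: str = ""
    extra_server_args: str = ""

    @property
    def detected_os(self) -> str:
        return platform.system()        # "Windows", "Darwin", or "Linux"

    @property
    def default_server_name(self) -> str:
        return "llama-server.exe" if self.detected_os == "Windows" else "llama-server"

    def save(self) -> None:
        with open(CONFIG_PATH, "w") as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def load(cls) -> "ServerConfig":
        try:
            with open(CONFIG_PATH) as f:
                return cls(**json.load(f))
        except (FileNotFoundError, json.JSONDecodeError, TypeError):
            return cls()                # defaults match previous behaviour
```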
Dynamic Binary Resolution
A _resolve_binary() helper and _refresh_binary_paths() function update the module-level LLAMA_CLI and LLAMA_SERVER variables whenever settings are saved. All existing inference code paths — CLI workers, server launch, pipeline mode, chunked summary, and multi-PDF — automatically pick up the new paths without any further changes.
Configurable Port Range
_free_port() now reads port_range_lo and port_range_hi from ServerConfig instead of using hardcoded 8600/8700. Defaults remain unchanged, so existing setups are unaffected.
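The scan logic is straightforward; here is a sketch consistent with the described behaviour (it reuses the `ServerConfig` sketch above — the real `_free_port()` internals are not shown in these notes):

```python
# Find the first port in the configured range with nothing listening on it.
import socket

def free_port(lo: int = 0, hi: int = 0) -> int:
    cfg = ServerConfig.load()           # see the ServerConfig sketch above
    lo = lo or cfg.port_range_lo        # 0 means "read from ServerConfig"
    hi = hi or cfg.port_range_hi
    for port in range(lo, hi + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            if s.connect_ex((cfg.host, port)) != 0:   # connection refused -> free
                return port
    raise RuntimeError(f"no free port in {lo}-{hi}")
```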
Detailed Change Log
| # | Location | Description | Type |
|---|---|---|---|
| 1 | `SERVER_CONFIG_FILE` | New constant pointing to `localllm/server_config.json` | ADDED |
| 2 | `ServerConfig` dataclass | Stores `cli_path`, `server_path`, host, port ranges, extra flags. Includes `save()`, `load()`, and OS detection. | ADDED |
| 3 | `_resolve_binary()` | Helper that returns a custom path if set and valid, otherwise the built-in default | ADDED |
| 4 | `_refresh_binary_paths()` | Updates module-level `LLAMA_CLI` / `LLAMA_SERVER` from `ServerConfig` at runtime | ADDED |
| 5 | `LLAMA_CLI` / `LLAMA_SERVER` | Renamed default constants to `_LLAMA_CLI_DEFAULT` / `_LLAMA_SERVER_DEFAULT`. Live vars now resolve through `_refresh_binary_paths()` | CHANGED |
| 6 | `_free_port()` | Default lo/hi params changed to 0; reads `ServerConfig.port_range_lo/hi` at call time | CHANGED |
| 7 | `LlamaEngine._start_server()` | Reads `SERVER_CONFIG.host` and `extra_server_args`; appends extra flags to the launch command | CHANGED |
| 8 | `LlamaEngine.create_worker()` | CLI branch appends `SERVER_CONFIG.extra_cli_args` to the `llama-cli` command list | CHANGED |
| 9 | `ServerTab` (UI class) | New widget — binary browse, server settings, flag editor, test runner. Saved via `ServerConfig`. | ADDED |
| 10 | `MainWindow._build_ui()` | Registers `ServerTab` as a new tab between Config and Logs | CHANGED |
| 11 | `MainWindow._toggle_theme()` | Rebuilds `ServerTab` on theme switch to pick up updated palette colours | CHANGED |
Files Changed
native_lab_pro.py (+~420 lines added, ~30 lines modified)
localllm/server_config.json (new — created on first save)
Migration & Compatibility
This release is fully backwards-compatible. No action required for existing installations:
- If `server_config.json` does not exist, all defaults match previous behaviour exactly
- `LLAMA_CLI` and `LLAMA_SERVER` resolve to the same built-in paths as before unless overridden
- Port range defaults remain `8600–8700`, host defaults remain `127.0.0.1`
- No database migrations, no session format changes, no breaking API changes
How to Use
- Open the app and navigate to the new 🖥️ Server tab
- Under Binary Paths, click Browse… next to `llama-cli` and select your binary
- Repeat for `llama-server`
- Optionally adjust the bind host, port range, and any extra launch flags
- Click Test to verify each binary responds correctly
- Click Save Server Settings — written to `localllm/server_config.json`
- Reload your model via the Models tab or Model > Reload Model
The ✅ / ❌ indicators next to each path field update in real time as you type or browse, showing whether the file exists before you save.
Known Limitations
- Extra flags are passed as a whitespace-split string — flags with spaces in their values are not supported
- The Test button uses `--version`, which not all `llama.cpp` builds support; some may exit non-zero but still be functional
- Changing the port range does not affect already-running server instances — only new launches pick up the updated range
Upcoming
- Per-role binary overrides (e.g. a separate CUDA build for the coding engine)
- GPU layer configuration (`--n-gpu-layers`) exposed in the Server tab
- Auto-discovery scan that searches common install paths and populates fields automatically
- Quoted-argument support for extra flag fields
NativeLabPro_Major_featureUpdate_v4
NativeLab Pro — Release Notes
Version 2.5 · Development Session Changelog
Overview
This document covers every feature, improvement, and bug fix applied to nativelab.py during this development session. Changes are grouped by area. Each section describes what was added, how it works, and what files / classes were touched.
1. GPU Acceleration Support
Area: Server Tab · ServerConfig dataclass · ServerTab class
What was added
The Server tab now contains a dedicated GPU Acceleration card that auto-detects available graphics hardware on startup and exposes all GPU launch flags visually — no more hand-editing the Extra Launch Flags box.
How it works
A new utility function _detect_gpus() is called once when the Server tab builds. It probes three backends in order: nvidia-smi for NVIDIA CUDA cards, system_profiler on macOS for Apple Metal GPUs, and vulkaninfo as a Vulkan fallback. Each probe runs in a subprocess with a short timeout so the UI never freezes. The result is a list of dicts carrying device index, name, VRAM in MB, and backend type.
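For flavour, here is what the first of those probes could look like — a hedged sketch assuming `nvidia-smi` is on `PATH`; the app's real `_detect_gpus()` additionally falls through to `system_profiler` and `vulkaninfo`:

```python
# Probe NVIDIA GPUs via nvidia-smi in a subprocess with a short timeout.
import subprocess

def detect_nvidia_gpus(timeout: float = 3.0) -> list[dict]:
    cmd = ["nvidia-smi", "--query-gpu=index,name,memory.total",
           "--format=csv,noheader,nounits"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True,
                             timeout=timeout, check=True).stdout
    except (OSError, subprocess.SubprocessError):
        return []                       # no NVIDIA driver -> try the next backend
    gpus = []
    for line in out.strip().splitlines():
        idx, name, vram = (p.strip() for p in line.split(",", 2))
        gpus.append({"index": int(idx), "name": name,
                     "vram_mb": int(vram), "backend": "CUDA"})
    return gpus
```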
The GPU card renders a backend badge (🟢 CUDA / Metal, 🟡 Vulkan, ⚪ None) followed by a list of all detected GPUs with their VRAM. Four controls appear:
- Enable GPU offloading checkbox — disabled automatically if no GPU was detected.
- GPU layers spin box (`-1` to `999`) — the special display text "All (−1)" at −1 means offload every layer.
- Primary GPU combo box — populated from the detected device list.
- Tensor split line edit — for multi-GPU ratio strings like `0.6,0.4`.
On save (_save()), the GPU flags are serialised into ServerConfig and also injected into extra_server_args as --ngl N [--main-gpu N] [--tensor-split X,Y] using a regex strip-then-prepend so existing manually typed flags are preserved. The existing launch code picks them up with zero changes.
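The strip-then-prepend step can be sketched as follows (the exact regex and flag spellings in the app are not shown in these notes):

```python
# Remove stale GPU flags, then prepend freshly serialised ones, leaving
# manually typed flags (e.g. --no-mmap) untouched.
import re

GPU_FLAG_RE = re.compile(r"--(?:ngl|n-gpu-layers|main-gpu|tensor-split)\s+\S+\s*")

def inject_gpu_flags(extra_args: str, ngl: int, main_gpu: int = 0,
                     tensor_split: str = "") -> str:
    kept = GPU_FLAG_RE.sub("", extra_args).strip()
    gpu = f"--ngl {ngl}"
    if main_gpu:
        gpu += f" --main-gpu {main_gpu}"
    if tensor_split:
        gpu += f" --tensor-split {tensor_split}"
    return f"{gpu} {kept}".strip()

print(inject_gpu_flags("--ngl 20 --no-mmap", ngl=-1))  # -> --ngl -1 --no-mmap
```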
New fields in ServerConfig
enable_gpu: bool = False
ngl: int = -1
main_gpu: int = 0
tensor_split: str = ""
2. HuggingFace GGUF Model Download Tab
Area: New tab · ModelDownloadTab widget · HfSearchWorker / HfDownloadWorker QThreads
What was added
A new ⬇️ Download tab that lets users search HuggingFace for GGUF model files and download them directly into any local folder, with live progress and cancellation — no browser needed.
How it works
Search flow: The user types a repo ID (e.g. TheBloke/Mistral-7B-GGUF) and clicks Search. A HfSearchWorker QThread calls https://huggingface.co/api/models/{repo}, filters the siblings list to .gguf files only, and emits results_ready(list). Results appear in a QListWidget with colour-coded quantisation badges (Q2 = red through Q8 = green) and a human-readable file size.
Download flow: The user selects a file, picks a destination folder, and clicks Download. A HfDownloadWorker QThread fetches https://huggingface.co/{repo}/resolve/main/{filename} in 256 KB chunks, emitting progress(int) on each chunk. If the user cancels or the download errors, the partial file is deleted. On successful completion MODEL_REGISTRY.add(path) is called so the model appears immediately in all model lists, and a success dialog offers to open the folder.
Only Python standard library (urllib) is used — no new pip dependencies.
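Under that constraint, the search step reduces to something like this self-contained sketch (the endpoint is the public HF model API; error handling is minimal):

```python
# List .gguf files in a HuggingFace repo using only the standard library.
import json, urllib.request

def search_gguf_files(repo_id: str) -> list[str]:
    url = f"https://huggingface.co/api/models/{repo_id}"
    with urllib.request.urlopen(url, timeout=15) as resp:
        meta = json.load(resp)
    # 'siblings' lists every file in the repo; keep only GGUF weights
    return [s["rfilename"] for s in meta.get("siblings", [])
            if s["rfilename"].endswith(".gguf")]

# e.g. search_gguf_files("TheBloke/Mistral-7B-GGUF") -> ["...Q2_K.gguf", ...]
```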
3. MCP Server Management Tab
Area: New tab · McpTab widget · MCP_CONFIG_FILE constant
What was added
A new 🔌 MCP tab for managing Model Context Protocol servers. Users can add, start, stop, and remove MCP servers (stdio or SSE transport) and see live log output from each.
How it works
Configuration is stored in ./localllm/mcp_config.json as {"servers": [...]}. Each server entry holds name, transport type (stdio / sse), command or URL, and description.
The tab has three panels: a server list with 🟢/⚪ running indicators, a control row (Start / Stop / Remove), and a log pane with timestamps. Clicking Start on a stdio server launches it via subprocess.Popen(shell=True) and stores the handle in a _procs dict keyed by server name. SSE servers just log their URL since they run externally. Log lines are appended from stdout polling. Stop terminates the process and removes it from _procs.
4. Pipeline Builder — Full Logic Block System
Area: PipelineBlockType · PipelineCanvas · PipelineExecutionWorker · PipelineBuilderTab sidebar
What was added
Seven new Python-evaluated logic blocks that let users build conditional, branching, and transforming pipelines without writing any model calls.
Block types
⑂ IF / ELSE evaluates a Python boolean expression (len(text) > 200, 'error' in text.lower()) against the incoming context. TRUE routes to the E port, FALSE to the W port. Users draw two labelled arrows to set up the branches.
⑃ SWITCH evaluates a Python expression that returns a string key ('long' if len(text) > 300 else 'short'). Each outgoing arrow carries a user-supplied label. Only the arm whose label matches the returned key is followed. A default labelled arm catches unmatched keys.
⊘ FILTER acts as a gate. If the condition is True the text continues unchanged. If False the pipeline terminates cleanly with a [FILTER DROPPED] message in the Output tab — no crash, no silent drop.
⟲ TRANSFORM performs instant deterministic text operations with no model: prefix, suffix, find-and-replace, upper, lower, strip whitespace, or truncate to N characters.
⊕ MERGE collects every context queued for it in the current execution pass (from multiple incoming arrows) and joins them. Modes: concat with separator, prepend, append, or JSON array.
⑁ SPLIT broadcasts the exact same text to every outgoing arrow simultaneously. No configuration needed — just draw multiple outgoing arrows.
⌥ Custom Code opens a full code editor dialog (described below) where the user writes arbitrary Python.
Multi-output fan-out
Before this change every source port was limited to one outgoing arrow (the old code deleted the previous connection on each new draw). Logic blocks are now added to a _LOGIC_BTYPES set that skips that deletion, allowing any number of arrows to fan out from the same port. Duplicate connections (same from_bid + from_port + to_bid) are silently ignored. Normal flow blocks (Input, Model, Output, Intermediate) still enforce single-output-per-port.
Branch label badges on arrows
When an arrow leaves an IF/ELSE or SWITCH block, a branch label is stored as a dynamic attribute conn.branch_label on the PipelineConnection object. The _draw_arrow method reads this attribute and renders a small rounded badge at 35% along the Bezier curve — green for TRUE, red for FALSE, pipeline-colour for other labels.
Custom Code Editor Dialog (_CodeEditorDialog)
A QDialog with a QTextEdit code editor (Consolas 11pt, 28-unit tab stops), a live syntax-check label that updates on every keystroke using compile(), an available-variables reference table, and a 🧪 Test button that runs the code in a sandboxed exec() with sample text and shows the result and log output in a QMessageBox. Saving validates syntax first and refuses to close if there is a syntax error. The block label is automatically set to the first non-comment code line.
The sandbox exposes only safe builtins: len str int float bool list dict tuple range enumerate zip map filter sorted min max sum abs round isinstance hasattr getattr repr type print — no open, os, subprocess, or __import__.
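A sketch of the sandbox approach (the `text`/`result` variable contract used here is an assumption — the dialog's real variable table is only partially described above):

```python
# Execute user code with a restricted builtins namespace.
import builtins

_ALLOWED = ("len str int float bool list dict tuple range enumerate zip map "
            "filter sorted min max sum abs round isinstance hasattr getattr "
            "repr type print").split()
SAFE_BUILTINS = {name: getattr(builtins, name) for name in _ALLOWED}

def run_custom_block(code: str, text: str) -> str:
    env = {"__builtins__": SAFE_BUILTINS, "text": text, "result": text}
    compile(code, "<custom-block>", "exec")   # syntax check first (raises on error)
    exec(code, env)                           # open/os/__import__ are unavailable
    return str(env.get("result", text))

print(run_custom_block("result = text.upper()", "hello"))  # -> HELLO
```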
5. Pipeline Builder — LLM Logic Block System
Area: PipelineBlockType · PipelineCanvas · PipelineExecutionWorker · PipelineBuilderTab sidebar · _LlmLogicEditorDialog
What was added
Five new LLM-evaluated logic blocks that are functionally identical to the Python logic blocks above except every condition or instruction is written in plain English and evaluated at runtime by an attached GGUF model over the llama-server HTTP API.
Block types
🧠 LLM IF / ELSE sends the incoming text plus a plain-English condition to the model with a tight system prompt demanding a single word: YES or NO. The parser accepts YES Y TRUE 1 PASS POSITIVE as truthy. Routes to E (YES) or W (NO).
🧠 LLM SWITCH presents the model with the incoming text and the classification task. The valid category names are automatically extracted from the branch labels on the outgoing arrows and included in the prompt. Case-insensitive matching with a substring fallback ensures robustness. A default labelled arm catches unmatched classifications.
🧠 LLM FILTER demands PASS or STOP. On STOP the pipeline ends with a structured message showing the filter name, condition, model decision, and original text — so the user can inspect exactly what was blocked and why.
🧠 LLM TRANSFORM uses a higher default token budget (512), provides a system prompt that demands output-only with no preamble, and automatically strips common model preamble phrases (Here is, Result:, Output:, Transformed:) before the result flows downstream.
🧠 LLM SCORE extracts the first integer 1–10 from the model response using regex (handles prose like "I'd give it a 7"), maps to LOW (1–3) / MID (4–7) / HIGH (8–10) bands routed to E / S / W ports. A score-labelled outgoing arrow receives the raw numeric string instead of the original context.
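The parsing step described for 🧠 LLM SCORE can be sketched directly (the neutral fallback when no number is found is an assumption):

```python
# Extract the first integer 1-10 from a model reply and map it to a band.
import re

def parse_score(reply: str) -> tuple[int, str]:
    m = re.search(r"\b(10|[1-9])\b", reply)   # handles prose like "a 7"
    if not m:
        return 5, "MID"                       # assumed fallback
    score = int(m.group(1))
    band = "LOW" if score <= 3 else "MID" if score <= 7 else "HIGH"
    return score, band                        # band routes to the E / S / W port

print(parse_score("I'd give it a 7"))         # -> (7, 'MID')
```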
LLM Logic Editor Dialog (_LlmLogicEditorDialog)
A QDialog tailored per block type. It shows: a colour-coded about-panel describing exactly what the model will receive and return, a branch-routing hint (e.g. "E port = YES / W port = NO"), a model selector combo box populated from MODEL_REGISTRY with a Browse button, and a multi-lin...
NativeLabPro_x86_64_Linux_Major_featureUpdate_v3
Native Lab Pro — Patch Release Notes
All changes applied in this session on top of Native Lab Pro v2.
Patches are listed in the order they were implemented.
🏗️ Pipeline Builder (new feature)
A fully interactive visual pipeline editor added as a dedicated 🔗 Pipeline tab in the main window.
Canvas
- `PipelineCanvas` — custom `QWidget` with its own `paintEvent`; renders blocks and connections entirely via `QPainter` (no external libs)
- Drag-and-drop blocks anywhere on the canvas; position snaps to an 8 px grid
- 8-directional connection ports (N · S · E · W on each block) drawn as dots; drag from one port dot to another to create a connection
- Connections render as smooth cubic Bézier curves with an arrowhead at the target port
- Loop connections detected automatically when a back-edge would form a cycle; rendered in `C["pipeline"]` cyan instead of the default `C["acc2"]` purple, with a `×N` multiplier badge
- Selecting a connection and pressing Delete removes it; deleting a selected block removes it along with all its connections
- Double-clicking a block opens a configuration dialog appropriate to its type
Block Types
| Block | Colour | Purpose |
|---|---|---|
| ▶ Input | `C["ok"]` green | Entry point — receives the user's prompt |
| ◈ Intermediate | `C["warn"]` amber | Passes output of one model into the next; required between two MODEL blocks |
| ■ Output | `C["err"]` red | Terminal block — captures and displays the final result |
| 🤖 Model | `C["pipeline"]` cyan | Wraps a loaded local model; configurable role, label, and model file |
| 📎 Reference | `C["acc"]` purple | Injects a pasted text or loaded file snippet ahead of the context |
| 💡 Knowledge | `C["acc2"]` lavender | Prepends a knowledge-base chunk to the context |
| 📄 PDF Summary | `C["pipeline"]` cyan | Extracts / summarises a PDF and prepends the result |
Sidebar
- FLOW BLOCKS section — one-click add for Input, Intermediate, and Output blocks
- CONTEXT BLOCKS section — one-click add for Reference, Knowledge, and PDF Summary blocks
- MODELS (dbl-click to add) — live list of all models in `MODEL_REGISTRY`; double-click inserts a pre-configured Model block
- ↻ Refresh button re-scans the model registry without restarting
- CANVAS CONTROLS — Clear All button wipes blocks and connections
Right Panel
- Server status badge — polls every 2 s; shows green when a llama-server is ready, amber when loading
- ▶ Run Pipeline button — validates the graph then kicks off `PipelineExecutionWorker`
- ⏹ Stop Execution — aborts the worker mid-stream
- 📋 Log tab — live execution log with timestamps
- ■ Output tab — rendered final output with Markdown formatting
- Per-intermediate ◈ BlockName tabs created dynamically as execution reaches each Intermediate block; each streams tokens live
Execution Engine (PipelineExecutionWorker)
- Runs entirely in a `QThread`; never blocks the UI
- Server-mode only — calls `ensure_server_or_reload()` on the engine before starting; retries up to 3 times with a 6 s delay
- Walks the connection graph in topological order; context is carried forward through each block
- Signals: `step_started`, `step_token`, `step_done`, `intermediate_live`, `pipeline_done`, `err`, `log_msg`
- Loop connections cause the enclosed sub-graph to execute `loop_times` times, accumulating context on each pass
Validation
- Refuses to run if no INPUT block is present
- Refuses to run if two MODEL blocks are connected directly without an INTERMEDIATE block between them
- Checks for server readiness before dispatch; surfaces errors as dialog boxes rather than silent failures
🔗 Pipeline Save / Load System
Patches A → D
Added full pipeline persistence so canvas states survive restarts.
- Pipelines serialised to `~/.native_lab/pipelines/<name>.json` (version 2 format)
- `_pipeline_to_dict` / `_pipeline_from_dict` helpers handle blocks + connections including all metadata
- Block IDs are remapped on load so counter collisions never occur
- `list_saved_pipelines()`, `save_pipeline()`, `load_pipeline()` module-level helpers
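A minimal sketch of the version-2 persistence helpers under the path given above (the real block/connection schema carries more metadata than shown):

```python
# Save / list pipelines as JSON under ~/.native_lab/pipelines/.
import json, pathlib

PIPE_DIR = pathlib.Path.home() / ".native_lab" / "pipelines"

def save_pipeline(name: str, blocks: list[dict], conns: list[dict]) -> None:
    PIPE_DIR.mkdir(parents=True, exist_ok=True)
    data = {"version": 2, "blocks": blocks, "connections": conns}
    (PIPE_DIR / f"{name}.json").write_text(json.dumps(data, indent=2))

def list_saved_pipelines() -> list[str]:
    return sorted(p.stem for p in PIPE_DIR.glob("*.json"))
```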
New sidebar buttons in Pipeline Builder:
| Button | Action |
|---|---|
| 💾 Save Pipeline… | Prompts for name, overwrites if exists |
| 📂 Load Pipeline… | Lists saved pipelines; includes inline delete option |
| 🗑 Delete | Available inside the Load dialog |
🔧 Pipeline Execution Fixes
Patches E → F
- OUTPUT block no longer accumulates all intermediate text — it receives only what was directly piped into it
- Sender block label is captured and included in the `pipeline_done` payload as JSON `{ "text": …, "sender": … }`
- `_on_pipeline_done` in `PipelineBuilderTab` renders output with an **Output from: BlockName** header
🔗 Pipeline Button in Main Chat Window
Patches G → H
- Added a 🔗 Pipeline button next to the Send button in `InputBar`
- Button emits a `pipeline_run_requested` signal (proper PyQt signal chain — avoids direct cross-widget method calls)
- Connected in `MainWindow._build_ui` to `_on_pipeline_from_chat()`
- Selecting a pipeline prompts a dialog, reads the current chat input, and runs `PipelineExecutionWorker` without leaving the Chat tab
- Each pipeline stage renders as its own labelled chat bubble:
  - `⚡ Processing block: Name…` — system note bubble
  - `◈ Name — intermediate output` — amber intermediate bubble
  - `■ Output (from: Name)` — standard assistant bubble
🎨 New Chat Bubble Roles
Patch I
Two new MessageWidget roles added alongside user / assistant:
| Role | Colour | Label | Use |
|---|---|---|---|
| `pipeline_intermediate` | `C["warn"]` amber | ◈ Intermediate | Mid-pipeline block output |
| `system_note` | `C["txt3"]` muted | ⚡ System | Stage progress notes |
📜 Pipeline Builder Sidebar — Scrollable
The sidebar previously clipped content when many models were loaded.
- Inner sidebar `QWidget` wrapped in a `QScrollArea` (214 px wide, accounts for the scrollbar)
- `setFixedWidth` moved from the widget to the scroll area
- `setHorizontalScrollBarPolicy(AlwaysOff)` — horizontal scrolling disabled
- `addStretch()` appended before wrapping so buttons stay top-aligned when content is short
- All existing block buttons, model list, and canvas controls remain fully functional
✨ Fluid UI Animations
Animated Chat Bubbles
- Every new `MessageWidget` fades in over 220 ms (ease-out cubic) via a `_fade_in()` helper
- `_fade_in` uses `QGraphicsOpacityEffect` + `QPropertyAnimation` on the `opacity` property
- Guard clause skips `PipelineCanvas` and `ThinkingBlock` to prevent QPainter re-entry conflicts
Empty-State Placeholder
- `ChatArea` now shows "Hi, message me up when you are ready." when no messages exist
- Placeholder is centred, 22 px light-weight text in `C["txt3"]`
- Fades in at 300 ms on `clear_messages()` and hides automatically when the first message arrives
Tab Switch Fade
- `_FadeOverlay` — a sibling `QWidget` placed on top of tab content, never a `QGraphicsOpacityEffect` on the tab itself (avoids QPainter conflicts with `PipelineCanvas.paintEvent`)
- `alpha` animated from 220 → 0 over 180 ms on every tab change via `pyqtProperty(int)`
- Overlay covers only the newly-visible tab page geometry; hidden immediately after the animation completes
Reference Panel Slide-In
- References panel slides in from the right edge over 240 ms (ease-out cubic) by animating `maximumWidth` from 0 → 260
- Slides out over 200 ms (ease-in cubic); `setVisible(False)` fires only after the animation completes
- No `QGraphicsOpacityEffect` involved — zero painter conflicts
🌐 API Models Tab
A new 🌐 API Models tab in the main window lets users connect to any cloud or local API endpoint. Once verified, the API engine is treated identically to a loaded local model.
Supported Providers (pre-configured)
| Provider | Format | Notable Models |
|---|---|---|
| OpenAI | OpenAI | GPT-4o, o1, o3-mini |
| Anthropic | Anthropic | claude-opus-4-5, claude-sonnet-4-5, claude-haiku-4-5 |
| Groq | OpenAI-compat | Llama-3.3-70b, Mixtral-8x7b |
| Mistral | OpenAI-compat | mistral-large, codestral |
| Together AI | OpenAI-compat | Llama-3.3-70b-Turbo, Qwen-2.5-72b |
| OpenRouter | OpenAI-compat | GPT-4o, Claude-3.5, Gemini-Pro |
| Ollama | OpenAI-compat | llama3.2, qwen2.5, phi4 |
| Custom | Configurable | Any OpenAI-compatible endpoint |
How it works
- Fill in provider, model, API key, base URL, max tokens
- Click ⚡ Test & Load — sends a 1-token
"hi"message to verify connectivity - On success the engine swaps into the main chat system; status bar updates to
🌐 Provider · model-id - Click 💾 Save Config to persist the connection for future sessions
- Saved configs appear as cards with ▶ Load and 🗑 Delete buttons
Architecture
- `ApiConfig` dataclass — serialisable to/from `~/.localllm/api_models.json`
- `ApiRegistry` — load/save/add/remove configs
- `ApiStreamWorker` (`QThread`) — SSE streaming for both OpenAI-compatible and Anthropic formats
- `ApiEngine` — drop-in replacement for `LlamaEngine`; implements `is_loaded`, `status_text`, `create_worker`, `ensure_server`, `shutdown`
- `_active_engine_for()` in `MainWindow` prioritises `ApiEngine` when loaded
- Full structured message history (last 60 turns) passed to API models instead of a raw prompt string
🎨 Custom Prompt Format for API Models
Prompt Template Presets
Seven built-in presets selectable from a dropdown:
| Preset | Used by |
|---|---|
| Default | Provider handles formatting (recommended) |
| ChatML | OpenHermes, Qwen, and ChatML-trained models |
| Llama-2 Chat | Meta Llama-2 instruction models |
Linux_x86_64_cpu_only_Ubuntu_hotfix0.1
Fixes have been applied for issues where the reasoning and coding pipelines could cause a model to output training artifacts or irrelevant data.
Automatic server/CLI switching issues have been fixed.
Multiple stability fixes have been applied, especially in the coding and reasoning pipelines and in model-specific prompt templates.
Linux_x86_64_cpu_only_Ubuntu_hotfix0.2
Native Lab Pro — Changelog
[2.2.0] — 2026
Overview
This release extends the stability work begun in 2.1.0 with two major areas of improvement: a comprehensive PDF summarisation pipeline overhaul, and a deeper second pass on server process safety and role assignment reliability. The summarisation pipeline gains mode selection, live pause/abort controls, and a robust fallback chain for the final consolidation pass. Server management gains a global process registry so every spawned server is tracked and killed precisely on exit or reload. Role assignment gains atomic state tracking to eliminate the remaining race conditions and failure-point edge cases not addressed in 2.1.0.
📄 PDF Summarisation Pipeline
Summary mode selector. A new combo box in the input toolbar lets users choose between three summarisation modes before starting a job. Summary produces a standard structured overview. Logical produces a mechanism/methodology breakdown with numbered steps and sub-bullets, focused on how and why things work. Advisory extracts actionable recommendations framed as a practical brief. The selected mode is embedded in every section prompt and the final consolidation prompt, so the model's output is consistently shaped throughout the entire pipeline rather than only at the final pass.
Live pause/abort banner. As soon as a chunked summarisation job begins, a control bar appears inline in the chat panel showing the current status, a "Pause & Save" button, and an "Abort" button. Previously, the only way to stop a job was from a separate Config tab. The banner updates its status text as chunks complete and is removed automatically when the job finishes, pauses, or errors.
Input blocking during summarisation. The message input field is disabled and given a descriptive placeholder while a summarisation job is active. Attempting to send a message via _on_send while _summarizing_active is True shows a dialog explaining that the job must be paused or aborted first. This prevents the inference queue from being silently overloaded by a second request competing for the same model.
Final consolidation fallback chain. Previously, if the final consolidation pass failed (e.g. due to a context-length overflow on a very long document), the entire job returned an error. The method now attempts the pass on the secondary/reasoning engine first, falls back to the primary engine if that fails, and as a last resort concatenates the section summaries with a clear [Auto-fallback] header rather than losing the work already done.
Error message fix for section failures. The inference error message previously hardcoded "Inference failed on section 4" regardless of which section actually failed. It now interpolates the correct section index.
Resume preserves mode. Paused jobs now save summary_mode to disk alongside the existing state. When a job is resumed from the Config tab, the original mode is restored so the remaining sections and final pass use the same instructions as the sections already completed.
Summary bubble and input bar fully restored on all exit paths. The pause banner removal, _summarizing_active flag reset, and input re-enable are now applied in _on_summary_final, _on_summary_err_or_pause (both the pause path and the error path), and _on_summary_err individually, so no code path can leave the UI in a locked state.
🛡️ Llama Server Process Management
Global server registry. A module-level _SERVER_REGISTRY dictionary maps every port to the PID of the server that was bound to it. The registry is populated in LlamaEngine._start_server() immediately after the server passes its health check, and entries are removed in shutdown() and _kill_server_on_port(). This gives the application a precise, session-scoped picture of every server it owns, independently of psutil process enumeration.
Port cleared before every bind. _start_server() now calls _kill_server_on_port() on the chosen port before attempting to start a new server. A 150 ms settle is observed after the kill. This eliminates the "address already in use" failure that could occur when reloading a model quickly on the same port slot, which previously required a manual app restart to recover from.
_kill_server_on_port() utility. A new top-level helper kills whatever OS process is occupying a given port, using psutil.net_connections when available and falling back to netstat/lsof cross-platform parsing otherwise. It also prunes the port from the registry so subsequent _free_port() scans reflect the true state.
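A sketch of the psutil-based path, assuming psutil is present (the netstat/lsof fallback described above is omitted for brevity):

```python
# Kill whatever process is bound to a port, then prune the registry entry.
import psutil

def kill_server_on_port(port: int, registry: dict) -> None:
    for conn in psutil.net_connections(kind="tcp"):
        if conn.laddr and conn.laddr.port == port and conn.pid:
            try:
                proc = psutil.Process(conn.pid)
                proc.terminate()
                proc.wait(timeout=2)
            except psutil.TimeoutExpired:
                proc.kill()               # escalate if SIGTERM is ignored
            except psutil.NoSuchProcess:
                pass                      # already gone
    registry.pop(port, None)
```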
_kill_all_registered_servers() on exit. closeEvent calls this after shutting down all named engine instances. It iterates the registry and kills by both PID and port, ensuring any server that was started but whose engine reference was subsequently lost (e.g. replaced by a failed reload) is still terminated. The registry is cleared on completion.
Startup orphan scan. _scan_and_kill_orphaned_servers() is scheduled via QTimer.singleShot(0, ...) during MainWindow.__init__. It uses psutil.process_iter to find any llama-server processes occupying ports in the 8600–8700 range that are not present in the current session's registry, and terminates them. On systems without psutil the function exits immediately, preserving the no-dependency fallback behaviour.
shutdown() kills by port as well as PID. In addition to the process-tree kill added in 2.1.0, shutdown() now calls _kill_server_on_port() as a belt-and-suspenders step and zeros self.server_port afterward. This covers the edge case where the Popen handle's PID no longer matches the process actually using the port (e.g. after a rapid respawn).
closeEvent stops all workers before killing servers. The summary worker and multi-PDF worker are now explicitly stopped (abort() + wait(1000)) in closeEvent alongside the existing inference and pipeline worker teardown. This prevents a background thread from attempting to POST to a server that has already been killed.
⚡ Multi-Model Role Assignment
_roles_loading concurrency guard. A set instance tracks which roles currently have an active ModelLoaderThread. _start_role_engine_load() returns immediately if the target role is already in the set. The role is added at load start and removed in _on_role_engine_loaded(), including on failure. This is an explicit set-based check that complements the signal-disconnection approach from 2.1.0.
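The guard reduces to a small pattern, sketched here with the thread launch stubbed out (the surrounding class is illustrative only):

```python
# Set-based re-entrancy guard for per-role model loads.
class MainWindowFragment:
    def __init__(self):
        self._roles_loading: set[str] = set()

    def _set_role_buttons_enabled(self, enabled: bool) -> None:
        print(f"buttons enabled: {enabled}")   # stand-in for the real UI call

    def _start_role_engine_load(self, role: str) -> None:
        if role in self._roles_loading:
            return                             # a loader is already running
        self._roles_loading.add(role)
        self._set_role_buttons_enabled(False)
        # ...spawn ModelLoaderThread, connect it to _on_role_engine_loaded...

    def _on_role_engine_loaded(self, role: str, ok: bool) -> None:
        self._roles_loading.discard(role)      # removed even on failure
        if not self._roles_loading:
            self._set_role_buttons_enabled(True)
```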
_set_role_buttons_enabled() helper. A new method enables or disables the full strip of model-management buttons (btn_load_role_engine, btn_unload_all, btn_load_primary, btn_browse_model, btn_remove_model, btn_save_cfg) in a single call. It is called with False when any load begins and with True only when _roles_loading becomes empty again. This closes the window between disabling only the load button (2.1.0 behaviour) and disabling the entire management surface, preventing config saves and model removes from racing against an in-progress load.
Engine cleanup on load failure. If _on_role_engine_loaded receives ok=False, it calls shutdown() on the newly created engine and sets the role attribute back to None. Previously, a failed load left a partially-initialised LlamaEngine in place, which could cause the next load attempt to inherit stale port or mode state.
_unload_all_engines() clears loading state. The unload-all handler now calls self._roles_loading.clear() and self._set_role_buttons_enabled(True) before shutting down engines, preventing a scenario where an in-progress load had disabled the buttons and was then overtaken by an unload-all, leaving the UI permanently locked.
Model list selection preserved across refresh. _refresh_model_list() now records the currently selected model path before clearing the list and restores the selection afterward using blockSignals(True/False) to avoid triggering spurious currentItemChanged callbacks. Previously, any operation that saved a config change (and thus called _refresh_model_list()) would silently deselect the model the user was configuring.
Role attribute kept in sync on config save. _save_model_config() now reads the old role from the registry before writing the new config. If the role changed and the model is currently loaded in a live engine, the engine's role attribute is updated in place and _refresh_engine_status() is called. This keeps the engine status list consistent with the registry without requiring a reload.
LlamaEngine carries a role attribute. LlamaEngine.__init__ now accepts an optional role string (default "general"). All engine construction sites pass the appropriate role. This makes role attribution intrinsic to the engine object rather than inferred from which attribute it was stored under, simplifying status display and future logging.
Bug Fixes
Fixed pause banner left visible after summary error. If the summarisation pipeline raised an error before the final pass, the pause banner widget was left in the chat indefinitely. All error-exit paths now call chat_area.remove_pause_banner().
Fixed input field stuck disabled after summary abort. If the user aborted a job via the banner's Abort button, input_bar.input.setEnabled(True) was not always reached. The enable call is now present on every exit path including the __PAUSED__ signal branch.
Fixed _on_summary_err called with stale _summary_bubble. On certain timing paths, _on_summary_err appended an error message to a bubble that had already been nulled out. The write is now guarded with if self._summary_bubble:.
🔧 Internal / Developer Notes
`_SERVER_REGISTRY` is a plain module-level `dict`. It is intentionally not enca...
Linux_x86_64_cpu_only_Ubuntu_UI0.1_Major
NativeLab Pro — UI Theme Changelog
All changes relate to the dual light/dark theme system, appearance consistency, and the live Theme Editor tab introduced during this session.
Phase 1 — Dual Theme Architecture (Initial Implementation)
Added the foundational infrastructure to support runtime theme switching.
- Added a `CURRENT_THEME = "light"` global variable to track active theme state.
- Split the single `C` colour dictionary into `C_DARK` (original palette) and `C_LIGHT` (new light palette), with `C` assigned dynamically based on `CURRENT_THEME`.
- Converted the static `QSS` stylesheet string into a `_build_qss(c: dict)` function so the stylesheet can be regenerated from any palette at runtime.
- Added a View → Switch to Light/Dark Theme menu item with a dynamic label that updates to reflect the current state.
- Implemented `_toggle_theme()` and `_update_theme_action_label()` methods on `MainWindow`.
- Added theme persistence: the active theme is saved to `app_config.json` and restored on next launch.
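The mechanism condenses to a few lines, sketched here with the light values taken from the initial palette described below and placeholder dark values:

```python
# Dual-palette theme switch: regenerate the stylesheet from the active palette.
C_DARK  = {"bg0": "#1e1e22", "txt": "#e8e8ea", "acc": "#7c5cff"}   # placeholders
C_LIGHT = {"bg0": "#faf7f2", "txt": "#1c1810", "acc": "#4a7652"}   # initial cream palette
CURRENT_THEME = "light"

def _build_qss(c: dict) -> str:
    return f"""
    QMainWindow {{ background: {c['bg0']}; color: {c['txt']}; }}
    QPushButton {{ border: 1px solid {c['acc']}; }}
    """

C = C_LIGHT if CURRENT_THEME == "light" else C_DARK
QSS = _build_qss(C)   # regenerated on every theme switch, then setStyleSheet(QSS)
```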
Initial C_LIGHT palette
The first iteration used a warm cream aesthetic: #faf7f2 canvas, sage green #4a7652 accent, and warm brown #1c1810 text.
Phase 2 — Professional Light Palette (Stripe/Linear aesthetic)
Replaced the cream/sage palette with a clinical, high-contrast SaaS-style palette after user feedback that the initial version looked unprofessional.
Key values introduced:
| Token | Value | Purpose |
|---|---|---|
| bg0 | #ffffff | Pure white canvas |
| bg1 | #f7f7f8 | Sidebar/panel |
| acc | #2563eb | Vivid blue accent (Stripe standard) |
| txt | #0d0d10 | Near-black primary text |
| bdr | #e4e4e7 | Barely-there zinc-200 border |
| usr | #eff6ff | Whisper-blue user bubble |
All warm-neutral tones; no cool greys. Accent colour shifted from blue to burnt orange to harmonise with the peach base.
Phase 7 — Live Appearance / Theme Editor Tab
Added a full 🎨 Appearance tab allowing users to edit every colour token of the active theme in real time using colour swatches, hex inputs, and HSL sliders.
New class: AppearanceTab(QWidget)
- Emits `theme_changed = pyqtSignal(dict)` whenever any colour is modified.
- Colour tokens are grouped into six logical sections: Backgrounds, Text, Accent, Bubbles, Borders, Semantic.
- Each token row contains: a labelled swatch button (opens `QColorDialog`), a hex `QLineEdit`, and three HSL `QSlider` widgets (Hue 0–360, Saturation 0–100, Lightness 0–100).
- All three controls stay in sync — editing one updates the others.
- Reset button reverts to the current built-in palette for the active theme.
- Save button persists the custom palette to `app_config.json`.
Separate persistence per theme
- Light mode saves to `APP_CONFIG["custom_light_palette"]`.
- Dark mode saves to `APP_CONFIG["custom_dark_palette"]`.
- Both are loaded and merged at startup independently, so customising one theme does not affect the other.
QSS additions for Appearance tab
Added rules for: #appearance_bar, #appearance_hdr, #appearance_group_hdr, #appearance_row_lbl, #appearance_sl_lbl, QLineEdit#appearance_hex, QSlider#appearance_slider (groove, handle, sub-page), QPushButton#appearance_btn, QPushButton#appearance_btn_acc.
MainWindow wiring
- `AppearanceTab` is instantiated in `_build_ui` and wired via `theme_changed → _on_appearance_changed`.
- `_on_appearance_changed` updates `C_LIGHT` or `C_DARK` (whichever is active), rebuilds `QSS`, and calls `self.setStyleSheet(QSS)` — changes are visible instantly without restarting.
- `_toggle_theme` calls `appearance_tab.load_palette(...)` after switching so the editor always reflects the current theme's colours.
- Palette loading at startup was moved into `__init__` after `_build_ui` returns, so it executes after `QApplication` is fully initialised.
setStyleSheet migration
All QApplication.instance().setStyleSheet(QSS) calls were replaced with self.setStyleSheet(QSS) on the QMainWindow instance to avoid NoneType errors during initialisation. Stylesheet inheritance from the top-level window to all child widgets is identical.
End of changelog.
Linux_x86_64_cpu_only_Ubuntu
Native Lab Pro v2 — Linux Release
Native Lab Pro v2 is the first public Linux release of Native Lab Pro — a fully local, privacy-first desktop application for running large language models directly on your machine using llama.cpp.
No API keys.
No cloud.
No telemetry.
Your models and data stay entirely on your system.
🚀 Key Features
Fully Local LLM Chat
Run GGUF models directly on your machine using llama.cpp with a native PyQt6 desktop interface.
Multi-Model Architecture
Load multiple models simultaneously and assign them specialized roles:
- General — main chat model
- Reasoning — architectural reasoning and analysis
- Summarization — document summarization
- Coding — code generation tasks
- Secondary — additional pipeline insight engine
Pipeline Mode
Coding prompts can run through a multi-stage reasoning pipeline:
- Non-coding models produce architectural insights
- The coding model receives those insights as context
- Final structured code is generated
This produces more structured and reliable code output.
Document Reference Engine
Attach documents or source code files to a session and ask questions about them.
Supported reference types:
- PDFs
- Text files
- Source code files
The engine automatically retrieves the most relevant excerpts and injects them into prompts.
Structured Script Parsing
Source code files are parsed to extract:
- imports
- functions
- classes
- constants
- type definitions
The model receives structured context instead of raw text chunks.
Long Document Summarization
Built-in pipeline for summarizing large documents using chunked processing with context carryover.
Features:
- pause / resume long jobs
- automatic state saving
- multi-PDF cross-document summarization
Parallel Model Loading
Multiple models can run simultaneously through separate llama-server instances.
⚠ Each model consumes its full RAM allocation.
Quantization Detection
Automatic detection of GGUF quantization formats including:
- K-Quants (Q2_K → Q6_K)
- imatrix quants (IQ series)
- legacy quants (Q4_0, Q8_0)
- float formats (F16, BF16)
Models are labeled with human-readable quality tiers.
Prompt Template Auto-Detection
Correct prompt templates are automatically selected based on model filename.
Supported families include:
- LLaMA-2 / LLaMA-3
- Mistral / Mixtral
- DeepSeek / DeepSeek-R1
- Phi-3
- Qwen
- Gemma
- Falcon
- Vicuna
- Yi
- Zephyr
- Starling
- CodeLlama
- Orca
- Command-R
Smart Memory Management
A RAM watchdog prevents crashes during large document processing by automatically spilling reference caches to disk when memory pressure is detected.
🖥 System Requirements
Linux (primary supported platform)
Minimum RAM depends on the model used.
Typical requirements:
| Model | RAM Required |
|---|---|
| 7B Q4 | ~4-5 GB |
| 13B Q5 | ~9-10 GB |
| 70B Q4 | ~38-40 GB |
📦 Dependencies
Python 3.10+
Required:
PyQt6
Optional:
psutil # RAM monitoring
PyPDF2 # PDF loading and summarization
Install with:
pip install PyQt6 psutil PyPDF2
llama.cpp Requirement
Native Lab Pro requires llama.cpp.
Compile or download it and configure the binary paths inside the application.
Default paths used by the application:
LLAMA_CLI = /home/hrirake/llama.cpp/build/bin/llama-cli
LLAMA_SERVER = /home/hrirake/llama.cpp/build/bin/llama-server
You can modify these paths in the source if needed.
Model Directory
The default directory scanned for models is:
/home/hrirake/localllm
You can also add models manually through the Models tab.
Supported format:
*.gguf
▶ Launching the Application
After extracting the release, start Native Lab Pro using:
nativelabpro.desktop
or directly run the Python file:
python native_lab_pro_v2.py
💾 Data Storage
Native Lab Pro stores data locally in the application directory:
| Folder | Purpose |
|---|---|
| sessions/ | chat history |
| paused_jobs/ | paused summarization jobs |
| ref_cache/ | reference text cache |
| ref_index/ | reference metadata |
| model_configs.json | per-model settings |
| app_config.json | global configuration |
No data is sent outside your system.
⌨ Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl + N | New session |
| Ctrl + Q | Quit |
| Ctrl + B | Toggle sidebar |
| Ctrl + L | Logs tab |
| Ctrl + M | Models tab |
| Enter | Send message |
| Shift + Enter | New line |
⚠ Notes
- This is the first Linux release of Native Lab Pro.
- GPU acceleration depends on your llama.cpp build configuration.
- Running multiple models simultaneously requires significant RAM.
- Place your model's .gguf file in the app's localllm folder.
🔒 Privacy
Native Lab Pro runs entirely offline.
- No telemetry
- No external APIs
- No cloud services
All computation happens locally on your machine.