Welcome to the Gemma 4 Lightweight Installer repository — a universal tool for running Google DeepMind's most capable open-weight models natively on your device, in a single click.
Gemma 4 represents a massive leap in accessible AI, bridging the gap between server-grade performance and local execution. Built for advanced reasoning and agentic workflows, it delivers an unprecedented level of intelligence-per-parameter.
Why do you need this installer right now?
- Zero-Setup: No terminals, Python dependencies, or environment conflicts. Download the `.exe` and launch the UI.
- 100% Offline & Private: No cloud APIs or rate limits. Your codebase, prompts, and data never leave your machine.
- State-of-the-Art Power: Run the 26B MoE or 31B Dense models directly on your hardware, turning your PC into a local-first AI coding assistant and reasoning engine.
Our installer packages a highly optimized C++ inference engine and a sleek interface so you can focus on building, not configuring.
Gemma 4 moves beyond simple chat to handle complex logic, vision, and agentic tasks. Here is what you get locally:
- Configurable Thinking Mode: A built-in reasoning engine that allows the model to "think" step-by-step before answering complex logical or mathematical problems.
- Massive 256K Context Window: Feed it entire codebases, massive documents, or long chat histories without losing context.
- Extended Multimodality: Natively processes text and images with variable aspect ratio and resolution support, excelling at visual tasks like OCR and chart understanding.
- Agentic Workflows & Tool Use: Features native function-calling support to power highly capable autonomous agents and integrate with developer tools.
- High-Quality Code Generation: Delivers exceptional performance in offline coding and algorithmic optimization, directly on your workstation.
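The exact tool-call format the bundled interface uses is not documented here; as an illustration only, a minimal function-calling round trip often looks like the sketch below (the `get_weather` schema and the JSON shape are hypothetical, not Gemma's confirmed format):

```python
import json

# Hypothetical tool schema advertised to the model (names are illustrative).
tools = [{
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# A model with native function-calling support emits a structured call
# like this instead of free text; the application then runs the real tool.
model_output = '{"tool": "get_weather", "arguments": {"city": "Berlin"}}'

call = json.loads(model_output)
assert call["tool"] in {t["name"] for t in tools}
print(call["arguments"]["city"])
```

The key property is that the model's output is machine-parseable, so an agent loop can validate the tool name against the advertised schema before executing anything.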
At the core of our local execution is a custom build of llama.cpp. This guarantees:
- Maximum performance and resource utilization of modern hardware (NVIDIA CUDA, AMD ROCm, Apple Metal).
- Intelligent load distribution, automatically offloading heavy model layers to your discrete graphics card.
- Support for quantized GGUF formats for extreme compression, allowing frontier models to fit into consumer VRAM.
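As a rough sketch of how layer offloading is sized (illustrative arithmetic, not the installer's actual logic; layer count and reserve figures are assumptions):

```python
def layers_that_fit(vram_gb: float, n_layers: int, model_gb: float,
                    reserve_gb: float = 1.5) -> int:
    """Estimate how many transformer layers fit in VRAM.

    Assumes layers are roughly equal in size and reserves some VRAM
    for the KV cache and scratch buffers. Illustrative only.
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb / per_layer_gb))

# A ~17 GB quantized model with 48 layers on a 12 GB card: the layers
# that do not fit stay in system RAM and run on the CPU.
print(layers_that_fit(12, 48, 17.0))
```

This is why the VRAM figures in the hardware requirements below matter so much: every layer left in system RAM slows generation considerably.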
We provide two separate installers tailored for different workstation capabilities. Both options run 100% locally.
Gemma 4 26B (MoE): exceptionally fast tokens-per-second with advanced reasoning.
- Architecture: A 26-billion parameter Mixture-of-Experts (MoE) model that only activates 4 billion parameters during inference.
- Best for: High-throughput tasks, fast conversational agents, and rapid code generation.
- Hardware Requirements: 16-32 GB of system RAM. A discrete graphics card (e.g., NVIDIA RTX 3080/4070 or AMD RX 6800+) with at least 12-16 GB of VRAM is highly recommended.
Gemma 4 31B (Dense): maximum raw reasoning quality and foundational power.
- Architecture: A powerful 31-billion parameter dense model.
- Best for: Deep logical analysis, complex mathematical reasoning, and heavy multi-step autonomous coding tasks.
- Hardware Requirements: 32 GB of system RAM minimum. A high-end discrete graphics card with 24 GB+ of VRAM (e.g., NVIDIA RTX 3090/4090), or an Apple Silicon machine with comparable unified memory (e.g., Mac Studio), is strictly mandatory for stable operation.
- Go to the Releases section of this repository.
- Download the installer that matches your hardware:
  - `Gemma4-26B-x64.exe` for a balanced, high-speed MoE experience.
  - `Gemma4-31B-x64.exe` for maximum reasoning capabilities.
- Run the downloaded `.exe` file. (Note: the installer will automatically download the required quantized GGUF model weights during setup.)
- Launch the desktop shortcut and start building with Gemma 4!
1. Do I need an internet connection to use these models? Only during the initial installation to download the engine and model weights. Once installed, Gemma 4 runs 100% offline. Your data is completely private.
2. What is the difference between the 26B MoE and 31B Dense versions? The 26B MoE (Mixture of Experts) is optimized for latency; it holds 26 billion parameters in memory but only uses 4 billion for each word it generates, making it incredibly fast. The 31B is a "Dense" model that uses all 31 billion parameters for every word, providing deeper reasoning at the cost of slower generation speeds and higher memory requirements.
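The speed difference comes from sparse activation: a small router network scores the available experts for each token and only the top-scoring few actually run, so most of the 26 billion parameters sit idle on any given step. A toy top-k gating sketch (expert names and scores are invented for illustration; the real Gemma 4 router internals are not documented here):

```python
def route(scores: dict[str, float], k: int = 2) -> list[str]:
    """Pick the top-k experts by router score; only these run for this token."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Eight experts, but only two are activated per token. A dense model,
# by contrast, runs every weight for every token.
expert_scores = {"e0": 0.1, "e1": 0.7, "e2": 0.05, "e3": 0.9,
                 "e4": 0.2, "e5": 0.3, "e6": 0.15, "e7": 0.05}
print(route(expert_scores))
```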
3. Is a powerful graphics card (GPU) mandatory?
Yes, absolutely. Because we are deploying the heavy 26B and 31B enterprise-grade models, a high-end or upper-mid-range discrete GPU is strictly required to offload the llama.cpp computations to VRAM. Integrated graphics (iGPU) will not work.
4. Can I use vision features (uploading images) in this local installer? Yes! Both the 26B and 31B versions support native multimodal processing. You can upload images into the local chat interface for OCR, analysis, and visual reasoning.
5. How much hard drive space do I need? The quantized GGUF weights for the 26B model require approximately 16-18 GB of storage, while the 31B model requires about 19-21 GB. We highly recommend installing them on a fast NVMe SSD.
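These sizes are consistent with simple back-of-the-envelope arithmetic: a typical 4-bit GGUF quantization costs roughly 4.5-5 bits per weight once scales and metadata are included (an assumed figure, not a published spec for these files):

```python
def gguf_size_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Approximate on-disk size of a quantized GGUF file in GB.

    bits_per_weight is an assumed effective rate for a 4-bit quant,
    including quantization scales and file metadata.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(round(gguf_size_gb(26), 1))  # roughly in the stated 16-18 GB range
print(round(gguf_size_gb(31), 1))  # roughly in the stated 19-21 GB range
```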
This installer project is distributed under the MIT License. You are free to use, modify, and distribute the installer software.
Note: The Gemma 4 model architectures, weights, and brand names belong to Google DeepMind and are released under the open Apache 2.0 License. By downloading the models via this installer, you agree to Google's usage terms. See the `LICENSE` file for more details.