-
Notifications
You must be signed in to change notification settings - Fork 0
System Architecture
Gemma CLI is designed with a Modular Library Architecture. By separating the "Brain" (API logic), the "Body" (System tools), and the "Voice" (UI rendering), the workstation remains stable, fast, and easy to extend.
The workstation is divided into the main entry point and four specialized libraries.
The heartbeat of the system. It manages the main interactive loop, handles user input, parses model responses, and coordinates between all other modules. It is responsible for:
- Global state management (API keys, settings, history).
- Routing
/commandsto their respective logic. - The "Tool Execution Loop" (Parsing XML -> Requesting Permission -> Executing).
Manages all communication with the Google Gemini API.
- Asynchronous Jobs: API calls run in background jobs so the UI (spinner) remains responsive.
- Rate Limiting (RPM): Implements independent quota buckets for Gemma and Gemini backends.
-
Dual-Agent Pipelines: Contains the logic for
bigBrotherandlittleSisterchaining. -
Automatic Retries: Gracefully handles
429 Resource Exhaustederrors with exponential backoff.
Dynamically expands Gemma's brain by discovering tools on disk.
-
Auto-Registration: Scans the
tools/folder and populates the$script:TOOLSregistry. -
Tiered Guidance Engine: Injects either
ToolUseGuidanceMajororToolUseGuidanceMinorinto the system prompt based on the active model's intelligence tier.
Manages the conversation's context window to prevent crashes and high token costs.
-
Smart Trim: Uses semantic embeddings (
gemini-embedding-001) to score history turns. It keeps only the most relevant context for your current query. -
Role Alternation: Ensures the history always follows the strict
User -> Modelpattern required by the API.
Handles the "look and feel" of the workstation.
-
Custom Rendering: Provides the
Draw-BoxandShow-ArrowMenufunctions. - Status Bar: A real-time tracker for token usage, model type, and context pressure.
- Async Spinner: A thread-safe loading indicator that doesn't stutter during API calls.
- User Input: You type a request in the terminal.
-
Prompt Assembly:
GemmaCLI.ps1gathers the history and usesToolLoaderto build the list of available tools. -
API Call:
Api.ps1sends the payload to Google. -
Response Parsing:
GemmaCLI.ps1detects if the model wants to use a tool (via<tool_call>tags). -
Permission: The UI asks you to
Allow/Deny. -
Execution: The tool runs (either in a job or main session for GUI tasks like
clipboard). - Synthesis: The tool result is fed back to the model for a final human-friendly response.
Gemma CLI uses Windows Data Protection API (DPAPI) to store your API key.
- The key is encrypted using your Windows User SID as the primary key.
-
Result: Even if a hacker steals your
apikey.xmlfile, they cannot decrypt it on another machine or under a different user account.
Next Steps: Dive into the Tool Development Guide to start building your own capabilities.