Local multi-agent execution with middleware-level deduplication.
gemma4-wdc is a laptop-first prototype of a local, offline multi-agent system where Gemma-family agents emit overlapping tool tasks and a WDC-style middleware runtime collapses duplicate backend work before execution.
The project explores a simple systems thesis: as local agentic workflows become practical, one of the next bottlenecks is duplicated downstream work across concurrent agents. gemma4-wdc demonstrates how shared SQL, API, document, and code-analysis tasks can be detected, collapsed into shared execution units, and fanned out across agents on consumer hardware.
Quick links
This short walkthrough shows how overlapping agent tasks are collapsed into one shared execution unit.
Live Local Demo
This real capture shows the actual local dashboard and runtime handling the coding-agent overlap scenario.
- Multiple local or simulated agents emit tool tasks independently.
- Middleware detects semantic overlap across those tasks.
- A bounded non-resetting admission window captures duplicate work before execution.
- One shared execution unit serves multiple subscribers.
- Dashboard and benchmarks make the savings visible.
gemma4-wdc presents a local multi-agent runtime centered on middleware-level deduplication: task ingress, semantic overlap detection, bounded admission, shared execution units, and result fan-out.
It is most relevant to coding-agent and research-agent workflows where overlapping repo scans, document extraction, API calls, and analytics tasks become a runtime bottleneck.
Local multi-agent systems are getting easier to run. Even on modest hardware, we can already simulate or partially realize agent teams that branch, plan, search, inspect code, extract evidence, and call tools in parallel.
What still gets wasted is the backend work underneath those branches. Multiple locally rational agents can independently ask for the same repo scan, the same document extraction pass, or the same analytics query. gemma4-wdc exists to demonstrate that this is a middleware problem as much as a model problem.
gemma4-wdc is designed as a consumer-hardware proof of concept. The goal is to make the middleware thesis visible on a single machine, not to imply cluster-scale runtime behavior.
This repository is intentionally scoped as:
- a single-machine proof of concept
- simulation-first by default
- practical on a modest Windows laptop or Windows plus WSL2 setup
- optionally able to host one real local model adapter in hybrid mode
It is explicitly not:
- a production cluster
- a polished enterprise control plane
- a proof of 10-node throughput
- a claim that many heavy local model instances should run concurrently on consumer hardware
Simulation mode is first-class. Hybrid mode with one real local model is optional. The repository is designed to demonstrate the systems thesis on consumer hardware, not to present a production multi-node cluster.
gemma4-wdc receives structured tool tasks from multiple agents or branches:
- a task ingress layer records the task, agent, branch, type, and resource hint
- a fingerprinting layer computes a canonical key, exact structural hash, and semantic comparison text
- matching first tries exact structural overlap, then lightweight near-duplicate similarity where safe
- the first arrival opens a bounded admission window that does not reset
- matching tasks attach to one shared execution unit, or SEU
- the SEU executes once through the backend executor
- the result fans out to all subscribing agents
- metrics show tasks received, SEUs created, executions saved, dedup multiplier, and safety rejections
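The fingerprinting step above can be sketched in a few lines. This is an illustrative sketch only: the names `canonical_key` and `structural_hash` are assumptions for exposition, not the repository's actual API.

```python
# Hypothetical sketch of the fingerprinting layer. Function names are
# illustrative assumptions, not the repository's real interface.
import hashlib
import re

def canonical_key(task_type: str, payload: str) -> str:
    """Normalize a task payload so structurally identical requests collide."""
    # Lowercase and collapse whitespace so trivial formatting differences
    # do not defeat exact structural matching.
    normalized = re.sub(r"\s+", " ", payload.strip().lower())
    return f"{task_type}:{normalized}"

def structural_hash(key: str) -> str:
    """Exact-match fingerprint, tried before any near-duplicate similarity."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

# Two agents issue the same analytics query with different formatting;
# both normalize to the same key and therefore the same hash.
a = canonical_key("sql", "SELECT count(*) FROM users\nWHERE active = 1")
b = canonical_key("sql", "select count(*) from users where active = 1")
assert structural_hash(a) == structural_hash(b)
```

Exact structural matching like this is conservative by design; near-duplicate similarity only applies on top of it "where safe", which is what keeps the false-collapse rate at zero in the benchmarks.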
```
multi-agent tasks
  -> ingress registry
  -> semantic fingerprinting
  -> bounded admission window
  -> shared execution unit
  -> execute once
  -> result fan-out
  -> observability
```
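The bounded, non-resetting admission window and the shared execution unit can be sketched as follows. Class names, the default window length, and the callback shape are assumptions for illustration, not the repository's implementation.

```python
# Illustrative sketch of a bounded, non-resetting admission window feeding
# a shared execution unit (SEU). All names and defaults are assumptions.
import time

class SharedExecutionUnit:
    def __init__(self, key: str, window_seconds: float = 0.25):
        self.key = key
        # The deadline is fixed at first arrival and never extended,
        # so a stream of duplicates cannot delay execution indefinitely.
        self.deadline = time.monotonic() + window_seconds
        self.subscribers: list[str] = []

    def try_attach(self, agent_id: str) -> bool:
        """Later duplicates attach only while the window is still open."""
        if time.monotonic() >= self.deadline:
            return False
        self.subscribers.append(agent_id)
        return True

    def execute_once(self, backend) -> dict:
        result = backend(self.key)  # single backend execution
        # Fan the one result out to every subscribing agent.
        return {agent: result for agent in self.subscribers}

seu = SharedExecutionUnit("sql:select count(*) from users")
seu.try_attach("planner")
seu.try_attach("reviewer")
results = seu.execute_once(lambda key: 42)
# both subscribers receive the single shared result
```

The non-resetting deadline is the key design choice: it bounds added latency for the first requester while still capturing duplicates that arrive shortly afterwards.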
Simulation mode is the default and the primary deliverable.
- multiple agents are simulated with structured task templates and lightweight local logic
- the runtime behaves like a local multi-agent middleware service
- the dashboard stays useful even without a real local model attached
- demo scenarios and benchmarks are designed to be credible in this mode
This is the mode that should make a reader believe the systems thesis.
Hybrid mode keeps most agents simulated while allowing one optional real local model adapter to participate.
- one local Gemma-compatible node can propose tasks or rewrite them into structured tool calls
- the remaining agents stay lightweight so the demo remains laptop-friendly
- the middleware, not the model count, remains the focus
If no real model is available, the project is still complete in simulation mode.
- SQL overlap demo: multiple agents ask materially similar analytics questions and one execution serves several branches.
- API overlap demo: similar backend requests collapse into one shared execution path.
- Document research demo: research agents extract overlapping evidence from the same corpus.
- Coding-agent demo: parallel planner, coder, and reviewer branches inspect the same local repo and collapse overlapping repo-understanding work.
- Unique-task counterexample: similar-looking tasks stay separate when correctness should win over aggressive collapse.
Representative visuals:
These numbers are preliminary and come from the current local harness using mock executors and hand-authored scenarios. They are here to show runtime behavior clearly, not to overclaim production performance.
| Scenario | Tasks Requested | Actual Executions | Executions Saved | Dedup Ratio | False-Collapse Rate |
|---|---|---|---|---|---|
| coding_repo_scan | 4 | 2 | 2 | 2.0x | 0.00 |
| document_research | 3 | 2 | 1 | 1.5x | 0.00 |
| api_fanout | 3 | 2 | 1 | 1.5x | 0.00 |
| false_collapse_safety | 4 | 4 | 0 | 1.0x | 0.00 |
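The derived columns follow directly from tasks requested versus actual executions; a quick sanity check of the coding_repo_scan row:

```python
# Sanity-check the derived benchmark columns for the coding_repo_scan row.
requested, executed = 4, 2
saved = requested - executed          # executions saved
dedup_ratio = requested / executed    # dedup multiplier
assert (saved, dedup_ratio) == (2, 2.0)
```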
See docs/benchmark-methodology.md and benchmarks/README.md for detail.
```
cd benchmarks
python run_benchmarks.py
```

Outputs:
- machine-readable summary: benchmarks/results/latest_summary.json
- human-readable summary: benchmarks/results/summary.md
```
cd runtime/shared_execution/backend
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
uvicorn app.main:app --reload
```

In a second terminal:

```
cd runtime/shared_execution/frontend
python -m http.server 4173
```

Then open http://localhost:4173 and run the built-in demo scenarios.
Hybrid mode currently means: run the same stack, but wire one local model adapter into the task-generation path while keeping the rest of the agents simulated.
For now, the repository includes the interface and positioning for this mode, but a stronger real-model path is still a next-step item rather than a polished default experience.
The public microsite is rooted at site/ and includes:
- a Pages-safe static asset tree
- architecture, benchmark, and example pages
- canonical URLs, Open Graph metadata, robots.txt, sitemap.xml, and a custom 404.html
- a deploy workflow at .github/workflows/pages.yml
To preview locally:
```
cd site
python -m http.server 8080
```

Then open http://localhost:8080.
```
gemma4-wdc/
  README.md
  docs/
  diagrams/
  benchmarks/
  examples/
  runtime/shared_execution/
  site/
```
Key entry points:
- runtime/shared_execution/backend/app/core/runtime.py
- runtime/shared_execution/backend/app/api/scenarios.py
- runtime/shared_execution/frontend/index.html
- examples/coding_agents_demo/README.md
- site/index.html
- stronger local model adapter path for one real Gemma-family node
- branch-aware intermediate state reuse beyond full result fan-out
- richer coding-agent traces and replayable scenario logs
- stronger explainability for why tasks did or did not collapse
- eventual multi-node coordination after the single-machine thesis is fully proven
See docs/roadmap.md for the longer plan.
- in-memory single-process runtime
- mock or lightweight local backends
- hand-authored scenarios instead of replayed production traces
- no distributed coordination yet
- hybrid mode is real as a concept, but the simulation-first path is still the best experience

