minigit

Concurrent, multi-agent editing with deterministic convergence and Git-compatible output.

minigit is a real-time semantic concurrency layer that sits above Git, enabling multiple AI agents and humans to edit the same codebase simultaneously without conflicts.

Documentation

Document	Description
Architecture	Complete system architecture and design
Agent Tasks	Task breakdown for completing the product
Identity Resolver	ML pipeline for entity identity resolution

Key Features

Deterministic merging: Same inputs always produce same outputs
Semantic awareness: Operations understand code structure, not just text
Stable entity identity: Track symbols across renames, moves, and refactors (99.99% test-set accuracy)
Quarantine mechanism: Ambiguity is isolated, never corrupts trunk
Git-compatible: Clean export to standard Git commits and PRs
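
To make "deterministic merging" concrete, here is a minimal sketch, not minigit's actual merge engine. It assumes every operation carries a totally ordered key (a logical timestamp plus an agent id as tie-breaker); the Op type and the materialize function are hypothetical names. The last-writer-wins replay is only for illustration, since minigit quarantines ambiguous cases rather than overwriting them.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Op:
    """Hypothetical operation: set `key` to `value` at logical time `ts`."""
    ts: int      # logical timestamp (assumed to come from a clock such as an HLC)
    agent: str   # agent id, used only to break timestamp ties
    key: str
    value: str


def materialize(ops):
    """Replay ops in a total order that does not depend on arrival order."""
    state = {}
    for op in sorted(ops, key=lambda o: (o.ts, o.agent)):
        state[op.key] = op.value
    return state


ops = [
    Op(ts=1, agent="human", key="greet", value="def greet(): ..."),
    Op(ts=2, agent="agent-a", key="greet", value="def greet(name): ..."),
    Op(ts=2, agent="agent-b", key="greet", value="def greet(user): ..."),
]

# Every possible arrival order yields the same final state.
states = [materialize(order) for order in permutations(ops)]
assert all(state == states[0] for state in states)
```

The tie-breaker matters: two ops with the same timestamp would otherwise be replayed in arrival order, which differs between replicas.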

Architecture

┌─────────────┐        ops/events        ┌────────────────────┐
│ IDE / Human │ <──────────────────────> │ Workspace Server   │
└─────────────┘                          │                    │
                                         │  OpsLog (WAL)      │
┌─────────────┐        ops/events        │  Materializer      │
│ AI Agents   │ <──────────────────────> │  Identity Resolver │
│ (N)         │                          │  Merge Policy      │
└─────────────┘                          │  Verifier          │
                                         └─────────┬──────────┘
                                                   │
                                                   │ Git Projection
                                                   ▼
                                         ┌────────────────────┐
                                         │ Git Adapter        │
                                         │ (commits / PRs)    │
                                         └────────────────────┘

Project Structure

minigit/
├── crates/
│   ├── minigit-core/      # Core data types and domain logic
│   ├── minigit-hlc/       # Hybrid Logical Clock for ordering
│   ├── minigit-opslog/    # Append-only operation log (WAL)
│   ├── minigit-git/       # Git import/export adapter
│   └── minigit-server/    # HTTP/WebSocket server
├── ml/
│   └── identity_resolver/ # ML pipeline for entity identity resolution
│       ├── crawler/       # Git history mining
│       ├── parser/        # Code entity extraction (tree-sitter)
│       ├── features/      # Feature extraction for ML
│       ├── model/         # Neural network + hybrid classifier
│       ├── experiments/   # Adversarial testing
│       └── pipeline/      # Training data generation
├── plugins/
│   └── ts-semantics/      # TypeScript/JavaScript semantic plugin
├── architecture..md       # Full architecture specification
└── IDENTITY_RESOLVER.md   # ML system architecture
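
The minigit-hlc crate provides a Hybrid Logical Clock for ordering operations. Its Rust implementation is not shown in this README, so the following is a sketch of the standard HLC update rule in Python, with hypothetical class and method names. A timestamp combines physical time, a logical counter, and a node id, which gives a total order even when physical clocks disagree.

```python
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class Timestamp:
    wall: int      # physical component (e.g. milliseconds)
    counter: int   # logical component, breaks ties within one wall value
    node: str      # node id, makes the order total across nodes


class HybridLogicalClock:
    def __init__(self, node, now):
        self.node = node
        self.now = now      # callable returning physical time as an int
        self.wall = 0
        self.counter = 0

    def tick(self):
        """Timestamp a local or send event."""
        physical = self.now()
        if physical > self.wall:
            self.wall, self.counter = physical, 0
        else:
            self.counter += 1
        return Timestamp(self.wall, self.counter, self.node)

    def observe(self, remote):
        """Merge a timestamp received from another node."""
        physical = self.now()
        wall = max(self.wall, remote.wall, physical)
        if wall == self.wall == remote.wall:
            counter = max(self.counter, remote.counter) + 1
        elif wall == self.wall:
            counter = self.counter + 1
        elif wall == remote.wall:
            counter = remote.counter + 1
        else:
            counter = 0
        self.wall, self.counter = wall, counter
        return Timestamp(wall, counter, self.node)


fast = HybridLogicalClock("a", now=lambda: 100)
slow = HybridLogicalClock("b", now=lambda: 90)   # physical clock lags behind
sent = fast.tick()
received = slow.observe(sent)
assert received > sent   # causality is preserved despite the lagging clock
```

Because the receiver adopts the larger wall value and bumps the counter, an op that causally follows another always sorts after it.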

Core Concepts

Entity

A logical code object with stable identity across edits:

Files, modules, symbols (functions, classes, variables)
Tracked via fingerprinting for rename/move resilience
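
One way to picture fingerprinting that survives renames and moves is sketched below. This is an illustrative assumption, not minigit's actual scheme: the fingerprint function is hypothetical, and it simply hashes the entity's tokens with the entity's own name masked out and the file path left out entirely.

```python
import hashlib
import re

# Identifiers, integers, or any single non-whitespace character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def fingerprint(name, body):
    """Hash the body's tokens with the entity's own name masked out."""
    tokens = ["<SELF>" if tok == name else tok for tok in TOKEN.findall(body)]
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()[:16]


before = fingerprint("getUser", "function getUser(id) { return db.find(id); }")
renamed = fingerprint("fetchUser", "function fetchUser(id) {\n  return db.find(id);\n}")
edited = fingerprint("getUser", "function getUser(id) { return cache.get(id); }")

assert before == renamed   # rename plus reformatting keeps the fingerprint
assert before != edited    # a changed body produces a different fingerprint
```

An exact hash like this breaks as soon as the body changes, which is why a fuzzier similarity decision (the identity resolver described later) is needed for real refactors.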

Operation (Op)

An intentful mutation applied to entities:

SemOp: Semantic operations (rename, add parameter, etc.)
TextOp: Anchored text edits (fallback)
MetaOp: Bundle lifecycle, review decisions
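
The three operation kinds could be modeled roughly as follows. All field names are assumptions made for illustration; the real types live in minigit-core and are not shown in this README. The apply_text_op helper sketches what "anchored" might mean: the edit applies only when its anchor text matches exactly once.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class SemOp:
    """Semantic operation on an entity, e.g. a rename."""
    entity_id: str
    kind: str                               # e.g. "rename", "add_parameter"
    args: dict = field(default_factory=dict)


@dataclass(frozen=True)
class TextOp:
    """Fallback text edit, anchored to context rather than to offsets."""
    entity_id: str
    anchor: str                             # text expected right before the edit
    delete: str
    insert: str


@dataclass(frozen=True)
class MetaOp:
    """Bundle lifecycle event or review decision."""
    bundle_id: str
    action: str                             # e.g. "submit", "accept", "reject"


Op = Union[SemOp, TextOp, MetaOp]


def apply_text_op(source: str, op: TextOp) -> str:
    """Apply a TextOp only if its anchor matches exactly once."""
    target = op.anchor + op.delete
    if source.count(target) != 1:
        raise ValueError("anchor is missing or ambiguous")
    return source.replace(target, op.anchor + op.insert)


edit = TextOp("e1", anchor="def greet(", delete="name", insert="user")
assert apply_text_op("def greet(name): pass", edit) == "def greet(user): pass"
```

Refusing to apply on a missing or duplicated anchor is what keeps a text fallback from silently editing the wrong place.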

Bundle

A coherent unit of work (roughly "a commit"):

Contains operations + intent metadata
Accepted/rejected as a unit
Can be quarantined on conflict
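
The all-or-nothing behavior of a bundle can be sketched as below. This is a toy model under stated assumptions: trunk is a plain dict, each op is a hypothetical (key, expected_old, new) triple, and the names Bundle, Status, and submit are invented for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACCEPTED = "accepted"
    QUARANTINED = "quarantined"


@dataclass
class Bundle:
    bundle_id: str
    intent: str
    ops: list = field(default_factory=list)   # (key, expected_old, new) triples


def submit(trunk: dict, bundle: Bundle) -> Status:
    """Apply every op or none: one stale expectation quarantines the bundle."""
    staged = dict(trunk)                       # work on a copy, never on trunk
    for key, expected_old, new in bundle.ops:
        if staged.get(key) != expected_old:
            return Status.QUARANTINED          # trunk is left untouched
        staged[key] = new
    trunk.clear()
    trunk.update(staged)
    return Status.ACCEPTED


trunk = {"a": "1", "b": "2"}
assert submit(trunk, Bundle("b1", "bump a", [("a", "1", "10")])) is Status.ACCEPTED

# The second op expects the old value of "a", so the whole bundle is quarantined.
stale = Bundle("b2", "bump both", [("b", "2", "20"), ("a", "1", "11")])
assert submit(trunk, stale) is Status.QUARANTINED
assert trunk == {"a": "10", "b": "2"}
```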

Getting Started

Prerequisites

Rust 1.75+ (for core)
Node.js 18+ (for TS plugin)
Python 3.10+ (for ML pipeline)

Building

# Build Rust crates
cargo build

# Run tests
cargo test

# Build TypeScript plugin
cd plugins/ts-semantics
npm install
npm run build

Note: Some features (Git integration, WebSocket) are currently disabled because their dependencies require Rust 1.82+. To enable them, upgrade your Rust toolchain with rustup update.

Running the Server

cargo run --bin minigit-server

The server starts on http://127.0.0.1:7432 by default.

API Overview

POST /ops                - Append operations
GET  /ops/since          - Get ops since clock
GET  /state              - Get current state
POST /bundles            - Create bundle
POST /bundles/:id/submit - Submit for review
POST /git/export         - Export to Git
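
A client could talk to these endpoints with nothing but the Python standard library. The sketch below only builds the requests; the JSON payload for POST /ops is a guess, since the request schema is not documented here, and build_request is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7432"   # default address of the server


def build_request(method, path, payload=None):
    """Build (but do not send) a JSON request against the workspace server."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Hypothetical payload shape: field names are assumptions, not the real schema.
append = build_request(
    "POST", "/ops", {"ops": [{"kind": "rename", "entity": "e1", "to": "fetchUser"}]}
)
state = build_request("GET", "/state")

# To send a request against a running server:
# with urllib.request.urlopen(state) as response:
#     print(json.load(response))
```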

Design Principles

Determinism first: Same inputs → same outputs, no nondeterministic merges
Never corrupt trunk: Ambiguity ⇒ quarantine, no silent failures
Semantics over text: Prefer entity-level operations
LLMs assist, never decide: AI proposes, deterministic engine verifies

Identity Resolution (ML Pipeline)

The identity resolver is the core innovation that enables tracking code entities across renames, moves, and refactors. It answers: "Is entity A the same as entity B?"

Results

Metric	Score
Test Set Accuracy	99.99% (11,679/11,680)
Precision	100% (0 false positives)
Adversarial MILD	100% ✅
Adversarial MODERATE	100% ✅
Adversarial HARD	100% ✅
Adversarial ADVERSARIAL	100% ✅

Quick Start

cd ml

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r identity_resolver/requirements.txt

Parse & Analyze Code

# Parse a file and show entities
python -m identity_resolver parse path/to/file.ts

# Compare entities between two files
python -m identity_resolver compare file_v1.ts file_v2.ts

Collect Training Data

# Crawl GitHub repos and generate training data
python -m identity_resolver crawl \
    --max-repos 100 \
    --output-dir ./data/pairs

# View statistics
python -m identity_resolver stats ./data/pairs

Train the Model

# Train V4 model (with type signatures + domain awareness)
python -m identity_resolver.model.train_v4 ./data/pairs

# Model saved to: ./checkpoints_v4/best_model_v4.pt

Training takes ~2 minutes on Apple Silicon (M1/M2/M3).

Run Adversarial Tests

# Test model on adversarial cases
python -m identity_resolver.experiments.adversarial_v4

# Test hybrid classifier (model + rules)
python -m identity_resolver.model.hybrid_classifier

Features Extracted (34 total)

Category	Features
Token Similarity	Jaccard, bigram Jaccard, length ratio
Name Features	Exact match, case-insensitive, parts overlap
Type Signature	Param types Jaccard, return type match, param count
Domain Tokens	User, Account, Order, Product detection
Composite Signals	Same-name-diff-signature, same-name-low-overlap
Structural	Block count, call count, return statements
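
The token-similarity row can be illustrated with a few lines of Python. This is a sketch of the general technique only: the tokenizer and the function names are assumptions, and the real feature extractor in ml/identity_resolver/features/ may define these quantities differently.

```python
import re


def tokens(code):
    """Split code into identifiers, integers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def jaccard(a, b):
    """Size of the intersection over size of the union of two collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def bigrams(seq):
    """Adjacent token pairs, which capture local ordering."""
    return list(zip(seq, seq[1:]))


def length_ratio(a, b):
    longest = max(len(a), len(b))
    return min(len(a), len(b)) / longest if longest else 1.0


def token_similarity(body_a, body_b):
    ta, tb = tokens(body_a), tokens(body_b)
    return {
        "jaccard": jaccard(ta, tb),
        "bigram_jaccard": jaccard(bigrams(ta), bigrams(tb)),
        "length_ratio": length_ratio(ta, tb),
    }


features = token_similarity("return a + b;", "return b + a;")
assert features["jaccard"] == 1.0          # same token set
assert features["bigram_jaccard"] < 1.0    # but different ordering
```

Plain Jaccard ignores token order, so the bigram variant is what separates reordered code from identical code.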

Hybrid Classifier

The production system combines a neural network with rule-based overrides:

from identity_resolver.model.hybrid_classifier import HybridClassifier

# `model` is the trained V4 network and `device` is the device it runs on;
# both must be set up before this snippet runs.
classifier = HybridClassifier(model, device)
result = classifier.classify(
    body_a="function validate(s: string) { ... }",
    body_b="function validate(user: User) { ... }",
    name_a="validate",
    name_b="validate",
)
# result.prediction = "different"
# result.source = "rule:same_name_diff_signature"

Rules handle clear-cut cases:

high_body_similarity (>90%) → SAME (refactored entity)
same_name_diff_signature → DIFFERENT (different overloads)
similar_name_diff_suffix → DIFFERENT (getUserData vs getUserProfile)
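
The "rules first, model as fallback" flow can be sketched as follows. This is not the code in hybrid_classifier.py: the rule order, the token-set similarity measure, the crude signature extraction, and the model_score callable are all assumptions, and the similar_name_diff_suffix rule is left out for brevity.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    prediction: str   # "same" or "different"
    source: str       # "rule:<name>" or "model"


def body_similarity(body_a, body_b):
    """Stand-in similarity: Jaccard overlap of the two token sets."""
    pattern = r"[A-Za-z_]\w*|\d+|\S"
    a, b = set(re.findall(pattern, body_a)), set(re.findall(pattern, body_b))
    return len(a & b) / len(a | b) if a | b else 1.0


def signature(body):
    """Crude stand-in: the text between the first pair of parentheses."""
    match = re.search(r"\(([^)]*)\)", body)
    return match.group(1).strip() if match else ""


def classify(body_a, body_b, name_a, name_b, model_score):
    """Try the rules in order; fall back to the model's probability."""
    if body_similarity(body_a, body_b) > 0.90:
        return Result("same", "rule:high_body_similarity")
    if name_a == name_b and signature(body_a) != signature(body_b):
        return Result("different", "rule:same_name_diff_signature")
    score = model_score(body_a, body_b)   # assumed to return P(same) in [0, 1]
    return Result("same" if score >= 0.5 else "different", "model")


result = classify(
    "function validate(s: string) { ... }",
    "function validate(user: User) { ... }",
    "validate",
    "validate",
    model_score=lambda a, b: 0.5,         # never reached: a rule fires first
)
assert result == Result("different", "rule:same_name_diff_signature")
```

Keeping the source of each decision in the result makes it easy to audit whether a prediction came from a rule or from the network.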

Development Status

Completed ✅

Phase 1 (MVP) - In Progress

Full text operation anchoring
Git import/export integration
WebSocket real-time sync
Basic verification layer
Entity feature extraction
Training data collection (150K+ pairs)
Identity resolution model (99.99% accuracy)

Phase 2 - Planned

TypeScript identity resolver integration
Semantic operation application
Typecheck gating
Live agent testing

Phase 3 - Future

LLM-assisted conflict resolution
Proof-based auto-apply
Multi-language support (Python, Go, Rust)
Production feedback loop

Model Architecture

┌─────────────────────────────────────────────────────────┐
│                    Hybrid Classifier                    │
├─────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌─────────────────────────────┐ │
│  │  Rule Engine    │    │    V4 Neural Network        │ │
│  │                 │    │                             │ │
│  │  • body_sim>90% │    │  34 features → 256 hidden   │ │
│  │  • same_name+   │    │  → 77K params               │ │
│  │    diff_sig     │    │  → sigmoid output           │ │
│  │  • diff_suffix  │    │                             │ │
│  └────────┬────────┘    └─────────────┬───────────────┘ │
│           │                           │                 │
│           └─────────┬─────────────────┘                 │
│                     ▼                                   │
│              Final Prediction                           │
│         (rule override or model)                        │
└─────────────────────────────────────────────────────────┘
