Concurrent, multi-agent editing with deterministic convergence and Git-compatible output.
minigit is a real-time semantic concurrency layer that sits above Git, enabling multiple AI agents and humans to edit the same codebase simultaneously without conflicts.
| Document | Description |
|---|---|
| Architecture | Complete system architecture and design |
| Agent Tasks | Task breakdown for completing the product |
| Identity Resolver | ML pipeline for entity identity resolution |
- Deterministic merging: The same inputs always produce the same outputs
- Semantic awareness: Operations understand code structure, not just text
- Stable entity identity: Track symbols across renames, moves, and refactors (99.99% accuracy on the identity resolver's test set)
- Quarantine mechanism: Ambiguity is isolated and never corrupts the trunk
- Git-compatible: Clean export to standard Git commits and PRs
```
┌─────────────┐         ops/events          ┌────────────────────┐
│ IDE / Human │ <─────────────────────────> │  Workspace Server  │
└─────────────┘                             │                    │
                                            │  OpsLog (WAL)      │
┌─────────────┐         ops/events          │  Materializer      │
│  AI Agents  │ <─────────────────────────> │  Identity Resolver │
│     (N)     │                             │  Merge Policy      │
└─────────────┘                             │  Verifier          │
                                            └─────────┬──────────┘
                                                      │
                                                      │ Git Projection
                                                      ▼
                                            ┌────────────────────┐
                                            │    Git Adapter     │
                                            │  (commits / PRs)   │
                                            └────────────────────┘
```
```
minigit/
├── crates/
│   ├── minigit-core/        # Core data types and domain logic
│   ├── minigit-hlc/         # Hybrid Logical Clock for ordering
│   ├── minigit-opslog/      # Append-only operation log (WAL)
│   ├── minigit-git/         # Git import/export adapter
│   └── minigit-server/      # HTTP/WebSocket server
├── ml/
│   └── identity_resolver/   # ML pipeline for entity identity resolution
│       ├── crawler/         # Git history mining
│       ├── parser/          # Code entity extraction (tree-sitter)
│       ├── features/        # Feature extraction for ML
│       ├── model/           # Neural network + hybrid classifier
│       ├── experiments/     # Adversarial testing
│       └── pipeline/        # Training data generation
├── plugins/
│   └── ts-semantics/        # TypeScript/JavaScript semantic plugin
├── architecture.md          # Full architecture specification
└── IDENTITY_RESOLVER.md     # ML system architecture
```
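The minigit-hlc crate provides a Hybrid Logical Clock. Its code is not reproduced here; for orientation, this is a sketch of the standard HLC algorithm (Kulkarni et al.), which pairs a physical wall-clock component with a logical counter so timestamps respect causality even when replica clocks drift. Class and field names are hypothetical, and the `node` tie-breaker is an added assumption that makes the order total.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True, order=True)
class HLCTimestamp:
    wall: int     # physical component in milliseconds
    logical: int  # counter that orders events sharing the same wall value
    node: str     # replica id, a final tie-breaker (hypothetical)


class HybridLogicalClock:
    def __init__(self, node: str,
                 now_ms: Callable[[], int] = lambda: time.time_ns() // 1_000_000):
        self.node = node
        self.now_ms = now_ms  # injectable so tests can fake the physical clock
        self.wall = 0
        self.logical = 0

    def tick(self) -> HLCTimestamp:
        """Timestamp a local event, e.g. an op about to be sent."""
        pt = self.now_ms()
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            # Physical clock has not advanced (or went backwards): bump the counter.
            self.logical += 1
        return HLCTimestamp(self.wall, self.logical, self.node)

    def receive(self, remote: HLCTimestamp) -> HLCTimestamp:
        """Merge a timestamp received from another replica."""
        pt = self.now_ms()
        old = self.wall
        self.wall = max(old, remote.wall, pt)
        if self.wall == old and self.wall == remote.wall:
            self.logical = max(self.logical, remote.logical) + 1
        elif self.wall == old:
            self.logical += 1
        elif self.wall == remote.wall:
            self.logical = remote.logical + 1
        else:
            self.logical = 0
        return HLCTimestamp(self.wall, self.logical, self.node)
```

A replica whose physical clock lags still issues timestamps greater than anything it has received, which is the property an ordered ops log relies on.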
An entity is a logical code object with stable identity across edits:
- Files, modules, symbols (functions, classes, variables)
- Tracked via fingerprinting for rename/move resilience
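The fingerprinting scheme itself is not specified here. As one illustrative possibility, a fingerprint could hash an entity's tokenised source with its own name masked out, so that a pure rename or a move to another file keeps the same value. The `fingerprint` function, its regex tokeniser, and the `<SELF>` placeholder are all hypothetical.

```python
import hashlib
import re

# Identifiers, integers, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")


def fingerprint(name: str, source: str) -> str:
    """Hash an entity's source with its own name replaced by a placeholder.

    Tokenising drops whitespace, so reformatting does not change the hash;
    masking the entity's name means a rename does not change it either.
    """
    tokens = ["<SELF>" if tok == name else tok for tok in TOKEN_RE.findall(source)]
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()[:16]
```

An exact hash like this breaks on any body edit, which is the gap a similarity-based identity resolver has to cover.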
An operation is a mutation of entities that records its intent:
- SemOp: Semantic operations (rename, add parameter, etc.)
- TextOp: Anchored text edits (fallback)
- MetaOp: Bundle lifecycle, review decisions
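The three operation kinds are named above but their fields are not. The sketch below models them as Python dataclasses and shows one plausible reading of an "anchored" text edit: the edit is positioned by surrounding text rather than a line number, and it refuses to apply when the anchor is missing or occurs more than once. All field names and the unique-anchor rule are assumptions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SemOp:
    kind: str                            # e.g. "rename", "add_parameter"
    entity_id: str                       # stable entity identity, not a file position
    args: tuple[tuple[str, str], ...] = ()


@dataclass(frozen=True)
class TextOp:
    entity_id: str
    anchor: str                          # text that must occur exactly once
    insert: str                          # text inserted right after the anchor


@dataclass(frozen=True)
class MetaOp:
    bundle_id: str
    action: str                          # e.g. "submit", "approve", "reject"


Operation = Union[SemOp, TextOp, MetaOp]


class AnchorError(Exception):
    """The anchor is missing or ambiguous, so the edit cannot be placed safely."""


def apply_text_op(source: str, op: TextOp) -> str:
    count = source.count(op.anchor)
    if count != 1:
        # Refuse to guess: a missing or repeated anchor is treated as ambiguity.
        raise AnchorError(f"anchor found {count} times")
    end = source.index(op.anchor) + len(op.anchor)
    return source[:end] + op.insert + source[end:]
```

Raising instead of guessing fits the "ambiguity ⇒ quarantine" principle: the caller can isolate the op rather than apply it in the wrong place.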
A bundle is a coherent unit of work (roughly "a commit"):
- Contains operations + intent metadata
- Accepted/rejected as a unit
- Can be quarantined on conflict
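The accept-or-reject-as-a-unit rule can be sketched as applying a bundle's ops to a copy of trunk and only adopting the copy if every op succeeds. Modelling ops as plain callables over an `entity id -> source` dict is a simplification, and `Bundle`, `BundleStatus`, and `apply_bundle` are hypothetical names.

```python
import copy
from dataclasses import dataclass
from enum import Enum
from typing import Callable

State = dict[str, str]  # hypothetical: entity id -> source text


class BundleStatus(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    QUARANTINED = "quarantined"


@dataclass
class Bundle:
    bundle_id: str
    intent: str                         # intent metadata, e.g. "rename User -> Account"
    ops: list[Callable[[State], None]]  # each op mutates the state in place
    status: BundleStatus = BundleStatus.PENDING
    reason: str = ""


def apply_bundle(trunk: State, bundle: Bundle) -> State:
    """Apply all of a bundle's ops or none of them; return the resulting trunk."""
    staged = copy.deepcopy(trunk)  # trunk itself is never touched while ops run
    try:
        for op in bundle.ops:
            op(staged)
    except Exception as exc:
        # Any failure quarantines the whole bundle and leaves trunk as it was.
        bundle.status = BundleStatus.QUARANTINED
        bundle.reason = str(exc)
        return trunk
    bundle.status = BundleStatus.ACCEPTED
    return staged
```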
- Rust 1.75+ (for core; Git integration and WebSocket support need 1.82+)
- Node.js 18+ (for TS plugin)
- Python 3.10+ (for ML pipeline)
```bash
# Build Rust crates
cargo build

# Run tests
cargo test

# Build TypeScript plugin
cd plugins/ts-semantics
npm install
npm run build
```

Note: Some features (Git integration, WebSocket) are currently disabled because their dependencies require Rust 1.82+. To enable them, upgrade your Rust toolchain by running rustup update.
```bash
cargo run --bin minigit-server
```

The server starts on http://127.0.0.1:7432 by default.
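| Method | Endpoint | Description |
|---|---|---|
| POST | /ops | Append operations |
| GET | /ops/since | Get operations since a given clock value |
| GET | /state | Get the current state |
| POST | /bundles | Create a bundle |
| POST | /bundles/:id/submit | Submit a bundle for review |
| POST | /git/export | Export to Git |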
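A hedged client sketch using only the Python standard library. The paths and the default address come from this README; the JSON body (`{"ops": [...]}`) and the `clock` query parameter for /ops/since are guesses, so check the server's handlers before relying on them.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:7432"  # default address of minigit-server


def append_ops_request(ops: list[dict]) -> urllib.request.Request:
    # Hypothetical body shape: {"ops": [...]}.
    body = json.dumps({"ops": ops}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/ops",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ops_since_request(clock: str) -> urllib.request.Request:
    # Hypothetical query parameter name "clock".
    query = urllib.parse.urlencode({"clock": clock})
    return urllib.request.Request(f"{BASE_URL}/ops/since?{query}", method="GET")


def send(req: urllib.request.Request) -> dict:
    # Needs a running server; decodes the JSON response body.
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)
```

Building the requests separately from `send` keeps them inspectable without a server, e.g. `send(ops_since_request("42"))` once minigit-server is running.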
- Determinism first: Same inputs → same outputs, no nondeterministic merges
- Never corrupt trunk: Ambiguity ⇒ quarantine, no silent failures
- Semantics over text: Prefer entity-level operations
- LLMs assist, never decide: AI proposes, deterministic engine verifies
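One simple way to get "same inputs → same outputs" for concurrent edits is to order operations by a key that every replica computes identically. The sketch below assumes ops carry `(wall, logical, replica)` timestamps in the style of an HLC and merges streams by sorting on the full tuple; the `merge` function and the op tuple layout are hypothetical, and ordering alone does not resolve semantic conflicts (per the principles above, those would go to quarantine).

```python
# Hypothetical op record: (wall_ms, logical, replica_id, description).
Op = tuple[int, int, str, str]


def merge(*streams: list[Op]) -> list[Op]:
    """Deterministically merge op streams from several replicas.

    Sorting on the whole tuple yields one total order, so any replica that
    holds the same set of ops produces the same sequence, whatever order
    the streams arrived in.
    """
    seen: set[Op] = set()
    merged: list[Op] = []
    for op in sorted(op for stream in streams for op in stream):
        if op not in seen:  # re-delivered duplicates collapse to one copy
            seen.add(op)
            merged.append(op)
    return merged
```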
The identity resolver is the core innovation that enables tracking code entities across renames, moves, and refactors. It answers: "Is entity A the same as entity B?"
| Metric | Score |
|---|---|
| Test Set Accuracy | 99.99% (11,679/11,680) |
| Precision | 100% (0 false positives) |
| Adversarial MILD | 100% ✅ |
| Adversarial MODERATE | 100% ✅ |
| Adversarial HARD | 100% ✅ |
| Adversarial ADVERSARIAL | 100% ✅ |
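As a quick check of the table, the test-set accuracy corresponds to a single misclassified pair out of 11,680. If "same entity" is taken as the positive class (an assumption; the class convention is not stated here), zero false positives means that one error was a same-entity pair predicted as different, i.e. a false negative:

```latex
\text{accuracy} = \frac{11{,}679}{11{,}680} = 1 - \frac{1}{11{,}680} \approx 0.999914 \approx 99.99\%
\qquad
\text{precision} = \frac{TP}{TP + FP} = \frac{TP}{TP + 0} = 1 \quad (TP > 0)
```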
```bash
cd ml

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r identity_resolver/requirements.txt
```

```bash
# Parse a file and show entities
python -m identity_resolver parse path/to/file.ts

# Compare entities between two files
python -m identity_resolver compare file_v1.ts file_v2.ts
```

```bash
# Crawl GitHub repos and generate training data
python -m identity_resolver crawl \
  --max-repos 100 \
  --output-dir ./data/pairs

# View statistics
python -m identity_resolver stats ./data/pairs
```

```bash
# Train the V4 model (with type signatures + domain awareness)
python -m identity_resolver.model.train_v4 ./data/pairs
# Model saved to: ./checkpoints_v4/best_model_v4.pt
```

Training takes ~2 minutes on Apple Silicon (M1/M2/M3).

```bash
# Test the model on adversarial cases
python -m identity_resolver.experiments.adversarial_v4

# Test the hybrid classifier (model + rules)
python -m identity_resolver.model.hybrid_classifier
```

The V4 model's input features fall into these categories:

| Category | Features |
|---|---|
| Token Similarity | Jaccard, bigram Jaccard, length ratio |
| Name Features | Exact match, case-insensitive, parts overlap |
| Type Signature | Param types Jaccard, return type match, param count |
| Domain Tokens | User, Account, Order, Product detection |
| Composite Signals | Same-name-diff-signature, same-name-low-overlap |
| Structural | Block count, call count, return statements |
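The table names the feature families but not their exact definitions. The following is a plausible sketch of a few of them (token Jaccard, bigram Jaccard, length ratio, and the name features); the tokeniser, the identifier-splitting regex, and the feature names are assumptions, and it covers only a handful of the V4 model's 34 features.

```python
import re


def tokens(code: str) -> list[str]:
    # Identifiers, integers, or any other single non-space character.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def name_parts(name: str) -> set[str]:
    # Split camelCase / snake_case: "getUserData" -> {"get", "user", "data"}.
    return {p.lower() for p in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)}


def pair_features(body_a: str, body_b: str, name_a: str, name_b: str) -> dict[str, float]:
    ta, tb = tokens(body_a), tokens(body_b)
    return {
        "token_jaccard": jaccard(set(ta), set(tb)),
        "bigram_jaccard": jaccard(set(zip(ta, ta[1:])), set(zip(tb, tb[1:]))),
        "length_ratio": min(len(ta), len(tb)) / max(len(ta), len(tb), 1),
        "name_exact": float(name_a == name_b),
        "name_case_insensitive": float(name_a.lower() == name_b.lower()),
        "name_parts_overlap": jaccard(name_parts(name_a), name_parts(name_b)),
    }
```

Bigram Jaccard is included alongside plain token Jaccard because it is sensitive to token order, which set overlap alone ignores.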
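The production system combines the neural network with rule-based overrides:

```python
import torch  # assumes PyTorch (the checkpoint is a .pt file)

from identity_resolver.model.hybrid_classifier import HybridClassifier

# `model` is a trained V4 model, e.g. from ./checkpoints_v4/best_model_v4.pt;
# loading it is not shown here.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
classifier = HybridClassifier(model, device)

result = classifier.classify(
    body_a="function validate(s: string) { ... }",
    body_b="function validate(user: User) { ... }",
    name_a="validate",
    name_b="validate",
)
# result.prediction == "different"
# result.source == "rule:same_name_diff_signature"
```

Rules handle clear-cut cases: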
- high_body_similarity (>90%) → SAME (refactored entity)
- same_name_diff_signature → DIFFERENT (different overloads)
- similar_name_diff_suffix → DIFFERENT (e.g. getUserData vs getUserProfile)
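The three override rules are listed but not their implementation. This is a rough sketch of how rule overrides with a model fallback could be wired together; the similarity measure (token Jaccard), the crude signature extraction, the rule order, the 0.5 threshold, and the `model` callable are all assumptions, not the HybridClassifier API.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    prediction: str  # "same" or "different"
    source: str      # which rule fired, or "model"


def _tokens(code: str) -> set[str]:
    return set(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def _signature(body: str) -> Optional[str]:
    # Crude: the parameter text between the first "(" and the next ")".
    m = re.search(r"\(([^)]*)\)", body)
    return m.group(1).replace(" ", "") if m else None


def _parts(name: str) -> list[str]:
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)


def classify(body_a: str, body_b: str, name_a: str, name_b: str,
             model: Callable[[str, str, str, str], float]) -> Result:
    ta, tb = _tokens(body_a), _tokens(body_b)
    body_sim = len(ta & tb) / max(len(ta | tb), 1)
    if body_sim > 0.9:
        return Result("same", "rule:high_body_similarity")
    if name_a == name_b and _signature(body_a) != _signature(body_b):
        return Result("different", "rule:same_name_diff_signature")
    pa, pb = _parts(name_a), _parts(name_b)
    # Same camelCase prefix, different final part: getUserData vs getUserProfile.
    if name_a != name_b and len(pa) == len(pb) > 1 and pa[:-1] == pb[:-1]:
        return Result("different", "rule:similar_name_diff_suffix")
    # No rule fired: defer to the model's probability that A and B are the same.
    p_same = model(body_a, body_b, name_a, name_b)
    return Result("same" if p_same >= 0.5 else "different", "model")
```

Checking the rules before the model means a confident rule always wins, which matches the "rule override or model" flow in the diagram below.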
- Hybrid Logical Clock (HLC) for ordering
- OpsLog (append-only WAL)
- Entity and EntityGraph types
- Semantic operations (SemOp, TextOp, MetaOp)
- Bundle and quarantine mechanism
- Basic merge engine
- HTTP API scaffold
- TypeScript semantic plugin scaffold
- ML data pipeline (Git crawler, parser, feature extraction)
- Identity resolution model (V4 neural network)
- Hard negative mining (same name different entity)
- Adversarial testing framework
- Hybrid classifier (model + rules)
- Full text operation anchoring
- Git import/export integration
- WebSocket real-time sync
- Basic verification layer
- Entity feature extraction
- Training data collection (150K+ pairs)
- Identity resolution model (99.99% accuracy)
- TypeScript identity resolver integration
- Semantic operation application
- Typecheck gating
- Live agent testing
- LLM-assisted conflict resolution
- Proof-based auto-apply
- Multi-language support (Python, Go, Rust)
- Production feedback loop
```
┌─────────────────────────────────────────────────────────┐
│                    Hybrid Classifier                    │
├─────────────────────────────────────────────────────────┤
│  ┌─────────────────┐   ┌─────────────────────────────┐  │
│  │   Rule Engine   │   │      V4 Neural Network      │  │
│  │                 │   │                             │  │
│  │ • body_sim>90%  │   │ 34 features → 256 hidden    │  │
│  │ • same_name+    │   │   → 77K params              │  │
│  │   diff_sig      │   │   → sigmoid output          │  │
│  │ • diff_suffix   │   │                             │  │
│  └────────┬────────┘   └──────────────┬──────────────┘  │
│           │                           │                 │
│           └─────────────┬─────────────┘                 │
│                         ▼                               │
│                 Final Prediction                        │
│             (rule override or model)                    │
└─────────────────────────────────────────────────────────┘
```
Proprietary. All rights reserved.