calmdentist/minigit

minigit

Concurrent, multi-agent editing with deterministic convergence and Git-compatible output.

minigit is a real-time semantic concurrency layer that sits above Git, enabling multiple AI agents and humans to edit the same codebase simultaneously without conflicts.

Documentation

Document            Description
Architecture        Complete system architecture and design
Agent Tasks         Task breakdown for completing the product
Identity Resolver   ML pipeline for entity identity resolution

Key Features

  • Deterministic merging: Same inputs always produce same outputs
  • Semantic awareness: Operations understand code structure, not just text
  • Stable entity identity: Track symbols across renames, moves, and refactors (99.99% test-set accuracy)
  • Quarantine mechanism: Ambiguity is isolated, never corrupts trunk
  • Git-compatible: Clean export to standard Git commits and PRs

Architecture

┌─────────────┐        ops/events        ┌────────────────────┐
│ IDE / Human │ <─────────────────────> │ Workspace Server    │
└─────────────┘                          │                    │
                                         │  OpsLog (WAL)       │
┌─────────────┐        ops/events        │  Materializer       │
│ AI Agents   │ <─────────────────────> │  Identity Resolver  │
│ (N)         │                          │  Merge Policy       │
└─────────────┘                          │  Verifier           │
                                         └─────────┬──────────┘
                                                   │
                                                   │ Git Projection
                                                   ▼
                                         ┌────────────────────┐
                                         │ Git Adapter         │
                                         │ (commits / PRs)     │
                                         └────────────────────┘

Project Structure

minigit/
├── crates/
│   ├── minigit-core/      # Core data types and domain logic
│   ├── minigit-hlc/       # Hybrid Logical Clock for ordering
│   ├── minigit-opslog/    # Append-only operation log (WAL)
│   ├── minigit-git/       # Git import/export adapter
│   └── minigit-server/    # HTTP/WebSocket server
├── ml/
│   └── identity_resolver/ # ML pipeline for entity identity resolution
│       ├── crawler/       # Git history mining
│       ├── parser/        # Code entity extraction (tree-sitter)
│       ├── features/      # Feature extraction for ML
│       ├── model/         # Neural network + hybrid classifier
│       ├── experiments/   # Adversarial testing
│       └── pipeline/      # Training data generation
├── plugins/
│   └── ts-semantics/      # TypeScript/JavaScript semantic plugin
├── architecture.md        # Full architecture specification
└── IDENTITY_RESOLVER.md   # ML system architecture

Core Concepts

Entity

A logical code object with stable identity across edits:

  • Files, modules, symbols (functions, classes, variables)
  • Tracked via fingerprinting for rename/move resilience
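
How fingerprints are computed is not described here. As a rough illustration of why a fingerprint can survive a rename, the Python sketch below hashes an entity's body after masking the entity's own name and collapsing whitespace. The function name `fingerprint`, the `<SELF>` placeholder, and the choice of SHA-256 are all assumptions made for illustration, not minigit's actual scheme.

```python
import hashlib
import re


def fingerprint(name: str, body: str) -> str:
    """Hash a body with the entity's own name masked out (hypothetical scheme)."""
    # Replace whole-word occurrences of the entity's name with a placeholder,
    # so that renaming the entity does not change the hash input.
    masked = re.sub(rf"\b{re.escape(name)}\b", "<SELF>", body)
    # Collapse all runs of whitespace so that reformatting is ignored too.
    normalized = " ".join(masked.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


before = fingerprint("validate", "function validate(s) { return s.length > 0 }")
after = fingerprint("check", "function check(s) {\n  return s.length > 0\n}")
print(before == after)
```

A hash like this only covers renames and whitespace changes. Edits to the body itself would change it, which is presumably where the similarity features and the identity resolver described later come in.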

Operation (Op)

An intentful mutation applied to entities:

  • SemOp: Semantic operations (rename, add parameter, etc.)
  • TextOp: Anchored text edits (fallback)
  • MetaOp: Bundle lifecycle, review decisions

Bundle

A coherent unit of work (roughly "a commit"):

  • Contains operations + intent metadata
  • Accepted/rejected as a unit
  • Can be quarantined on conflict
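
The three concepts above fit together roughly as in the following Python sketch. It is a simplified model, not the types from minigit-core: the field names (`entity_id`, `payload`, `intent`), the `review` function, and the idea of passing in a precomputed set of conflicting entity IDs are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class OpKind(Enum):
    SEM = "SemOp"    # semantic operation, e.g. a rename
    TEXT = "TextOp"  # anchored text edit (fallback)
    META = "MetaOp"  # bundle lifecycle, review decisions


class BundleStatus(Enum):
    OPEN = "open"
    ACCEPTED = "accepted"
    QUARANTINED = "quarantined"


@dataclass(frozen=True)
class Op:
    kind: OpKind
    entity_id: str  # hypothetical stable entity identifier
    payload: str    # hypothetical description of the mutation


@dataclass
class Bundle:
    intent: str
    ops: list[Op] = field(default_factory=list)
    status: BundleStatus = BundleStatus.OPEN


def review(bundle: Bundle, trunk: dict[str, str], conflicts: set[str]) -> BundleStatus:
    """Apply every op in the bundle to trunk, or none of them."""
    # If any op touches a conflicting entity, quarantine the whole bundle
    # and leave trunk untouched.
    if any(op.entity_id in conflicts for op in bundle.ops):
        bundle.status = BundleStatus.QUARANTINED
        return bundle.status
    for op in bundle.ops:
        trunk[op.entity_id] = op.payload
    bundle.status = BundleStatus.ACCEPTED
    return bundle.status


trunk: dict[str, str] = {}
bundle = Bundle(
    intent="rename validate to check",
    ops=[
        Op(OpKind.SEM, "fn:validate", "rename to check"),
        Op(OpKind.TEXT, "fn:parse", "update call site"),
    ],
)
status = review(bundle, trunk, conflicts={"fn:parse"})
print(status, trunk)
```

Because one op in the bundle touches a conflicting entity, neither op reaches `trunk`: the bundle is handled as a unit.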

Getting Started

Prerequisites

  • Rust 1.75+ (for core; 1.82+ to enable Git integration and WebSocket support)
  • Node.js 18+ (for TS plugin)
  • Python 3.10+ (for ML pipeline)

Building

# Build Rust crates
cargo build

# Run tests
cargo test

# Build TypeScript plugin
cd plugins/ts-semantics
npm install
npm run build

Note: Some features (Git integration, WebSocket support) are currently disabled because their dependencies require Rust 1.82+. To enable them, upgrade your Rust toolchain by running rustup update.

Running the Server

cargo run --bin minigit-server

The server starts on http://127.0.0.1:7432 by default.

API Overview

POST /bundles            - Create bundle
POST /bundles/:id/submit - Submit for review
POST /git/export         - Export to Git
POST /ops                - Append operations
GET  /ops/since          - Get ops since clock
GET  /state              - Get current state

Design Principles

  1. Determinism first: Same inputs → same outputs, no nondeterministic merges
  2. Never corrupt trunk: Ambiguity ⇒ quarantine, no silent failures
  3. Semantics over text: Prefer entity-level operations
  4. LLMs assist, never decide: AI proposes, deterministic engine verifies
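
Principle 1 depends on every replica replaying operations in the same order. The minigit-hlc crate is not shown in this README, so the following Python sketch uses the textbook hybrid logical clock update rules instead. The names `HlcTimestamp`, `tick`, `receive`, and `replay_order`, and the use of a node ID as the final tie-breaker, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class HlcTimestamp:
    # Field order defines the comparison order: physical, then logical, then node.
    physical_ms: int
    logical: int
    node_id: str


def tick(last: HlcTimestamp, now_ms: int) -> HlcTimestamp:
    """Timestamp a local event."""
    if now_ms > last.physical_ms:
        return HlcTimestamp(now_ms, 0, last.node_id)
    # Wall clock did not advance: bump the logical counter instead.
    return HlcTimestamp(last.physical_ms, last.logical + 1, last.node_id)


def receive(local: HlcTimestamp, remote: HlcTimestamp, now_ms: int) -> HlcTimestamp:
    """Merge a remote timestamp into the local clock."""
    physical = max(local.physical_ms, remote.physical_ms, now_ms)
    if physical == local.physical_ms and physical == remote.physical_ms:
        logical = max(local.logical, remote.logical) + 1
    elif physical == local.physical_ms:
        logical = local.logical + 1
    elif physical == remote.physical_ms:
        logical = remote.logical + 1
    else:
        logical = 0
    return HlcTimestamp(physical, logical, local.node_id)


def replay_order(ops: list[tuple[HlcTimestamp, str]]) -> list[tuple[HlcTimestamp, str]]:
    """Sort ops by timestamp so every replica replays them identically."""
    return sorted(ops, key=lambda pair: pair[0])


first = HlcTimestamp(100, 0, "agent-a")
second = tick(first, now_ms=100)  # same millisecond, so logical becomes 1
third = receive(HlcTimestamp(100, 0, "agent-b"), second, now_ms=90)
log = [(first, "rename"), (second, "add-param"), (third, "edit-body")]

# Arrival order does not matter: both replicas end up with the same sequence.
print(replay_order(log) == replay_order(list(reversed(log))))
```

Because the timestamps are totally ordered, the replay order is a pure function of the set of operations, which is what makes the merge result reproducible.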

Identity Resolution (ML Pipeline)

The identity resolver is the core innovation that enables tracking code entities across renames, moves, and refactors. It answers: "Is entity A the same as entity B?"

Results

Metric                    Score
Test Set Accuracy         99.99% (11,679/11,680)
Precision                 100% (0 false positives)
Adversarial MILD          100% ✅
Adversarial MODERATE      100% ✅
Adversarial HARD          100% ✅
Adversarial ADVERSARIAL   100% ✅
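
The headline accuracy follows directly from the counts in the table. The short Python check below reproduces it. It also works out what the single error would be if the precision row refers to the same test set, which is an assumption since the table does not say so.

```python
correct, total = 11_679, 11_680  # counts from the results table
accuracy = correct / total
misclassified = total - correct

# Assumption: the precision row (0 false positives) describes this same test
# set. If so, the one misclassified pair must be a false negative.
false_positives = 0
false_negatives = misclassified - false_positives

print(f"accuracy = {accuracy:.4%}")
print(f"misclassified = {misclassified}, false negatives = {false_negatives}")
```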

Quick Start

cd ml

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r identity_resolver/requirements.txt

Parse & Analyze Code

# Parse a file and show entities
python -m identity_resolver parse path/to/file.ts

# Compare entities between two files
python -m identity_resolver compare file_v1.ts file_v2.ts

Collect Training Data

# Crawl GitHub repos and generate training data
python -m identity_resolver crawl \
    --max-repos 100 \
    --output-dir ./data/pairs

# View statistics
python -m identity_resolver stats ./data/pairs

Train the Model

# Train V4 model (with type signatures + domain awareness)
python -m identity_resolver.model.train_v4 ./data/pairs

# Model saved to: ./checkpoints_v4/best_model_v4.pt

Training takes ~2 minutes on Apple Silicon (M1/M2/M3).

Run Adversarial Tests

# Test model on adversarial cases
python -m identity_resolver.experiments.adversarial_v4

# Test hybrid classifier (model + rules)
python -m identity_resolver.model.hybrid_classifier

Features Extracted (34 total)

Category            Features
Token Similarity    Jaccard, bigram Jaccard, length ratio
Name Features       Exact match, case-insensitive, parts overlap
Type Signature      Param types Jaccard, return type match, param count
Domain Tokens       User, Account, Order, Product detection
Composite Signals   Same-name-diff-signature, same-name-low-overlap
Structural          Block count, call count, return statements
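
To make the token-similarity row concrete, here is a minimal Python sketch of the three features it names. The tokenizer regex and the function names are assumptions. The actual pipeline extracts entities with tree-sitter and may tokenize differently.

```python
import re


def tokenize(code: str) -> list[str]:
    """Split code into identifiers, numbers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def jaccard(items_a: list, items_b: list) -> float:
    """Size of the intersection divided by size of the union, over unique items."""
    set_a, set_b = set(items_a), set(items_b)
    if not set_a and not set_b:
        return 1.0  # two empty inputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    """Adjacent token pairs, which capture some local ordering."""
    return list(zip(tokens, tokens[1:]))


def length_ratio(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Shorter length over longer length, in [0, 1]."""
    longer = max(len(tokens_a), len(tokens_b))
    return min(len(tokens_a), len(tokens_b)) / longer if longer else 1.0


tokens_a = tokenize("return a + b")
tokens_b = tokenize("return a - b")

print(jaccard(tokens_a, tokens_b))                    # 3 shared of 5 unique tokens
print(jaccard(bigrams(tokens_a), bigrams(tokens_b)))  # 1 shared of 5 unique bigrams
print(length_ratio(tokens_a, tokens_b))
```

The bigram score drops much faster than the plain token score when a single operator changes, which is why the two are useful side by side.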

Hybrid Classifier

The production system combines a neural network with rule-based overrides:

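from identity_resolver.model.hybrid_classifier import HybridClassifier

# `model` is the trained V4 network and `device` is the device it runs on;
# both are assumed to be set up before this point.
classifier = HybridClassifier(model, device)

# Same name but different parameter types: a rule overrides the model here.
result = classifier.classify(
    body_a="function validate(s: string) { ... }",
    body_b="function validate(user: User) { ... }",
    name_a="validate",
    name_b="validate",
)
# result.prediction = "different"
# result.source = "rule:same_name_diff_signature"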

Rules handle clear-cut cases:

  • high_body_similarity (>90%) → SAME (refactored entity)
  • same_name_diff_signature → DIFFERENT (different overloads)
  • similar_name_diff_suffix → DIFFERENT (getUserData vs getUserProfile)
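
The override logic can be pictured as a short chain of checks that falls through to the model. The Python sketch below is one possible arrangement, not the code in hybrid_classifier. The order in which rules are tried, the 0.5 model threshold, the camelCase splitting, and the pre-extracted `signature_*` and `body_similarity` inputs are all assumptions. Only the rule names, the 90% threshold, and the `rule:` source prefix come from this README.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    prediction: str  # "same" or "different"
    source: str      # "rule:<name>" or "model"


def name_parts(name: str) -> list[str]:
    """Split camelCase or snake_case into lowercase parts."""
    return [p.lower() for p in re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)]


def differs_only_in_suffix(name_a: str, name_b: str) -> bool:
    """True for pairs like getUserData vs getUserProfile."""
    parts_a, parts_b = name_parts(name_a), name_parts(name_b)
    return (
        len(parts_a) >= 2
        and len(parts_a) == len(parts_b)
        and parts_a[:-1] == parts_b[:-1]
        and parts_a[-1] != parts_b[-1]
    )


def classify(name_a: str, name_b: str, signature_a: str, signature_b: str,
             body_similarity: float, model_probability: float) -> Result:
    # Rule order is an assumption; the README does not state precedence.
    if name_a == name_b and signature_a != signature_b:
        return Result("different", "rule:same_name_diff_signature")
    if differs_only_in_suffix(name_a, name_b):
        return Result("different", "rule:similar_name_diff_suffix")
    if body_similarity > 0.90:
        return Result("same", "rule:high_body_similarity")
    # No rule fired: defer to the network's probability that the pair matches.
    prediction = "same" if model_probability >= 0.5 else "different"
    return Result(prediction, "model")


overload = classify("validate", "validate", "(s: string)", "(user: User)", 0.40, 0.90)
print(overload)
```

Recording the `source` alongside the prediction makes it possible to audit which decisions came from a rule and which came from the network.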

Development Status

Completed ✅

  • Hybrid Logical Clock (HLC) for ordering
  • OpsLog (append-only WAL)
  • Entity and EntityGraph types
  • Semantic operations (SemOp, TextOp, MetaOp)
  • Bundle and quarantine mechanism
  • Basic merge engine
  • HTTP API scaffold
  • TypeScript semantic plugin scaffold
  • ML data pipeline (Git crawler, parser, feature extraction)
  • Identity resolution model (V4 neural network)
  • Hard negative mining (same name different entity)
  • Adversarial testing framework
  • Hybrid classifier (model + rules)

Phase 1 (MVP) - In Progress

  • Full text operation anchoring
  • Git import/export integration
  • WebSocket real-time sync
  • Basic verification layer
  • Entity feature extraction
  • Training data collection (150K+ pairs)
  • Identity resolution model (99.99% accuracy)

Phase 2 - Planned

  • TypeScript identity resolver integration
  • Semantic operation application
  • Typecheck gating
  • Live agent testing

Phase 3 - Future

  • LLM-assisted conflict resolution
  • Proof-based auto-apply
  • Multi-language support (Python, Go, Rust)
  • Production feedback loop

Model Architecture

┌─────────────────────────────────────────────────────────┐
│                   Hybrid Classifier                      │
├─────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌─────────────────────────────┐│
│  │  Rule Engine    │    │    V4 Neural Network        ││
│  │                 │    │                             ││
│  │  • body_sim>90% │    │  34 features → 256 hidden   ││
│  │  • same_name+   │    │  → 77K params               ││
│  │    diff_sig     │    │  → sigmoid output           ││
│  │  • diff_suffix  │    │                             ││
│  └────────┬────────┘    └─────────────┬───────────────┘│
│           │                           │                 │
│           └─────────┬─────────────────┘                 │
│                     ▼                                   │
│              Final Prediction                           │
│         (rule override or model)                        │
└─────────────────────────────────────────────────────────┘

License

Proprietary. All rights reserved.
