ChelatedAI

ChelatedAI is a Python research repository for adaptive retrieval, post-hoc embedding correction, multi-dataset evaluation, and computational-storage experiments.

The codebase now spans two connected themes:

improving vector retrieval quality through chelation, sedimentation, distillation, topology analysis, and online correction
exploring whether parts of model execution can be pushed toward storage-resident node graphs, deterministic transport paths, and multi-drive speculative execution

Note: The computational-storage track includes drive-resident graph execution experiments and RP2040 transport tooling. It does not yet prove full on-device LLM inference on physical hard drives or SSDs. The current merged hardware claim is scope-locked to a deterministic transport proof. See docs/computational-storage-transport-scope-decision.md.

Why This Repo Exists

Most embedding systems assume the base embedding model is fixed and that retrieval quality is mainly a search-index problem. ChelatedAI treats retrieval failures as a dynamic systems problem:

detect when a query enters a noisy neighborhood
rerank or adapt before collapse propagates
track structural drift over time
benchmark whether improvements generalize across datasets
test whether some inference primitives can move closer to storage media

Repository Tracks

| Track | What it covers | Main entrypoints |
| --- | --- | --- |
| Adaptive retrieval | Chelation, sedimentation, adapter-based correction, vector-store integration | `antigravity_engine.py`, `chelation_adapter.py`, `vector_store.py`, `config.py` |
| Distillation and correction | Teacher guidance, cross-lingual routing, online updates, schedule tuning | `teacher_distillation.py`, `cross_lingual_distillation.py`, `teacher_weight_scheduler.py`, `online_updater.py` |
| Evaluation and reporting | BEIR runs, comparative benchmarks, sweeps, and dashboards | `benchmark_beir.py`, `benchmark_comparative.py`, `benchmark_multitask.py`, `run_sweep.py`, `run_large_sweep.py`, `dashboard_server.py` |
| Structural analysis | Topology cohesion, isomer drift, embedding quality, stability diagnostics | `topology_analyzer.py`, `isomer_detector.py`, `embedding_quality.py`, `stability_tracker.py` |
| Computational storage and drive nodes | Block-graph execution, mock NVMe path, multi-drive array simulation, RP2040 firmware, emulator, host reader, evidence capture | `computational_storage_poc/`, `test_computational_storage_poc.py`, `test_computational_storage_payload.py`, `test_computational_storage_emulation.py` |
| Process and remediation | Agentic review workflow, tracker docs, session logs, verification evidence | `aep_orchestrator.py`, `docs/ARCH AGENTIC ENGINEERING AND PLANNING/` |

Quick Start

1. Install Python dependencies

Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
pip install -e .

macOS / Linux:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .

requirements.txt installs the full research stack, including requests, mteb, and scikit-learn. pyproject.toml exposes the installable package metadata and optional dependency groups.

2. Optional local embedding backend

If you want to use the Ollama-backed embedding path:

docker run -d --name ollama -p 11434:11434 ollama/ollama
docker exec ollama ollama pull nomic-embed-text

Use model names like ollama:nomic-embed-text to route through the HTTP embedding backend.
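
As a rough illustration of how a prefixed model name might be routed, the sketch below splits the `ollama:` prefix and builds a request for Ollama's `/api/embeddings` endpoint. The helper names (`split_model_name`, `build_ollama_request`, `embed`) are hypothetical and are not taken from `embedding_backend.py`. The URL assumes the Docker port mapping shown above.

```python
import json
import urllib.request

# Assumption: Ollama is reachable on the port published by the Docker command above.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def split_model_name(name):
    """Return (backend, model). Names without an 'ollama:' prefix are treated as local."""
    prefix, sep, rest = name.partition(":")
    if sep and prefix == "ollama":
        return "ollama", rest
    return "local", name


def build_ollama_request(model, text, url=OLLAMA_URL):
    """Build (but do not send) a POST request asking for one embedding."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def embed(name, text):
    """Hypothetical helper: fetch an embedding over HTTP for an Ollama-prefixed name."""
    backend, model = split_model_name(name)
    if backend != "ollama":
        raise ValueError("this sketch only covers the HTTP path")
    with urllib.request.urlopen(build_ollama_request(model, text)) as resp:
        return json.load(resp)["embedding"]
```

Calling `embed("ollama:nomic-embed-text", "hello")` would need the container running and the model pulled. The two pure helpers can be exercised without a server.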

3. Run the main validation surfaces

python -m unittest discover -s . -p "test_*.py" -v
python -m unittest test_computational_storage_poc.py -v
python -m unittest test_computational_storage_emulation.py -v
python computational_storage_poc/run_all_tests.py
python computational_storage_poc/emulation/validate_emulation_path.py

4. Run representative research entrypoints

python benchmark_beir.py --tier small --output benchmark_beir_small.json
python benchmark_multitask.py --tasks small --epochs 5 --max-queries 100
python dashboard_server.py --port 8080

Information Flows

Retrieval and adaptation loop

flowchart TD
    A[Documents] --> B[Embedding backend]
    B --> C[Vector store ingestion]
    Q[Query] --> E[AntigravityEngine]
    E --> F[Neighborhood retrieval]
    F --> G{Variance / structure check}
    G -->|Stable| H[Standard ranking]
    G -->|Noisy| I[Chelation / reranking]
    I --> J[Noise-center logging]
    J --> K[Sedimentation or online update]
    K --> L[Adapter weights / corrected behavior]
    H --> M[Result set]
    I --> M
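
The variance gate in the flow above can be sketched in a few lines. This is not the logic in `antigravity_engine.py`. It assumes the check is the mean per-dimension variance across the retrieved neighbors, and it stands in for chelation with a simple mask that re-scores on the lower-variance half of the dimensions. The function names and the `threshold` default are illustrative only.

```python
from statistics import pvariance


def dimension_variances(neighbors):
    """Variance of each embedding dimension across the neighbor vectors."""
    return [pvariance(column) for column in zip(*neighbors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, neighbors, threshold=0.05):
    """Return (route, ordering of neighbor indices, best first)."""
    variances = dimension_variances(neighbors)
    if sum(variances) / len(variances) <= threshold:
        route, keep = "stable", list(range(len(query)))
    else:
        # Assumed stand-in for chelation: keep only the lower-variance dimensions.
        route = "noisy"
        cutoff = sorted(variances)[(len(variances) - 1) // 2]
        keep = [i for i, v in enumerate(variances) if v <= cutoff]

    def project(vector):
        return [vector[i] for i in keep]

    scores = [cosine(project(query), project(n)) for n in neighbors]
    order = sorted(range(len(neighbors)), key=lambda i: scores[i], reverse=True)
    return route, order
```

A tight neighborhood such as `[[1.0, 0.0], [0.99, 0.01], [1.0, 0.02]]` takes the stable branch, while a scattered one such as `[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]` takes the noisy branch.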

Computational-storage research flow

flowchart LR
    A[Train or define graph] --> B[Compile matrix blocks]
    B --> C[Flash or file-backed payload]
    C --> D[Software block-graph validation]
    C --> E[Mock NVMe latency model]
    C --> F[RP2040 firmware or emulator]
    F --> G[Sector 100 payload contract]
    G --> H[Host reader / evidence capture]
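
To make the idea of a deterministic sector payload concrete, here is a minimal sketch of a builder and a host-side check. Only the sector number (100) comes from the flow above. The 512-byte sector size, the `CSPL` marker, and the header layout are assumptions made for illustration and do not describe `payload_contract.py`.

```python
import struct
import zlib

SECTOR_SIZE = 512      # assumption: conventional 512-byte logical sector
TRIGGER_SECTOR = 100   # from the flow above
MAGIC = b"CSPL"        # hypothetical marker, not the repository's

# Hypothetical header: magic, sector index, CRC32 of the body (little-endian).
HEADER = struct.Struct("<4sII")


def build_payload(sector=TRIGGER_SECTOR):
    """Build the same bytes every time for a given sector index."""
    body_len = SECTOR_SIZE - HEADER.size
    body = bytes((sector + i) % 256 for i in range(body_len))
    return HEADER.pack(MAGIC, sector, zlib.crc32(body)) + body


def verify_payload(raw, sector=TRIGGER_SECTOR):
    """Host-side check: size, marker, sector index, and body checksum must all match."""
    if len(raw) != SECTOR_SIZE:
        return False
    magic, index, crc = HEADER.unpack_from(raw)
    body = raw[HEADER.size:]
    return magic == MAGIC and index == sector and crc == zlib.crc32(body)
```

Because the payload is a pure function of the sector index, firmware and emulator output can be compared byte for byte against the same expected value.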

Current Research Status

As of 2026-03-06:

the adaptive retrieval, benchmarking, and distillation surfaces are implemented on main
the remaining non-hardware work is primarily evaluation and weight refinement, not missing feature delivery
the computational-storage follow-through is narrowed to real RP2040 evidence capture and a dated retention review
the repository includes credible storage-node experiments, but not a shipped hard-drive-hosted LLM runtime

For the current audit and post-feature evaluation plan, see docs/roadmap-audit-and-weight-refinement-plan-2026-03-06.md.

Module Walkthrough

Core retrieval runtime

antigravity_engine.py: central engine for ingestion, inference, adaptive chelation, logging, and training hooks
embedding_backend.py: routes embeddings to Ollama or local SentenceTransformers
vector_store.py: Qdrant abstraction used by the retrieval engine
chelation_adapter.py: near-identity adapter variants for post-hoc correction
config.py: presets and validation for retrieval, distillation, online updates, topology, and BEIR
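
A "near-identity adapter" can be pictured as a residual map that starts out as the identity and only drifts as corrections accumulate. The sketch below is one possible reading, written in plain Python. The class name, the zero initialization, and the squared-error update are assumptions and do not describe the variants in `chelation_adapter.py`.

```python
class ResidualAdapter:
    """y = x + scale * (W @ x). W starts at zero, so the adapter begins as the identity."""

    def __init__(self, dim, scale=1.0):
        self.dim = dim
        self.scale = scale
        self.weight = [[0.0] * dim for _ in range(dim)]

    def __call__(self, x):
        delta = [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]
        return [xi + self.scale * d for xi, d in zip(x, delta)]

    def nudge(self, x, target, lr=0.1):
        """One gradient step on 0.5 * ||adapter(x) - target||^2 with respect to W."""
        error = [y - t for y, t in zip(self(x), target)]
        for i, e in enumerate(error):
            for j, xj in enumerate(x):
                self.weight[i][j] -= lr * self.scale * e * xj
```

Starting from the identity means an untrained adapter leaves base-model embeddings untouched, so any change in retrieval behavior can be attributed to the corrections applied afterwards.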

Training, correction, and analysis

teacher_distillation.py: offline, hybrid, and teacher-guided correction helpers
cross_lingual_distillation.py: language-aware teacher routing
online_updater.py: inference-time update mechanisms and diagnostics
topology_analyzer.py and isomer_detector.py: structural drift analysis
stability_tracker.py, embedding_quality.py, convergence_monitor.py: health and learning diagnostics

Evaluation and experimentation

benchmark_beir.py, benchmark_multitask.py, benchmark_comparative.py, benchmark_distillation.py: retrieval-quality evaluation
run_sweep.py and run_large_sweep.py: grid-search style parameter studies
dashboard_server.py and dashboard/index.html: local research dashboard
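
For readers unfamiliar with grid-search style studies, the following sketch enumerates every combination in a small parameter space and sorts the results. It is not the interface of `run_sweep.py`. The parameter names in the usage note and the `evaluate` callback are placeholders.

```python
from itertools import product


def grid(space):
    """Yield one dict per combination, iterating keys in sorted order."""
    keys = sorted(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def sweep(space, evaluate):
    """Score every configuration and return (score, config) pairs, best first."""
    results = [(evaluate(config), config) for config in grid(space)]
    return sorted(results, key=lambda pair: pair[0], reverse=True)
```

A call such as `sweep({"threshold": [0.01, 0.05], "learning_rate": [0.001, 0.01]}, evaluate)` would visit four configurations, where `evaluate` is whatever function returns the metric of interest for one configuration.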

Computational storage and drive nodes

computational_storage_poc/block_graph.py: flash-friendly block packing and traversal
computational_storage_poc/mock_nvme.py: software parity and latency model for computational-storage reads
computational_storage_poc/mock_array.py: speculative multipath racing across storage nodes
computational_storage_poc/payload_contract.py: deterministic trigger-sector payload used by firmware and emulator
computational_storage_poc/usb_host_inference.py: host-side raw-sector reader
computational_storage_poc/capture_hardware_evidence.py: auditable RP2040 evidence capture tool
computational_storage_poc/firmware/: RP2040/TinyUSB transport firmware
computational_storage_poc/emulation/: dependency-light emulator validation path
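
Speculative multipath racing can be illustrated with threads: the same read is issued to several simulated nodes and the first answer wins. The sketch assumes nodes differ only in latency and return identical data. It does not reflect the interface of `mock_array.py`, and `read_block` and `race` are hypothetical names.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def read_block(node_id, latency_s, block):
    """Stand-in for a storage-node read: sleep for the simulated latency, then return data."""
    time.sleep(latency_s)
    return node_id, bytes([block % 256]) * 4


def race(latencies, block):
    """Issue the same read to every node and return the first result to arrive."""
    with ThreadPoolExecutor(max_workers=len(latencies)) as pool:
        futures = [
            pool.submit(read_block, node, latency, block)
            for node, latency in latencies.items()
        ]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        # Leaving the with-block still waits for the slower reads to finish;
        # a real system would cancel or abandon them instead.
        return next(iter(done)).result()
```

With `race({"drive_a": 0.3, "drive_b": 0.01}, 7)` the faster simulated node supplies the answer. The trade-off is redundant work on the slower paths in exchange for lower tail latency.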

CI and Validation

GitHub Actions currently verifies:

Python linting with ruff
full unittest discovery across Python 3.9, 3.10, 3.11, and 3.12
computational-storage fundamentals and the script harness
computational-storage emulation validation
RP2040 firmware build and artifact upload

See .github/workflows/test.yml and .github/workflows/build_firmware.yml.

Documentation Guide

Start here:

docs/README.md: canonical docs home and legacy-to-canonical map
docs/SYSTEM_BLUEPRINT.md: architecture, stack, and information flows
docs/MODULE_GUIDE.md: module-by-module inventory
docs/RESEARCH_TRACKS.md: active and historical research tracks
docs/COMPUTATIONAL_STORAGE_DRIVE_NODES.md: hard-drive / storage-node research summary
docs/INDEX.md: broader index, including the AEP process archive

Use Cases

Retrieval researcher

compare standard vs. chelated ranking behavior
run cross-dataset BEIR evaluations
refine adapter schedules and teacher weights

Systems researcher

test whether block-graph traversal can remain correct when moved toward storage media
compare host-driven vs. storage-driven latency models
validate deterministic firmware or emulator transport surfaces

Documentation or review session

use the canonical docs set first
fall back to the AEP archive for process evidence, session logs, and prior decisions

License

This repository is distributed under the MIT license. See LICENSE.

