This repository was archived by the owner on Mar 23, 2026. It is now read-only.


TKML Research Suite — AI/ML

Enterprise-grade Claude plugin for AI/ML research, MLOps automation, and autonomous research workflows. Built by TechKnowmad AI for serious AI/ML practitioners.



What It Does

Three intelligent layers added to your Claude sessions:

1. Domain-Aware Prompt Architecture (automatic) — Fires on every message. Classifies your request (ML research, MLOps, agents, DevOps) and scores its clarity from 1 to 10. Clear prompts pass through silently; vague ML/research/agent requests get 1–3 targeted, domain-specific clarifying questions, eliminating wasted iterations on complex tasks.

2. MLOps Quality Gate (automatic) — Fires before writing any Python file. Audits ML training scripts for the four most-missed production patterns: reproducibility seeds, experiment tracking, clean paths, and device detection. Offers to auto-fix before writing.

3. Research & Development Commands (on demand) — Four slash commands: deep research synthesis, full experiment scaffolding, MLOps infrastructure generation, and ML code review.


Quickstart

Install

  1. Download tkml-research-suite.plugin
  2. Open Claude Desktop → Plugins → Install → select the file
  3. Restart Claude — hooks activate automatically on next message

No configuration required. No external API keys needed.

First Use in 60 Seconds

The hook saves you an iteration:

train a model on my NER dataset

Quick context check:

  1. Dataset size and label schema? (e.g., 8k sentences, CoNLL-style BIO tags)
  2. HuggingFace Trainer, PyTorch custom loop, or spaCy?
  3. Target metric and compute budget?

Scaffold a complete experiment:

/experiment fine-tune DistilBERT on 15k customer intent dataset, 8 classes, W&B tracking, RTX 3090

Generates a full directory structure, training loop, Hydra configs, Dockerfile, CI/CD workflows, and Makefile.

Deep research with saved brief:

/research speculative decoding — production-ready approaches 2024

Searches arXiv, engineering blogs, and GitHub. Synthesizes findings. Saves research-speculative-decoding-YYYYMMDD.md.
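The date-stamped filename follows a simple convention. A minimal sketch of how such a name could be built (the slug helper here is illustrative, not the plugin's actual code):

```python
import re
from datetime import date

def brief_filename(topic: str) -> str:
    """Build a research-<topic-slug>-YYYYMMDD.md filename."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"research-{slug}-{date.today():%Y%m%d}.md"

# brief_filename("speculative decoding") -> "research-speculative-decoding-<today>.md"
```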

MLOps quality gate in action:

When you write a training script without seeds or tracking, you'll see:

MLOps gaps:
  • SEED → add torch.manual_seed(42); np.random.seed(42) before model init
  • TRACKING → add wandb.init(project="...", config=cfg)

→ Proceed as-is, or I'll add these automatically?
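The fixes the guard proposes amount to a short preamble at the top of the script. A minimal sketch of the seed portion (the optional numpy/torch calls are only applied when those libraries are installed; this is illustrative, not the plugin's generated code):

```python
import os
import random

def set_seed(seed: int = 42) -> None:
    """Seed every RNG source in use; the guard's SEED check looks for calls like these."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:  # optional: only if numpy is available
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:  # optional: only if torch is available
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass

set_seed(42)  # call once, before model init
```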


Commands Reference

| Command | Description | Example |
| --- | --- | --- |
| /research <topic> | Multi-source AI/ML research → synthesized brief | /research mixture of experts routing strategies |
| /experiment <desc> | Complete ML experiment scaffold | /experiment train GPT-2 small on code, JAX/Flax, MLflow |
| /mlops <type> | MLOps infrastructure generation | /mlops docker ci-cd serving or /mlops all |
| /review-ml <path> | ML code review: 6 dimensions, line-level findings | /review-ml src/training/trainer.py |

/mlops component types

| Type | What Gets Generated |
| --- | --- |
| docker | Multi-stage Dockerfile + docker-compose + .dockerignore optimized for ML |
| ci-cd | 4 GitHub Actions workflows (test, train, evaluate, deploy) |
| tracking | Unified W&B + MLflow tracker wrapper + run naming conventions |
| serving | FastAPI inference server with async batching + Prometheus metrics |
| monitoring | Drift detection + Grafana dashboards + Prometheus alert rules |
| k8s | K8s deployment, HPA, service, configmap, secret template manifests |
| all | Everything above |

/review-ml scoring dimensions

| Dimension | What Gets Checked |
| --- | --- |
| ML Correctness | Data leakage, loss function, DataLoader shuffle, eval mode, gradient handling |
| Reproducibility | Seeds (torch/numpy/random/CUDA), config externalization, env pinning |
| Experiment Tracking | W&B/MLflow init, hyperparameter logging, metric granularity, artifact versioning |
| Performance | num_workers, mixed precision, CPU syncs in training loop, memory leaks |
| Code Quality | Hardcoded paths, type hints, structured logging, error handling |
| Production Readiness | Dynamic device detection, OOM handling, checkpoint resume logic |
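As an example of the Code Quality dimension, the usual fix for hardcoded paths is externalized configuration. A minimal stdlib sketch (the flag names are illustrative, not the review's prescribed interface):

```python
import argparse
from pathlib import Path

def parse_args(argv=None):
    """Externalize paths instead of hardcoding e.g. '/home/user/data' in the script."""
    p = argparse.ArgumentParser()
    p.add_argument("--data-dir", type=Path, default=Path("data"))
    p.add_argument("--output-dir", type=Path, default=Path("outputs"))
    return p.parse_args(argv)

args = parse_args(["--data-dir", "datasets/ner"])  # paths now come from the CLI
```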

Hooks: Detailed Behavior

Prompt Architect

Every user message
    ├── Bypass (starts * / # | follow-up | conversational) → silent pass-through
    ├── Domain: GENERAL                                    → silent pass-through
    ├── Domain: ML/MLOps/Agents/DevOps + clarity ≥ 7     → silent pass-through
    └── Domain: ML/MLOps/Agents/DevOps + clarity < 7
            └── ask_user: 1–3 targeted, consequential questions
                    ├── User answers → executes with full context
                    └── User skips  → executes with available context
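The decision tree above reduces to a small routing function. A sketch that is illustrative of the logic only, not the plugin's implementation (the clarity score and domain label are assumed to come from the hook's classifier):

```python
def route(prompt: str, domain: str, clarity: int) -> str:
    """Illustrative routing for the Prompt Architect decision tree."""
    if prompt.startswith(("*", "/", "#")):  # bypass prefixes
        return "pass-through"
    if domain == "GENERAL":
        return "pass-through"
    if clarity >= 7:                        # default clarity threshold (7/10)
        return "pass-through"
    return "ask_user"                       # 1-3 targeted clarifying questions
```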

Bypass prefixes (skip Prompt Architect entirely):

| Prefix | When to use |
| --- | --- |
| * | Force-proceed on any prompt: * just do it exactly as described |
| / | All slash commands bypass automatically |
| # | Memory updates, instruction overrides |

Token overhead: ~250 tokens when fired; zero for bypassed messages.

MLOps Guard

Before every Write or Edit tool call
    ├── Not a .py file                        → approve silently
    ├── Test/config/utility filename pattern  → approve silently
    ├── No ML imports in content              → approve silently
    └── ML training script detected
            ├── 2–4 checks pass → approve silently
            └── 0–1 checks pass
                    └── ask_user: specific gaps + one-line fixes
                            ├── "proceed"   → write file as-is
                            └── "add them"  → Claude inserts patterns, then writes

Four checks: seed setup · experiment tracking · no hardcoded paths · dynamic device detection
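A hedged sketch of how a guard like this might detect the four patterns in a script's source text (the regexes and thresholds are illustrative; the plugin's actual heuristics may differ):

```python
import re

def audit(source: str) -> list[str]:
    """Return which of the four MLOps checks a training script passes."""
    passed = []
    if re.search(r"(torch\.manual_seed|np\.random\.seed|random\.seed)\(", source):
        passed.append("seed")
    if re.search(r"(wandb\.init|mlflow\.start_run)\(", source):
        passed.append("tracking")
    if not re.search(r"[\"']/(home|Users|mnt)/", source):  # no absolute user paths
        passed.append("no-hardcoded-paths")
    if "cuda.is_available" in source:
        passed.append("device-detection")
    return passed

# The guard approves silently when 2-4 checks pass and asks when 0-1 pass.
```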

Token overhead: ~175 tokens per Python write when fired; zero for non-ML files.


Skills (on-demand, loaded when triggered)

| Skill | Trigger phrases |
| --- | --- |
| prompt-architect | "improve this prompt", "prompt engineering", "why did this prompt fail", "craft a prompt for [task]" |
| research-workflow | "design an experiment", "ablation study", "literature review", "evaluation framework", "research roadmap" |
| mlops-standards | "MLOps best practices", "reproducibility standards", "deployment checklist", "production ML", "MLOps maturity" |

Each skill uses progressive disclosure: core knowledge in SKILL.md (~1,500 tokens), detailed references loaded on demand.


Research Synthesizer Agent

Autonomous multi-source research. No hand-holding required.

Triggers on: "do a deep dive on...", "research all approaches to...", "find best current methods for..."

Protocol: formulates search strategy → systematic multi-source search → critical evaluation by actionability → synthesized brief → saves to file

Stop condition: ≥5 high-quality primary sources, or 20 tool calls (whichever first)
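The stop condition above reduces to a one-liner (a sketch; the names are illustrative):

```python
def should_stop(num_quality_sources: int, num_tool_calls: int) -> bool:
    """Stop at >=5 high-quality primary sources or 20 tool calls, whichever comes first."""
    return num_quality_sources >= 5 or num_tool_calls >= 20
```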

Output structure:

Executive Summary
Landscape Map (organized by technique, not chronologically)
Top Methods (with trade-offs: compute, ease, production-readiness)
Comparison Table
Open Problems
Recommended Starting Point (fastest to run + highest potential)
Key References (arXiv IDs, GitHub repos, star counts)

Customization

See configs/README.md for tuning:

  • Hook clarity threshold (default: 7/10)
  • CI/CD bypass via CLAUDE_SKIP_HOOKS=1 environment variable
  • MLOps guard strictness (warn vs. block mode)
  • Domain keyword extension
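For example, disabling both hooks for a CI job uses the environment variable named above (the surrounding job setup is illustrative):

```shell
# Disable Prompt Architect and MLOps Guard for the whole CI job
export CLAUDE_SKIP_HOOKS=1
```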

Examples

See the examples/ directory for concrete usage examples.

Project Structure

tkml-research-suite/
├── .claude-plugin/plugin.json    # Plugin manifest
├── hooks/hooks.json              # UserPromptSubmit + PreToolUse hooks
├── commands/
│   ├── research.md               # /research command
│   ├── experiment.md             # /experiment command
│   ├── mlops.md                  # /mlops command
│   └── review-ml.md              # /review-ml command
├── skills/
│   ├── prompt-architect/         # Prompt evaluation + improvement framework
│   ├── research-workflow/        # Experiment design + literature review
│   └── mlops-standards/          # MLOps maturity + reproducibility standards
├── agents/
│   └── research-synthesizer.md  # Autonomous research agent
├── configs/README.md             # Customization guide
├── examples/                     # Concrete usage examples
├── .github/                      # Issue templates, PR template
├── CHANGELOG.md
└── CONTRIBUTING.md

Contributing

See CONTRIBUTING.md — covers local development setup, testing hooks, and the PR process.


License

MIT — see LICENSE.


Built and maintained by TechKnowmad AI

About

[Legacy] ML research suite. Evolved into github.com/TECHKNOWMAD-LABS