Enterprise-grade Claude plugin for AI/ML research, MLOps automation, and autonomous research workflows. Built by TechKnowmad AI for serious AI/ML practitioners.
Three intelligent layers added to your Claude sessions:
1. Domain-Aware Prompt Architecture (automatic) — Fires on every message. Classifies your request (ML research, MLOps, agents, DevOps), scores clarity 1–10. Clear prompts pass through silently. Vague ML/research/agent requests get 1–3 targeted, domain-specific clarifying questions. Eliminates wasted iterations on complex tasks.
2. MLOps Quality Gate (automatic) — Fires before writing any Python file. Audits ML training scripts for the four most-missed production patterns: reproducibility seeds, experiment tracking, clean paths, and device detection. Offers to auto-fix before writing.
3. Research & Development Commands (on demand) — Four slash commands: deep research synthesis, full experiment scaffolding, MLOps infrastructure generation, and ML code review.
- Download `tkml-research-suite.plugin`
- Open Claude Desktop → Plugins → Install → select the file
- Restart Claude — hooks activate automatically on next message
No configuration required. No external API keys needed.
The hook saves you an iteration:
train a model on my NER dataset
Quick context check:
- Dataset size and label schema? (e.g., 8k sentences, CoNLL-style BIO tags)
- HuggingFace Trainer, PyTorch custom loop, or spaCy?
- Target metric and compute budget?
Scaffold a complete experiment:
`/experiment fine-tune DistilBERT on 15k customer intent dataset, 8 classes, W&B tracking, RTX 3090`
Generates full directory structure, training loop, Hydra configs, Dockerfile, CI/CD workflows, and Makefile.
Deep research with saved brief:
`/research speculative decoding — production-ready approaches 2024`
Searches arXiv, engineering blogs, and GitHub. Synthesizes findings. Saves `research-speculative-decoding-YYYYMMDD.md`.
MLOps quality gate in action:
When you write a training script without seeds or tracking, you'll see:
MLOps gaps:
- SEED → add `torch.manual_seed(42); np.random.seed(42)` before model init
- TRACKING → add `wandb.init(project="...", config=cfg)`

Proceed as-is, or I'll add these automatically?
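The seed pattern the gate asks for can be sketched as a single helper called once before model construction. A minimal sketch; the numpy/torch imports are guarded only so the snippet runs even where those stacks are not installed:

```python
import random


def set_seed(seed: int = 42) -> None:
    """Seed every RNG in one place, before model init."""
    random.seed(seed)
    # Same pattern for the ML stacks the gate checks for;
    # guarded so this sketch runs without numpy or torch.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```

Calling `set_seed` twice with the same value replays the same random draws, which is what makes a training run reproducible.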
| Command | Description | Example |
|---|---|---|
| `/research <topic>` | Multi-source AI/ML research → synthesized brief | `/research mixture of experts routing strategies` |
| `/experiment <desc>` | Complete ML experiment scaffold | `/experiment train GPT-2 small on code, JAX/Flax, MLflow` |
| `/mlops <type>` | MLOps infrastructure generation | `/mlops docker ci-cd serving` or `/mlops all` |
| `/review-ml <path>` | ML code review: 6 dimensions, line-level findings | `/review-ml src/training/trainer.py` |
| Type | What Gets Generated |
|---|---|
| `docker` | Multi-stage Dockerfile + docker-compose + .dockerignore optimized for ML |
| `ci-cd` | 4 GitHub Actions workflows (test, train, evaluate, deploy) |
| `tracking` | Unified W&B + MLflow tracker wrapper + run naming conventions |
| `serving` | FastAPI inference server with async batching + Prometheus metrics |
| `monitoring` | Drift detection + Grafana dashboards + Prometheus alert rules |
| `k8s` | K8s deployment, HPA, service, configmap, secret template manifests |
| `all` | Everything above |
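The async-batching idea behind the serving generator can be sketched with stdlib asyncio alone; the real output wraps this in a FastAPI app, omitted here so the snippet runs standalone. `model` is a placeholder for any function mapping a batch of inputs to a batch of outputs:

```python
import asyncio


async def batch_worker(queue: asyncio.Queue, model, max_batch: int = 8, max_wait: float = 0.01):
    """Collect requests for up to max_wait seconds, then run one batched model call."""
    while True:
        first = await queue.get()
        batch = [first]
        deadline = asyncio.get_running_loop().time() + max_wait
        while len(batch) < max_batch:
            remaining = deadline - asyncio.get_running_loop().time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        outputs = model([x for x, _ in batch])  # one call per batch, not per request
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)


async def infer(queue: asyncio.Queue, x):
    """What each HTTP handler would do: enqueue the input and await its batched result."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((x, fut))
    return await fut
```

A handler never calls the model directly; throughput comes from amortizing one forward pass over up to `max_batch` concurrent requests.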
| Dimension | What Gets Checked |
|---|---|
| ML Correctness | Data leakage, loss function, DataLoader shuffle, eval mode, gradient handling |
| Reproducibility | Seeds (torch/numpy/random/CUDA), config externalization, env pinning |
| Experiment Tracking | W&B/MLflow init, hyperparameter logging, metric granularity, artifact versioning |
| Performance | num_workers, mixed precision, CPU syncs in training loop, memory leaks |
| Code Quality | Hardcoded paths, type hints, structured logging, error handling |
| Production Readiness | Dynamic device detection, OOM handling, checkpoint resume logic |
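As an illustration of the first row, the most common ML Correctness finding is data leakage: preprocessing statistics computed over the full dataset instead of the training split. A framework-free sketch:

```python
def fit_minmax(values):
    """Fit a min-max scaler; must be called on the training split only."""
    lo, hi = min(values), max(values)
    span = hi - lo or 1.0  # avoid division by zero on constant features
    return lambda x: (x - lo) / span


train = [1.0, 2.0, 3.0]
test = [4.0, 5.0]

scale = fit_minmax(train)          # correct: statistics come from train only
# scale = fit_minmax(train + test) # leakage: the test max (5.0) shapes training inputs
```

The leaky variant silently inflates validation scores because information about the test distribution reaches the model before evaluation.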
Every user message
├── Bypass (starts * / # | follow-up | conversational) → silent pass-through
├── Domain: GENERAL → silent pass-through
├── Domain: ML/MLOps/Agents/DevOps + clarity ≥ 7 → silent pass-through
└── Domain: ML/MLOps/Agents/DevOps + clarity < 7
└── ask_user: 1–3 targeted, consequential questions
├── User answers → executes with full context
└── User skips → executes with available context
Bypass prefixes (skip Prompt Architect entirely):
| Prefix | When to use |
|---|---|
| `*` | Force-proceed on any prompt: `* just do it exactly as described` |
| `/` | All slash commands bypass automatically |
| `#` | Memory updates, instruction overrides |
Token overhead: ~250 tokens when fired; zero for bypassed messages.
Before every Write or Edit tool call
├── Not a .py file → approve silently
├── Test/config/utility filename pattern → approve silently
├── No ML imports in content → approve silently
└── ML training script detected
├── 2–4 checks pass → approve silently
└── 0–1 checks pass
└── ask_user: specific gaps + one-line fixes
├── "proceed" → write file as-is
└── "add them" → Claude inserts patterns, then writes
Four checks: seed setup · experiment tracking · no hardcoded paths · dynamic device detection
Token overhead: ~175 tokens per Python write when fired; zero for non-ML files.
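The dynamic device detection check looks for logic along these lines rather than a hardcoded `"cuda"` string. A sketch with the torch import guarded so it runs anywhere:

```python
def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, then CPU, instead of hardcoding a device."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older torch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The same script then runs unchanged on a GPU box, an Apple-silicon laptop, or a CPU-only CI runner.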
| Skill | Trigger phrases |
|---|---|
| `prompt-architect` | "improve this prompt", "prompt engineering", "why did this prompt fail", "craft a prompt for [task]" |
| `research-workflow` | "design an experiment", "ablation study", "literature review", "evaluation framework", "research roadmap" |
| `mlops-standards` | "MLOps best practices", "reproducibility standards", "deployment checklist", "production ML", "MLOps maturity" |
Each skill uses progressive disclosure: core knowledge in SKILL.md (~1,500 tokens), detailed references loaded on demand.
Autonomous multi-source research. No hand-holding required.
Triggers on: "do a deep dive on...", "research all approaches to...", "find best current methods for..."
Protocol: formulates search strategy → systematic multi-source search → critical evaluation by actionability → synthesized brief → saves to file
Stop condition: ≥5 high-quality primary sources, or 20 tool calls (whichever first)
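The stop condition amounts to a bounded search loop. A hypothetical sketch, where `search` and `is_high_quality` stand in for the agent's tool calls and source evaluation:

```python
def research_loop(search, is_high_quality, max_calls: int = 20, min_sources: int = 5):
    """Search until >= min_sources quality sources or max_calls tool calls, whichever first."""
    sources, calls = [], 0
    while calls < max_calls and len(sources) < min_sources:
        calls += 1
        sources.extend(r for r in search(calls) if is_high_quality(r))
    return sources, calls
```

Bounding on tool calls as well as source count keeps a thin literature from turning into an unbounded search.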
Output structure:
- Executive Summary
- Landscape Map (organized by technique, not chronologically)
- Top Methods (with trade-offs: compute, ease, production-readiness)
- Comparison Table
- Open Problems
- Recommended Starting Point (fastest to run + highest potential)
- Key References (arXiv IDs, GitHub repos, star counts)
See configs/README.md for tuning:
- Hook clarity threshold (default: 7/10)
- CI/CD bypass via `CLAUDE_SKIP_HOOKS=1` environment variable
- MLOps guard strictness (warn vs. block mode)
- Domain keyword extension
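For instance, the CI/CD bypass is just an environment variable, so an automated job can disable both hooks in one line (the echo is only there to show the value took effect):

```shell
# Disable both hooks for an automated job, per configs/README.md:
export CLAUDE_SKIP_HOOKS=1
echo "hooks bypassed: CLAUDE_SKIP_HOOKS=$CLAUDE_SKIP_HOOKS"
```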
- `examples/prompt-transformations.md` — 5 before/after prompts with domain classification and clarity scores
- `examples/mlops-guard-scenarios.md` — 6 real intercept/bypass scenarios with exact hook output
tkml-research-suite/
├── .claude-plugin/plugin.json # Plugin manifest
├── hooks/hooks.json # UserPromptSubmit + PreToolUse hooks
├── commands/
│ ├── research.md # /research command
│ ├── experiment.md # /experiment command
│ ├── mlops.md # /mlops command
│ └── review-ml.md # /review-ml command
├── skills/
│ ├── prompt-architect/ # Prompt evaluation + improvement framework
│ ├── research-workflow/ # Experiment design + literature review
│ └── mlops-standards/ # MLOps maturity + reproducibility standards
├── agents/
│ └── research-synthesizer.md # Autonomous research agent
├── configs/README.md # Customization guide
├── examples/ # Concrete usage examples
├── .github/ # Issue templates, PR template
├── CHANGELOG.md
└── CONTRIBUTING.md
See CONTRIBUTING.md — covers local development setup, testing hooks, and the PR process.
MIT — see LICENSE.
Built and maintained by TechKnowmad AI