Enterprise-grade Claude plugin for AI/ML research, MLOps automation, and autonomous research workflows. Built by TechKnowmad AI for serious AI/ML practitioners.
Three intelligent layers added to your Claude sessions:
1. Domain-Aware Prompt Architecture (automatic) — Fires on every message. Classifies your request (ML research, MLOps, agents, DevOps), scores clarity 1–10. Clear prompts pass through silently. Vague ML/research/agent requests get 1–3 targeted, domain-specific clarifying questions. Eliminates wasted iterations on complex tasks.
2. MLOps Quality Gate (automatic) — Fires before writing any Python file. Audits ML training scripts for the four most-missed production patterns: reproducibility seeds, experiment tracking, clean paths, and device detection. Offers to auto-fix before writing.
3. Research & Development Commands (on demand) — Four slash commands: deep research synthesis, full experiment scaffolding, MLOps infrastructure generation, and ML code review.
- Download `tkml-research-suite.plugin`
- Open Claude Desktop → Plugins → Install → select the file
- Restart Claude — hooks activate automatically on next message
No configuration required. No external API keys needed.
The hook saves you an iteration:
train a model on my NER dataset
Quick context check:
- Dataset size and label schema? (e.g., 8k sentences, CoNLL-style BIO tags)
- HuggingFace Trainer, PyTorch custom loop, or spaCy?
- Target metric and compute budget?
Scaffold a complete experiment:
`/experiment fine-tune DistilBERT on 15k customer intent dataset, 8 classes, W&B tracking, RTX 3090`
Generates full directory structure, training loop, Hydra configs, Dockerfile, CI/CD workflows, and Makefile.
Deep research with saved brief:
`/research speculative decoding — production-ready approaches 2024`
Searches arXiv, engineering blogs, and GitHub. Synthesizes findings. Saves `research-speculative-decoding-YYYYMMDD.md`.
MLOps quality gate in action:
When you write a training script without seeds or tracking, you'll see:
MLOps gaps:
- SEED → add `torch.manual_seed(42); np.random.seed(42)` before model init
- TRACKING → add `wandb.init(project="...", config=cfg)`

Proceed as-is, or I'll add these automatically?
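The seed pattern the gate asks for can be sketched as a single helper called once before model construction. A minimal sketch; the numpy/torch imports are guarded only so the snippet runs even where those stacks are not installed:

```python
import random


def set_seed(seed: int = 42) -> None:
    """Seed every RNG in one place, before model init."""
    random.seed(seed)
    # Same pattern for the ML stacks the gate checks for;
    # guarded so this sketch runs without numpy or torch.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```

Calling `set_seed` twice with the same value replays the same random draws, which is what makes a training run reproducible.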
| Command | Description | Example |
|---|---|---|
| `/research <topic>` | Multi-source AI/ML research → synthesized brief | `/research mixture of experts routing strategies` |
| `/experiment <desc>` | Complete ML experiment scaffold | `/experiment train GPT-2 small on code, JAX/Flax, MLflow` |
| `/mlops <type>` | MLOps infrastructure generation | `/mlops docker ci-cd serving` or `/mlops all` |
| `/review-ml <path>` | ML code review: 6 dimensions, line-level findings | `/review-ml src/training/trainer.py` |
| Type | What Gets Generated |
|---|---|
| `docker` | Multi-stage Dockerfile + docker-compose + .dockerignore optimized for ML |
| `ci-cd` | 4 GitHub Actions workflows (test, train, evaluate, deploy) |
| `tracking` | Unified W&B + MLflow tracker wrapper + run naming conventions |
| `serving` | FastAPI inference server with async batching + Prometheus metrics |
| `monitoring` | Drift detection + Grafana dashboards + Prometheus alert rules |
| `k8s` | K8s deployment, HPA, service, configmap, secret template manifests |
| `all` | Everything above |
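The async-batching idea behind the serving generator can be sketched with stdlib asyncio alone; the real output wraps this in a FastAPI app, omitted here so the snippet runs standalone. `model` is a placeholder for any function mapping a batch of inputs to a batch of outputs:

```python
import asyncio


async def batch_worker(queue: asyncio.Queue, model, max_batch: int = 8, max_wait: float = 0.01):
    """Collect requests for up to max_wait seconds, then run one batched model call."""
    while True:
        first = await queue.get()
        batch = [first]
        deadline = asyncio.get_running_loop().time() + max_wait
        while len(batch) < max_batch:
            remaining = deadline - asyncio.get_running_loop().time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        outputs = model([x for x, _ in batch])  # one call per batch, not per request
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)


async def infer(queue: asyncio.Queue, x):
    """What each HTTP handler would do: enqueue the input and await its batched result."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((x, fut))
    return await fut
```

A handler never calls the model directly; throughput comes from amortizing one forward pass over up to `max_batch` concurrent requests.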
| Dimension | What Gets Checked |
|---|---|
| ML Correctness | Data leakage, loss function, DataLoader shuffle, eval mode, gradient handling |
| Reproducibility | Seeds (torch/numpy/random/CUDA), config externalization, env pinning |
| Experiment Tracking | W&B/MLflow init, hyperparameter logging, metric granularity, artifact versioning |
| Performance | num_workers, mixed precision, CPU syncs in training loop, memory leaks |
| Code Quality | Hardcoded paths, type hints, structured logging, error handling |
| Production Readiness | Dynamic device detection, OOM handling, checkpoint resume logic |
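As an illustration of the first row, the most common ML Correctness finding is data leakage: preprocessing statistics computed over the full dataset instead of the training split. A framework-free sketch:

```python
def fit_minmax(values):
    """Fit a min-max scaler; must be called on the training split only."""
    lo, hi = min(values), max(values)
    span = hi - lo or 1.0  # avoid division by zero on constant features
    return lambda x: (x - lo) / span


train = [1.0, 2.0, 3.0]
test = [4.0, 5.0]

scale = fit_minmax(train)          # correct: statistics come from train only
# scale = fit_minmax(train + test) # leakage: the test max (5.0) shapes training inputs
```

The leaky variant silently inflates validation scores because information about the test distribution reaches the model before evaluation.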
Every user message
├── Bypass (starts * / # | follow-up | conversational) → silent pass-through
├── Domain: GENERAL → silent pass-through
├── Domain: ML/MLOps/Agents/DevOps + clarity ≥ 7 → silent pass-through
└── Domain: ML/MLOps/Agents/DevOps + clarity < 7
└── ask_user: 1–3 targeted, consequential questions
├── User answers → executes with full context
└── User skips → executes with available context
Bypass prefixes (skip Prompt Architect entirely):
| Prefix | When to use |
|---|---|
| `*` | Force-proceed on any prompt: `* just do it exactly as described` |
| `/` | All slash commands bypass automatically |
| `#` | Memory updates, instruction overrides |
Token overhead: ~250 tokens when fired; zero for bypassed messages.
Before every Write or Edit tool call
├── Not a .py file → approve silently
├── Test/config/utility filename pattern → approve silently
├── No ML imports in content → approve silently
└── ML training script detected
├── 2–4 checks pass → approve silently
└── 0–1 checks pass
└── ask_user: specific gaps + one-line fixes
├── "proceed" → write file as-is
└── "add them" → Claude inserts patterns, then writes
Four checks: seed setup · experiment tracking · no hardcoded paths · dynamic device detection
Token overhead: ~175 tokens per Python write when fired; zero for non-ML files.
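The dynamic device detection check looks for logic along these lines rather than a hardcoded `"cuda"` string. A sketch with the torch import guarded so it runs anywhere:

```python
def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, then CPU, instead of hardcoding a device."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older torch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The same script then runs unchanged on a GPU box, an Apple-silicon laptop, or a CPU-only CI runner.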
| Skill | Trigger phrases |
|---|---|
| `prompt-architect` | "improve this prompt", "prompt engineering", "why did this prompt fail", "craft a prompt for [task]" |
| `research-workflow` | "design an experiment", "ablation study", "literature review", "evaluation framework", "research roadmap" |
| `mlops-standards` | "MLOps best practices", "reproducibility standards", "deployment checklist", "production ML", "MLOps maturity" |
Each skill uses progressive disclosure: core knowledge in SKILL.md (~1,500 tokens), detailed references loaded on demand.
Autonomous multi-source research. No hand-holding required.
Triggers on: "do a deep dive on...", "research all approaches to...", "find best current methods for..."
Protocol: formulates search strategy → systematic multi-source search → critical evaluation by actionability → synthesized brief → saves to file
Stop condition: ≥5 high-quality primary sources, or 20 tool calls (whichever first)
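The stop condition amounts to a bounded search loop. A hypothetical sketch, where `search` and `is_high_quality` stand in for the agent's tool calls and source evaluation:

```python
def research_loop(search, is_high_quality, max_calls: int = 20, min_sources: int = 5):
    """Search until >= min_sources quality sources or max_calls tool calls, whichever first."""
    sources, calls = [], 0
    while calls < max_calls and len(sources) < min_sources:
        calls += 1
        sources.extend(r for r in search(calls) if is_high_quality(r))
    return sources, calls
```

Bounding on tool calls as well as source count keeps a thin literature from turning into an unbounded search.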
Output structure:
- Executive Summary
- Landscape Map (organized by technique, not chronologically)
- Top Methods (with trade-offs: compute, ease, production-readiness)
- Comparison Table
- Open Problems
- Recommended Starting Point (fastest to run + highest potential)
- Key References (arXiv IDs, GitHub repos, star counts)
See configs/README.md for tuning:
- Hook clarity threshold (default: 7/10)
- CI/CD bypass via `CLAUDE_SKIP_HOOKS=1` environment variable
- MLOps guard strictness (warn vs. block mode)
- Domain keyword extension
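For instance, the CI/CD bypass is just an environment variable, so an automated job can disable both hooks in one line (the echo is only there to show the value took effect):

```shell
# Disable both hooks for an automated job, per configs/README.md:
export CLAUDE_SKIP_HOOKS=1
echo "hooks bypassed: CLAUDE_SKIP_HOOKS=$CLAUDE_SKIP_HOOKS"
```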
- `examples/prompt-transformations.md` — 5 before/after prompts with domain classification and clarity scores
- `examples/mlops-guard-scenarios.md` — 6 real intercept/bypass scenarios with exact hook output
tkml-research-suite/
├── .claude-plugin/plugin.json # Plugin manifest
├── hooks/hooks.json # UserPromptSubmit + PreToolUse hooks
├── commands/
│ ├── research.md # /research command
│ ├── experiment.md # /experiment command
│ ├── mlops.md # /mlops command
│ └── review-ml.md # /review-ml command
├── skills/
│ ├── prompt-architect/ # Prompt evaluation + improvement framework
│ ├── research-workflow/ # Experiment design + literature review
│ └── mlops-standards/ # MLOps maturity + reproducibility standards
├── agents/
│ └── research-synthesizer.md # Autonomous research agent
├── configs/README.md # Customization guide
├── examples/ # Concrete usage examples
├── .github/ # Issue templates, PR template
├── CHANGELOG.md
└── CONTRIBUTING.md
See CONTRIBUTING.md — covers local development setup, testing hooks, and the PR process.
MIT — see LICENSE.
Built and maintained by TechKnowmad AI