Runner refactor, MCP server, and simulation skills #133
…inting Provides add_common_args, resolve_solver, save_checkpoint, load_checkpoint, run(), DEFAULT_METRICS_CONFIG. Uses orbax for JAX-native array/key checkpointing and atomic JSON for metadata/series. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
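The commit message names "atomic JSON" for metadata and series but does not show how it is done. A common way to get that property is write-to-temp-then-rename; the sketch below assumes that pattern, and `write_json_atomic` is a hypothetical name, not the helper in the PR.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path, payload):
    """Write payload as JSON so readers never observe a half-written file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temporary file lives in the target directory so the rename stays on one volume.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the new file in over any existing target in one step.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A crash during the write leaves the previous file in place, which is the property a resume path depends on.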
Replace custom run loop, append_series, save_run with runner.run(). All physics functions preserved unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trim common.py to physics-only (remove run/checkpoint infrastructure). All 4 scripts (benchmark_3x3, bond_dim_scan, energy_vs_J, finite_size_scaling) now use runner.run() for their main loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge z2_vison_propagation (L=6, Fig 5a) and z2_vison_propagation_L10 (L=10, Fig 5b) into a unified set of scripts parameterized by --L. Runner handles checkpointing/resume for all lattice sizes.
- ground_state.py: imaginary-time optimization
- dynamics.py: real-time vison propagation with selected plaquettes
- plot.py: Fig 5a exact open-data comparison
- exact.py: JAX Lanczos solver (moved from z2_vison_propagation/)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split subcommand __main__.py into ground_state.py + dynamics.py using runner. Preserves all Higgs physics: parity-sector model, interior vison pair creation, all-plaquette observables for 2D map snapshots. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace chunked driver.run(k*dt) with runner.run() + log_every. Drop SimulationData dependency. Add CLI via add_common_args. Delete lgt/observables.py (no remaining imports). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace custom run loop with runner.run(T_final=t1). Use resolve_solver from runner. Remove RunConfig dataclass, measurement_row, inline CSV. Keep all physics: smooth cubic ramp, product state init, Czz observables. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Import DEFAULT_METRICS_CONFIG from runner instead of defining local MetricsConfig. All 3 optimization methods (SR fixed, Adam, SR adaptive) preserved as-is for demo purposes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace inline solver dict with resolve_solver(). Remove individual solver imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove original exact.py (now in z2_vison/), empty old directories. Full test suite passes (140/140). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone validation script, not an example. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- runner.py: Add load_model_from_checkpoint() for cross-stage handoff (partial orbax restore of tensors only)
- z2_vison/physics.py: Shared Hamiltonian, model, plaquette observables, vison insertion. ground_state.py and dynamics.py import from it.
- z2_vison_higgs/physics.py: Shared Higgs Hamiltonian, model, plaquette observables, interior vison pair. Scripts import from it.
- Remove save_model_state/load_model_state from scripts (runner handles)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skills: simulate (orchestrator), understand-codebase (developer Q&A), visualize (intelligent plotting with review). MCP server: discovery, compatibility matrix, experience queries, visualization, smoke testing. Includes review agent prompts for physics and code correctness. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7 tasks: EXPERIENCE.md, discovery tools, compatibility matrix, experience query, visualization, runner tools, config + integration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers contraction strategy, bond dimension, solver choice, SR vs minSR, sampling, time steps, convergence indicators, gauge removal, and diag shift with references to Wu & Liu 2025, Wu & Nys 2026, Liu et al. 2021. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- tools/vmc-mcp/server.py: FastMCP entry point with stdio transport
- tools/vmc-mcp/discovery.py: list_models, list_operators, list_strategies, list_solvers, list_examples, find_closest_example, all with lazy vmc imports and caching
- 19/19 tests passing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- compatibility.py: TERM_MODEL_COMPAT matrix, check_compatibility, check_feasibility with gauge group and term validation
- experience.py: parse + query EXPERIENCE.md with keyword matching
- visualization.py: plot_convergence, plot_heatmap (both naming conventions), animate (GIF via imageio)
- runner_tools.py: smoke_test (subprocess with cleanup), read_checkpoint_metadata
- server.py: registers all 14 MCP tools
- .claude/settings.json: MCP server config
- 45/45 tests passing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- simulate: Full flow from physics proposal to validated script with two-stage review (physics + code) and smoke test
- understand-codebase: Developer/physicist Q&A using MCP discovery tools
- visualize: Convergence plots, 2D heatmaps, GIF animations with physical interpretation and review
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7f92c00763
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #133      +/-   ##
==========================================
- Coverage   87.42%   86.93%   -0.50%
==========================================
  Files          45       46       +1
  Lines        3388     3658     +270
  Branches      297      345      +48
==========================================
+ Hits         2962     3180     +218
- Misses        364      395      +31
- Partials       62       83      +21
- Add --output to add_common_args and all script arg parsers
- Scripts use args.output or <computed default>
- smoke_test passes --output to a temp directory instead of cleaning up user data with shutil.rmtree
- Coerce override values to strings for subprocess argv
Addresses PR review comments about data deletion and type coercion.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
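The two smoke-test fixes in this commit (a throwaway --output directory and string coercion for argv) can be illustrated roughly as follows. This is a sketch under assumptions: `build_argv` and `smoke_test_sketch` are hypothetical names, and the real smoke_test in runner_tools.py may differ in signature and return value.

```python
import subprocess
import sys
import tempfile


def build_argv(script, overrides, output_dir):
    """Build a subprocess argv; every value is coerced to str, as argv requires."""
    argv = [sys.executable, str(script)]
    for flag, value in overrides.items():
        argv += [f"--{flag}", str(value)]
    # Always point the script at the caller-supplied output directory.
    argv += ["--output", str(output_dir)]
    return argv


def smoke_test_sketch(script, overrides, timeout=120):
    """Run a script against a temporary output directory that is discarded afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        proc = subprocess.run(
            build_argv(script, overrides, tmp),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return {"returncode": proc.returncode, "stderr": proc.stderr[-2000:]}
```

Because the only directory removed is one the test created itself, user data under a real output path is never touched. Boolean store_true flags would need separate handling, which this sketch leaves out.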
z2_hardcore_boson scripts and z2_pure_gauge now use add_common_args + parser.set_defaults() for problem-specific defaults, eliminating duplicated arg definitions. Every script using runner.run() now has consistent CLI: --n-samples, --n-chains, --dt, --output, --resume, etc. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
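The add_common_args + parser.set_defaults() pattern described above might look like the following. The body of `add_common_args` here is a hypothetical stand-in (the flag names come from the commit message, the default values are invented for illustration); only the argparse calls themselves are standard library behavior.

```python
import argparse


def add_common_args(parser):
    """Hypothetical stand-in for the shared CLI flags; default values are illustrative."""
    parser.add_argument("--n-samples", type=int, default=1024)
    parser.add_argument("--n-chains", type=int, default=16)
    parser.add_argument("--dt", type=float, default=0.01)
    parser.add_argument("--output", type=str, default=None)
    parser.add_argument("--resume", action="store_true")
    return parser


def build_parser():
    """Parser for one problem-specific script."""
    parser = argparse.ArgumentParser(description="problem-specific script")
    add_common_args(parser)
    # Parser-level defaults override the argument-level defaults set above,
    # so a script restates only the values that differ for its problem.
    parser.set_defaults(n_samples=4096, dt=0.005)
    return parser
```

Explicit command-line values still win over both layers of defaults, so `--dt 0.1` behaves the same in every script.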
@codex review
💡 Codex Review
Reviewed commit: 34e505d01e
- Move examples/runner.py → src/vmc/workflow.py (proper package import)
- Add AbstractLog / ConsoleLog / JsonLog / CompositeLog logging interface
- Add runtime metadata in latest.json (started, finished, hostname, platform, jax_version)
- Replace print() with logging.info() for config table and status messages
- Remove setup_logging() auto-call from config.py (library shouldn't configure logging)
- Update all scripts: from vmc.workflow import ... (no sys.path for workflow)
- Remove sys.path.insert for runner from all scripts (keep only for local physics.py/common.py)
- 45/45 tests passing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
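For readers unfamiliar with the composite-logger layout named in this commit, here is one plausible shape. Only the four class names come from the commit message; the `log(step, metrics)` method, the constructor arguments, and the JSON-lines record layout are assumptions made for this sketch.

```python
import abc
import json
import logging


class AbstractLog(abc.ABC):
    """Interface: receive one record of metrics per step."""

    @abc.abstractmethod
    def log(self, step, metrics):
        ...


class ConsoleLog(AbstractLog):
    """Forward records to the standard logging module instead of print()."""

    def __init__(self, logger=None):
        self._logger = logger or logging.getLogger(__name__)

    def log(self, step, metrics):
        self._logger.info("step %d %s", step, metrics)


class JsonLog(AbstractLog):
    """Append one JSON object per line to a file."""

    def __init__(self, path):
        self._path = path

    def log(self, step, metrics):
        with open(self._path, "a") as handle:
            handle.write(json.dumps({"step": step, **metrics}) + "\n")


class CompositeLog(AbstractLog):
    """Fan a record out to several sinks."""

    def __init__(self, *sinks):
        self._sinks = sinks

    def log(self, step, metrics):
        for sink in self._sinks:
            sink.log(step, metrics)
```

Note that the library code only calls logging.info(); configuring handlers is left to the calling script, in line with the removal of the setup_logging() auto-call.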
Update skills to reference vmc.workflow instead of runner. Remove runner.py from MCP discovery skip list (no longer in examples/). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codex review
- Use math.ceil instead of round for T_final step count, which prevents stopping short of the requested final time
- Remove --L from smoke_test defaults: dynamics scripts don't accept it (they read L from checkpoint). Pass via overrides when needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
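The difference between the two rounding choices shows up whenever T_final is not a multiple of dt. A minimal illustration, where `n_steps_for` is a hypothetical helper name rather than code from the PR:

```python
import math


def n_steps_for(t_final, dt):
    """Smallest step count whose accumulated time reaches t_final."""
    return math.ceil(t_final / dt)


t_final, dt = 1.0, 0.3
steps_round = round(t_final / dt)      # nearest integer to 3.33...
steps_ceil = n_steps_for(t_final, dt)  # rounds up instead
print(steps_round * dt < t_final)      # the rounded count stops before t_final
print(steps_ceil * dt >= t_final)      # the ceil count reaches it
```

One caveat worth keeping in mind: floating-point division can land slightly above an exact integer, in which case ceil adds an extra step; a small tolerance before the ceil is one way to guard against that if it matters.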
@codex review
💡 Codex Review
Reviewed commit: 54ffbf0eb3
- z2_vison: P_{r}{c} → P_{r}_{c} to match visualization underscore format
- smoke_test docstring: warn callers to pass --output for workflow scripts
- test_runner: use default logger instead of pre-opening files in run_dir
- CLAUDE.md: add collaboration and first-principles guidelines, remove
outdated eval API bullet, add efficiency priority
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Reviewed commit: 9523ffd5bc
- smoke_test always uses tmpdir for --output (prevents data deletion)
- Add ruff as dev dependency with PostToolUse format hook
- CLAUDE.md: remove outdated eval API and personal preference bullets (moved to ~/.claude/CLAUDE.md)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codex review
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Reviewed commit: ac01e7ac7c
- N: int | None for Z_N group order (None = no gauge → PEPS)
- Qx: int for background charge (0 = even, 1 = odd)
Eliminates fragile string parsing that rejected valid "odd_Z2".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
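Replacing a parsed string such as "odd_Z2" with two typed fields could be sketched as below. The field names N and Qx come from the commit message; the `GaugeSpec` class, its validation rules, and the `has_gauge` property are assumptions for illustration, not the PR's actual types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GaugeSpec:
    """Hypothetical container: N is the Z_N group order, Qx the background charge."""

    N: Optional[int] = None  # None means no gauge symmetry
    Qx: int = 0              # 0 = even, 1 = odd (per the commit message)

    def __post_init__(self):
        # Assumed validation: a gauge group needs order >= 2, and only the
        # two charge values named in the commit message are accepted.
        if self.N is not None and self.N < 2:
            raise ValueError(f"group order N must be >= 2, got {self.N}")
        if self.Qx not in (0, 1):
            raise ValueError(f"background charge Qx must be 0 or 1, got {self.Qx}")

    @property
    def has_gauge(self):
        return self.N is not None
```

With typed fields there is no string format to get wrong: an odd Z2 sector is simply `GaugeSpec(N=2, Qx=1)`.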
@codex review
💡 Codex Review
Reviewed commit: b7c231d0fa
@codex review
💡 Codex Review
Reviewed commit: b7c231d0fa
- raise ValueError if --resume but no checkpoint exists
- Compare full config (model, shape, dt, n_samples, n_chains, full_gradient, solver_space, diag_shift, extra) on resume, warn per-key on any difference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
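The resume checks described in this commit could be implemented along these lines. This is a sketch, not the PR's code: `diff_configs` and `check_resume` are hypothetical names, configs are assumed to be flat dicts, and warnings.warn stands in for whatever warning channel the workflow module actually uses.

```python
import warnings

_MISSING = object()  # distinguishes "key absent" from "value is None"


def diff_configs(saved, current):
    """Map each differing key to its (saved, current) pair."""
    changed = {}
    for key in sorted(set(saved) | set(current)):
        old = saved.get(key, _MISSING)
        new = current.get(key, _MISSING)
        if old is _MISSING or new is _MISSING or old != new:
            changed[key] = (
                None if old is _MISSING else old,
                None if new is _MISSING else new,
            )
    return changed


def check_resume(saved, current, resume):
    """Fail fast on a missing checkpoint; warn once per changed key."""
    if resume and saved is None:
        raise ValueError("--resume was requested but no checkpoint exists")
    if not resume or saved is None:
        return {}
    changed = diff_configs(saved, current)
    for key, (old, new) in changed.items():
        warnings.warn(f"config key {key!r} differs on resume: {old!r} -> {new!r}")
    return changed
```

Warning per key rather than once overall makes it obvious which parameter drifted between the original run and the resumed one.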
@codex review
… push Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codex review
Extract repo slug from git remote URL and match against push output, instead of pattern-matching the command string. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
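Slug extraction from a remote URL has to cope with both SSH and HTTPS forms. The hook itself is a shell one-liner; the Python below is only an illustration of the matching logic, with `repo_slug` as a hypothetical helper name.

```python
import re

# Matches "owner/repo" after github.com, for both "git@github.com:owner/repo.git"
# and "https://github.com/owner/repo" style remotes.
_SLUG_PATTERN = re.compile(r"github\.com[:/](?P<slug>[^/\s]+/[^/\s]+?)(?:\.git)?/?$")


def repo_slug(remote_url):
    """Return "owner/repo" for a GitHub remote URL, or None if it does not match."""
    match = _SLUG_PATTERN.search(remote_url.strip())
    return match.group("slug") if match else None
```

This version strips a trailing ".git", so the slug has the same form whether or not the remote URL carries that suffix.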
@codex review
💡 Codex Review
Reviewed commit: d57f3212f9
    if not resume and run_dir.exists():
        shutil.rmtree(run_dir)
Preserve existing run directories on fresh runs
The new run() implementation unconditionally deletes run_dir whenever resume=False, which can erase previous checkpoints/metrics and any other artifacts if a user reruns with the same output path. This is destructive behavior for experiment workflows and makes accidental data loss likely; a safer default is to keep existing contents (or fail fast) unless explicit overwrite is requested.
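One way to act on this suggestion is to fail fast unless the caller opts in to deletion. The sketch below assumes a hypothetical `overwrite` parameter and a hypothetical `prepare_run_dir` helper; neither exists in the PR as shown.

```python
import shutil
from pathlib import Path


def prepare_run_dir(run_dir, resume=False, overwrite=False):
    """Create run_dir, refusing to delete existing contents unless overwrite=True."""
    run_dir = Path(run_dir)
    has_content = run_dir.exists() and any(run_dir.iterdir())
    if has_content and not resume:
        if not overwrite:
            raise FileExistsError(
                f"{run_dir} already contains files; pass resume=True to continue "
                "or overwrite=True to start fresh"
            )
        # Deletion happens only on an explicit request.
        shutil.rmtree(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

An empty or missing directory is still accepted silently, so first runs behave exactly as before.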
    if err:
        ax.errorbar(x, y, yerr=err, linewidth=1.2, capsize=2, label=key)
    else:
        ax.plot(x, y, linewidth=1.2, label=key)
Reject unknown series keys before plotting convergence
When callers pass a key that is not present in metrics.jsonl, y becomes an empty list while x is non-empty, and matplotlib raises a dimension mismatch at plot time. Because MCP users can provide arbitrary keys, this causes plot_convergence to crash instead of returning a useful error response.
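A key check before plotting might look like this. The record layout (one dict per line of metrics.jsonl, with an optional "step" field) and the `select_series` name are assumptions for the sketch.

```python
def select_series(records, key):
    """Return matching x/y lists for one key, or an error dict if the key is unknown."""
    x, y = [], []
    for index, record in enumerate(records):
        if key in record:
            # Fall back to the record index when no explicit step is stored.
            x.append(record.get("step", index))
            y.append(record[key])
    if not y:
        available = sorted({name for record in records for name in record})
        return {"error": f"unknown series key {key!r}", "available": available}
    return {"x": x, "y": y}
```

Building x and y in the same loop guarantees equal lengths, and an unknown key yields a response the MCP caller can display instead of a matplotlib traceback.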
    m = pattern.match(key)
    if m:
        r, c = int(m.group(1)), int(m.group(2))
        coords[(r, c)] = values[step]
Validate heatmap step bounds before indexing series
The grid extractor indexes each observable with values[step] without checking that step is within range. If a caller passes a step index beyond available samples, this throws IndexError and breaks both plot_heatmap and animate instead of returning a controlled "step out of range" response.
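A bounds check in the extractor could be sketched as follows. The `P_{r}_{c}` key format follows the underscore convention mentioned earlier in this PR; `extract_grid` and its return shape are hypothetical.

```python
import re

KEY_PATTERN = re.compile(r"^P_(\d+)_(\d+)$")


def extract_grid(series, step):
    """Collect P_{r}_{c} values at one step, or report an out-of-range step."""
    coords = {}
    for key, values in series.items():
        m = KEY_PATTERN.match(key)
        if not m:
            continue  # not a plaquette observable
        # Reject negative indices too, so step -1 is not silently read as "last".
        if not 0 <= step < len(values):
            return {
                "error": f"step {step} out of range: {key!r} has {len(values)} samples"
            }
        coords[(int(m.group(1)), int(m.group(2)))] = values[step]
    return {"coords": coords}
```

Both plot_heatmap and animate could share a check like this, since the review notes they go through the same extractor.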
        return {"path": "", "description": "No frames generated."}

    out_path = str(Path(run_dir) / "animation.gif")
    iio.imwrite(out_path, frames, duration=1000 // fps, loop=0)
Enforce positive FPS for GIF generation
GIF duration is computed as 1000 // fps with no input validation, so fps=0 causes a ZeroDivisionError and negative values produce invalid timing. Since fps is user-provided through the MCP tool, this should be validated early to avoid runtime crashes.
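Early validation of fps could be as small as the helper below; `frame_duration_ms` is a hypothetical name, and restricting fps to integers is an assumption of this sketch.

```python
def frame_duration_ms(fps):
    """Per-frame duration in milliseconds for a positive integer fps."""
    if isinstance(fps, bool) or not isinstance(fps, int) or fps <= 0:
        raise ValueError(f"fps must be a positive integer, got {fps!r}")
    # Clamp to 1 ms so very large fps values never yield a zero duration.
    return max(1, 1000 // fps)
```

Raising before any frames are rendered means a bad fps value costs nothing beyond the error message.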
@codex review
💡 Codex Review
Reviewed commit: d57f3212f9
"hooks": [
  {
    "type": "command",
    "command": "REPO=$(git remote get-url origin 2>/dev/null | sed 's/.*github.com[:/]//') && jq -r '.tool_response // \"\"' | grep -qF \"$REPO\" && gh pr view --json number -q .number >/dev/null 2>&1 && gh pr comment --body '@codex review' 2>/dev/null || true"
Restore push-only check before posting review comment
This hook no longer verifies that the Bash tool actually ran git push; it now triggers solely when tool_response contains the repo slug. In practice, other commands (for example git remote -v, and any output containing that slug) can satisfy this condition and cause unrelated gh pr comment --body '@codex review' calls, producing duplicate/noisy review comments on an open PR. Keeping a tool_input.command push guard in addition to the output check would avoid these false positives.
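The combined guard the review asks for (command check plus output check) is easy to state as a predicate. The hook itself is shell and jq; this Python version only illustrates the logic, and `should_request_review` is a hypothetical name. The `command` and `tool_response` arguments mirror the tool_input.command and tool_response fields mentioned above.

```python
import re

PUSH_PATTERN = re.compile(r"\bgit\s+push\b")


def should_request_review(command, tool_response, slug):
    """True only when the command ran git push AND its output names the repo slug."""
    if not slug:
        return False
    ran_push = PUSH_PATTERN.search(command or "") is not None
    names_repo = slug in (tool_response or "")
    return ran_push and names_repo
```

Requiring both conditions means `git remote -v`, whose output also contains the slug, no longer triggers a review comment. The regex is deliberately loose and would still match a quoted "git push" inside another command.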
Summary
- examples/runner.py: Shared run infrastructure with orbax checkpointing, config table, per-step printing, series accumulation. All 14 example scripts refactored to use it.
- tools/vmc-mcp/: 14 structured tools for codebase discovery, compatibility checking, experience queries, visualization (convergence plots, 2D heatmaps, GIF animations), and smoke testing.
- .claude/skills/: simulate (full physics-to-script pipeline with two-stage review), understand-codebase (developer/physicist Q&A), visualize (intelligent result plotting with interpretation).
Runner refactor details
- Added load_model_from_checkpoint() for cross-stage handoff (ground state → dynamics)
- Shared physics.py modules for z2_vison and z2_vison_higgs
- Removed lgt/observables.py, exact_tdvp_3x3_check.py, and old __main__.py files
MCP server tools
Test plan
🤖 Generated with Claude Code