Runner refactor, MCP server, and simulation skills#133

Merged
fliingelephant merged 55 commits into main from worktree-toasty-sniffing-sedgewick on Mar 27, 2026

@fliingelephant
Owner

Summary

  • Runner module (examples/runner.py): Shared run infrastructure with orbax checkpointing, config table, per-step printing, series accumulation. All 14 example scripts refactored to use it.
  • MCP server (tools/vmc-mcp/): 14 structured tools for codebase discovery, compatibility checking, experience queries, visualization (convergence plots, 2D heatmaps, GIF animations), and smoke testing.
  • Skills (.claude/skills/): simulate (full physics-to-script pipeline with two-stage review), understand-codebase (developer/physicist Q&A), visualize (intelligent result plotting with interpretation).
  • EXPERIENCE.md: Practitioner knowledge base (contraction strategy, bond dimension, solver choice, convergence, etc.)

Runner refactor details

  • Orbax for JAX-native array/PRNG key checkpointing
  • load_model_from_checkpoint() for cross-stage handoff (ground state → dynamics)
  • Physics extracted into physics.py modules for z2_vison and z2_vison_higgs
  • z2_vison_propagation + z2_vison_propagation_L10 consolidated into z2_vison/
  • z2_vison_higgs_confinement consolidated into z2_vison_higgs/
  • Deleted: lgt/observables.py, exact_tdvp_3x3_check.py, old __main__.py files
  • z2_hardcore_boson/common.py trimmed to physics-only

MCP server tools

| Module        | Tools |
|---------------|-------|
| discovery     | list_models, list_operators, list_strategies, list_solvers, list_examples, find_closest_example |
| compatibility | check_compatibility, check_feasibility |
| experience    | query_experience |
| visualization | plot_convergence, plot_heatmap, animate |
| runner_tools  | smoke_test, read_checkpoint_metadata |

Test plan

  • 140 core tests pass
  • 5 runner tests pass
  • 40 MCP tests pass (discovery, compatibility, experience, visualization, runner_tools)
  • Full suite: 185 tests, 0 failures

🤖 Generated with Claude Code

fliingelephant and others added 19 commits March 20, 2026 07:54
Signed-off-by: 周唤海 <albus.zhouhh@gmail.com>
…inting

Provides add_common_args, resolve_solver, save_checkpoint, load_checkpoint,
run(), DEFAULT_METRICS_CONFIG. Uses orbax for JAX-native array/key
checkpointing and atomic JSON for metadata/series.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
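
The "atomic JSON" mentioned in this commit message is commonly done by writing to a temporary file and renaming it over the target. The following is a minimal sketch of that pattern, not the PR's actual code; the helper name `write_json_atomic` is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path, payload):
    """Write payload as JSON so readers never observe a half-written file.

    Hypothetical helper: illustrates the temp-file-plus-rename pattern only.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # replaces the target in a single rename step
    except BaseException:
        # Leave no stray temp file behind if serialization or the rename fails.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A crash before `os.replace` leaves the previous file untouched, which is the property that matters for metadata read during resume.
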
Replace custom run loop, append_series, save_run with runner.run().
All physics functions preserved unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trim common.py to physics-only (remove run/checkpoint infrastructure).
All 4 scripts (benchmark_3x3, bond_dim_scan, energy_vs_J,
finite_size_scaling) now use runner.run() for their main loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge z2_vison_propagation (L=6, Fig 5a) and z2_vison_propagation_L10
(L=10, Fig 5b) into a unified set of scripts parameterized by --L.
Runner handles checkpointing/resume for all lattice sizes.

- ground_state.py: imaginary-time optimization
- dynamics.py: real-time vison propagation with selected plaquettes
- plot.py: Fig 5a exact open-data comparison
- exact.py: JAX Lanczos solver (moved from z2_vison_propagation/)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split subcommand __main__.py into ground_state.py + dynamics.py using
runner. Preserves all Higgs physics: parity-sector model, interior vison
pair creation, all-plaquette observables for 2D map snapshots.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace chunked driver.run(k*dt) with runner.run() + log_every.
Drop SimulationData dependency. Add CLI via add_common_args.
Delete lgt/observables.py (no remaining imports).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace custom run loop with runner.run(T_final=t1). Use resolve_solver
from runner. Remove RunConfig dataclass, measurement_row, inline CSV.
Keep all physics: smooth cubic ramp, product state init, Czz observables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Import DEFAULT_METRICS_CONFIG from runner instead of defining local
MetricsConfig. All 3 optimization methods (SR fixed, Adam, SR adaptive)
preserved as-is for demo purposes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace inline solver dict with resolve_solver(). Remove individual
solver imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove original exact.py (now in z2_vison/), empty old directories.
Full test suite passes (140/140).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone validation script, not an example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- runner.py: Add load_model_from_checkpoint() for cross-stage handoff
  (partial orbax restore of tensors only)
- z2_vison/physics.py: Shared Hamiltonian, model, plaquette observables,
  vison insertion. ground_state.py and dynamics.py import from it.
- z2_vison_higgs/physics.py: Shared Higgs Hamiltonian, model, plaquette
  observables, interior vison pair. Scripts import from it.
- Remove save_model_state/load_model_state from scripts (runner handles)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skills: simulate (orchestrator), understand-codebase (developer Q&A),
visualize (intelligent plotting with review). MCP server: discovery,
compatibility matrix, experience queries, visualization, smoke testing.
Includes review agent prompts for physics and code correctness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7 tasks: EXPERIENCE.md, discovery tools, compatibility matrix,
experience query, visualization, runner tools, config + integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers contraction strategy, bond dimension, solver choice, SR vs minSR,
sampling, time steps, convergence indicators, gauge removal, and diag shift
with references to Wu & Liu 2025, Wu & Nys 2026, Liu et al. 2021.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tools/vmc-mcp/server.py: FastMCP entry point with stdio transport
tools/vmc-mcp/discovery.py: list_models, list_operators, list_strategies,
  list_solvers, list_examples, find_closest_example — all with lazy vmc
  imports and caching. 19/19 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- compatibility.py: TERM_MODEL_COMPAT matrix, check_compatibility,
  check_feasibility with gauge group and term validation
- experience.py: parse + query EXPERIENCE.md with keyword matching
- visualization.py: plot_convergence, plot_heatmap (both naming
  conventions), animate (GIF via imageio)
- runner_tools.py: smoke_test (subprocess with cleanup),
  read_checkpoint_metadata
- server.py: registers all 14 MCP tools
- .claude/settings.json: MCP server config
- 45/45 tests passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- simulate: Full flow from physics proposal to validated script with
  two-stage review (physics + code) and smoke test
- understand-codebase: Developer/physicist Q&A using MCP discovery tools
- visualize: Convergence plots, 2D heatmaps, GIF animations with
  physical interpretation and review

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f92c00763

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@codecov

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 73.35766% with 73 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.93%. Comparing base (6d325b7) to head (d57f321).
⚠️ Report is 56 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/vmc/workflow.py      | 73.33%  | 47 Missing and 25 partials ⚠️ |
| src/vmc/config.py        | 75.00%  | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #133      +/-   ##
==========================================
- Coverage   87.42%   86.93%   -0.50%     
==========================================
  Files          45       46       +1     
  Lines        3388     3658     +270     
  Branches      297      345      +48     
==========================================
+ Hits         2962     3180     +218     
- Misses        364      395      +31     
- Partials       62       83      +21     

fliingelephant and others added 2 commits March 24, 2026 17:32
- Add --output to add_common_args and all script arg parsers
- Scripts use args.output or <computed default>
- smoke_test passes --output to a temp directory instead of cleaning
  up user data with shutil.rmtree
- Coerce override values to strings for subprocess argv

Addresses PR review comments about data deletion and type coercion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
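
The two fixes in this commit (a temporary `--output` directory and string coercion of override values) can be sketched as follows. This is an illustration under assumptions, not the code in `runner_tools.py`: `build_argv` and `smoke_run` are hypothetical names, and overrides are assumed to map flag names to values.

```python
import subprocess
import sys
import tempfile


def build_argv(script_args, overrides, output_dir):
    """Return an argv list in which every element is a string."""
    argv = [sys.executable, *script_args]
    for flag, value in overrides.items():
        # subprocess raises TypeError on int/float entries, so coerce here.
        argv += [f"--{flag}", str(value)]
    argv += ["--output", str(output_dir)]
    return argv


def smoke_run(script_args, overrides):
    """Run the command with --output pointed at a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        argv = build_argv(script_args, overrides, tmp)
        result = subprocess.run(argv, capture_output=True, text=True, timeout=60)
        # The temporary directory is removed on exit; user data is never touched.
        return result.returncode, result.stdout
```

Because the output path is always a fresh temporary directory, cleanup cannot delete an existing run directory.
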
z2_hardcore_boson scripts and z2_pure_gauge now use add_common_args +
parser.set_defaults() for problem-specific defaults, eliminating
duplicated arg definitions. Every script using runner.run() now has
consistent CLI: --n-samples, --n-chains, --dt, --output, --resume, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fliingelephant
Owner Author

@codex review

@fliingelephant fliingelephant linked an issue Mar 24, 2026 that may be closed by this pull request

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 34e505d01e

fliingelephant and others added 2 commits March 25, 2026 17:34
- Move examples/runner.py → src/vmc/workflow.py (proper package import)
- Add AbstractLog / ConsoleLog / JsonLog / CompositeLog logging interface
- Add runtime metadata in latest.json (started, finished, hostname, platform, jax_version)
- Replace print() with logging.info() for config table and status messages
- Remove setup_logging() auto-call from config.py (library shouldn't configure logging)
- Update all scripts: from vmc.workflow import ... (no sys.path for workflow)
- Remove sys.path.insert for runner from all scripts (keep only for local physics.py/common.py)
- 45/45 tests passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update skills to reference vmc.workflow instead of runner.
Remove runner.py from MCP discovery skip list (no longer in examples/).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fliingelephant
Owner Author

@codex review

- Use math.ceil instead of round for T_final step count — prevents
  stopping short of requested final time
- Remove --L from smoke_test defaults — dynamics scripts don't accept
  it (they read L from checkpoint). Pass via overrides when needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
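
To see why `round` can stop short of the requested final time while `math.ceil` cannot, consider the sketch below. The function name `n_steps_for` and the small tolerance guard are assumptions added for illustration; the commit message only states that `math.ceil` replaces `round`.

```python
import math


def n_steps_for(T_final, dt, tol=1e-9):
    """Smallest step count whose total time reaches T_final.

    The tol guard is an added assumption: it keeps floating-point noise
    (e.g. 0.3 / 0.1 evaluating just below 3) from changing the count.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    ratio = T_final / dt
    nearest = round(ratio)
    if abs(ratio - nearest) <= tol:
        return nearest
    return math.ceil(ratio)


# With T_final = 1.1 and dt = 0.25 the ratio is about 4.4:
# round gives 4 steps (t = 1.0, short of 1.1); ceil gives 5 (t = 1.25).
steps_round = round(1.1 / 0.25)
steps_ceil = n_steps_for(1.1, 0.25)
```
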
@fliingelephant
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 54ffbf0eb3

- z2_vison: P_{r}{c} → P_{r}_{c} to match visualization underscore format
- smoke_test docstring: warn callers to pass --output for workflow scripts
- test_runner: use default logger instead of pre-opening files in run_dir
- CLAUDE.md: add collaboration and first-principles guidelines, remove
  outdated eval API bullet, add efficiency priority

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fliingelephant
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9523ffd5bc

- smoke_test always uses tmpdir for --output (prevents data deletion)
- Add ruff as dev dependency with PostToolUse format hook
- CLAUDE.md: remove outdated eval API and personal preference bullets
  (moved to ~/.claude/CLAUDE.md)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fliingelephant
Owner Author

@codex review

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fliingelephant
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac01e7ac7c

N: int | None for Z_N group order (None = no gauge → PEPS)
Qx: int for background charge (0 = even, 1 = odd)

Eliminates fragile string parsing that rejected valid "odd_Z2".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fliingelephant
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7c231d0fa

@fliingelephant
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7c231d0fa

- raise ValueError if --resume but no checkpoint exists
- Compare full config (model, shape, dt, n_samples, n_chains,
  full_gradient, solver_space, diag_shift, extra) on resume,
  warn per-key on any difference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
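
The resume checks described in this commit message (fail when no checkpoint exists, warn per key on any config difference) might look roughly like the sketch below. All names here (`config_differences`, `check_resume`, `MISSING`) are hypothetical, and the saved config is assumed to be available as a plain dict.

```python
import logging
from pathlib import Path

logger = logging.getLogger("resume_check")
MISSING = "<missing>"  # placeholder for keys present on only one side


def config_differences(saved, current):
    """Map each differing key to its (saved, current) pair."""
    diffs = {}
    for key in sorted(set(saved) | set(current)):
        old = saved.get(key, MISSING)
        new = current.get(key, MISSING)
        if old != new:
            diffs[key] = (old, new)
    return diffs


def check_resume(checkpoint_path, saved, current):
    """Fail fast without a checkpoint; otherwise warn once per changed key."""
    if not Path(checkpoint_path).exists():
        raise ValueError(f"--resume given but no checkpoint at {checkpoint_path}")
    diffs = config_differences(saved, current)
    for key, (old, new) in diffs.items():
        logger.warning("resume config mismatch: %s was %r, now %r", key, old, new)
    return diffs
```

Warning rather than raising on a mismatch lets a user deliberately change, say, `dt` on resume while still leaving a trace in the log.
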
@fliingelephant
Owner Author

@codex review

… push

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fliingelephant
Owner Author

@codex review

Extract repo slug from git remote URL and match against push output,
instead of pattern-matching the command string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fliingelephant
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d57f3212f9

Comment on lines +260 to +261
if not resume and run_dir.exists():
    shutil.rmtree(run_dir)

[P1] Preserve existing run directories on fresh runs

The new run() implementation unconditionally deletes run_dir whenever resume=False, which can erase previous checkpoints/metrics and any other artifacts if a user reruns with the same output path. This is destructive behavior for experiment workflows and makes accidental data loss likely; a safer default is to keep existing contents (or fail fast) unless explicit overwrite is requested.
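
One way to implement the safer default suggested above is to fail fast on a non-empty directory unless the caller opts in. This is a sketch, not the merged code; `prepare_run_dir` and its `overwrite` parameter are hypothetical.

```python
import shutil
from pathlib import Path


def prepare_run_dir(run_dir, resume=False, overwrite=False):
    """Create run_dir, refusing to delete existing contents by default."""
    run_dir = Path(run_dir)
    has_contents = run_dir.exists() and any(run_dir.iterdir())
    if has_contents and not resume:
        if not overwrite:
            raise FileExistsError(
                f"{run_dir} is not empty; pass resume=True or overwrite=True"
            )
        # Deletion happens only on an explicit request.
        shutil.rmtree(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```
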

if err:
    ax.errorbar(x, y, yerr=err, linewidth=1.2, capsize=2, label=key)
else:
    ax.plot(x, y, linewidth=1.2, label=key)

[P2] Reject unknown series keys before plotting convergence

When callers pass a key that is not present in metrics.jsonl, y becomes an empty list while x is non-empty, and matplotlib raises a dimension mismatch at plot time. Because MCP users can provide arbitrary keys, this causes plot_convergence to crash instead of returning a useful error response.
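
A pre-plot check along these lines would turn the crash into an error response. The sketch assumes each line of metrics.jsonl has already been parsed into a dict; `validate_series_keys` is a hypothetical helper, and the `{"path", "description"}` shape mirrors the return value quoted elsewhere in this review.

```python
def validate_series_keys(records, keys):
    """Return an error response if any requested key is absent, else None.

    records: list of dicts, one per parsed metrics.jsonl line (assumed).
    """
    available = set()
    for record in records:
        available.update(record)
    missing = [key for key in keys if key not in available]
    if missing:
        return {
            "path": "",
            "description": (
                f"Unknown series keys: {', '.join(missing)}. "
                f"Available: {', '.join(sorted(available))}."
            ),
        }
    return None
```
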

m = pattern.match(key)
if m:
    r, c = int(m.group(1)), int(m.group(2))
    coords[(r, c)] = values[step]

[P2] Validate heatmap step bounds before indexing series

The grid extractor indexes each observable with values[step] without checking that step is within range. If a caller passes a step index beyond available samples, this throws IndexError and breaks both plot_heatmap and animate instead of returning a controlled "step out of range" response.
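
A bounds check of the kind requested here could be as small as the following sketch. `step_error` is a hypothetical name, the series is assumed to be a dict of observable name to list of values, and rejecting negative indices is a design assumption (Python itself would accept them).

```python
def step_error(series, step):
    """Return a 'step out of range' message, or None if step is valid.

    series: dict mapping observable name -> list of per-step values (assumed).
    """
    lengths = [len(values) for values in series.values()]
    n_available = min(lengths) if lengths else 0
    # Negative indices are rejected so a typo cannot silently select the last frame.
    if not 0 <= step < n_available:
        return f"step {step} out of range; {n_available} samples available"
    return None
```
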

return {"path": "", "description": "No frames generated."}

out_path = str(Path(run_dir) / "animation.gif")
iio.imwrite(out_path, frames, duration=1000 // fps, loop=0)

[P3] Enforce positive FPS for GIF generation

GIF duration is computed as 1000 // fps with no input validation, so fps=0 causes a ZeroDivisionError and negative values produce invalid timing. Since fps is user-provided through the MCP tool, this should be validated early to avoid runtime crashes.
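
The early validation asked for above can be sketched as a small helper that computes the per-frame duration only after checking `fps`. The name `frame_duration_ms` and the floor of 1 ms are assumptions for illustration.

```python
def frame_duration_ms(fps):
    """Per-frame duration in milliseconds for a positive integer fps."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(fps, bool) or not isinstance(fps, int) or fps <= 0:
        raise ValueError(f"fps must be a positive integer, got {fps!r}")
    # Floor at 1 ms so very large fps values do not yield a zero duration.
    return max(1, 1000 // fps)
```

The result would then be passed where the quoted code currently computes `1000 // fps` inline.
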

@fliingelephant fliingelephant merged commit d57f321 into main Mar 27, 2026
1 of 3 checks passed
@fliingelephant
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d57f3212f9

"hooks": [
  {
    "type": "command",
    "command": "REPO=$(git remote get-url origin 2>/dev/null | sed 's/.*github.com[:/]//') && jq -r '.tool_response // \"\"' | grep -qF \"$REPO\" && gh pr view --json number -q .number >/dev/null 2>&1 && gh pr comment --body '@codex review' 2>/dev/null || true"

[P2] Restore push-only check before posting review comment

This hook no longer verifies that the Bash tool actually ran git push; it now triggers solely when tool_response contains the repo slug. In practice, other commands (for example git remote -v, and any output containing that slug) can satisfy this condition and cause unrelated gh pr comment --body '@codex review' calls, producing duplicate/noisy review comments on an open PR. Keeping a tool_input.command push guard in addition to the output check would avoid these false positives.
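
The combined guard described above (command must be a push and the output must mention the repo) is shown here in Python rather than the hook's shell for readability. It is a sketch: `should_request_review` is hypothetical, the payload is assumed to carry `tool_input.command` and `tool_response` as named in the comment, and `owner/repo` in the usage lines is a placeholder slug.

```python
import json
import re

# Matches "git push" as a whole command word pair anywhere in the command line.
PUSH_RE = re.compile(r"\bgit\s+push\b")


def should_request_review(payload, repo_slug):
    """True only if the tool ran git push AND its output mentions the repo."""
    command = (payload.get("tool_input") or {}).get("command", "")
    response = payload.get("tool_response", "")
    if not isinstance(response, str):
        # Structured responses are flattened so the slug search still works.
        response = json.dumps(response)
    is_push = bool(PUSH_RE.search(command))
    mentions_repo = bool(repo_slug) and repo_slug in response
    return is_push and mentions_repo


push = {"tool_input": {"command": "git push origin main"},
        "tool_response": "To github.com:owner/repo.git"}
remote = {"tool_input": {"command": "git remote -v"},
          "tool_response": "origin git@github.com:owner/repo.git (fetch)"}
```

With these two payloads, only `push` passes the guard; `remote` mentions the slug in its output but is not a push.
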

Development

Successfully merging this pull request may close these issues.

Unify runner interfaces for examples

1 participant