
Unified workflow, MCP server, skills, and example refactor#135

Merged
fliingelephant merged 57 commits into main from worktree-toasty-sniffing-sedgewick
Mar 27, 2026

@fliingelephant
Owner

Summary

  • src/vmc/workflow.py: Unified run loop with Orbax CheckpointManager, JSONL logging (AbstractLog/ConsoleLog/JsonLog/CompositeLog), CLI via add_common_args, resume support with per-key config warnings, FileExistsError on non-empty run_dir
  • tools/vmc-mcp/: MCP server with 14 tools — discovery, compatibility (typed N/Qx API), experience, visualization (underscore-separated plaquette naming), runner tools (smoke_test with isolated tmpdir output)
  • 12 example scripts refactored to use workflow: forwarding --solver, --solver-space, --full-gradient, --boundary-dim, --gauge-removal; per-script boundary_dim resolution; multi-tau_q collision fix
  • Skills: simulate, understand-codebase, visualize
  • Hooks: ruff format on edit, @codex review on push
  • Config: drop VMC_LOG_LEVEL, lazy logging fallback, ruff as dev dependency

Test plan

  • uv run pytest -m "not slow" passes
  • Example scripts accept --help and forward all CLI args
  • Fresh run errors on non-empty directory
  • Resume restores tensors, sampler, time; warns on config changes
  • Resume without checkpoint raises ValueError
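The two run-directory guards in this test plan (refuse a fresh run in a non-empty directory; refuse --resume without a checkpoint) can be sketched as below. `prepare_run_dir` and its `has_checkpoint` flag are hypothetical; the real vmc.workflow presumably asks the Orbax CheckpointManager whether a checkpoint exists.

```python
from pathlib import Path


def prepare_run_dir(run_dir, resume, has_checkpoint):
    """Validate the output directory before a run (hypothetical helper)."""
    path = Path(run_dir)
    if resume:
        # Resuming requires something to resume from.
        if not has_checkpoint:
            raise ValueError(f"--resume given but no checkpoint found in {path}")
        return path
    # A fresh run must not mix its output with an earlier run's files.
    if path.exists() and any(path.iterdir()):
        raise FileExistsError(f"{path} is not empty; pass --resume or pick another --output")
    path.mkdir(parents=True, exist_ok=True)
    return path
```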
  • MCP tools return correct results for Z_N with Qx

🤖 Generated with Claude Code

fliingelephant and others added 30 commits March 20, 2026 07:54
Signed-off-by: 周唤海 <albus.zhouhh@gmail.com>
…inting

Provides add_common_args, resolve_solver, save_checkpoint, load_checkpoint,
run(), DEFAULT_METRICS_CONFIG. Uses orbax for JAX-native array/key
checkpointing and atomic JSON for metadata/series.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace custom run loop, append_series, save_run with runner.run().
All physics functions preserved unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trim common.py to physics-only (remove run/checkpoint infrastructure).
All 4 scripts (benchmark_3x3, bond_dim_scan, energy_vs_J,
finite_size_scaling) now use runner.run() for their main loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge z2_vison_propagation (L=6, Fig 5a) and z2_vison_propagation_L10
(L=10, Fig 5b) into a unified set of scripts parameterized by --L.
Runner handles checkpointing/resume for all lattice sizes.

- ground_state.py: imaginary-time optimization
- dynamics.py: real-time vison propagation with selected plaquettes
- plot.py: Fig 5a exact open-data comparison
- exact.py: JAX Lanczos solver (moved from z2_vison_propagation/)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split subcommand __main__.py into ground_state.py + dynamics.py using
runner. Preserves all Higgs physics: parity-sector model, interior vison
pair creation, all-plaquette observables for 2D map snapshots.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace chunked driver.run(k*dt) with runner.run() + log_every.
Drop SimulationData dependency. Add CLI via add_common_args.
Delete lgt/observables.py (no remaining imports).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace custom run loop with runner.run(T_final=t1). Use resolve_solver
from runner. Remove RunConfig dataclass, measurement_row, inline CSV.
Keep all physics: smooth cubic ramp, product state init, Czz observables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Import DEFAULT_METRICS_CONFIG from runner instead of defining local
MetricsConfig. All 3 optimization methods (SR fixed, Adam, SR adaptive)
preserved as-is for demo purposes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace inline solver dict with resolve_solver(). Remove individual
solver imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove original exact.py (now in z2_vison/), empty old directories.
Full test suite passes (140/140).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone validation script, not an example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- runner.py: Add load_model_from_checkpoint() for cross-stage handoff
  (partial orbax restore of tensors only)
- z2_vison/physics.py: Shared Hamiltonian, model, plaquette observables,
  vison insertion. ground_state.py and dynamics.py import from it.
- z2_vison_higgs/physics.py: Shared Higgs Hamiltonian, model, plaquette
  observables, interior vison pair. Scripts import from it.
- Remove save_model_state/load_model_state from scripts (runner handles)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skills: simulate (orchestrator), understand-codebase (developer Q&A),
visualize (intelligent plotting with review). MCP server: discovery,
compatibility matrix, experience queries, visualization, smoke testing.
Includes review agent prompts for physics and code correctness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7 tasks: EXPERIENCE.md, discovery tools, compatibility matrix,
experience query, visualization, runner tools, config + integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers contraction strategy, bond dimension, solver choice, SR vs minSR,
sampling, time steps, convergence indicators, gauge removal, and diag shift
with references to Wu & Liu 2025, Wu & Nys 2026, Liu et al. 2021.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tools/vmc-mcp/server.py: FastMCP entry point with stdio transport
tools/vmc-mcp/discovery.py: list_models, list_operators, list_strategies,
  list_solvers, list_examples, find_closest_example — all with lazy vmc
  imports and caching. 19/19 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- compatibility.py: TERM_MODEL_COMPAT matrix, check_compatibility,
  check_feasibility with gauge group and term validation
- experience.py: parse + query EXPERIENCE.md with keyword matching
- visualization.py: plot_convergence, plot_heatmap (both naming
  conventions), animate (GIF via imageio)
- runner_tools.py: smoke_test (subprocess with cleanup),
  read_checkpoint_metadata
- server.py: registers all 14 MCP tools
- .claude/settings.json: MCP server config
- 45/45 tests passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- simulate: Full flow from physics proposal to validated script with
  two-stage review (physics + code) and smoke test
- understand-codebase: Developer/physicist Q&A using MCP discovery tools
- visualize: Convergence plots, 2D heatmaps, GIF animations with
  physical interpretation and review

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add --output to add_common_args and all script arg parsers
- Scripts use args.output or <computed default>
- smoke_test passes --output to a temp directory instead of cleaning
  up user data with shutil.rmtree
- Coerce override values to strings for subprocess argv

Addresses PR review comments about data deletion and type coercion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
z2_hardcore_boson scripts and z2_pure_gauge now use add_common_args +
parser.set_defaults() for problem-specific defaults, eliminating
duplicated arg definitions. Every script using runner.run() now has
consistent CLI: --n-samples, --n-chains, --dt, --output, --resume, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move examples/runner.py → src/vmc/workflow.py (proper package import)
- Add AbstractLog / ConsoleLog / JsonLog / CompositeLog logging interface
- Add runtime metadata in latest.json (started, finished, hostname, platform, jax_version)
- Replace print() with logging.info() for config table and status messages
- Remove setup_logging() auto-call from config.py (library shouldn't configure logging)
- Update all scripts: from vmc.workflow import ... (no sys.path for workflow)
- Remove sys.path.insert for runner from all scripts (keep only for local physics.py/common.py)
- 45/45 tests passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update skills to reference vmc.workflow instead of runner.
Remove runner.py from MCP discovery skip list (no longer in examples/).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use math.ceil instead of round for T_final step count — prevents
  stopping short of requested final time
- Remove --L from smoke_test defaults — dynamics scripts don't accept
  it (they read L from checkpoint). Pass via overrides when needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- run() takes n_steps OR T, not both. Explicit TypeError on conflict.
- T is duration from current time (like NetKet experimental TDVP),
  not absolute final time.
- Remove --n-steps and --T-final from add_common_args — scripts declare
  whichever they need with their own default.
- Update all dynamics scripts: --T with default 20.0
- Update quench: T = t1 - t0 (duration, not absolute)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
T (duration) breaks resume — would add more time instead of reaching
the original target. T_final is absolute: same value always means
"reach t=20" regardless of checkpoint position.

API: run(driver, n_steps=400, ...) or run(driver, T_final=20.0, ...)
with mutual exclusion validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Like --L, --n-steps is script-specific (ground-state scripts accept it,
dynamics scripts don't). Smoke test should only inject args that all
scripts accept via add_common_args. Callers pass --n-steps or --T-final
via overrides.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- T_final: validate remaining/dt is near-integer, use round() instead of
  ceil(). Errors on non-multiple T_final/dt. Handles float noise correctly
  (round(4.99999)=5, round(0.00001)=0).
- smoke_test: remove _SMOKE_DEFAULTS entirely. Caller provides exactly
  the args the target script accepts via overrides. Prevents false
  failures on scripts that don't use add_common_args.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
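A minimal sketch of the step-count logic these commits converge on: exactly one of n_steps or T_final is accepted, T_final is an absolute time so resuming from a checkpoint still stops at the same target, and a T_final that is not a multiple of dt is rejected after rounding off float noise. The function name, the tolerance, and treating n_steps as an absolute step index are assumptions, not the workflow's actual code.

```python
def target_step(step, t, dt, n_steps=None, T_final=None, tol=1e-4):
    """Absolute step index at which the run should stop (sketch).

    step, t: current step count and physical time (e.g. restored from a checkpoint).
    T_final: absolute target time, so resuming reaches the same endpoint.
    """
    if (n_steps is None) == (T_final is None):
        raise TypeError("pass exactly one of n_steps or T_final")
    if n_steps is not None:
        return n_steps  # assumed to be an absolute step index
    remaining = (T_final - t) / dt
    k = round(remaining)  # absorbs float noise: 4.99999 -> 5, 0.00001 -> 0
    if abs(remaining - k) > tol:
        raise ValueError(f"T_final - t = {T_final - t} is not a multiple of dt = {dt}")
    return step + max(k, 0)
```

Resuming at step 100, t = 5.0 with dt = 0.05 and T_final = 20.0 gives the same target (step 400) as a fresh run, which is the property a duration-style T would lose.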
Boolean True emits only the flag (--resume), False skips it.
Previously --resume True was emitted which argparse store_true rejects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
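The override-to-argv rule (booleans become a bare flag or are dropped, everything else is stringified) might look like this. `overrides_to_argv` and the underscore-to-hyphen key mapping are illustrative assumptions; keys that already start with "-" are passed through unchanged.

```python
def overrides_to_argv(overrides):
    """Turn a dict of overrides into argv tokens for subprocess (sketch)."""
    argv = []
    for key, value in overrides.items():
        # Assumed mapping: n_samples -> --n-samples; "--T-final" passes through.
        flag = key if key.startswith("-") else "--" + key.replace("_", "-")
        if isinstance(value, bool):
            if value:
                argv.append(flag)  # store_true flags take no value
            continue  # False: omit the flag entirely
        argv.extend([flag, str(value)])  # subprocess argv entries must be strings
    return argv
```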
- save_checkpoint: write to _latest_new, then delete old latest, then
  rename. Old checkpoint survives until new one is fully written.
- z2_pure_gauge.py, schmitt_tfim_quench.py: pass resume=args.resume.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
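A sketch of the write-then-swap sequence described above, using plain directories in place of Orbax. `replace_checkpoint` and its `write_fn` callback are hypothetical names.

```python
import shutil
from pathlib import Path


def replace_checkpoint(run_dir, write_fn):
    """Write a new 'latest' checkpoint without losing the old one mid-write.

    write_fn(path) is a caller-supplied callback that writes checkpoint files into path.
    """
    run_dir = Path(run_dir)
    latest = run_dir / "latest"
    staging = run_dir / "_latest_new"
    if staging.exists():
        shutil.rmtree(staging)  # leftover from an interrupted save
    staging.mkdir(parents=True)
    write_fn(staging)  # if this raises, 'latest' is untouched
    if latest.exists():
        shutil.rmtree(latest)  # old copy removed only after the new one is complete
    staging.rename(latest)
    return latest
```

Between deleting `latest` and the rename there is a short window with no `latest`, but the complete `_latest_new` is still on disk at that point.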
fliingelephant and others added 23 commits March 25, 2026 22:56
…ointer

Split checkpoint into 'tensors' and 'sampler' items via item_names +
Composite args. load_model_from_checkpoint restores only the 'tensors'
item — proper CheckpointManager API, no fallback to PyTreeCheckpointer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add step to _build_step_item (shows in console and JSONL)
- Default out = CompositeLog(ConsoleLog(), JsonLog(metrics.jsonl))
- Remove unused jnp, math imports
- Simplify _json_default: hasattr(obj, 'item') covers numpy + JAX
- Simplify sampler reshape: always .reshape(n_chains, -1), no if
- Pass runtime to config table instead of re-querying jax.devices()
- Add __all__ for clean public API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Save/restore driver.t in sampler checkpoint state (as JAX scalar)
- Add read_config() to workflow — reads orbax metadata without tensors
- z2_vison/dynamics.py and z2_vison_higgs/dynamics.py use read_config()
  instead of latest.json (which no longer exists)
- Remove unused json import from z2_vison_higgs/dynamics.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add full_gradient=args.full_gradient to z2_pure_gauge, all 4
  z2_hardcore_boson scripts, and schmitt_tfim_quench
- plot.py reads metrics.jsonl + orbax metadata instead of latest.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- run() configures basicConfig(INFO) if no handlers set — fixes silent
  output when scripts don't call setup_logging()
- Remove hard-coded GAUGE_GROUPS dict — accept any Z_N (GIPEPS is
  generic in N)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
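The no-handlers fallback amounts to a guard like the one below (the function name is hypothetical). `logging.basicConfig` is already a no-op when the root logger has handlers; the explicit check mainly documents that the library never overrides a configuration the user set up.

```python
import logging


def ensure_logging():
    """Give scripts visible INFO output unless logging is already configured."""
    root = logging.getLogger()
    if not root.handlers:
        logging.basicConfig(level=logging.INFO)
```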
- check_feasibility: gauge_group.upper() != "NONE" handles None/NONE/none
- visualization: drop ambiguous concat regex (P_00), use only
  underscore-separated (P_0_0) which handles multi-digit indices
- Test updated: verify multi-digit indices (P_10_11_mean)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
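An underscore-separated pattern keeps multi-digit indices unambiguous, whereas a concatenated P_110 could mean row 1, column 10 or row 11, column 0. The regex and helper below are an illustrative sketch, not the code in visualization.py.

```python
import re

# Row and column are separated by underscores, so multi-digit indices stay unambiguous.
_PLAQUETTE_KEY = re.compile(r"^P_(\d+)_(\d+)(?:_(\w+))?$")


def parse_plaquette_key(key):
    """Return (row, col, suffix) for keys like 'P_10_11_mean', else None."""
    m = _PLAQUETTE_KEY.match(key)
    if m is None:
        return None
    row, col, suffix = m.groups()
    return int(row), int(col), suffix
```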
… dicts

Add SOLVERS and SPACES dicts to vmc.workflow (same pattern). All scripts
that use add_common_args now forward args.solver and args.solver_space
to TDVPDriver construction — no more accepted-but-ignored CLI flags.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
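A sketch of the shared-CLI pattern: common flags in one function, a name-to-object registry for --solver, and `parser.set_defaults()` for script-specific defaults. The registry contents and default values are placeholders; the real SOLVERS dict maps names to vmc solver objects that are not shown here.

```python
import argparse

# Placeholder registry: the real SOLVERS dict maps names to vmc solver objects.
SOLVERS = {"direct": "DirectSolver", "iterative": "IterativeSolver"}


def add_common_args(parser):
    """Flags every workflow script shares (defaults here are placeholders)."""
    parser.add_argument("--solver", choices=sorted(SOLVERS), default="direct")
    parser.add_argument("--n-samples", type=int, default=1024)
    parser.add_argument("--n-chains", type=int, default=16)
    parser.add_argument("--dt", type=float, default=0.01)
    parser.add_argument("--output", default=None)
    parser.add_argument("--resume", action="store_true")
    return parser


def resolve_solver(name):
    try:
        return SOLVERS[name]
    except KeyError:
        raise ValueError(f"unknown solver {name!r}; choose from {sorted(SOLVERS)}") from None


# Script-specific defaults override the shared ones without redefining the flags.
parser = add_common_args(argparse.ArgumentParser())
parser.set_defaults(dt=0.05)
```

Parser-level defaults set with `set_defaults` take precedence over the argument-level defaults, which is what lets each script change a value without redeclaring the flag.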
Caller provides all args via overrides, no hidden defaults.
Removes tempfile dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…riational

- boundary_dim: default=None in add_common_args, per-script resolution
  (bond_dim**2 for standard PEPS, 3*bond_dim for GI-PEPS)
- gauge_removal: forward GaugeConfig to all 12 SRPreconditioner calls
- Replace ZipUp with Variational in z2/z3/odd_z2 pure gauge scripts
- Fix multi-tau_q collision when --output is provided (Schmitt)
- Fix duplicate space= kwarg and duplicate import in z2_vison_higgs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
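The per-script boundary_dim rule from the first bullet, written as a helper. The function and its `gauge_invariant` flag are hypothetical names; an explicit --boundary-dim value always takes precedence.

```python
def resolve_boundary_dim(boundary_dim, bond_dim, gauge_invariant):
    """Default boundary dimension when --boundary-dim is not given (sketch).

    bond_dim**2 for standard PEPS, 3 * bond_dim for GI-PEPS.
    """
    if boundary_dim is not None:
        return boundary_dim  # explicit CLI value wins
    return 3 * bond_dim if gauge_invariant else bond_dim**2
```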
…oading

- Only call _build_step_item when step will be logged (was called every step)
- visualization.py: reuse runner_tools._load_jsonl/_jsonl_to_columnar
- z2_pure_gauge.py: remove redundant boundary_dim local variable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fresh runs (resume=False) unlink metrics.jsonl before logging to
  prevent stale rows from prior runs corrupting analysis
- read_checkpoint_metadata returns empty series when metrics.jsonl
  doesn't exist yet (valid for mid-run or custom-logger checkpoints)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VMC_LOG_LEVEL was legacy. Library should not define custom log env vars —
users control verbosity via standard Python logging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fresh runs (resume=False) rmtree the entire run_dir before starting,
  preventing Orbax step-already-exists errors on reuse
- Removes now-redundant metrics.jsonl unlink (directory is wiped)
- smoke_test catches TimeoutExpired and returns structured response

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
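A sketch of a smoke test that always writes into a throwaway --output directory and turns a timeout into a structured result instead of an exception. The function signature and result keys are assumptions; `subprocess.run` kills the child process before re-raising `TimeoutExpired`.

```python
import subprocess
import sys
import tempfile


def smoke_test(script, overrides=(), timeout=300):
    """Run `script` with caller-supplied args and a throwaway --output dir (sketch)."""
    with tempfile.TemporaryDirectory() as tmp:
        # --output always points at tmp, so user data is never touched or deleted.
        cmd = [sys.executable, script, *overrides, "--output", tmp]
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired as exc:
            # run() has already killed the child; report instead of raising.
            return {"ok": False, "error": f"timed out after {exc.timeout}s"}
        return {
            "ok": proc.returncode == 0,
            "returncode": proc.returncode,
            "stdout_tail": proc.stdout[-2000:],
            "stderr_tail": proc.stderr[-2000:],
        }
```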
- z2_vison: P_{r}{c} → P_{r}_{c} to match visualization underscore format
- smoke_test docstring: warn callers to pass --output for workflow scripts
- test_runner: use default logger instead of pre-opening files in run_dir
- CLAUDE.md: add collaboration and first-principles guidelines, remove
  outdated eval API bullet, add efficiency priority

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- smoke_test always uses tmpdir for --output (prevents data deletion)
- Add ruff as dev dependency with PostToolUse format hook
- CLAUDE.md: remove outdated eval API and personal preference bullets
  (moved to ~/.claude/CLAUDE.md)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
N: int | None for Z_N group order (None = no gauge → PEPS)
Qx: int for background charge (0 = even, 1 = odd)

Eliminates fragile string parsing that rejected valid "odd_Z2".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
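To illustrate why the typed pair is easier to validate than strings such as "odd_Z2", here is a hypothetical check. The rule 0 <= Qx < N assumes background charges are taken modulo N; it is not taken from the MCP server's code.

```python
def check_gauge(N, Qx=0):
    """Validate the typed (N, Qx) pair (illustrative sketch).

    N: Z_N group order, or None for no gauge symmetry (plain PEPS).
    Qx: background charge; assumed to be taken modulo N.
    """
    if N is None:
        return "PEPS"
    if isinstance(N, bool) or not isinstance(N, int) or N < 2:
        raise ValueError(f"N must be an int >= 2 or None, got {N!r}")
    if not 0 <= Qx < N:
        raise ValueError(f"Qx must satisfy 0 <= Qx < {N}, got {Qx!r}")
    return f"GI-PEPS Z_{N} Qx={Qx}"
```

Under this scheme the former "odd_Z2" string becomes `check_gauge(2, 1)`, with no string parsing involved.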
- raise ValueError if --resume but no checkpoint exists
- Compare full config (model, shape, dt, n_samples, n_chains,
  full_gradient, solver_space, diag_shift, extra) on resume,
  warn per-key on any difference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
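A sketch of the per-key resume comparison. The key list mirrors the commit message; `warn_config_changes` and the tuple-to-list normalization (so a shape saved as a JSON list does not spuriously differ from a tuple) are assumptions.

```python
import logging

log = logging.getLogger(__name__)

# Keys compared on resume, as listed in the commit message.
RESUME_KEYS = ("model", "shape", "dt", "n_samples", "n_chains",
               "full_gradient", "solver_space", "diag_shift", "extra")


def _norm(value):
    # A tuple saved as JSON comes back as a list; treat the two as equal.
    return list(value) if isinstance(value, tuple) else value


def warn_config_changes(saved, current, keys=RESUME_KEYS):
    """Warn once per differing key and return the changed keys (sketch)."""
    changed = []
    for key in keys:
        old, new = _norm(saved.get(key)), _norm(current.get(key))
        if old != new:
            log.warning("resume: %s changed from %r to %r", key, old, new)
            changed.append(key)
    return changed
```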
… push

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract repo slug from git remote URL and match against push output,
instead of pattern-matching the command string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: 周唤海 <albus.zhouhh@gmail.com>
@chatgpt-codex-connector

Summary

  • Added test_run_fresh_non_empty_dir_raises to assert that run() fails with FileExistsError when a non-resume run starts in an already non-empty output directory (matching the PR test plan edge case). (tests/test_runner.py, L112-L123)
  • Added test_resume_without_checkpoint_raises to assert that run() fails with ValueError when resume=True is requested without any existing checkpoint (matching another PR test plan edge case). (tests/test_runner.py, L125-L131)
  • Committed the change on branch work (0ae259d) and created a follow-up PR via make_pr titled “Add workflow tests for non-empty output and missing resume checkpoints.” (tests/test_runner.py, L112-L131)

Testing

  • python -m compileall tests/test_runner.py
  • ⚠️ uv run pytest -m "not slow" (dependency download failed in this environment: h11 fetch tunnel error)
  • ⚠️ pytest -m "not slow" (environment is missing required runtime deps such as jax, numpy, orbax, etc.)


@codecov

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 72.52747% with 75 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.87%. Comparing base (6d325b7) to head (c6c0fab).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/vmc/workflow.py | 72.49% | 48 Missing and 26 partials ⚠️ |
| src/vmc/config.py | 75.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #135      +/-   ##
==========================================
- Coverage   87.42%   86.87%   -0.56%     
==========================================
  Files          45       46       +1     
  Lines        3388     3657     +269     
  Branches      297      345      +48     
==========================================
+ Hits         2962     3177     +215     
- Misses        364      396      +32     
- Partials       62       84      +22     


@fliingelephant
Owner Author

@codex review the whole runner UI/UX

Checkpoint writes now overlap with TDVP steps instead of blocking.
Only wait once at the end to ensure the final save completes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
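The same pattern in generic form: submit each save to a single background worker so computation continues, then wait once at the end. This uses `concurrent.futures` purely for illustration; Orbax's CheckpointManager provides its own asynchronous saving, and `run_with_async_saves` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_async_saves(n_steps, step_fn, save_fn, save_every):
    """Overlap checkpoint writes with compute; block only once at the end."""
    futures = []
    with ThreadPoolExecutor(max_workers=1) as pool:  # one writer: saves run in order
        for step in range(1, n_steps + 1):
            state = step_fn(step)  # compute continues while earlier saves run
            if step % save_every == 0 or step == n_steps:
                futures.append(pool.submit(save_fn, step, state))
        for fut in futures:
            fut.result()  # single wait point; re-raises any save error
    return len(futures)
```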
@fliingelephant
Owner Author

@codex review


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c6c0fab41b


return {"path": "", "description": "No frames generated."}

out_path = str(Path(run_dir) / "animation.gif")
iio.imwrite(out_path, frames, duration=1000 // fps, loop=0)

P1: Pass GIF frame duration in seconds

imageio's GIF writer interprets duration in seconds, but this code passes 1000 // fps (milliseconds logic). With the default fps=5, the output gets duration=200, i.e. ~200 seconds per frame, making generated animations effectively unusable for visualization workflows.


Comment on lines +375 to +378
if step % log_every == 0 or step == target_step:
    item = _build_step_item(driver, observable_names)
    out(step, item)
if step % save_every == 0 or step == target_step:

P2: Validate log/save intervals before modulo checks

log_every and save_every are accepted as arbitrary integers from CLI/common args, but the run loop uses them directly in modulo expressions. Passing 0 causes a ZeroDivisionError on the first step, which aborts long experiments instead of failing fast with a clear argument-validation error.
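One way to fail fast on the intervals, sketched under the assumption that the flags are named --log-every/--save-every: an argparse `type` that rejects non-positive values at parse time, plus a guard for callers that invoke run() directly. Both helpers are hypothetical.

```python
import argparse


def positive_int(text):
    """argparse type: reject 0 and negatives at parse time."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError(f"must be a positive integer, got {value}")
    return value


def check_intervals(log_every, save_every):
    """Guard for programmatic callers that bypass argparse."""
    for name, v in (("log_every", log_every), ("save_every", save_every)):
        if isinstance(v, bool) or not isinstance(v, int) or v <= 0:
            raise ValueError(f"{name} must be a positive integer, got {v!r}")
```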


@fliingelephant merged commit d2f9752 into main on Mar 27, 2026
1 of 3 checks passed