Unified workflow, MCP server, skills, and example refactor#135
fliingelephant merged 57 commits into main
Conversation
…inting Provides add_common_args, resolve_solver, save_checkpoint, load_checkpoint, run(), DEFAULT_METRICS_CONFIG. Uses orbax for JAX-native array/key checkpointing and atomic JSON for metadata/series. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace custom run loop, append_series, save_run with runner.run(). All physics functions preserved unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trim common.py to physics-only (remove run/checkpoint infrastructure). All 4 scripts (benchmark_3x3, bond_dim_scan, energy_vs_J, finite_size_scaling) now use runner.run() for their main loop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge z2_vison_propagation (L=6, Fig 5a) and z2_vison_propagation_L10 (L=10, Fig 5b) into a unified set of scripts parameterized by --L. Runner handles checkpointing/resume for all lattice sizes. - ground_state.py: imaginary-time optimization - dynamics.py: real-time vison propagation with selected plaquettes - plot.py: Fig 5a exact open-data comparison - exact.py: JAX Lanczos solver (moved from z2_vison_propagation/) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split subcommand __main__.py into ground_state.py + dynamics.py using runner. Preserves all Higgs physics: parity-sector model, interior vison pair creation, all-plaquette observables for 2D map snapshots. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace chunked driver.run(k*dt) with runner.run() + log_every. Drop SimulationData dependency. Add CLI via add_common_args. Delete lgt/observables.py (no remaining imports). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace custom run loop with runner.run(T_final=t1). Use resolve_solver from runner. Remove RunConfig dataclass, measurement_row, inline CSV. Keep all physics: smooth cubic ramp, product state init, Czz observables. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Import DEFAULT_METRICS_CONFIG from runner instead of defining local MetricsConfig. All 3 optimization methods (SR fixed, Adam, SR adaptive) preserved as-is for demo purposes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace inline solver dict with resolve_solver(). Remove individual solver imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove original exact.py (now in z2_vison/), empty old directories. Full test suite passes (140/140). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone validation script, not an example. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- runner.py: Add load_model_from_checkpoint() for cross-stage handoff (partial orbax restore of tensors only)
- z2_vison/physics.py: Shared Hamiltonian, model, plaquette observables, vison insertion. ground_state.py and dynamics.py import from it.
- z2_vison_higgs/physics.py: Shared Higgs Hamiltonian, model, plaquette observables, interior vison pair. Scripts import from it.
- Remove save_model_state/load_model_state from scripts (the runner handles it)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skills: simulate (orchestrator), understand-codebase (developer Q&A), visualize (intelligent plotting with review). MCP server: discovery, compatibility matrix, experience queries, visualization, smoke testing. Includes review agent prompts for physics and code correctness. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7 tasks: EXPERIENCE.md, discovery tools, compatibility matrix, experience query, visualization, runner tools, config + integration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers contraction strategy, bond dimension, solver choice, SR vs minSR, sampling, time steps, convergence indicators, gauge removal, and diag shift with references to Wu & Liu 2025, Wu & Nys 2026, Liu et al. 2021. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tools/vmc-mcp/server.py: FastMCP entry point with stdio transport
tools/vmc-mcp/discovery.py: list_models, list_operators, list_strategies, list_solvers, list_examples, find_closest_example — all with lazy vmc imports and caching.
19/19 tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- compatibility.py: TERM_MODEL_COMPAT matrix, check_compatibility, check_feasibility with gauge-group and term validation
- experience.py: parse + query EXPERIENCE.md with keyword matching
- visualization.py: plot_convergence, plot_heatmap (both naming conventions), animate (GIF via imageio)
- runner_tools.py: smoke_test (subprocess with cleanup), read_checkpoint_metadata
- server.py: registers all 14 MCP tools
- .claude/settings.json: MCP server config
- 45/45 tests passing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- simulate: Full flow from physics proposal to validated script, with two-stage review (physics + code) and a smoke test
- understand-codebase: Developer/physicist Q&A using MCP discovery tools
- visualize: Convergence plots, 2D heatmaps, GIF animations with physical interpretation and review
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add --output to add_common_args and all script arg parsers
- Scripts use args.output or <computed default>
- smoke_test passes --output to a temp directory instead of cleaning up user data with shutil.rmtree
- Coerce override values to strings for subprocess argv
Addresses PR review comments about data deletion and type coercion.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
z2_hardcore_boson scripts and z2_pure_gauge now use add_common_args + parser.set_defaults() for problem-specific defaults, eliminating duplicated arg definitions. Every script using runner.run() now has consistent CLI: --n-samples, --n-chains, --dt, --output, --resume, etc. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move examples/runner.py → src/vmc/workflow.py (proper package import)
- Add AbstractLog / ConsoleLog / JsonLog / CompositeLog logging interface
- Add runtime metadata in latest.json (started, finished, hostname, platform, jax_version)
- Replace print() with logging.info() for config table and status messages
- Remove setup_logging() auto-call from config.py (a library shouldn't configure logging)
- Update all scripts: from vmc.workflow import ... (no sys.path for workflow)
- Remove sys.path.insert for runner from all scripts (keep only for local physics.py/common.py)
- 45/45 tests passing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update skills to reference vmc.workflow instead of runner. Remove runner.py from MCP discovery skip list (no longer in examples/). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use math.ceil instead of round for the T_final step count — prevents stopping short of the requested final time
- Remove --L from smoke_test defaults — dynamics scripts don't accept it (they read L from the checkpoint). Pass it via overrides when needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- run() takes n_steps OR T, not both; explicit TypeError on conflict
- T is a duration from the current time (like NetKet's experimental TDVP), not an absolute final time
- Remove --n-steps and --T-final from add_common_args — scripts declare whichever they need with their own default
- Update all dynamics scripts: --T with default 20.0
- Update quench: T = t1 - t0 (a duration, not absolute)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
T (duration) breaks resume — would add more time instead of reaching the original target. T_final is absolute: same value always means "reach t=20" regardless of checkpoint position. API: run(driver, n_steps=400, ...) or run(driver, T_final=20.0, ...) with mutual exclusion validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
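The resume-safe contract described above can be sketched as follows; the function body and parameter names are illustrative, not the actual vmc.workflow implementation:

```python
def run(driver, n_steps=None, T_final=None, dt=0.01):
    # Exactly one of n_steps / T_final may be supplied; conflict is a TypeError.
    if (n_steps is None) == (T_final is None):
        raise TypeError("run() takes exactly one of n_steps or T_final")
    if T_final is not None:
        # T_final is absolute: resuming from t=12 with T_final=20 runs 8 more
        # time units, so the same value always means "reach t = T_final".
        n_steps = round((T_final - driver.t) / dt)
    return n_steps
```

Because the remaining time is computed from the driver's current `t`, a resumed run converges on the same target instead of extending it.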
Like --L, --n-steps is script-specific (ground-state scripts accept it, dynamics scripts don't). Smoke test should only inject args that all scripts accept via add_common_args. Callers pass --n-steps or --T-final via overrides. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- T_final: validate that remaining/dt is near-integer and use round() instead of ceil(); error on a T_final that is not a multiple of dt. Handles float noise correctly (round(4.99999) = 5, round(0.00001) = 0).
- smoke_test: remove _SMOKE_DEFAULTS entirely. The caller provides exactly the args the target script accepts via overrides, preventing false failures on scripts that don't use add_common_args.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
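A minimal sketch of that validation; the helper name and tolerance are assumptions, not the real code:

```python
def steps_for(T_final, t_now, dt, tol=1e-6):
    # Resolve an absolute T_final into a step count; round() absorbs float
    # noise, but a genuinely non-multiple T_final is rejected loudly.
    remaining = (T_final - t_now) / dt
    n = round(remaining)
    if abs(remaining - n) > tol:
        raise ValueError(
            f"T_final - t = {T_final - t_now} is not a multiple of dt = {dt}")
    return n
```

With round() plus a tolerance check, 19.999999999 / 0.05-style float noise is absorbed, while a genuinely misaligned T_final fails fast instead of silently running one step too many or too few.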
Boolean True emits only the flag (--resume), False skips it. Previously --resume True was emitted which argparse store_true rejects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
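The flag-emission rule can be sketched like this (helper name assumed; the real override-to-argv code may differ):

```python
def overrides_to_argv(overrides):
    """Convert an override dict into subprocess argv (illustrative helper)."""
    argv = []
    for key, value in overrides.items():
        flag = f"--{key.replace('_', '-')}"
        if isinstance(value, bool):
            if value:
                argv.append(flag)  # True -> emit only the flag (store_true style)
            # False -> omit the flag entirely
        else:
            argv.extend([flag, str(value)])  # coerce to string for subprocess
    return argv
```

This matches argparse's store_true semantics: `--resume True` would be rejected because store_true flags take no argument.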
- save_checkpoint: write to _latest_new, then delete the old latest, then rename. The old checkpoint survives until the new one is fully written.
- z2_pure_gauge.py, schmitt_tfim_quench.py: pass resume=args.resume.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
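The write-then-swap ordering can be sketched as follows; the directory names come from the commit, but the helper signature is an assumption:

```python
import shutil
from pathlib import Path

def atomic_replace(run_dir, write_fn):
    """Crash-safe 'latest' swap: the old checkpoint is only removed after
    the new one has been fully written (illustrative sketch)."""
    run_dir = Path(run_dir)
    new = run_dir / "_latest_new"
    old = run_dir / "latest"
    if new.exists():
        shutil.rmtree(new)       # clear any partial write from a previous crash
    new.mkdir(parents=True)
    write_fn(new)                # write the full checkpoint first
    if old.exists():
        shutil.rmtree(old)       # only now drop the previous checkpoint
    new.rename(old)              # rename is atomic on POSIX filesystems
```

A crash before the final rename leaves the old `latest` intact; a crash after it leaves the new one in place, so there is no window with zero valid checkpoints.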
…ointer Split checkpoint into 'tensors' and 'sampler' items via item_names + Composite args. load_model_from_checkpoint restores only the 'tensors' item — proper CheckpointManager API, no fallback to PyTreeCheckpointer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add step to _build_step_item (shows in console and JSONL)
- Default out = CompositeLog(ConsoleLog(), JsonLog(metrics.jsonl))
- Remove unused jnp, math imports
- Simplify _json_default: hasattr(obj, 'item') covers numpy + JAX
- Simplify sampler reshape: always .reshape(n_chains, -1), no if
- Pass runtime to the config table instead of re-querying jax.devices()
- Add __all__ for a clean public API
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Save/restore driver.t in the sampler checkpoint state (as a JAX scalar)
- Add read_config() to workflow — reads orbax metadata without tensors
- z2_vison/dynamics.py and z2_vison_higgs/dynamics.py use read_config() instead of latest.json (which no longer exists)
- Remove unused json import from z2_vison_higgs/dynamics.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add full_gradient=args.full_gradient to z2_pure_gauge, all 4 z2_hardcore_boson scripts, and schmitt_tfim_quench
- plot.py reads metrics.jsonl + orbax metadata instead of latest.json
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- run() configures basicConfig(INFO) if no handlers are set — fixes silent output when scripts don't call setup_logging()
- Remove the hard-coded GAUGE_GROUPS dict — accept any Z_N (GIPEPS is generic in N)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
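The handler check is the standard library-friendly pattern; a sketch — the actual function name in vmc.workflow may differ:

```python
import logging

def ensure_default_logging():
    # Only configure output if the application hasn't installed handlers itself,
    # so a library call never overrides user logging configuration.
    if not logging.getLogger().handlers:
        logging.basicConfig(level=logging.INFO)
```

Scripts that do call setup_logging() (or configure logging any other way) are untouched, while bare invocations still see INFO-level output.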
- check_feasibility: gauge_group.upper() != "NONE" handles None/NONE/none
- visualization: drop the ambiguous concatenated regex (P_00); use only the underscore-separated form (P_0_0), which handles multi-digit indices
- Test updated: verify multi-digit indices (P_10_11_mean)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
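The underscore-only convention is unambiguous for multi-digit indices, where the concatenated form (`P_1011`) cannot be split reliably. A sketch of the matching — the exact regex and `_mean` suffix are inferred from the test name above:

```python
import re

# Underscore-separated plaquette columns: "P_0_0_mean", "P_10_11_mean", ...
PLAQUETTE_RE = re.compile(r"^P_(\d+)_(\d+)_mean$")

def parse_plaquette(name):
    """Return (row, col) for a plaquette column name, or None if it doesn't match."""
    m = PLAQUETTE_RE.match(name)
    return (int(m.group(1)), int(m.group(2))) if m else None
```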
… dicts Add SOLVERS and SPACES dicts to vmc.workflow (same pattern). All scripts that use add_common_args now forward args.solver and args.solver_space to TDVPDriver construction — no more accepted-but-ignored CLI flags. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Caller provides all args via overrides, no hidden defaults. Removes tempfile dependency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…riational
- boundary_dim: default=None in add_common_args, per-script resolution (bond_dim**2 for standard PEPS, 3*bond_dim for GI-PEPS)
- gauge_removal: forward GaugeConfig to all 12 SRPreconditioner calls
- Replace ZipUp with Variational in z2/z3/odd_z2 pure gauge scripts
- Fix multi-tau_q collision when --output is provided (Schmitt)
- Fix duplicate space= kwarg and duplicate import in z2_vison_higgs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…oading
- Only call _build_step_item when the step will be logged (was called every step)
- visualization.py: reuse runner_tools._load_jsonl/_jsonl_to_columnar
- z2_pure_gauge.py: remove redundant boundary_dim local variable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fresh runs (resume=False) unlink metrics.jsonl before logging to prevent stale rows from prior runs corrupting analysis
- read_checkpoint_metadata returns an empty series when metrics.jsonl doesn't exist yet (valid for mid-run or custom-logger checkpoints)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VMC_LOG_LEVEL was legacy. Library should not define custom log env vars — users control verbosity via standard Python logging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fresh runs (resume=False) rmtree the entire run_dir before starting, preventing Orbax step-already-exists errors on reuse
- Remove the now-redundant metrics.jsonl unlink (the directory is wiped)
- smoke_test catches TimeoutExpired and returns a structured response
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- z2_vison: P_{r}{c} → P_{r}_{c} to match visualization underscore format
- smoke_test docstring: warn callers to pass --output for workflow scripts
- test_runner: use default logger instead of pre-opening files in run_dir
- CLAUDE.md: add collaboration and first-principles guidelines, remove
outdated eval API bullet, add efficiency priority
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- smoke_test always uses a tmpdir for --output (prevents data deletion)
- Add ruff as a dev dependency with a PostToolUse format hook
- CLAUDE.md: remove outdated eval API and personal-preference bullets (moved to ~/.claude/CLAUDE.md)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
N: int | None for Z_N group order (None = no gauge → PEPS)
Qx: int for background charge (0 = even, 1 = odd)
Eliminates fragile string parsing that rejected valid "odd_Z2".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
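The typed API replaces string parsing with two explicit fields. A sketch — the class and field container are assumptions; only the `N`/`Qx` semantics come from the commit:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class GaugeSpec:
    """Illustrative typed gauge-group spec (class name assumed)."""
    N: Optional[int] = None  # Z_N group order; None = no gauge -> plain PEPS
    Qx: int = 0              # background charge: 0 = even sector, 1 = odd

    @property
    def is_gauged(self) -> bool:
        return self.N is not None
```

Under this scheme "odd_Z2" becomes simply `GaugeSpec(N=2, Qx=1)`: there is no string format to mis-parse.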
- raise ValueError if --resume is passed but no checkpoint exists
- Compare the full config (model, shape, dt, n_samples, n_chains, full_gradient, solver_space, diag_shift, extra) on resume; warn per-key on any difference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
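The per-key comparison can be sketched as below; the helper name and the use of plain dicts are assumptions:

```python
import logging

RESUME_KEYS = ("model", "shape", "dt", "n_samples", "n_chains",
               "full_gradient", "solver_space", "diag_shift", "extra")

def warn_config_drift(saved, current, keys=RESUME_KEYS):
    """Warn once per key when a resumed run's config differs from the checkpoint."""
    drift = {}
    for key in keys:
        if saved.get(key) != current.get(key):
            drift[key] = (saved.get(key), current.get(key))
            logging.warning("resume: %s changed %r -> %r",
                            key, saved.get(key), current.get(key))
    return drift
```

Warning rather than erroring lets intentional changes (say, increasing n_samples mid-run) proceed while still leaving a visible trace in the logs.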
… push Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract repo slug from git remote URL and match against push output, instead of pattern-matching the command string. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #135 +/- ##
==========================================
- Coverage 87.42% 86.87% -0.56%
==========================================
Files 45 46 +1
Lines 3388 3657 +269
Branches 297 345 +48
==========================================
+ Hits 2962 3177 +215
- Misses 364 396 +32
- Partials 62 84 +22
☔ View full report in Codecov by Sentry.
@codex review the whole runner UI/UX
Checkpoint writes now overlap with TDVP steps instead of blocking. Only wait once at the end to ensure the final save completes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
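Orbax's CheckpointManager writes asynchronously on its own; reduced to plain Python threads, the overlap pattern looks like this (a sketch of the idea, not the actual workflow code — function names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def run_loop(step_fn, save_fn, n_steps, save_every):
    """Run n_steps, saving every save_every steps on a worker thread so the
    write overlaps with subsequent steps; wait only once at the very end."""
    pending = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for step in range(1, n_steps + 1):
            step_fn(step)
            if step % save_every == 0:
                if pending is not None:
                    pending.result()          # at most one save in flight
                pending = pool.submit(save_fn, step)
        if pending is not None:
            pending.result()                  # ensure the final save completed
```

The single trailing wait plays the role of orbax's `wait_until_finished()`: steps never block on I/O mid-run, but the process does not exit with a half-written checkpoint.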
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c6c0fab41b
return {"path": "", "description": "No frames generated."}
…
out_path = str(Path(run_dir) / "animation.gif")
iio.imwrite(out_path, frames, duration=1000 // fps, loop=0)
Pass GIF frame duration in seconds
imageio's GIF writer interprets duration in seconds, but this code passes 1000 // fps (milliseconds logic). With the default fps=5, the output gets duration=200, i.e. ~200 seconds per frame, making generated animations effectively unusable for visualization workflows.
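If the writer does take seconds, as the review asserts, the fix is a one-line unit conversion (a sketch; the helper name is made up, and imageio plugin versions differ on this unit):

```python
def gif_frame_duration_s(fps):
    # Seconds per frame: fps=5 -> 0.2 s, not 1000 // 5 = 200 s.
    return 1.0 / fps
```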
if step % log_every == 0 or step == target_step:
    item = _build_step_item(driver, observable_names)
    out(step, item)
if step % save_every == 0 or step == target_step:
Validate log/save intervals before modulo checks
log_every and save_every are accepted as arbitrary integers from CLI/common args, but the run loop uses them directly in modulo expressions. Passing 0 (or other invalid values) causes a runtime ZeroDivisionError on the first step, which aborts long experiments instead of failing fast with a clear argument validation error.
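A fail-fast guard along the lines the review suggests (the helper name is assumed):

```python
def validate_intervals(log_every, save_every):
    """Reject non-positive intervals up front, before the run loop uses them
    in modulo expressions (which would raise ZeroDivisionError mid-run)."""
    for name, value in (("log_every", log_every), ("save_every", save_every)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
```

Validating at argument-parse time turns a crash deep inside a long experiment into an immediate, clearly attributed error.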
Summary
- src/vmc/workflow.py: Unified run loop with Orbax CheckpointManager, JSONL logging (AbstractLog/ConsoleLog/JsonLog/CompositeLog), CLI via add_common_args, resume support with per-key config warnings, FileExistsError on non-empty run_dir
- tools/vmc-mcp/: MCP server with 14 tools — discovery, compatibility (typed N/Qx API), experience, visualization (underscore-separated plaquette naming), runner tools (smoke_test with isolated tmpdir output)
- Example scripts: --solver, --solver-space, --full-gradient, --boundary-dim, --gauge-removal; per-script boundary_dim resolution; multi-tau_q collision fix

Test plan
- uv run pytest -m "not slow" passes
- Scripts accept --help and forward all CLI args

🤖 Generated with Claude Code