[PERF] improve MPM solver speed. #2720
Open
Kashu7100 wants to merge 6 commits into Genesis-Embodied-AI:main from
Conversation
- Drop the unused [f+1] grid frame; the grid is only indexed at [f] in p2g/g2p.
- Fuse compute_F_tmp + svd on the forward pass to keep F_tmp in registers; keep them separate on the autodiff path so the backward composition is unchanged.
- Rate-limit _is_state_valid to every 10 substeps so the NaN check no longer forces a GPU->CPU sync every substep.
- Add benches/mpm_rigid_bench.py: Franka-squeezes-elastic-cube harness that reports per-step/substep wall-clock, peak GPU memory, and a mean-position fingerprint so variants can be compared against a baseline run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
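The rate-limited validity check in the third bullet can be sketched as follows. This is a hedged stand-in, not the solver's actual code: `maybe_check_state` and the NumPy-based check body are hypothetical names, and only the every-10-substeps interval comes from the commit message.

```python
import numpy as np

VALIDITY_CHECK_INTERVAL = 10  # check every 10 substeps, per the commit message


def _is_state_valid(positions: np.ndarray) -> bool:
    # Stand-in for the solver's NaN check. On GPU, reading this result back
    # on the host forces a device->CPU sync, which is why it is rate-limited.
    return bool(np.isfinite(positions).all())


def maybe_check_state(substep_index: int, positions: np.ndarray) -> None:
    if substep_index % VALIDITY_CHECK_INTERVAL != 0:
        return  # skip: no GPU->CPU sync on this substep
    if not _is_state_valid(positions):
        raise FloatingPointError(f"NaN/Inf detected at substep {substep_index}")
```

The trade-off: a NaN introduced at substep k is only reported at the next multiple of 10, so the error surfaces up to 9 substeps late in exchange for removing the per-substep sync.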
MPM materials now expose `needs_svd`. For Elastic(neohooken) and non-viscous Liquid, neither the F-update nor the stress uses U/V/S, so the SVD kernel is pure waste. When no registered material needs SVD, the solver dispatches to a new `compute_F_tmp_only` kernel and p2g reads J from `F_tmp.determinant()` via a qd.static branch (det(F_tmp) == det(S) for qd.svd's proper-rotation U/V).

Also split grid reset: the non-differentiable forward path uses `reset_grid`, which skips the grad-buffer writes that `reset_grid_and_grad` still performs on the autodiff path. This halves reset DRAM traffic for forward-only runs.

Bench impact (Franka gripper + Elastic(corotation) cube, 4 envs):

    baseline (3 runs): 9.77 / 9.71 / 9.24 ms/step, mean 9.58
    plan 1   (3 runs): 9.73 / 9.59 / 9.68 ms/step, mean 9.67

Within run-to-run variance; corotation still requires SVD on this scene, so the SVD-skip path isn't exercised. Fingerprint identical across runs (0.641477, 0.001549, 0.064053). Verified that neohooken / non-viscous liquid scenes now take the SVD-free path end-to-end.

Harness: `benches/mpm_rigid_bench.py` now prepends the repo root to sys.path so the editable checkout wins over any site-packages `genesis` namespace stub left behind by a prior wheel install.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
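The `needs_svd` dispatch above can be illustrated with a minimal sketch. The class and function bodies here are simplified stand-ins, not Genesis code; only the flag name and the material cases come from the commit message. The `particle_J` helper also demonstrates the det(F_tmp) == det(S) identity: for F = U S Vᵀ with proper rotations U and V (det = +1), det(F) = det(U) det(S) det(Vᵀ) = det(S).

```python
import numpy as np


class Material:
    needs_svd = True  # conservative default: unknown materials keep the SVD


class Elastic(Material):
    def __init__(self, model: str = "neohooken"):
        # Per the commit message: neo-Hookean never touches U/V/S,
        # while corotation still needs the polar/SVD decomposition.
        self.needs_svd = model != "neohooken"


class Liquid(Material):
    def __init__(self, viscous: bool = False):
        self.needs_svd = viscous  # non-viscous liquid skips the SVD


def solver_needs_svd(materials) -> bool:
    # If no registered material needs SVD, the solver can dispatch to the
    # SVD-free `compute_F_tmp_only` path.
    return any(m.needs_svd for m in materials)


def particle_J(F_tmp: np.ndarray) -> float:
    # det(F_tmp) == det(S) when U, V are proper rotations, so p2g can read
    # J directly from the determinant without the SVD results.
    return float(np.linalg.det(F_tmp))
```

In the real kernel this boolean is a compile-time `qd.static` condition, so the skipped branch generates no code at all rather than a runtime check.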
Dense reset_grid zeros all ~30K cells per batch every substep even when particles only touch a few percent of them. Replace it on the forward path with a per-slot dirty list:

- init_grid_fields now allocates grid_dirty_list (substeps_local, max_dirty_cells, B) and grid_dirty_count (substeps_local, B), sized from n_particles * 27 (each particle scatters to a 3^3 neighbourhood, so n_particles * 27 is the exact upper bound on distinct cells touched per batch).
- p2g's grid-scatter now captures the prior mass via atomic_add; the unique thread that sees prev_mass == 0 appends the flat cell index into grid_dirty_list[f, :, :]. Guarded by qd.static on `_sparse_reset_enabled = not requires_grad` so the autodiff composition of p2g.grad is untouched.
- sparse_reset_grid(f) zeros only the cells in grid_dirty_list[f, :, count], then clears the counter in the same kernel. Fields are zero-initialized, so the first pass through each slot is a correct no-op (the grid is already zero).
- substep_pre_coupling's forward path now calls sparse_reset_grid instead of reset_grid; the diff path still calls reset_grid_and_grad unchanged.

Bench impact (Franka gripper + Elastic(corotation) cube, 4 envs):

    plan 1 (3 runs): 9.73 / 9.59 / 9.68 ms/step, mean 9.67
    plan 2 (3 runs): 9.11 / 9.00 / 9.06 ms/step, mean 9.05

~6.5% speedup, fingerprint identical (0.641477, 0.001549, 0.064053). For larger grids or sparser particle occupancy the relative win scales with grid_size / (n_particles * 27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
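The dirty-list mechanism above can be modelled single-threaded in plain Python. This is an illustrative sketch, not the GPU kernel: `SparseGrid`, `scatter_mass`, and `sparse_reset` are hypothetical names, and the serial read-then-write stands in for `atomic_add`, which on the device returns the prior value atomically so exactly one thread per cell observes prev_mass == 0 and appends.

```python
import numpy as np


class SparseGrid:
    def __init__(self, n_cells: int, max_dirty_cells: int):
        # Zero-initialized, so the first sparse reset over an empty dirty
        # list is a correct no-op.
        self.mass = np.zeros(n_cells)
        self.dirty_list = np.zeros(max_dirty_cells, dtype=np.int64)
        self.dirty_count = 0

    def scatter_mass(self, cell: int, m: float) -> None:
        prev_mass = self.mass[cell]   # atomic_add returns the prior value
        self.mass[cell] = prev_mass + m
        if prev_mass == 0.0:
            # The unique "first toucher" of this cell records it. With one
            # entry per first touch, n_particles * 27 bounds the list size.
            self.dirty_list[self.dirty_count] = cell
            self.dirty_count += 1

    def sparse_reset(self) -> None:
        # Zero only the touched cells, then clear the counter, replacing a
        # dense pass over every cell in the grid.
        touched = self.dirty_list[: self.dirty_count]
        self.mass[touched] = 0.0
        self.dirty_count = 0
```

The dense reset costs O(grid_size) per substep regardless of occupancy; the sparse reset costs O(cells actually touched), which is why the win grows with grid_size / (n_particles * 27).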
Kashu7100
commented
Apr 20, 2026
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6c117c2c89
🔴 Benchmark Regression Detected ➡️ Report
Description
Related Issue
Resolves Genesis-Embodied-AI/Genesis#
Motivation and Context
How Has This Been / Can This Be Tested?
Screenshots (if appropriate):
Checklist:
Submitting Code Changes section of the CONTRIBUTING document.