[MISC] Add support of opt-in shared memory for tiled hessian to improve performance. by duburcqa · Pull Request #2629 · Genesis-Embodied-AI/Genesis

duburcqa · 2026-03-31T10:25:04Z

Related Issue

Resolves #2626

Checklist:

I read the CONTRIBUTING document.
I followed the Submitting Code Changes section of the CONTRIBUTING document.
I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING).
I updated the documentation accordingly or no change is needed.
I tested my changes and added instructions on how to test it for reviewers.
I have added tests to cover my changes.
All new and existing tests passed.

duburcqa · 2026-03-31T15:57:46Z

This snippet currently crashes on CUDA, which prevents this PR from passing.

import quadrants as qd

qd.init(arch=qd.cuda, debug=False, cfg_optimization=False)

@qd.kernel
def func_solve_init(
    nt_H: qd.types.ndarray,
):
    BLOCK_DIM = qd.static(64)
    MAX_DOFS = qd.static(111)  # Tile is slightly over 48 KB (111 * 112 * 4 = 49728 bytes); 110 would pass

    n_dofs = nt_H.shape[1]
    n_dofs_2 = n_dofs**2
    n_lower_tri = n_dofs * (n_dofs + 1) // 2

    qd.loop_config(block_dim=BLOCK_DIM)
    for tid in range(BLOCK_DIM):
        H = qd.simt.block.SharedArray((MAX_DOFS, MAX_DOFS + 1), qd.f32)

        i_pair = tid
        while i_pair < n_lower_tri:
            i_d1 = qd.cast(qd.floor((qd.sqrt(qd.cast(8 * i_pair + 1, qd.f32)) - 1.0) / 2.0), qd.i32)
            if (i_d1 + 1) * (i_d1 + 2) // 2 <= i_pair:
                i_d1 = i_d1 + 1
            i_d2 = i_pair - i_d1 * (i_d1 + 1) // 2
            H[i_d1, i_d2] = nt_H[0, i_d1, i_d2]
            i_pair = i_pair + BLOCK_DIM

    qd.loop_config(block_dim=BLOCK_DIM)
    for tid in range(BLOCK_DIM):
        H = qd.simt.block.SharedArray((MAX_DOFS, MAX_DOFS + 1), qd.f32)

        i_flat = tid
        while i_flat < n_dofs_2:
            i_d1 = i_flat // n_dofs
            i_d2 = i_flat % n_dofs
            if i_d2 <= i_d1:
                H[i_d1, i_d2] = nt_H[0, i_d1, i_d2]
            i_flat = i_flat + BLOCK_DIM


nt_H = qd.ndarray(dtype=qd.f32, shape=(1, 102, 102))

func_solve_init(nt_H)
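
As a rough check of the `MAX_DOFS` comment in the snippet, the sketch below computes the footprint of a `(MAX_DOFS, MAX_DOFS + 1)` tile of `qd.f32`. It assumes 4 bytes per element and that the relevant threshold is CUDA's default limit of 48 KB of shared memory per block, above which a kernel has to opt in explicitly. The helper names `tile_bytes` and `needs_opt_in` are hypothetical and not part of quadrants or Genesis.

```python
# Plain-Python arithmetic only; no GPU or quadrants install required.
BYTES_PER_F32 = 4                # assumption: qd.f32 is a 4-byte float
DEFAULT_LIMIT_BYTES = 48 * 1024  # assumption: default CUDA per-block limit (49152 bytes)


def tile_bytes(max_dofs: int) -> int:
    """Bytes used by a (max_dofs, max_dofs + 1) tile of f32 values."""
    return max_dofs * (max_dofs + 1) * BYTES_PER_F32


def needs_opt_in(max_dofs: int) -> bool:
    """True when the tile no longer fits in the default 48 KB budget."""
    return tile_bytes(max_dofs) > DEFAULT_LIMIT_BYTES


for max_dofs in (110, 111):
    print(max_dofs, tile_bytes(max_dofs), needs_opt_in(max_dofs))
```

Under these assumptions, 110 gives 48840 bytes (fits) and 111 gives 49728 bytes (exceeds the budget), which matches the threshold noted in the snippet.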

Fixed by Genesis-Embodied-AI/quadrants#442
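
For readers of the first loop in the snippet, here is a host-side sketch of how a flat index `i_pair` is decoded into a lower-triangular `(i_d1, i_d2)` pair. It mirrors the kernel's arithmetic but runs in plain Python with double-precision `math.sqrt` rather than `qd.f32`, so it illustrates the indexing scheme only and says nothing about the GPU behavior. The function name `decode_lower_tri` is hypothetical.

```python
import math


def decode_lower_tri(i_pair: int) -> tuple[int, int]:
    """Map a flat index to (row, col) in a row-major lower triangle."""
    # Invert i_pair = row * (row + 1) / 2 + col for the row index.
    i_d1 = int(math.floor((math.sqrt(8 * i_pair + 1) - 1.0) / 2.0))
    # Guard against the square root rounding the row down by one.
    if (i_d1 + 1) * (i_d1 + 2) // 2 <= i_pair:
        i_d1 += 1
    # The remainder within the row is the column index.
    i_d2 = i_pair - i_d1 * (i_d1 + 1) // 2
    return i_d1, i_d2


n_dofs = 102  # same size as nt_H in the snippet
n_lower_tri = n_dofs * (n_dofs + 1) // 2

decoded = [decode_lower_tri(k) for k in range(n_lower_tri)]
expected = [(row, col) for row in range(n_dofs) for col in range(row + 1)]

# Every lower-triangular entry is visited exactly once, in row-major order.
assert decoded == expected
print(n_lower_tri, decoded[:4], decoded[-1])
```

The second loop in the snippet reaches the same entries by iterating over all `n_dofs**2` flat indices and skipping those with `i_d2 > i_d1`, which avoids the square root at the cost of idle iterations.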

Add support of optin shared memory for tiled hessian.

c9baa9d

duburcqa requested a review from YilingQiao as a code owner March 31, 2026 10:25

duburcqa wants to merge 1 commit into Genesis-Embodied-AI:main from duburcqa:optin_shared_memory
