prover: better parallelization strategy for poseidon table #62
base: main
Conversation
Do you have an intuition on why:

```rust
pub fn all_poseidon_16_indexes(poseidons_16: &[WitnessPoseidon16]) -> [Vec<F>; 3] {
    let ((addr_a, addr_b), addr_c) = rayon::join(
        || {
            rayon::join(
                || {
                    poseidons_16
                        .par_iter()
                        .map(|p| F::from_usize(p.addr_input_a))
                        .collect()
                },
                || {
                    poseidons_16
                        .par_iter()
                        .map(|p| F::from_usize(p.addr_input_b))
                        .collect()
                },
            )
        },
        || {
            poseidons_16
                .par_iter()
                .map(|p| F::from_usize(p.addr_output))
                .collect()
        },
    );
    [addr_a, addr_b, addr_c]
}
```

would be faster than:

```rust
pub fn all_poseidon_16_indexes(poseidons_16: &[WitnessPoseidon16]) -> [Vec<F>; 3] {
    [
        poseidons_16
            .par_iter()
            .map(|p| F::from_usize(p.addr_input_a))
            .collect::<Vec<_>>(),
        poseidons_16
            .par_iter()
            .map(|p| F::from_usize(p.addr_input_b))
            .collect::<Vec<_>>(),
        poseidons_16
            .par_iter()
            .map(|p| F::from_usize(p.addr_output))
            .collect::<Vec<_>>(),
    ]
}
```

I am not a rayon expert, but my basic understanding tells me it should not improve perf?
In the example you point to,
Sometimes, indicating to the compiler that this part:

```rust
poseidons_16
    .par_iter()
    .map(|p| F::from_usize(p.addr_input_a))
    .collect()
```

can be executed in parallel with this part:

```rust
poseidons_16
    .par_iter()
    .map(|p| F::from_usize(p.addr_input_b))
    .collect()
```

which can also be executed in parallel with:

```rust
poseidons_16
    .par_iter()
    .map(|p| F::from_usize(p.addr_output))
    .collect()
```

can give a better repartition of the work between threads, but this comes from my experience experimenting with these things in other repos. I guess this is not always the absolute truth. If in this case it brings nothing to the performance, then it is not worth merging, since it adds a bit more lines/complexity to the code. Roughly, the proposed strategy is the same for the other functions.
Hmm, ok. I will run some benchmarks when I find some time; making this PR wait should not cause conflict issues, I think.
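For a quick comparison, a throwaway timing harness like this is usually enough before reaching for a full benchmark suite (a hypothetical sketch, not from the PR; the two closures would be replaced by the two `all_poseidon_16_indexes` variants over real witness data):

```rust
use std::time::Instant;

// Hypothetical micro-benchmark helper: run a closure once, print its wall-clock
// time, and return its result so both variants can be compared for equality.
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    println!("{label}: {} us", start.elapsed().as_micros());
    out
}

fn main() {
    let data: Vec<usize> = (0..1_000_000).collect();
    // Stand-ins for the fork-join and sequential variants; both must agree.
    let a = timed("fork-join variant", || data.iter().copied().collect::<Vec<_>>());
    let b = timed("sequential variant", || data.iter().copied().collect::<Vec<_>>());
    assert_eq!(a, b);
}
```

A single run is only indicative; for a merge decision it would be worth repeating each variant several times (or using a proper harness such as criterion) so pool warm-up and allocator noise do not dominate.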
Yes, no problem, this is an independent file, so we should be able to play with it without any interaction.
Force-pushed from 6283fa4 to 4b0ae0a.
@TomWambsgans Let me clean this description now that I have a clearer view of where this thing is used.
Using this kind of join parallelization gave us some benefits for some specific scenarios in the past.
For example this sort of trick is used here: https://github.com/WizardOfMenlo/whir/blob/22c675807fc9295fef68a11945713dc3e184e1c1/src/ntt/transpose.rs#L148-L161
The AI seemed to validate this idea by inspecting the prover execution (I just ran Claude Code with this proposed idea to validate it).
Let me know if you see any sort of benefit when compared to your traditional benchmarks.