Add OpenMP Parallel Implementation for Belief Propagation Decoder #78
0xSooki wants to merge 5 commits into quantumgizmos:main
Conversation
|
Nice work. Do you have any intuition as to why there is a slowdown when going from 4 to 8 threads?
|
|
My intuition is that more threads bring more thread-management overhead and use the cache less efficiently. Something similar happened while I was writing my thesis, so in the end I used only 5 threads to get the maximum speed benefit from parallelization. I will look into further speedup improvements.
|
I wonder whether the shared memory is being recopied for each thread at each iteration. If the entire PCM were copied to each thread at every iteration, I could see that causing considerable overhead.
|
I believe class members are shared by default, so their values are not copied. However, I will try benchmarking with the explicit shared keyword.
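A quick way to check the sharing behaviour is to record the buffer address each thread sees inside a parallel region. The sketch below is only an illustration under stated assumptions: `ToyDecoder`, `pcm`, and `distinct_buffers_seen` are hypothetical names, not code from this PR, and the vector merely stands in for the parity-check matrix. If the member were copied per thread, more than one address would show up.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a decoder that owns a large matrix.
struct ToyDecoder {
    std::vector<int> pcm;  // plays the role of the parity-check matrix

    explicit ToyDecoder(std::size_t n) : pcm(n, 1) {}

    // Returns how many distinct buffer addresses the threads observed.
    // Without OpenMP enabled the pragmas are ignored and the body runs once.
    std::size_t distinct_buffers_seen(int n_threads) {
        std::set<const int*> seen;  // declared outside the region, so shared
        #pragma omp parallel num_threads(n_threads)
        {
            // The member is reached through `this`; no per-thread copy is made.
            const int* p = pcm.data();
            #pragma omp critical
            seen.insert(p);
        }
        return seen.size();
    }
};
```

If every thread reports the same address, the member is being accessed in place, so any slowdown at higher thread counts would have to come from something other than copying the matrix.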
|
Another thing to try would be benchmarking on larger LDPC codes. In quantum error correction, we often decode codes over matrices with 10,000+ columns. It's possible that the parallelisation overhead is less of a bottleneck in this regime.
|
Ahh yes, with 15000 columns I was able to achieve the following results. This time, 8 threads did much better.
|
|
This is great to see. I noticed your benchmark is running over a single random syndrome. Some syndromes are more difficult to decode than others, so this could be accounting for the increased speed at |
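One way to reduce the dependence on a single syndrome is to time the decoder over many random syndromes and report aggregate statistics. The following is a minimal sketch, assuming a fixed seed for reproducibility; `toy_decode`, `TimingStats`, and `time_over_random_syndromes` are hypothetical names, and `toy_decode` is only a placeholder for the real decoder call. In practice one might prefer to sample random errors and compute their syndromes from the PCM rather than drawing syndrome bits directly.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct TimingStats {
    double mean_us;    // mean time per decode call, in microseconds
    double max_us;     // slowest single call, in microseconds
    std::size_t runs;  // number of syndromes timed
};

// Placeholder for the real decoder call (assumption): just sums the bits.
inline std::uint64_t toy_decode(const std::vector<std::uint8_t>& syndrome) {
    std::uint64_t weight = 0;
    for (auto b : syndrome) weight += b;
    return weight;
}

// Times `runs` decode calls (runs > 0), each on a fresh random syndrome.
TimingStats time_over_random_syndromes(std::size_t syndrome_len,
                                       std::size_t runs,
                                       double bit_probability,
                                       unsigned seed) {
    std::mt19937 rng(seed);  // fixed seed so every thread count sees the same inputs
    std::bernoulli_distribution bit(bit_probability);
    std::vector<std::uint8_t> syndrome(syndrome_len);
    volatile std::uint64_t sink = 0;  // keeps the call from being optimised away
    double total_us = 0.0;
    double worst_us = 0.0;

    for (std::size_t r = 0; r < runs; ++r) {
        for (auto& b : syndrome) b = bit(rng) ? 1 : 0;
        const auto t0 = std::chrono::steady_clock::now();
        sink = sink + toy_decode(syndrome);
        const auto t1 = std::chrono::steady_clock::now();
        const double us =
            std::chrono::duration<double, std::micro>(t1 - t0).count();
        total_us += us;
        worst_us = std::max(worst_us, us);
    }
    return {total_us / static_cast<double>(runs), worst_us, runs};
}
```

Reusing the same seed for each thread count keeps the comparison fair, since every configuration then decodes the same set of syndromes.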
|
I have added a
|
|
I have tried some improvements, such as using locks, omp atomics, or storing the partial results in a matrix and reducing it afterward, thus eliminating the critical section. Using atomics gave a small speedup at higher thread counts for a run with the same specs as #78 (comment): the time for 8 threads dropped from ~48,000 μs to ~42,000 μs.
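For reference, two of these strategies can be sketched on a generic scatter-add, where several loop iterations may update the same output slot. This is an illustrative sketch only, not the code in this PR: `scatter_add_atomic` and `scatter_add_partials` are hypothetical names, and the kernel is a simplified stand-in for the decoder's update step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Variant 1: protect each update with an atomic instead of a critical section.
std::vector<double> scatter_add_atomic(const std::vector<int>& idx,
                                       const std::vector<double>& val,
                                       std::size_t n_out) {
    std::vector<double> out(n_out, 0.0);
    double* o = out.data();
    const long n = static_cast<long>(idx.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        #pragma omp atomic
        o[idx[i]] += val[i];
    }
    return out;
}

// Variant 2: one row of partial results per thread, reduced serially afterward.
std::vector<double> scatter_add_partials(const std::vector<int>& idx,
                                         const std::vector<double>& val,
                                         std::size_t n_out) {
    int n_threads = 1;
#ifdef _OPENMP
    n_threads = omp_get_max_threads();
#endif
    std::vector<std::vector<double>> partial(
        static_cast<std::size_t>(n_threads), std::vector<double>(n_out, 0.0));
    const long n = static_cast<long>(idx.size());

    #pragma omp parallel num_threads(n_threads)
    {
        int t = 0;
#ifdef _OPENMP
        t = omp_get_thread_num();
#endif
        double* mine = partial[static_cast<std::size_t>(t)].data();
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            mine[idx[i]] += val[i];  // no synchronisation: each row has one writer
        }
    }

    // Reduce the per-thread rows into the final result.
    std::vector<double> out(n_out, 0.0);
    for (const auto& row : partial)
        for (std::size_t j = 0; j < n_out; ++j) out[j] += row[j];
    return out;
}
```

The partial-results variant trades memory (one output-sized row per thread) and a final reduction pass for contention-free updates, so which variant wins likely depends on the output size and on how often threads hit the same slot.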
|
@quantumgizmos, just a gentle reminder that today was mentioned as the last day for reviewing open PRs. If it would be helpful to keep iterating, I’d be more than happy to continue working on it. |
|
If you are interested in working on this beyond UnitaryHack, we can explore ways of further improving the OpenMP implementation. Let me know :)
|
I would be more than happy to keep working on it. What would be your preferred way of communicating?
|
@0xSooki Great. My email address is joschka@roffe.eu |
This PR introduces OpenMP-based parallelization to the Belief Propagation (BP) decoder, significantly improving performance for large LDPC codes. Closes #72
Summary
Testing
Performance
Based on the TEST(BpDecoderParallel, ThreadScaling) benchmark