Optimization: Parallelize GXS message deserialization using OpenMP (4x speedup) #246
OpenMP is a great tool for parallelizing (I use it often), but you need to make sure that
Note that in this PR only rsgenexchange is changed. So here is what Antigravity says: "I have reviewed the OpenMP parallelized code in Safety Analysis: No static variables involved below: However, please ensure that all services using
remove debug messages
Code by Antigravity
This PR requires RetroShare pr/3136
Description
This PR significantly optimizes the loading performance of GXS services (Channels, Forums, Boards) by parallelizing the message deserialization process.
Profiling identified RsGenExchange::getMsgData as a major bottleneck during the loading phase, where deserialization was performed sequentially on a single thread. This PR introduces OpenMP to parallelize this workload across available CPU cores.
Changes
Enabled -fopenmp compiler and linker flags globally for Linux and Windows (MSYS2) builds. This ensures consistent OpenMP support across libretroshare and the GUI executable.
libretroshare: Refactored the main loop in RsGenExchange::getMsgData to use #pragma omp parallel for. This allows concurrent deserialization of RsGxsMsgItem objects.
Performance Results
Benchmarks were performed on an Intel Xeon E3-1230 v6 (4 cores / 8 threads) running Ubuntu 24.04, and on an Intel 4790K (4 cores / 8 threads) running Windows 10.
Example: Deserialization time on the Xeon:
| Dataset Size | Serial (Before) | Parallel (After) | Speedup |
| --- | --- | --- | --- |
| Large (~6000 items) | ~1900 ms | ~440 ms | 4.3x |
| Medium (~3000 items) | ~265 ms | ~65 ms | 4.0x |
| Small (~500 items) | ~388 ms | ~80 ms | 4.8x |
Impact
Notes
Moving std::sort operations from the UI thread to a worker thread was tested but yielded negligible gains (~50 ms) compared to Qt's rendering cost. Therefore, UI-specific changes were reverted to maintain code simplicity.
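For readers unfamiliar with the pattern, the loop change described under Changes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: RawMsg, MsgItem, deserialize and deserializeAll are hypothetical stand-ins (the real code works on RsGxsMsgItem inside RsGenExchange::getMsgData), and the sketch assumes the deserializer touches no shared or static state. Build with -fopenmp to run the loop in parallel; without that flag the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a serialized message blob read from storage.
struct RawMsg {
    std::vector<std::uint8_t> bytes;
};

// Hypothetical stand-in for a deserialized item (not the real RsGxsMsgItem).
struct MsgItem {
    std::string body;
};

// Hypothetical deserializer. It depends only on its argument, so concurrent
// calls on different inputs do not interfere with each other.
std::unique_ptr<MsgItem> deserialize(const RawMsg& raw) {
    if (raw.bytes.empty()) {
        return nullptr;  // treat an empty blob as a deserialization failure
    }
    auto item = std::make_unique<MsgItem>();
    item->body.assign(raw.bytes.begin(), raw.bytes.end());
    return item;
}

// Deserialize every blob, one output slot per input.
std::vector<std::unique_ptr<MsgItem>> deserializeAll(const std::vector<RawMsg>& raws) {
    // Pre-size the output so each iteration writes only to its own element;
    // calling push_back from several threads would be a data race.
    std::vector<std::unique_ptr<MsgItem>> out(raws.size());

    // A signed loop index keeps the loop acceptable to older OpenMP versions.
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(raws.size());

    // Iterations are independent, so they can be split across threads.
    // schedule(dynamic) is an assumption that item sizes vary a lot.
    #pragma omp parallel for schedule(dynamic)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        out[i] = deserialize(raws[i]);
    }
    return out;
}
```

The output order matches the input order because each iteration writes to a fixed index, so any later step that relies on ordering is unaffected. Failed items are left as null pointers and can be filtered out serially after the loop.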