Conversation
|
BTW I suspect a lot of the slowness is because of how the rade_dsp functions work (i.e. they're not actually inlined unless optimization is turned on, and as a result we have overhead trying to return a non-trivial object back to the caller). |
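To illustrate the kind of overhead being suspected here, below is a minimal sketch in C. All names (`demo_vec`, `demo_scale_by_value`, `demo_scale_into`, `demo_styles_agree`) are hypothetical and are not the actual rade_dsp API. It assumes a helper that returns a sizeable struct by value. In an unoptimized build, compilers generally do not inline such a helper, so the result is usually built in one place and then copied into the caller's variable. Writing into caller-provided storage avoids that copy even without optimization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names for illustration only (not the rade_dsp API). */
#define DEMO_N 64

typedef struct {
    float re[DEMO_N];
    float im[DEMO_N];
} demo_vec;

/* Style A: return a non-trivial object by value. In an unoptimized build
 * this is typically a real call, followed by a copy of the whole struct
 * into the caller's variable. */
static inline demo_vec demo_scale_by_value(const demo_vec *in, float gain)
{
    demo_vec out;
    for (size_t i = 0; i < DEMO_N; i++) {
        out.re[i] = in->re[i] * gain;
        out.im[i] = in->im[i] * gain;
    }
    return out;
}

/* Style B: write into storage owned by the caller. No struct copy is
 * needed, whether or not the call gets inlined. */
static inline void demo_scale_into(demo_vec *out, const demo_vec *in, float gain)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < DEMO_N; i++) {
        out->re[i] = in->re[i] * gain;
        out->im[i] = in->im[i] * gain;
    }
}

/* Returns 1 if both styles produce identical results, 0 otherwise. */
static int demo_styles_agree(void)
{
    demo_vec in, a, b;
    for (size_t i = 0; i < DEMO_N; i++) {
        in.re[i] = (float)i;
        in.im[i] = 1.0f;
    }
    a = demo_scale_by_value(&in, 0.5f);
    demo_scale_into(&b, &in, 0.5f);
    for (size_t i = 0; i < DEMO_N; i++) {
        if (a.re[i] != b.re[i] || a.im[i] != b.im[i])
            return 0;
    }
    return 1;
}
```

Both styles compute the same values, so any difference would show up only in a profile. With optimization turned on, the compiler may well inline Style A and remove the copy, in which case the two should perform about the same.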
|
Should probably ping @drowe67 and @peterbmarks too. |
|
@peterbmarks - pls hold off on merge on this one. @tmiw - I think we are slipping into "RADE V1 maintenance mode" which I think we agreed at PLT was not our current strategy? I'm not convinced these optimisations are necessary. Before we go down this path, does simply building with Release get us the performance we need? If there is a justification for optimisation, then happy to discuss it. If not - then we all have a lot of other high priority work to do and should focus there. |
|
FWIW, here's a comparison between main and this PR. I'd say ~20% improvement in RX but it's already pretty fast without this change, so we can defer review until later. |
|
Impressive! If it passes the tests I think we should merge it. But I agree optimisation should mostly come later. Peter |
|
Thanks @tmiw. Couple of thoughts:
#2 is the critical question I think. |
|
@tmiw - as the work in this PR has not been signed off pls ensure the main branch of this repo is used for any distribution of the |
|
I have a user who is (barely) able to run on a Pi 3 with 1GB of RAM. Leighton wrote just now: Hi Peter, I just finished compiling the latest code on an old stock standard Raspberry Pi 3 with 1G memory. I started the build with the latest clean headless image (trixie). I wasn't able to get it running with ALSA as I had wanted (thinking that this might be lighter?) so I had to install pulseaudio and compile with the pulse audio libraries. The best way I can describe it is ALSA resulted in broken audio packets being transmitted - I didn't test on receive before moving to pulse. Running in receive, without any sync, top shows the program CPU usage at 105%. In transmit mode, the program CPU usage drops to around 60%. Even with the CPU over 100% the PI3 was still responding promptly between receive and transmit. I just did a quick test on-air with Joe. With RADE sync, the CPU was still up around 105%. There was some audio underrun with the occasional "pop" throughout reception and the decoder spent a few seconds catching up on buffered receive audio. Even with these limitations, I was able to clearly understand Joe (SNR was reporting 22db my end). Anyway, I just thought that it would be worth letting you know as this seems promising for running on anything with a little more processing power (or even more functional on a Pi3 with some optimisation?). When I get a chance, I will see if there is anything that I can do to squeeze some more out of the Pi. Regards, |
|
That's an interesting data point Peter. Jean-Marc has told me the FARGAN Vocoder (which dominates theoretical CPU by a factor of 10) should run on a Pi 3 as a minimum. If Leighton wishes to see more optimisation work done by the team pls encourage him to submit a feature request form. |
|
Couple of things..
Peter |
Yes, as agreed at our last PLT I will be signing off on this code before release. |
|
Oops, sorry, pressed the wrong button 😃 |
FR created: drowe67/freedv-gui#1254
The current recommended audio engine these days is pipewire, with Pulse being the next best if pipewire isn't possible for whatever reason.
The changes do remove some recursion that was added during initial debugging, which might make it easier for the compiler to optimize. |
Basic optimizations in DSP and acquisition code. Seems to reduce the percentage used by `rade_acq_check_pilots` and `rade_acq_detect_pilots` in `perf` by around 25% or so. Tested via:
and then viewing the generated report via `perf report` (ensuring we zoom into librade.so).