[RFC] Mysensors HAL retry hack #1477
Signed-off-by: Andrew Andrianov <andrew@ncrmnt.org>
| Hi @nekromant, great topic! | 
| Should there be some random (and maybe exponential) backoff/delay? If the first 15 retransmissions of a message are lost due to collisions, will transmitting N*15 more times in a continuous stream make the situation better or worse? | 
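A minimal sketch of what such a random exponential backoff could look like. `backoffDelayMs`, `baseMs` and `capMs` are hypothetical names and values chosen for illustration, not part of MySensors; the caller supplies the random number and decides how to actually wait.

```cpp
#include <cstdint>

// Hypothetical helper, not part of MySensors: delay in milliseconds to wait
// before software retry number `attempt` (0-based).
// The window doubles with each attempt up to `capMs`; `rnd` is any random
// number supplied by the caller (e.g. from random() on Arduino).
constexpr uint32_t backoffDelayMs(uint8_t attempt, uint32_t rnd,
                                  uint32_t baseMs = 5, uint32_t capMs = 320)
{
    uint32_t window = baseMs;
    for (uint8_t i = 0; i < attempt && window < capMs; ++i) {
        window *= 2;                 // exponential growth
    }
    if (window > capMs) {
        window = capMs;              // never wait longer than the cap
    }
    // Pick a point in [window/2, window] so that two colliding nodes are
    // unlikely to choose the same delay again.
    const uint32_t half = window / 2;
    return half + rnd % (window - half + 1);
}
```

With the assumed defaults, the first retry waits 2 to 5 ms and the window stops growing at 160 to 320 ms.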
| @nekromant and @mfalkvidd Instead, I recommend adding two hooks to the transport HAL, one for send and one for receive. The user can then handle telegram repetition on the send and receive side in the sketch code, and no existing code will break if the hooks are simply not used. How to handle the repetition in user code would be a nice topic for the forum thread. A discussion of optimal values for MY_RF24_SET_ARD and MY_RF24_SET_ARC, which may depend on the MySensors network setup, would also be helpful. | 
| 
 @mfalkvidd I'm not sure if it's a good idea to call wait() in transportHAL, so I didn't implement the backoff. But I may be totally wrong, since I'm pretty new to this codebase. 
 I would speculate that the failure of all 15 retries occurs when the uplink node buffers a burst of packets and slowly chews through them, sending them further and not listening to whatever is sent to it, since the radio is only half duplex. If it's an ATmega328P running at some 12 MHz working as a repeater, this may take a while. 
 Before learning about mysensors and moving over from my homebrew RPC-over-air, I had pretty much stuck with ARD=ARC=15. The rationale is that if the packet didn't go through, it's either a collision (unlikely) or the other side is not listening. Switching modes takes a long while, so there's no need to retry fast. I'm still playing with these settings and collecting stats, but I was surprised to learn that ARD is not 15 in mysensors. 
 There's a way to do the retransfer (at least for nRF24) that wouldn't lead to telegram duplication. See: https://github.com/nekromant/antares/blob/master/src/lib/wireless/rf24.c#L924 Summing up, I see 2 cleaner ways of implementing retransfers: 
 | 
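For reference, the ARD/ARC arithmetic discussed above can be sketched as follows. On the nRF24L01+, both values live in the SETUP_RETR register (ARD in the upper nibble, ARC in the lower one), and the retransmit delay is (ARD + 1) × 250 µs. The helper names are hypothetical, and the retry window is only a rough estimate because it ignores packet airtime.

```cpp
#include <cstdint>

// nRF24L01+ SETUP_RETR register layout: ARD in bits 7:4, ARC in bits 3:0.
constexpr uint8_t setupRetr(uint8_t ard, uint8_t arc)
{
    return static_cast<uint8_t>(((ard & 0x0F) << 4) | (arc & 0x0F));
}

// Auto-retransmit delay in microseconds: (ARD + 1) * 250 us.
constexpr uint32_t ardMicros(uint8_t ard)
{
    return (static_cast<uint32_t>(ard & 0x0F) + 1) * 250;
}

// Rough time the radio keeps retrying before it gives up: ARC retransmits,
// each preceded by one ARD wait. Packet airtime is ignored (assumption).
constexpr uint32_t retryWindowMicros(uint8_t ard, uint8_t arc)
{
    return ardMicros(ard) * (arc & 0x0F);
}
```

With ARD=ARC=15 this gives a register value of 0xFF, a 4000 µs delay between attempts, and roughly 60 ms of hardware retries per send.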
| @nekromant I would be happy if we could discuss this with other MySensors users in the MySensors forum thread I started yesterday. Maybe some of the MySensors experts will join there and comment. Otherwise I'm afraid this PR will become one more in the list of unfinished and unmerged ones. I would regret that very much, because occasionally lost telegrams in MySensors are a serious problem from my point of view. A problem that should be solved. And I am convinced that it can be solved with your approach. | 
| @virtual-maker Okay, moving the discussion over to the mysensors forum thread. (Hopefully someone will approve my first post there soon) | 
This MR implements my simple attempt to get more reliable packet delivery for actuators and exposes a few more parameters that can be tuned for optimal operation. Tested with nRF24L01+ modules.
A side effect of this is packet duplication in case the 'ack' is lost, since we don't 'reuse' payloads already stored in the radio. Reusing the payload already stored in the radio works as well (tested by me with nRF24L01 on a different project), but that requires changes to the transport API and to every radio driver out there.
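The duplication side effect can be illustrated with a toy model. This is not the code from this PR; `LinkModel`, `Outcome` and `sendWithRetry` are hypothetical names, and the link is reduced to two bit masks that say, per attempt, whether the payload and the ACK got through.

```cpp
#include <cstdint>

// Toy model of one link: bit i of each mask describes attempt i.
struct LinkModel {
    uint32_t payloadArrives;  // bit set: the receiver got the payload
    uint32_t ackArrives;      // bit set: the sender got the ACK back
};

struct Outcome {
    uint8_t attempts;   // sends performed by the sender
    uint8_t delivered;  // copies seen by the receiver
    bool    acked;      // sender believes the message went through
};

// Software retry: resend the whole message until an ACK is seen or
// maxAttempts (at most 32 in this model) is reached. Each resend is a
// fresh packet from the receiver's point of view.
constexpr Outcome sendWithRetry(LinkModel link, uint8_t maxAttempts)
{
    Outcome out{0, 0, false};
    for (uint8_t i = 0; i < maxAttempts && !out.acked; ++i) {
        ++out.attempts;
        const bool arrived = (link.payloadArrives >> i) & 1u;
        if (arrived) {
            ++out.delivered;
            // An ACK can only come back for a payload that arrived.
            out.acked = (link.ackArrives >> i) & 1u;
        }
    }
    return out;
}
```

If the first payload arrives but its ACK is lost, the second send succeeds and the receiver ends up with two copies of the same message, so actuator code on the receiving side should tolerate repeated commands.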
Test setup: A moderate (~20 nodes) mysensors network, with some nodes 3-4 hops from the gateway, spread across several small buildings (e.g. heaters in the chicken coop). Heavy snow and rain caused significant link quality degradation that mostly affected nodes with signing enabled, which ran into problems requesting nonces.
When running at 250 kbps for better range, setting RF24_SET_ARD to the maximum improves connection quality. I'm still testing different values of this option, but it seems to help when there are repeaters that get choked with traffic (e.g. we give them time to send the messages upwards, and