-
Notifications
You must be signed in to change notification settings - Fork 427
Rework auto-retry send errors #2014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework auto-retry send errors #2014
Conversation
|
@TheBlueMatt I'm wondering if generating |
22698fc to
8f1a5db
Compare
Codecov ReportBase: 87.20% // Head: 87.89% // Increases project coverage by
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more Additional details and impacted files@@ Coverage Diff @@
## main #2014 +/- ##
==========================================
+ Coverage 87.20% 87.89% +0.68%
==========================================
Files 101 101
Lines 44175 49323 +5148
Branches 44175 49323 +5148
==========================================
+ Hits 38525 43351 +4826
- Misses 5650 5972 +322
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
8f1a5db to
c331f58
Compare
9946121 to
6513398
Compare
lightning/src/util/events.rs
Outdated
| (7, short_channel_id, option), | ||
| (9, retry, option), | ||
| (11, payment_id, option), | ||
| (13, failure_opt, ignorable), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the ignorable field imply that if its missing we'll return None for the full event if its missing? Also a bug for the network_update field, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expanded macro code looks good to me. I think we'll only return None for the full event if the MaybeReadable::read returns Ok(None), which only happens if the TLV type for the field is present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, hmm, I was thinking of the ignorable case in _init_tlv_based_struct_field. This is confusing - we handle ignorable in the higher-level macros as "implement MaybeReadable and return Ok(None)" but in the lower-level macros as just a regular ol' option. Can we make it consistent - use the new ignorable_option as a regular ol' option but make the ignorable be a required field in the lower-level macros as well, returning Ok(None) in _check_missing_tlv (which implies we're in a MaybeReadable, and maybe renaming the ignorable to ignorable_no_read or whatever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe renaming the ignorable to ignorable_no_read or whatever?
That all sounds good, though I'd also be in favor of renaming ignorable_* to upgradable_*. Because I'd like to name the above variant ignorable_required, but that sounds like an oxymoron.
lightning/src/util/events.rs
Outdated
| pub enum Failure { | ||
| /// We failed to initially send the payment and no HTLC was committed to. Contains a relevant | ||
| /// error message. | ||
| InitialSend(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in ~every case we have exactly one APIError to put here, why not that instead of a String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't implement writeable at the moment, though that wouldn't be too hard to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, let's just do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I think we'll need to change the APIError::InvalidRoute to contain a String. I don't think we can implement Readable for a &'static str because it references data owned by the read function.
Given where other things are it seems we have time? Its not exactly critical, but if I'm tracking failures/successes I'd like to see a failure for every |
Yeah agreed, that comment was out of date sorry |
Useful for generating Payment(Path)Failed events in this method
Useful for generating Payment(Path)Failed events in this method
1f272c0 to
8492847
Compare
|
Tacked on a small commit to fix #1969. This PR ended up taking on some unrelated changes, so I'd be happy to split it up if reviewers prefer. |
TheBlueMatt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, let's definitely try to split this up, maybe everything after Fix outdated PendingOutboundPayment::Abandoned docs can go into separate (probably two?) PRs?
| let mut payment_hash = PaymentHash([0; 32]); | ||
| let mut payment_failed_permanently = false; | ||
| let mut network_update = None; | ||
| let mut all_paths_failed = Some(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the backwards compat approach when this was added was that we we didnt support MPP at all so we could assume this was always true. For forwards compat, though, this doesn't make sense - if someone downgrades to 113 from 114 and looks at this field it will always report that all HTLCs have failed, which is obviously nonsense. Its been long enough since we added the abandon flow that we can break forwards compat here, I think, but its a bit safer to write Some(false) rather than letting old clients assume true by writing None. We also need to add a draft release note here that says that the field has been removed and old readers will always see false, and users wishing to support downgrade who still rely on the field should instead first migrate to the abandon-based approach and look at PaymentFailed events and then migrate to 114.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address this in #2043
Edit: addressed
107e6e1 to
2b14430
Compare
|
Split up the PR, see #2043 |
2b14430 to
ed1ef50
Compare
lightning/src/ln/outbound_payment.rs
Outdated
| }, | ||
| PaymentSendFailure::PartialFailure { failed_paths_retry: Some(retry), results, .. } => { | ||
| Self::push_payment_path_failed_evs(payment_id, payment_hash, route.paths, results.into_iter(), pending_events); | ||
| Self::push_payment_path_failed_evs(payment_id, payment_hash, None, route.paths, results.into_iter(), pending_events); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why None here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could push the failed scids here, but they won't be used because we just use the PartialFailure::failed_paths_retry for resend, which already contains them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But shouldn't the event that we surface also reflect this? Why would we exclude the scid here but include it for the AllFailedResendSafe case?
More generally, I guess it's hard to follow where previously_failed_channels is populated. In this function, it is a side effect of calling push_payment_path_failed_evs, but only for AllFailedResendSafe. Then when looking at fail_htlc for a non-immediate failure, it looks to be added twice. Once when calling insert_previously_failed_scid and another time when determining the failure was not a probe. Is that expected or am I misreading that? Maybe the second insert is necessary for the backwards compatibility case? Just thinking if there s there any way we could have it added in a single place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when looking at fail_htlc for a non-immediate failure, it looks to be added twice. Once when calling insert_previously_failed_scid and another time when determining the failure was not a probe.
So the second add will go away when we get rid of PaymentPathFailed::retry, which as you pointed out should be removed.
But shouldn't the event that we surface also reflect this? Why would we exclude the scid here but include it for the AllFailedResendSafe case?
We never include failed scids in the event surfaced in handle_pay_route_err, PaymentPathFailed::retry is always None, if I understand you correctly. The scids are being pushed into the route params for immediate retry purposes only.
On a related note, on second glance my statement that we push failed scids into failed_paths_retry on PartialFailure was not correct. I think I thought that because the issue says it: #1969 but it isn't in the code. Fixing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, there was also a rebase error, push_payment_path_failed_evs should be renamed to push_path_failed_evs_and_scids. Will also fix this
Edit: fixed this and above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never include failed scids in the event surfaced in
handle_pay_route_err,PaymentPathFailed::retryis alwaysNone, if I understand you correctly. The scids are being pushed into the route params for immediate retry purposes only.
Ah, right, I missed that. Was thinking it was added from the parameter.
| /// If a payment fails to send with [`ChannelManager::send_payment`], it can be in one of several | ||
| /// states. This enum is returned as the Err() type describing which state the payment is in, see | ||
| /// the description of individual enum states for more. | ||
| /// | ||
| /// [`ChannelManager::send_payment`]: crate::ln::channelmanager::ChannelManager::send_payment | ||
| #[derive(Clone, Debug)] | ||
| pub enum PaymentSendFailure { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we plan to deprecate ChannelManager::send_payment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we've fully decided since it doesn't hurt, but it would definitely clean up this module a bit and the manual route case is already supported by the Router trait anyway.
ed1ef50 to
7698078
Compare
lightning/src/ln/outbound_payment.rs
Outdated
| }, | ||
| PaymentSendFailure::PartialFailure { failed_paths_retry: Some(retry), results, .. } => { | ||
| Self::push_payment_path_failed_evs(payment_id, payment_hash, route.paths, results.into_iter(), pending_events); | ||
| Self::push_payment_path_failed_evs(payment_id, payment_hash, None, route.paths, results.into_iter(), pending_events); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never include failed scids in the event surfaced in
handle_pay_route_err,PaymentPathFailed::retryis alwaysNone, if I understand you correctly. The scids are being pushed into the route params for immediate retry purposes only.
Ah, right, I missed that. Was thinking it was added from the parameter.
fea92d5 to
d086918
Compare
|
Feel free to squash, IMO. |
Same! LGTM |
InvalidRoute is reserved for malformed routes, not routes where a channel or its peer is unavailable
d086918 to
dc05c6d
Compare
|
Squashed! |
jkczyz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Largely looks good minus a couple things that I noticed.
| &self, payment_id: PaymentId, payment_hash: PaymentHash, payment_secret: &Option<PaymentSecret>, | ||
| keysend_preimage: Option<PaymentPreimage>, retry_strategy: Retry, route_params: RouteParameters, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameters are a little inconsistent here. Should we pass payment_secret by value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's preexisting and I think it would be better to fix it all in one go, since ChannelManager::send_payment and friends also need this fix
dc05c6d to
6b4ce57
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed to the best of my ability with limited understanding of ldk/rust.
Nice refinement of payment states for retries!
| return Err(PaymentSendFailure::ParameterError(APIError::APIMisuseError { | ||
| err: format!("Invoice expired for payment id {}", log_bytes!(payment_id.0)), | ||
| })) | ||
| return Err(RetryableSendFailure::PaymentExpired) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this check means that the invoice is expired, does it make more sense to rename PaymentExpired -> InvoiceExpired? Assuming that there's also a sender-set time limit on how long we attempt a payment for, I'd think that time running out is a payment expiry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PaymentExpired is definitely confusing since we also have Retry::Timeout which is a different thing. Since the PaymentParameters::expiry_time technically doesn't need to be from an invoice (it could even be set for spontaneous payments), I ended up clarifying the docs on this error variant and on Retry::Timeout
| Some(&first_hops.iter().collect::<Vec<_>>()), &inflight_htlcs() | ||
| ).map_err(|_| RetryableSendFailure::RouteNotFound)?; | ||
|
|
||
| let onion_session_privs = self.add_new_pending_payment(payment_hash, *payment_secret, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If add_new_pending_payment errors are always mapped to DuplicatePayment, can it be updated to return a RetryableSendFailure instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it isn't always mapped to that, but yeah that would be nicer.
Prior to this, we returned PaymentSendFailure from auto retry send payment methods. This implied that we might return a PartialFailure from them, which has never been the case. So it makes sense to rework the errors to be a better fit for the methods. We're taking error handling in a totally different direction now to make it more asynchronous, see send_payment_internal for more information.
Since it's only used one place now
Previously, we could have tried the same failed channels over and over until retries are exhausted.
a6e9123
6b4ce57 to
a6e9123
Compare
TheBlueMatt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix in a followup.
| debug_assert_eq!(paths.len(), path_results.len()); | ||
| for (path, path_res) in paths.into_iter().zip(path_results) { | ||
| if let Err(e) = path_res { | ||
| let failed_scid = if let APIError::InvalidRoute { .. } = e { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to not push a failure event if the error is APIError::MonitorUpdateInProgress.
Partially addresses #1932. Fixes #1969.
Previously we would return a
PaymentSendFailurefrom auto-retry send_payment methods. This doesn't make sense because they will never return thePartialFailurevariant, so we add a new error enum for them here.We now also generate
PaymentPathFailedevents for paths that fail on the first hop before they can commit to the htlc (eg if we tried to send more than we have in the channel).Based on #2008