
Conversation

@valentinewallace (Contributor) commented Nov 17, 2025

We have an overarching goal of (mostly) getting rid of ChannelManager persistence and rebuilding the ChannelManager's state from existing ChannelMonitors, due to issues when the two structs are out-of-sync on restart. The main issue that can arise is channel force closure.

Here we start this process by rebuilding ChannelManager::decode_update_add_htlcs, forward_htlcs, and pending_intercepted_htlcs from the Channels, which will soon be included in the ChannelMonitors as part of #4218.

  • test upgrading from a manager containing pending HTLC forwards that was serialized on <= LDK 0.2, i.e. where Channels will not contain committed update_add_htlcs
  • currently, no tests fail when we force use of the new rebuilt decode_update_add_htlcs map and ignore the legacy maps. This may indicate missing test coverage, since in theory we sometimes need to re-forward these HTLCs so they end up back in the forward_htlcs map for processing
  • only use the old legacy maps if the manager and its channels were last serialized on <= 0.2. Currently this is not guaranteed
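
For a rough mental model of the rebuild described above, here is a minimal sketch. The types are simplified stand-ins rather than LDK's real structs (the real message type is `msgs::UpdateAddHTLC` and the real HTLC state lives on the `Channel`), the function name is made up for illustration, and I'm assuming the `u64` key of `decode_update_add_htlcs` is the inbound channel's SCID. It only shows the idea of re-keying each channel's committed `update_add_htlc` messages to repopulate a `decode_update_add_htlcs`-style map at read time:

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration only; the real LDK types carry far more state.
#[allow(dead_code)]
#[derive(Clone)]
struct UpdateAddHTLC {
	htlc_id: u64,
	amount_msat: u64,
}

#[allow(dead_code)]
enum InboundHTLCState {
	// Mirrors the new variant shape from the diff: the original message is kept
	// alongside the Committed state so it can be replayed on restart.
	Committed { update_add_htlc_opt: Option<UpdateAddHTLC> },
	Other,
}

struct InboundHTLC {
	state: InboundHTLCState,
}

struct Channel {
	inbound_scid: u64,
	pending_inbound_htlcs: Vec<InboundHTLC>,
}

/// Rebuild a `decode_update_add_htlcs`-style map (inbound SCID -> pending update_adds)
/// purely from per-channel state, instead of trusting a separately persisted map.
fn rebuild_decode_update_add_htlcs(channels: &[Channel]) -> HashMap<u64, Vec<UpdateAddHTLC>> {
	let mut map: HashMap<u64, Vec<UpdateAddHTLC>> = HashMap::new();
	for chan in channels {
		let committed_adds: Vec<UpdateAddHTLC> = chan
			.pending_inbound_htlcs
			.iter()
			.filter_map(|htlc| match &htlc.state {
				InboundHTLCState::Committed { update_add_htlc_opt } => update_add_htlc_opt.clone(),
				_ => None,
			})
			.collect();
		if !committed_adds.is_empty() {
			map.entry(chan.inbound_scid).or_default().extend(committed_adds);
		}
	}
	map
}

fn main() {
	let chan = Channel {
		inbound_scid: 42,
		pending_inbound_htlcs: vec![InboundHTLC {
			state: InboundHTLCState::Committed {
				update_add_htlc_opt: Some(UpdateAddHTLC { htlc_id: 0, amount_msat: 1_000 }),
			},
		}],
	};
	let rebuilt = rebuild_decode_update_add_htlcs(&[chan]);
	assert_eq!(rebuilt[&42].len(), 1);
}
```

If the channels were last written on <= LDK 0.2 and so carry no stored messages, a rebuild like this comes back empty and the legacy serialized maps are still needed, which is what the last bullet above is about.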

@ldk-reviews-bot

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 94.81865% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.36%. Comparing base (6d9c676) to head (1e86521).

Files with missing lines               Patch %   Lines
lightning/src/ln/channel.rs            86.66%    5 Missing and 1 partial ⚠️
lightning/src/ln/channelmanager.rs     96.59%    1 Missing and 2 partials ⚠️
lightning/src/ln/reload_tests.rs       98.33%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4227      +/-   ##
==========================================
+ Coverage   89.32%   89.36%   +0.03%     
==========================================
  Files         180      180              
  Lines      138641   138796     +155     
  Branches   138641   138796     +155     
==========================================
+ Hits       123844   124031     +187     
+ Misses      12174    12142      -32     
  Partials     2623     2623              
Flag      Coverage Δ
fuzzing   35.97% <36.36%> (+<0.01%) ⬆️
tests     88.72% <94.81%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.


}
}

for htlcs in claimable_payments.values().map(|pmt| &pmt.htlcs) {

Contributor Author:
TODO comment here

@valentinewallace (Contributor, Author) commented Nov 19, 2025

@joostjager helped me realize this may be way overcomplicated: essentially all tests pass on main when we simply read-and-discard the pending forwards maps. It's a bit suspicious that all tests pass, though, so additional test coverage seems like it could be useful.

Nvm, our test coverage for reload of these maps is just pretty incomplete.

@valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from 0b4eb68 to ce2ccac on November 19, 2025 21:56
@valentinewallace (Contributor, Author):
Updated with new testing and a few tweaks: diff

Will rebase next

@valentinewallace force-pushed the 2025-10-reconstruct-mgr-fwd-htlcs branch from ce2ccac to 1e86521 on November 19, 2025 22:03

@joostjager (Contributor) left a comment:

Did an initial review pass. Even though the change is pretty contained, I still found it difficult to fully follow what's happening.

Overall I think comments are very much on the light side in LDK, and the code areas touched in this PR are no exception in my opinion. Maybe, now that you've invested the time to build understanding, you can liberally sprinkle comments on your changes and nearby code?

/// Implies AwaitingRemoteRevoke.
AwaitingAnnouncedRemoteRevoke(InboundHTLCResolution),
Committed,
Committed {

Contributor:
Now that you touch this, maybe add some comments on the Committed state and the update_add_htlc_opt field.
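
For example, something along these lines. The wording is my own reading of the semantics from this PR and the surrounding discussion, not authoritative LDK documentation, and the stub `msgs` module is only there so the snippet stands alone:

```rust
#![allow(dead_code)] // Nothing here is exercised at runtime; the sketch only has to compile.

// Stub module so the snippet stands alone; the real type is `lightning::ln::msgs::UpdateAddHTLC`.
mod msgs {
	pub struct UpdateAddHTLC;
}

enum InboundHTLCState {
	/// The HTLC is irrevocably committed on both sides' commitment transactions and can
	/// now be acted on: forwarded, claimed, or failed back.
	Committed {
		/// The original `update_add_htlc` message, retained so pending-forward state can
		/// be rebuilt from the `Channel` itself on restart rather than from separately
		/// persisted `ChannelManager` maps. `None` for HTLCs committed before this field
		/// existed, i.e. deserialized from a `Channel` written on <= LDK 0.2.
		update_add_htlc_opt: Option<msgs::UpdateAddHTLC>,
	},
}

fn main() {}
```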

to_forward_infos.push((forward_info, htlc.htlc_id));
htlc.state = InboundHTLCState::Committed;
// TODO: this is currently unreachable, so is it okay? will we lose a forward?
htlc.state = InboundHTLCState::Committed {

Contributor:
I think this can be left out completely because we're already swapping this state into the htlc?

log_trace!(logger, " ...promoting inbound AwaitingAnnouncedRemoteRevoke {} to Committed, attempting to forward", &htlc.payment_hash);
to_forward_infos.push((forward_info, htlc.htlc_id));
htlc.state = InboundHTLCState::Committed;
// TODO: this is currently unreachable, so is it okay? will we lose a forward?

Contributor:
Why is it unreachable?

htlc.state = InboundHTLCState::Committed;
pending_update_adds.push(update_add_htlc.clone());
htlc.state = InboundHTLCState::Committed {
update_add_htlc_opt: Some(update_add_htlc),

Contributor:
Could add a comment here explaining why we store the messages?

payment_hash: PaymentHash(Sha256::hash(&[42; 32]).to_byte_array()),
cltv_expiry: 300000000,
state: InboundHTLCState::Committed,
state: InboundHTLCState::Committed { update_add_htlc_opt: None },

Contributor:
Is it ok to never populate update_add_htlc_opt in the tests in this commit, if that isn't supposed to happen in prod?

.iter()
.filter_map(|htlc| match htlc.state {
InboundHTLCState::Committed { ref update_add_htlc_opt } => {
update_add_htlc_opt.clone()

Contributor:
Can't this be a vec of refs? Or are you using it by value later and want to avoid a clone there?
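
For illustration, the borrowing shape would look roughly like the sketch below (stand-in types, not LDK's; whether it works depends on how the collected messages are consumed afterwards, since the refs keep the HTLC list borrowed):

```rust
#[derive(Clone)]
struct UpdateAddHTLC {
	htlc_id: u64,
}

#[allow(dead_code)]
enum InboundHTLCState {
	Committed { update_add_htlc_opt: Option<UpdateAddHTLC> },
	Other,
}

struct InboundHTLC {
	state: InboundHTLCState,
}

// Borrowing version: no clones, but the returned refs keep `htlcs` borrowed, so this only
// works if the messages are consumed before the HTLC list is mutated or moved.
fn committed_adds_refs(htlcs: &[InboundHTLC]) -> Vec<&UpdateAddHTLC> {
	htlcs
		.iter()
		.filter_map(|htlc| match &htlc.state {
			InboundHTLCState::Committed { update_add_htlc_opt } => update_add_htlc_opt.as_ref(),
			_ => None,
		})
		.collect()
}

fn main() {
	let htlcs = vec![InboundHTLC {
		state: InboundHTLCState::Committed {
			update_add_htlc_opt: Some(UpdateAddHTLC { htlc_id: 7 }),
		},
	}];
	assert_eq!(committed_adds_refs(&htlcs)[0].htlc_id, 7);
}
```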

HashMap<(PublicKey, ChannelId), Vec<ChannelMonitorUpdate>>,
> = None;
let mut decode_update_add_htlcs: Option<HashMap<u64, Vec<msgs::UpdateAddHTLC>>> = None;
let mut decode_update_add_htlcs_legacy: Option<HashMap<u64, Vec<msgs::UpdateAddHTLC>>> =

Contributor:
Can the rename be extracted into its own commit?

is_channel_closed = match peer_state.channel_by_id.get(channel_id) {
Some(chan) => {
if let Some(funded_chan) = chan.as_funded() {
let inbound_committed_update_adds =

Contributor:
I understand why you put this code here, but it feels like a heavy side effect of determining whether the channel is closed. Also, the comment above this loop mentions actions required for closed channels.

// `ChannelManager` persistence.
let mut pending_intercepted_htlcs_legacy: Option<HashMap<InterceptId, PendingAddHTLCInfo>> =
Some(new_hash_map());
let mut decode_update_add_htlcs_legacy: Option<HashMap<u64, Vec<msgs::UpdateAddHTLC>>> =

Contributor:
Or maybe all the legacy renames can go together in one preparatory commit.

// the legacy serialized maps instead.
// TODO: if we read an upgraded channel but there just happened to be no committed update_adds
// present, we'll use the old maps here. Maybe that's fine but we might want to add a flag in
// the `Channel` that indicates it is upgraded and will serialize committed update_adds.

Contributor:
Seems safer to be explicit

Contributor Author:
I think being explicit may not work either because IIUC we need to handle the case where some of the deserialized Channels have the new update_add_htlc_opt field set and some don't. I think this can happen if we restart with a channel_a with Committed { update_add: None } HTLCs, then add some Committed { update_add: Some(..) } HTLCs to channel_b, then shut down before processing the Committed { update_add: None } HTLCs on channel_a.

Contributor:
I see what you mean. But maybe it can be explicit on the channel level?
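
A rough sketch of what channel-level explicitness could look like. The `holds_committed_update_adds` field and the helper function are hypothetical, not existing LDK APIs, and the types are simplified stand-ins:

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct UpdateAddHTLC {
	htlc_id: u64,
}

struct Channel {
	inbound_scid: u64,
	/// Hypothetical flag: set (and serialized) once this channel's `Committed` HTLCs carry
	/// their `update_add_htlc` messages, so "no committed update_adds present" can be
	/// distinguished from "written by a version that never stored them".
	holds_committed_update_adds: bool,
	committed_update_adds: Vec<UpdateAddHTLC>,
}

fn rebuild_or_fall_back(
	channels: &[Channel], legacy_map: &HashMap<u64, Vec<UpdateAddHTLC>>,
) -> HashMap<u64, Vec<UpdateAddHTLC>> {
	let mut map = HashMap::new();
	for chan in channels {
		let adds = if chan.holds_committed_update_adds {
			// Trust the channel itself, even if its list happens to be empty.
			chan.committed_update_adds.clone()
		} else {
			// Channel was last written by an older version: use the legacy
			// ChannelManager map for this channel only.
			legacy_map.get(&chan.inbound_scid).cloned().unwrap_or_default()
		};
		if !adds.is_empty() {
			map.insert(chan.inbound_scid, adds);
		}
	}
	map
}

fn main() {
	let legacy = HashMap::from([(1u64, vec![UpdateAddHTLC { htlc_id: 3 }])]);
	let chans = [
		Channel { inbound_scid: 1, holds_committed_update_adds: false, committed_update_adds: vec![] },
		Channel {
			inbound_scid: 2,
			holds_committed_update_adds: true,
			committed_update_adds: vec![UpdateAddHTLC { htlc_id: 9 }],
		},
	];
	let rebuilt = rebuild_or_fall_back(&chans, &legacy);
	assert_eq!(rebuilt[&1][0].htlc_id, 3);
	assert_eq!(rebuilt[&2][0].htlc_id, 9);
}
```

This would let a mixed deployment, like the channel_a/channel_b scenario described above, fall back to the legacy maps only for the channels that actually need it.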

args_b_c.send_announcement_sigs = (true, true);
reconnect_nodes(args_b_c);

// Forward the HTLC and ensure we can claim it post-reload.

Contributor:
What would happen again without the changes in this PR? I assume the htlc wouldn't be forwarded here, but would it be failed back instead?

Contributor Author:
You mean if we did the 3-line fix discussed offline? I don't think so, because the manager will only fail HTLCs that it knows about. I think we just wouldn't handle the HTLC and the channel would FC.

Contributor:
Yes indeed, if we just discard the forward htlcs. I thought there was still some inbound htlc timer somewhere, but I guess not then?
