Persist external funding state across restarts#1252
Persist external funding state across restarts#1252joii2020 wants to merge 6 commits intonervosnetwork:developfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses restart-recovery gaps in the external funding channel-open flow by persisting enough state to allow submit_signed_funding_tx and the subsequent handshake to resume after a node restart (notably important for fiber-wasm redirect-based signing flows).
Changes:
- Add a persisted
ExternalFundingRecoveryState(new store key prefix + store API) and hydrate external-funding runtime state on reload/reestablish. - Update channel/network logic to persist/move channel state appropriately and to restore/reestablish channels when
submit_signed_funding_txhappens after restart. - Add/extend store + restart-focused unit tests and an e2e restart regression script/docs.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/bruno/e2e/external-funding-open/run-restart-test.sh | Adds a restart regression script that restarts node1 between signing and submit_signed_funding_tx. |
| tests/bruno/e2e/external-funding-open/README.md | Documents the restart regression script and how to run it. |
| crates/fiber-types/src/schema.rs | Adds a new store key prefix for external funding recovery state. |
| crates/fiber-types/src/channel.rs | Introduces ExternalFundingRecoveryState type for persisted recovery data. |
| crates/fiber-lib/src/store/tests/store.rs | Adds store-level persistence tests for external funding recovery state. |
| crates/fiber-lib/src/store/store_impl/mod.rs | Implements persistence for ExternalFundingRecoveryState and adds move_channel_actor_state. |
| crates/fiber-lib/src/store/.schema.json | Updates schema hash for the new persisted type. |
| crates/fiber-lib/src/fiber/tests/settle_tlc_set_command_tests.rs | Updates mock store to satisfy the extended ChannelActorStateStore trait. |
| crates/fiber-lib/src/fiber/tests/channel.rs | Adds restart-focused tests for external funding submit/handshake/timeout/cleanup behavior. |
| crates/fiber-lib/src/fiber/network.rs | Allows submit_signed_funding_tx to restore/reestablish missing channel state after restart. |
| crates/fiber-lib/src/fiber/channel.rs | Persists/hydrates external funding recovery state and clears it on terminal paths; moves acceptor state under final channel id for restart routing. |
c4f0980 to
28e9d89
Compare
|
Instead of introducing a new store prefix and ExternalFundingRecoveryState with separate get/insert/delete methods, we can add a single Option field to the existing ChannelActorData: #[serde_as]
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct ExternalFundingPersistState {
#[serde_as(as = "EntityHex")]
pub funding_lock_script: Script,
#[serde_as(as = "Vec<EntityHex>")]
pub funding_lock_script_cell_deps: Vec<packed::CellDep>,
#[serde_as(as = "EntityHex")]
pub unsigned_funding_tx: Transaction,
pub started_at_ms: u64,
pub signed_submitted: bool,
}Append to ChannelActorData: #[serde(default)]
pub external_funding: Option<ExternalFundingPersistState>,When an external funding flow begins, populate external_funding = Some(...) and persist via the normal insert_channel_actor_state path. On deserialization (restart/reload), rebuild ephemeral_config.external_funding from the persisted core.external_funding field. When the channel reaches ChannelReady or Closed, set it back to None. This eliminates: No new store key prefix or schema change, no new ExternalFundingRecoveryState type with separate store operations, no new trait methods on ChannelActorStateStore (no mock/wasm changes needed). And the should_persist_channel_state special case can be removed, channel state is always persisted normally. Migration is simple: append 1 byte (0u8, the bincode encoding of Option::None) to existing serialized ChannelActorState entries. |
8b0a540 to
27f8785
Compare
f14f625 to
e206d16
Compare
Store pending external funding state separately from the channel runtime so submit_signed_funding_tx can recover after node restart without any RPC changes. Hydrate the runtime from store when reloading or reestablishing a channel, clear the persisted state on terminal paths, and teach the network actor to restore a missing channel before forwarding SubmitSignedFundingTx. Add store coverage and restart-focused external funding tests, and update the store schema for the new persisted record.
e206d16 to
9a9639a
Compare
|
|
||
| ### Restart regression script | ||
|
|
||
| [`run-restart-test.sh`](tests/bruno/e2e/external-funding-open/run-restart-test.sh) is a restart regression test for the external funding flow. It specifically verifies that after node1 has already: |
There was a problem hiding this comment.
The markdown link to run-restart-test.sh uses a path that is already relative to this README’s directory, so it will resolve incorrectly on GitHub (it effectively duplicates tests/bruno/e2e/external-funding-open/ in the URL). Update it to a path relative to this README (e.g., ./run-restart-test.sh).
| [`run-restart-test.sh`](tests/bruno/e2e/external-funding-open/run-restart-test.sh) is a restart regression test for the external funding flow. It specifically verifies that after node1 has already: | |
| [`run-restart-test.sh`](./run-restart-test.sh) is a restart regression test for the external funding flow. It specifically verifies that after node1 has already: |
There was a problem hiding this comment.
I think using this path here would be more intuitive.
Summary
This PR fixes a restart-recovery issue in the external funding flow.
In the
fiber-wasmscenario, cross-origin restrictions, especially on many mobile browsers, mean JoyID signing often has to be completed through a redirect flow. In practice, this can causefiber-wasmto restart, which previously caused the runtime state created byopen_channel_with_external_fundingto be lost. As a result, the subsequentsubmit_signed_funding_txcall would fail.Changes
This PR adds persisted recovery state for the external funding flow so the following scenarios can continue correctly after a restart:
submit_signed_funding_txcan still be executed after restartSpecifically, this PR:
submit_signed_funding_txto recover the missing channel state before continuing the flowTest Coverage
This PR adds or updates coverage for:
submit_signed_funding_txafter restarttests/bruno/e2e/external-funding-open/run-restart-test.shThe restart behavior is added on top of the existing Bruno e2e flow. Since node restart is difficult to express cleanly in the Bruno workflow itself, this part is covered by a dedicated script.
Notes
This PR is focused on restart recovery for the external funding flow and does not change RPC semantics.