consensus_3_nodes_with_failures::case_11_fail_on_proposal_committed fails intermittently #3286

@vbar

Description

Integration tests fail intermittently. This was initially attributed to timeouts, but there is likely a real bug in the consensus logic. In the consensus_3_nodes_with_failures::case_11_fail_on_proposal_committed test, Bob crashes and, once respawned, is ahead of the other nodes: Bob has committed block 4 and advanced (while reporting no decided h:r), whereas Alice and Charlie stay behind at block 3 with decided h:r (3:0). With the nodes in these inconsistent states, consensus stalls and the test times out after 240s:

running 1 test
Test artifacts will be stored in /tmp/consensus-integration-tests/.tmpTyiJGQ
...
Pathfinder instance Alice   (pid: 188505) port 35693 decided h:r (2:0)
Pathfinder instance Alice   (pid: 188505) port 35693 has block 2 < 6
Pathfinder instance Bob     (pid: 188596) port 43529 decided h:r (3:0)
Pathfinder instance Charlie (pid: 188597) port 37891 decided h:r (3:0)
Pathfinder instance Charlie (pid: 188597) port 37891 has block 3 < 6
Pathfinder instance Bob     (pid: 188596) port 43529 has block 3 < 6
Got SIGCHLD!
Respawning Bob...
Pathfinder instance Bob     (pid: 188596) has already exited
Pathfinder instance Bob     (pid: 188826) has been spawned
Pathfinder instance Alice   (pid: 188505) port 35693 decided h:r (3:0)
Pathfinder instance Alice   (pid: 188505) port 35693 has block 3 < 6
Pathfinder instance Charlie (pid: 188597) port 37891 decided h:r (3:0)
Pathfinder instance Charlie (pid: 188597) port 37891 has block 3 < 6
Pathfinder instance Bob     (pid: 188826) ready after 1 s
Bob is ready again
Pathfinder instance Bob     (pid: 188826) port 35765 decided h:r None
Pathfinder instance Bob     (pid: 188826) port 35765 has block 4 < 6
Pathfinder instance Alice   (pid: 188505) port 35693 decided h:r (3:0)
Pathfinder instance Alice   (pid: 188505) port 35693 has block 3 < 6
Pathfinder instance Charlie (pid: 188597) port 37891 decided h:r (3:0)
Pathfinder instance Charlie (pid: 188597) port 37891 has block 3 < 6
...
Pathfinder instance Alice   (pid: 188505) port 35693 decided h:r (3:0)
Pathfinder instance Alice   (pid: 188505) port 35693 has block 3 < 6
Pathfinder instance Bob     (pid: 188826) port 35765 decided h:r None
Pathfinder instance Bob     (pid: 188826) port 35765 has block 4 < 6
Test timed out after 240s

thread 'test::consensus_3_nodes_with_failures::case_11_fail_on_proposal_committed' (188426) panicked at crates/pathfinder/tests/consensus.rs:201:10:
called `Result::unwrap()` on an `Err` value: Test timed out after 240s
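A stall rather than a slow recovery is what a Tendermint-style protocol would be expected to show here. A height only advances when strictly more than 2/3 of the voting power takes part in it, and with three equally weighted validators that means all three nodes. The sketch below models this under stated assumptions: equal voting power, the >2/3 quorum rule, and that Bob, having stored block 4, moves on to height 5 while Alice and Charlie, having decided height 3, work on height 4. `quorum` and `live_height` are hypothetical helpers for illustration, not pathfinder code.

```rust
use std::collections::BTreeMap;

/// Smallest number of validators that is strictly more than 2/3 of `n`
/// (the usual Tendermint-style quorum). Assumption: equal voting power.
fn quorum(n: usize) -> usize {
    2 * n / 3 + 1
}

/// Given the height each validator is currently working on, return the
/// height (if any) that has enough participants to reach a decision.
/// Hypothetical model, not pathfinder's consensus implementation.
fn live_height(heights: &[u64]) -> Option<u64> {
    let needed = quorum(heights.len());
    let mut participants: BTreeMap<u64, usize> = BTreeMap::new();
    for &h in heights {
        *participants.entry(h).or_insert(0) += 1;
    }
    participants
        .into_iter()
        .find(|&(_, count)| count >= needed)
        .map(|(height, _)| height)
}

fn main() {
    // With 3 validators, a quorum is all 3 of them.
    assert_eq!(quorum(3), 3);

    // Assumed state after Bob's restart: Alice and Charlie decided height 3
    // and work on 4; Bob already stored block 4 and works on 5.
    let after_restart = [4u64, 5, 4]; // Alice, Bob, Charlie
    println!("live height after restart: {:?}", live_height(&after_restart)); // None

    // If Bob rejoined at the others' height, height 4 could be decided.
    let in_sync = [4u64, 4, 4];
    println!("live height when in sync: {:?}", live_height(&in_sync)); // Some(4)
}
```

Under this model a 3-node network tolerates no out-of-sync validator (n ≥ 3f + 1 gives f = 0). Bob being down only pauses consensus, but Bob coming back at a different height than Alice and Charlie leaves no height with a quorum, which matches the frozen "has block 3 < 6" / "has block 4 < 6" lines until the 240s timeout.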

Metadata

Assignees: none
Labels: bug (Something isn't working)
Type: none
Projects: status Todo
Milestone: none
Relationships: none yet
Development: no branches or pull requests
