@bomanaps
Member

closes #164

@bomanaps bomanaps requested a review from GrapeBaBa September 16, 2025 16:28
@g11tech
Member

g11tech commented Sep 16, 2025

add CI task for the same

@bomanaps
Member Author

This will be on hold until Kai's PR 198 gets in, so I can properly refactor.

@bomanaps
Member Author

bomanaps commented Sep 19, 2025

Hello @g11tech @gballet @GrapeBaBa, both nodes are using networkId = 0 (the value is currently hardcoded), which causes them to access the same global SWARM_STATE in the Rust libp2p bridge. That creates a race condition and memory corruption.
Right now I can't run two nodes in process; I can only run one because the value is hardcoded, and I'm not sure how to get around this.
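
To illustrate the collision described above, here is a minimal Rust sketch. It is not the actual bridge code: it assumes a hypothetical registry keyed by network id, and the names SwarmState, SWARMS, register_network, add_peer and peer_count are illustrative only. With one entry per id, two in-process nodes share state only if both pass the same id, and a duplicate id is rejected instead of being silently reused.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for the per-network state held by the bridge.
#[derive(Debug, Default)]
struct SwarmState {
    peers: Vec<String>,
}

// Hypothetical registry: one entry per network id instead of a single shared slot.
static SWARMS: OnceLock<Mutex<HashMap<u32, SwarmState>>> = OnceLock::new();

fn swarms() -> &'static Mutex<HashMap<u32, SwarmState>> {
    SWARMS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Registers a network. Fails if the id is already taken, so two nodes
/// cannot end up mutating the same state without noticing.
fn register_network(network_id: u32) -> Result<(), String> {
    let mut map = swarms().lock().unwrap();
    if map.contains_key(&network_id) {
        return Err(format!("network id {network_id} already in use"));
    }
    map.insert(network_id, SwarmState::default());
    Ok(())
}

/// Adds a peer to the state that belongs to `network_id` only.
fn add_peer(network_id: u32, peer: &str) -> Result<(), String> {
    let mut map = swarms().lock().unwrap();
    let state = map
        .get_mut(&network_id)
        .ok_or_else(|| format!("unknown network id {network_id}"))?;
    state.peers.push(peer.to_string());
    Ok(())
}

/// Returns the number of peers known to `network_id`, or None if it is not registered.
fn peer_count(network_id: u32) -> Option<usize> {
    swarms().lock().unwrap().get(&network_id).map(|s| s.peers.len())
}
```

Under this assumption, the test would give each node its own id (for example 0 and 1) rather than relying on the hardcoded 0.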

@GrapeBaBa
Member

@g11tech it seems to be a hack used only for testing, such as the beam cmd. I think we don't need the global SWARM_STATE at all for the real case. What are your thoughts? We need that change to make this test work.

@bomanaps
Member Author

[Screenshot 2025-09-27 at 18 28 57]
I'm getting this segfault error. Should I spawn an external process for this to avoid the infinite loop in node.run(), or am I getting something wrong here? @g11tech @gballet @GrapeBaBa

@gballet
Contributor

gballet commented Sep 28, 2025

This is what I get when running it once.

❌ Test failed: Timeout reached after 300 seconds
/home/gballet/src/zeam/pkgs/cli/src/test/genesis_to_finalization_test.zig:333:9: 0x20b9ef9 in test.genesis_generator_two_node_finalization_sim (test)
        return error.TestTimeout;
        ^
Segmentation fault at address 0x1
/home/gballet/bin/lib/std/os/linux/IoUring.zig:259:36: 0x1b5fa9d in cq_ready (test)
error: while executing test 'test.genesis_to_finalization_test.test.genesis_generator_two_node_finalization_sim', the following command terminated with signal 11 (expected exited with code 0):
/home/gballet/src/zeam/.zig-cache/o/17a861b7fc869da74cac7ff7582de53a/test --seed=0x89ac4a95 --cache-dir=/home/gballet/src/zeam/.zig-cache --listen=-
Build Summary: 32/34 steps succeeded; 1 failed; 37/37 tests passed
test transitive failure
└─ run test failure

I don't know if the segmentation fault is responsible for the timeout at this point.

In another run, I was able to see that this has to do with an invalid ListArray / some issue in SSZ. Make sure your PR is rebased on top of master; some bugs have been fixed in this general area recently.

@bomanaps
Member Author

I rebased, but I'm not sure why I'm getting this @gballet
[Screenshot 2025-09-29 at 03 55 12]

@g11tech
Member

g11tech commented Sep 29, 2025

seems like this is segfaulting in create and run network, for which another thread is spun up, so I am not sure how exactly the tokio/rust runtimes behave

so either first create the two networks the way we do it in beam sim and then pass them to the two nodes, or spin up the two nodes as different processes

@bomanaps bomanaps force-pushed the add-test-genesis-two-nodes branch from e27d60f to ad05b56 Compare September 29, 2025 10:43
}

test {
_ = @import("test/genesis_to_finalization_test.zig");
Member

this is an integration test, so it needs to be bundled in simtest: create a common file, test/integration.zig, to import these tests (rename the other integration test file and import it as well)

.genesis_spec = undefined,
.validator_indices = undefined,
.local_priv_key = undefined,
.bootnodes = &[_][]const u8{},
Member

what does this fix?

Member Author

Fixed a segfault caused by the defer-with-undefined antipattern: the fields were initialized as undefined, but defer deinit() always runs, even if initialization fails. When deinit() tried to free undefined pointers (0xaaaaaaaaaaaaaaaa), it crashed.

Member

@GrapeBaBa GrapeBaBa Oct 3, 2025

Each field should be assigned in the next function, so I don't think it is the cause; at the least, not all fields may need this empty value. Can you point to the specific field and explain why it is not assigned in the next function?

Member Author

You're right that only 3 fields technically need safe initialization. The deeper issue is the architectural pattern: setting up cleanup (defer) before initialization is complete. The proper fix would be refactoring buildStartOptions() to return the struct instead of mutating it, so defer is only set after full initialization. For now, I'll keep the current defensive fix since it's safe and works, but I'll note this for future refactoring.

Member

@GrapeBaBa GrapeBaBa Oct 3, 2025

Can you point to the specific line of the problem code and explain how to reproduce this issue? With the current change, some default values such as "" also seem not to be allocated on the heap and can't be freed.

Member

Let me check this code. How can I reproduce the segfault?

Member

I didn't see the change we were talking about in the chat?

Member Author

I did reply to you on that, then you pointed out a different pattern. I have yet to do that; I just merged the latest changes from main and will push it shortly.

Member

OK, I just saw a GitHub notification. Is it not ready for review again right now?

Member Author

Yes, but I will push shortly

@bomanaps bomanaps requested review from GrapeBaBa and g11tech October 3, 2025 20:57
@bomanaps
Member Author

bomanaps commented Oct 4, 2025

@g11tech @gballet

@bomanaps
Member Author

bomanaps commented Oct 6, 2025

@gballet

@bomanaps bomanaps force-pushed the add-test-genesis-two-nodes branch 2 times, most recently from 87bda95 to 3c08acf Compare November 11, 2025 05:10
@bomanaps bomanaps force-pushed the add-test-genesis-two-nodes branch from 8835d4c to d4ff9b1 Compare November 15, 2025 14:32
Development

Successfully merging this pull request may close these issues.

add test to genesis generate and run two-nodes in-process to finalization

4 participants