swarmit: adjust timeout based on the number of available mari nodes #126

Open
aabadie wants to merge 10 commits into DotBots:main from aabadie:adapted_timeouts

@aabadie
Contributor

@aabadie aabadie commented Feb 12, 2026

fixes #125

@aabadie aabadie added the enhancement label Feb 12, 2026
@codecov-commenter

codecov-commenter commented Feb 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (9c02ae0) to head (1f279e2).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #126   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           18        18           
  Lines         2107      2173   +66     
=========================================
+ Hits          2107      2173   +66     

@aabadie
Contributor Author

aabadie commented Feb 13, 2026

I had to rework the swarmit node and adapter mocks a bit to make the communication between them more efficient (they now use a queue). It works well locally, although it takes a while with 500 nodes, but it is still flaky on GitHub CI.
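
For readers unfamiliar with the pattern, here is a minimal sketch of mocks talking through a queue. It is not the code from this PR: `MockAdapter`, `MockNode`, and `run_swarm` are hypothetical names, and the "ready" payload is made up for illustration.

```python
import queue
import threading


class MockAdapter:
    """Hypothetical adapter mock: collects frames pushed by mock nodes."""

    def __init__(self):
        self.inbox = queue.Queue()  # thread-safe FIFO shared with the nodes
        self.received = []

    def drain(self, expected, timeout=1.0):
        # Block on the queue instead of polling every node in turn.
        # queue.Empty is raised if no frame arrives within `timeout` seconds.
        while len(self.received) < expected:
            self.received.append(self.inbox.get(timeout=timeout))
        return self.received


class MockNode(threading.Thread):
    """Hypothetical node mock: reports its status once, then exits."""

    def __init__(self, address, adapter):
        super().__init__(daemon=True)
        self.address = address
        self.adapter = adapter

    def run(self):
        self.adapter.inbox.put((self.address, "ready"))


def run_swarm(node_count):
    """Start `node_count` mock nodes and return the addresses that reported."""
    adapter = MockAdapter()
    nodes = [MockNode(address, adapter) for address in range(node_count)]
    for node in nodes:
        node.start()
    frames = adapter.drain(expected=node_count)
    for node in nodes:
        node.join()
    return sorted(address for address, _ in frames)
```

Calling `run_swarm(20)` returns the 20 node addresses in order. The point of the design is that the adapter waits on a single queue rather than iterating over per-node state.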

@aabadie
Contributor Author

aabadie commented Feb 13, 2026

As a side note, the same message queuing strategy could be used to increase the scalability of the PyDotBot simulator. I'll work on that next week.

…variables

This way they can be mocked to easily pass the tests
@aabadie aabadie changed the title from "[WIP] swarmit: adjust timeout based on the number of available mari nodes" to "swarmit: adjust timeout based on the number of available mari nodes" Feb 16, 2026
@aabadie
Contributor Author

aabadie commented Feb 16, 2026

> I had to rework the swarmit node and adapter mocks a bit to make the communication between them more efficient (they now use a queue). It works well locally, although it takes a while with 500 nodes, but it is still flaky on GitHub CI.

I improved the situation in the tests by mocking the threshold values with smaller ones. The problem comes from the Python GIL, which drastically slows down the tests on thread deletion when there are more than 200 threads (at least on my computer).
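
As an illustration of the mocking approach, here is a small sketch using `unittest.mock.patch.object`. The `settings` namespace, the constant names, and their values are all hypothetical stand-ins, not the actual swarmit variables.

```python
import types
from unittest import mock

# Stand-in for a module holding thresholds; names and values are hypothetical.
settings = types.SimpleNamespace(
    LARGE_SWARM_THRESHOLD=200,
    COMMAND_TIMEOUT_DEFAULT=0.5,
    COMMAND_TIMEOUT_LARGE=2.0,
)


def command_timeout(node_count):
    # Look the threshold up at call time so that a patched value takes effect.
    if node_count >= settings.LARGE_SWARM_THRESHOLD:
        return settings.COMMAND_TIMEOUT_LARGE
    return settings.COMMAND_TIMEOUT_DEFAULT


def check_large_branch_with_few_nodes():
    # Lower the threshold so the "large swarm" branch runs with only 5 nodes,
    # which avoids spawning hundreds of threads in the test.
    with mock.patch.object(settings, "LARGE_SWARM_THRESHOLD", 5):
        patched = command_timeout(5)
    # Outside the context manager the original threshold is restored.
    restored = command_timeout(5)
    return patched, restored
```

This only works if the thresholds are read from a variable at call time, which is why moving them out of inline literals matters.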

@geonnave
Contributor

Thanks for this PR. It's great that you also looked at scalability aspects, which are very important issues now.

I tested the change with 1 node and it works fine.

Regarding the timeouts and thresholds, I am not sure that modifying KNOWN_DEVICES_* is really what we need (or at least it may not be the only thing we need). I think we need a way to configure COMMAND_TIMEOUT, COMMAND_MAX_ATTEMPTS, etc. based on the number of available nodes.

Following up on the discussion of today's meeting, what if we:

  • add a new dashboard-api adapter, so that the list of known_devices is always available through an almost-instant HTTP GET request
  • create _DEFAULT, _SMALL, _MEDIUM, and _LARGE variations for COMMAND_TIMEOUT and COMMAND_MAX_ATTEMPTS
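
To make the second point concrete, here is a rough sketch of what the tiered constants could look like. All numeric values, the `SMALL_MAX_NODES` and `MEDIUM_MAX_NODES` thresholds, and the `command_settings` helper are assumptions for illustration only. Using `_DEFAULT` when the node count is unknown is also an assumption.

```python
# Placeholder values: the real ones would need tuning on actual swarms.
COMMAND_TIMEOUT_DEFAULT = 0.5  # seconds
COMMAND_TIMEOUT_SMALL = 0.2
COMMAND_TIMEOUT_MEDIUM = 1.0
COMMAND_TIMEOUT_LARGE = 3.0

COMMAND_MAX_ATTEMPTS_DEFAULT = 3
COMMAND_MAX_ATTEMPTS_SMALL = 3
COMMAND_MAX_ATTEMPTS_MEDIUM = 5
COMMAND_MAX_ATTEMPTS_LARGE = 8

# Hypothetical tier boundaries, expressed in number of available nodes.
SMALL_MAX_NODES = 10
MEDIUM_MAX_NODES = 100


def command_settings(node_count=None):
    """Return (timeout, max_attempts) for the given number of nodes."""
    if node_count is None:
        # Node count not known yet: fall back to the defaults.
        return COMMAND_TIMEOUT_DEFAULT, COMMAND_MAX_ATTEMPTS_DEFAULT
    if node_count <= SMALL_MAX_NODES:
        return COMMAND_TIMEOUT_SMALL, COMMAND_MAX_ATTEMPTS_SMALL
    if node_count <= MEDIUM_MAX_NODES:
        return COMMAND_TIMEOUT_MEDIUM, COMMAND_MAX_ATTEMPTS_MEDIUM
    return COMMAND_TIMEOUT_LARGE, COMMAND_MAX_ATTEMPTS_LARGE
```

With something like this, the node count could come from whichever source lists the known devices, and the caller would simply unpack `timeout, max_attempts = command_settings(len(known_devices))`.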

One way to proceed would be to merge the test scalability improvements (and rename this PR) and create a new PR with the two points above, if you agree.

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Tweak timeouts for scalability

3 participants