CI: Test new runtime compatibility with latest client #1517
Conversation
ashutoshvarma
left a comment
Nice!
LGTM
ermalkaleci
left a comment
This needs to be separated from the release workflow. There's no need for this to run before the release notes.
Dinonard
left a comment
Looks good!
Please just try not to introduce any additional possible failure points for the draft release process.
${{ matrix.chain }}-metadata.json
${{ matrix.chain }}-diff.txt
fetch-github-releases:
Is there a need to do this as a separate step?
If the previous step for fetching the last runtime release is kept where it was, we have zero impact on the existing CI. With this change, you're introducing a new possibility of the whole release failing because, e.g., it fails to fetch the last client release. Realistically, the release can be made without that information.
I'd suggest keeping this as it was before, and moving the client fetching to the new task you've added for regression testing.
All three chain matrix jobs need the same client binary, so it's much more efficient to fetch it once in a separate job (a sketch of this layout is below). Now that I've added a separate workflow, there are no changes to the original Runtime Build one.
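For illustration, a minimal sketch of the fetch-once layout, assuming hypothetical job and artifact names (the chain list is also illustrative):

```yaml
jobs:
  # Fetch the client binary once and share it via an artifact.
  fetch-client:
    runs-on: ubuntu-latest
    steps:
      - name: Download latest astar-collator release
        run: |
          # fetch and unpack the release asset here
          echo "fetching client binary..."
      - uses: actions/upload-artifact@v4
        with:
          name: astar-collator
          path: astar-collator

  # Each chain job reuses the shared binary instead of re-fetching it.
  compatibility-test:
    needs: fetch-client
    runs-on: ubuntu-latest
    strategy:
      matrix:
        chain: [astar, shiden, shibuya]
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: astar-collator
```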
collator2: reports block height is greater than 5 within 200 seconds

## perform runtime upgrade
collator1: parachain 1000 perform upgrade with ./runtime.compact.compressed.wasm within 200 seconds
Won't this step fail if the runtime upgrade fails?
Asking since I'm wondering if the script is needed.
I've tested upgrading using the same spec version and this step did not fail, so I prefer to keep the JS script.
Upgrade failed but step succeeded?
Yes, when done manually with a local Zombienet network, the upgrade fails with a SpecVersionNeedsToIncrease error. But when performed via smoke testing, the step succeeded (check the attached screenshot below; the JS script caught it).
However, in the case of an uplift with a client + runtime release, we could end up checking the new runtime against the new client, and I don't want false positives to be emitted, so I'll remove the JS script.
I think Ermal raised a good point here: it's better to check compatibility against the two latest clients. I reproduced the issue we had with host functions using version v5.41.0, and the upgrade step failed during Zombienet testing with runtime 91 (see the second screenshot below). But not with v5.42.0, for the first reason explained above (that client already has runtime 91 in it).
Ok, but the upgrade didn't fail in that case - it just replaced the runtime with one that has the same spec version (and is probably identical).
For more context on why I asked: it seems strange to me that a DSL exists for testing, with a dedicated command for runtime upgrade, yet it can silently fail. This is why your script seemed redundant to me.
As for checking the last two clients, it seems a bit much to me, but if you think it's best, do it.
Ok, thanks for the extra context, it has been removed now.
@Dinonard @ermalkaleci - Thank you for your feedback, I've separated this check into a Runtime Compatibility workflow that is triggered after the original Runtime Build one completes (or manually, if a Runtime Build workflow ID is provided to download artifacts from).
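A minimal sketch of how such a trigger can be wired, assuming the build workflow is named "Runtime Build" (the dispatch input name is illustrative):

```yaml
on:
  # Run automatically once the Runtime Build workflow completes
  workflow_run:
    workflows: ["Runtime Build"]
    types: [completed]
  # Or run manually, pointing at a specific Runtime Build run
  workflow_dispatch:
    inputs:
      workflow_run_id:
        description: "Runtime Build run ID to download artifacts from"
        required: false
```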
sleep $((i * 5))
done
# Get latest client tag (v*.*.* format)
Not sure about this. The idea is to test the new runtime with old clients; if we release a new client, then the test will always pass.
Is there something better we can do, though?
Test the last two releases? Seems like overkill.
Good point @ermalkaleci, the workflow now checks compatibility against the two latest clients. Since these are non-gating checks, this only provides extra information, but it could be beneficial, as explained here.
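For reference, one common way to keep such a check non-gating in a GitHub Actions workflow is `continue-on-error`; a minimal sketch, with job and script names that are purely illustrative:

```yaml
jobs:
  compatibility-check:
    runs-on: ubuntu-latest
    # Report the result without failing the overall workflow
    continue-on-error: true
    steps:
      - name: Run runtime compatibility check
        run: ./run-compatibility-check.sh  # hypothetical script
```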
# Get latest client info
CLIENT_TAG_1="${CLIENT_TAGS_ARRAY[0]}"
ASSET_NAME_1="astar-collator-${CLIENT_TAG_1}-ubuntu-x86_64.tar.gz"
CLIENT_ASSET_URL_1=$(curl -s "https://api.github.com/repos/AstarNetwork/Astar/releases/tags/${CLIENT_TAG_1}" | \
  jq -r --arg name "$ASSET_NAME_1" '.assets[] | select(.name == $name) | .browser_download_url')
# Get second latest client info
CLIENT_TAG_2="${CLIENT_TAGS_ARRAY[1]}"
ASSET_NAME_2="astar-collator-${CLIENT_TAG_2}-ubuntu-x86_64.tar.gz"
CLIENT_ASSET_URL_2=$(curl -s "https://api.github.com/repos/AstarNetwork/Astar/releases/tags/${CLIENT_TAG_2}" | \
  jq -r --arg name "$ASSET_NAME_2" '.assets[] | select(.name == $name) | .browser_download_url')
I'm no bash expert, but this can be a loop, and you can transfer values via env variables like an array 🙂.
And to expand on that, the job that tests the last two clients could be changed to spin up collators using two different binaries. Anyway, just a suggestion if that's something you'd like to try.
This is a good suggestion: it reduces the matrix size and reflects a more realistic scenario where collators use different client versions within the same network. I had to slightly adjust the network setup files.
Also, with the for loop the code is more DRY (sketch below).
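A sketch of the loop refactor over the snippet above, assuming `CLIENT_TAGS_ARRAY` already holds the two release tags (the numbered env variable names are illustrative):

```bash
# Fetch asset URLs for the two latest client tags in a single loop
for i in 0 1; do
  TAG="${CLIENT_TAGS_ARRAY[$i]}"
  ASSET_NAME="astar-collator-${TAG}-ubuntu-x86_64.tar.gz"
  ASSET_URL=$(curl -s "https://api.github.com/repos/AstarNetwork/Astar/releases/tags/${TAG}" | \
    jq -r --arg name "$ASSET_NAME" '.assets[] | select(.name == $name) | .browser_download_url')
  # Export as numbered env variables so later workflow steps can read them
  echo "CLIENT_TAG_$((i + 1))=${TAG}" >> "$GITHUB_ENV"
  echo "CLIENT_ASSET_URL_$((i + 1))=${ASSET_URL}" >> "$GITHUB_ENV"
done
```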
I've tested here: https://github.com/ipapandinas/Astar/actions/runs/16517224780
Great!
Just one more comment from me, in the zndsl file.
Otherwise LGTM.
## perform runtime upgrade
collator1: parachain 1000 perform upgrade with ./runtime.compact.compressed.wasm within 200 seconds
collator2: parachain 1000 perform upgrade with ./runtime.compact.compressed.wasm within 200 seconds
This also needs to be done for collator1, right?
In the previous iteration with the JS script, I performed the upgrade on collator 1 and tested propagation with the script on collator 2. Now I'm upgrading collator 2 because it's running the second-latest client, so in client + runtime releases, it's the one most likely to fail. I think the upgrade is automatically propagated to collator 1, and after that, I'm making sure it keeps building blocks. However, I'm not 100% sure now with different clients, WDYT?
For some reason, in my head it was two separate networks. Ignore my incorrect comment, you are right, the upgrade is propagated.
It doesn't matter which one is upgraded since it's the relay chain that sends the signal for the upgrade to go ahead.
Dinonard
left a comment
LGTM
Seems the new zepter release is causing problems, though.
Pull Request Summary
Closes #1310
Tested here https://github.com/AstarNetwork/Astar/actions/runs/16443086835 with a fake runtime upgrade 1602