fix: more reproducible builds with pixi install and source date epoch #1956
I need something to test with...
e98d4c8 to
8de6fb7
Compare
|
This looks good to me, but I'm wondering if we can pay the price of these extra syscalls only when explicitly requested. To fix the original issue perhaps it makes sense to only set the filetimes etc if the |
|
I have no strong opinion, but I am leaning towards always doing it as long as that does not cause major performance regressions. If we say we do not care about mtimes in the environment, then we should also not care about keeping mtimes when extracting the packages. That we do keep mtimes during extraction suggests to me that they are important to us -- and if they are, then we should also copy them over into the environments we create. That is the one place where our users will actually care about those timestamps. |
|
There were some similar issues in prefix-dev/rattler-build#1865. However, there seem to be differences even between Linux distributions (Alma9 vs Ubuntu) ... Might be good to start with a test suite. |
|
Yeah, maybe we should first benchmark whether this has a significant overhead. Can you try that, @hunger? |
|
This is what Claude thinks about the change: This obviously led me down the rabbit hole of "Why on earth does this spawn 532 threads?!" |
8de6fb7 to
61f9f72
Compare
|
OK, now with an extra change that limits the number of threads created by tokio in I admit I am still surprised by the big difference in futex calls... Maybe it's because the threads run a bit longer on average? I need to sleep on this :-) |
Configure the tokio runtime with worker_threads set to half the CPU count and max_blocking_threads set to the CPU count. This reduces thread overhead significantly, resulting in 91% fewer threads and 99.9% fewer futex calls while improving wallclock time by ~14%.
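For illustration, the thread limits described in that commit message could be derived roughly like this. This is only a sketch using the standard library: the helper names `thread_limits` and `detected_cpus` are made up here, and clamping to at least one worker is an assumption (tokio's `worker_threads` panics on 0). The comment at the end shows how the values would presumably be handed to `tokio::runtime::Builder`; the actual change in this PR may look different.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper (not taken from the PR): derive the two limits
/// from a CPU count. Returns (worker_threads, max_blocking_threads).
fn thread_limits(cpus: usize) -> (usize, usize) {
    // Assumption: treat a reported count of 0 as a single CPU.
    let cpus = cpus.max(1);
    // Half the CPU count for async workers, but never fewer than one.
    let worker_threads = (cpus / 2).max(1);
    // Blocking pool capped at the CPU count.
    let max_blocking_threads = cpus;
    (worker_threads, max_blocking_threads)
}

/// Hypothetical helper: number of CPUs available to this process,
/// falling back to 1 if it cannot be determined.
fn detected_cpus() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}

// With tokio as a dependency, the values would be applied along these lines:
//
//     let (workers, blocking) = thread_limits(detected_cpus());
//     let runtime = tokio::runtime::Builder::new_multi_thread()
//         .worker_threads(workers)
//         .max_blocking_threads(blocking)
//         .enable_all()
//         .build()?;
```

On an 8-core machine this yields 4 worker threads and at most 8 blocking threads. Tokio's default blocking-thread limit is much higher, which would be consistent with the large thread count mentioned above.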
This should stop docker layers from changing their hash when using pixi to populate them.
61f9f72 to
8f6cfa5
Compare
|
So where should we go from here? I see two ways forward:
I think 1. is the safer option and closer to user expectations. Of course it is also a bit slower. |
Description
This change makes sure to keep the original package's mtimes when linking into the environment. It will also set the atimes accordingly.
This should help with reproducibility of the installed environments.
This adds a new dependency: filetime. I do not think this is important though: we already transitively depend on that crate anyway.
Did I miss tests for the linking somewhere? I'd love to update them to also check the timestamps :-)
I did NOT implement support for SOURCE_DATE_EPOCH: During extraction we make sure to keep the mtime as it was when the package was created. So the source file's mtime should be reproducible already. It seemed safe enough to me to just apply that to the link as well.
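A minimal sketch of the idea, for the copy case only: after copying a file into the environment, carry the source's mtime over to the destination. Note the assumptions: this uses `std::fs::FileTimes` (Rust 1.75+) instead of the filetime crate the PR actually adds, `copy_with_times` is a made-up name, and setting the atime to the same value as the mtime is just one reading of "accordingly". It is not the linking code from this PR.

```rust
use std::fs::{self, File, FileTimes};
use std::io;
use std::path::Path;

/// Hypothetical helper (not the PR's implementation): copy `src` to `dst`
/// and make `dst` carry the same modification time as `src`.
fn copy_with_times(src: &Path, dst: &Path) -> io::Result<()> {
    // Read the mtime that was preserved when the package was extracted.
    let mtime = fs::metadata(src)?.modified()?;

    // Copy the contents; this alone would leave `dst` with "now" as its mtime.
    fs::copy(src, dst)?;

    // Assumption: atime is set to the same value as mtime.
    let times = FileTimes::new().set_accessed(mtime).set_modified(mtime);

    // Setting times needs a writable handle, so this fails on read-only files.
    File::options().write(true).open(dst)?.set_times(times)?;
    Ok(())
}
```

Because the timestamp comes from the extracted package file rather than from the clock, two installs of the same package should end up with identical mtimes, which is what makes SOURCE_DATE_EPOCH unnecessary for this step.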