Conversation

@hunger
Collaborator

@hunger hunger commented Jan 6, 2026

Description

This change makes sure to keep the original package's mtimes when linking into the environment. It will also set the atimes accordingly.

This should help with reproducibility of the installed environments.

This adds a new dependency: filetime. I do not think this is important though: We already transitively depend on that crate anyway.

Did I miss tests for the linking somewhere? I'd love to update them to also check the timestamps:-)

I did NOT implement support for SOURCE_DATE_EPOCH: During extraction we make sure to keep the mtime as it was when the package was created. So the source file's mtime should be reproducible already. It seemed safe enough to me to just apply that to the link as well.
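To illustrate the idea of carrying a cached file's mtime over to the file placed in the environment, here is a minimal sketch. It is not the code from this PR: it uses only the standard library (`File::set_times`, stable since Rust 1.75) instead of the `filetime` crate, the function name `copy_preserving_mtime` is hypothetical, and setting the atime to the same value as the mtime is an assumption about what "accordingly" means.

```rust
use std::fs::{self, File, FileTimes};
use std::io;
use std::path::Path;

/// Copy `src` to `dst`, then apply the source's mtime to the copy.
///
/// Assumption: the atime is set to the same instant as the mtime.
/// Hypothetical helper for illustration, not rattler's actual linking code.
fn copy_preserving_mtime(src: &Path, dst: &Path) -> io::Result<()> {
    fs::copy(src, dst)?;

    // Read the timestamp that extraction left on the cached file.
    let mtime = fs::metadata(src)?.modified()?;
    let times = FileTimes::new().set_accessed(mtime).set_modified(mtime);

    // The file is opened for writing so that updating its times is permitted.
    // This fails if the copy ended up read-only; a real implementation
    // would have to handle that case.
    File::options().write(true).open(dst)?.set_times(times)
}
```

Calling `copy_preserving_mtime(&cached, &linked)` and then comparing `fs::metadata(&linked)?.modified()?` with the source's mtime is roughly the kind of check a timestamp test for the linking step could make. Hard links share an inode with the cached file, so they carry the cached file's timestamps without an extra call.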

I need something to test with...
@hunger hunger changed the title from "Feature/pix 1348 reproducible builds with pixi install and source date epoch" to "fix: more reproducible builds with pixi install and source date epoch" Jan 6, 2026
@hunger hunger force-pushed the feature/pix-1348-reproducible-builds-with-pixi-install-and-source_date_epoch branch from e98d4c8 to 8de6fb7 Compare January 6, 2026 17:59
@baszalmstra
Collaborator

This looks good to me, but I'm wondering if we can maybe only pay the price of these extra syscalls when explicitly requested. To fix the original issue, perhaps it makes sense to only set the file times etc. if SOURCE_DATE_EPOCH is set in the environment? WDYT @hunger
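As a rough sketch of what gating on the environment variable could look like: the helper names below are hypothetical and nothing here is taken from rattler. It only assumes that SOURCE_DATE_EPOCH holds a non-negative integer number of seconds since the Unix epoch, as the reproducible-builds convention describes.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Parse a SOURCE_DATE_EPOCH value given as seconds since the Unix epoch.
/// Returns `None` if the value is missing, not a non-negative integer,
/// or too large to represent as a `SystemTime`.
fn parse_source_date_epoch(value: Option<&str>) -> Option<SystemTime> {
    let secs: u64 = value?.trim().parse().ok()?;
    UNIX_EPOCH.checked_add(Duration::from_secs(secs))
}

/// Read the variable from the process environment.
/// A caller could skip all timestamp syscalls when this returns `None`.
fn source_date_epoch_from_env() -> Option<SystemTime> {
    let raw = std::env::var("SOURCE_DATE_EPOCH").ok();
    parse_source_date_epoch(raw.as_deref())
}
```

With something like this, the linking code would only touch file times when `source_date_epoch_from_env()` returns `Some(..)`, so installs without the variable would pay no extra syscalls.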

@hunger
Collaborator Author

hunger commented Jan 7, 2026

I have no strong opinion, but am leaning towards always doing it as long as that is not causing major regressions in performance.

If we say we do not care about mtimes in the environment, then we should also not bother to keep mtimes when extracting the packages. That we do keep mtimes during extraction suggests to me that they are important to us. If they are, then we should also copy them over into the environments we create: that is the one place where our users will actually care about those timestamps.

@wolfv
Contributor

wolfv commented Jan 7, 2026

There were some similar issues in prefix-dev/rattler-build#1865

However, there seem to be differences even between Linux distributions (Alma9 vs Ubuntu) ... Might be good to start with a test suite.

@baszalmstra
Collaborator

Yeah, maybe we should first benchmark whether this has a significant overhead. Can you try that, @hunger?

@hunger
Collaborator Author

hunger commented Jan 7, 2026

This is what Claude thinks about the change:

● Benchmark Comparison: Current Rattler vs Base State

  | Metric        | Current Rattler | Base State | Difference            |
  |---------------|-----------------|------------|-----------------------|
  | Timing (mean) | 31.5 ms         | 32.5 ms    | -1.0 ms (3.2% faster) |
  | Timing (min)  | 24.0 ms         | 24.3 ms    | -0.3 ms               |
  | Instructions  | 102.5M          | 109.6M     | -7.1M (6.5% fewer)    |
  | Cycles        | 102.7M          | 109.7M     | -6.8% fewer           |
  | Cache misses  | 11.93%          | 13.30%     | -1.37% better         |
  | Branch misses | 2.49%           | 2.75%      | -0.26% better         |

  Syscall Analysis

  | Syscall         | Current   | Base State | Difference   |
  |-----------------|-----------|------------|--------------|
  | futex calls     | 1,252,478 | 1,126,527  | +125,951     |
  | statx calls     | 3,165     | 2,068      | +1,097       |
  | utimensat calls | 1,263     | 0          | +1,263 (new) |
  | Total syscalls  | 1,284,532 | 1,156,224  | +128,308     |

  Summary

  The current rattler is slightly faster (~3.2%) despite making more syscalls. Key differences:

  1. New utimensat calls: Current rattler makes 1,263 utimensat calls (to preserve timestamps) that base state doesn't have
  2. More statx calls: Current rattler makes ~1,100 more statx calls
  3. More futex calls: More thread synchronization activity
  4. Better CPU efficiency: Fewer instructions and better cache/branch behavior

  The additional syscalls from timestamp preservation (utimensat) and stat checks (statx) are being offset by improved code efficiency, resulting in a net performance improvement.

This obviously led me down the rabbit hole of "Why on earth does this spawn 532 threads?!"

@hunger hunger force-pushed the feature/pix-1348-reproducible-builds-with-pixi-install-and-source_date_epoch branch from 8de6fb7 to 61f9f72 Compare January 7, 2026 20:25
@hunger
Collaborator Author

hunger commented Jan 7, 2026

Ok, now with an extra change that limits the number of threads created by tokio in rattler-bin: This makes the results much more comparable:

● Benchmark Comparison: l (without mtime sync) vs ym (with mtime sync)

  This benchmark uses the ncurses package with 1905 files in it.

  Both have the tokio thread limiting patch applied.

  | Metric             | Without mtime (l) | With mtime (ym) | Change |
  |--------------------|-------------------|-----------------|--------|
  | Wallclock time     | 16.5 ms           | 18.2 ms         | +10%   |
  | Instructions       | 85.6M             | 87.0M           | +2%    |
  | Cycles             | 78.9M             | 80.5M           | +2%    |
  | Total syscalls     | 28,026            | 24,379          | -13%   |
  | Threads (attached) | 48                | 48              | same   |

  Filesystem Syscalls:

  | Syscall   | Without mtime | With mtime | Change |
  |-----------|---------------|------------|--------|
  | openat    | 3,792         | 3,792      | same   |
  | close     | 3,796         | 3,796      | same   |
  | statx     | 2,068         | 3,165      | +53%   |
  | readlink  | 2,194         | 2,194      | same   |
  | linkat    | 1,723         | 1,723      | same   |
  | unlink    | 1,724         | 1,724      | same   |
  | symlink   | 1,098         | 1,098      | same   |
  | chmod     | 166           | 166        | same   |
  | utimensat | 0             | 1,263      | +1,263 |
  | mkdir     | 54            | 54         | same   |

  Sync Overhead:

  | Metric | Without mtime | With mtime | Change |
  |--------|---------------|------------|--------|
  | futex  | 7,822         | 1,814      | -77%   |

  Summary:
  - The mtime sync adds ~1.7ms wallclock time (+10%)
  - Adds 1,263 utimensat calls and ~1,097 extra statx calls
  - Interestingly, futex calls dropped 77% with the mtime patch (likely different code paths)
  - Thread count identical (48 threads)

I admit I am still surprised by the big difference in futex calls... Maybe it's because the threads run a bit longer on average? I need to sleep on this:-)

hunger and others added 2 commits January 7, 2026 20:34
Configure the tokio runtime with worker_threads set to half
the CPU count and max_blocking_threads set to the CPU count.
This reduces thread overhead significantly, resulting in 91%
fewer threads and 99.9% fewer futex calls while improving
wallclock time by ~14%.
This should stop docker layers from changing their hash
when using pixi to populate them.
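The thread limits described in the first commit message above (worker threads at half the CPU count, blocking threads at the CPU count) can be sketched as follows. This is an illustration, not the committed code: `ThreadLimits`, `thread_limits_for`, and `detected_thread_limits` are hypothetical names, and clamping to at least one thread is an assumption made because tokio's builder rejects zero for either setting.

```rust
use std::thread;

/// Thread limits derived from the CPU count (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
struct ThreadLimits {
    worker_threads: usize,
    max_blocking_threads: usize,
}

/// Half the CPUs for async workers, all CPUs for the blocking pool.
/// Both values are clamped to at least 1.
fn thread_limits_for(cpus: usize) -> ThreadLimits {
    let cpus = cpus.max(1);
    ThreadLimits {
        worker_threads: (cpus / 2).max(1),
        max_blocking_threads: cpus,
    }
}

/// Use the parallelism reported by the OS, falling back to 1 if unknown.
fn detected_thread_limits() -> ThreadLimits {
    let cpus = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    thread_limits_for(cpus)
}

// With the tokio crate available, the values would be applied roughly as:
//
//     let limits = detected_thread_limits();
//     let runtime = tokio::runtime::Builder::new_multi_thread()
//         .worker_threads(limits.worker_threads)
//         .max_blocking_threads(limits.max_blocking_threads)
//         .enable_all()
//         .build()?;
```

Keeping the arithmetic in a plain function makes the policy easy to test without starting a runtime.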
@hunger hunger force-pushed the feature/pix-1348-reproducible-builds-with-pixi-install-and-source_date_epoch branch from 61f9f72 to 8f6cfa5 Compare January 7, 2026 20:34
@hunger
Collaborator Author

hunger commented Jan 8, 2026

So where should we go from here?

I see two ways forward:

  1. Use this PR and make sure mtimes are correct (wrt. the package) everywhere
  2. Make this and the extraction process ignore mtimes. We could then use SOURCE_DATE_EPOCH as the mtime when linking. We do not really need mtimes for the extracted package... except for the files we hardlink, but those will fix themselves once a package gets linked, unless the cache somehow becomes read-only, in which case we get errors when we try to link.

I think 1. is the safer option and closer to user expectations. Of course it is also a bit slower.

@baszalmstra baszalmstra merged commit 4831fc4 into conda:main Jan 8, 2026
31 of 33 checks passed
@hunger hunger deleted the feature/pix-1348-reproducible-builds-with-pixi-install-and-source_date_epoch branch January 8, 2026 11:14
@github-actions github-actions bot mentioned this pull request Jan 7, 2026

3 participants