

@nkemnitz
Collaborator

@nkemnitz nkemnitz commented Jan 19, 2026

Set up a forkserver with all the heavy modules imported and the slow registration already done.
--> Stability from "spawn" (the forkserver is created before gRPC clients and other C extensions get initialized)
--> Speed from "fork"
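One way to get this combination with only the standard library is a "forkserver" context with preloaded modules. This is a minimal sketch, not the PR's actual implementation: make_forkserver_context is a hypothetical helper, and the module names passed to it are placeholders.

```python
import multiprocessing as mp
from multiprocessing.context import BaseContext
from typing import Sequence


def make_forkserver_context(preload: Sequence[str]) -> BaseContext:
    """Return a "forkserver" context whose server imports `preload` up front.

    The forkserver itself is launched as a fresh interpreter (spawn-like
    stability); every worker is then fork()ed from it with those modules
    already imported (fork-like speed).
    """
    ctx = mp.get_context("forkserver")  # POSIX only; raises ValueError elsewhere
    # Only a hint: CPython silently ignores ImportErrors for preloaded names.
    # It must be set before the forkserver first starts to have any effect.
    ctx.set_forkserver_preload(list(preload))
    return ctx
```

A caller would pass the modules whose import does the slow work and then use ctx.Pool(...) or concurrent.futures.ProcessPoolExecutor(mp_context=ctx). As with spawn, functions sent to the workers must be importable by name.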

@nkemnitz nkemnitz force-pushed the nkem/fix-multiproc branch 2 times, most recently from ff96372 to 5cce2d3 January 19, 2026 21:38
@codecov

codecov bot commented Jan 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.97%. Comparing base (c9d60aa) to head (cb87b6a).
⚠️ Report is 49 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #1140    +/-   ##
========================================
  Coverage   99.97%   99.97%            
========================================
  Files         190      195     +5     
  Lines        9675     9960   +285     
========================================
+ Hits         9673     9958   +285     
  Misses          2        2            


@trivoldus28
Contributor

@nkemnitz my concern with this approach is that the C extensions you're trying to skip are exactly the ones that are slow to import, which is why we needed fork in the first place.

@trivoldus28
Contributor

What is the problem with using fork in zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py again?

@nkemnitz
Collaborator Author

> What is the problem with using fork in zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py again?

At that point, one of the Google Python clients we use has already called grpc_init, so we can no longer fork safely. That is what the "I0000 00:00:1768914288.553430 907164 fork_posix.cc:71] Other threads are currently calling into gRPC, skipping fork() handlers" messages during scheduling are about. In my case, that must have caused the segmentation fault, although it might also have been another library with similar fork restrictions, CUDA for example. Dodam also documented problems with fork and sharded reads in CloudVolume.
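A rough way to act on this is to choose the start method at runtime. The sketch below is a heuristic under stated assumptions: pick_start_method and its module-prefix list are hypothetical, a loaded grpc module does not prove that grpc_init has run, and threading.active_count() only sees Python threads, not native threads started inside C extensions.

```python
import sys
import threading
from typing import Iterable, Mapping, Optional


def pick_start_method(
    modules: Optional[Mapping[str, object]] = None,
    thread_count: Optional[int] = None,
    unsafe_prefixes: Iterable[str] = ("grpc",),
) -> str:
    """Return "fork" only if forking looks safe, otherwise "forkserver".

    Heuristic only: it can flag likely fork-unsafe state, never prove safety.
    """
    if modules is None:
        modules = sys.modules
    if thread_count is None:
        # Counts Python threads only; native threads (e.g. gRPC's) are invisible.
        thread_count = threading.active_count()
    prefixes = tuple(unsafe_prefixes)
    # A loaded package matching a prefix (or any of its submodules) is taken
    # as a sign that its native state may already exist in this process.
    unsafe_loaded = any(
        name == prefix or name.startswith(prefix + ".")
        for name in list(modules)
        for prefix in prefixes
    )
    if unsafe_loaded or thread_count > 1:
        return "forkserver"
    return "fork"
```

In real code the defaults would be used, e.g. multiprocessing.get_context(pick_start_method()); the parameters exist so the decision can be tested in isolation.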

@trivoldus28
Contributor

trivoldus28 commented Jan 20, 2026

Do we know what is using gRPC? Maybe we can defer loading the module until it is truly necessary.
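Deferring an import in Python mostly means moving it into the function that needs it. A minimal sketch: load_when_needed and upload_report are hypothetical names, and "json" only stands in for a heavy, gRPC-backed client. It helps only if every fork happens before the first call.

```python
import importlib
from types import ModuleType
from typing import Any

# Hypothetical stand-in for a heavy, gRPC-backed client library.
HEAVY_MODULE = "json"


def load_when_needed(module_name: str) -> ModuleType:
    """Import `module_name` at the call site rather than at program start."""
    # importlib caches modules in sys.modules, so only the first call pays the
    # import cost and whatever initialization the module performs on import.
    return importlib.import_module(module_name)


def upload_report(payload: dict[str, Any]) -> str:
    # Nothing heavy is imported until a report is actually produced.
    client = load_when_needed(HEAVY_MODULE)
    return client.dumps(payload, sort_keys=True)
```

In practice this is usually just a function-level import statement; the helper only makes the deferred point explicit.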

Alternatively: the only thing forked in volumetric_apply_flow.py is the chunking of tasks. It seems possible to create a dedicated chunking process before gRPC is loaded and let that process do the forking.
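A simpler variant of that idea, assuming the chunking work can be shipped to a plain multiprocessing.Pool: create the pool with the "fork" start method before any gRPC-backed module is imported, then keep reusing it. start_chunking_pool is a hypothetical helper.

```python
import multiprocessing as mp
from multiprocessing.pool import Pool


def start_chunking_pool(processes: int) -> Pool:
    """Fork a dedicated worker pool early, before fork-unsafe libraries load.

    Pool starts all of its workers in the constructor, so they inherit only
    the state that exists at this moment. Import gRPC-backed clients after
    this call, never before.
    """
    ctx = mp.get_context("fork")  # POSIX only
    # Keep the default maxtasksperchild=None so the original workers live on.
    return ctx.Pool(processes=processes)
```

Caveat: Pool replaces a worker that exits by forking the parent again, and by then the parent may have gRPC loaded, so this only stays safe as long as the original workers survive.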

Another alternative is rewriting the chunking process as a generator, but that seems complex.
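A minimal sketch of the generator approach, assuming a plain axis-aligned box given by start/end voxel coordinates and a fixed chunk size. iter_chunk_bboxes is hypothetical and ignores anything the real bbox_strider code does beyond plain tiling.

```python
from itertools import product
from typing import Iterator, Sequence

Coord = tuple[int, ...]


def iter_chunk_bboxes(
    start: Sequence[int], end: Sequence[int], chunk_size: Sequence[int]
) -> Iterator[tuple[Coord, Coord]]:
    """Lazily yield (chunk_start, chunk_end) pairs tiling [start, end).

    The full list of chunks is never built: each chunk is produced only when
    the consumer asks for it. Chunks on the far edges are clipped to `end`.
    """
    axes = [range(s, e, c) for s, e, c in zip(start, end, chunk_size)]
    for origin in product(*axes):
        chunk_end = tuple(min(o + c, e) for o, c, e in zip(origin, chunk_size, end))
        yield origin, chunk_end
```

Consumers can then take chunks one at a time (for example with itertools.islice) instead of waiting for the whole list.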

@dodamih

@trivoldus28
Contributor

Another thought is to write chunk generation in Cython. It could then use multiple cores within the same process, by releasing the GIL, without forking.

@nkemnitz
Collaborator Author

> Do we know what is using gRPC? Maybe we can defer loading the module until it is truly necessary.

Most of the Google stuff (Kubernetes, Billing, Firestore DB), I believe.

But I only used gRPC as an example because it's the noisiest in the logs. It's not certain that gRPC caused the segmentation fault I encountered.

Regarding performance: for the problematic inference spec that hits these multiprocessing branches (bbox_strider and volumetric_apply_flow), I measured no difference in speed (the forkserver forks were actually ~2 ms faster than regular fork, for some reason).
I also just improved the forkserver startup, so the main process and the forkserver process now load the modules in parallel.
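Starting the server explicitly at program start is one way to get that overlap with the stdlib alone. This is a sketch under assumptions, not the PR's code: warm_up_forkserver is hypothetical, and the claim that ensure_running() does not wait for the preloaded imports is our reading of CPython's implementation.

```python
import multiprocessing as mp
from multiprocessing import forkserver
from multiprocessing.context import BaseContext
from typing import Sequence


def warm_up_forkserver(preload: Sequence[str]) -> BaseContext:
    """Launch the forkserver early so it imports `preload` while we keep working.

    Assumption: ensure_running() returns once the server process is launched;
    the first worker start then blocks until the server has finished preloading.
    """
    ctx = mp.get_context("forkserver")  # POSIX only
    # Must be set before the server starts; an already running server does
    # not pick up later changes.
    ctx.set_forkserver_preload(list(preload))
    forkserver.ensure_running()
    return ctx
```

The main process can then continue with its own imports and setup while the forkserver loads the same modules.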

If you have another spec to test, I am happy to check.

@nkemnitz nkemnitz marked this pull request as ready for review January 21, 2026 11:33
@nkemnitz
Collaborator Author

Ah, I still need to check what necessitates the test change...

@trivoldus28
Contributor

trivoldus28 commented Jan 21, 2026 via email

@trivoldus28
Contributor

@nkemnitz
Collaborator Author

Thanks. I took your script and increased #TEST_SIZE to [5000, 5000, 1600]; otherwise the number of tasks was too low to hit any of the multiprocessing branches. There is no major difference between the timings. forkserver may even be slightly faster at setting up the forks, although I don't know why that would be the case.

This PR:

2026-01-22 10:59:13.473 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa/dryrun.py:  24
                                 Starting dryrun....
2026-01-22 10:59:14.120 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 23409 tasks took 0.15s, total processing time 0.64s
2026-01-22 10:59:14.579 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4698 tasks took 0.15s, total processing time 0.22s
2026-01-22 10:59:18.281 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.70s
2026-01-22 10:59:18.873 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4617 tasks took 0.14s, total processing time 0.22s
2026-01-22 10:59:22.420 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.54s
2026-01-22 10:59:22.987 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4698 tasks took 0.14s, total processing time 0.22s
2026-01-22 10:59:26.600 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.61s
2026-01-22 10:59:27.180 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4617 tasks took 0.14s, total processing time 0.22s
2026-01-22 10:59:30.736 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.55s
2026-01-22 10:59:31.318 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4698 tasks took 0.15s, total processing time 0.22s
2026-01-22 10:59:34.970 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.65s
2026-01-22 10:59:35.549 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4617 tasks took 0.15s, total processing time 0.23s
2026-01-22 10:59:39.121 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.57s
2026-01-22 10:59:39.699 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4698 tasks took 0.15s, total processing time 0.22s
2026-01-22 10:59:43.325 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.62s
2026-01-22 10:59:43.903 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4617 tasks took 0.14s, total processing time 0.22s
2026-01-22 10:59:47.464 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.56s
2026-01-22 10:59:47.808 INFO     [PID 13691] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 832

main:

2026-01-22 11:18:18.950 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa/dryrun.py:  24
                                 Starting dryrun....
2026-01-22 11:18:20.077 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 23409 tasks took 0.33s, total processing time 1.12s
2026-01-22 11:18:20.700 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4698 tasks took 0.28s, total processing time 0.34s
2026-01-22 11:18:25.404 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 4.70s
2026-01-22 11:18:26.183 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4617 tasks took 0.30s, total processing time 0.37s
2026-01-22 11:18:30.311 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 4.12s
2026-01-22 11:18:30.930 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4698 tasks took 0.20s, total processing time 0.25s
2026-01-22 11:18:34.529 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.59s
2026-01-22 11:18:35.152 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4617 tasks took 0.22s, total processing time 0.26s
2026-01-22 11:18:38.708 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.55s
2026-01-22 11:18:39.316 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4698 tasks took 0.20s, total processing time 0.25s
2026-01-22 11:18:42.954 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.63s
2026-01-22 11:18:43.574 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4617 tasks took 0.21s, total processing time 0.27s
2026-01-22 11:18:47.100 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.52s
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1769077127.465597   18361 fork_posix.cc:71] Other threads are currently calling into gRPC, skipping fork() handlers
2026-01-22 11:18:47.713 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4698 tasks took 0.20s, total processing time 0.25s
2026-01-22 11:18:51.353 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.64s
2026-01-22 11:18:51.990 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/geometry/bbox_strider.py: 253
                                 [get_all_chunk_bboxes]: mp setup for 4617 tasks took 0.22s, total processing time 0.27s
2026-01-22 11:18:55.517 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 636
                                 [make_tasks_with_checkerboarding]: no mp branch for 9 tasks, total processing time 3.52s
2026-01-22 11:18:55.861 INFO     [PID 18361] zetta_utils /home/nkemnitz/zetta/zetta_utils/zetta_utils/mazepa_layer_processing/common/volumetric_apply_flow.py: 832
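To compare the two runs more systematically than by eye, the bbox_strider lines can be parsed. A small sketch: the regex assumes exactly the message format shown in these logs, and mp_setup_times is a hypothetical helper, not part of zetta_utils.

```python
import re
from typing import Iterable

# Matches the bbox_strider message format in the logs, e.g.
# "[get_all_chunk_bboxes]: mp setup for 4698 tasks took 0.15s, total processing time 0.22s"
MP_SETUP = re.compile(
    r"mp setup for (\d+) tasks took ([\d.]+)s, total processing time ([\d.]+)s"
)


def mp_setup_times(log_lines: Iterable[str]) -> list[tuple[int, float, float]]:
    """Return (task_count, setup_seconds, total_seconds) for each mp setup line."""
    results = []
    for line in log_lines:
        match = MP_SETUP.search(line)
        if match:  # lines such as "no mp branch for 9 tasks" are skipped
            results.append(
                (int(match.group(1)), float(match.group(2)), float(match.group(3)))
            )
    return results
```

Feeding each pasted run through it and comparing, for example, statistics.mean over the setup column gives one number per branch.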

@trivoldus28
Contributor

Are you actually running inference? If so, can you try to replicate the performance bug by removing the threshold for single-threaded chunking?
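That experiment is easier if the threshold is a parameter. A minimal sketch: map_with_threshold and min_parallel are hypothetical names, and the executor factory is injected so the same code can run with ProcessPoolExecutor(mp_context=ctx) in production and with a thread pool in a quick test.

```python
from concurrent.futures import Executor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_with_threshold(
    fn: Callable[[T], R],
    items: Sequence[T],
    min_parallel: int,
    make_executor: Callable[[], Executor],
) -> list[R]:
    """Run `fn` serially for small inputs and in an executor otherwise.

    min_parallel=0 removes the single-threaded shortcut entirely, so every
    call takes the parallel path.
    """
    if len(items) < min_parallel:
        return [fn(item) for item in items]
    with make_executor() as executor:
        return list(executor.map(fn, items))
```

Counting how often make_executor is called is a cheap way to confirm which branch a given spec actually hits.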

