
Bugfix/fix race condition#41

Merged
bguise987 merged 3 commits into develop from bugfix/fix-race-condition
Mar 11, 2026

Conversation

@bguise987
Owner

Fix for issue #34

Peek ahead to determine if the next set of bytes is the end of the file, and pass this along sooner in the process of reading in the source file.

bguise987 and others added 3 commits January 22, 2026 17:34
The _process_chunk method checked _last_chunk to determine whether to
use Z_FINISH, but _last_chunk wasn't set until after the read thread
submitted the final chunk. This caused the last chunk to sometimes be
compressed with Z_SYNC_FLUSH instead of Z_FINISH, producing invalid
gzip files with unterminated deflate streams (00 00 FF FF marker).

Fix by peeking ahead in _read_file to determine is_last before
submitting to the pool, and passing the flag directly to _process_chunk.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@bguise987
Owner Author

Race Condition Bugfix Validation Results

Background

This branch fixes a race condition in pigz_python where the _process_chunk method checked _last_chunk to determine whether to use Z_FINISH, but _last_chunk wasn't set until after the read thread submitted the final chunk. This caused the last chunk to sometimes be compressed with Z_SYNC_FLUSH instead of Z_FINISH, producing invalid gzip files with unterminated deflate streams.

The fix passes an is_last flag directly to _process_chunk by peeking ahead in _read_file, eliminating the race.
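The peek-ahead approach can be sketched as follows. This is a minimal illustration, not the branch's actual code: `read_chunks_with_lookahead` and `process_chunk` are hypothetical names standing in for `_read_file` and `_process_chunk`, and the single sequential compressor is a simplification of pigz_python's worker-pool pipeline.

```python
import zlib


def read_chunks_with_lookahead(fileobj, chunk_size=128 * 1024):
    """Yield (chunk, is_last) pairs, reading one chunk ahead so the
    final chunk is identified *before* it is handed to a worker."""
    prev = fileobj.read(chunk_size)
    while prev:
        nxt = fileobj.read(chunk_size)
        yield prev, not nxt  # is_last is decided here, not via shared state
        prev = nxt


def process_chunk(compressor, chunk, is_last):
    """Compress one chunk; the flag travels with the chunk, so no
    racy shared attribute needs to be consulted."""
    out = compressor.compress(chunk)
    out += compressor.flush(zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH)
    return out
```

Because the flag is attached to each chunk at read time, the final flush is guaranteed to be Z_FINISH regardless of thread scheduling.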

Test Methodology

A validation test script (compression_validation_test.py) was used to repeatedly:

  1. Calculate the MD5 hash of an original file
  2. Compress it using pigz_python
  3. Decompress the resulting .gz file using Python's gzip module
  4. Calculate the MD5 hash of the decompressed file
  5. Compare the two hashes

Three test files of varying type and size were used per iteration:

  • text file
  • PDF
  • binary

The script runs in a continuous loop until a failure is detected or the process is manually stopped.
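The five per-iteration steps can be sketched like this. Names here are illustrative rather than the validation script's actual API; in particular, `compress` stands in for whatever callable invokes pigz_python and returns the path of the resulting `.gz` file.

```python
import gzip
import hashlib


def md5_of(path):
    """MD5 of a file, read in 1 MiB blocks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def validate_roundtrip(original_path, compress):
    """Steps 1-5: hash, compress, gunzip, hash again, compare."""
    before = md5_of(original_path)                  # 1. hash the original
    gz_path = compress(original_path)               # 2. compress it
    restored_path = original_path + ".restored"
    with gzip.open(gz_path, "rb") as src, open(restored_path, "wb") as dst:
        while block := src.read(1 << 20):           # 3. decompress with gzip
            dst.write(block)
    return md5_of(restored_path) == before          # 4-5. hash and compare
```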

For each trial, the test was run simultaneously against both the master branch and the bugfix branch using the same worker count. The trial was stopped for the bugfix branch once the master branch was observed to fail. Two configurations were tested: 20 worker threads and the default worker count (system CPU count).

How Master Branch Fails

When the race condition is triggered on the master branch, the compressed output contains an unterminated deflate stream. During decompression, Python's gzip module never sees the end-of-stream marker, so the read blocks and the trial hangs.
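This failure mode can be reproduced with zlib alone. The sketch below builds a raw deflate stream whose final flush is Z_SYNC_FLUSH instead of Z_FINISH, producing the 00 00 FF FF marker mentioned in the commit message and a stream that never signals end-of-data:

```python
import zlib

payload = b"example payload"
comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw deflate, no gzip header
unterminated = comp.compress(payload) + comp.flush(zlib.Z_SYNC_FLUSH)

# A sync flush ends with an empty stored block: the 00 00 FF FF marker.
assert unterminated.endswith(b"\x00\x00\xff\xff")

# The data is all recoverable, but the decompressor never reaches
# end-of-stream, so a consumer waiting for EOF blocks indefinitely.
decomp = zlib.decompressobj(-15)
assert decomp.decompress(unterminated) == payload
assert not decomp.eof
```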

Results: 20 Worker Threads

| Trial | Bugfix Iterations | Bugfix Failures | Master Iterations Before Hang | Master Failed |
|-------|-------------------|-----------------|-------------------------------|---------------|
| 1     | 463               | 0               | 401                           | Yes (hung)    |
| 2     | 366               | 0               | 41                            | Yes (hung)    |
| 3     | 67                | 0               | 17                            | Yes (hung)    |
| 4     | 96                | 0               | 57                            | Yes (hung)    |
| 5     | 1,181             | 0               | 146                           | Yes (hung)    |

Bugfix total (20 workers): 2,173 iterations, 6,514 file validations, 0 failures

Results: Default Worker Count

| Trial | Bugfix Iterations | Bugfix Failures | Master Iterations Before Hang | Master Failed |
|-------|-------------------|-----------------|-------------------------------|---------------|
| 1     | 37                | 0               | 37                            | Yes (hung)    |
| 2     | 2,209             | 0               | 1,734                         | Yes (hung)    |
| 3     | 805               | 0               | 241                           | Yes (hung)    |
| 4     | 4,602             | 0               | 3,910                         | Yes (hung)    |

Bugfix total (default workers): 7,653 iterations, 22,955 file validations, 0 failures

Summary

| Metric                 | Bugfix Branch | Master Branch                             |
|------------------------|---------------|-------------------------------------------|
| Total iterations run   | 9,826         | 6,584                                     |
| Total file validations | 29,469        | 19,730                                    |
| Failures               | 0             | 9 (all trials)                            |
| Failure mode           | N/A           | Hang during decompression of corrupt gzip |

Across 9 trial runs, the master branch hung due to gzip corruption in every single trial. The bugfix branch completed all 9,826 iterations (29,469 individual file compress/decompress/validate cycles) with zero failures. The race condition is reliably triggered under load, and the fix eliminates it.


@Pathfinder216 Pathfinder216 left a comment


This looks good. Thanks for figuring this out and fixing it!


@coreyhartley coreyhartley left a comment


lgtm

@bguise987 bguise987 merged commit 9a52847 into develop Mar 11, 2026
6 checks passed
bguise987 added a commit that referenced this pull request Mar 11, 2026
* Update github username (#39)

* Update Python versions (#40)

* Update tox.ini for newer file format and new Python versions

* Update GitHub Actions workflow to use newer Python versions

* Update GitHub Actions versions

* Bugfix/fix race condition (#41)

* Fix race condition causing intermittent gzip corruption

The _process_chunk method checked _last_chunk to determine whether to
use Z_FINISH, but _last_chunk wasn't set until after the read thread
submitted the final chunk. This caused the last chunk to sometimes be
compressed with Z_SYNC_FLUSH instead of Z_FINISH, producing invalid
gzip files with unterminated deflate streams (00 00 FF FF marker).

Fix by peeking ahead in _read_file to determine is_last before
submitting to the pool, and passing the flag directly to _process_chunk.

* Update tests to reflect passing in is_last to _process_chunk

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
bguise987 added a commit that referenced this pull request Apr 2, 2026
* Update github username (#39)

* Update Python versions (#40)

* Update tox.ini for newer file format and new Python versions

* Update GitHub Actions workflow to use newer Python versions

* Update GitHub Actions versions

* Specify GitHub actions Python versions as strings

* Bugfix/fix race condition (#41)

* Fix race condition causing intermittent gzip corruption

The _process_chunk method checked _last_chunk to determine whether to
use Z_FINISH, but _last_chunk wasn't set until after the read thread
submitted the final chunk. This caused the last chunk to sometimes be
compressed with Z_SYNC_FLUSH instead of Z_FINISH, producing invalid
gzip files with unterminated deflate streams (00 00 FF FF marker).

Fix by peeking ahead in _read_file to determine is_last before
submitting to the pool, and passing the flag directly to _process_chunk.

* Update tests to reflect passing in is_last to _process_chunk

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* Update how package gets version metadata (#43)

* Update how package gets version metadata

* Add pyproject.toml

* Delete setup.py

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
bguise987 added a commit that referenced this pull request Apr 2, 2026
* Update github username (#39)

* Update Python versions (#40)

* Update tox.ini for newer file format and new Python versions

* Update GitHub Actions workflow to use newer Python versions

* Update GitHub Actions versions

* Specify GitHub actions Python versions as strings

* Bugfix/fix race condition (#41)

* Fix race condition causing intermittent gzip corruption

The _process_chunk method checked _last_chunk to determine whether to
use Z_FINISH, but _last_chunk wasn't set until after the read thread
submitted the final chunk. This caused the last chunk to sometimes be
compressed with Z_SYNC_FLUSH instead of Z_FINISH, producing invalid
gzip files with unterminated deflate streams (00 00 FF FF marker).

Fix by peeking ahead in _read_file to determine is_last before
submitting to the pool, and passing the flag directly to _process_chunk.

* Update tests to reflect passing in is_last to _process_chunk

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* Update how package gets version metadata (#43)

* Update how package gets version metadata

* Add pyproject.toml

* Delete setup.py

* Version bump to 2.0.0

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>