The _process_chunk method checked _last_chunk to determine whether to use Z_FINISH, but _last_chunk wasn't set until after the read thread submitted the final chunk. This caused the last chunk to sometimes be compressed with Z_SYNC_FLUSH instead of Z_FINISH, producing invalid gzip files with unterminated deflate streams (00 00 FF FF marker).

Fix by peeking ahead in _read_file to determine is_last before submitting to the pool, and passing the flag directly to _process_chunk.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
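For reviewers who want to see the symptom in isolation, here is a minimal sketch using only the standard-library zlib module. It does not use this project's code; the helper names raw_deflate and is_terminated are made up for illustration. It shows that a raw deflate stream ended with Z_SYNC_FLUSH finishes on the 00 00 FF FF marker and has no final block, while one ended with Z_FINISH is properly terminated.

```python
import zlib

SYNC_MARKER = b"\x00\x00\xff\xff"


def raw_deflate(data: bytes, flush_mode: int) -> bytes:
    """Compress data as a raw deflate stream, ending it with flush_mode.

    Hypothetical helper for illustration; not part of this project.
    """
    compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush(flush_mode)


def is_terminated(stream: bytes) -> bool:
    """Return True if the raw deflate stream reaches its final block."""
    decompressor = zlib.decompressobj(-zlib.MAX_WBITS)
    decompressor.decompress(stream)
    # eof is only set once the decompressor has seen the end-of-stream block.
    return decompressor.eof


payload = b"example payload " * 64
synced = raw_deflate(payload, zlib.Z_SYNC_FLUSH)
finished = raw_deflate(payload, zlib.Z_FINISH)

print("sync flush ends with marker:", synced.endswith(SYNC_MARKER))
print("sync flush terminated:", is_terminated(synced))
print("finish terminated:", is_terminated(finished))
```

A last chunk compressed the first way leaves the decompressor waiting for more input, which matches the unterminated streams described above.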
Race Condition Bugfix Validation Results

Background

This branch fixes a race condition in [...]. The fix passes an [...]

Test Methodology

A validation test script ([...]
Three test files of varying type and size were used per iteration:
The script runs in a continuous loop until a failure is detected or the process is manually stopped. For each trial, the test was run simultaneously against both the master branch and the bugfix branch using the same worker count. The trial was stopped for the bugfix branch once the master branch was observed to fail.

Two configurations were tested: 20 worker threads and the default worker count (system CPU count).

How Master Branch Fails

When the race condition is triggered on the master branch, the compressed output contains an unterminated deflate stream. During decompression, Python's [...]

Results: 20 Worker Threads
Bugfix total (20 workers): 2,173 iterations, 6,514 file validations, 0 failures

Results: Default Worker Count
Bugfix total (default workers): 7,653 iterations, 22,955 file validations, 0 failures

Summary
Across 9 trial runs, the master branch hung due to gzip corruption in every single trial. The bugfix branch completed all 9,826 iterations (29,469 individual file compress/decompress/validate cycles) with zero failures. The race condition is reliably triggered under load, and the fix eliminates it.
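The actual validation script is not reproduced in this description. As a rough sketch of what one compress/decompress/validate cycle might look like, the following uses the standard-library gzip module as a stand-in for the library under test. The function names, the sample data, and the file name are assumptions for illustration only.

```python
import gzip
import hashlib
import os
import tempfile
from typing import Iterable, Tuple


def validate_round_trip(data: bytes, workdir: str) -> bool:
    """Compress data to a file, read it back, and compare digests.

    Stand-in: stdlib gzip is used here in place of the library under test.
    """
    path = os.path.join(workdir, "sample.gz")
    with gzip.open(path, "wb") as compressed:
        compressed.write(data)
    with gzip.open(path, "rb") as compressed:
        restored = compressed.read()
    return hashlib.sha256(restored).digest() == hashlib.sha256(data).digest()


def run_iterations(samples: Iterable[bytes], iterations: int) -> Tuple[int, int]:
    """Run the cycle over every sample per iteration.

    Returns (validations, failures).
    """
    samples = list(samples)
    validations = 0
    failures = 0
    with tempfile.TemporaryDirectory() as workdir:
        for _ in range(iterations):
            for data in samples:
                validations += 1
                if not validate_round_trip(data, workdir):
                    failures += 1
    return validations, failures


if __name__ == "__main__":
    # Three samples of varying type and size, loosely mirroring the setup above.
    test_samples = [b"", b"text line\n" * 10_000, os.urandom(1 << 16)]
    print(run_iterations(test_samples, iterations=5))
```

A loop like this only counts mismatches. Detecting the hang described above would additionally need a timeout around the decompression step.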
Pathfinder216 left a comment
This looks good. Thanks for figuring this out and fixing it!
* Update github username (#39)
* Update Python versions (#40)
* Update tox.ini for newer file format and new Python versions
* Update GitHub Actions workflow to use newer Python versions
* Update GitHub Actions versions
* Bugfix/fix race condition (#41)
* Fix race condition causing intermittent gzip corruption

The _process_chunk method checked _last_chunk to determine whether to use Z_FINISH, but _last_chunk wasn't set until after the read thread submitted the final chunk. This caused the last chunk to sometimes be compressed with Z_SYNC_FLUSH instead of Z_FINISH, producing invalid gzip files with unterminated deflate streams (00 00 FF FF marker).

Fix by peeking ahead in _read_file to determine is_last before submitting to the pool, and passing the flag directly to _process_chunk.

* Update tests to reflect passing in is_last to _process_chunk

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Update github username (#39)
* Update Python versions (#40)
* Update tox.ini for newer file format and new Python versions
* Update GitHub Actions workflow to use newer Python versions
* Update GitHub Actions versions
* Specify GitHub actions Python versions as strings
* Bugfix/fix race condition (#41)
* Fix race condition causing intermittent gzip corruption

The _process_chunk method checked _last_chunk to determine whether to use Z_FINISH, but _last_chunk wasn't set until after the read thread submitted the final chunk. This caused the last chunk to sometimes be compressed with Z_SYNC_FLUSH instead of Z_FINISH, producing invalid gzip files with unterminated deflate streams (00 00 FF FF marker).

Fix by peeking ahead in _read_file to determine is_last before submitting to the pool, and passing the flag directly to _process_chunk.

* Update tests to reflect passing in is_last to _process_chunk

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* Update how package gets version metadata (#43)
* Update how package gets version metadata
* Add pyproject.toml
* Delete setup.py

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Update github username (#39)
* Update Python versions (#40)
* Update tox.ini for newer file format and new Python versions
* Update GitHub Actions workflow to use newer Python versions
* Update GitHub Actions versions
* Specify GitHub actions Python versions as strings
* Bugfix/fix race condition (#41)
* Fix race condition causing intermittent gzip corruption

The _process_chunk method checked _last_chunk to determine whether to use Z_FINISH, but _last_chunk wasn't set until after the read thread submitted the final chunk. This caused the last chunk to sometimes be compressed with Z_SYNC_FLUSH instead of Z_FINISH, producing invalid gzip files with unterminated deflate streams (00 00 FF FF marker).

Fix by peeking ahead in _read_file to determine is_last before submitting to the pool, and passing the flag directly to _process_chunk.

* Update tests to reflect passing in is_last to _process_chunk

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* Update how package gets version metadata (#43)
* Update how package gets version metadata
* Add pyproject.toml
* Delete setup.py
* Version bump to 2.0.0

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Fix for issue #34
Peek ahead to determine if the next set of bytes is the end of the file, and pass this along sooner in the process of reading in the source file.
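The peek-ahead idea can be sketched roughly as follows. This is not the project's _read_file or _process_chunk implementation. The names iter_chunks_with_last_flag, compress_chunk, and compress_stream are hypothetical, the compression runs sequentially rather than in a worker pool, and it assumes a regular file where an empty read means end of file. The point is that each chunk carries its own is_last flag at submission time, so the flush mode never depends on shared state set later by the reader.

```python
import io
import zlib
from typing import BinaryIO, Iterator, Tuple


def iter_chunks_with_last_flag(
    source: BinaryIO, chunk_size: int
) -> Iterator[Tuple[bytes, bool]]:
    """Yield (chunk, is_last) pairs by reading one chunk ahead."""
    current = source.read(chunk_size)
    while True:
        upcoming = source.read(chunk_size)
        # If nothing follows, the chunk we are holding is the final one.
        is_last = not upcoming
        yield current, is_last
        if is_last:
            return
        current = upcoming


def compress_chunk(chunk: bytes, is_last: bool) -> bytes:
    """Compress one chunk as raw deflate; only the last chunk ends the stream."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    flush_mode = zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH
    return compressor.compress(chunk) + compressor.flush(flush_mode)


def compress_stream(source: BinaryIO, chunk_size: int) -> bytes:
    """Sequential stand-in for the pool: compress chunks and join them in order."""
    return b"".join(
        compress_chunk(chunk, is_last)
        for chunk, is_last in iter_chunks_with_last_flag(source, chunk_size)
    )


if __name__ == "__main__":
    data = b"0123456789" * 1000
    stream = compress_stream(io.BytesIO(data), chunk_size=4096)
    decompressor = zlib.decompressobj(-zlib.MAX_WBITS)
    print(decompressor.decompress(stream) == data, decompressor.eof)
```

Note that an empty source still yields one empty chunk flagged as last, so the output always contains a terminating block.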