Add script for compression + pickle deletion #1095
Merged
Conversation
…or max speed because this is slow as hell
nemacysts
approved these changes
Feb 24, 2026
Member
nemacysts
left a comment
(i'm assuming this is mostly temporary, and the general skeleton looks fine to me!)
```python
# Max DynamoDB object size is 400KB. Since we save two copies of the object (pickled and JSON),
# we need to consider this max size applies to the entire item, so we use a max size of 200KB
# for each version.
OBJECT_SIZE = 150_000
```
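For context, a minimal sketch of how compressed state could be split into partitions that respect this per-item budget (the helper names `partition_state`/`restore_state` and the gzip-over-JSON encoding are assumptions for illustration, not the script's actual code):

```python
import gzip
import json

# Assumed per-item budget, mirroring the OBJECT_SIZE constant above.
OBJECT_SIZE = 150_000

def partition_state(state: dict) -> list[bytes]:
    """Gzip-compress a JSON-serializable state dict and split the
    resulting blob into chunks small enough for one DynamoDB item."""
    blob = gzip.compress(json.dumps(state).encode("utf-8"))
    return [blob[i:i + OBJECT_SIZE] for i in range(0, len(blob), OBJECT_SIZE)]

def restore_state(chunks: list[bytes]) -> dict:
    """Reassemble the chunks and decompress back into the state dict."""
    return json.loads(gzip.decompress(b"".join(chunks)))
```

Each chunk would be written as its own item, with the chunk count stored alongside so a reader knows how many partitions to fetch before reassembling.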
Member
i was gonna ask if we should grab this from tron/serialize/runstate/dynamodb_state_store.py - but i assume we'll delete this once we're done so it doesn't really matter?
Script for migrating Tron's DynamoDB state store to gzip-compressed binary. I've added 3 commands:

- status: Scans the table and reports how many keys need migration.
- compress: Gzip-compresses items/partitions using TransactWriteItems. Initially this was slow enough that I wanted to do it while Tron was running, so I added some ConditionExpression guards to help prevent conflicting writes. I've since sped this up quite a bit, so if we really want we can take Tron down while this runs and it shouldn't be too terrible, though the most heavily written jobs are already migrated so this is mostly just historic runs/less frequent jobs.
- delete-pickles: Removes pickle data (val, num_partitions). This will run once we've stopped writing pickles in #TODO

Testing

--keys ahead of time a few at a time, or do so after. I don't think it makes a difference.

General plan:
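As a rough illustration of what the status scan could check per item (a sketch: the pickle attributes val/num_partitions come from the description above, while raw_val is a hypothetical name for the compressed copy, not verified against the script):

```python
def needs_migration(item: dict) -> bool:
    """Return True if an item still carries pickle data but no compressed copy."""
    # "val"/"num_partitions" hold the pickled data per the PR description;
    # "raw_val" is a hypothetical attribute name for the gzip-compressed copy.
    has_pickle = "val" in item or "num_partitions" in item
    has_compressed = "raw_val" in item
    return has_pickle and not has_compressed

def status(items: list[dict]) -> dict:
    """Summarize a table scan: how many keys are pending vs. already migrated."""
    pending = sum(needs_migration(item) for item in items)
    return {"total": len(items), "pending": pending, "migrated": len(items) - pending}
```

In the real script this predicate would run over a paginated table scan rather than an in-memory list, but the per-item decision is the same shape.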