This repository was archived by the owner on Aug 21, 2024. It is now read-only.
Draft: Proposal to enable ZST encoded archives loaded by the transformer #574
78910b9 to
be59a00
miyunari reviewed on Jun 2, 2023
  - name: Upload to S3
    run: |
-     s3cmd sync --acl-public --delete-removed --mime-type application/json \
+     s3cmd sync --acl-public --mime-type application/json \
Member
I see why you think this could be a good idea. But we actually need to remove the files that are not uploaded, otherwise we can't filter things in the transformer, since they will still be on S3 :) WDYT?
Member
Author
Ah I see, I had no idea. Yes that makes sense to me.
The retriever has been switched to provide zstandard-encoded raw data instead of plain JSON.
Because zstandard is a compressed format, the data can no longer be read like plain JSON.
To enable the transformer to work with the new data we need to add the proper file conversion.
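As a rough illustration of the kind of conversion meant here (not the actual change in this PR), the sketch below checks for the zstandard frame magic number and only decompresses when it is present, so plain JSON files keep working. The names `is_zstd` and `load_raw` are hypothetical, and the decompressor is passed in as a callable so the sketch does not depend on any particular zstd library.

```python
import json
from typing import Any, Callable

# Every zstandard frame starts with these four bytes (0xFD2FB528, little-endian).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def is_zstd(data: bytes) -> bool:
    """Return True if the payload starts with the zstandard frame magic number."""
    return data[:4] == ZSTD_MAGIC


def load_raw(data: bytes, decompress: Callable[[bytes], bytes]) -> Any:
    """Parse a raw payload as JSON, decompressing it first if it is zstd-encoded.

    `decompress` is an assumption of this sketch: any function that turns a
    zstd frame into the original bytes, supplied by whichever zstd library
    the transformer ends up using.
    """
    if is_zstd(data):
        data = decompress(data)
    return json.loads(data)
```

Passing the decompressor in keeps the format check separate from the library choice, which seems useful while the approach is still up for debate.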
These changes are up for debate. However, we need to fix this within the day, as the scheduled transformer run for tomorrow morning will otherwise break the data again.
I've also removed the `--delete-removed` flag from our S3 upload, as it deletes files that are present in the S3 bucket but not in the upload folder.
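To make the trade-off around the delete flag concrete, here is a simplified model of a sync with and without deletion. It is a sketch only: `plan_sync` is a hypothetical helper, file names stand in for S3 keys, and every local file is treated as needing upload, which a real sync tool would not do.

```python
def plan_sync(
    local: set[str], remote: set[str], delete_removed: bool
) -> tuple[set[str], set[str]]:
    """Return (files to upload, remote files to delete) for a one-way sync."""
    to_upload = set(local)
    # With deletion enabled, anything on the remote that is missing locally is removed.
    to_delete = remote - local if delete_removed else set()
    return to_upload, to_delete


def remote_after(local: set[str], remote: set[str], delete_removed: bool) -> set[str]:
    """Remote contents once the planned sync has been applied."""
    to_upload, to_delete = plan_sync(local, remote, delete_removed)
    return (remote - to_delete) | to_upload
```

In this model, dropping the deletion step leaves files on the remote that are no longer part of the upload, which is the situation the review comment warns about.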