Refactor: Move parquet metadata parsing code into its own module #8436
// specific language governing permissions and limitations
// under the License.

//! Internal metadata parsing routines
this code is just moved, with minimal changes, from the reader.rs module
self.parse_column_index(&bytes, offset)?;
self.parse_offset_index(&bytes, offset)?;
parse_column_index(metadata, self.column_index, &bytes, offset)?;
these functions need access to some state on self, so I passed them as explicit arguments instead
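As a rough illustration of the pattern described in the comment above (not the actual parquet code), a method that read state from `self` can become a free function that takes that state as arguments. All types here (`Metadata`, `IndexPolicy`, `Reader`, `index_range`) are hypothetical stand-ins, and the "parsing" is only a byte copy:

```rust
// Hypothetical stand-ins for illustration; these are NOT the parquet crate's types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum IndexPolicy {
    Skip,
    Required,
}

#[derive(Debug, Default)]
struct Metadata {
    /// Absolute (start, length) of the index within the file, if known.
    index_range: Option<(u64, u64)>,
    /// Toy "parsed" result: just the raw index bytes.
    column_index: Option<Vec<u8>>,
}

/// Free function: everything it needs arrives as an argument, so it can live
/// in a different module from the reader that calls it.
/// `bytes` is a buffer whose first byte sits at file offset `start_offset`.
fn parse_column_index(
    metadata: &mut Metadata,
    policy: IndexPolicy,
    bytes: &[u8],
    start_offset: u64,
) -> Result<(), String> {
    if policy == IndexPolicy::Skip {
        return Ok(());
    }
    let Some((index_start, index_len)) = metadata.index_range else {
        return Ok(());
    };
    // Translate the absolute file position into a position inside `bytes`.
    let rel = index_start
        .checked_sub(start_offset)
        .ok_or_else(|| "index starts before buffer".to_string())? as usize;
    let end = rel
        .checked_add(index_len as usize)
        .ok_or_else(|| "index range overflows".to_string())?;
    let slice = bytes
        .get(rel..end)
        .ok_or_else(|| "index extends past buffer".to_string())?;
    metadata.column_index = Some(slice.to_vec());
    Ok(())
}

struct Reader {
    column_index: IndexPolicy,
    metadata: Metadata,
}

impl Reader {
    /// The reader keeps its state but delegates parsing, passing the pieces
    /// of `self` that the old method used to reach for directly.
    fn read_page_indexes(&mut self, bytes: &[u8], offset: u64) -> Result<(), String> {
        parse_column_index(&mut self.metadata, self.column_index, bytes, offset)
    }
}
```

Because the two arguments come from disjoint fields of `self`, the borrow checker accepts the call without any cloning.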
Also looks good. I'll note that most of this code is going to move again as part of the thrift remodel as it needs to be closer to the private thrift structs.
Thanks @alamb
Looks good @alamb, just a minor nit.
Co-authored-by: Matthijs Brobbel <m1brobbel@gmail.com>
Ouch time 😅
As in this caused merge conflicts with the thrift-remodel branch?
Precisely 😁 Took me two tries, but I got it merged. But that's what I signed up for. Actually I've been amazed so far by the lack of conflicts.
Ideally the code to parse thrift should be pretty isolated (though I realize that is not the current state of main 😆). I feel like we are finally getting to the point where parsing / decoding and IO are all separated nicely, which I think sets us up very nicely for additional performance improvements.
Which issue does this PR close?
Note: while this is a large (in line count) code change, it should be relatively easy to review, as it is just moving code around.
Rationale for this change
In #8340 I am trying to split the "IO" from the "where is the metadata in the file" from the "decode thrift into Rust structures" logic. The first part of this is simply to move the code that handles the "decode thrift into Rust structures" into its own module.
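A minimal sketch of that three-way split, assuming only the standard Parquet footer layout (a 4-byte little-endian metadata length followed by the `PAR1` magic at the end of the file). The function names are hypothetical, the "file" is an in-memory slice, and `decode_metadata` is a placeholder that reads UTF-8 rather than thrift:

```rust
use std::ops::Range;

const FOOTER_SIZE: usize = 8;
const MAGIC: [u8; 4] = *b"PAR1";

/// Layer 1, IO: fetch a byte range. Here the "file" is just a slice in memory.
fn fetch(file: &[u8], range: Range<usize>) -> Result<&[u8], String> {
    file.get(range).ok_or_else(|| "range out of bounds".to_string())
}

/// Layer 2, "where is the metadata in the file": pure arithmetic on the
/// 8-byte footer, no IO and no decoding.
fn locate_metadata(file_len: usize, footer: &[u8]) -> Result<Range<usize>, String> {
    if footer.len() != FOOTER_SIZE || footer[4..] != MAGIC {
        return Err("invalid footer".to_string());
    }
    let len = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as usize;
    let end = file_len
        .checked_sub(FOOTER_SIZE)
        .ok_or_else(|| "file too small".to_string())?;
    let start = end
        .checked_sub(len)
        .ok_or_else(|| "metadata length exceeds file size".to_string())?;
    Ok(start..end)
}

/// Layer 3, decode: turn bytes into a Rust value.
/// Placeholder only: real metadata is thrift-encoded, not UTF-8 text.
fn decode_metadata(bytes: &[u8]) -> Result<String, String> {
    String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())
}
```

With the layers separated like this, the locate and decode steps can be tested without touching a file, and the IO step can be swapped (sync, async, object store) without changing the other two.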
What changes are included in this PR?
parquet/src/file/metadata/mod.rs to parquet/src/file/metadata/parser.rs
Are these changes tested?
yes, by CI
Are there any user-facing changes?
No, this is entirely internal reorganization