* Copy page stats from page headers to chunk metadata so they end up in the file trailer.
* Add a test case to verify the presence of statistics in a generated file.
Hello @zsolt-haraszti, thanks for submitting a pull request.

The statistics are not generated unless the application opts in to do so (line 1098 in 3f82efc). We made it optional because the latest Parquet format recommends not writing page statistics as part of the page header, since the page index has the same information in a more usable form. This means that, in its current form, the option would also control the creation of column chunk statistics.

On a different note, I believe the change you submitted may have an issue with the correctness of the statistics set on the column chunk metadata. There may be multiple pages per column chunk, so the column chunk statistics should reflect the aggregate over all pages rather than hold the statistics of the last page written to the chunk.

Let me know if you have any comments on the feedback.
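To illustrate the aggregation point, here is a minimal sketch in Go. The `pageStats` type and the `aggregateChunkStats` function are hypothetical names made up for this example; they are not part of the library's API, and the sketch assumes an int64 column where every page contains at least one non-null value.

```go
package main

// pageStats is a hypothetical, simplified stand-in for the statistics
// carried in a page header. It is not a type from the library.
type pageStats struct {
	Min, Max  int64
	NullCount int64
}

// aggregateChunkStats folds the statistics of every page in a column chunk
// into a single value: the smallest Min, the largest Max, and the sum of
// the null counts. It returns ok=false when there are no pages.
func aggregateChunkStats(pages []pageStats) (chunk pageStats, ok bool) {
	if len(pages) == 0 {
		return pageStats{}, false
	}
	// Start from the first page, then widen the bounds with each later page.
	chunk = pages[0]
	for _, p := range pages[1:] {
		if p.Min < chunk.Min {
			chunk.Min = p.Min
		}
		if p.Max > chunk.Max {
			chunk.Max = p.Max
		}
		chunk.NullCount += p.NullCount
	}
	return chunk, true
}
```

With pages whose bounds are [5, 9], [-2, 3], and [4, 20], copying only the last page would record [4, 20] on the chunk, while the aggregate is [-2, 20]. A real implementation would also need to handle pages that contain only nulls and value types with non-trivial ordering.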