Skip to content

Update ParquetSharp to Reflect changes in Arrow APIs and Type Defintions #524

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

JoshuaOloton
Copy link
Contributor

Description

This PR updates ParquetSharp to maintain compatibility with recent changes introduced in Arrow. The deprecated overloads of GetRecordBatchReader and the PARQUET_2_0 enum value have been removed in Arrow 19+, along with some other modifications (apache/arrow#44990, apache/arrow#45849, apache/arrow#45375 and apache/arrow#46132).

Checklist

  • Migrated to arrow::Result-based APIs (PARQUET_ASSIGN_OR_THROW) as the old Status-based methods were removed.
  • Refactored usage of GetRecordBatchReader to align with Arrow's updated method definitions.
  • Updated enums and added Variant LogicalType.

Related Issues

@adamreeve
Copy link
Contributor

Hi @JoshuaOloton. Thanks for this PR. We won't want to merge this yet though, as it only makes sense to make these changes when we upgrade the version of Arrow C++ we are using. That will need to wait until Arrow 20.0.0 is in vcpkg (microsoft/vcpkg#45244).

The removal of DeepClone also looks concerning. I don't think it's valid to change our deep clone implementations to just create new shared pointer copies.

@JoshuaOloton
Copy link
Contributor Author

That makes sense @adamreeve ,thanks for the feedback! I understand your concerns on Deepclone, I'm open to further input on how to best implement its update.

@adamreeve
Copy link
Contributor

Looking at the PR that removed the DeepClone methods, it looks like this was previously required as encryption keys would be wiped out after use, but that is no longer the case. So it is now safe to reuse these classes without needing to clone them.

I think we should remove the C++ deep clone methods and change the C# DeepClone methods to return this, and add an Obsolete attribute on them with an explanation that they aren't needed any more.

@adamreeve
Copy link
Contributor

I've included most of your changes in #528, thanks @JoshuaOloton.

Note that this doesn't include adding the variant logical type as that wasn't included in Parquet 20.0.0.

@adamreeve adamreeve closed this May 23, 2025
@JoshuaOloton
Copy link
Contributor Author

That's great to hear @adamreeve, thanks for the update! Noted on the variant logical type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: Build failure due to updated Arrow APIs
2 participants