Refactor/make pyarrow pq optional #169
Merged
singjc merged 11 commits into PyProphet:master on Oct 23, 2025
…ndling for parquet support
…utility functions
… macOS, and Windows
This pull request refactors how Parquet support is handled in the codebase, making `pyarrow` an optional dependency and ensuring it is only imported when needed. This improves installation flexibility and spares users who do not need Parquet functionality an unnecessary dependency. The changes also simplify the dispatcher logic for Parquet readers and writers, and update the build scripts to exclude optional modules from bundled distributions.

Dependency and packaging changes:
- `pyarrow` is now an optional dependency: it is removed from the main install requirements in `pyproject.toml`, and a new `parquet` extra is added for optional installation.
- The build scripts (`build_linux.sh`, `build_macos.sh`, `build_windows.bat`) explicitly exclude `pyarrow` and other optional/dev modules from PyInstaller builds, reducing package size and avoiding unnecessary dependencies.

Codebase refactoring for Parquet support:
- `pyprophet/io/dispatcher.py` now uses new utility functions that lazily import the correct classes only when needed, rather than importing all Parquet modules up front.
- `_ensure_pyarrow` and related utility functions in `pyprophet/io/util.py` handle lazy importing of `pyarrow`, provide user-friendly error messages if it is missing, and centralize the logic for resolving the correct Parquet reader/writer class for a given config.

Module-level import improvements:
- Module-level imports now go through `_ensure_pyarrow()` for lazy importing, preventing an `ImportError` at module load time if `pyarrow` is not installed. This change is reflected in all `parquet.py` and `split_parquet.py` files across the different IO areas.

Documentation updates:
- `pyprophet/io/__init__.py` now clarifies that `pyarrow` is optional and only required for Parquet support.

Parquet utility enhancements:
Let me know if you have questions about how lazy importing works or how these changes affect the installation and usage of Parquet features!
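
For readers unfamiliar with the pattern, here is a minimal sketch of lazy importing with a friendly error message, plus a registry that resolves a class only when it is requested. It is not the code from this PR: `ensure_module`, `ensure_pyarrow`, `_LAZY_CLASSES`, and `resolve_class` are hypothetical names, the registry points at standard-library classes so the example runs without `pyarrow`, and the `pip install "pyprophet[parquet]"` hint assumes the `parquet` extra described above.

```python
import importlib


def ensure_module(module_name, extra):
    """Import module_name on first use; raise a helpful ImportError if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Assumed install hint, based on the 'parquet' extra named in the description
        raise ImportError(
            f"'{module_name}' is required for this feature but is not installed. "
            f'Try: pip install "pyprophet[{extra}]"'
        ) from exc


def ensure_pyarrow():
    """Hypothetical counterpart of the PR's _ensure_pyarrow helper."""
    return ensure_module("pyarrow", "parquet")


# Hypothetical registry of (module path, class name) pairs.
# Nothing is imported when this dict is built; stdlib targets keep the sketch runnable.
_LAZY_CLASSES = {
    "decoder": ("json", "JSONDecoder"),
    "encoder": ("json", "JSONEncoder"),
}


def resolve_class(kind, extra="parquet"):
    """Look up a class by key and import its module only at this point."""
    module_name, class_name = _LAZY_CLASSES[kind]
    module = ensure_module(module_name, extra)
    return getattr(module, class_name)
```

Because the import happens inside the function body, a module that defines such helpers can be loaded without the optional dependency; the error surfaces only when a Parquet code path is actually exercised.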