Refactor/make pyarrow pq optional #169

Merged
singjc merged 11 commits into PyProphet:master from singjc:refactor/make_pyarrow_pq_optional
Oct 23, 2025
singjc (Contributor) commented Oct 23, 2025

This pull request refactors how Parquet support is handled in the codebase, making pyarrow an optional dependency and ensuring it is only imported when needed. This improves installation flexibility and reduces unnecessary dependencies for users who do not require Parquet functionality. The changes also simplify the dispatcher logic for Parquet readers and writers, and update build scripts to exclude optional modules from bundled distributions.

Dependency and packaging changes:

  • Made pyarrow an optional dependency by removing it from the main install requirements in pyproject.toml and adding a new parquet extra for optional installation.
  • Updated build scripts (build_linux.sh, build_macos.sh, build_windows.bat) to explicitly exclude pyarrow and other optional/dev modules from PyInstaller builds, reducing package size and avoiding unnecessary dependencies.
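
As a rough illustration of what an optional dependency means for calling code, the sketch below checks whether pyarrow is importable without actually importing it, and builds an install hint. It is not code from this PR: the function names are hypothetical, and the install command assumes the distribution is named `pyprophet` and the extra is named `parquet`.

```python
import importlib.util


def optional_module_available(module_name: str = "pyarrow") -> bool:
    """Return True if a top-level module can be located, without importing it."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


def install_hint(extra: str = "parquet", package: str = "pyprophet") -> str:
    """Build the pip command for an extra (package and extra names are assumptions)."""
    return f'pip install "{package}[{extra}]"'


if not optional_module_available("pyarrow"):
    print(f"Parquet support is not installed. Try: {install_hint()}")
```

A check like this lets a command-line tool report missing Parquet support up front instead of failing partway through a run.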

Codebase refactoring for Parquet support:

  • Refactored Parquet reader/writer imports in pyprophet/io/dispatcher.py to use new utility functions that lazily import the correct classes only when needed, rather than importing all Parquet modules up front.
  • Added _ensure_pyarrow and related utility functions in pyprophet/io/util.py to handle lazy importing of pyarrow, provide user-friendly error messages if it is missing, and centralize logic for resolving the correct Parquet reader/writer class for a given config.
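
The general shape of such a lazy-import helper and class resolver could look like the sketch below. This is an assumption-laden illustration rather than the code in pyprophet/io/util.py: the names `ensure_module` and `resolve_class`, the registry layout, and the error wording are all hypothetical, and the registry uses standard-library classes as stand-ins so that the example runs without pyarrow.

```python
import importlib
from types import ModuleType


def ensure_module(name: str = "pyarrow", extra: str = "parquet") -> ModuleType:
    """Import a module on demand, raising a readable error if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Hypothetical message; the package and extra names are assumptions.
        raise ImportError(
            f"This feature requires '{name}'. "
            f'Install it with: pip install "pyprophet[{extra}]"'
        ) from exc


# Hypothetical registry mapping a config key to "module.path:ClassName".
# Standard-library classes stand in for Parquet reader/writer classes.
EXAMPLE_REGISTRY = {
    ("json", "decoder"): "json.decoder:JSONDecoder",
    ("json", "encoder"): "json.encoder:JSONEncoder",
}


def resolve_class(key, registry=EXAMPLE_REGISTRY, required: str = "json"):
    """Check the optional dependency, then import only the class that is needed."""
    ensure_module(required)
    module_path, class_name = registry[key].split(":")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


decoder_cls = resolve_class(("json", "decoder"))
print(decoder_cls.__name__)
```

Storing class locations as strings keeps the dispatcher free of top-level imports, so a missing optional dependency only matters when a Parquet-backed class is actually requested.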

Module-level import improvements:

  • Updated all Parquet-related modules to use _ensure_pyarrow() for lazy importing, preventing ImportError at module load time if pyarrow is not installed. This change is reflected in all parquet.py and split_parquet.py files across the different IO areas.
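
To show why moving the import out of module scope matters, here is a minimal sketch of a writer whose module loads and whose class instantiates without pyarrow, failing only when a Parquet operation is actually attempted. The helper `ensure_pyarrow_sketch` and the class `ParquetWriterSketch` are hypothetical; the real `_ensure_pyarrow()` may have a different signature and return value.

```python
import importlib


def ensure_pyarrow_sketch():
    """Import pyarrow and pyarrow.parquet at call time (hypothetical helper)."""
    try:
        pa = importlib.import_module("pyarrow")
        pq = importlib.import_module("pyarrow.parquet")
    except ImportError as exc:
        raise ImportError(
            "Parquet support requires pyarrow, which is not installed."
        ) from exc
    return pa, pq


class ParquetWriterSketch:
    """Writer that touches pyarrow only inside write(), never at import time."""

    def __init__(self, path: str):
        # No pyarrow needed to construct the object.
        self.path = path

    def write(self, columns: dict) -> None:
        # The import happens here, so users without pyarrow see an error
        # only if they actually request Parquet output.
        pa, pq = ensure_pyarrow_sketch()
        table = pa.table(columns)
        pq.write_table(table, self.path)


writer = ParquetWriterSketch("scores.parquet")  # works with or without pyarrow
```

Calling `writer.write({"score": [0.1, 0.9]})` would then either write the file or raise the readable ImportError, depending on whether pyarrow is installed.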

Documentation updates:

  • Updated documentation in pyprophet/io/__init__.py to clarify that pyarrow is now optional and only required for Parquet support.

Parquet utility enhancements:

  • Modified Parquet utility functions to use lazy imports and improve error handling, so that users get a clear error if they request Parquet support without the necessary dependencies.

Let me know if you have questions about how lazy importing works or how these changes affect the installation and usage of Parquet features!

@singjc singjc merged commit f8fcfbd into PyProphet:master Oct 23, 2025
4 checks passed
@singjc singjc deleted the refactor/make_pyarrow_pq_optional branch October 23, 2025 13:38