From 1602e8e98b3c2adb9e0f4898633e8a7cf5a850d7 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Sun, 14 Sep 2025 23:03:36 +0200 Subject: [PATCH 1/3] Polars: Index page --- docs/connect/df/index.md | 63 ++-------------------------------- docs/integrate/index.md | 1 + docs/integrate/polars/index.md | 63 ++++++++++++++++++++++++++++++++++ 3 files changed, 66 insertions(+), 61 deletions(-) create mode 100644 docs/integrate/polars/index.md diff --git a/docs/connect/df/index.md b/docs/connect/df/index.md index 7313a199..50cc9e01 100644 --- a/docs/connect/df/index.md +++ b/docs/connect/df/index.md @@ -86,67 +86,10 @@ and operations for manipulating numerical tables and time series. - [From data storage to data analysis: Tutorial on CrateDB and pandas] - -(polars)= ## Polars - -:::{rubric} About -::: - -```{div} -:style: "float: right; margin-left: 0.5em" -[![](https://github.com/pola-rs/polars-static/raw/master/logos/polars-logo-dark.svg){w=180px}](https://pola.rs/) -``` - -[Polars] is a blazingly fast DataFrames library with language bindings for -Rust, Python, Node.js, R, and SQL. Polars is powered by a multithreaded, -vectorized query engine, it is open source, and written in Rust. - -- **Fast:** Written from scratch in Rust and with performance in mind, - designed close to the machine, and without external dependencies. - -- **I/O:** First class support for all common data storage layers: local, - cloud storage & databases. - -- **Intuitive API:** Write your queries the way they were intended. Polars, - internally, will determine the most efficient way to execute using its query - optimizer. Polars' expressions are intuitive and empower you to write - readable and performant code at the same time. - -- **Out of Core:** The streaming API allows you to process your results without - requiring all your data to be in memory at the same time. - -- **Parallel:** Polars' multi-threaded query engine utilises the power of your - machine by dividing the workload among the available CPU cores without any - additional configuration. - -- **Vectorized Query Engine:** Uses [Apache Arrow], a columnar data format, to - process your queries in a vectorized manner and SIMD to optimize CPU usage. - This enables cache-coherent algorithms and high performance on modern processors. - -- **Open Source:** Polars is and always will be open source. Driven by an active - community of developers. Everyone is encouraged to add new features and contribute. - It is free to use under the MIT license. - -:::{rubric} Data formats -::: - -Polars supports reading and writing to many common data formats. -This allows you to easily integrate Polars into your existing data stack. - -- Text: CSV & JSON -- Binary: Parquet, Delta Lake, AVRO & Excel -- IPC: Feather, Arrow -- Databases: MySQL, Postgres, SQL Server, Sqlite, Redshift & Oracle -- Cloud Storage: S3, Azure Blob & Azure File - -```{div} -:style: "clear: both" -``` - -:::{rubric} Learn +:::{seealso} +Please navigate to the dedicated page about {ref}`polars`. ::: -- [Polars code examples] [Apache Arrow]: https://arrow.apache.org/ @@ -154,7 +97,6 @@ This allows you to easily integrate Polars into your existing data stack. 
[Dask DataFrames]: https://docs.dask.org/en/latest/dataframe.html [Dask Futures]: https://docs.dask.org/en/latest/futures.html [pandas]: https://pandas.pydata.org/ -[Polars]: https://pola.rs/ [Dask code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/dask [Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]: https://cratedb.com/docs/python/en/latest/by-example/sqlalchemy/dataframe.html @@ -164,4 +106,3 @@ This allows you to easily integrate Polars into your existing data stack. [Import weather data using Dask]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb [Importing Parquet files into CrateDB using Apache Arrow and SQLAlchemy]: https://community.cratedb.com/t/importing-parquet-files-into-cratedb-using-apache-arrow-and-sqlalchemy/1161 [pandas code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/pandas -[Polars code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/polars diff --git a/docs/integrate/index.md b/docs/integrate/index.md index 96152972..9da8a985 100644 --- a/docs/integrate/index.md +++ b/docs/integrate/index.md @@ -55,6 +55,7 @@ nifi/index node-red/index oracle/index plotly/index +polars/index postgresql/index Power BI prometheus/index diff --git a/docs/integrate/polars/index.md b/docs/integrate/polars/index.md new file mode 100644 index 00000000..31fef04f --- /dev/null +++ b/docs/integrate/polars/index.md @@ -0,0 +1,63 @@ +(polars)= +# Polars + +```{div} +:style: "float: right; margin-left: 0.5em" +[![Polars logo](https://github.com/pola-rs/polars-static/raw/master/logos/polars-logo-dark.svg){w=180px}][Polars] +``` +```{div} .clearfix +``` + +:::{rubric} About +::: + +[Polars] is a blazingly fast DataFrames library with language bindings for +Rust, Python, Node.js, R, and SQL. Polars is powered by a multithreaded, +vectorized query engine, it is open source, and written in Rust. + +- **Fast:** Written from scratch in Rust and with performance in mind, + designed close to the machine, and without external dependencies. + +- **I/O:** First class support for all common data storage layers: local, + cloud storage & databases. + +- **Intuitive API:** Write your queries the way they were intended. Polars, + internally, will determine the most efficient way to execute using its query + optimizer. Polars' expressions are intuitive and empower you to write + readable and performant code at the same time. + +- **Out of Core:** The streaming API allows you to process your results without + requiring all your data to be in memory at the same time. + +- **Parallel:** Polars' multi-threaded query engine utilises the power of your + machine by dividing the workload among the available CPU cores without any + additional configuration. + +- **Vectorized Query Engine:** Uses [Apache Arrow], a columnar data format, to + process your queries in a vectorized manner and SIMD to optimize CPU usage. + This enables cache-coherent algorithms and high performance on modern processors. + +- **Open Source:** Polars is and always will be open source. Driven by an active + community of developers. Everyone is encouraged to add new features and contribute. + It is free to use under the MIT license. + +:::{rubric} Data formats +::: + +Polars supports reading and writing to many common data formats. +This allows you to easily integrate Polars into your existing data stack. 
+ +- Text: CSV & JSON +- Binary: Parquet, Delta Lake, AVRO & Excel +- IPC: Feather, Arrow +- Databases: MySQL, Postgres, SQL Server, Sqlite, Redshift & Oracle +- Cloud Storage: S3, Azure Blob & Azure File + + +:::{rubric} Learn +::: +- [Polars code examples] + + +[Polars]: https://pola.rs/ +[Polars code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/polars From ba3a244b3933cd180b12d563ee7e8d123c68b0eb Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Tue, 16 Sep 2025 00:51:11 +0200 Subject: [PATCH 2/3] Polars: Implement suggestions by CodeRabbit --- docs/integrate/polars/index.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/integrate/polars/index.md b/docs/integrate/polars/index.md index 31fef04f..b7e591cf 100644 --- a/docs/integrate/polars/index.md +++ b/docs/integrate/polars/index.md @@ -11,9 +11,9 @@ :::{rubric} About ::: -[Polars] is a blazingly fast DataFrames library with language bindings for -Rust, Python, Node.js, R, and SQL. Polars is powered by a multithreaded, -vectorized query engine, it is open source, and written in Rust. +[Polars] is a high‑performance DataFrames library with interfaces for +Rust, Python, Node.js, and R, plus a SQL context. It is powered by a +multithreaded, vectorized query engine and written in Rust. - **Fast:** Written from scratch in Rust and with performance in mind, designed close to the machine, and without external dependencies. @@ -29,7 +29,7 @@ vectorized query engine, it is open source, and written in Rust. - **Out of Core:** The streaming API allows you to process your results without requiring all your data to be in memory at the same time. -- **Parallel:** Polars' multi-threaded query engine utilises the power of your +- **Parallel:** Polars' multi-threaded query engine utilizes the power of your machine by dividing the workload among the available CPU cores without any additional configuration. @@ -46,18 +46,18 @@ vectorized query engine, it is open source, and written in Rust. Polars supports reading and writing to many common data formats. This allows you to easily integrate Polars into your existing data stack. - -- Text: CSV & JSON -- Binary: Parquet, Delta Lake, AVRO & Excel -- IPC: Feather, Arrow -- Databases: MySQL, Postgres, SQL Server, Sqlite, Redshift & Oracle -- Cloud Storage: S3, Azure Blob & Azure File +- Text: CSV, JSON +- Binary: Parquet, Delta Lake, Avro, Excel +- IPC: Feather, Arrow IPC +- Databases: MySQL, PostgreSQL, SQLite, Redshift, SQL Server, (others via ConnectorX) +- Cloud storage: Amazon S3, Azure Blob/ADLS (via fsspec‑compatible backends) :::{rubric} Learn ::: - [Polars code examples] +[Apache Arrow]: https://arrow.apache.org/ [Polars]: https://pola.rs/ [Polars code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/polars From 3f5467cc8fe02f461d947722c1a5a3c5853f17fa Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Tue, 16 Sep 2025 10:43:40 +0200 Subject: [PATCH 3/3] Polars: Fix statement about ConnectorX --- docs/integrate/polars/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/integrate/polars/index.md b/docs/integrate/polars/index.md index b7e591cf..a992cd35 100644 --- a/docs/integrate/polars/index.md +++ b/docs/integrate/polars/index.md @@ -50,7 +50,7 @@ This allows you to easily integrate Polars into your existing data stack. 
- Text: CSV, JSON - Binary: Parquet, Delta Lake, Avro, Excel - IPC: Feather, Arrow IPC -- Databases: MySQL, PostgreSQL, SQLite, Redshift, SQL Server, (others via ConnectorX) +- Databases: MySQL, PostgreSQL, SQLite, Redshift, SQL Server, etc. (via ConnectorX) - Cloud storage: Amazon S3, Azure Blob/ADLS (via fsspec‑compatible backends) :::{rubric} Learn
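
The new page defers connectivity details to the linked [Polars code examples]. As a rough sketch of what such an exchange can look like — not taken from these patches — the following assumes the `sqlalchemy-cratedb` dialect is installed alongside Polars, a CrateDB node is reachable on `localhost:4200` with the default `crate` user, and the target table name is illustrative:

```python
import polars as pl
import sqlalchemy as sa

# The `crate://` SQLAlchemy dialect is provided by the sqlalchemy-cratedb package.
engine = sa.create_engine("crate://crate@localhost:4200")

# Read query results from CrateDB into a Polars DataFrame.
df = pl.read_database(
    query="SELECT mountain, height FROM sys.summits ORDER BY height DESC LIMIT 5",
    connection=engine,
)
print(df)

# Write the frame back into a CrateDB table, replacing it if it already exists.
# Polars routes this through SQLAlchemy; the table name is illustrative.
df.write_database(
    table_name="summits_top5",
    connection="crate://crate@localhost:4200",
    if_table_exists="replace",
    engine="sqlalchemy",
)
```

Going through SQLAlchemy keeps the sketch driver-agnostic; `pl.read_database_uri` with a `postgresql://…:5432/…` URI and the ConnectorX engine is another conceivable route over CrateDB's PostgreSQL wire protocol, though whether that path works end to end depends on the CrateDB and ConnectorX versions in play.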