Add Apache Parquet Adapter #207

@yunzheng

Description

Problem

Currently, flow.record data is stored in a custom record-based format. While efficient for streaming and sequential access, it presents challenges for:

  1. Data Science & Analytics: Integrating with modern data stacks (Pandas, Polars, DuckDB, Spark) requires converting data first, creating friction for analysts.
  2. Columnar Analysis: Some use cases only need a subset of fields (e.g., just the src_ip and dst_ip columns). Our current row-based formats require reading the entire record, which is inefficient for these access patterns.
  3. Storage Efficiency: While we support compression, columnar formats like Parquet often offer better compression ratios for repetitive data types.

Proposed Solution

Implement an adapter for Apache Parquet, a standard open-source columnar storage format. This would allow flow.record to natively read and write Parquet files.

Benefits

  • First-class Interoperability: flow.record datasets could be directly queried by DuckDB or loaded into Pandas/Polars DataFrames without intermediate conversion steps.
  • Improved Performance for Analytics: Users can read only the specific columns they need, significantly reducing I/O for wide records (see the sketch after this list).
  • Ecosystem Integration: Opens the door to using the vast ecosystem of tools that support Parquet (AWS Athena, BigQuery, Spark, etc.).
  • Efficient Storage: Parquet's columnar compression and encoding schemes (RLE, Dictionary, etc.) are highly effective for strictly typed telemetry data.
  • Metadata: Parquet files store the row count and other statistics in their footer metadata, which rdump could leverage (e.g. for an improved progress bar).
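
As a rough illustration of the interoperability and column projection benefits, the snippet below reads a Parquet file with pyarrow and queries it in place with DuckDB. The file name `conn.parquet` and the `src_ip`/`dst_ip` columns are assumptions for the example, not output of an existing adapter.

```python
import duckdb
import pyarrow.parquet as pq

# Column projection: read only the columns we need instead of whole records.
# "conn.parquet" is a hypothetical file produced by the proposed ParquetWriter.
table = pq.read_table("conn.parquet", columns=["src_ip", "dst_ip"])
df = table.to_pandas()

# Or query the file in place with DuckDB, without an intermediate conversion step.
top_talkers = duckdb.sql(
    "SELECT src_ip, count(*) AS hits FROM 'conn.parquet' "
    "GROUP BY src_ip ORDER BY hits DESC LIMIT 10"
).fetchall()
```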

Implementation Details

  • Add a ParquetWriter adapter built on pyarrow.
  • Add a ParquetReader adapter with column projection support, and expose column selection in rdump.
  • Map flow.record types to Arrow types, handling complex types like digest and path (a sketch follows after this list).
  • Parquet supports storing custom key/value metadata, which can be used to store and restore the RecordDescriptor.
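
A minimal sketch of the type mapping and descriptor metadata, assuming pyarrow as the backend. The contents of FIELD_TYPE_MAP and the arrow_schema helper are illustrative assumptions, not an existing flow.record API, and complex types like digest and path will need more careful treatment than shown here.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Assumed mapping of simple flow.record field types to Arrow types.
FIELD_TYPE_MAP = {
    "string": pa.string(),
    "varint": pa.int64(),
    "boolean": pa.bool_(),
    "datetime": pa.timestamp("us", tz="UTC"),
    "net.ipaddress": pa.string(),  # could also be stored as raw bytes
    "path": pa.string(),
    # digest is a composite of hashes, so an Arrow struct is a natural fit
    "digest": pa.struct(
        [("md5", pa.string()), ("sha1", pa.string()), ("sha256", pa.string())]
    ),
}


def arrow_schema(field_tuples, descriptor_definition):
    """Build an Arrow schema from (type, name) tuples and embed the
    RecordDescriptor definition in the Parquet key/value metadata."""
    fields = [
        pa.field(name, FIELD_TYPE_MAP.get(typename, pa.string()))
        for typename, name in field_tuples
    ]
    metadata = {b"flow.record.descriptor": descriptor_definition.encode()}
    return pa.schema(fields, metadata=metadata)


# Example: write a single batch with the derived schema.
# The descriptor definition string here is only a placeholder.
schema = arrow_schema([("string", "src_ip"), ("string", "dst_ip")], "test/conn")
with pq.ParquetWriter("conn.parquet", schema) as writer:
    batch = pa.record_batch(
        [pa.array(["10.0.0.1"]), pa.array(["10.0.0.2"])], schema=schema
    )
    writer.write_batch(batch)
```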

Some things to take into account

  • A Parquet file supports only one schema, so a workable approach is needed for sources that yield mixed RecordDescriptors (the same problem as: csv adapter repeats header #190).
  • pyarrow is a fairly large dependency (~42 MB), so Parquet support should be completely optional (see the sketch below).
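
On the optional-dependency point, one possible pattern (a sketch, not a decided design) is to import pyarrow lazily inside the adapter and raise a helpful error only when the Parquet adapter is actually used. The `flow.record[parquet]` extra named below is hypothetical.

```python
# Lazy import so that flow.record itself does not require pyarrow.
try:
    import pyarrow.parquet as pq
except ImportError:
    pq = None


class ParquetWriter:
    """Sketch of an adapter entry point that only needs pyarrow at use time."""

    def __init__(self, path, **kwargs):
        if pq is None:
            # "flow.record[parquet]" is a hypothetical extras name.
            raise ImportError(
                "pyarrow is required for Parquet support: "
                "pip install flow.record[parquet]"
            )
        self.path = path
        self.writer = None  # created lazily once the first record's schema is known
```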
