From 8901ff7325e7b84ab1dcc7fe1f5f858fc2633c0f Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Fri, 24 Oct 2025 14:34:44 +0900
Subject: [PATCH 1/5] out_s3: Add an instruction for enabling parquet compression

Signed-off-by: Hiroshi Hatake
---
 pipeline/outputs/s3.md | 72 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/pipeline/outputs/s3.md b/pipeline/outputs/s3.md
index 493c0b161..e951b5776 100644
--- a/pipeline/outputs/s3.md
+++ b/pipeline/outputs/s3.md
@@ -45,7 +45,8 @@ The [Prometheus success/retry/error metrics values](../../administration/monitor
 | `sts_endpoint` | Custom endpoint for the STS API. | _none_ |
 | `profile` | Option to specify an AWS Profile for credentials. | `default` |
 | `canned_acl` | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | _none_ |
-| `compression` | Compression type for S3 objects. `gzip` is currently the only supported value by default. If Apache Arrow support was enabled at compile time, you can use `arrow`. For gzip compression, the Content-Encoding HTTP Header will be set to `gzip`. Gzip compression can be enabled when `use_put_object` is `on` or `off` (`PutObject` and Multipart). Arrow compression can only be enabled with `use_put_object On`. | _none_ |
+| `compression` | Compression/format for S3 objects. Supported: `gzip` (always available) and `parquet` (requires Arrow build). For `gzip`, the `Content-Encoding` header is set to `gzip`. `parquet` is available **only when Fluent Bit is built with `-DFLB_ARROW=On`** and Arrow GLib/Parquet GLib are installed. Parquet is typically used with `use_put_object On`. | *none* |
+
 | `content_type` | A standard MIME type for the S3 object, set as the Content-Type HTTP header. | _none_ |
 | `send_content_md5` | Send the Content-MD5 header with `PutObject` and UploadPart requests, as is required when Object Lock is enabled. | `false` |
 | `auto_retry_requests` | Immediately retry failed requests to AWS services once. This option doesn't affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which can help improve throughput during transient network issues. | `true` |
@@ -649,3 +650,72 @@ The following example uses `pyarrow` to analyze the uploaded data:
 3 2021-04-27T09:33:56.539430Z 0.0 0.0 0.0 0.0 0.0 0.0
 4 2021-04-27T09:33:57.539803Z 0.0 0.0 0.0 0.0 0.0 0.0
 ```
+
+## Enable Parquet support
+
+### Build requirements for Parquet
+
+To enable Parquet, build Fluent Bit with Apache Arrow support and install Arrow GLib/Parquet GLib:
+
+```bash
+# Ubuntu/Debian example
+sudo apt-get update
+sudo apt-get install -y -V ca-certificates lsb-release wget
+wget https://packages.apache.org/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+sudo apt-get install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+sudo apt-get update
+sudo apt-get install -y -V libarrow-glib-dev libparquet-glib-dev
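+
+# Optionally, sanity-check that the Arrow GLib and Parquet GLib packages are
+# visible to pkg-config before building (prints one version per module):
+pkg-config --modversion arrow-glib parquet-glib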
+
+# Build Fluent Bit with Arrow:
+cd build/
+cmake -DFLB_ARROW=On ..
+cmake --build .
+```
+
+### Testing Parquet compression
+
+```md
+## Testing (Parquet)
+
+Example configuration:
+
+```yaml
+service:
+  flush: 5
+  daemon: Off
+  log_level: debug
+  http_server: Off
+
+pipeline:
+  inputs:
+    - name: dummy
+      tag: dummy.local
+      dummy: '{"boolean": false, "int": 1, "long": 1, "float": 1.1, "double": 1.1, "bytes": "foo", "string": "foo"}'
+
+  outputs:
+    - name: s3
+      match: dummy*
+      region: us-east-2
+      bucket:
+      use_put_object: On
+      compression: parquet
+      # other parameters
+```
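+
+After an object is uploaded, you can download it and inspect its contents. The
+following is a minimal sketch rather than an official verification step: it assumes
+the object was saved locally as `sample.parquet`, and it uses `pyarrow` (with
+`pandas` installed for tabular display), as in the earlier Arrow example:
+
+```python
+import pyarrow.parquet as pq
+
+# Read the downloaded S3 object, then print its schema and records.
+table = pq.read_table("sample.parquet")
+print(table.schema)
+print(table.to_pandas())
+```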
From a99a49c7a83af037c3c233cd5e5efaab1784700c Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Fri, 24 Oct 2025 14:35:13 +0900
Subject: [PATCH 2/5] out_s3: Fix a welcome link of S3

Signed-off-by: Hiroshi Hatake
---
 pipeline/outputs/s3.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pipeline/outputs/s3.md b/pipeline/outputs/s3.md
index e951b5776..a0e2ff849 100644
--- a/pipeline/outputs/s3.md
+++ b/pipeline/outputs/s3.md
@@ -6,7 +6,7 @@ description: Send logs, data, and metrics to Amazon S3
 
 ![AWS logo](<../../.gitbook/assets/image (9).png>)
 
-The _Amazon S3_ output plugin lets you ingest records into the [S3](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) cloud object store.
+The _Amazon S3_ output plugin lets you ingest records into the [S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) cloud object store.
 
 The plugin can upload data to S3 using the [multipart upload API](https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html) or [`PutObject`](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html). Multipart is the default and is recommended. Fluent Bit will stream data in a series of _parts_. This limits the amount of data buffered on disk at any point in time. By default, every time 5 MiB of data have been received, a new part will be uploaded. The plugin can create files up to gigabytes in size from many small chunks or parts using the multipart API. All aspects of the upload process are configurable.
 

From 394970a1bcb07d2b0e47ca97ce4ec28711170c68 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Fri, 24 Oct 2025 14:38:50 +0900
Subject: [PATCH 3/5] out_s3: Suppress warnings from markdownlint

Signed-off-by: Hiroshi Hatake
---
 pipeline/outputs/s3.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pipeline/outputs/s3.md b/pipeline/outputs/s3.md
index a0e2ff849..65674eb86 100644
--- a/pipeline/outputs/s3.md
+++ b/pipeline/outputs/s3.md
@@ -45,7 +45,7 @@ The [Prometheus success/retry/error metrics values](../../administration/monitor
 | `sts_endpoint` | Custom endpoint for the STS API. | _none_ |
 | `profile` | Option to specify an AWS Profile for credentials. | `default` |
 | `canned_acl` | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | _none_ |
-| `compression` | Compression/format for S3 objects. Supported: `gzip` (always available) and `parquet` (requires Arrow build). For `gzip`, the `Content-Encoding` header is set to `gzip`. `parquet` is available **only when Fluent Bit is built with `-DFLB_ARROW=On`** and Arrow GLib/Parquet GLib are installed. Parquet is typically used with `use_put_object On`. | *none* |
+| `compression` | Compression/format for S3 objects. Supported: `gzip` (always available) and `parquet` (requires Arrow build). For `gzip`, the `Content-Encoding` header is set to `gzip`. `parquet` is available _only when Fluent Bit is built with `-DFLB_ARROW=On`_ and Arrow GLib/Parquet GLib are installed. Parquet is typically used with `use_put_object On`. | _none_ |
 
 | `content_type` | A standard MIME type for the S3 object, set as the Content-Type HTTP header. | _none_ |
 | `send_content_md5` | Send the Content-MD5 header with `PutObject` and UploadPart requests, as is required when Object Lock is enabled. | `false` |

From b6590552e5df217fb7f8fbb2b97bc142d5e698d9 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Fri, 24 Oct 2025 14:44:28 +0900
Subject: [PATCH 4/5] out_s3: Add a note for other distributions like CentOS or AmazonLinux 2023.

Signed-off-by: Hiroshi Hatake
---
 pipeline/outputs/s3.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/pipeline/outputs/s3.md b/pipeline/outputs/s3.md
index 65674eb86..4352d3487 100644
--- a/pipeline/outputs/s3.md
+++ b/pipeline/outputs/s3.md
@@ -676,6 +676,9 @@ cmake -DFLB_ARROW=On ..
 cmake --build .
 ```
 
+For other Linux distributions, refer to the [Apache Arrow installation instructions](https://arrow.apache.org/install/).
+Parquet GLib is part of the Apache Arrow project.
+
 ### Testing Parquet compression
 
 ```md

From 3b4b0d403564772681bacbb007a6fe742ff26b35 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Tue, 28 Oct 2025 12:58:57 +0900
Subject: [PATCH 5/5] Apply suggestion from @esmerel

Co-authored-by: Lynette Miles <6818907+esmerel@users.noreply.github.com>
Signed-off-by: Hiroshi Hatake
---
 pipeline/outputs/s3.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/pipeline/outputs/s3.md b/pipeline/outputs/s3.md
index 4352d3487..d73b45330 100644
--- a/pipeline/outputs/s3.md
+++ b/pipeline/outputs/s3.md
@@ -681,7 +681,6 @@ Parquet GLib is part of the Apache Arrow project.
 
 ### Testing Parquet compression
 
-```md
 ## Testing (Parquet)
 
 Example configuration: