pipeline: filters: lookup added new filter

olegmukhin · olegmukhin · commit e0d77618dd50 · 2025-11-24T17:25:10.000Z
Updated inputs based on code changes.
Added metrics section.
Made key considerations clearer with a separate section.

Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
diff --git a/pipeline/filters/lookup.md b/pipeline/filters/lookup.md
@@ -4,14 +4,15 @@ The Lookup plugin looks up a key value from a record in a specified CSV file and
 
 ## Configuration parameters
 
-The plugin supports the following configuration parameters
+The plugin supports the following configuration parameters:
 
 | Key | Description | Default |
 | :-- | :---------- | :------ |
-| `file` | The CSV file that Fluent Bit will use as a lookup table. The file should contain two columns (key and value), with the first row as an optional header that is skipped. Supports quoted fields and escaped quotes. | _none_ |
-| `lookup_key` | The specific key in the input record to look up in the CSV file's first column. Supports [record accessor](../../administration/configuring-fluent-bit/record-accessor). | _none_ |
-| `result_key` | The name of the key to add to the output record with the matched value from the CSV file's second column if a match is found. | _none_ |
-| `ignore_case` | Ignore case when matching the lookup key against the CSV keys. | `false` |
+| `data_source` | Path to the CSV file that the Lookup filter will use as a lookup table. This file must contain one column of keys and one column of values. See [Key Considerations](#key-considerations) for details. | _none_ (required) |
+| `lookup_key` | Specifies the record key whose value to search for in the CSV file's first column. Supports [record accessor](../administration/configuring-fluent-bit/classic-mode/record-accessor) syntax for nested fields and array indexing (e.g., `$user['profile']['id']`, `$users[0]['id']`). | _none_ (required) |
+| `result_key` | Name of the key to add to the output record when a CSV entry matching the value of `lookup_key` is found. The new key takes the value from the second column of the matching CSV row. If this key already exists in the record, it is overwritten. | _none_ (required) |
+| `ignore_case` | Specifies whether to ignore case when searching for `lookup_key`. If `true`, searches are case-insensitive. If `false`, searches are case-sensitive. Case normalization applies to both the lookup key from the record and the keys in the CSV file. | `false` |
+| `skip_header_row` | If `true`, the filter skips the first row of the CSV file, treating it as a header. If `false`, the first row is processed as data. | `false` |
 
 ## Example configuration
 
@@ -34,10 +35,11 @@ pipeline:
   filters:
     - name: lookup
       match: test
-      file: device-bu.csv
+      data_source: device-bu.csv
       lookup_key: $hostname
       result_key: business_line
       ignore_case: true
+      skip_header_row: true
 
   outputs:
     - name: stdout
@@ -60,12 +62,13 @@ pipeline:
     Parser            json
 
 [FILTER]
-    Name           lookup
-    Match          test
-    File           device-bu.csv
-    Lookup_key     $hostname
-    Result_key     business_line
-    Ignore_case    On
+    Name              lookup
+    Match             test
+    Data_source       device-bu.csv
+    Lookup_key        $hostname
+    Result_key        business_line
+    Ignore_case       On
+    Skip_header_row   On
 
 [OUTPUT]
     Name   stdout
@@ -75,7 +78,7 @@ pipeline:
 {% endtab %}
 {% endtabs %}
 
-The following configuration reads log records from `devices.log` that includes the following values for device hostnames:
+The previous configuration reads log records from `devices.log` that include the following values in the `hostname` field:
 
 ```text
 {"hostname": "server-prod-001"}
@@ -92,7 +95,7 @@ The following configuration reads log records from `devices.log` that includes t
 {"hostname": " "}
 ```
 
-It uses the value of the `hostname` field (which has been set as the `lookup_key`) to find matching values in column 1 of the  (`device-bu.csv`) CSV file.
+Because `hostname` was set as the `lookup_key`, the Lookup filter uses the value of each record's `hostname` key to search for a matching value in the first column of the CSV file (`device-bu.csv`).
 
 ```text
 hostname,business_line
@@ -107,9 +110,9 @@ app-backend-123,Operations
 no-match-host,Should Not Appear
 ```
 
-Where a match is found the filter adds new key (name of which is set by the `result_key` input) with the value from the second column of the CSV file of the matched row.
+When the filter finds a match, it adds a new key with the name specified by `result_key` and the value from the second column of the matching CSV row.
 
-For above configuration the following output can be expected (when matching case is ignored as `ignore_case` is set to true):
+For the previous configuration, the following output can be expected (case is ignored when matching because `ignore_case` is set to `true`):
 
 ```text
 {"hostname"=>"server-prod-001", "business_line"=>"Finance"}
@@ -125,10 +128,25 @@ For above configuration the following output can be expected (when matching case
 {"hostname"=>{"sub"=>"val"}}
 ```
 
-## CSV import
+## Metrics
 
-The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored.
+When metrics support is enabled, the Lookup filter exposes the following counters to help monitor filter performance and effectiveness:
 
-This filter is intended for static datasets. CSV is loaded once when Fluent Bit starts and is not reloaded.
+| Metric Name | Description |
+| :---------- | :---------- |
+| `fluentbit_filter_lookup_processed_records_total` | Total number of records processed by the filter |
+| `fluentbit_filter_lookup_matched_records_total` | Total number of records where a lookup match was found and the result key was added |
+| `fluentbit_filter_lookup_skipped_records_total` | Total number of records skipped due to encoding errors or other processing failures |
 
-Multiline values in CSV file are not currently supported.
+Each metric includes a `name` label to identify the filter instance.
+
+## Key considerations
+
+- The CSV file is used to create an in-memory key-value lookup table. Column 1 of the CSV file is always used as the key, and column 2 is used as the value. All other columns are ignored.
+- CSV fields can be enclosed in double quotes (`"`). Lines with unmatched quotes are logged as warnings and skipped.
+- Multiline values in the CSV file are not currently supported.
+- If the CSV file contains duplicate keys (values in the first column), the last occurrence is used (hash table behavior).
+- Leading and trailing whitespace is automatically trimmed from both keys and values.
+- The `lookup_key` can be of various types: strings are used directly, integers and floats are converted to their string representation, booleans become "true" or "false", and null becomes "null". Records with array or object values for the lookup key are passed through unchanged.
+- Records without the `lookup_key` field or with no matching CSV entry are passed through unchanged.
+- This filter is currently intended for static datasets. The CSV file is loaded once when Fluent Bit starts and is not reloaded.
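
Review note: the matching rules listed under "Key considerations" can be sketched in Python as a rough model for reasoning about expected output. This is an illustration only, not the filter's actual implementation. The function names `load_table`, `to_lookup_string`, and `apply_lookup` are hypothetical. The string formatting of floats, the handling of rows with fewer than two columns, and leaving the record value untrimmed are assumptions. The sketch does not model the unmatched-quote warning or the multiline restriction.

```python
import csv
import io


def load_table(csv_text, ignore_case=False, skip_header_row=False):
    """Build a key -> value dict from the first two CSV columns."""
    table = {}
    for index, row in enumerate(csv.reader(io.StringIO(csv_text))):
        if skip_header_row and index == 0:
            continue
        if len(row) < 2:
            continue  # assumption: rows without a value column are ignored
        # Whitespace is trimmed from keys and values; extra columns are ignored.
        key, value = row[0].strip(), row[1].strip()
        if ignore_case:
            key = key.lower()
        table[key] = value  # a later duplicate key overwrites the earlier one
    return table


def to_lookup_string(value):
    """Convert a record value to the string used for matching, or None."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return str(value)  # assumption: Python's formatting stands in here
    if isinstance(value, str):
        return value
    return None  # arrays and objects are never looked up


def apply_lookup(record, table, lookup_key, result_key, ignore_case=False):
    """Return the record, with result_key set when a match is found."""
    if lookup_key not in record:
        return record  # missing key: pass through unchanged
    needle = to_lookup_string(record[lookup_key])
    if needle is None:
        return record  # array or object value: pass through unchanged
    if ignore_case:
        needle = needle.lower()
    if needle not in table:
        return record  # no matching CSV entry: pass through unchanged
    return {**record, result_key: table[needle]}  # overwrites an existing key


if __name__ == "__main__":
    sample = "hostname,business_line\nserver-prod-001,Finance\n"
    lookup = load_table(sample, ignore_case=True, skip_header_row=True)
    print(apply_lookup({"hostname": "SERVER-PROD-001"}, lookup,
                       "hostname", "business_line", ignore_case=True))
    # prints {'hostname': 'SERVER-PROD-001', 'business_line': 'Finance'}
```

With `ignore_case` enabled, both the CSV keys and the record value are lowercased before comparison, which mirrors the statement that case normalization applies to both sides.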