Commit e0d7761

pipeline: filters: lookup added new filter
Updated inputs based on code changes. Added metrics section. Made key considerations clearer with a separate section.

Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
1 parent 21178de commit e0d7761

1 file changed

+38
-20
lines changed

pipeline/filters/lookup.md

Lines changed: 38 additions & 20 deletions
@@ -4,14 +4,15 @@ The Lookup plugin looks up a key value from a record in a specified CSV file and

 ## Configuration parameters

-The plugin supports the following configuration parameters
+The plugin supports the following configuration parameters:

 | Key | Description | Default |
 | :-- | :---------- | :------ |
-| `file` | The CSV file that Fluent Bit will use as a lookup table. The file should contain two columns (key and value), with the first row as an optional header that is skipped. Supports quoted fields and escaped quotes. | _none_ |
-| `lookup_key` | The specific key in the input record to look up in the CSV file's first column. Supports [record accessor](../../administration/configuring-fluent-bit/record-accessor). | _none_ |
-| `result_key` | The name of the key to add to the output record with the matched value from the CSV file's second column if a match is found. | _none_ |
-| `ignore_case` | Ignore case when matching the lookup key against the CSV keys. | `false` |
+| `data_source` | Path to the CSV file that the Lookup filter will use as a lookup table. This file must contain one column of keys and one column of values. See [Key Considerations](#key-considerations) for details. | _none_ (required) |
+| `lookup_key` | Specifies the record key whose value to search for in the CSV file's first column. Supports [record accessor](../administration/configuring-fluent-bit/classic-mode/record-accessor) syntax for nested fields and array indexing (e.g., `$user['profile']['id']`, `$users[0]['id']`). | _none_ (required) |
+| `result_key` | If a CSV entry whose value matches the value of `lookup_key` is found, specifies the name of the new key to add to the output record. This new key uses the corresponding value from the second column of the CSV file in the same row where `lookup_key` was found. If this key already exists in the record, it will be overwritten. | _none_ (required) |
+| `ignore_case` | Specifies whether to ignore case when searching for `lookup_key`. If `true`, searches are case-insensitive. If `false`, searches are case-sensitive. Case normalization applies to both the lookup key from the record and the keys in the CSV file. | `false` |
+| `skip_header_row` | If `true`, the filter skips the first row of the CSV file, treating it as a header. If `false`, the first row is processed as data. | `false` |

 ## Example configuration

@@ -34,10 +35,11 @@ pipeline:
 filters:
 - name: lookup
 match: test
-file: device-bu.csv
+data_source: device-bu.csv
 lookup_key: $hostname
 result_key: business_line
 ignore_case: true
+skip_header_row: true

 outputs:
 - name: stdout
@@ -60,12 +62,13 @@ pipeline:
 Parser json

 [FILTER]
-Name lookup
-Match test
-File device-bu.csv
-Lookup_key $hostname
-Result_key business_line
-Ignore_case On
+Name lookup
+Match test
+data_source device-bu.csv
+Lookup_key $hostname
+Result_key business_line
+Ignore_case On
+Skip_header_row On

 [OUTPUT]
 Name stdout
@@ -75,7 +78,7 @@ pipeline:
 {% endtab %}
 {% endtabs %}

-The following configuration reads log records from `devices.log` that includes the following values for device hostnames:
+The previous configuration reads log records from `devices.log` that includes the following values in the `hostname` field:

 ```text
 {"hostname": "server-prod-001"}
@@ -92,7 +95,7 @@ The following configuration reads log records from `devices.log` that includes t
 {"hostname": " "}
 ```

-It uses the value of the `hostname` field (which has been set as the `lookup_key`) to find matching values in column 1 of the (`device-bu.csv`) CSV file.
+Because `hostname` was set as the `lookup_key`, the Lookup filter uses the value of each `hostname` key within the record to search for matching values in the first column of the CSV file.

 ```text
 hostname,business_line
@@ -107,9 +110,9 @@ app-backend-123,Operations
 no-match-host,Should Not Appear
 ```

-Where a match is found the filter adds new key (name of which is set by the `result_key` input) with the value from the second column of the CSV file of the matched row.
+When the filter finds a match, it adds a new key with the name specified by `result_key` and a value from the second column of the CSV file of the row where `lookup_key` was found.

-For above configuration the following output can be expected (when matching case is ignored as `ignore_case` is set to true):
+For the above configuration the following output can be expected (when matching case is ignored as `ignore_case` is set to true):

 ```text
 {"hostname"=>"server-prod-001", "business_line"=>"Finance"}
@@ -125,10 +128,25 @@ For above configuration the following output can be expected (when matching case
 {"hostname"=>{"sub"=>"val"}}
 ```

-## CSV import
+## Metrics

-The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored.
+When metrics support is enabled, the Lookup filter exposes the following counters to help monitor filter performance and effectiveness:

-This filter is intended for static datasets. CSV is loaded once when Fluent Bit starts and is not reloaded.
+| Metric Name | Description |
+| :---------- | :---------- |
+| `fluentbit_filter_lookup_processed_records_total` | Total number of records processed by the filter |
+| `fluentbit_filter_lookup_matched_records_total` | Total number of records where a lookup match was found and the result key was added |
+| `fluentbit_filter_lookup_skipped_records_total` | Total number of records skipped due to encoding errors or other processing failures |

-Multiline values in CSV file are not currently supported.
+Each metric includes a `name` label to identify the filter instance.
+
+## Key considerations
+
+- The CSV is used to create an in-memory key value lookup table. Column 1 of the CSV is always used as key, while column 2 is assumed to be the value. All other columns in the CSV are ignored.
+- CSV fields can be enclosed in double quotes (`"`). Lines with unmatched quotes are logged as warnings and skipped.
+- Multiline values in CSV file are not currently supported.
+- Duplicate keys (values in first column) in the CSV will use the last occurrence (hash table behavior)
+- Leading and trailing whitespace is automatically trimmed from both keys and values.
+- The `lookup_key` can be of various types: strings are used directly, integers and floats are converted to their string representation, booleans become "true" or "false", and null becomes "null". Records with array or object values for the lookup key are passed through unchanged.
+- Records without the `lookup_key` field or with no matching CSV entry are passed through unchanged.
+- This filter is currently intended for static datasets. CSV is loaded once when Fluent Bit starts and is not reloaded.
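
The key considerations in the final hunk describe the lookup semantics in prose. As a rough illustration only, the following Python sketch models them. It is not the filter's implementation (Fluent Bit itself is written in C), and several things here are assumptions: the helper names `load_table`, `to_lookup_string`, and `apply_lookup` are hypothetical, `lookup_key` is treated as a plain top-level field rather than a record accessor pattern, rows with fewer than two columns are simply ignored, and Python's `csv` module handles unmatched quotes and multiline fields differently from the behavior listed above.

```python
import csv
import io


def load_table(csv_text, skip_header_row=False, ignore_case=False):
    """Build an in-memory dict from the first two columns of CSV text."""
    table = {}
    for index, row in enumerate(csv.reader(io.StringIO(csv_text))):
        if skip_header_row and index == 0:
            continue  # treat the first row as a header
        if len(row) < 2:
            continue  # assumption: rows without a value column are ignored
        # Columns beyond the second are ignored; whitespace is trimmed.
        key, value = row[0].strip(), row[1].strip()
        if ignore_case:
            key = key.lower()
        table[key] = value  # a later duplicate key overwrites an earlier one
    return table


def to_lookup_string(value):
    """Return the string used for the lookup, or None for arrays/objects."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        # Python's float formatting may differ from the filter's.
        return str(value)
    if isinstance(value, str):
        return value
    return None


def apply_lookup(record, table, lookup_key, result_key, ignore_case=False):
    """Return the record, with result_key added when a match is found."""
    if lookup_key not in record:
        return record  # missing field: pass through unchanged
    needle = to_lookup_string(record[lookup_key])
    if needle is None:
        return record  # array or object value: pass through unchanged
    if ignore_case:
        needle = needle.lower()
    if needle not in table:
        return record  # no matching CSV entry: pass through unchanged
    enriched = dict(record)
    enriched[result_key] = table[needle]  # overwrites an existing result_key
    return enriched


if __name__ == "__main__":
    csv_text = "hostname,business_line\nserver-prod-001,Finance\n"
    table = load_table(csv_text, skip_header_row=True, ignore_case=True)
    print(apply_lookup({"hostname": "SERVER-PROD-001"}, table,
                       "hostname", "business_line", ignore_case=True))
```

In this sketch the table is built once and then reused for every record, which mirrors the note that the CSV is loaded at startup and not reloaded.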
