diff --git a/content/influxdb3/cloud-dedicated/reference/syntax/line-protocol.md b/content/influxdb3/cloud-dedicated/reference/syntax/line-protocol.md index 6a51fce9d9..def3d326c1 100644 --- a/content/influxdb3/cloud-dedicated/reference/syntax/line-protocol.md +++ b/content/influxdb3/cloud-dedicated/reference/syntax/line-protocol.md @@ -14,5 +14,4 @@ related: source: /shared/v3-line-protocol.md --- - + diff --git a/content/influxdb3/cloud-dedicated/write-data/best-practices/optimize-writes.md b/content/influxdb3/cloud-dedicated/write-data/best-practices/optimize-writes.md index 8f0de2ca7a..c9e0bce648 100644 --- a/content/influxdb3/cloud-dedicated/write-data/best-practices/optimize-writes.md +++ b/content/influxdb3/cloud-dedicated/write-data/best-practices/optimize-writes.md @@ -429,6 +429,10 @@ Deduplicating your data can reduce your write payload size and resource usage. > sometimes sooner—this ordering is not guaranteed if duplicate points are flushed > at the same time. As a result, the last written duplicate point may not always > be retained in storage. +> +> For recommended patterns and anti-patterns to avoid, see +> [Duplicate points](/influxdb3/cloud-dedicated/reference/syntax/line-protocol/#duplicate-points) +> in the line protocol reference. Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup) to filter data whose field values are exact repetitions of previous values. 
diff --git a/content/influxdb3/cloud-dedicated/write-data/best-practices/schema-design.md b/content/influxdb3/cloud-dedicated/write-data/best-practices/schema-design.md index dbc8716159..4be75331cb 100644 --- a/content/influxdb3/cloud-dedicated/write-data/best-practices/schema-design.md +++ b/content/influxdb3/cloud-dedicated/write-data/best-practices/schema-design.md @@ -83,6 +83,10 @@ In time series data, the primary key for a row of data is typically a combinatio In InfluxDB, the primary key for a row is the combination of the point's timestamp and _tag set_ - the collection of [tag keys](/influxdb3/cloud-dedicated/reference/glossary/#tag-key) and [tag values](/influxdb3/cloud-dedicated/reference/glossary/#tag-value) on the point. A row's primary key tag set does not include tags with null values. +> [!Important] +> Overwriting points with the same primary key (timestamp and tag set) is not reliable for maintaining a last-value view. +> For recommended patterns, see [Duplicate points](/influxdb3/cloud-dedicated/reference/syntax/line-protocol/#duplicate-points) in the line protocol reference. 
+ ### Tags versus fields When designing your schema for InfluxDB, a common question is, "what should be a diff --git a/content/influxdb3/cloud-serverless/reference/syntax/line-protocol.md b/content/influxdb3/cloud-serverless/reference/syntax/line-protocol.md index 8985dbd7b8..016da6f997 100644 --- a/content/influxdb3/cloud-serverless/reference/syntax/line-protocol.md +++ b/content/influxdb3/cloud-serverless/reference/syntax/line-protocol.md @@ -13,5 +13,4 @@ related: source: /shared/influxdb-v2/reference/syntax/line-protocol.md --- - + diff --git a/content/influxdb3/cloud-serverless/write-data/best-practices/schema-design.md b/content/influxdb3/cloud-serverless/write-data/best-practices/schema-design.md index eb4bfbea9b..7df5d024ef 100644 --- a/content/influxdb3/cloud-serverless/write-data/best-practices/schema-design.md +++ b/content/influxdb3/cloud-serverless/write-data/best-practices/schema-design.md @@ -64,6 +64,10 @@ In time series data, the primary key for a row of data is typically a combinatio In InfluxDB, the primary key for a row is the combination of the point's timestamp and _tag set_ - the collection of [tag keys](/influxdb3/cloud-serverless/reference/glossary/#tag-key) and [tag values](/influxdb3/cloud-serverless/reference/glossary/#tag-value) on the point. A row's primary key tag set does not include tags with null values. +> [!Important] +> Overwriting points with the same primary key (timestamp and tag set) is not reliable for maintaining a last-value view. +> For recommended patterns, see [Duplicate points](/influxdb3/cloud-serverless/reference/syntax/line-protocol/#duplicate-points) in the line protocol reference. 
+ ### Tags versus fields When designing your schema for InfluxDB, a common question is, "what should be a diff --git a/content/influxdb3/clustered/reference/syntax/line-protocol.md b/content/influxdb3/clustered/reference/syntax/line-protocol.md index 87ff877078..936b399c05 100644 --- a/content/influxdb3/clustered/reference/syntax/line-protocol.md +++ b/content/influxdb3/clustered/reference/syntax/line-protocol.md @@ -14,5 +14,4 @@ related: source: /shared/v3-line-protocol.md --- - + diff --git a/content/influxdb3/clustered/write-data/best-practices/optimize-writes.md b/content/influxdb3/clustered/write-data/best-practices/optimize-writes.md index b195022386..6b1624a185 100644 --- a/content/influxdb3/clustered/write-data/best-practices/optimize-writes.md +++ b/content/influxdb3/clustered/write-data/best-practices/optimize-writes.md @@ -436,6 +436,10 @@ Deduplicating your data can reduce your write payload size and resource usage. > sometimes sooner—this ordering is not guaranteed if duplicate points are flushed > at the same time. As a result, the last written duplicate point may not always > be retained in storage. +> +> For recommended patterns and anti-patterns to avoid, see +> [Duplicate points](/influxdb3/clustered/reference/syntax/line-protocol/#duplicate-points) +> in the line protocol reference. Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup) to filter data whose field values are exact repetitions of previous values. 
diff --git a/content/influxdb3/clustered/write-data/best-practices/schema-design.md b/content/influxdb3/clustered/write-data/best-practices/schema-design.md index b2ac126723..0b52cf4cfb 100644 --- a/content/influxdb3/clustered/write-data/best-practices/schema-design.md +++ b/content/influxdb3/clustered/write-data/best-practices/schema-design.md @@ -83,6 +83,10 @@ In time series data, the primary key for a row of data is typically a combinatio In InfluxDB, the primary key for a row is the combination of the point's timestamp and _tag set_ - the collection of [tag keys](/influxdb3/clustered/reference/glossary/#tag-key) and [tag values](/influxdb3/clustered/reference/glossary/#tag-value) on the point. A row's primary key tag set does not include tags with null values. +> [!Important] +> Overwriting points with the same primary key (timestamp and tag set) is not reliable for maintaining a last-value view. +> For recommended patterns, see [Duplicate points](/influxdb3/clustered/reference/syntax/line-protocol/#duplicate-points) in the line protocol reference. 
+ ### Tags versus fields When designing your schema for InfluxDB, a common question is, "what should be a diff --git a/content/influxdb3/core/reference/line-protocol.md b/content/influxdb3/core/reference/line-protocol.md index 3c94bcc657..0ed766a802 100644 --- a/content/influxdb3/core/reference/line-protocol.md +++ b/content/influxdb3/core/reference/line-protocol.md @@ -12,10 +12,8 @@ influxdb3/core/tags: [write, line protocol, syntax] related: - /influxdb3/core/write-data/ aliases: - - /influxdb3/core/reference/syntax/line-protocol + - /influxdb3/core/reference/syntax/line-protocol/ source: /shared/v3-line-protocol.md --- - + diff --git a/content/influxdb3/enterprise/reference/line-protocol.md b/content/influxdb3/enterprise/reference/line-protocol.md index 85b5b0295f..bd7b1c2646 100644 --- a/content/influxdb3/enterprise/reference/line-protocol.md +++ b/content/influxdb3/enterprise/reference/line-protocol.md @@ -12,10 +12,8 @@ influxdb3/enterprise/tags: [write, line protocol, syntax] related: - /influxdb3/enterprise/write-data/ aliases: - - /influxdb3/enterprise/reference/syntax/line-protocol + - /influxdb3/enterprise/reference/syntax/line-protocol/ source: /shared/v3-line-protocol.md --- - + diff --git a/content/shared/v3-line-protocol.md b/content/shared/v3-line-protocol.md index 323fb8d267..54d3c335c4 100644 --- a/content/shared/v3-line-protocol.md +++ b/content/shared/v3-line-protocol.md @@ -283,14 +283,193 @@ If you submit line protocol with the same table, tag set, and timestamp, but with a different field set, the field set becomes the union of the old field set and the new field set, where any conflicts favor the new field set. 
-{{% show-in "cloud-dedicated,clustered" %}} -> [!Important] -> #### Write ordering for duplicate points +{{% show-in "cloud-dedicated,clustered,cloud-serverless" %}} +> [!Warning] +> #### Duplicate point overwrites are non-deterministic > -> {{% product-name %}} attempts to honor write ordering for duplicate points, -> with the most recently written point taking precedence. However, when data is -> flushed from the in-memory buffer to Parquet files—typically every 15 minutes, -> but sometimes sooner—this ordering is not guaranteed if duplicate points are -> flushed at the same time. As a result, the last written duplicate point may -> not always be retained in storage. +> Overwriting duplicate points (same table, tag set, and timestamp) is _not a reliable way to maintain a last-value view_. +> When duplicate points are flushed together, write ordering is not guaranteed—a prior write may "win." +> See [Anti-patterns to avoid](#anti-patterns-to-avoid) and [Recommended patterns](#recommended-patterns-for-last-value-tracking) below. + +### Recommended patterns for last-value tracking + +To reliably maintain a last-value view of your data, use one of these append-only patterns: + +#### Append-only with unique timestamps (recommended) + +Write each change as a new point with a unique timestamp using the actual event time. +Query for the most recent point to get the current value. 
+ +**Line protocol example**: + +```text +device_status,device_id=sensor01 status="active",temperature=72.5 1700000000000000000 +device_status,device_id=sensor01 status="active",temperature=73.1 1700000300000000000 +device_status,device_id=sensor01 status="inactive",temperature=73.1 1700000600000000000 +``` + +**SQL query to get latest state**: + +```sql +SELECT + device_id, + status, + temperature, + time +FROM device_status +WHERE time >= now() - INTERVAL '7 days' + AND device_id = 'sensor01' +ORDER BY time DESC +LIMIT 1 +``` + +**InfluxQL query to get latest state**: + +```influxql +SELECT LAST(status), LAST(temperature) +FROM device_status +WHERE device_id = 'sensor01' + AND time >= now() - 7d +GROUP BY device_id +``` + +#### Append-only with change tracking field + +If you need to filter by "changes since a specific time," add a dedicated `last_change_timestamp` field. + +**Line protocol example**: + +```text +device_status,device_id=sensor01 status="active",temperature=72.5,last_change_timestamp=1700000000000000000i 1700000000000000000 +device_status,device_id=sensor01 status="active",temperature=73.1,last_change_timestamp=1700000300000000000i 1700000300000000000 +device_status,device_id=sensor01 status="inactive",temperature=73.1,last_change_timestamp=1700000600000000000i 1700000600000000000 +``` + +**SQL query to get changes since a specific time**: + +```sql +SELECT + device_id, + status, + temperature, + time +FROM device_status +WHERE last_change_timestamp >= 1700000000000000000 +ORDER BY time DESC +``` + +### Anti-patterns to avoid + +The following patterns will produce non-deterministic results when duplicate points are flushed together: + +#### Don't overwrite the same (time, tags) point + +If points with the same time and tag set are flushed to storage together, any of the values might be retained. 
+For example, **don't do this**: + +```text +# All writes use the same timestamp +device_status,device_id=sensor01 status="active",temperature=72.5 1700000000000000000 +device_status,device_id=sensor01 status="active",temperature=73.1 1700000000000000000 +device_status,device_id=sensor01 status="inactive",temperature=73.1 1700000000000000000 +``` + +#### Don't add a field while overwriting data (time, tags) + +Adding a field doesn't make points unique. +Points with the same time and tag set are still considered duplicates. For example, +**don't do this**: + +```text +# All writes use the same timestamp, but add a version field +device_status,device_id=sensor01 status="active",temperature=72.5,version=1i 1700000000000000000 +device_status,device_id=sensor01 status="active",temperature=73.1,version=2i 1700000000000000000 +device_status,device_id=sensor01 status="inactive",temperature=73.1,version=3i 1700000000000000000 +``` + +#### Don't rely on write delays to force ordering + +Delays don't guarantee that duplicate points won't be flushed together. +The flush interval depends on buffer size, ingestion rate, and system load. + +For example, **don't do this**: + +```text +# Writing with delays between each write +device_status,device_id=sensor01 status="active" 1700000000000000000 +# Wait 10 seconds... +device_status,device_id=sensor01 status="inactive" 1700000000000000000 +``` +{{% /show-in %}} + +{{% show-in "cloud-dedicated" %}} +### Retention guidance for last-value tables + +{{% product-name %}} applies retention at the database level. +If your last-value view only needs to retain data for days or weeks, but your main database retains data for months or years (for example, ~400 days), consider creating a separate database with shorter retention specifically for last-value tracking. 
+ +**Benefits**: +- Reduces storage costs for last-value data +- Improves query performance by limiting data volume +- Allows independent retention policies for different use cases + +**Example**: + +```bash +# Create a database for last-value tracking with 7-day retention +influxctl database create device_status_current --retention-period 7d + +# Create your main database with longer retention +influxctl database create device_status_history --retention-period 400d +``` + +Then write current status to `device_status_current` and historical data to `device_status_history`. +{{% /show-in %}} + +{{% show-in "cloud-dedicated,clustered" %}} +### Performance considerations + +#### Row count and query performance + +Append-only patterns increase row counts compared to overwriting duplicate points. +To maintain query performance: + +1. **Limit query time ranges**: Query only the time range you need (for example, last 7 days for current state) +2. **Use time-based filters**: Always include a `WHERE time >=` clause to narrow the query scope +3. **Consider shorter retention**: For last-value views, use a dedicated database with shorter retention + +**Example - Good query with time filter**: + +```sql +SELECT device_id, status, temperature, time +FROM device_status +WHERE time >= now() - INTERVAL '7 days' +ORDER BY time DESC +``` + +**Example - Avoid querying entire table**: + +```sql +-- Don't do this - queries all historical data +SELECT device_id, status, temperature, time +FROM device_status +ORDER BY time DESC +``` + +#### Storage and cache bandwidth + +Append-only patterns create more data points, which results in larger Parquet files. +This can increase cache bandwidth usage when querying large time ranges. + +**Mitigation strategies**: +1. **Narrow time filters**: Query only same-day partitions when possible +2. **Use partition-aligned time ranges**: Queries that align with partition boundaries are more efficient 
+3. **Consider aggregation**: For historical analysis, use downsampled or aggregated data instead of raw points + +**Example - Partition-aligned query**: + +```sql +SELECT device_id, status, temperature, time +FROM device_status +WHERE time >= '2025-11-20T00:00:00Z' + AND time < '2025-11-21T00:00:00Z' +ORDER BY time DESC +``` {{% /show-in %}}
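
The partition-aligned range in the last query can be computed instead of hard-coded. The following is a minimal Python sketch, not part of the documented tooling. It assumes partitions are aligned to UTC calendar days (the text above only mentions "same-day partitions"), and the helper names `utc_day_bounds` and `partition_aligned_query` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utc_day_bounds(ts: datetime) -> tuple[str, str]:
    """Return the [start, end) bounds of the UTC day containing ts.

    Assumption: partitions are aligned to UTC calendar days.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    # Truncate to midnight UTC, then step forward one day for the exclusive end.
    start = ts.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    end = start + timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


def partition_aligned_query(table: str, ts: datetime) -> str:
    """Build a day-aligned SQL query (hypothetical helper).

    The table name is interpolated directly, so pass only trusted names.
    """
    start, end = utc_day_bounds(ts)
    return (
        "SELECT device_id, status, temperature, time\n"
        f"FROM {table}\n"
        f"WHERE time >= '{start}'\n"
        f"  AND time < '{end}'\n"
        "ORDER BY time DESC"
    )


if __name__ == "__main__":
    when = datetime(2025, 11, 20, 13, 45, tzinfo=timezone.utc)
    print(partition_aligned_query("device_status", when))
```

Using a half-open range (`>=` start, `<` end) keeps adjacent days from overlapping, so a point written exactly at midnight is counted in only one day's query.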