From be65e1faba55705bb47c402b92130d583a2be7b7 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Thu, 2 May 2024 12:56:56 +0100 Subject: [PATCH 01/31] Added Remote-Write Specification 2.0. See proposal https://github.com/prometheus/proposals/pull/35 which explain rationales. Signed-off-by: bwplotka --- .../docs/concepts/remote_write_spec_2_0.md | 394 ++++++++++++++++++ 1 file changed, 394 insertions(+) create mode 100644 content/docs/concepts/remote_write_spec_2_0.md diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md new file mode 100644 index 000000000..24539b28d --- /dev/null +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -0,0 +1,394 @@ +--- +title: Prometheus Remote-Write Specification 2.0 +sort_rank: 4 +--- + +# Prometheus Remote-Write Specification + +* Version: 2.0 +* Status: Proposed +* Date: May 2024 + +The remote write specification, in general, is intended to document the standard for how Prometheus and Prometheus remote-write-compatible agents send data to a Prometheus or Prometheus remote-write compatible receivers. + +This document is intended to define a second version of the [Prometheus Remote Write](./remote_write_spec.md) API with minor changes to protocol and semantics. This second version also adds a new wire format with new features enabling more use cases and wider adoption on top of performance and cost savings. Finally, this spec outlines how to implement backward compatible senders and receivers (even under a single endpoint) using existing basic content negotiation request headers. More advanced, automatic content negotiation mechanisms might come in future versions, if needed. For the rationales behind the 2.0 specification, see [the formal proposal](https://github.com/prometheus/proposals/pull/35). 
+ +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119). + +## Introduction + +### Background + +The remote write protocol is designed to make it possible to reliably propagate samples in real-time from a sender to a receiver, without loss. + +The remote write protocol is designed to make stateless implementations of the server possible; as such there are little-to-no inter-message references. As such the protocol is not considered "streaming." To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. + +The remote write protocol contains opportunities for batching, e.g. sending multiple samples for different series in a single request. It is not expected that multiple samples for the same series will be commonly sent in the same request, although there is support for this in the protocol. + + +The remote write protocol is not intended for use by applications to push metrics to Prometheus remote-write-compatible Receiver. It is intended that a Prometheus remote-write-compatible sender scrapes instrumented applications or exporters and sends remote write messages to a server. + + +A test suite can be found at https://github.com/prometheus/compliance/tree/main/remote_write_sender. + +### Glossary + +For the purposes of this document the following definitions MUST be followed: + +* a "Sender" is something that sends Prometheus Remote Write data. +* a "Receiver" is something that receives Prometheus Remote Write data. +* a "Sample" is a pair of (timestamp, value). +* a "Histogram" is a pair of (timestamp, histogram value). 
+* a "Label" is a pair of (key, value). +* a "Series" is a list of samples, identified by a unique set of labels. + +## Definitions + +### Protocol + +The Remote Write Protocol MUST consist of RPCs with the request body encoded using a Google Protobuf 3 message. The protobuf encoding MUST use either of the following schemas: + + +* [`prometheus.WriteRequest`](./remote_write_spec.md#protocol) introduced in the Remote Write 1.0 specification. As of 2.0 the `prometheus.WriteRequest` message is deprecated. +* `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#ioprometheuswritev2request-proto-schema). Senders SHOULD use `io.prometheus.write.v2.Request` when possible. + +Sender MUST send encoded and compresses proto message in the body of an HTTP POST request and send it to the Receiver via HTTP at a provided URL path. The Receiver MAY specify any HTTP URL path to receive metrics. + +Sender MUST send the following "reserved" headers with the HTTP request: + +* `Content-Encoding: ` + + Content encoding request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-encoding). Sender MUST use the `snappy` value. More compression algorithms might come in 2.x or beyond. + +* `Content-Type: application/x-protobuf` or `Content-Type: application/x-protobuf;proto=` + + + Content type request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type). Sender MUST use `application/x-protobuf` as the only media type. Sender MAY add `;proto=` parameter to the header's value to indicate the fully qualified name of the protobuf message (schema) that was used, from the two mentioned above. 
As a result, Sender MUST send any of the three supported header values: + + For the message introduced in PRW 1.0, identified by `prometheus.WriteRequest`: + * `Content-Type: application/x-protobuf` + * `Content-Type: application/x-protobuf;proto=prometheus.WriteRequest` + For the message introduced in PRW 2.0, identified by `io.prometheus.write.v2.Request`: + * `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request` + + Sender SHOULD use `Content-Type: application/x-protobuf`, for backward compatibility, when talking to 1.x Receiver. Sender SHOULD use `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request` when talking to Receiver supporting 2.x. More proto messages might come in 2.x or beyond. + +* `User-Agent: ` +* `X-Prometheus-Remote-Write-Version: ` + + Sender SHOULD use `X-Prometheus-Remote-Write-Version: 0.1.0` for backward compatibility, when using 1.0 proto message. + +Sender MAY allow users to send custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. + + +The remote write request in the body of the HTTP POST MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). The block format MUST be used -- the framed format MUST NOT be used. The remote write request MUST be encoded using Google Protobuf 3, and MUST use either of the schemas defined above. + +#### `io.prometheus.write.v2.Request` Proto Schema + + +The source of truth is [here](https://github.com/prometheus/prometheus/blob/remote-write-2.0/prompb/io/prometheus/write/v2/types.proto#L32). The `gogo` dependency and options CAN be ignored. They are not part of the specification as they don't impact the serialized format. + +The simplified version of the new `io.prometheus.write.v2.Request` is presented below. + +``` +// Request represents a request to write the given timeseries to a remote destination. 
+message Request { + // symbols contains a de-duplicated array of string elements used for various + // items in a Request message, like labels and metadata items. To decode + // each of those items, referenced, by "ref(s)" suffix, you need to lookup the + // actual string by index from symbols array. The order of strings is up to + // the client, server should not assume any particular encoding. + repeated string symbols = 1; + // timeseries represents an array of distinct series with 0 or more samples. + repeated TimeSeries timeseries = 2; +} + +// TimeSeries represents a single series. +message TimeSeries { + // labels_refs is a list of label name-value pair references, encoded + // as indices to the Request.symbols array. This list's length is always + // a multiple of two, and the underlying labels should be sorted. + // + // Note that there might be multiple TimeSeries objects in the same + // Requests with the same labels e.g. for different exemplars, metadata + // or created timestamp. + repeated uint32 labels_refs = 1; + + // Timeseries messages can either specify samples or (native) histogram samples + // (histogram field), but not both. For typical clients (real-time metric + // streaming), in healthy cases, there will be only one sample or histogram. + // + // Samples and histograms are sorted by timestamp (older first). + repeated Sample samples = 2; + repeated Histogram histograms = 3; + + // exemplars represents an optional set of exemplars attached to this series' samples. + repeated Exemplar exemplars = 4; + + // metadata represents the metadata associated with the given series' samples. + Metadata metadata = 5; + + // created_timestamp represents an optional created timestamp associated with + // this series' samples in ms format, typically for counter or histogram type + // metrics. Note that some servers might require this and in return fail to + // ingest such samples within the Request. 
+ // + // For Go, see github.com/prometheus/prometheus/model/timestamp/timestamp.go + // for conversion from/to time.Time to Prometheus timestamp. + // + // Note that the "optional" keyword is omitted due to + // https://cloud.google.com/apis/design/design_patterns.md#optional_primitive_fields + // Zero value means value not set. If you need to use exactly zero value for + // the timestamp, use 1 millisecond before or after. + int64 created_timestamp = 6; +} + +// Exemplar is an additional information attached to some series' samples. +message Exemplar { + // labels_refs is a list of label name-value pair references, encoded + // as indices to the Request.symbols array. This list's len is always + // a multiple of 2, and the underlying labels should be sorted. + repeated uint32 labels_refs = 1; + double value = 2; + int64 timestamp = 3; +} + +// Sample represents series sample. +message Sample { + // value of the sample. + double value = 1; + // timestamp represents timestamp of the sample in ms. + // For Go, see github.com/prometheus/prometheus/model/timestamp/timestamp.go + // for conversion from/to time.Time to Prometheus timestamp. + int64 timestamp = 2; +} + +// Metadata represents the metadata associated with the given series' samples. +message Metadata { + enum MetricType { + METRIC_TYPE_UNSPECIFIED = 0; + METRIC_TYPE_COUNTER = 1; + METRIC_TYPE_GAUGE = 2; + METRIC_TYPE_HISTOGRAM = 3; + METRIC_TYPE_GAUGEHISTOGRAM = 4; + METRIC_TYPE_SUMMARY = 5; + METRIC_TYPE_INFO = 6; + METRIC_TYPE_STATESET = 7; + } + MetricType type = 1; + // help_ref is a reference to the Request.symbols array representing help + // text for the metric. + uint32 help_ref = 3; + // unit_ref is a reference to the Request.symbols array representing unit + // for the metric. + uint32 unit_ref = 4; +} + +// A native histogram, also known as a sparse histogram. +message Histogram { ... } + +// A BucketSpan defines a number of consecutive buckets with their +// offset. +message BucketSpan { ... 
} +``` + +All timestamps MUST be int64 counted as milliseconds since the Unix epoch. Sample's values MUST be float64. + +For every `TimeSeries` message: + +* Label references MUST be provided. +* At least one element in Samples or in Histograms MUST be provided. For series which (rarely) would mix float and histogram samples, a separate `TimeSeries` message MUST be used. + +* Metadata MUST be provided. +* Exemplars SHOULD be provided, if they exist. +* Created timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). + +The following subsections define some schema elements in details. + +#### Symbols + +The `io.prometheus.write.v2.Request` proto schema is designed to [intern all strings](https://en.wikipedia.org/wiki/String_interning) for the proven additional compression and memory efficiency gains on top of the standard compressions. + +Symbols table containing deduplicated strings used in series and exemplar labels, metadata strings MUST be provided. References MUST point to the existing index in the Symbols string array. + +#### Series Labels + +The complete set of labels MUST be sent with each Sample or Histogram sample. Additionally, the label set associated with samples: + +- SHOULD contain a `__name__` label. +- MUST NOT contain repeated label names. +- MUST have label names sorted in lexicographical order. +- MUST NOT contain any empty label names or values. + +Sender MUST only send valid metric names, label names, and label values: + + +- Metric names MUST adhere to the regex `[a-zA-Z_:]([a-zA-Z0-9_:])*`. +- Label names MUST adhere to the regex `[a-zA-Z_]([a-zA-Z0-9_])*`. +- Label values MAY be any sequence of UTF-8 characters . + +Receiver MAY impose limits on the number and length of labels, but this will be receiver-specific and is out of scope for this document. 
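The label rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `validate_labels` and `intern_labels` are hypothetical helper names, not part of the specification or of any Prometheus library, and the regexes are the ones listed in this revision of the text. Starting the symbols table with an empty string is a choice made for this sketch.

```python
import re

# Patterns copied from the rules listed in this revision of the spec text.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:]([a-zA-Z0-9_:])*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_]([a-zA-Z0-9_])*")


def validate_labels(labels):
    """Return a list of rule violations for a list of (name, value) pairs."""
    errors = []
    names = [name for name, _ in labels]
    if len(set(names)) != len(names):
        errors.append("repeated label name")
    if names != sorted(names):
        errors.append("label names not sorted lexicographically")
    for name, value in labels:
        if not name or not value:
            errors.append(f"empty label name or value: {name!r}={value!r}")
        elif not LABEL_NAME_RE.fullmatch(name):
            errors.append(f"invalid label name: {name!r}")
        elif name == "__name__" and not METRIC_NAME_RE.fullmatch(value):
            errors.append(f"invalid metric name: {value!r}")
    return errors


def intern_labels(labels, symbols, index):
    """Append unseen strings to `symbols`; return the flat labels_refs list."""
    refs = []
    for name, value in labels:
        for text in (name, value):
            if text not in index:
                index[text] = len(symbols)
                symbols.append(text)
            refs.append(index[text])
    return refs


# Hypothetical usage: one symbols table shared by all series in a request.
symbols, index = [""], {"": 0}
series = [("__name__", "http_requests_total"), ("job", "api")]
if not validate_labels(series):
    print(intern_labels(series, symbols, index))  # [1, 2, 3, 4]
```

Because every string is stored once, a second series that shares `__name__` and `job` only adds its new label value to the table; its `labels_refs` reuse the existing indices.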
+ +Label names beginning with "__" are RESERVED for system usage and SHOULD NOT be used, see [Prometheus Data Model](https://prometheus.io/docs/concepts/data_model/). + +#### Samples + + +Sender MUST send samples for any given TimeSeries in timestamp order. Sender MAY send multiple requests for different series in parallel. + +Sender MUST send stale markers when a time series will no longer be appended to. + +Stale markers MUST be signalled by the special NaN value `0x7ff0000000000002`. This value MUST NOT be used otherwise. + +Typically, Sender can detect when a time series will no longer be appended to using the following techniques: + +1. Detecting, using service discovery, that the target exposing the series has gone away +1. Noticing the target is no longer exposing the time series between successive scrapes +1. Failing to scrape the target that originally exposed a time series +1. Tracking configuration and evaluation for recording and alerting rules + +#### Metadata + +Metadata SHOULD follow the official guidelines for [TYPE](https://prometheus.io/docs/instrumenting/writing_exporters/#types) and [HELP](https://prometheus.io/docs/instrumenting/writing_exporters/#help-strings). + +#### Exemplars + + +TBD + +#### Created Timestamp + + +TBD + +### Responses + +Receiver ingesting all samples successfully MUST return HTTP 200 status code. In such a successful case, the response body from the Receiver SHOULD be empty; Sender MUST ignore the response body. The response body is RESERVED for future use. + +The following subsections specify Sender and Receiver semantics around write errors. + +#### Partial Write + +Sender SHOULD use Prometheus Remote Write to request write of multiple samples, across multiple series. As a result, Receiver MAY ingest valid samples within a write request that contains invalid or otherwise unwritten samples, which represents a partial write case. + +In a partial write case, Receiver MUST NOT return HTTP 200 status code. 
Receiver MUST provide a human-readable error message in the response body. Sender MUST NOT try and interpret the error message, and SHOULD log it as is. + +#### Unsupported Request Content + +Receiver MAY NOT support certain content types or encodings defined in [the Protocol section](#protocol). Receiver MUST return [415 HTTP Unsupported Media Type](https://www.rfc-editor.org/rfc/rfc9110.html#name-415-unsupported-media-type) status code if they don't support a given content type or encoding provided by the Sender. + +Sender SHOULD expect [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) for the above reasons from the 1.x Receiver, for backward compatibility. + + +Sender MAY retry write requests on 415 HTTP status code, with different content type and compression settings. + +#### Invalid Samples + + +Receiver MAY NOT support certain metric types or samples (e.g. Receiver might reject sample without metadata or without created timestamp, while another Receiver might accept such sample.). It’s up to the Receiver what sample is invalid. Receiver MUST return a [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) status code for write requests that contain any invalid samples, unless the [partial retryable write](#retries-on-partial-writes) occurs. + +Sender MUST NOT retry on 4xx HTTP (other than 429 and 415) status codes, which MUST be used by Receiver to indicate that the write will never be able to succeed and should not be retried. + +### Retries & Backoff + +Receiver MAY return a [429 HTTP Too Many Requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429) status code to indicate the overloaded server situation. Receiver MAY return [the Retry-After](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) header to indicate the time for the next write attempt. Receiver MAY return a 5xx HTTP status code to represent internal server errors. 
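The status-code rules in this part of the specification can be summarised from the Sender's point of view roughly as follows. This is a minimal sketch, not a reference implementation: the action labels, the function names, and the backoff parameters (`base`, `cap`) are assumptions made for illustration, and only the delay-seconds form of `Retry-After` is handled (an HTTP-date value falls back to plain exponential backoff).

```python
def classify(status):
    """Map a Receiver's HTTP status code to a Sender action (labels are hypothetical)."""
    if status == 200:
        return "done"                # all samples were ingested
    if status == 415:
        return "retry_other_format"  # MAY retry with another content type or encoding
    if status == 429:
        return "retry_optional"      # MAY retry, using backoff
    if 400 <= status < 500:
        return "drop"                # MUST NOT retry; log the response body as is
    if 500 <= status < 600:
        return "retry"               # retry, using backoff
    return "unspecified"             # not covered by the rules sketched here


def next_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    value = (retry_after or "").strip()
    if value.isascii() and value.isdigit():
        # Retry-After given as delay-seconds: use the Receiver's hint.
        return float(value)
    # Otherwise use capped exponential backoff.
    return min(cap, base * 2 ** attempt)
```

A real Sender would likely add jitter to the delay and bound the number of attempts; both are left out here to keep the sketch deterministic.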
+ +Sender MAY retry on 429 HTTP status code. Sender MUST retry write requests on 5xx HTTP status codes. Sender MUST use a backoff algorithm to prevent overwhelming the server. Sender MAY handle [the Retry-After response header](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) to estimate the next retry time. + +The difference between 429 and 5xx handling is due to the potential for Sender “falling behind” if the Receiver cannot keep up. As a result, the ability to NOT retry on 429 allows progress to be made when there are Sender-side errors (e.g. too much traffic), while the data is not lost when there are Receiver-side errors. + +### Retries on Partial Writes + +No partial retry-ability is specified (ability for Receiver to ask for retry on certain samples only), but Receiver MAY return an HTTP 5xx or 429 status code in partial write cases (e.g. when some samples require retry, while the rest of the samples were successfully written). In that case Receiver MUST support idempotency as Sender MAY retry with the same request. It’s up to the Receiver implementation to decide what’s best with [the specified sender retry semantics](#retries--backoff). + +Similarly, Receiver MAY return an HTTP 5xx or 429 status code on partial write or [partial invalid sample cases](#partial-write), when it expects Sender to retry the whole request. + +### Backward and forward compatibility + +TBD + + + +Receiver MAY ingest valid samples within a write request that otherwise contains invalid samples. Receiver MUST return an HTTP 400 status code ("Bad Request") for write requests that contain any invalid samples. Receiver SHOULD provide a human readable error message in the response body. Sender MUST NOT try and interpret the error message, and SHOULD log it as is. + +## Out of Scope + + + +The same as in [1.0](./remote_write_spec.md#out-of-scope). 
+ +## Future Plans + +This section contains speculative plans that are not considered part of the protocol specification, but are mentioned here for completeness. Note that the 2.0 specification completed [2 of 3 future plans in the 1.0](./remote_write_spec.md#future-plans). + +* **Transactionality** There is still no transactionality defined for the 2.0 specification, mostly because it makes a scalable Prometheus Sender implementation difficult. Prometheus aims at being "transactional" - i.e. to never expose a partially scraped target to a query. We intend to do the same with remote write -- for instance, in the future we would like to "align" remote write with scrapes, perhaps such that all the samples, metadata and exemplars for a single scrape are sent in a single remote write request. + + However, the Remote Write 2.0 specification solves a key transactionality problem for [the classic histogram buckets](https://docs.google.com/document/d/1mpcSWH1B82q-BtJza-eJ8xMLlKt6EJ9oFGH325vtY1Q/edit#heading=h.ueg7q07wymku). This is done thanks to native histograms supporting custom bucketing, which `io.prometheus.write.v2.Request` supports. Sender might translate all classic histograms to native histograms this way, but mandating this is out of scope for this specification. However, for this reason Receiver MAY ignore certain metric types (e.g. classic histograms). + +* **Alternative wire formats**. The OpenTelemetry community has shown the validity of Apache Arrow (and potentially other columnar formats) for over-the-wire data transfer with their OTLP protocol. We would like to do experiments to confirm the compatibility of a similar format with Prometheus’ data model, and include benchmarks of any resource usage changes. We would potentially maintain both a protobuf and a columnar format long term for compatibility reasons and use our content negotiation to add a different proto message for this purpose. 
+ +* **Pre-defined string dictionary for interning**. The protocol could pre-define a static dictionary of ref->symbol that includes strings that are considered common, e.g. “namespace”, “le”, “job”, “seconds”, “bytes”, etc. Sender could refer to these without the need to include them in the request’s symbols table. This dictionary could incrementally grow with minor version releases of the protocol. + +## Related + +### FAQ + +See 1.0 FAQ + +**Why did you not use gRPC?** +Because the 1.0 protocol does not use gRPC, and breaking with that would increase friction in adoption. See 1.0 [reason](./remote_write_spec.md#faq). + +**Why not streaming protobuf messages?** +The same rationale as in 1.0 [reasoning](./remote_write_spec.md#faq). + +**Why do we send samples in order?** +The same rationale as in 1.0 [reasoning](./remote_write_spec.md#faq). + +**How can we parallelise requests with the in-order constraint?** +The same answer as in 1.0 [reasoning](./remote_write_spec.md#faq). + + From 32a2ab6e5569e96dd816ade58c3a44c002035607 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Wed, 22 May 2024 16:10:24 +0100 Subject: [PATCH 02/31] Addressed some feedback. Signed-off-by: bwplotka --- .../docs/concepts/remote_write_spec_2_0.md | 151 +++++++----------- 1 file changed, 60 insertions(+), 91 deletions(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 24539b28d..6ac1a0b70 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -25,15 +25,7 @@ The remote write protocol is designed to make stateless implementations of the s The remote write protocol contains opportunities for batching, e.g. sending multiple samples for different series in a single request. It is not expected that multiple samples for the same series will be commonly sent in the same request, although there is support for this in the protocol. 
- -The remote write protocol is not intended for use by applications to push metrics to Prometheus remote-write-compatible Receiver. It is intended that a Prometheus remote-write-compatible sender scrapes instrumented applications or exporters and sends remote write messages to a server. - - -A test suite can be found at https://github.com/prometheus/compliance/tree/main/remote_write_sender. +A test suite can be found at https://github.com/prometheus/compliance/tree/main/remote_write_sender. The test's 2.0 compatibility [is in progress](https://github.com/prometheus/compliance/issues/101). ### Glossary @@ -50,30 +42,28 @@ For the purposes of this document the following definitions MUST be followed: ### Protocol -The Remote Write Protocol MUST consist of RPCs with the request body encoded using a Google Protobuf 3 message. The protobuf encoding MUST use either of the following schemas: +The Remote Write Protocol MUST consist of RPCs with the request body encoded using a Google Protobuf 3 message and then compressed. - -* [`prometheus.WriteRequest`](./remote_write_spec.md#protocol) introduced in the Remote Write 1.0 specification. As of 2.0 the `prometheus.WriteRequest` message is deprecated. -* `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#ioprometheuswritev2request-proto-schema). Senders SHOULD use `io.prometheus.write.v2.Request` when possible. +The protobuf encoding MUST use either of the following schemas: + +* [`prometheus.WriteRequest`](./remote_write_spec.md#protocol) introduced in the Remote Write 1.0 specification. As of 2.0 the `prometheus.WriteRequest` message is deprecated. It SHOULD be used only for compatibility reasons. Receiver MAY NOT support `prometheus.WriteRequest`. +* `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#ioprometheuswritev2request-proto-schema). Senders and Receivers SHOULD use `io.prometheus.write.v2.Request` when possible. 
Receiver MUST support `io.prometheus.write.v2.Request`. + +The encoded message MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). The block format MUST be used -- the framed format MUST NOT be used. Sender MUST send encoded and compresses proto message in the body of an HTTP POST request and send it to the Receiver via HTTP at a provided URL path. The Receiver MAY specify any HTTP URL path to receive metrics. -Sender MUST send the following "reserved" headers with the HTTP request: +Sender MUST send the following reserved headers with the HTTP request: * `Content-Encoding: ` Content encoding request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-encoding). Sender MUST use the `snappy` value. More compression algorithms might come in 2.x or beyond. * `Content-Type: application/x-protobuf` or `Content-Type: application/x-protobuf;proto=` - - + Content type request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type). Sender MUST use `application/x-protobuf` as the only media type. Sender MAY add `;proto=` parameter to the header's value to indicate the fully qualified name of the protobuf message (schema) that was used, from the two mentioned above. As a result, Sender MUST send any of the three supported header values: - For the message introduced in PRW 1.0, identified by `prometheus.WriteRequest`: + For the deprecated message introduced in PRW 1.0, identified by `prometheus.WriteRequest`: * `Content-Type: application/x-protobuf` * `Content-Type: application/x-protobuf;proto=prometheus.WriteRequest` For the message introduced in PRW 2.0, identified by `io.prometheus.write.v2.Request`: @@ -82,16 +72,11 @@ Sender MUST send the following "reserved" headers with the HTTP request: Sender SHOULD use `Content-Type: application/x-protobuf`, for backward compatibility, when talking to 1.x Receiver. 
Sender SHOULD use `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request` when talking to Receiver supporting 2.x. More proto messages might come in 2.x or beyond. * `User-Agent: ` -* `X-Prometheus-Remote-Write-Version: ` - - Sender SHOULD use `X-Prometheus-Remote-Write-Version: 0.1.0` for backward compatibility, when using 1.0 proto message. +* `X-Prometheus-Remote-Write-Version: ` -Sender MAY allow users to send custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. + Sender SHOULD use `X-Prometheus-Remote-Write-Version: 0.1.0` for backward compatibility, when talking to a 1.x Receiver. Otherwise, Sender SHOULD use the newest remote write version it is compatible with, e.g. `X-Prometheus-Remote-Write-Version: 2.0`. - -The remote write request in the body of the HTTP POST MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). The block format MUST be used -- the framed format MUST NOT be used. The remote write request MUST be encoded using Google Protobuf 3, and MUST use either of the schemas defined above. +Sender MAY allow users to add custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. #### `io.prometheus.write.v2.Request` Proto Schema @@ -106,10 +91,14 @@ The simplified version of the new `io.prometheus.write.v2.Request` is presented // Request represents a request to write the given timeseries to a remote destination. message Request { // symbols contains a de-duplicated array of string elements used for various - // items in a Request message, like labels and metadata items. To decode - // each of those items, referenced, by "ref(s)" suffix, you need to lookup the - // actual string by index from symbols array. The order of strings is up to - // the client, server should not assume any particular encoding. + // items in a Request message, like labels and metadata items. 
For the sender convenience + // around empty values for optional fields like unit_ref, symbols array MUST start with + // empty string. + // + // To decode each of the symbolized strings, referenced, by "ref(s)" suffix, you + // need to lookup the actual string by index from symbols array. The order of + // strings is up to the sender. The receiver should not assume any particular encoding. + repeated string symbols = 1; repeated string symbols = 1; // timeseries represents an array of distinct series with 0 or more samples. repeated TimeSeries timeseries = 2; @@ -119,7 +108,7 @@ message Request { message TimeSeries { // labels_refs is a list of label name-value pair references, encoded // as indices to the Request.symbols array. This list's length is always - // a multiple of two, and the underlying labels should be sorted. + // a multiple of two, and the underlying labels should be sorted lexicographically. // // Note that there might be multiple TimeSeries objects in the same // Requests with the same labels e.g. for different exemplars, metadata @@ -127,7 +116,7 @@ message TimeSeries { repeated uint32 labels_refs = 1; // Timeseries messages can either specify samples or (native) histogram samples - // (histogram field), but not both. For typical clients (real-time metric + // (histogram field), but not both. For typical sender (real-time metric // streaming), in healthy cases, there will be only one sample or histogram. // // Samples and histograms are sorted by timestamp (older first). @@ -142,7 +131,7 @@ message TimeSeries { // created_timestamp represents an optional created timestamp associated with // this series' samples in ms format, typically for counter or histogram type - // metrics. Note that some servers might require this and in return fail to + // metrics. Note that some receivers might require this and in return fail to // ingest such samples within the Request. 
// // For Go, see github.com/prometheus/prometheus/model/timestamp/timestamp.go @@ -159,7 +148,7 @@ message TimeSeries { message Exemplar { // labels_refs is a list of label name-value pair references, encoded // as indices to the Request.symbols array. This list's len is always - // a multiple of 2, and the underlying labels should be sorted. + // a multiple of 2, and the underlying labels should be sorted lexicographically. repeated uint32 labels_refs = 1; double value = 2; int64 timestamp = 3; @@ -178,7 +167,7 @@ message Sample { // Metadata represents the metadata associated with the given series' samples. message Metadata { enum MetricType { - METRIC_TYPE_UNSPECIFIED = 0; + METRIC_TYPE_UNSPECIFIED = 0; METRIC_TYPE_COUNTER = 1; METRIC_TYPE_GAUGE = 2; METRIC_TYPE_HISTOGRAM = 3; @@ -189,19 +178,20 @@ message Metadata { } MetricType type = 1; // help_ref is a reference to the Request.symbols array representing help - // text for the metric. + // text for the metric. Help is optional, reference should point to empty string in + // such a case. uint32 help_ref = 3; // unit_ref is a reference to the Request.symbols array representing unit - // for the metric. + // for the metric. Unit is optional, reference should point to empty string in + // such a case. uint32 unit_ref = 4; } // A native histogram, also known as a sparse histogram. +// See https://github.com/prometheus/prometheus/blob/remote-write-2.0/prompb/io/prometheus/write/v2/types.proto#L142 +// for a full message that follows the native histogram spec for both sparse +// and exponential, as well as, custom bucketing. message Histogram { ... } - -// A BucketSpan defines a number of consecutive buckets with their -// offset. -message BucketSpan { ... } ``` All timestamps MUST be int64 counted as milliseconds since the Unix epoch. Sample's values MUST be float64. @@ -210,11 +200,8 @@ For every `TimeSeries` message: * Label references MUST be provided. 
* At least one element in Samples or in Histograms MUST be provided. For series which (rarely) would mix float and histogram samples, a separate `TimeSeries` message MUST be used. - -* Metadata MUST be provided. -* Exemplars SHOULD be provided, if they exist. +* Metadata fields SHOULD be provided. +* Exemplars SHOULD be provided, if they exist for a series. * Created timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). The following subsections define some schema elements in details. @@ -223,27 +210,20 @@ The following subsections define some schema elements in details. The `io.prometheus.write.v2.Request` proto schema is designed to [intern all strings](https://en.wikipedia.org/wiki/String_interning) for the proven additional compression and memory efficiency gains on top of the standard compressions. -Symbols table containing deduplicated strings used in series and exemplar labels, metadata strings MUST be provided. References MUST point to the existing index in the Symbols string array. +Symbols table containing deduplicated strings used in series and exemplar labels, metadata strings MUST be provided. The first element of symbols table MUST be an empty string. References MUST point to the existing index in the Symbols string array. #### Series Labels The complete set of labels MUST be sent with each Sample or Histogram sample. Additionally, the label set associated with samples: -- SHOULD contain a `__name__` label. -- MUST NOT contain repeated label names. -- MUST have label names sorted in lexicographical order. -- MUST NOT contain any empty label names or values. +* SHOULD contain a `__name__` label. +* MUST NOT contain repeated label names. +* MUST have label names sorted in lexicographical order. +* MUST NOT contain any empty label names or values. -Sender MUST only send valid metric names, label names, and label values: +Metric names, label names, and label values MAY be any sequence of UTF-8 characters. 
Receiver MAY reject series whose metric names or label names contain characters that do not follow the [previous patterns](https://prometheus.io/docs/concepts/remote_write_spec/#:~:text=Metric%20names%20MUST,UTF%2D8%20characters%20), given that [UTF-8 support is still in progress](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md). - -- Metric names MUST adhere to the regex `[a-zA-Z_:]([a-zA-Z0-9_:])*`. -- Label names MUST adhere to the regex `[a-zA-Z_]([a-zA-Z0-9_])*`. -- Label values MAY be any sequence of UTF-8 characters . - -Receiver MAY impose limits on the number and length of labels, but this will be receiver-specific and is out of scope for this document. +Receiver also MAY impose limits on the number and length of labels, but this is receiver-specific and is out of scope for this document. Label names beginning with "__" are RESERVED for system usage and SHOULD NOT be used, see [Prometheus Data Model](https://prometheus.io/docs/concepts/data_model/). @@ -271,10 +251,11 @@ Metadata SHOULD follow the official guidelines for [TYPE](https://prometheus.io/ #### Exemplars - -TBD +Each exemplar, if attached to a `TimeSeries`: + +* MUST contain at least one label set, so two references to symbols table. +* MUST contain value. +* MAY contain timestamp. #### Created Timestamp @@ -291,7 +272,7 @@ The following subsections specify Sender and Receiver semantics around write err #### Partial Write -Sender SHOULD use Prometheus Remote Write to request write of multiple samples, across multiple series. As a result, Receiver MAY ingest valid samples within a write request that contains invalid or otherwise unwritten samples, which represents a partial write case. +Sender SHOULD use Prometheus Remote Write to send samples for multiple series in a single request. As a result, Receiver MAY ingest valid samples within a write request that contains invalid or otherwise unwritten samples, which represents a partial write case.
In a partial write case, Receiver MUST NOT return HTTP 200 status code. Receiver MUST provide a human-readable error message in the response body. Sender MUST NOT try and interpret the error message, and SHOULD log it as is. @@ -301,17 +282,9 @@ Receiver MAY NOT support certain content types or encodings defined in [the Prot Sender SHOULD expect [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) for the above reasons from the 1.x Receiver, for backward compatibility. - -Sender MAY retry write requests on 415 HTTP status code, with different content type and compression settings. - #### Invalid Samples - -Receiver MAY NOT support certain metric types or samples (e.g. Receiver might reject sample without metadata or without created timestamp, while another Receiver might accept such sample.). It’s up to the Receiver what sample is invalid. Receiver MUST return a [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) status code for write requests that contain any invalid samples, unless the [partial retryable write](#retries-on-partial-writes) occurs. +Receiver MAY NOT support certain metric types or samples (e.g. Receiver might reject sample without metadata type specified or without created timestamp, while another Receiver might accept such sample.). It’s up to the Receiver what sample is invalid. Receiver MUST return a [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) status code for write requests that contain any invalid samples, unless the [partial retryable write](#retries-on-partial-writes) occurs. Sender MUST NOT retry on 4xx HTTP (other than 429 and 415) status codes, which MUST be used by Receiver to indicate that the write will never be able to succeed and should not be retried. 
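The status-code rules in this hunk, combined with the retry rules from the Retries & Backoff section of the same spec, can be read as a small decision table on the Sender side. The following is a minimal sketch of that reading, not an excerpt from any Prometheus implementation: the `Action` enum, `classify_response`, `backoff_delay` and the default delays are hypothetical names and values chosen for illustration. Treating every 2xx code as success is also an assumption, since the text only names HTTP 200.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical Sender-side outcomes; these names are not part of the spec."""
    DONE = "done"                          # write accepted
    MUST_RETRY = "must_retry"              # 5xx: Sender MUST retry, with backoff
    MAY_RETRY = "may_retry"                # 429: Sender MAY retry, with backoff
    MAY_RENEGOTIATE = "may_renegotiate"    # 415: MAY retry with another content type or encoding
    DROP = "drop"                          # other 4xx: MUST NOT retry; log the body as is


def classify_response(status: int) -> Action:
    """Map an HTTP status code to a Sender action, following the rules quoted above."""
    if 200 <= status < 300:   # assumption: the text only names 200
        return Action.DONE
    if status == 429:
        return Action.MAY_RETRY
    if status == 415:
        return Action.MAY_RENEGOTIATE
    if 500 <= status < 600:
        return Action.MUST_RETRY
    # Remaining 4xx codes mean the write can never succeed. Codes the spec
    # does not mention (1xx, 3xx) are also dropped here, which is an assumption.
    return Action.DROP


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff in seconds; base and cap are illustrative defaults."""
    return min(cap, base * (2 ** attempt))
```

A Sender loop would call `classify_response` after each request and sleep for `backoff_delay(attempt)` before a retry. A real implementation would likely add jitter and honour the `Retry-After` header when the Receiver provides one.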
@@ -329,27 +302,23 @@ No partial retry-ability is specified (ability for receiver to ask for retry on Similarly, Receiver MAY return a HTTP 5xx or 429 status code on partial write or [partial invalid sample cases](#partial-write), when it expects Sender to retry the whole request. -### Backward and forward compatibility +### Backward and Forward Compatibility -TBD +The protocol follows [semantic versioning 2.0](https://semver.org/): any 2.x compatible Receiver MUST be able to read any 2.x compatible sender and so on. Breaking/backwards incompatible changes will result in a 3.x version of the spec. - +The 2.x protocol is breaking compatibility with 1.x by introducing a new `io.prometheus.write.v2.Request` content type (wire format) and deprecating the `prometheus.WriteRequest`. + +2.x senders MAY support 1.x... TBD explain. -Receiver MAY ingest valid samples within a write request that otherwise contains invalid samples. Receiver MUST return a HTTP 400 status code ("Bad Request") for write requests that contain any invalid samples. Receiver SHOULD provide a human readable error message in the response body. Sender MUST NOT try and interpret the error message, and SHOULD log it as is. ## Out of Scope From 775abc3c8f5ed4b40923e5d8ca9d75621c4d7d14 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Fri, 24 May 2024 18:35:06 +0100 Subject: [PATCH 03/31] Work together with Callum. Signed-off-by: bwplotka --- .../docs/concepts/remote_write_spec_2_0.md | 198 +++++++++--------- content/docs/specs/remote_write_spec.md | 2 +- 2 files changed, 102 insertions(+), 98 deletions(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 6ac1a0b70..19d5917f6 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -34,7 +34,7 @@ For the purposes of this document the following definitions MUST be followed: * a "Sender" is something that sends Prometheus Remote Write data. 
* a "Receiver" is something that receives Prometheus Remote Write data. * a "Sample" is a pair of (timestamp, value). -* a "Histogram" is a pair of (timestamp, histogram value). +* a "Histogram" is a pair of (timestamp, [histogram value](https://github.com/prometheus/docs/blob/b9657b5f5b264b81add39f6db2f1df36faf03efe/content/docs/concepts/native_histograms.md)). * a "Label" is a pair of (key, value). * a "Series" is a list of samples, identified by a unique set of labels. @@ -46,18 +46,18 @@ The Remote Write Protocol MUST consist of RPCs with the request body encoded usi The protobuf encoding MUST use either of the following schemas: -* [`prometheus.WriteRequest`](./remote_write_spec.md#protocol) introduced in the Remote Write 1.0 specification. As of 2.0 the `prometheus.WriteRequest` message is deprecated. It SHOULD be used only for compatibility reasons. Receiver MAY NOT support `prometheus.WriteRequest`. -* `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#ioprometheuswritev2request-proto-schema). Senders and Receivers SHOULD use `io.prometheus.write.v2.Request` when possible. Receiver MUST support `io.prometheus.write.v2.Request`. +* [`prometheus.WriteRequest`](./remote_write_spec.md#protocol) introduced in the Remote Write 1.0 specification. As of 2.0 the `prometheus.WriteRequest` message is deprecated. It SHOULD be used only for compatibility reasons. Sender and Receiver MAY NOT support `prometheus.WriteRequest`. +* `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#ioprometheuswritev2request-proto-schema). Senders and Receivers SHOULD use `io.prometheus.write.v2.Request` when possible. Sender and Receiver MUST support `io.prometheus.write.v2.Request`. The encoded message MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). The block format MUST be used -- the framed format MUST NOT be used. 
-Sender MUST send encoded and compresses proto message in the body of an HTTP POST request and send it to the Receiver via HTTP at a provided URL path. The Receiver MAY specify any HTTP URL path to receive metrics. +Sender MUST send encoded and compressed proto message in the body of an HTTP POST request and send it to the Receiver via HTTP at a provided URL path. The Receiver MAY specify any HTTP URL path to receive metrics. Sender MUST send the following reserved headers with the HTTP request: * `Content-Encoding: ` - Content encoding request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-encoding). Sender MUST use the `snappy` value. More compression algorithms might come in 2.x or beyond. + Content encoding request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-encoding). Sender MUST use the `snappy` value. Receiver MUST support `snappy` compression. New, optional compression algorithms might come in 2.x or beyond. * `Content-Type: application/x-protobuf` or `Content-Type: application/x-protobuf;proto=` @@ -69,15 +69,70 @@ Sender MUST send the following reserved headers with the HTTP request: For the message introduced in PRW 2.0, identified by `io.prometheus.write.v2.Request`: * `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request` - Sender SHOULD use `Content-Type: application/x-protobuf`, for backward compatibility, when talking to 1.x Receiver. Sender SHOULD use `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request` when talking to Receiver supporting 2.x. More proto messages might come in 2.x or beyond. + When talking to 1.x Receiver, the Sender SHOULD use `Content-Type: application/x-protobuf` for backward compatibility. Otherwise, Sender SHOULD use `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request`. More proto messages might come in 2.x or beyond. 
* `User-Agent: ` * `X-Prometheus-Remote-Write-Version: ` - Sender SHOULD use `X-Prometheus-Remote-Write-Version: 0.1.0` for backward compatibility, when talking to 1.x Receiver 1.0. Otherwise, Sender SHOULD use the newest remote write version it is compatible with e.g. `X-Prometheus-Remote-Write-Version: 2.0` + When talking to 1.x Receiver, the Sender MUST use `X-Prometheus-Remote-Write-Version: 0.1.0` for backward compatibility. Otherwise, Sender SHOULD use the newest remote write version it is compatible with e.g. `X-Prometheus-Remote-Write-Version: 2.0.0`. Sender MAY allow users to add custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. +### Response + +Receiver ingesting all samples successfully MUST return HTTP 200 status code. In such a successful case, the response body from the Receiver SHOULD be empty; Sender MUST ignore the response body. The response body is RESERVED for future use. + +The following subsections specify Sender and Receiver semantics around write errors. + +#### Partial Write + +Sender SHOULD use Prometheus Remote Write to send samples for multiple series in a single request. As a result, Receiver MAY ingest valid samples within a write request that contains invalid or otherwise unwritten samples, which represents a partial write case. + +In a partial write case, Receiver MUST NOT return HTTP 200 status code. Receiver MUST provide a human-readable error message in the response body. The Receiver's error SHOULD contain information about the amount of the samples being rejected and for what reasons. + +Sender MUST NOT try and interpret the error message, and SHOULD log it as is. + +#### Unsupported Request Content + +Receiver MUST return [415 HTTP Unsupported Media Type](https://www.rfc-editor.org/rfc/rfc9110.html#name-415-unsupported-media-type) status code if they don't support a given content type or encoding provided by the Sender. 
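The reserved request headers described above can be sketched as a small helper that picks the header values for a 1.x or a 2.x Receiver. This is an illustrative sketch under stated assumptions: `build_headers`, `add_custom_headers` and the `example-sender/0.0.1` User-Agent value are hypothetical (the spec leaves the User-Agent value open), while the content types, the `snappy` encoding and the version strings are taken from the text.

```python
V1_CONTENT_TYPE = "application/x-protobuf"
V2_CONTENT_TYPE = "application/x-protobuf;proto=io.prometheus.write.v2.Request"

# Header names the spec reserves, lower-cased for case-insensitive comparison.
RESERVED_HEADERS = {
    "content-encoding",
    "content-type",
    "user-agent",
    "x-prometheus-remote-write-version",
}


def build_headers(use_v2: bool, user_agent: str = "example-sender/0.0.1") -> dict:
    """Return the reserved headers for a 2.x (use_v2=True) or 1.x Receiver.

    The default user_agent is a placeholder, not a value defined by the spec.
    """
    return {
        "Content-Encoding": "snappy",
        "Content-Type": V2_CONTENT_TYPE if use_v2 else V1_CONTENT_TYPE,
        "User-Agent": user_agent,
        "X-Prometheus-Remote-Write-Version": "2.0.0" if use_v2 else "0.1.0",
    }


def add_custom_headers(headers: dict, custom: dict) -> dict:
    """Merge user-supplied headers, refusing any attempt to set a reserved one."""
    for name in custom:
        if name.lower() in RESERVED_HEADERS:
            raise ValueError(f"header {name!r} is reserved and cannot be overridden")
    merged = dict(headers)
    merged.update(custom)
    return merged
```

A Sender that receives 415 from the Receiver could call `build_headers(use_v2=False)` and re-encode the payload as `prometheus.WriteRequest` for its next attempt, which is one way to read the optional fallback behaviour.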
+ +Sender SHOULD expect [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) for the above reasons from the 1.x Receiver, for backward compatibility. + +#### Invalid Samples + +Receiver MAY NOT support certain metric types or samples (e.g. Receiver might reject sample without metadata type specified or without created timestamp, while another Receiver might accept such sample.). It’s up to the Receiver what sample is invalid. Receiver MUST return a [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) status code for write requests that contain any invalid samples, unless the [partial retryable write](#retries-on-partial-writes) occurs. + +Sender MUST NOT retry on 4xx HTTP (other than [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429)) status codes, which MUST be used by Receiver to indicate that the write will never be able to succeed and should not be retried. Sender MAY retry on 415 HTTP status code with a different content-type or encoding to see if Receiver supports it. + +### Retries & Backoff + +Receiver MAY return a [429 HTTP Too Many Requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429) status code to indicate the overloaded server situation. Receiver MAY return [the Retry-After](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) header to indicate the time for the next write attempt. Receiver MAY return a 5xx HTTP status code to represent internal server errors, that should be retried. + +Sender MAY retry on 429 HTTP status code. Sender MUST retry write requests on 5xx HTTP. Sender MUST use a backoff algorithm to prevent overwhelming the server. Sender MAY handle [the Retry-After response header](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) to estimate the next retry time. + +The difference between 429 vs 5xx handling is due to a potential situation for the Sender “falling behind” if the Receiver cannot keep up. 
As a result, the ability to NOT retry on 429 allows progress to be made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors. + +### Retries on Partial Writes + +Receiver MAY return a 5xx HTTP or 429 HTTP status code on partial write or [partial invalid sample cases](#partial-write), when it expects Sender to retry the whole request. In that case Receiver MUST support idempotency as sender MAY retry with the same request. + +### Backward and Forward Compatibility + +The protocol follows [semantic versioning 2.0](https://semver.org/): any 2.x compatible Receiver MUST be able to read any 2.x compatible Sender and vice versa. Breaking or backwards incompatible changes will result in a 3.x version of the spec. + +The proto format itself is forward / backward compatible, in some respects: + +* Removing fields from the proto will mean a major version bump. +* Adding (optional) fields will be a minor version bump. + +In other words, this means that future minor versions of 2.x MAY add new optional fields to `io.prometheus.write.v2.Request`, new compressions, content types (wire formats) and negotiation mechanisms, as long as they are backward compatible (e.g. optional to both Receivers and Senders). + +### 2.x vs 1.x Compatibility + +The 2.x protocol breaks compatibility with 1.x by introducing a new `io.prometheus.write.v2.Request` content type (wire format) and deprecating the `prometheus.WriteRequest`. + +2.x Senders MAY support 1.x Receivers by allowing users to configure what content type sender should use. 2.x Senders also MAY automatically fall back to different content types, if the Receiver returns 415 HTTP status code. + #### `io.prometheus.write.v2.Request` Proto Schema -Sender MUST send samples for any given TimeSeries in timestamp order. Sender MAY send multiple requests for different series in parallel.
+#### Samples and Histogram Samples + +Sender MUST send samples (or histogram samples) for any given TimeSeries in timestamp order. Sender MAY send multiple requests for different series in parallel. -Sender MUST send stale markers when a time series will no longer be appended to. +Sender MUST send stale markers when a time series will no longer be appended to, for time series that were "scraped". Stale markers MUST be signalled by the special NaN value `0x7ff0000000000002`. This value MUST NOT be used otherwise. @@ -247,117 +312,56 @@ Typically, Sender can detect when a time series will no longer be appended to us #### Metadata -Metadata SHOULD follow the official guidelines for [TYPE](https://prometheus.io/docs/instrumenting/writing_exporters/#types) and [HELP](https://prometheus.io/docs/instrumenting/writing_exporters/#help-strings). +Metadata SHOULD follow the official Prometheus guidelines for: + +* [Type](https://prometheus.io/docs/instrumenting/writing_exporters/#types) +* [Help](https://prometheus.io/docs/instrumenting/writing_exporters/#help-strings). + +Metadata MAY follow the official OpenMetrics guidelines for: + +* [Unit](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#unit) #### Exemplars Each exemplar, if attached to a `TimeSeries`: -* MUST contain at least one label set, so two references to symbols table. +* MUST contain at least one label set, so two references to a symbols table. * MUST contain value. * MAY contain timestamp. -#### Created Timestamp - - -TBD - -### Responses - -Receiver ingesting all samples successfully MUST return HTTP 200 status code. In such a successful case, the response body from the Receiver SHOULD be empty; Sender MUST ignore the response body. The response body is RESERVED for future use. - -The following subsections specify Sender and Receiver semantics around write errors. 
- -#### Partial Write - -Sender SHOULD use Prometheus Remote Write to send samples for multiple series in a single request. As a result, Receiver MAY ingest valid samples within a write request that contains invalid or otherwise unwritten samples, which represents a partial write case. - -In a partial write case, Receiver MUST NOT return HTTP 200 status code. Receiver MUST provide a human-readable error message in the response body. Sender MUST NOT try and interpret the error message, and SHOULD log it as is. - -#### Unsupported Request Content - -Receiver MAY NOT support certain content types or encodings defined in [the Protocol section](#protocol). Receiver MUST return [415 HTTP Unsupported Media Type](https://www.rfc-editor.org/rfc/rfc9110.html#name-415-unsupported-media-type) status code if they don't support a given content type or encoding provided by the Sender. - -Sender SHOULD expect [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) for the above reasons from the 1.x Receiver, for backward compatibility. - -#### Invalid Samples - -Receiver MAY NOT support certain metric types or samples (e.g. Receiver might reject sample without metadata type specified or without created timestamp, while another Receiver might accept such sample.). It’s up to the Receiver what sample is invalid. Receiver MUST return a [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) status code for write requests that contain any invalid samples, unless the [partial retryable write](#retries-on-partial-writes) occurs. - -Sender MUST NOT retry on 4xx HTTP (other than 429 and 415) status codes, which MUST be used by Receiver to indicate that the write will never be able to succeed and should not be retried. - -### Retries & Backoff - -Receiver MAY return a [429 HTTP Too Many Requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429) status code to indicate the overloaded server situation. 
Receiver MAY return [the Retry-After](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) header to indicate the time for the next write attempt. Receiver MAY return a 5xx HTTP status code to represent internal server errors. - -Sender MAY retry on 429 HTTP status code. Sender MUST retry write requests on 5xx HTTP. Sender MUST use a backoff algorithm to prevent overwhelming the server. Sender MAY handle [the Retry-After response header](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) to estimate the next retry time. - -The difference between 429 vs 5xx handling is due to potential for Sender “falling behind” if the Receiver cannot keep up. As a result, the ability to NOT retry on 429 allows progress is made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors. - -### Retries on Partial Writes - -No partial retry-ability is specified (ability for receiver to ask for retry on certain samples only), but Receiver MAY return a HTTP 5xx or 429 status code a case of partial write cases (e.g. when some samples require retry, while the rest of the samples were successfully written). In that case Receiver MUST support idempotency as sender MAY retry with the same request. It’s up to Receiver implementation to decide what’s best with [the specified sender retry semantics](#retries--backoff). - -Similarly, Receiver MAY return a HTTP 5xx or 429 status code on partial write or [partial invalid sample cases](#partial-write), when it expects Sender to retry the whole request. - -### Backward and Forward Compatibility - -The protocol follows [semantic versioning 2.0](https://semver.org/): any 2.x compatible Receiver MUST be able to read any 2.x compatible sender and so on. Breaking/backwards incompatible changes will result in a 3.x version of the spec. 
- -The proto formats itself are forward / backward compatible, in some respects: - -* Removing fields from the proto requirements mean a major version bump. -* Adding (optional) fields will be a minor version bump. - -In other words, this means that future minor versions of 2.x MAY add new optional fields to `io.prometheus.write.v2.Request`, new compressions, content types (wire formats) and negotiation mechanisms, as long as they are backward compatible (e.g. optional to both Receivers and Senders). - -### 2.x vs 1.x Compatibility - -The 2.x protocol is breaking compatibility with 1.x by introducing a new `io.prometheus.write.v2.Request` content type (wire format) and deprecating the `prometheus.WriteRequest`. - -2.x senders MAY support 1.x... TBD explain. - - ## Out of Scope - - The same as in [1.0](./remote_write_spec.md#out-of-scope). ## Future Plans -This section contains speculative plans that are not considered part of protocol specification, but are mentioned here for completeness. Note that 2.0 specification completed [2 of 3 future plans in the 1.0](./remote_write_spec.md#future-plans). +This section contains speculative plans that are not considered part of protocol specification yet, but are mentioned here for completeness. Note that 2.0 specification completed [2 of 3 future plans in the 1.0](./remote_write_spec.md#future-plans). -* **Transactionality** There is still no transactionality defined for 2.0 specification, mostly because it makes scalable Prometheus Sender implementation difficult. Prometheus aims at being "transactional" - i.e. to never expose a partially scraped target to a query. We intend to do the same with remote write -- for instance, in the future we would like to "align" remote write with scrapes, perhaps such that all the samples, metadata and exemplars for a single scrape are sent in a single remote write request. 
+* **Transactionality** There is still no transactionality defined for 2.0 specification, mostly because it makes scalable Sender implementation difficult. Prometheus Sender aims at being "transactional" - i.e. to never expose a partially scraped target to a query. We intend to do the same with remote write -- for instance, in the future we would like to "align" remote write with scrapes, perhaps such that all the samples, metadata and exemplars for a single scrape are sent in a single remote write request. - However, Remote Write 2.0 specification solves a key transactionality problem for [the classic histogram buckets](https://docs.google.com/document/d/1mpcSWH1B82q-BtJza-eJ8xMLlKt6EJ9oFGH325vtY1Q/edit#heading=h.ueg7q07wymku). This is done thanks to native histograms supporting custom bucket-ing which is supported by `io.prometheus.write.v2.Request`. Sender might translate all classic histograms to native histograms this way, but it's out of this specification to mandate this. However, for this reason Receiver MAY ignore certain metric types (e.g. classic histograms). + However, Remote Write 2.0 specification solves an important transactionality problem for [the classic histogram buckets](https://docs.google.com/document/d/1mpcSWH1B82q-BtJza-eJ8xMLlKt6EJ9oFGH325vtY1Q/edit#heading=h.ueg7q07wymku). This is done thanks to the native histograms supporting custom bucket-ing possible with the `io.prometheus.write.v2.Request` wire format. Sender might translate all classic histograms to native histograms this way, but it's out of this specification to mandate this. However, for this reason Receiver MAY ignore certain metric types (e.g. classic histograms). * **Alternative wire formats**. The OpenTelemetry community has shown the validity of Apache Arrow (and potentially other columnar formats) for over the wire data transfer with their OTLP protocol. 
We would like to do experiments to confirm the compatibility of a similar format with Prometheus’ data model, and include benchmarks of any resource usage changes. We would potentially maintain both a protobuf and columnar format long term for compatibility reasons and use our content negotiation to add a different proto message for this purpose. -* Pre-defined string dictionary for interning The protocol could pre-define a static dictionary of ref->symbol that includes strings that are considered common, e.g. “namespace”, “le”, “job”, “seconds”, “bytes”, etc. Sender and refer to these without the need to include them in the request’s symbols table. This dictionary could incrementally grow with minor versions releases of the protocol. +* **Global symbols**. Pre-defined string dictionary for interning. The protocol could pre-define a static dictionary of ref->symbol that includes strings that are considered common, e.g. “namespace”, “le”, “job”, “seconds”, “bytes”, etc. Senders could refer to these without the need to include them in the request’s symbols table. This dictionary could incrementally grow with minor version releases of the protocol. ## Related ### FAQ -See 1.0 FAQ - **Why did you not use gRPC?** Because 1.0 protocol is not using gRPC, breaking it would increase friction in the adoption. See 1.0 [reason](./remote_write_spec.md#faq). **Why not streaming protobuf messages?** -The same rationale as in 1.0 [reasoning](./remote_write_spec.md#faq). +If you use persistent HTTP/1.1 connections, they are pretty close to streaming. Of course headers have to be re-sent, but that is less expensive than a new TCP setup. **Why do we send samples in order?** -The same rationale as in 1.0 [reasoning](./remote_write_spec.md#faq). +The in-order constraint comes from the encoding we use for time series data in Prometheus, the implementation of which is append only.
It is possible to remove this constraint, for instance by buffering samples and reordering them before encoding. **How can we parallelise requests with the in-order constraint?** -The same answer as in 1.0 [reasoning](./remote_write_spec.md#faq). +Samples must be in-order _for a given series_. Remote write requests can be sent in parallel as long as they are for different series. In Prometheus, we shard the samples by their labels into separate queues, and then writes happen sequentially in each queue. This guarantees samples for the same series are delivered in order, but samples for different series are sent in parallel - and potentially "out of order" between different series. - +**What are the differences between Remote Write 2.0 and OpenTelemetry's OTLP protocol?** +[OpenTelemetry OTLP](https://github.com/open-telemetry/opentelemetry-proto/blob/a05597bff803d3d9405fcdd1e1fb1f42bed4eb7a/docs/specification.md) is a protocol for transporting telemetry data (such as metrics, logs, traces and profiles) between telemetry sources, intermediate nodes and telemetry backends. The recommended transport involves gRPC with protobuf, but HTTP with protobuf or JSON is also described. It was designed from scratch with the intent to support a variety of different observability signals, data types and extra information. For [metrics](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto) that means additional non-identifying labels, flags, temporal aggregation types, resource or scoped metrics, schema URLs and more. OTLP also requires [the semantic convention](https://opentelemetry.io/docs/concepts/semantic-conventions/) to be used. + +Remote Write was designed for simplicity, efficiency and organic growth. The first version was officially released in 2023, when [dozens of battle-tested adopters in the CNCF ecosystem](./remote_write_spec.md#compatible-senders-and-receivers) had already been using it for years.
Remote Write 2.0 iterates on the previous protocol by adding a few new elements (metadata, exemplars, created timestamp and native histograms) and string interning. Remote Write 2.0 is always stateless, focuses only on metrics and is opinionated -- it is scoped down to the elements that the Prometheus community considers sufficient for a robust metrics solution. We believe Remote Write 2.0 proposes an export transport for metrics that is an order of magnitude simpler to adopt and use, and often more efficient than competitors. diff --git a/content/docs/specs/remote_write_spec.md b/content/docs/specs/remote_write_spec.md index 796fea4d2..aa2bad26f 100644 --- a/content/docs/specs/remote_write_spec.md +++ b/content/docs/specs/remote_write_spec.md @@ -183,7 +183,7 @@ This section contains speculative plans that are not considered part of protocol ## Related ### Compatible Senders and Receivers -The spec is intended to describe how the following components interact: +The spec is intended to describe how the following components interact (as of April 2023): - [Prometheus](https://github.com/prometheus/prometheus/tree/master/storage/remote) (as both a "sender" and a "receiver") - [Avalanche](https://github.com/prometheus-community/avalanche) (as a "sender") - A Load Testing Tool Prometheus Metrics. From 67991974b99f94a70f323b7ddb4b770cc65ed907 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Tue, 4 Jun 2024 09:06:49 +0100 Subject: [PATCH 04/31] Final changes. * Addressed Callum review comments. * Consistent naming for Wire Format, Protocol, Proto Message and Remote-Write. Specifically stop using Wire Format as it's confusing (https://stackoverflow.com/a/70862002). * Added deprecation notice on summary * Added links to rationales where important (in commentary). * Grammarly fixes.
Signed-off-by: bwplotka --- .../docs/concepts/remote_write_spec_2_0.md | 95 +++++++++++++------ 1 file changed, 64 insertions(+), 31 deletions(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 19d5917f6..e8952eb5a 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -9,9 +9,9 @@ sort_rank: 4 * Status: Proposed * Date: May 2024 -The remote write specification, in general, is intended to document the standard for how Prometheus and Prometheus remote-write-compatible agents send data to a Prometheus or Prometheus remote-write compatible receivers. +The remote write specification, in general, is intended to document the standard for how Prometheus and Prometheus Remote-Write-compatible agents send data to a Prometheus or Prometheus Remote-Write compatible receivers. -This document is intended to define a second version of the [Prometheus Remote Write](./remote_write_spec.md) API with minor changes to protocol and semantics. This second version also adds a new wire format with new features enabling more use cases and wider adoption on top of performance and cost savings. Finally, this spec outlines how to implement backward compatible senders and receivers (even under a single endpoint) using existing basic content negotiation request headers. More advanced, automatic content negotiation mechanisms might come in future versions, if needed. For the rationales behind the 2.0 specification, see [the formal proposal](https://github.com/prometheus/proposals/pull/35). +This document is intended to define a second version of the [Prometheus Remote-Write](./remote_write_spec.md) API with minor changes to protocol and semantics. This second version adds a new Proto Message with new features enabling more use cases and wider adoption on top of performance and cost savings. 
Second version also deprecates the previous Proto Message from a [1.0 Remote-Write specification](./remote_write_spec.md#protocol). Finally, this spec outlines how to implement backward compatible senders and receivers (even under a single endpoint) using existing basic content negotiation request headers. More advanced, automatic content negotiation mechanisms might come in a future minor version, if needed. For the rationales behind the 2.0 specification, see [the formal proposal](https://github.com/prometheus/proposals/pull/35). The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119). @@ -19,11 +19,14 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S ### Background -The remote write protocol is designed to make it possible to reliably propagate samples in real-time from a sender to a receiver, without loss. +The Remote-Write protocol is designed to make it possible to reliably propagate samples in real-time from a sender to a receiver, without loss. -The remote write protocol is designed to make stateless implementations of the server possible; as such there are little-to-no inter-message references. As such the protocol is not considered "streaming." To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. + +The Remote-Write protocol is designed to make stateless implementations of the server possible; as such there are little-to-no inter-message references. As such the protocol is not considered "streaming." 
To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. -The remote write protocol contains opportunities for batching, e.g. sending multiple samples for different series in a single request. It is not expected that multiple samples for the same series will be commonly sent in the same request, although there is support for this in the protocol. +The Remote-Write protocol contains opportunities for batching, e.g. sending multiple samples for different series in a single request. It is not expected that multiple samples for the same series will be commonly sent in the same request, although there is support for this in the Proto Message. A test suite can be found at https://github.com/prometheus/compliance/tree/main/remote_write_sender. The test's 2.0 compatibility [is in progress](https://github.com/prometheus/compliance/issues/101). @@ -31,8 +34,12 @@ A test suite can be found at https://github.com/prometheus/compliance/tree/main/ For the purposes of this document the following definitions MUST be followed: -* a "Sender" is something that sends Prometheus Remote Write data. -* a "Receiver" is something that receives Prometheus Remote Write data. +* a "Remote-Write" is the name of this Prometheus protocol. +* a "Protocol" is a communication specification that enables client and server to transfer metrics. This includes content type definitions, but also compressions, negotiation, retry mechanisms and so on. +* a "Proto Message" refers to the [content type](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type) definition of the data structure Remote-Write is specifying for this Protocol. 
Since this specification uses [Google Protocol Buffers ("protobuf")](https://protobuf.dev/) exclusively, the schema is defined in a ["proto" file](https://protobuf.dev/programming-guides/proto3/) and represented by a single Protobuf ["message"](https://protobuf.dev/programming-guides/proto3/#simple). +* a "Wire Format" is the format of the data as it travels on the wire (i.e. in a network). In case of Remote-Write this is always the compressed binary protobuf format. +* a "Sender" is something that sends Remote-Write data. +* a "Receiver" is something that receives Remote-Write data. * a "Sample" is a pair of (timestamp, value). * a "Histogram" is a pair of (timestamp, [histogram value](https://github.com/prometheus/docs/blob/b9657b5f5b264b81add39f6db2f1df36faf03efe/content/docs/concepts/native_histograms.md)). * a "Label" is a pair of (key, value). @@ -42,26 +49,34 @@ For the purposes of this document the following definitions MUST be followed: ### Protocol -The Remote Write Protocol MUST consist of RPCs with the request body encoded using a Google Protobuf 3 message and then compressed. - -The protobuf encoding MUST use either of the following schemas: +The Remote-Write Protocol MUST consist of RPCs with the request body serialized using a Google Protocol Buffers and then compressed. -* [`prometheus.WriteRequest`](./remote_write_spec.md#protocol) introduced in the Remote Write 1.0 specification. As of 2.0 the `prometheus.WriteRequest` message is deprecated. It SHOULD be used only for compatibility reasons. Sender and Receiver MAY NOT support `prometheus.WriteRequest`. + +The protobuf serialization MUST use either of the following Proto Messages: +* [`prometheus.WriteRequest`](./remote_write_spec.md#protocol) introduced in the Remote-Write 1.0 specification. As of 2.0 the `prometheus.WriteRequest` message is deprecated. It SHOULD be used only for compatibility reasons. Sender and Receiver MAY NOT support `prometheus.WriteRequest`. 
* `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#ioprometheuswritev2request-proto-schema). Senders and Receivers SHOULD use `io.prometheus.write.v2.Request` when possible. Sender and Receiver MUST support `io.prometheus.write.v2.Request`. -The encoded message MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). The block format MUST be used -- the framed format MUST NOT be used. +The Proto Message MUST use binary Wire Format. Then, MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). The block format MUST be used -- the framed format MUST NOT be used. -Sender MUST send encoded and compressed proto message in the body of an HTTP POST request and send it to the Receiver via HTTP at a provided URL path. The Receiver MAY specify any HTTP URL path to receive metrics. +Sender MUST send serialized and compressed Proto Message in the body of an HTTP POST request and send it to the Receiver via HTTP at a provided URL path. The Receiver MAY specify any HTTP URL path to receive metrics. + Sender MUST send the following reserved headers with the HTTP request: * `Content-Encoding: ` + Content encoding request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-encoding). Sender MUST use the `snappy` value. Receiver MUST support `snappy` compression. New, optional compression algorithms might come in 2.x or beyond. * `Content-Type: application/x-protobuf` or `Content-Type: application/x-protobuf;proto=` - Content type request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type). Sender MUST use `application/x-protobuf` as the only media type. Sender MAY add `;proto=` parameter to the header's value to indicate the fully qualified name of the protobuf message (schema) that was used, from the two mentioned above. 
As a result, Sender MUST send any of the three supported header values: + Content type request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type). Sender MUST use `application/x-protobuf` as the only media type. Sender MAY add `;proto=` parameter to the header's value to indicate the fully qualified name of the Proto Message that was used, from the two mentioned above. As a result, Sender MUST send any of the three supported header values: For the deprecated message introduced in PRW 1.0, identified by `prometheus.WriteRequest`: * `Content-Type: application/x-protobuf` @@ -69,12 +84,12 @@ Sender MUST send the following reserved headers with the HTTP request: For the message introduced in PRW 2.0, identified by `io.prometheus.write.v2.Request`: * `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request` - When talking to 1.x Receiver, the Sender SHOULD use `Content-Type: application/x-protobuf` for backward compatibility. Otherwise, Sender SHOULD use `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request`. More proto messages might come in 2.x or beyond. + When talking to 1.x Receiver, the Sender SHOULD use `Content-Type: application/x-protobuf` for backward compatibility. Otherwise, Sender SHOULD use `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request`. More Proto Messages might come in 2.x or beyond. * `User-Agent: ` -* `X-Prometheus-Remote-Write-Version: ` +* `X-Prometheus-Remote-Write-Version: ` - When talking to 1.x Receiver, the Sender MUST use `X-Prometheus-Remote-Write-Version: 0.1.0` for backward compatibility. Otherwise, Sender SHOULD use the newest remote write version it is compatible with e.g. `X-Prometheus-Remote-Write-Version: 2.0.0`. + When talking to 1.x Receiver, the Sender MUST use `X-Prometheus-Remote-Write-Version: 0.1.0` for backward compatibility. Otherwise, Sender SHOULD use the newest Remote-Write version it is compatible with e.g. 
`X-Prometheus-Remote-Write-Version: 2.0.0`. Sender MAY allow users to add custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. @@ -86,7 +101,10 @@ The following subsections specify Sender and Receiver semantics around write err #### Partial Write -Sender SHOULD use Prometheus Remote Write to send samples for multiple series in a single request. As a result, Receiver MAY ingest valid samples within a write request that contains invalid or otherwise unwritten samples, which represents a partial write case. + +Sender SHOULD use Remote-Write to send samples for multiple series in a single request. As a result, Receiver MAY ingest valid samples within a write request that contains invalid or otherwise unwritten samples, which represents a partial write case. In a partial write case, Receiver MUST NOT return HTTP 200 status code. Receiver MUST provide a human-readable error message in the response body. The Receiver's error SHOULD contain information about the amount of the samples being rejected and for what reasons. @@ -102,7 +120,7 @@ Sender SHOULD expect [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc91 Receiver MAY NOT support certain metric types or samples (e.g. Receiver might reject sample without metadata type specified or without created timestamp, while another Receiver might accept such sample.). It’s up to the Receiver what sample is invalid. Receiver MUST return a [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) status code for write requests that contain any invalid samples, unless the [partial retryable write](#retries-on-partial-writes) occurs. -Sender MUST NOT retry on 4xx HTTP (other than [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429)) status codes, which MUST be used by Receiver to indicate that the write will never be able to succeed and should not be retried. 
Sender MAY retry on 415 HTTP status code with a different content-type or encoding to see if Receiver supports it. +Sender MUST NOT retry on 4xx HTTP (other than [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429)) status codes, which MUST be used by Receiver to indicate that the write operation will never be able to succeed and should not be retried. Sender MAY retry on 415 HTTP status code with a different content-type or encoding to see if Receiver supports it. ### Retries & Backoff @@ -120,25 +138,25 @@ Receiver MAY return a 5xx HTTP or 429 HTTP status code on partial write or [part The protocol follows [semantic versioning 2.0](https://semver.org/): any 2.x compatible Receiver MUST be able to read any 2.x compatible Sender and vice versa. Breaking or backwards incompatible changes will result in a 3.x version of the spec. -The proto formats itself are forward / backward compatible, in some respects: +The Proto Messages (in Wire Format) themselves are forward / backward compatible, in some respects: * Removing fields from the proto requirements mean a major version bump. * Adding (optional) fields will be a minor version bump. -In other words, this means that future minor versions of 2.x MAY add new optional fields to `io.prometheus.write.v2.Request`, new compressions, content types (wire formats) and negotiation mechanisms, as long as they are backward compatible (e.g. optional to both Receivers and Senders). +In other words, this means that future minor versions of 2.x MAY add new optional fields to `io.prometheus.write.v2.Request`, new compressions, Proto Messages and negotiation mechanisms, as long as they are backward compatible (e.g. optional to both Receiver and Sender). -### 2.x vs 1.x Compatibility +#### 2.x vs 1.x Compatibility -The 2.x protocol is breaking compatibility with 1.x by introducing a new `io.prometheus.write.v2.Request` content type (wire format) and deprecating the `prometheus.WriteRequest`. 
+The 2.x protocol is breaking compatibility with 1.x by introducing a new, mandatory `io.prometheus.write.v2.Request` Proto Message and deprecating the `prometheus.WriteRequest`. -2.x Senders MAY support 1.x Receivers by allowing users to configure what content type sender should use. 2.x Senders also MAY automatically fall back to different content types, if the Receiver returns 415 HTTP status code. +2.x Sender MAY support 1.x Receiver by allowing users to configure what content type Sender should use. 2.x Sender also MAY automatically fall back to different content types, if the Receiver returns 415 HTTP status code. #### `io.prometheus.write.v2.Request` Proto Schema -The source of truth is [here](https://github.com/prometheus/prometheus/blob/remote-write-2.0/prompb/io/prometheus/write/v2/types.proto#L32). The `gogo` dependency and options CAN be ignored. They are not part of the specification as they don't impact the serialized format. +The source of truth is [here](https://github.com/prometheus/prometheus/blob/remote-write-2.0/prompb/io/prometheus/write/v2/types.proto#L32). The `gogo` dependency and options CAN be ignored ([will be removed eventually](https://github.com/prometheus/prometheus/issues/11908)). They are not part of the specification as they don't impact the serialized format. The simplified version of the new `io.prometheus.write.v2.Request` is presented below. @@ -262,7 +280,13 @@ All timestamps MUST be int64 counted as milliseconds since the Unix epoch. Sampl For every `TimeSeries` message: * Label references MUST be provided. + * At least one element in Samples or in Histograms MUST be provided. For series which (rarely) would mix float and histogram samples, a separate `TimeSeries` message MUST be used. + * Metadata fields SHOULD be provided. * Exemplars SHOULD be provided, if they exist for a series. * Created timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). 
@@ -271,12 +295,18 @@ The following subsections define some schema elements in details. #### Symbols -The `io.prometheus.write.v2.Request` proto schema is designed to [intern all strings](https://en.wikipedia.org/wiki/String_interning) for the proven additional compression and memory efficiency gains on top of the standard compressions. + +The `io.prometheus.write.v2.Request` Proto Message is designed to [intern all strings](https://en.wikipedia.org/wiki/String_interning) for the proven additional compression and memory efficiency gains on top of the standard compressions. Symbols table MUST be provided and it MUST contain deduplicated strings used in series, exemplar labels and metadata strings. The first element of symbols table MUST be an empty string. References MUST point to the existing index in the Symbols string array. #### Series Labels + The complete set of labels MUST be sent with each Sample or Histogram sample. Additionally, the label set associated with samples: * SHOULD contain a `__name__` label. @@ -297,6 +327,9 @@ Receiver also MAY impose limits on the number and length of labels, but this is #### Samples and Histogram Samples + Sender MUST send samples (or histogram samples) for any given TimeSeries in timestamp order. Sender MAY send multiple requests for different series in parallel. Sender MUST send stale markers when a time series will no longer be appended to, for time series that were "scraped". @@ -337,9 +370,9 @@ The same as in [1.0](./remote_write_spec.md#out-of-scope). This section contains speculative plans that are not considered part of protocol specification yet, but are mentioned here for completeness. Note that 2.0 specification completed [2 of 3 future plans in the 1.0](./remote_write_spec.md#future-plans). -* **Transactionality** There is still no transactionality defined for 2.0 specification, mostly because it makes scalable Sender implementation difficult. Prometheus Sender aims at being "transactional" - i.e. 
to never expose a partially scraped target to a query. We intend to do the same with remote write -- for instance, in the future we would like to "align" remote write with scrapes, perhaps such that all the samples, metadata and exemplars for a single scrape are sent in a single remote write request. +* **Transactionality** There is still no transactionality defined for 2.0 specification, mostly because it makes scalable Sender implementation difficult. Prometheus Sender aims at being "transactional" - i.e. to never expose a partially scraped target to a query. We intend to do the same with Remote-Write -- for instance, in the future we would like to "align" Remote-Write with scrapes, perhaps such that all the samples, metadata and exemplars for a single scrape are sent in a single Remote-Write request. - However, Remote Write 2.0 specification solves an important transactionality problem for [the classic histogram buckets](https://docs.google.com/document/d/1mpcSWH1B82q-BtJza-eJ8xMLlKt6EJ9oFGH325vtY1Q/edit#heading=h.ueg7q07wymku). This is done thanks to the native histograms supporting custom bucket-ing possible with the `io.prometheus.write.v2.Request` wire format. Sender might translate all classic histograms to native histograms this way, but it's out of this specification to mandate this. However, for this reason Receiver MAY ignore certain metric types (e.g. classic histograms). + However, Remote-Write 2.0 specification solves an important transactionality problem for [the classic histogram buckets](https://docs.google.com/document/d/1mpcSWH1B82q-BtJza-eJ8xMLlKt6EJ9oFGH325vtY1Q/edit#heading=h.ueg7q07wymku). This is done thanks to the native histograms supporting custom bucket-ing possible with the `io.prometheus.write.v2.Request` wire format. Sender might translate all classic histograms to native histograms this way, but it's out of this specification to mandate this. However, for this reason Receiver MAY ignore certain metric types (e.g. 
classic histograms). * **Alternative wire formats**. The OpenTelemetry community has shown the validity of Apache Arrow (and potentially other columnar formats) for over the wire data transfer with their OTLP protocol. We would like to do experiments to confirm the compatibility of a similar format with Prometheus’ data model, and include benchmarks of any resource usage changes. We would potentially maintain both a protobuf and columnar format long term for compatibility reasons and use our content negotiation to add different proto message for this purpose. @@ -359,9 +392,9 @@ If you use persistent HTTP/1.1 connections, they are pretty close to streaming. The in-order constraint comes from the encoding we use for time series data in Prometheus, the implementation of which is append only. It is possible to remove this constraint, for instance by buffering samples and reordering them before encoding. **How can we parallelise requests with the in-order constraint?** -Samples must be in-order _for a given series_. Remote write requests can be sent in parallel as long as they are for different series. In Prometheus, we shard the samples by their labels into separate queues, and then writes happen sequentially in each queue. This guarantees samples for the same series are delivered in order, but samples for different series are sent in parallel - and potentially "out of order" between different series. +Samples must be in-order _for a given series_. Remote-Write requests can be sent in parallel as long as they are for different series. In Prometheus, we shard the samples by their labels into separate queues, and then writes happen sequentially in each queue. This guarantees samples for the same series are delivered in order, but samples for different series are sent in parallel - and potentially "out of order" between different series. 
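The sharding described in the answer above can be sketched roughly as follows. This is a minimal illustration, not the Prometheus queue manager: `shardFor`, the FNV-1a hash and the `0xff` separator byte are all assumptions made for the example. The only property that matters is that an identical label set always maps to the same queue, so writes for one series stay sequential while different series may go out in parallel.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// shardFor is a hypothetical helper: it maps a label set to one of n queues.
// Label names are sorted first so the result does not depend on map order.
func shardFor(labels map[string]string, n int) int {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0xff}) // separator (assumption) between name and value
		h.Write([]byte(labels[k]))
		h.Write([]byte{0xff})
	}
	return int(h.Sum64() % uint64(n))
}

func main() {
	series := map[string]string{"__name__": "up", "job": "api"}
	// The same series always lands in the same queue, which preserves
	// per-series ordering as long as each queue sends sequentially.
	fmt.Println(shardFor(series, 4) == shardFor(series, 4))
}
```

Each queue would then send its batches one after another; nothing here orders samples across queues, which matches the "out of order between different series" caveat.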
-**What are the differences between Remote Write 2.0 and OpenTelemetry's OTLP protocol?** +**What are the differences between Remote-Write 2.0 and OpenTelemetry's OTLP protocol?** [OpenTelemetry OTLP](https://github.com/open-telemetry/opentelemetry-proto/blob/a05597bff803d3d9405fcdd1e1fb1f42bed4eb7a/docs/specification.md) is a protocol for transporting of telemetry data (such as metrics, logs, traces and profiles) between telemetry sources, intermediate nodes and telemetry backends. The recommended transport involves gRPC with protobuf, but HTTP with protobuf or JSON are also described. It was designed from scratch with the intent to support variety of different observability signals, data types and extra information. For [metrics](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto) that means additional non-identifying labels, flags, temporal aggregations types, resource or scoped metrics, schema URLs and more. OTLP also requires [the semantic convention](https://opentelemetry.io/docs/concepts/semantic-conventions/) to be used. -Remote Write was designed for simplicity, efficiency and organic growth. First version was officially released in 2023, when already [dozens of battle-tested adopters in the CNCF ecosystem](./remote_write_spec.md#compatible-senders-and-receivers) were using it for years. Remote Write 2.0 iterates on the previous protocol by adding a few new elements (metadata, exemplars, created timestamp and native histograms) and string interning. Remote Write 2.0 is always stateless, focuses only on metrics and is opinionated -- it is scoped down to elements that by Prometheus community, is all you need to have robust metric solution. We believe Remote Write 2.0 proposes an export transport, for metrics, that is a magnitude simpler to adopt and use, and often more efficient than competitors. +Remote-Write was designed for simplicity, efficiency and organic growth. 
First version was officially released in 2023, when already [dozens of battle-tested adopters in the CNCF ecosystem](./remote_write_spec.md#compatible-senders-and-receivers) were using it for years. Remote-Write 2.0 iterates on the previous protocol by adding a few new elements (metadata, exemplars, created timestamp and native histograms) and string interning. Remote-Write 2.0 is always stateless, focuses only on metrics and is opinionated -- it is scoped down to elements that by Prometheus community, is all you need to have robust metric solution. We believe Remote-Write 2.0 proposes an export transport, for metrics, that is a magnitude simpler to adopt and use, and often more efficient than competitors. From 69885ef24bed0cb49ec22c63825513ed29412f12 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Wed, 5 Jun 2024 10:10:51 +0100 Subject: [PATCH 05/31] Changed stale marker section, added more info on CT. Signed-off-by: bwplotka --- .../docs/concepts/remote_write_spec_2_0.md | 32 +++++++++++++------ 1 file changed, 23 insertions(+), 9 deletions(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index e8952eb5a..6ca5b8084 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -204,7 +204,11 @@ message TimeSeries { // created_timestamp represents an optional created timestamp associated with // this series' samples in ms format, typically for counter or histogram type - // metrics. Note that some receivers might require this and in return fail to + // metrics. Created timestamp represents the time when the counter started + // counting (sometimes referred to as start timestamp), which can increase + // the accuracy of query results. + // + // Note that some receivers might require this and in return fail to // ingest such samples within the Request. 
// // For Go, see github.com/prometheus/prometheus/model/timestamp/timestamp.go @@ -287,9 +291,9 @@ Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-p -* Metadata fields SHOULD be provided. +* Metadata fields SHOULD be provided. Receiver MAY reject series with unspecified type. * Exemplars SHOULD be provided, if they exist for a series. -* Created timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). +* Created timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). Receiver MAY reject those series without created timestamp being set. The following subsections define some schema elements in details. @@ -300,7 +304,7 @@ Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-p --> The `io.prometheus.write.v2.Request` Proto Message is designed to [intern all strings](https://en.wikipedia.org/wiki/String_interning) for the proven additional compression and memory efficiency gains on top of the standard compressions. -Symbols table MUST be provided and it MUST contain deduplicated strings used in series, exemplar labels and metadata strings. The first element of symbols table MUST be an empty string. References MUST point to the existing index in the Symbols string array. +Symbols table MUST be provided and it MUST contain deduplicated strings used in series, exemplar labels and metadata strings. The first element of the symbols table MUST be an empty string. References MUST point to the existing index in the Symbols string array. #### Series Labels @@ -332,16 +336,26 @@ Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-p --> Sender MUST send samples (or histogram samples) for any given TimeSeries in timestamp order. Sender MAY send multiple requests for different series in parallel. 
-Sender MUST send stale markers when a time series will no longer be appended to, for time series that were "scraped". + +Sender SHOULD send stale markers when a time series will no longer be appended to. +Sender MUST send stale markers if the discontinuation of time series is possible to detect, for example: + +* For series that were pulled (scraped), unless explicit timestamp was used. +* For series that is resulted by a recording rule evaluation. + +Generally, not sending stale markers for series that are discontinued can lead to Receiver [non-trivial query time alignment issues](https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness). Stale markers MUST be signalled by the special NaN value `0x7ff0000000000002`. This value MUST NOT be used otherwise. Typically, Sender can detect when a time series will no longer be appended to using the following techniques: -1. Detecting, using service discovery, that the target exposing the series has gone away -1. Noticing the target is no longer exposing the time series between successive scrapes -1. Failing to scrape the target that originally exposed a time series -1. Tracking configuration and evaluation for recording and alerting rules +1. Detecting, using service discovery, that the target exposing the series has gone away. +1. Noticing the target is no longer exposing the time series between successive scrapes. +1. Failing to scrape the target that originally exposed a time series. +1. Tracking configuration and evaluation for recording and alerting rules. +1. Tracking discontinuation of other source of metrics (e.g. in k6 when benchmark has finished for series per benchmark, it could emit stale marker). 
#### Metadata From d6fc3be42b011778a407f0301f15fe9bcbf5dee6 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Wed, 5 Jun 2024 11:33:35 +0100 Subject: [PATCH 06/31] Exemplars MAY skip labels given https://github.com/prometheus/prometheus/issues/14208 Signed-off-by: bwplotka --- content/docs/concepts/remote_write_spec_2_0.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 6ca5b8084..716ba0706 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -372,8 +372,11 @@ Metadata MAY follow the official OpenMetrics guidelines for: Each exemplar, if attached to a `TimeSeries`: -* MUST contain at least one label set, so two references to a symbols table. * MUST contain value. + +* MAY contain labels e.g. referencing trace or request ID. If the exemplar references a trace it SHOULD use `trace_id` label name, as a best practice. * MAY contain timestamp. ## Out of Scope From 62e201973c2c5639dbfd701c32f0f1136c65d407 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Wed, 5 Jun 2024 11:52:11 +0100 Subject: [PATCH 07/31] Grammar fixes thx to Grammarly. Signed-off-by: bwplotka --- .../docs/concepts/remote_write_spec_2_0.md | 88 +++++++++---------- 1 file changed, 44 insertions(+), 44 deletions(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 716ba0706..1aeb75a1f 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -9,9 +9,9 @@ sort_rank: 4 * Status: Proposed * Date: May 2024 -The remote write specification, in general, is intended to document the standard for how Prometheus and Prometheus Remote-Write-compatible agents send data to a Prometheus or Prometheus Remote-Write compatible receivers. 
+The Remote-Write specification, in general, is intended to document the standard for how Prometheus and Prometheus Remote-Write compatible senders send data to Prometheus or Prometheus Remote-Write compatible receivers. -This document is intended to define a second version of the [Prometheus Remote-Write](./remote_write_spec.md) API with minor changes to protocol and semantics. This second version adds a new Proto Message with new features enabling more use cases and wider adoption on top of performance and cost savings. Second version also deprecates the previous Proto Message from a [1.0 Remote-Write specification](./remote_write_spec.md#protocol). Finally, this spec outlines how to implement backward compatible senders and receivers (even under a single endpoint) using existing basic content negotiation request headers. More advanced, automatic content negotiation mechanisms might come in a future minor version, if needed. For the rationales behind the 2.0 specification, see [the formal proposal](https://github.com/prometheus/proposals/pull/35). +This document is intended to define a second version of the [Prometheus Remote-Write](./remote_write_spec.md) API with minor changes to protocol and semantics. This second version adds a new Proto Message with new features enabling more use cases and wider adoption on top of performance and cost savings. The second version also deprecates the previous Proto Message from a [1.0 Remote-Write specification](./remote_write_spec.md#protocol). Finally, this spec outlines how to implement backwards-compatible senders and receivers (even under a single endpoint) using existing basic content negotiation request headers. More advanced, automatic content negotiation mechanisms might come in a future minor version, if needed. For the rationales behind the 2.0 specification, see [the formal proposal](https://github.com/prometheus/proposals/pull/35). 
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119). @@ -32,12 +32,12 @@ A test suite can be found at https://github.com/prometheus/compliance/tree/main/ ### Glossary -For the purposes of this document the following definitions MUST be followed: +In this document, the following definitions are followed: * a "Remote-Write" is the name of this Prometheus protocol. -* a "Protocol" is a communication specification that enables client and server to transfer metrics. This includes content type definitions, but also compressions, negotiation, retry mechanisms and so on. +* a "Protocol" is a communication specification that enables the client and server to transfer metrics. This includes content type definitions, but also compressions, negotiations, retry mechanisms and so on. * a "Proto Message" refers to the [content type](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type) definition of the data structure Remote-Write is specifying for this Protocol. Since this specification uses [Google Protocol Buffers ("protobuf")](https://protobuf.dev/) exclusively, the schema is defined in a ["proto" file](https://protobuf.dev/programming-guides/proto3/) and represented by a single Protobuf ["message"](https://protobuf.dev/programming-guides/proto3/#simple). -* a "Wire Format" is the format of the data as it travels on the wire (i.e. in a network). In case of Remote-Write this is always the compressed binary protobuf format. +* a "Wire Format" is the format of the data as it travels on the wire (i.e. in a network). In the case of Remote-Write, this is always the compressed binary protobuf format. * a "Sender" is something that sends Remote-Write data. * a "Receiver" is something that receives Remote-Write data. * a "Sample" is a pair of (timestamp, value). 
@@ -54,9 +54,9 @@ The Remote-Write Protocol MUST consist of RPCs with the request body serialized -The protobuf serialization MUST use either of the following Proto Messages: -* [`prometheus.WriteRequest`](./remote_write_spec.md#protocol) introduced in the Remote-Write 1.0 specification. As of 2.0 the `prometheus.WriteRequest` message is deprecated. It SHOULD be used only for compatibility reasons. Sender and Receiver MAY NOT support `prometheus.WriteRequest`. -* `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#ioprometheuswritev2request-proto-schema). Senders and Receivers SHOULD use `io.prometheus.write.v2.Request` when possible. Sender and Receiver MUST support `io.prometheus.write.v2.Request`. +The protobuf serialization MUST use either of the following Proto Messages: +* [`prometheus.WriteRequest`](./remote_write_spec.md#protocol) introduced in the Remote-Write 1.0 specification. As of 2.0, the `prometheus.WriteRequest` message is deprecated. It SHOULD be used only for compatibility reasons. Sender and Receiver MAY NOT support `prometheus.WriteRequest`. +* `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#ioprometheuswritev2request-proto-schema). Sender and Receiver SHOULD use `io.prometheus.write.v2.Request` when possible. Sender and Receiver MUST support `io.prometheus.write.v2.Request`. The Proto Message MUST use binary Wire Format. Then, MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). The block format MUST be used -- the framed format MUST NOT be used. @@ -95,7 +95,7 @@ Sender MAY allow users to add custom HTTP headers; they MUST NOT allow users to ### Response -Receiver ingesting all samples successfully MUST return HTTP 200 status code. In such a successful case, the response body from the Receiver SHOULD be empty; Sender MUST ignore the response body. The response body is RESERVED for future use. 
+Receiver ingesting all samples successfully MUST return a 200 HTTP status code. In such a successful case, the response body from the Receiver SHOULD be empty; Sender MUST ignore the response body. The response body is RESERVED for future use. The following subsections specify Sender and Receiver semantics around write errors. @@ -106,33 +106,33 @@ Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-p --> Sender SHOULD use Remote-Write to send samples for multiple series in a single request. As a result, Receiver MAY ingest valid samples within a write request that contains invalid or otherwise unwritten samples, which represents a partial write case. -In a partial write case, Receiver MUST NOT return HTTP 200 status code. Receiver MUST provide a human-readable error message in the response body. The Receiver's error SHOULD contain information about the amount of the samples being rejected and for what reasons. +In a partial write case, Receiver MUST NOT return a 200 HTTP status code. Receiver MUST provide a human-readable error message in the response body. The Receiver's error SHOULD contain information about the amount of the samples being rejected and for what reasons. -Sender MUST NOT try and interpret the error message, and SHOULD log it as is. +Sender MUST NOT try and interpret the error message and SHOULD log it as is. #### Unsupported Request Content -Receiver MUST return [415 HTTP Unsupported Media Type](https://www.rfc-editor.org/rfc/rfc9110.html#name-415-unsupported-media-type) status code if they don't support a given content type or encoding provided by the Sender. +Receiver MUST return [415 HTTP Unsupported Media Type](https://www.rfc-editor.org/rfc/rfc9110.html#name-415-unsupported-media-type) status code if they don't support a given content type or encoding provided by Sender. 
Sender SHOULD expect [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) for the above reasons from the 1.x Receiver, for backward compatibility. #### Invalid Samples -Receiver MAY NOT support certain metric types or samples (e.g. Receiver might reject sample without metadata type specified or without created timestamp, while another Receiver might accept such sample.). It’s up to the Receiver what sample is invalid. Receiver MUST return a [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) status code for write requests that contain any invalid samples, unless the [partial retryable write](#retries-on-partial-writes) occurs. +Receiver MAY NOT support certain metric types or samples (e.g. Receiver might reject a sample without a metadata type specified or without a created timestamp, while another Receiver might accept such a sample). It’s up to the Receiver to decide which samples are invalid. Receiver MUST return a [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) status code for write requests that contain any invalid samples unless the [partial retriable write](#retries-on-partial-writes) occurs. -Sender MUST NOT retry on 4xx HTTP (other than [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429)) status codes, which MUST be used by Receiver to indicate that the write operation will never be able to succeed and should not be retried. Sender MAY retry on 415 HTTP status code with a different content-type or encoding to see if Receiver supports it. +Sender MUST NOT retry on 4xx HTTP status codes (other than [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429)), which MUST be used by Receiver to indicate that the write operation will never be able to succeed and should not be retried. Sender MAY retry on the 415 HTTP status code with a different content type or encoding to see if Receiver supports it.
### Retries & Backoff Receiver MAY return a [429 HTTP Too Many Requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429) status code to indicate the overloaded server situation. Receiver MAY return [the Retry-After](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) header to indicate the time for the next write attempt. Receiver MAY return a 5xx HTTP status code to represent internal server errors, that should be retried. -Sender MAY retry on 429 HTTP status code. Sender MUST retry write requests on 5xx HTTP. Sender MUST use a backoff algorithm to prevent overwhelming the server. Sender MAY handle [the Retry-After response header](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) to estimate the next retry time. +Sender MAY retry on a 429 HTTP status code. Sender MUST retry write requests on 5xx HTTP status codes. Sender MUST use a backoff algorithm to prevent overwhelming the server. Sender MAY handle [the Retry-After response header](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) to estimate the next retry time. The difference between 429 vs 5xx handling is due to a potential situation for the Sender “falling behind” if the Receiver cannot keep up. As a result, the ability to NOT retry on 429 allows progress to be made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors. ### Retries on Partial Writes -Receiver MAY return a 5xx HTTP or 429 HTTP status code on partial write or [partial invalid sample cases](#partial-write), when it expects Sender to retry the whole request. In that case Receiver MUST support idempotency as sender MAY retry with the same request. +Receiver MAY return a 5xx HTTP or 429 HTTP status code on partial write or [partial invalid sample cases](#partial-write) when it expects Sender to retry the whole request. In that case, Receiver MUST support idempotency as Sender MAY retry with the same request.
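The retry rules in the hunks above can be sketched as a small decision helper in Go. This is only an illustration under stated assumptions: `retryDecision`, `classify` and `nextDelay` are hypothetical names, the exponential backoff with a cap is one possible backoff algorithm (the specification does not mandate a particular one), and only the delay-seconds form of `Retry-After` is handled, not the HTTP-date form.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

type retryDecision int

const (
	noRetry   retryDecision = iota // success, or 4xx other than 429: never resend the same request
	mayRetry                       // 429: Sender MAY retry
	mustRetry                      // 5xx: Sender MUST retry
)

// classify maps a response status code to the retry behaviour for resending
// the same request. A 415 is classified as noRetry here; a Sender MAY still
// try again with a different content type or encoding.
func classify(status int) retryDecision {
	switch {
	case status == http.StatusTooManyRequests:
		return mayRetry
	case status >= 500 && status <= 599:
		return mustRetry
	default:
		return noRetry
	}
}

// nextDelay prefers a Retry-After value given in seconds; otherwise it
// doubles the base delay once per attempt, capped at max.
func nextDelay(attempt int, retryAfter string, base, max time.Duration) time.Duration {
	if secs, err := strconv.Atoi(retryAfter); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

func main() {
	for _, code := range []int{200, 400, 415, 429, 503} {
		fmt.Println(code, classify(code))
	}
	fmt.Println(nextDelay(3, "", 100*time.Millisecond, 5*time.Second))
	fmt.Println(nextDelay(0, "30", 100*time.Millisecond, 5*time.Second))
}
```

Keeping 429 as "may retry" rather than "must retry" reflects the rationale in the text: a Sender that is falling behind can choose to drop data and make progress, while 5xx responses are always retried so that Receiver-side failures do not lose data.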
### Backward and Forward Compatibility @@ -140,10 +140,10 @@ The protocol follows [semantic versioning 2.0](https://semver.org/): any 2.x com The Proto Messages (in Wire Format) themselves are forward / backward compatible, in some respects: -* Removing fields from the proto requirements mean a major version bump. -* Adding (optional) fields will be a minor version bump. +* Removing fields from the proto message requires a major version bump. +* Adding (optional) fields can be done in a minor version bump. -In other words, this means that future minor versions of 2.x MAY add new optional fields to `io.prometheus.write.v2.Request`, new compressions, Proto Messages and negotiation mechanisms, as long as they are backward compatible (e.g. optional to both Receiver and Sender). +In other words, this means that future minor versions of 2.x MAY add new optional fields to `io.prometheus.write.v2.Request`, new compressions, Proto Messages and negotiation mechanisms, as long as they are backwards compatible (e.g. optional to both Receiver and Sender). #### 2.x vs 1.x Compatibility @@ -164,7 +164,7 @@ The simplified version of the new `io.prometheus.write.v2.Request` is presented // Request represents a request to write the given timeseries to a remote destination. message Request { // symbols contains a de-duplicated array of string elements used for various - // items in a Request message, like labels and metadata items. For the sender convenience + // items in a Request message, like labels and metadata items. For the sender's convenience // around empty values for optional fields like unit_ref, symbols array MUST start with // empty string. // @@ -189,7 +189,7 @@ message TimeSeries { repeated uint32 labels_refs = 1; // Timeseries messages can either specify samples or (native) histogram samples - // (histogram field), but not both. For typical sender (real-time metric + // (histogram field), but not both. 
For a typical sender (real-time metric // streaming), in healthy cases, there will be only one sample or histogram. // // Samples and histograms are sorted by timestamp (older first). @@ -221,7 +221,7 @@ message TimeSeries { int64 created_timestamp = 6; } -// Exemplar is an additional information attached to some series' samples. +// Exemplar represents additional information attached to some series' samples. message Exemplar { // labels_refs is a list of label name-value pair references, encoded // as indices to the Request.symbols array. This list's len is always @@ -263,11 +263,11 @@ message Metadata { } MetricType type = 1; // help_ref is a reference to the Request.symbols array representing help - // text for the metric. Help is optional, reference should point to empty string in + // text for the metric. Help is optional, reference should point to an empty string in // such a case. uint32 help_ref = 3; - // unit_ref is a reference to the Request.symbols array representing unit - // for the metric. Unit is optional, reference should point to empty string in + // unit_ref is a reference to the Request.symbols array representing a unit + // for the metric. Unit is optional, reference should point to an empty string in // such a case. uint32 unit_ref = 4; } @@ -292,10 +292,10 @@ Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-p Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-proposal/proposals/2024-04-09_remote-write-20.md#always-on-metadata --> * Metadata fields SHOULD be provided. Receiver MAY reject series with unspecified type. -* Exemplars SHOULD be provided, if they exist for a series. -* Created timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). Receiver MAY reject those series without created timestamp being set. +* Exemplars SHOULD be provided if they exist for a series. 
+* Created timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). Receiver MAY reject those series without the created timestamp being set. -The following subsections define some schema elements in details. +The following subsections define some schema elements in detail. #### Symbols @@ -323,11 +323,11 @@ Metric names, label names, and label values MUST be any sequence of UTF-8 charac Metric names SHOULD adhere to the regex `[a-zA-Z_:]([a-zA-Z0-9_:])*`. Label names SHOULD adhere to the regex `[a-zA-Z_]([a-zA-Z0-9_])*`. -Names that does not adhere to the above, might be harder to use for PromQL users (see [the UTF-8 proposal for more details](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md)). +Names that do not adhere to the above, might be harder to use for PromQL users (see [the UTF-8 proposal for more details](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md)). Label names beginning with "__" are RESERVED for system usage and SHOULD NOT be used, see [Prometheus Data Model](https://prometheus.io/docs/concepts/data_model/). -Receiver also MAY impose limits on the number and length of labels, but this is receiver-specific and is out of scope for this document. +Receiver also MAY impose limits on the number and length of labels, but this is receiver-specific and is out of the scope for this document. #### Samples and Histogram Samples @@ -349,13 +349,13 @@ Generally, not sending stale markers for series that are discontinued can lead t Stale markers MUST be signalled by the special NaN value `0x7ff0000000000002`. This value MUST NOT be used otherwise. -Typically, Sender can detect when a time series will no longer be appended to using the following techniques: +Typically, Sender can detect when a time series will no longer be appended using the following techniques: 1. Detecting, using service discovery, that the target exposing the series has gone away. 1. 
Noticing the target is no longer exposing the time series between successive scrapes. 1. Failing to scrape the target that originally exposed a time series. 1. Tracking configuration and evaluation for recording and alerting rules. -1. Tracking discontinuation of other source of metrics (e.g. in k6 when benchmark has finished for series per benchmark, it could emit stale marker). +1. Tracking discontinuation of metrics for non-scrape source of metric (e.g. in k6 when the benchmark has finished for series per benchmark, it could emit a stale marker). #### Metadata @@ -372,12 +372,12 @@ Metadata MAY follow the official OpenMetrics guidelines for: Each exemplar, if attached to a `TimeSeries`: -* MUST contain value. +* MUST contain a value. -* MAY contain labels e.g. referencing trace or request ID. If the exemplar references a trace it SHOULD use `trace_id` label name, as a best practice. -* MAY contain timestamp. +* MAY contain labels e.g. referencing trace or request ID. If the exemplar references a trace it SHOULD use the `trace_id` label name, as a best practice. +* MAY contain a timestamp. ## Out of Scope @@ -385,33 +385,33 @@ The same as in [1.0](./remote_write_spec.md#out-of-scope). ## Future Plans -This section contains speculative plans that are not considered part of protocol specification yet, but are mentioned here for completeness. Note that 2.0 specification completed [2 of 3 future plans in the 1.0](./remote_write_spec.md#future-plans). +This section contains speculative plans that are not considered part of protocol specification yet but are mentioned here for completeness. Note that 2.0 specification completed [2 of 3 future plans in the 1.0](./remote_write_spec.md#future-plans). * **Transactionality** There is still no transactionality defined for 2.0 specification, mostly because it makes scalable Sender implementation difficult. Prometheus Sender aims at being "transactional" - i.e. to never expose a partially scraped target to a query. 
We intend to do the same with Remote-Write -- for instance, in the future we would like to "align" Remote-Write with scrapes, perhaps such that all the samples, metadata and exemplars for a single scrape are sent in a single Remote-Write request. - However, Remote-Write 2.0 specification solves an important transactionality problem for [the classic histogram buckets](https://docs.google.com/document/d/1mpcSWH1B82q-BtJza-eJ8xMLlKt6EJ9oFGH325vtY1Q/edit#heading=h.ueg7q07wymku). This is done thanks to the native histograms supporting custom bucket-ing possible with the `io.prometheus.write.v2.Request` wire format. Sender might translate all classic histograms to native histograms this way, but it's out of this specification to mandate this. However, for this reason Receiver MAY ignore certain metric types (e.g. classic histograms). + However, Remote-Write 2.0 specification solves an important transactionality problem for [the classic histogram buckets](https://docs.google.com/document/d/1mpcSWH1B82q-BtJza-eJ8xMLlKt6EJ9oFGH325vtY1Q/edit#heading=h.ueg7q07wymku). This is done thanks to the native histograms supporting custom bucket-ing possible with the `io.prometheus.write.v2.Request` wire format. Sender might translate all classic histograms to native histograms this way, but it's out of this specification to mandate this. However, for this reason, Receiver MAY ignore certain metric types (e.g. classic histograms). -* **Alternative wire formats**. The OpenTelemetry community has shown the validity of Apache Arrow (and potentially other columnar formats) for over the wire data transfer with their OTLP protocol. We would like to do experiments to confirm the compatibility of a similar format with Prometheus’ data model, and include benchmarks of any resource usage changes. We would potentially maintain both a protobuf and columnar format long term for compatibility reasons and use our content negotiation to add different proto message for this purpose. 
+* **Alternative wire formats**. The OpenTelemetry community has shown the validity of Apache Arrow (and potentially other columnar formats) for over-the-wire data transfer with their OTLP protocol. We would like to do experiments to confirm the compatibility of a similar format with Prometheus’ data model and include benchmarks of any resource usage changes. We would potentially maintain both a protobuf and columnar format long term for compatibility reasons and use our content negotiation to add different proto messages for this purpose. -* **Global symbols**. Pre-defined string dictionary for interning The protocol could pre-define a static dictionary of ref->symbol that includes strings that are considered common, e.g. “namespace”, “le”, “job”, “seconds”, “bytes”, etc. Sender and refer to these without the need to include them in the request’s symbols table. This dictionary could incrementally grow with minor versions releases of the protocol. +* **Global symbols**. Pre-defined string dictionary for interning. The protocol could pre-define a static dictionary of ref->symbol that includes strings that are considered common, e.g. “namespace”, “le”, “job”, “seconds”, “bytes”, etc. Sender could refer to these without the need to include them in the request’s symbols table. This dictionary could incrementally grow with minor version releases of this protocol. ## Related ### FAQ **Why did you not use gRPC?** -Because 1.0 protocol is not using gRPC, breaking it would increase friction in the adoption. See 1.0 [reason](./remote_write_spec.md#faq). +Because the 1.0 protocol does not use gRPC, breaking it would increase friction in adoption. See 1.0 [reason](./remote_write_spec.md#faq). -**Why not streaming protobuf messages?** +**Why not stream protobuf messages?** If you use persistent HTTP/1.1 connections, they are pretty close to streaming. Of course headers have to be re-sent, but yes that is less expensive than a new TCP set up.
**Why do we send samples in order?** -The in-order constraint comes from the encoding we use for time series data in Prometheus, the implementation of which is append only. It is possible to remove this constraint, for instance by buffering samples and reordering them before encoding. +The in-order constraint comes from the encoding we use for time series data in Prometheus, the implementation of which is append-only. It is possible to remove this constraint, for instance by buffering samples and reordering them before encoding. **How can we parallelise requests with the in-order constraint?** Samples must be in-order _for a given series_. Remote-Write requests can be sent in parallel as long as they are for different series. In Prometheus, we shard the samples by their labels into separate queues, and then writes happen sequentially in each queue. This guarantees samples for the same series are delivered in order, but samples for different series are sent in parallel - and potentially "out of order" between different series. **What are the differences between Remote-Write 2.0 and OpenTelemetry's OTLP protocol?** -[OpenTelemetry OTLP](https://github.com/open-telemetry/opentelemetry-proto/blob/a05597bff803d3d9405fcdd1e1fb1f42bed4eb7a/docs/specification.md) is a protocol for transporting of telemetry data (such as metrics, logs, traces and profiles) between telemetry sources, intermediate nodes and telemetry backends. The recommended transport involves gRPC with protobuf, but HTTP with protobuf or JSON are also described. It was designed from scratch with the intent to support variety of different observability signals, data types and extra information. For [metrics](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto) that means additional non-identifying labels, flags, temporal aggregations types, resource or scoped metrics, schema URLs and more. 
OTLP also requires [the semantic convention](https://opentelemetry.io/docs/concepts/semantic-conventions/) to be used. +[OpenTelemetry OTLP](https://github.com/open-telemetry/opentelemetry-proto/blob/a05597bff803d3d9405fcdd1e1fb1f42bed4eb7a/docs/specification.md) is a protocol for transporting of telemetry data (such as metrics, logs, traces and profiles) between telemetry sources, intermediate nodes and telemetry backends. The recommended transport involves gRPC with protobuf, but HTTP with protobuf or JSON are also described. It was designed from scratch with the intent to support a variety of different observability signals, data types and extra information. For [metrics](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto) that means additional non-identifying labels, flags, temporal aggregations types, resource or scoped metrics, schema URLs and more. OTLP also requires [the semantic convention](https://opentelemetry.io/docs/concepts/semantic-conventions/) to be used. -Remote-Write was designed for simplicity, efficiency and organic growth. First version was officially released in 2023, when already [dozens of battle-tested adopters in the CNCF ecosystem](./remote_write_spec.md#compatible-senders-and-receivers) were using it for years. Remote-Write 2.0 iterates on the previous protocol by adding a few new elements (metadata, exemplars, created timestamp and native histograms) and string interning. Remote-Write 2.0 is always stateless, focuses only on metrics and is opinionated -- it is scoped down to elements that by Prometheus community, is all you need to have robust metric solution. We believe Remote-Write 2.0 proposes an export transport, for metrics, that is a magnitude simpler to adopt and use, and often more efficient than competitors. +Remote-Write was designed for simplicity, efficiency and organic growth. 
The first version was officially released in 2023, when [dozens of battle-tested adopters in the CNCF ecosystem](./remote_write_spec.md#compatible-senders-and-receivers) had already been using this protocol for years. Remote-Write 2.0 iterates on the previous protocol by adding a few new elements (metadata, exemplars, created timestamp and native histograms) and string interning. Remote-Write 2.0 is always stateless, focuses only on metrics and is opinionated -- it is scoped down to the elements that, according to the Prometheus community, are all you need for a robust metric solution. We believe Remote-Write 2.0 proposes an export transport, for metrics, that is an order of magnitude simpler to adopt and use, and often more efficient than competitors. From e50ee4d6558ccf908384b77322b951304dfd8690 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Wed, 5 Jun 2024 17:45:34 +0100 Subject: [PATCH 08/31] Added Experimental status, moved to 2.0-rc.0 version. Signed-off-by: bwplotka --- content/docs/concepts/remote_write_spec_2_0.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 1aeb75a1f..0786a3cfd 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -1,14 +1,16 @@ --- -title: Prometheus Remote-Write Specification 2.0 +title: "[EXPERIMENTAL] Prometheus Remote-Write Specification 2.0" sort_rank: 4 --- # Prometheus Remote-Write Specification -* Version: 2.0 -* Status: Proposed +* Version: 2.0-rc.0 +* Status: **Experimental** * Date: May 2024 +> NOTE: This is a release candidate for Remote-Write 2.0 specification. This means that this specification is currently in an experimental state--no major changes are expected, but we reserve the rights to break the compatibility if it's absolutely necessary, based on the early adopters' feedback.
The potential feedback, questions and suggestions should be added as comments to the [PR with the open proposal](https://github.com/prometheus/proposals/pull/35). + The Remote-Write specification, in general, is intended to document the standard for how Prometheus and Prometheus Remote-Write compatible senders send data to Prometheus or Prometheus Remote-Write compatible receivers. This document is intended to define a second version of the [Prometheus Remote-Write](./remote_write_spec.md) API with minor changes to protocol and semantics. This second version adds a new Proto Message with new features enabling more use cases and wider adoption on top of performance and cost savings. The second version also deprecates the previous Proto Message from a [1.0 Remote-Write specification](./remote_write_spec.md#protocol). Finally, this spec outlines how to implement backwards-compatible senders and receivers (even under a single endpoint) using existing basic content negotiation request headers. More advanced, automatic content negotiation mechanisms might come in a future minor version, if needed. For the rationales behind the 2.0 specification, see [the formal proposal](https://github.com/prometheus/proposals/pull/35). From bb765d5f0b8a3992c1a2179b04ecaa1188d8c98b Mon Sep 17 00:00:00 2001 From: bwplotka Date: Wed, 5 Jun 2024 17:57:29 +0100 Subject: [PATCH 09/31] Trying same but different format. 
Signed-off-by: bwplotka --- content/docs/concepts/remote_write_spec_2_0.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 0786a3cfd..9675cc171 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -1,5 +1,5 @@ --- -title: "[EXPERIMENTAL] Prometheus Remote-Write Specification 2.0" +title: "Prometheus Remote-Write Specification 2.0 [EXPERIMENTAL]" sort_rank: 4 --- @@ -9,7 +9,7 @@ sort_rank: 4 * Status: **Experimental** * Date: May 2024 -> NOTE: This is a release candidate for Remote-Write 2.0 specification. This means that this specification is currently in an experimental state--no major changes are expected, but we reserve the rights to break the compatibility if it's absolutely necessary, based on the early adopters' feedback. The potential feedback, questions and suggestions should be added as comments to the [PR with the open proposal](https://github.com/prometheus/proposals/pull/35). +*NOTE: This is a release candidate for Remote-Write 2.0 specification. This means that this specification is currently in an experimental state--no major changes are expected, but we reserve the rights to break the compatibility if it's absolutely necessary, based on the early adopters' feedback. The potential feedback, questions and suggestions should be added as comments to the [PR with the open proposal](https://github.com/prometheus/proposals/pull/35).* The Remote-Write specification, in general, is intended to document the standard for how Prometheus and Prometheus Remote-Write compatible senders send data to Prometheus or Prometheus Remote-Write compatible receivers. From 321d8cb93edba119361f9aeb15a1068abb2a3772 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Wed, 5 Jun 2024 18:24:58 +0100 Subject: [PATCH 10/31] Fixed formatting issues. 
Signed-off-by: bwplotka --- .../docs/concepts/remote_write_spec_2_0.md | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 9675cc171..4534c0813 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -57,6 +57,7 @@ The Remote-Write Protocol MUST consist of RPCs with the request body serialized Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-proposal/proposals/2024-04-09_remote-write-20.md#a-new-protobuf-message-identified-by-fully-qualified-name-old-one-deprecated --> The protobuf serialization MUST use either of the following Proto Messages: + * [`prometheus.WriteRequest`](./remote_write_spec.md#protocol) introduced in the Remote-Write 1.0 specification. As of 2.0, the `prometheus.WriteRequest` message is deprecated. It SHOULD be used only for compatibility reasons. Sender and Receiver MAY NOT support `prometheus.WriteRequest`. * `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#ioprometheuswritev2request-proto-schema). Sender and Receiver SHOULD use `io.prometheus.write.v2.Request` when possible. Sender and Receiver MUST support `io.prometheus.write.v2.Request`. @@ -81,9 +82,12 @@ Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-p Content type request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type). Sender MUST use `application/x-protobuf` as the only media type. Sender MAY add `;proto=` parameter to the header's value to indicate the fully qualified name of the Proto Message that was used, from the two mentioned above. 
As a result, Sender MUST send any of the three supported header values: For the deprecated message introduced in PRW 1.0, identified by `prometheus.WriteRequest`: + * `Content-Type: application/x-protobuf` * `Content-Type: application/x-protobuf;proto=prometheus.WriteRequest` + For the message introduced in PRW 2.0, identified by `io.prometheus.write.v2.Request`: + * `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request` When talking to 1.x Receiver, the Sender SHOULD use `Content-Type: application/x-protobuf` for backward compatibility. Otherwise, Sender SHOULD use `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request`. More Proto Messages might come in 2.x or beyond. @@ -286,10 +290,12 @@ All timestamps MUST be int64 counted as milliseconds since the Unix epoch. Sampl For every `TimeSeries` message: * Label references MUST be provided. + * At least one element in Samples or in Histograms MUST be provided. For series which (rarely) would mix float and histogram samples, a separate `TimeSeries` message MUST be used. + @@ -322,7 +328,8 @@ The complete set of labels MUST be sent with each Sample or Histogram sample. Ad Metric names, label names, and label values MUST be any sequence of UTF-8 characters. -Metric names SHOULD adhere to the regex `[a-zA-Z_:]([a-zA-Z0-9_:])*`. +Metric names SHOULD adhere to the regex `[a-zA-Z_:]([a-zA-Z0-9_:])*`. + Label names SHOULD adhere to the regex `[a-zA-Z_]([a-zA-Z0-9_])*`. Names that do not adhere to the above, might be harder to use for PromQL users (see [the UTF-8 proposal for more details](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md)). 
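The two recommended name patterns quoted in the hunk above can be checked mechanically. The following is a minimal Python sketch, not part of the specification: the function names are hypothetical, and a name that fails these checks is still valid on the wire, since the spec allows any sequence of UTF-8 characters and only says such names SHOULD match.

```python
import re

# Recommended (SHOULD) patterns, copied from the spec text above.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:]([a-zA-Z0-9_:])*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_]([a-zA-Z0-9_])*")


def is_recommended_metric_name(name: str) -> bool:
    """Return True if the whole metric name matches the recommended pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_recommended_label_name(name: str) -> bool:
    """Return True if the whole label name matches the recommended pattern."""
    return LABEL_NAME_RE.fullmatch(name) is not None
```

`fullmatch` is used rather than `match` so that a valid prefix followed by other characters (for example a dot) is not accepted by accident.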
@@ -361,20 +368,16 @@ Typically, Sender can detect when a time series will no longer be appended using #### Metadata -Metadata SHOULD follow the official Prometheus guidelines for: - -* [Type](https://prometheus.io/docs/instrumenting/writing_exporters/#types) -* [Help](https://prometheus.io/docs/instrumenting/writing_exporters/#help-strings). +Metadata SHOULD follow the official Prometheus guidelines for [Type](https://prometheus.io/docs/instrumenting/writing_exporters/#types) and [Help](https://prometheus.io/docs/instrumenting/writing_exporters/#help-strings). -Metadata MAY follow the official OpenMetrics guidelines for: - -* [Unit](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#unit) +Metadata MAY follow the official OpenMetrics guidelines for [Unit](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#unit). #### Exemplars Each exemplar, if attached to a `TimeSeries`: * MUST contain a value. + From b5a98132a566bf86267d4aab084276247b4f9985 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Mon, 10 Jun 2024 07:58:53 +0100 Subject: [PATCH 11/31] Fixed typo. Signed-off-by: bwplotka --- content/docs/concepts/remote_write_spec_2_0.md | 1 - 1 file changed, 1 deletion(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 4534c0813..25629ffaf 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -178,7 +178,6 @@ message Request { // need to lookup the actual string by index from symbols array. The order of // strings is up to the sender. The receiver should not assume any particular encoding. repeated string symbols = 1; - repeated string symbols = 1; // timeseries represents an array of distinct series with 0 or more samples. 
repeated TimeSeries timeseries = 2; } From 291aafa2c9ee3d58b43ee2da24ba86eff19227e4 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Mon, 10 Jun 2024 08:56:23 +0100 Subject: [PATCH 12/31] Proposed wording changes after Tom Wilkie's feedback. Signed-off-by: bwplotka --- content/docs/concepts/remote_write_spec_2_0.md | 2 +- content/docs/specs/remote_write_spec.md | 3 +-- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 25629ffaf..4f269c360 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -26,7 +26,7 @@ The Remote-Write protocol is designed to make it possible to reliably propagate -The Remote-Write protocol is designed to make stateless implementations of the server possible; as such there are little-to-no inter-message references. As such the protocol is not considered "streaming." To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. +The Remote-Write protocol is designed to be stateless; there is strictly no inter-message communication. As such the protocol is not considered "streaming." To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. The Remote-Write protocol contains opportunities for batching, e.g. sending multiple samples for different series in a single request. 
It is not expected that multiple samples for the same series will be commonly sent in the same request, although there is support for this in the Proto Message. diff --git a/content/docs/specs/remote_write_spec.md b/content/docs/specs/remote_write_spec.md index aa2bad26f..d24aaab39 100644 --- a/content/docs/specs/remote_write_spec.md +++ b/content/docs/specs/remote_write_spec.md @@ -19,8 +19,7 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S The remote write protocol is designed to make it possible to reliably propagate samples in real-time from a sender to a receiver, without loss. -The remote write protocol is designed to make stateless implementations of the server possible; as such there are little-to-no inter-message references. As such the protocol is not considered "streaming." To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. - +The Remote-Write protocol is designed to be stateless; there is strictly no inter-message communication. As such the protocol is not considered "streaming. To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. The remote write protocol contains opportunities for batching, e.g. sending multiple samples for different series in a single request. It is not expected that multiple samples for the same series will be commonly sent in the same request, although there is support for this in the protocol. 
From 4db930cd747733465312230b1a21fb76afb4336d Mon Sep 17 00:00:00 2001 From: bwplotka Date: Mon, 10 Jun 2024 09:01:51 +0100 Subject: [PATCH 13/31] Adjusted the language in the comparison. Signed-off-by: bwplotka --- content/docs/concepts/remote_write_spec_2_0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 4f269c360..56d94ba44 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -418,4 +418,4 @@ Samples must be in-order _for a given series_. Remote-Write requests can be sent **What are the differences between Remote-Write 2.0 and OpenTelemetry's OTLP protocol?** [OpenTelemetry OTLP](https://github.com/open-telemetry/opentelemetry-proto/blob/a05597bff803d3d9405fcdd1e1fb1f42bed4eb7a/docs/specification.md) is a protocol for transporting of telemetry data (such as metrics, logs, traces and profiles) between telemetry sources, intermediate nodes and telemetry backends. The recommended transport involves gRPC with protobuf, but HTTP with protobuf or JSON are also described. It was designed from scratch with the intent to support a variety of different observability signals, data types and extra information. For [metrics](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto) that means additional non-identifying labels, flags, temporal aggregations types, resource or scoped metrics, schema URLs and more. OTLP also requires [the semantic convention](https://opentelemetry.io/docs/concepts/semantic-conventions/) to be used. -Remote-Write was designed for simplicity, efficiency and organic growth. The first version was officially released in 2023, when already [dozens of battle-tested adopters in the CNCF ecosystem](./remote_write_spec.md#compatible-senders-and-receivers) had been using this protocol for years.
Remote-Write 2.0 iterates on the previous protocol by adding a few new elements (metadata, exemplars, created timestamp and native histograms) and string interning. Remote-Write 2.0 is always stateless, focuses only on metrics and is opinionated -- it is scoped down to elements that by Prometheus community, are all you need to have robust metric solution. We believe Remote-Write 2.0 proposes an export transport, for metrics, that is a magnitude simpler to adopt and use, and often more efficient than competitors. +Remote-Write was designed for simplicity, efficiency and organic growth. The first version was officially released in 2023, when already [dozens of battle-tested adopters in the CNCF ecosystem](./remote_write_spec.md#compatible-senders-and-receivers) had been using this protocol for years. Remote-Write 2.0 iterates on the previous protocol by adding a few new elements (metadata, exemplars, created timestamp and native histograms) and string interning. Remote-Write 2.0 is always stateless, focuses only on metrics and is opinionated; as such it is scoped down to elements that Prometheus community considers enough to have a robust metric solution. The intention is to ensure the Remote-Write is a stable protocol that is a cheaper and simpler to adopt and use than the alternatives in the observability ecosystem. From 48bfc1e39bd497af0f2820530e485f6989204e95 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Mon, 10 Jun 2024 09:24:14 +0100 Subject: [PATCH 14/31] Improved formatting, added RFC 9110 SHOULD mention for User-Agent.
Signed-off-by: bwplotka --- .../docs/concepts/remote_write_spec_2_0.md | 84 ++++++++++++------- 1 file changed, 53 insertions(+), 31 deletions(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index 56d94ba44..e93a2369e 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -9,7 +9,7 @@ sort_rank: 4 * Status: **Experimental** * Date: May 2024 -*NOTE: This is a release candidate for Remote-Write 2.0 specification. This means that this specification is currently in an experimental state--no major changes are expected, but we reserve the rights to break the compatibility if it's absolutely necessary, based on the early adopters' feedback. The potential feedback, questions and suggestions should be added as comments to the [PR with the open proposal](https://github.com/prometheus/proposals/pull/35).* +**NOTE: This is a release candidate for Remote-Write 2.0 specification. This means that this specification is currently in an experimental state--no major changes are expected, but we reserve the rights to break the compatibility if it's absolutely necessary, based on the early adopters' feedback. The potential feedback, questions and suggestions should be added as comments to the [PR with the open proposal](https://github.com/prometheus/proposals/pull/35).** The Remote-Write specification, in general, is intended to document the standard for how Prometheus and Prometheus Remote-Write compatible senders send data to Prometheus or Prometheus Remote-Write compatible receivers. @@ -36,16 +36,16 @@ A test suite can be found at https://github.com/prometheus/compliance/tree/main/ In this document, the following definitions are followed: -* a "Remote-Write" is the name of this Prometheus protocol. -* a "Protocol" is a communication specification that enables the client and server to transfer metrics. 
This includes content type definitions, but also compressions, negotiations, retry mechanisms and so on. -* a "Proto Message" refers to the [content type](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type) definition of the data structure Remote-Write is specifying for this Protocol. Since this specification uses [Google Protocol Buffers ("protobuf")](https://protobuf.dev/) exclusively, the schema is defined in a ["proto" file](https://protobuf.dev/programming-guides/proto3/) and represented by a single Protobuf ["message"](https://protobuf.dev/programming-guides/proto3/#simple). -* a "Wire Format" is the format of the data as it travels on the wire (i.e. in a network). In the case of Remote-Write, this is always the compressed binary protobuf format. -* a "Sender" is something that sends Remote-Write data. -* a "Receiver" is something that receives Remote-Write data. -* a "Sample" is a pair of (timestamp, value). -* a "Histogram" is a pair of (timestamp, [histogram value](https://github.com/prometheus/docs/blob/b9657b5f5b264b81add39f6db2f1df36faf03efe/content/docs/concepts/native_histograms.md)). -* a "Label" is a pair of (key, value). -* a "Series" is a list of samples, identified by a unique set of labels. +* a `Remote-Write` is the name of this Prometheus protocol. +* a `Protocol` is a communication specification that enables the client and server to transfer metrics. +* a `Proto Message` refers to the [content type](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type) definition of the data structure for this Protocol. Since the specification uses [Google Protocol Buffers ("protobuf")](https://protobuf.dev/) exclusively, the schema is defined in a ["proto" file](https://protobuf.dev/programming-guides/proto3/) and represented by a single Protobuf ["message"](https://protobuf.dev/programming-guides/proto3/#simple). +* a `Wire Format` is the format of the data as it travels on the wire (i.e. in a network). 
In the case of Remote-Write, this is always the compressed binary protobuf format. +* a `Sender` is something that sends Remote-Write data. +* a `Receiver` is something that receives Remote-Write data. +* a `Sample` is a pair of (timestamp, value). +* a `Histogram` is a pair of (timestamp, [histogram value](https://github.com/prometheus/docs/blob/b9657b5f5b264b81add39f6db2f1df36faf03efe/content/docs/concepts/native_histograms.md)). +* a `Label` is a pair of (key, value). +* a `Series` is a list of samples, identified by a unique set of labels. ## Definitions @@ -58,8 +58,8 @@ Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-p --> The protobuf serialization MUST use either of the following Proto Messages: -* [`prometheus.WriteRequest`](./remote_write_spec.md#protocol) introduced in the Remote-Write 1.0 specification. As of 2.0, the `prometheus.WriteRequest` message is deprecated. It SHOULD be used only for compatibility reasons. Sender and Receiver MAY NOT support `prometheus.WriteRequest`. -* `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#ioprometheuswritev2request-proto-schema). Sender and Receiver SHOULD use `io.prometheus.write.v2.Request` when possible. Sender and Receiver MUST support `io.prometheus.write.v2.Request`. +* The `prometheus.WriteRequest` introduced in [the Remote-Write 1.0 specification](./remote_write_spec.md#protocol). As of 2.0, this message is deprecated. It SHOULD be used only for compatibility reasons. Sender and Receiver MAY NOT support the `prometheus.WriteRequest`. +* The `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#proto-message). Sender and Receiver SHOULD use this message when possible. Sender and Receiver MUST support the `io.prometheus.write.v2.Request`. The Proto Message MUST use binary Wire Format. Then, MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). 
The block format MUST be used -- the framed format MUST NOT be used. @@ -68,36 +68,54 @@ Sender MUST send serialized and compressed Proto Message in the body of an HTTP -Sender MUST send the following reserved headers with the HTTP request: +Sender MUST send the following reserved headers with the HTTP request. Sender MAY allow users to add custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. -* `Content-Encoding: ` +#### Content-Encoding + +``` +Content-Encoding: +``` - Content encoding request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-encoding). Sender MUST use the `snappy` value. Receiver MUST support `snappy` compression. New, optional compression algorithms might come in 2.x or beyond. +Content encoding request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-encoding). Sender MUST use the `snappy` value. Receiver MUST support `snappy` compression. New, optional compression algorithms might come in 2.x or beyond. + +#### Content-Type -* `Content-Type: application/x-protobuf` or `Content-Type: application/x-protobuf;proto=` +``` +Content-Type: application/x-protobuf +Content-Type: application/x-protobuf;proto= +``` - Content type request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type). Sender MUST use `application/x-protobuf` as the only media type. Sender MAY add `;proto=` parameter to the header's value to indicate the fully qualified name of the Proto Message that was used, from the two mentioned above. As a result, Sender MUST send any of the three supported header values: +Content type request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type). Sender MUST use `application/x-protobuf` as the only media type. 
Sender MAY add `;proto=` parameter to the header's value to indicate the fully qualified name of the Proto Message that was used, from the two mentioned above. As a result, Sender MUST send any of the three supported header values: - For the deprecated message introduced in PRW 1.0, identified by `prometheus.WriteRequest`: +For the deprecated message introduced in PRW 1.0, identified by `prometheus.WriteRequest`: - * `Content-Type: application/x-protobuf` - * `Content-Type: application/x-protobuf;proto=prometheus.WriteRequest` +* `Content-Type: application/x-protobuf` +* `Content-Type: application/x-protobuf;proto=prometheus.WriteRequest` - For the message introduced in PRW 2.0, identified by `io.prometheus.write.v2.Request`: +For the message introduced in PRW 2.0, identified by `io.prometheus.write.v2.Request`: - * `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request` +* `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request` - When talking to 1.x Receiver, the Sender SHOULD use `Content-Type: application/x-protobuf` for backward compatibility. Otherwise, Sender SHOULD use `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request`. More Proto Messages might come in 2.x or beyond. +When talking to 1.x Receiver, Sender SHOULD use `Content-Type: application/x-protobuf` for backward compatibility. Otherwise, Sender SHOULD use `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request`. More Proto Messages might come in 2.x or beyond. -* `User-Agent: ` -* `X-Prometheus-Remote-Write-Version: ` +#### X-Prometheus-Remote-Write-Version - When talking to 1.x Receiver, the Sender MUST use `X-Prometheus-Remote-Write-Version: 0.1.0` for backward compatibility. Otherwise, Sender SHOULD use the newest Remote-Write version it is compatible with e.g. `X-Prometheus-Remote-Write-Version: 2.0.0`. 
+``` +X-Prometheus-Remote-Write-Version: +``` -Sender MAY allow users to add custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. +When talking to 1.x Receiver, Sender MUST use `X-Prometheus-Remote-Write-Version: 0.1.0` for backward compatibility. Otherwise, Sender SHOULD use the newest Remote-Write version it is compatible with e.g. `X-Prometheus-Remote-Write-Version: 2.0.0`. + +#### User-Agent + +``` +User-Agent: +``` + +Sender MUST include a user agent header that SHOULD follow [the RFC 9110 User-Agent header format](https://www.rfc-editor.org/rfc/rfc9110.html#name-user-agent). ### Response @@ -134,7 +152,7 @@ Receiver MAY return a [429 HTTP Too Many Requests](https://developer.mozilla.org Sender MAY retry on a 429 HTTP status code. Sender MUST retry write requests on 5xx HTTP. Sender MUST use a backoff algorithm to prevent overwhelming the server. Sender MAY handle [the Retry-After response header](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) to estimate the next retry time. -The difference between 429 vs 5xx handling is due to a potential situation for the Sender “falling behind” if the Receiver cannot keep up. As a result, the ability to NOT retry on 429 allows progress is made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors. +The difference between 429 vs 5xx handling is due to a potential situation for Sender “falling behind” if the Receiver cannot keep up. As a result, the ability to NOT retry on 429 allows progress is made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors. ### Retries on Partial Writes @@ -157,12 +175,16 @@ The 2.x protocol is breaking compatibility with 1.x by introducing a new, mandat 2.x Sender MAY support 1.x Receiver by allowing users to configure what content type Sender should use. 
2.x Sender also MAY automatically fall back to different content types, if the Receiver returns 415 HTTP status code. -#### `io.prometheus.write.v2.Request` Proto Schema +## Proto Message + +### `io.prometheus.write.v2.Request` + +The `io.prometheus.write.v2.Request` references the new Proto Message that's meant to replace and deprecate the Remote-Write 1.0's `prometheus.WriteRequest` message. -The source of truth is [here](https://github.com/prometheus/prometheus/blob/remote-write-2.0/prompb/io/prometheus/write/v2/types.proto#L32). The `gogo` dependency and options CAN be ignored ([will be removed eventually](https://github.com/prometheus/prometheus/issues/11908)). They are not part of the specification as they don't impact the serialized format. +The full schema and source of the truth is in Prometheus repository in [`prompb/io/prometheus/write/v2/types.proto`](https://github.com/prometheus/prometheus/blob/remote-write-2.0/prompb/io/prometheus/write/v2/types.proto#L32). The `gogo` dependency and options CAN be ignored ([will be removed eventually](https://github.com/prometheus/prometheus/issues/11908)). They are not part of the specification as they don't impact the serialized format. The simplified version of the new `io.prometheus.write.v2.Request` is presented below. From 7968e29117889e08fd6c2f20a1dca58f16b9d749 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Mon, 10 Jun 2024 09:32:19 +0100 Subject: [PATCH 15/31] Last formatting touches. Signed-off-by: bwplotka --- content/docs/concepts/remote_write_spec_2_0.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/concepts/remote_write_spec_2_0.md index e93a2369e..b167efea5 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/concepts/remote_write_spec_2_0.md @@ -9,14 +9,14 @@ sort_rank: 4 * Status: **Experimental** * Date: May 2024 -**NOTE: This is a release candidate for Remote-Write 2.0 specification. 
This means that this specification is currently in an experimental state--no major changes are expected, but we reserve the rights to break the compatibility if it's absolutely necessary, based on the early adopters' feedback. The potential feedback, questions and suggestions should be added as comments to the [PR with the open proposal](https://github.com/prometheus/proposals/pull/35).** - The Remote-Write specification, in general, is intended to document the standard for how Prometheus and Prometheus Remote-Write compatible senders send data to Prometheus or Prometheus Remote-Write compatible receivers. This document is intended to define a second version of the [Prometheus Remote-Write](./remote_write_spec.md) API with minor changes to protocol and semantics. This second version adds a new Proto Message with new features enabling more use cases and wider adoption on top of performance and cost savings. The second version also deprecates the previous Proto Message from a [1.0 Remote-Write specification](./remote_write_spec.md#protocol). Finally, this spec outlines how to implement backwards-compatible senders and receivers (even under a single endpoint) using existing basic content negotiation request headers. More advanced, automatic content negotiation mechanisms might come in a future minor version, if needed. For the rationales behind the 2.0 specification, see [the formal proposal](https://github.com/prometheus/proposals/pull/35). The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119). +> NOTE: This is a release candidate for Remote-Write 2.0 specification. 
This means that this specification is currently in an experimental state--no major changes are expected, but we reserve the rights to break the compatibility if it's absolutely necessary, based on the early adopters' feedback. The potential feedback, questions and suggestions should be added as comments to the [PR with the open proposal](https://github.com/prometheus/proposals/pull/35). + ## Introduction ### Background @@ -154,7 +154,7 @@ Sender MAY retry on a 429 HTTP status code. Sender MUST retry write requests on The difference between 429 vs 5xx handling is due to a potential situation for Sender “falling behind” if the Receiver cannot keep up. As a result, the ability to NOT retry on 429 allows progress is made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors. -### Retries on Partial Writes +#### Retries on Partial Writes Receiver MAY return a 5xx HTTP or 429 HTTP status code on partial write or [partial invalid sample cases](#partial-write) when it expects Sender to retry the whole request. In that case Receiver MUST support idempotency as sender MAY retry with the same request. From b64c2e6e6b404fa2d86050242c003f6e2ecf1c81 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Mon, 10 Jun 2024 11:52:38 +0100 Subject: [PATCH 16/31] Moved the spec to new location after the rebase. 
Signed-off-by: bwplotka --- content/docs/specs/remote_write_spec.md | 5 +++-- content/docs/{concepts => specs}/remote_write_spec_2_0.md | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) rename content/docs/{concepts => specs}/remote_write_spec_2_0.md (99%) diff --git a/content/docs/specs/remote_write_spec.md b/content/docs/specs/remote_write_spec.md index d24aaab39..854bc95e4 100644 --- a/content/docs/specs/remote_write_spec.md +++ b/content/docs/specs/remote_write_spec.md @@ -1,9 +1,10 @@ --- -title: Prometheus Remote-Write -sort_rank: 4 +title: Prometheus Remote-Write 1.0 +sort_rank: 5 --- # Prometheus Remote-Write Specification + - Version: 1.0 - Status: Published - Date: April 2023 diff --git a/content/docs/concepts/remote_write_spec_2_0.md b/content/docs/specs/remote_write_spec_2_0.md similarity index 99% rename from content/docs/concepts/remote_write_spec_2_0.md rename to content/docs/specs/remote_write_spec_2_0.md index b167efea5..c33c71c08 100644 --- a/content/docs/concepts/remote_write_spec_2_0.md +++ b/content/docs/specs/remote_write_spec_2_0.md @@ -1,5 +1,5 @@ --- -title: "Prometheus Remote-Write Specification 2.0 [EXPERIMENTAL]" +title: "Prometheus Remote-Write 2.0 [EXPERIMENTAL]" sort_rank: 4 --- From 843eca967b70a4a140525e4ec18ebd3a1497e5f0 Mon Sep 17 00:00:00 2001 From: Bartlomiej Plotka Date: Tue, 11 Jun 2024 21:07:46 +0100 Subject: [PATCH 17/31] Update content/docs/specs/remote_write_spec.md Co-authored-by: Arthur Silva Sens Signed-off-by: Bartlomiej Plotka --- content/docs/specs/remote_write_spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/specs/remote_write_spec.md b/content/docs/specs/remote_write_spec.md index 854bc95e4..b3a474433 100644 --- a/content/docs/specs/remote_write_spec.md +++ b/content/docs/specs/remote_write_spec.md @@ -20,7 +20,7 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S The remote write protocol is designed to make it possible to reliably 
propagate samples in real-time from a sender to a receiver, without loss. -The Remote-Write protocol is designed to be stateless; there is strictly no inter-message communication. As such the protocol is not considered "streaming. To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. +The Remote-Write protocol is designed to be stateless; there is strictly no inter-message communication. As such the protocol is not considered "streaming". To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. The remote write protocol contains opportunities for batching, e.g. sending multiple samples for different series in a single request. It is not expected that multiple samples for the same series will be commonly sent in the same request, although there is support for this in the protocol. 
From 179cc51f757c670a728d3a4b4ea80dd0a012ba45 Mon Sep 17 00:00:00 2001 From: Bartlomiej Plotka Date: Thu, 13 Jun 2024 09:00:06 +0100 Subject: [PATCH 18/31] Apply suggestions from code review Co-authored-by: Nico Pazos <32206519+npazosmendez@users.noreply.github.com> Signed-off-by: Bartlomiej Plotka --- content/docs/specs/remote_write_spec_2_0.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/specs/remote_write_spec_2_0.md b/content/docs/specs/remote_write_spec_2_0.md index c33c71c08..f746367dd 100644 --- a/content/docs/specs/remote_write_spec_2_0.md +++ b/content/docs/specs/remote_write_spec_2_0.md @@ -315,7 +315,7 @@ For every `TimeSeries` message: -* At least one element in Samples or in Histograms MUST be provided. For series which (rarely) would mix float and histogram samples, a separate `TimeSeries` message MUST be used. +* At least one element in `samples` or in `histograms` MUST be provided. A `TimeSeries` MUST NOT include both `samples` and `histograms`. For series which (rarely) would mix float and histogram samples, a separate `TimeSeries` message MUST be used. -Sender MUST send the following reserved headers with the HTTP request. Sender MAY allow users to add custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. +Sender MUST send the following reserved headers with the HTTP request: +- Content-Encoding +- Content-Type +- X-Prometheus-Remote-Write-Version +- User-Agent + + +Sender MAY allow users to add custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. 
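The reserved-header rule above can be illustrated with a minimal sketch. It assumes a Sender that keeps its own reserved headers separately from user-configured ones; the function name `merge_custom_headers` and the example header `X-Example-Tenant` are hypothetical and do not come from the specification.

```python
# Illustrative sketch only: helper and variable names are assumptions,
# not part of the Remote-Write specification.
RESERVED_HEADERS = frozenset({
    "content-encoding",
    "content-type",
    "x-prometheus-remote-write-version",
    "user-agent",
})


def merge_custom_headers(reserved, custom):
    """Combine Sender-owned headers with user-configured ones.

    Raises ValueError if a user-configured header collides with a reserved
    header. HTTP header names are case-insensitive, so names are compared
    in lower case.
    """
    clashes = sorted(name for name in custom if name.lower() in RESERVED_HEADERS)
    if clashes:
        raise ValueError(
            "custom headers must not override reserved headers: " + ", ".join(clashes)
        )
    merged = dict(custom)
    merged.update(reserved)  # Sender-owned values are always the ones sent
    return merged
```

Rejecting the configuration up front, rather than silently dropping the colliding header, is one possible design choice; it makes the misconfiguration visible to the user at load time.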
#### Content-Encoding From e61d6c7c367b184251a738610c13f4c267fb38ac Mon Sep 17 00:00:00 2001 From: Bartlomiej Plotka Date: Wed, 19 Jun 2024 09:49:42 +0100 Subject: [PATCH 22/31] Update content/docs/specs/remote_write_spec_2_0.md Co-authored-by: Callum Styan Signed-off-by: Bartlomiej Plotka --- content/docs/specs/remote_write_spec_2_0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/specs/remote_write_spec_2_0.md b/content/docs/specs/remote_write_spec_2_0.md index 540b41055..3bd07f5fb 100644 --- a/content/docs/specs/remote_write_spec_2_0.md +++ b/content/docs/specs/remote_write_spec_2_0.md @@ -135,7 +135,7 @@ The following subsections specify Sender and Receiver semantics around write err -Sender SHOULD use Remote-Write to send samples for multiple series in a single request. As a result, Receiver MAY ingest valid samples within a write request that contains invalid or otherwise unwritten samples, which represents a partial write case. +Sender SHOULD use Remote-Write to send samples for multiple series in a single request. As a result, Receiver MAY ingest valid samples within a write request that also contains some invalid or otherwise unwritten samples, which represents a partial write case. In a partial write case, Receiver MUST NOT return a 200 HTTP status code. Receiver MUST provide a human-readable error message in the response body. The Receiver's error SHOULD contain information about the amount of the samples being rejected and for what reasons. 
From 0cd29353663345ec7f5f64b526b0d3445a203eea Mon Sep 17 00:00:00 2001 From: Bartlomiej Plotka Date: Wed, 19 Jun 2024 09:50:47 +0100 Subject: [PATCH 23/31] Update content/docs/specs/remote_write_spec_2_0.md Co-authored-by: Callum Styan Signed-off-by: Bartlomiej Plotka --- content/docs/specs/remote_write_spec_2_0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/specs/remote_write_spec_2_0.md b/content/docs/specs/remote_write_spec_2_0.md index 3bd07f5fb..efada287a 100644 --- a/content/docs/specs/remote_write_spec_2_0.md +++ b/content/docs/specs/remote_write_spec_2_0.md @@ -159,7 +159,7 @@ Receiver MAY return a [429 HTTP Too Many Requests](https://developer.mozilla.org Sender MAY retry on a 429 HTTP status code. Sender MUST retry write requests on 5xx HTTP. Sender MUST use a backoff algorithm to prevent overwhelming the server. Sender MAY handle [the Retry-After response header](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) to estimate the next retry time. -The difference between 429 vs 5xx handling is due to a potential situation for Sender “falling behind” if the Receiver cannot keep up. As a result, the ability to NOT retry on 429 allows progress is made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors. +The difference between 429 vs 5xx handling is due to the potential situation of the Sender “falling behind” when the Receiver cannot keep up with the request volume, or the Receiver choosing to rate limit the Sender to protect it's own availability. As a result, the Sender has the option to NOT retry on 429, which allows progress to be made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors (5xx). 
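The 429 versus 5xx distinction can be sketched as a small retry policy. This is a minimal illustration, not the Prometheus implementation: the function names, the `retry_on_429` switch, the exponential backoff with jitter, and the default `base` and `cap` values are all assumptions. It also assumes `Retry-After` carries a number of seconds; the HTTP-date form is not handled here.

```python
import random


def should_retry(status_code, retry_on_429=False):
    """Decide whether a write request should be retried.

    5xx: Sender MUST retry.
    429: Sender MAY retry (left to configuration in this sketch).
    Anything else (success, or other 4xx): not retried.
    """
    if 500 <= status_code <= 599:
        return True
    if status_code == 429:
        return retry_on_429
    return False


def backoff_delay(attempt, base=0.5, cap=30.0, retry_after=None, rng=random.random):
    """Return the number of seconds to wait before retry `attempt` (0-based).

    If the Receiver sent Retry-After (assumed to be in seconds), use it.
    Otherwise use capped exponential backoff scaled by a random factor.
    """
    if retry_after is not None:
        return max(0.0, float(retry_after))
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()
```

With `retry_on_429=False` a Sender drops rate-limited requests and moves on to newer data, which is the "progress" case described above; with 5xx the request is always retried, so data is not lost on Receiver-side errors.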
#### Retries on Partial Writes From 4a6182b4cca5ece102f4ddb9ac67ec0e034172ab Mon Sep 17 00:00:00 2001 From: Bartlomiej Plotka Date: Wed, 19 Jun 2024 09:51:24 +0100 Subject: [PATCH 24/31] Update content/docs/specs/remote_write_spec_2_0.md Co-authored-by: Callum Styan Signed-off-by: Bartlomiej Plotka --- content/docs/specs/remote_write_spec_2_0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/specs/remote_write_spec_2_0.md b/content/docs/specs/remote_write_spec_2_0.md index efada287a..05f50dc48 100644 --- a/content/docs/specs/remote_write_spec_2_0.md +++ b/content/docs/specs/remote_write_spec_2_0.md @@ -145,7 +145,7 @@ Sender MUST NOT try and interpret the error message and SHOULD log it as is. Receiver MUST return [415 HTTP Unsupported Media Type](https://www.rfc-editor.org/rfc/rfc9110.html#name-415-unsupported-media-type) status code if they don't support a given content type or encoding provided by Sender. -Sender SHOULD expect [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) for the above reasons from the 1.x Receiver, for backward compatibility. +Sender SHOULD expect [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) for the above reasons from the 1.x Receiver, for backwards compatibility. 
#### Invalid Samples From bfd0ef7592121525fe72f5d4cfa350aca6da71ca Mon Sep 17 00:00:00 2001 From: Bartlomiej Plotka Date: Wed, 19 Jun 2024 09:52:22 +0100 Subject: [PATCH 25/31] Update content/docs/specs/remote_write_spec_2_0.md Co-authored-by: Callum Styan Signed-off-by: Bartlomiej Plotka --- content/docs/specs/remote_write_spec_2_0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/specs/remote_write_spec_2_0.md b/content/docs/specs/remote_write_spec_2_0.md index 05f50dc48..f7f8cd948 100644 --- a/content/docs/specs/remote_write_spec_2_0.md +++ b/content/docs/specs/remote_write_spec_2_0.md @@ -327,7 +327,7 @@ Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-p -* Metadata fields SHOULD be provided. Receiver MAY reject series with unspecified type. +* Metadata fields SHOULD be provided. Receiver MAY reject series with unspecified Type. * Exemplars SHOULD be provided if they exist for a series. * Created timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). Receiver MAY reject those series without the created timestamp being set. 
From badc7cab7f029a7ddf1a0779ce6e48034cfb65a6 Mon Sep 17 00:00:00 2001 From: Bartlomiej Plotka Date: Wed, 19 Jun 2024 09:53:48 +0100 Subject: [PATCH 26/31] Update content/docs/specs/remote_write_spec_2_0.md Co-authored-by: Callum Styan Signed-off-by: Bartlomiej Plotka --- content/docs/specs/remote_write_spec_2_0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/specs/remote_write_spec_2_0.md b/content/docs/specs/remote_write_spec_2_0.md index f7f8cd948..b5deeab37 100644 --- a/content/docs/specs/remote_write_spec_2_0.md +++ b/content/docs/specs/remote_write_spec_2_0.md @@ -340,7 +340,7 @@ Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-p --> The `io.prometheus.write.v2.Request` Proto Message is designed to [intern all strings](https://en.wikipedia.org/wiki/String_interning) for the proven additional compression and memory efficiency gains on top of the standard compressions. -Symbols table MUST be provided and it MUST contain deduplicated strings used in series, exemplar labels and metadata strings. The first element of the symbols table MUST be an empty string. References MUST point to the existing index in the Symbols string array. +Symbols table MUST be provided and it MUST contain deduplicated strings used in series, exemplar labels, and metadata strings. The first element of the symbols table MUST be an empty string, which is used to represent empty or unspecified values such as when Unit or Help metadata are not provided. References MUST point to the existing index in the Symbols string array. #### Series Labels From 0fe10e37fff787b24c64d44c30f42f994031fd6c Mon Sep 17 00:00:00 2001 From: bwplotka Date: Wed, 19 Jun 2024 09:58:44 +0100 Subject: [PATCH 27/31] Addressed Nico & Callum review comments. 
Signed-off-by: bwplotka --- content/docs/specs/remote_write_spec_2_0.md | 24 ++++++++++----------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/content/docs/specs/remote_write_spec_2_0.md b/content/docs/specs/remote_write_spec_2_0.md index b5deeab37..7cf62cb99 100644 --- a/content/docs/specs/remote_write_spec_2_0.md +++ b/content/docs/specs/remote_write_spec_2_0.md @@ -26,7 +26,7 @@ The Remote-Write protocol is designed to make it possible to reliably propagate -The Remote-Write protocol is designed to be stateless; there is strictly no inter-message communication. As such the protocol is not considered "streaming." To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. +The Remote-Write protocol is designed to be stateless; there is strictly no inter-message communication. As such the protocol is not considered "streaming". To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. The Remote-Write protocol contains opportunities for batching, e.g. sending multiple samples for different series in a single request. It is not expected that multiple samples for the same series will be commonly sent in the same request, although there is support for this in the Proto Message. @@ -38,7 +38,7 @@ In this document, the following definitions are followed: * a `Remote-Write` is the name of this Prometheus protocol. * a `Protocol` is a communication specification that enables the client and server to transfer metrics. 
-* a `Proto Message` refers to the [content type](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type) definition of the data structure for this Protocol. Since the specification uses [Google Protocol Buffers ("protobuf")](https://protobuf.dev/) exclusively, the schema is defined in a ["proto" file](https://protobuf.dev/programming-guides/proto3/) and represented by a single Protobuf ["message"](https://protobuf.dev/programming-guides/proto3/#simple). +* a `Proto Message` (or Protobuf Message) refers to the [content type](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type) definition of the data structure for this Protocol. Since the specification uses [Google Protocol Buffers ("protobuf")](https://protobuf.dev/) exclusively, the schema is defined in a ["proto" file](https://protobuf.dev/programming-guides/proto3/) and represented by a single Protobuf ["message"](https://protobuf.dev/programming-guides/proto3/#simple). * a `Wire Format` is the format of the data as it travels on the wire (i.e. in a network). In the case of Remote-Write, this is always the compressed binary protobuf format. * a `Sender` is something that sends Remote-Write data. * a `Receiver` is something that receives Remote-Write data. @@ -61,7 +61,7 @@ The protobuf serialization MUST use either of the following Proto Messages: * The `prometheus.WriteRequest` introduced in [the Remote-Write 1.0 specification](./remote_write_spec.md#protocol). As of 2.0, this message is deprecated. It SHOULD be used only for compatibility reasons. Sender and Receiver MAY NOT support the `prometheus.WriteRequest`. * The `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#proto-message). Sender and Receiver SHOULD use this message when possible. Sender and Receiver MUST support the `io.prometheus.write.v2.Request`. -The Proto Message MUST use binary Wire Format. Then, MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). 
The block format MUST be used -- the framed format MUST NOT be used. +The Proto Message MUST use binary Wire Format. Then, MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). Snappy's [block format](https://github.com/google/snappy/blob/2c94e11145f0b7b184b831577c93e5a41c4c0346/format_description.txt) MUST be used -- [the framed format](https://github.com/google/snappy/blob/2c94e11145f0b7b184b831577c93e5a41c4c0346/framing_format.txt) MUST NOT be used. Sender MUST send a serialized and compressed Proto Message in the body of an HTTP POST request and send it to the Receiver via HTTP at the provided URL path. The Receiver MAY specify any HTTP URL path to receive metrics. @@ -128,18 +128,16 @@ Sender MUST include a user agent header that SHOULD follow [the RFC 9110 User-Ag Receiver ingesting all samples successfully MUST return a 200 HTTP status code. In such a successful case, the response body from the Receiver SHOULD be empty; Sender MUST ignore the response body. The response body is RESERVED for future use. -The following subsections specify Sender and Receiver semantics around write errors. +Receiver MUST NOT return a 200 HTTP status code if any of the samples were not written successfully (e.g. on a [partial write](#partial-write) or a full write rejection). In such a case, Receiver MUST provide a human-readable error message in the response body. The Receiver's error SHOULD contain information about the amount of the samples being rejected and for what reasons. Sender MUST NOT try and interpret the error message and SHOULD log it as is. + +The following subsections specify Sender and Receiver semantics around different write error cases. #### Partial Write -Sender SHOULD use Remote-Write to send samples for multiple series in a single request. As a result, Receiver MAY ingest valid samples within a write request that also contains some invalid or otherwise unwritten samples, which represents a partial write case. 
- -In a partial write case, Receiver MUST NOT return a 200 HTTP status code. Receiver MUST provide a human-readable error message in the response body. The Receiver's error SHOULD contain information about the amount of the samples being rejected and for what reasons. - -Sender MUST NOT try and interpret the error message and SHOULD log it as is. +Sender SHOULD use Remote-Write to send samples for multiple series in a single request. As a result, Receiver MAY ingest valid samples within a write request that also contains some invalid or otherwise unwritten samples, which represents a partial write case. In such a case, Receiver MUST return non-200 status code following the [Invalid Samples](#invalid-samples) and [Retry on Partial Writes](#retries-on-partial-writes) sections. #### Unsupported Request Content @@ -155,11 +153,11 @@ Sender MUST NOT retry on a 4xx HTTP status codes (other than [429](https://devel ### Retries & Backoff -Receiver MAY return a [429 HTTP Too Many Requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429) status code to indicate the overloaded server situation. Receiver MAY return [the Retry-After](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) header to indicate the time for the next write attempt. Receiver MAY return a 5xx HTTP status code to represent internal server errors, that should be retried. +Receiver MAY return a [429 HTTP Too Many Requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429) status code to indicate the overloaded server situation. Receiver MAY return [the Retry-After](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) header to indicate the time for the next write attempt. Receiver MAY return a 5xx HTTP status code to represent internal server errors. Sender MAY retry on a 429 HTTP status code. Sender MUST retry write requests on 5xx HTTP. Sender MUST use a backoff algorithm to prevent overwhelming the server. 
Sender MAY handle [the Retry-After response header](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) to estimate the next retry time. -The difference between 429 vs 5xx handling is due to the potential situation of the Sender “falling behind” when the Receiver cannot keep up with the request volume, or the Receiver choosing to rate limit the Sender to protect it's own availability. As a result, the Sender has the option to NOT retry on 429, which allows progress to be made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors (5xx). +The difference between 429 vs 5xx handling is due to the potential situation of Sender “falling behind” when Receiver cannot keep up with the request volume, or Receiver choosing to rate limit Sender to protect its own availability. As a result, Sender has the option to NOT retry on 429, which allows progress to be made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors (5xx). #### Retries on Partial Writes @@ -439,10 +437,10 @@ Because 1.0 protocol does not use gRPC, breaking it would increase friction in t If you use persistent HTTP/1.1 connections, they are pretty close to streaming. Of course headers have to be re-sent, but yes that is less expensive than a new TCP set up. **Why do we send samples in order?** -The in-order constraint comes from the encoding we use for time series data in Prometheus, the implementation of which is append-only. It is possible to remove this constraint, for instance by buffering samples and reordering them before encoding. +The in-order constraint comes from the encoding we use for time series data in Prometheus, the implementation of which is optimized for append-only workloads. However, this requirement is also shared across many other databases and vendors in the ecosystem. 
In fact, [Prometheus with OOO feature enabled](https://youtu.be/qYsycK3nTSQ?t=1321), allows out-of-order writes, but with the performance penalty, thus reserved for rare events. To sum up, Receiver may support out-of-order ingestion, though it is not permitted by the specification. In the future e.g. 2.x spec versions, we could extend content type to negotiate the out-of-order writes, if needed. **How can we parallelise requests with the in-order constraint?** -Samples must be in-order _for a given series_. Remote-Write requests can be sent in parallel as long as they are for different series. In Prometheus, we shard the samples by their labels into separate queues, and then writes happen sequentially in each queue. This guarantees samples for the same series are delivered in order, but samples for different series are sent in parallel - and potentially "out of order" between different series. +Samples must be in-order _for a given series_. However, even if Receiver does not support out-of-order ingestion, the Remote-Write requests can be sent in parallel as long as they are for different series. Prometheus shards the samples by their labels into separate queues, and then writes happen sequentially in each queue. This guarantees samples for the same series are delivered in order, but samples for different series are sent in parallel - and potentially "out of order" between different series. **What are the differences between Remote-Write 2.0 and OpenTelemetry's OTLP protocol?** [OpenTelemetry OTLP](https://github.com/open-telemetry/opentelemetry-proto/blob/a05597bff803d3d9405fcdd1e1fb1f42bed4eb7a/docs/specification.md) is a protocol for transporting of telemetry data (such as metrics, logs, traces and profiles) between telemetry sources, intermediate nodes and telemetry backends. The recommended transport involves gRPC with protobuf, but HTTP with protobuf or JSON are also described. 
It was designed from scratch with the intent to support a variety of different observability signals, data types and extra information. For [metrics](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto) that means additional non-identifying labels, flags, temporal aggregations types, resource or scoped metrics, schema URLs and more. OTLP also requires [the semantic convention](https://opentelemetry.io/docs/concepts/semantic-conventions/) to be used. From c3c282c24d0c9ebc38e5d686544182b304f3a8ba Mon Sep 17 00:00:00 2001 From: Bartlomiej Plotka Date: Wed, 19 Jun 2024 09:59:03 +0100 Subject: [PATCH 28/31] Update content/docs/specs/remote_write_spec_2_0.md Co-authored-by: Callum Styan Signed-off-by: Bartlomiej Plotka --- content/docs/specs/remote_write_spec_2_0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/specs/remote_write_spec_2_0.md b/content/docs/specs/remote_write_spec_2_0.md index 7cf62cb99..f5d6bdbf5 100644 --- a/content/docs/specs/remote_write_spec_2_0.md +++ b/content/docs/specs/remote_write_spec_2_0.md @@ -327,7 +327,7 @@ Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-p --> * Metadata fields SHOULD be provided. Receiver MAY reject series with unspecified Type. * Exemplars SHOULD be provided if they exist for a series. -* Created timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). Receiver MAY reject those series without the created timestamp being set. +* Created Timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). Receiver MAY reject those series without the Created Timestamp being set. The following subsections define some schema elements in detail. From 9042e61d7d318fdac5b619632500f72db6c88f71 Mon Sep 17 00:00:00 2001 From: bwplotka Date: Wed, 19 Jun 2024 10:27:13 +0100 Subject: [PATCH 29/31] Fixed formatting. 
Signed-off-by: bwplotka --- content/docs/specs/remote_write_spec_2_0.md | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/content/docs/specs/remote_write_spec_2_0.md b/content/docs/specs/remote_write_spec_2_0.md index f5d6bdbf5..e68601312 100644 --- a/content/docs/specs/remote_write_spec_2_0.md +++ b/content/docs/specs/remote_write_spec_2_0.md @@ -69,11 +69,11 @@ Sender MUST send a serialized and compressed Proto Message in the body of an HTT Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-proposal/proposals/2024-04-09_remote-write-20.md#basic-content-negotiation-built-on-what-we-have --> Sender MUST send the following reserved headers with the HTTP request: -- Content-Encoding -- Content-Type -- X-Prometheus-Remote-Write-Version -- User-Agent +- `Content-Encoding` +- `Content-Type` +- `X-Prometheus-Remote-Write-Version` +- `User-Agent` Sender MAY allow users to add custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. @@ -255,10 +255,13 @@ message TimeSeries { // Exemplar represents additional information attached to some series' samples. message Exemplar { - // labels_refs is a list of label name-value pair references, encoded + // labels_refs is an optional list of label name-value pair references, encoded // as indices to the Request.symbols array. This list's len is always // a multiple of 2, and the underlying labels should be sorted lexicographically. + // If the exemplar references a trace it should use the `trace_id` label name, as a best practice. repeated uint32 labels_refs = 1; + // value represents an exact example value. This can be useful when the exemplar + // is attached to a histogram, which only gives an estimated value through buckets. double value = 2; // timestamp represents an optional timestamp of the sample in ms. 
// For Go, see github.com/prometheus/prometheus/model/timestamp/timestamp.go @@ -276,8 +279,6 @@ message Sample { // value of the sample. double value = 1; // timestamp represents timestamp of the sample in ms. - // For Go, see github.com/prometheus/prometheus/model/timestamp/timestamp.go - // for conversion from/to time.Time to Prometheus timestamp. int64 timestamp = 2; } @@ -313,19 +314,19 @@ message Histogram { ... } All timestamps MUST be int64 counted as milliseconds since the Unix epoch. Sample's values MUST be float64. -For every `TimeSeries` message: +For every TimeSeries message: * Label references MUST be provided. -* At least one element in `samples` or in `histograms` MUST be provided. A `TimeSeries` MUST NOT include both `samples` and `histograms`. For series which (rarely) would mix float and histogram samples, a separate `TimeSeries` message MUST be used. +* At least one element in Samples or in Histograms MUST be provided. A TimeSeries MUST NOT include both Samples and Histograms. For series which (rarely) would mix float and histogram samples, a separate TimeSeries message MUST be used. -* Metadata fields SHOULD be provided. Receiver MAY reject series with unspecified Type. +* Metadata sub-fields SHOULD be provided. Receiver MAY reject series with unspecified Metadata.Type. * Exemplars SHOULD be provided if they exist for a series. * Created Timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). Receiver MAY reject those series without the Created Timestamp being set. 
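The per-`TimeSeries` rules above could be checked on the Receiver side roughly as follows. This is a sketch under stated assumptions: a series is modelled as a plain dict keyed by the proto field names instead of a generated Protobuf class, each histogram is assumed to carry a `timestamp` key, `TimeSeries.labels_refs` is assumed to use the same name/value-pair encoding as `Exemplar.labels_refs`, and equal consecutive timestamps are not flagged. The function name `validate_time_series` is hypothetical.

```python
def validate_time_series(ts, symbols_len):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `ts` is a dict with optional keys "labels_refs", "samples", "histograms".
    `symbols_len` is the length of the request-level symbols table.
    """
    problems = []
    refs = ts.get("labels_refs", [])
    samples = ts.get("samples", [])
    histograms = ts.get("histograms", [])

    if not refs:
        problems.append("labels_refs must be provided")
    if len(refs) % 2 != 0:
        # Assumption: references come in (name, value) pairs.
        problems.append("labels_refs length must be a multiple of 2")
    if any(r < 0 or r >= symbols_len for r in refs):
        problems.append("labels_refs must point to existing symbols indices")

    if not samples and not histograms:
        problems.append("at least one of samples or histograms must be provided")
    if samples and histograms:
        problems.append("samples and histograms must not both be set")

    # Timestamps are int64 milliseconds since the Unix epoch.
    for field, points in (("samples", samples), ("histograms", histograms)):
        stamps = [p["timestamp"] for p in points]
        if any(later < earlier for earlier, later in zip(stamps, stamps[1:])):
            problems.append(field + " must be in timestamp order")
    return problems
```

Returning a list of messages instead of failing on the first problem fits the requirement that a Receiver provide a human-readable error describing what was rejected and why.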
From 411cd9dcecc3180191a7358490f6008a12e6c29a Mon Sep 17 00:00:00 2001 From: Bartlomiej Plotka Date: Thu, 20 Jun 2024 10:13:17 +0200 Subject: [PATCH 30/31] add backticks to all fields and types references (#2480) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Nicolás Pazos Signed-off-by: bwplotka --- content/docs/specs/remote_write_spec_2_0.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/content/docs/specs/remote_write_spec_2_0.md b/content/docs/specs/remote_write_spec_2_0.md index e68601312..e15717b89 100644 --- a/content/docs/specs/remote_write_spec_2_0.md +++ b/content/docs/specs/remote_write_spec_2_0.md @@ -314,21 +314,21 @@ message Histogram { ... } All timestamps MUST be int64 counted as milliseconds since the Unix epoch. Sample's values MUST be float64. -For every TimeSeries message: +For every `TimeSeries` message: -* Label references MUST be provided. +* `labels_refs` MUST be provided. -* At least one element in Samples or in Histograms MUST be provided. A TimeSeries MUST NOT include both Samples and Histograms. For series which (rarely) would mix float and histogram samples, a separate TimeSeries message MUST be used. +* At least one element in `samples` or in `histograms` MUST be provided. A `TimeSeries` MUST NOT include both `samples` and `histograms`. For series which (rarely) would mix float and histogram samples, a separate `TimeSeries` message MUST be used. -* Metadata sub-fields SHOULD be provided. Receiver MAY reject series with unspecified Metadata.Type. +* `metadata` sub-fields SHOULD be provided. Receiver MAY reject series with unspecified `Metadata.type`. * Exemplars SHOULD be provided if they exist for a series. -* Created Timestamp SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). Receiver MAY reject those series without the Created Timestamp being set. 
+* `created_timestamp` SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). Receiver MAY reject those series without `created_timestamp` being set. The following subsections define some schema elements in detail. @@ -339,14 +339,14 @@ Rationales: https://github.com/prometheus/proposals/blob/alexg/remote-write-20-p --> The `io.prometheus.write.v2.Request` Proto Message is designed to [intern all strings](https://en.wikipedia.org/wiki/String_interning) for the proven additional compression and memory efficiency gains on top of the standard compressions. -Symbols table MUST be provided and it MUST contain deduplicated strings used in series, exemplar labels, and metadata strings. The first element of the symbols table MUST be an empty string, which is used to represent empty or unspecified values such as when Unit or Help metadata are not provided. References MUST point to the existing index in the Symbols string array. +The `symbols` table MUST be provided and it MUST contain deduplicated strings used in series, exemplar labels, and metadata strings. The first element of the `symbols` table MUST be an empty string, which is used to represent empty or unspecified values such as when `Metadata.unit_ref` or `Metadata.help_ref` are not provided. References MUST point to the existing index in the `symbols` string array. #### Series Labels -The complete set of labels MUST be sent with each Sample or Histogram sample. Additionally, the label set associated with samples: +The complete set of labels MUST be sent with each `Sample` or `Histogram` sample. Additionally, the label set associated with samples: * SHOULD contain a `__name__` label. * MUST NOT contain repeated label names. @@ -370,7 +370,7 @@ Receiver also MAY impose limits on the number and length of labels, but this is -Sender MUST send samples (or histogram samples) for any given TimeSeries in timestamp order. 
Sender MAY send multiple requests for different series in parallel. +Sender MUST send `samples` (or `histograms`) for any given `TimeSeries` in timestamp order. Sender MAY send multiple requests for different series in parallel. The Remote-Write protocol is designed to be stateless; there is strictly no inter-message communication. As such the protocol is not considered "streaming". To achieve a streaming effect multiple messages should be sent over the same connection using e.g. HTTP/1.1 or HTTP/2. "Fancy" technologies such as gRPC were considered, but at the time were not widely adopted, and it was challenging to expose gRPC services to the internet behind load balancers such as an AWS EC2 ELB. -The Remote-Write protocol contains opportunities for batching, e.g. sending multiple samples for different series in a single request. It is not expected that multiple samples for the same series will be commonly sent in the same request, although there is support for this in the Proto Message. +The Remote-Write protocol contains opportunities for batching, e.g. sending multiple samples for different series in a single request. It is not expected that multiple samples for the same series will be commonly sent in the same request, although there is support for this in the Protobuf Message. A test suite can be found at https://github.com/prometheus/compliance/tree/main/remote_write_sender. The compliance tests for remote write 2.0 compatibility are still [in progress](https://github.com/prometheus/compliance/issues/101). @@ -36,9 +36,9 @@ A test suite can be found at https://github.com/prometheus/compliance/tree/main/ In this document, the following definitions are followed: -* a `Remote-Write` is the name of this Prometheus protocol. +* `Remote-Write` is the name of this Prometheus protocol. * a `Protocol` is a communication specification that enables the client and server to transfer metrics. 
-* a `Proto Message` (or Protobuf Message) refers to the [content type](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type) definition of the data structure for this Protocol. Since the specification uses [Google Protocol Buffers ("protobuf")](https://protobuf.dev/) exclusively, the schema is defined in a ["proto" file](https://protobuf.dev/programming-guides/proto3/) and represented by a single Protobuf ["message"](https://protobuf.dev/programming-guides/proto3/#simple). +* a `Protobuf Message` (or Proto Message) refers to the [content type](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type) definition of the data structure for this Protocol. Since the specification uses [Google Protocol Buffers ("protobuf")](https://protobuf.dev/) exclusively, the schema is defined in a ["proto" file](https://protobuf.dev/programming-guides/proto3/) and represented by a single Protobuf ["message"](https://protobuf.dev/programming-guides/proto3/#simple). * a `Wire Format` is the format of the data as it travels on the wire (i.e. in a network). In the case of Remote-Write, this is always the compressed binary protobuf format. * a `Sender` is something that sends Remote-Write data. * a `Receiver` is something that receives Remote-Write data. @@ -56,26 +56,26 @@ The Remote-Write Protocol MUST consist of RPCs with the request body serialized -The protobuf serialization MUST use either of the following Proto Messages: +The protobuf serialization MUST use either of the following Protobuf Messages: -* The `prometheus.WriteRequest` introduced in [the Remote-Write 1.0 specification](./remote_write_spec.md#protocol). As of 2.0, this message is deprecated. It SHOULD be used only for compatibility reasons. Sender and Receiver MAY NOT support the `prometheus.WriteRequest`. -* The `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#proto-message). Sender and Receiver SHOULD use this message when possible. 
Sender and Receiver MUST support the `io.prometheus.write.v2.Request`. +* The `prometheus.WriteRequest` introduced in [the Remote-Write 1.0 specification](./remote_write_spec.md#protocol). As of 2.0, this message is deprecated. It SHOULD be used only for compatibility reasons. Senders and Receivers MAY NOT support the `prometheus.WriteRequest`. +* The `io.prometheus.write.v2.Request` introduced in this specification and defined [below](#protobuf-message). Senders and Receivers SHOULD use this message when possible. Senders and Receivers MUST support the `io.prometheus.write.v2.Request`. -The Proto Message MUST use binary Wire Format. Then, MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). Snappy's [block format](https://github.com/google/snappy/blob/2c94e11145f0b7b184b831577c93e5a41c4c0346/format_description.txt) MUST be used -- [the framed format](https://github.com/google/snappy/blob/2c94e11145f0b7b184b831577c93e5a41c4c0346/framing_format.txt) MUST NOT be used. +Protobuf Message MUST use binary Wire Format. Then, MUST be compressed with [Google’s Snappy](https://github.com/google/snappy). Snappy's [block format](https://github.com/google/snappy/blob/2c94e11145f0b7b184b831577c93e5a41c4c0346/format_description.txt) MUST be used -- [the framed format](https://github.com/google/snappy/blob/2c94e11145f0b7b184b831577c93e5a41c4c0346/framing_format.txt) MUST NOT be used. -Sender MUST send a serialized and compressed Proto Message in the body of an HTTP POST request and send it to the Receiver via HTTP at the provided URL path. The Receiver MAY specify any HTTP URL path to receive metrics. +Senders MUST send a serialized and compressed Protobuf Message in the body of an HTTP POST request and send it to the Receiver via HTTP at the provided URL path. Receivers MAY specify any HTTP URL path to receive metrics. 
-Sender MUST send the following reserved headers with the HTTP request: +Senders MUST send the following reserved headers with the HTTP request: - `Content-Encoding` - `Content-Type` - `X-Prometheus-Remote-Write-Version` - `User-Agent` -Sender MAY allow users to add custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. +Senders MAY allow users to add custom HTTP headers; they MUST NOT allow users to configure them in such a way as to send reserved headers. #### Content-Encoding @@ -86,7 +86,7 @@ Content-Encoding: -Content encoding request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-encoding). Sender MUST use the `snappy` value. Receiver MUST support `snappy` compression. New, optional compression algorithms might come in 2.x or beyond. +Content encoding request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-encoding). Senders MUST use the `snappy` value. Receivers MUST support `snappy` compression. New, optional compression algorithms might come in 2.x or beyond. #### Content-Type @@ -95,7 +95,7 @@ Content-Type: application/x-protobuf Content-Type: application/x-protobuf;proto= ``` -Content type request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type). Sender MUST use `application/x-protobuf` as the only media type. Sender MAY add `;proto=` parameter to the header's value to indicate the fully qualified name of the Proto Message that was used, from the two mentioned above. As a result, Sender MUST send any of the three supported header values: +Content type request header MUST follow [the RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type). Senders MUST use `application/x-protobuf` as the only media type. 
Senders MAY add `;proto=` parameter to the header's value to indicate the fully qualified name of the Protobuf Message that was used, from the two mentioned above. As a result, Senders MUST send any of the three supported header values: For the deprecated message introduced in PRW 1.0, identified by `prometheus.WriteRequest`: @@ -106,7 +106,7 @@ For the message introduced in PRW 2.0, identified by `io.prometheus.write.v2.Req * `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request` -When talking to 1.x Receiver, Sender SHOULD use `Content-Type: application/x-protobuf` for backward compatibility. Otherwise, Sender SHOULD use `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request`. More Proto Messages might come in 2.x or beyond. +When talking to 1.x Receivers, Senders SHOULD use `Content-Type: application/x-protobuf` for backward compatibility. Otherwise, Senders SHOULD use `Content-Type: application/x-protobuf;proto=io.prometheus.write.v2.Request`. More Protobuf Messages might come in 2.x or beyond. #### X-Prometheus-Remote-Write-Version @@ -114,21 +114,21 @@ When talking to 1.x Receiver, Sender SHOULD use `Content-Type: application/x-pro X-Prometheus-Remote-Write-Version: ``` -When talking to 1.x Receiver, Sender MUST use `X-Prometheus-Remote-Write-Version: 0.1.0` for backward compatibility. Otherwise, Sender SHOULD use the newest Remote-Write version it is compatible with e.g. `X-Prometheus-Remote-Write-Version: 2.0.0`. +When talking to 1.x Receivers, Senders MUST use `X-Prometheus-Remote-Write-Version: 0.1.0` for backward compatibility. Otherwise, Senders SHOULD use the newest Remote-Write version it is compatible with e.g. `X-Prometheus-Remote-Write-Version: 2.0.0`. #### User-Agent ``` -User-Agent: +User-Agent: ``` -Sender MUST include a user agent header that SHOULD follow [the RFC 9110 User-Agent header format](https://www.rfc-editor.org/rfc/rfc9110.html#name-user-agent). 
+Senders MUST include a user agent header that SHOULD follow [the RFC 9110 User-Agent header format](https://www.rfc-editor.org/rfc/rfc9110.html#name-user-agent). ### Response -Receiver ingesting all samples successfully MUST return a 200 HTTP status code. In such a successful case, the response body from the Receiver SHOULD be empty; Sender MUST ignore the response body. The response body is RESERVED for future use. +Receivers ingesting all samples successfully MUST return a 200 HTTP status code. In such a successful case, the response body from the Receiver SHOULD be empty; Senders MUST ignore the response body. The response body is RESERVED for future use. -Receiver MUST NOT return a 200 HTTP status code if any of the samples were not written successfully (e.g. on a [partial write](#partial-write) or a full write rejection). In such a case, Receiver MUST provide a human-readable error message in the response body. The Receiver's error SHOULD contain information about the amount of the samples being rejected and for what reasons. Sender MUST NOT try and interpret the error message and SHOULD log it as is. +Receivers MUST NOT return a 200 HTTP status code if any of the samples were not written successfully (e.g. on a [partial write](#partial-write) or a full write rejection). In such a case, the Receiver MUST provide a human-readable error message in the response body. The Receiver's error SHOULD state how many samples were rejected and for what reasons. Senders MUST NOT try to interpret the error message and SHOULD log it as is. The following subsections specify Sender and Receiver semantics around different write error cases. @@ -137,54 +137,54 @@ The following subsections specify Sender and Receiver semantics around different -Sender SHOULD use Remote-Write to send samples for multiple series in a single request.
As a result, Receiver MAY ingest valid samples within a write request that also contains some invalid or otherwise unwritten samples, which represents a partial write case. In such a case, Receiver MUST return non-200 status code following the [Invalid Samples](#invalid-samples) and [Retry on Partial Writes](#retries-on-partial-writes) sections. +Senders SHOULD use Remote-Write to send samples for multiple series in a single request. As a result, Receivers MAY ingest valid samples within a write request that also contains some invalid or otherwise unwritten samples, which represents a partial write case. In such a case, the Receiver MUST return non-200 status code following the [Invalid Samples](#invalid-samples) and [Retry on Partial Writes](#retries-on-partial-writes) sections. #### Unsupported Request Content -Receiver MUST return [415 HTTP Unsupported Media Type](https://www.rfc-editor.org/rfc/rfc9110.html#name-415-unsupported-media-type) status code if they don't support a given content type or encoding provided by Sender. +Receivers MUST return [415 HTTP Unsupported Media Type](https://www.rfc-editor.org/rfc/rfc9110.html#name-415-unsupported-media-type) status code if they don't support a given content type or encoding provided by Senders. -Sender SHOULD expect [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) for the above reasons from the 1.x Receiver, for backwards compatibility. +Senders SHOULD expect [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) for the above reasons from 1.x Receivers, for backwards compatibility. #### Invalid Samples -Receiver MAY NOT support certain metric types or samples (e.g. Receiver might reject sample without metadata type specified or without created timestamp, while another Receiver might accept such sample.). It’s up to the Receiver what sample is invalid. 
Receiver MUST return a [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) status code for write requests that contain any invalid samples unless the [partial retriable write](#retries-on-partial-writes) occurs. +Receivers MAY NOT support certain metric types or samples (e.g. a Receiver might reject a sample without a metadata type specified or without a created timestamp, while another Receiver might accept such a sample). It’s up to the Receiver to decide which samples are invalid. Receivers MUST return a [400 HTTP Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request) status code for write requests that contain any invalid samples unless the [partial retriable write](#retries-on-partial-writes) occurs. -Sender MUST NOT retry on a 4xx HTTP status codes (other than [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429)), which MUST be used by Receiver to indicate that the write operation will never be able to succeed and should not be retried. Sender MAY retry on the 415 HTTP status code with a different content type or encoding to see if Receiver supports it. +Senders MUST NOT retry on 4xx HTTP status codes (other than [429](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429)), which MUST be used by Receivers to indicate that the write operation will never be able to succeed and should not be retried. Senders MAY retry on the 415 HTTP status code with a different content type or encoding to see if the Receiver supports it. ### Retries & Backoff -Receiver MAY return a [429 HTTP Too Many Requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429) status code to indicate the overloaded server situation. Receiver MAY return [the Retry-After](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) header to indicate the time for the next write attempt. Receiver MAY return a 5xx HTTP status code to represent internal server errors.
+Receivers MAY return a [429 HTTP Too Many Requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429) status code to indicate that the server is overloaded. Receivers MAY return [the Retry-After](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) header to indicate the time for the next write attempt. Receivers MAY return a 5xx HTTP status code to represent internal server errors. -Sender MAY retry on a 429 HTTP status code. Sender MUST retry write requests on 5xx HTTP. Sender MUST use a backoff algorithm to prevent overwhelming the server. Sender MAY handle [the Retry-After response header](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) to estimate the next retry time. +Senders MAY retry on a 429 HTTP status code. Senders MUST retry write requests on 5xx HTTP status codes. Senders MUST use a backoff algorithm to prevent overwhelming the server. Senders MAY handle [the Retry-After response header](https://www.rfc-editor.org/rfc/rfc9110.html#name-retry-after) to estimate the next retry time. -The difference between 429 vs 5xx handling is due to the potential situation of Sender “falling behind” when Receiver cannot keep up with the request volume, or Receiver choosing to rate limit Sender to protect its own availability. As a result, Sender has the option to NOT retry on 429, which allows progress to be made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors (5xx). +The difference between 429 and 5xx handling is due to the potential situation of a Sender “falling behind” when the Receiver cannot keep up with the request volume, or the Receiver choosing to rate limit the Sender to protect its availability. As a result, Senders have the option to NOT retry on 429, which allows progress to be made when there are Sender side errors (e.g. too much traffic), while the data is not lost when there are Receiver side errors (5xx).
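The retry rules in this section can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names `should_retry` and `backoff_delay`, the `retry_on_429` switch, and the `base`/`cap` defaults are all hypothetical, and the sketch assumes the `Retry-After` value has already been parsed into seconds. The specification only requires that some backoff algorithm is used; exponential backoff with jitter is one common choice.

```python
import random
from typing import Optional


def should_retry(status: int, retry_on_429: bool = True) -> bool:
    """Decide whether a Sender retries the same request after this status.

    Hypothetical helper: the names and the retry_on_429 flag are assumptions.
    """
    if status == 429:
        return retry_on_429  # retrying on 429 is optional (MAY)
    if 500 <= status < 600:
        return True  # 5xx MUST be retried
    return False  # success, or another 4xx that MUST NOT be retried as-is


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # Honour the Receiver's hint when present (assumed parsed to seconds).
        return max(0.0, retry_after)
    # Exponential backoff with full jitter, bounded by `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)
```

A 415 response is deliberately treated as non-retriable here: a Sender that wants to fall back would build a new request with a different content type or encoding rather than resend the same one.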
#### Retries on Partial Writes -Receiver MAY return a 5xx HTTP or 429 HTTP status code on partial write or [partial invalid sample cases](#partial-write) when it expects Sender to retry the whole request. In that case Receiver MUST support idempotency as sender MAY retry with the same request. +Receivers MAY return a 5xx HTTP or 429 HTTP status code on partial write or [partial invalid sample cases](#partial-write) when they expect Senders to retry the whole request. In that case, the Receiver MUST support idempotency as Senders MAY retry with the same request. ### Backward and Forward Compatibility -The protocol follows [semantic versioning 2.0](https://semver.org/): any 2.x compatible Receiver MUST be able to read any 2.x compatible Sender and vice versa. Breaking or backwards incompatible changes will result in a 3.x version of the spec. +The protocol follows [semantic versioning 2.0](https://semver.org/): any 2.x compatible Receiver MUST be able to read any 2.x compatible Sender and vice versa. Breaking or backwards incompatible changes will result in a 3.x version of the spec. -The Proto Messages (in Wire Format) themselves are forward / backward compatible, in some respects: +The Protobuf Messages (in Wire Format) themselves are forward / backward compatible, in some respects: -* Removing fields from the proto message requires a major version bump. +* Removing fields from the Protobuf Message requires a major version bump. * Adding (optional) fields can be done in a minor version bump. -In other words, this means that future minor versions of 2.x MAY add new optional fields to `io.prometheus.write.v2.Request`, new compressions, Proto Messages and negotiation mechanisms, as long as they are backwards compatible (e.g. optional to both Receiver and Sender).
+In other words, this means that future minor versions of 2.x MAY add new optional fields to `io.prometheus.write.v2.Request`, new compressions, Protobuf Messages and negotiation mechanisms, as long as they are backwards compatible (e.g. optional to both Receiver and Sender). #### 2.x vs 1.x Compatibility -The 2.x protocol is breaking compatibility with 1.x by introducing a new, mandatory `io.prometheus.write.v2.Request` Proto Message and deprecating the `prometheus.WriteRequest`. +The 2.x protocol is breaking compatibility with 1.x by introducing a new, mandatory `io.prometheus.write.v2.Request` Protobuf Message and deprecating the `prometheus.WriteRequest`. -2.x Sender MAY support 1.x Receiver by allowing users to configure what content type Sender should use. 2.x Sender also MAY automatically fall back to different content types, if the Receiver returns 415 HTTP status code. +2.x Senders MAY support 1.x Receivers by allowing users to configure what content type Senders should use. 2.x Senders also MAY automatically fall back to different content types, if the Receiver returns 415 HTTP status code. -## Proto Message +## Protobuf Message ### `io.prometheus.write.v2.Request` -The `io.prometheus.write.v2.Request` references the new Proto Message that's meant to replace and deprecate the Remote-Write 1.0's `prometheus.WriteRequest` message. +The `io.prometheus.write.v2.Request` references the new Protobuf Message that's meant to replace and deprecate the Remote-Write 1.0's `prometheus.WriteRequest` message. -* `metadata` sub-fields SHOULD be provided. Receiver MAY reject series with unspecified `Metadata.type`. +* `metadata` sub-fields SHOULD be provided. Receivers MAY reject series with unspecified `Metadata.type`. * Exemplars SHOULD be provided if they exist for a series. -* `created_timestamp` SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). Receiver MAY reject those series without `created_timestamp` being set. 
+* `created_timestamp` SHOULD be provided for metrics that follow counter semantics (e.g. counters and histograms). Receivers MAY reject those series without `created_timestamp` being set. The following subsections define some schema elements in detail. @@ -337,7 +337,7 @@ The following subsections define some schema elements in detail. -The `io.prometheus.write.v2.Request` Proto Message is designed to [intern all strings](https://en.wikipedia.org/wiki/String_interning) for the proven additional compression and memory efficiency gains on top of the standard compressions. +The `io.prometheus.write.v2.Request` Protobuf Message is designed to [intern all strings](https://en.wikipedia.org/wiki/String_interning) for the proven additional compression and memory efficiency gains on top of the standard compressions. The `symbols` table MUST be provided and it MUST contain deduplicated strings used in series, exemplar labels, and metadata strings. The first element of the `symbols` table MUST be an empty string, which is used to represent empty or unspecified values such as when `Metadata.unit_ref` or `Metadata.help_ref` are not provided. References MUST point to the existing index in the `symbols` string array. @@ -363,29 +363,29 @@ Names that do not adhere to the above, might be harder to use for PromQL users ( Label names beginning with "__" are RESERVED for system usage and SHOULD NOT be used, see [Prometheus Data Model](https://prometheus.io/docs/concepts/data_model/). -Receiver also MAY impose limits on the number and length of labels, but this is receiver-specific and is out of the scope for this document. +Receivers also MAY impose limits on the number and length of labels, but this is receiver-specific and is out of the scope of this document. #### Samples and Histogram Samples -Sender MUST send `samples` (or `histograms`) for any given `TimeSeries` in timestamp order. Sender MAY send multiple requests for different series in parallel. 
+Senders MUST send `samples` (or `histograms`) for any given `TimeSeries` in timestamp order. Senders MAY send multiple requests for different series in parallel. -Sender SHOULD send stale markers when a time series will no longer be appended to. -Sender MUST send stale markers if the discontinuation of time series is possible to detect, for example: +Senders SHOULD send stale markers when a time series will no longer be appended to. +Senders MUST send stale markers if the discontinuation of time series is possible to detect, for example: * For series that were pulled (scraped), unless explicit timestamp was used. * For series that is resulted by a recording rule evaluation. -Generally, not sending stale markers for series that are discontinued can lead to Receiver [non-trivial query time alignment issues](https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness). +Generally, not sending stale markers for series that are discontinued can lead to [non-trivial query time alignment issues](https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness) on the Receiver. Stale markers MUST be signalled by the special NaN value `0x7ff0000000000002`. This value MUST NOT be used otherwise. -Typically, Sender can detect when a time series will no longer be appended using the following techniques: +Typically, Senders can detect when a time series will no longer be appended using the following techniques: 1. Detecting, using service discovery, that the target exposing the series has gone away. 1. Noticing the target is no longer exposing the time series between successive scrapes. @@ -419,31 +419,31 @@ The same as in [1.0](./remote_write_spec.md#out-of-scope). This section contains speculative plans that are not considered part of protocol specification yet but are mentioned here for completeness. Note that 2.0 specification completed [2 of 3 future plans in the 1.0](./remote_write_spec.md#future-plans).
-* **Transactionality** There is still no transactionality defined for 2.0 specification, mostly because it makes scalable Sender implementation difficult. Prometheus Sender aims at being "transactional" - i.e. to never expose a partially scraped target to a query. We intend to do the same with Remote-Write -- for instance, in the future we would like to "align" Remote-Write with scrapes, perhaps such that all the samples, metadata and exemplars for a single scrape are sent in a single Remote-Write request. +* **Transactionality** There is still no transactionality defined for 2.0 specification, mostly because it makes a scalable Sender implementation difficult. Prometheus Sender aims at being "transactional" - i.e. to never expose a partially scraped target to a query. We intend to do the same with Remote-Write -- for instance, in the future we would like to "align" Remote-Write with scrapes, perhaps such that all the samples, metadata and exemplars for a single scrape are sent in a single Remote-Write request. - However, Remote-Write 2.0 specification solves an important transactionality problem for [the classic histogram buckets](https://docs.google.com/document/d/1mpcSWH1B82q-BtJza-eJ8xMLlKt6EJ9oFGH325vtY1Q/edit#heading=h.ueg7q07wymku). This is done thanks to the native histograms supporting custom bucket-ing possible with the `io.prometheus.write.v2.Request` wire format. Sender might translate all classic histograms to native histograms this way, but it's out of this specification to mandate this. However, for this reason, Receiver MAY ignore certain metric types (e.g. classic histograms). + However, Remote-Write 2.0 specification solves an important transactionality problem for [the classic histogram buckets](https://docs.google.com/document/d/1mpcSWH1B82q-BtJza-eJ8xMLlKt6EJ9oFGH325vtY1Q/edit#heading=h.ueg7q07wymku). This is done thanks to the native histograms supporting custom bucket-ing possible with the `io.prometheus.write.v2.Request` wire format. 
Senders might translate all classic histograms to native histograms this way, but mandating this is outside the scope of this specification. However, for this reason, Receivers MAY ignore certain metric types (e.g. classic histograms). -* **Alternative wire formats**. The OpenTelemetry community has shown the validity of Apache Arrow (and potentially other columnar formats) for over-wire data transfer with their OTLP protocol. We would like to do experiments to confirm the compatibility of a similar format with Prometheus’ data model and include benchmarks of any resource usage changes. We would potentially maintain both a protobuf and columnar format long term for compatibility reasons and use our content negotiation to add different proto messages for this purpose. +* **Alternative wire formats**. The OpenTelemetry community has shown the validity of Apache Arrow (and potentially other columnar formats) for over-wire data transfer with their OTLP protocol. We would like to do experiments to confirm the compatibility of a similar format with Prometheus’ data model and include benchmarks of any resource usage changes. We would potentially maintain both a protobuf and columnar format long term for compatibility reasons and use our content negotiation to add different Protobuf Messages for this purpose. -* **Global symbols**. Pre-defined string dictionary for interning The protocol could pre-define a static dictionary of ref->symbol that includes strings that are considered common, e.g. “namespace”, “le”, “job”, “seconds”, “bytes”, etc. Sender could refer to these without the need to include them in the request’s symbols table. This dictionary could incrementally grow with a minor version releases of this protocol. +* **Global symbols**. Pre-defined string dictionary for interning. The protocol could pre-define a static dictionary of ref->symbol that includes strings that are considered common, e.g. “namespace”, “le”, “job”, “seconds”, “bytes”, etc.
Senders could refer to these without the need to include them in the request’s symbols table. This dictionary could incrementally grow with minor version releases of this protocol. ## Related ### FAQ **Why did you not use gRPC?** -Because 1.0 protocol does not use gRPC, breaking it would increase friction in the adoption. See 1.0 [reason](./remote_write_spec.md#faq). +Because the 1.0 protocol does not use gRPC, breaking it would increase friction in the adoption. See 1.0 [reason](./remote_write_spec.md#faq). **Why not stream protobuf messages?** -If you use persistent HTTP/1.1 connections, they are pretty close to streaming. Of course headers have to be re-sent, but yes that is less expensive than a new TCP set up. +If you use persistent HTTP/1.1 connections, they are pretty close to streaming. Of course, headers have to be re-sent, but that is less expensive than a new TCP set up. **Why do we send samples in order?** -The in-order constraint comes from the encoding we use for time series data in Prometheus, the implementation of which is optimized for append-only workloads. However, this requirement is also shared across many other databases and vendors in the ecosystem. In fact, [Prometheus with OOO feature enabled](https://youtu.be/qYsycK3nTSQ?t=1321), allows out-of-order writes, but with the performance penalty, thus reserved for rare events. To sum up, Receiver may support out-of-order ingestion, though it is not permitted by the specification. In the future e.g. 2.x spec versions, we could extend content type to negotiate the out-of-order writes, if needed. +The in-order constraint comes from the encoding we use for time series data in Prometheus, the implementation of which is optimized for append-only workloads. However, this requirement is also shared across many other databases and vendors in the ecosystem. 
In fact, [Prometheus with OOO feature enabled](https://youtu.be/qYsycK3nTSQ?t=1321), allows out-of-order writes, but with the performance penalty, thus reserved for rare events. To sum up, Receivers may support out-of-order ingestion, though it is not permitted by the specification. In the future e.g. 2.x spec versions, we could extend content type to negotiate the out-of-order writes, if needed. **How can we parallelise requests with the in-order constraint?** -Samples must be in-order _for a given series_. However, even if Receiver does not support out-of-order ingestion, the Remote-Write requests can be sent in parallel as long as they are for different series. Prometheus shards the samples by their labels into separate queues, and then writes happen sequentially in each queue. This guarantees samples for the same series are delivered in order, but samples for different series are sent in parallel - and potentially "out of order" between different series. +Samples must be in-order _for a given series_. However, even if a Receiver does not support out-of-order ingestion, the Remote-Write requests can be sent in parallel as long as they are for different series. Prometheus shards the samples by their labels into separate queues, and then writes happen sequentially in each queue. This guarantees samples for the same series are delivered in order, but samples for different series are sent in parallel - and potentially "out of order" between different series. **What are the differences between Remote-Write 2.0 and OpenTelemetry's OTLP protocol?** [OpenTelemetry OTLP](https://github.com/open-telemetry/opentelemetry-proto/blob/a05597bff803d3d9405fcdd1e1fb1f42bed4eb7a/docs/specification.md) is a protocol for transporting of telemetry data (such as metrics, logs, traces and profiles) between telemetry sources, intermediate nodes and telemetry backends. The recommended transport involves gRPC with protobuf, but HTTP with protobuf or JSON are also described. 
It was designed from scratch with the intent to support a variety of different observability signals, data types and extra information. For [metrics](https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto) that means additional non-identifying labels, flags, temporal aggregations types, resource or scoped metrics, schema URLs and more. OTLP also requires [the semantic convention](https://opentelemetry.io/docs/concepts/semantic-conventions/) to be used. -Remote-Write was designed for simplicity, efficiency and organic growth. The first version was officially released in 2023, when already [dozens of battle-tested adopters in the CNCF ecosystem](./remote_write_spec.md#compatible-senders-and-receivers) had been using this protocol for years. Remote-Write 2.0 iterates on the previous protocol by adding a few new elements (metadata, exemplars, created timestamp and native histograms) and string interning. Remote-Write 2.0 is always stateless, focuses only on metrics and is opinionated; as such it is scoped down to elements that Prometheus community considers enough to have a robust metric solution. The intention is to ensure the Remote-Write is a stable protocol that is a cheaper and simpler to adopt and use than the alternatives in the observability ecosystem. +Remote-Write was designed for simplicity, efficiency and organic growth. The first version was officially released in 2023, when already [dozens of battle-tested adopters in the CNCF ecosystem](./remote_write_spec.md#compatible-senders-and-receivers) had been using this protocol for years. Remote-Write 2.0 iterates on the previous protocol by adding a few new elements (metadata, exemplars, created timestamp and native histograms) and string interning. Remote-Write 2.0 is always stateless, focuses only on metrics and is opinionated; as such it is scoped down to elements that the Prometheus community considers enough to have a robust metric solution. 
The intention is to ensure the Remote-Write is a stable protocol that is cheaper and simpler to adopt and use than the alternatives in the observability ecosystem.
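The string interning mentioned above relies on a `symbols` table whose first element is the empty string and on references that index into that table. A minimal sketch of how a Sender might build such a table is shown below. The `SymbolTable` class and its `ref` method are hypothetical names, not part of the specification, and flattening each label set into alternating name and value references is an assumption made for illustration only.

```python
from typing import Dict, List


class SymbolTable:
    """Hypothetical per-request string interning table."""

    def __init__(self) -> None:
        # Index 0 is reserved for the empty string, as the spec requires.
        self.symbols: List[str] = [""]
        self._index: Dict[str, int] = {"": 0}

    def ref(self, s: str) -> int:
        """Return the index of `s`, appending it on first use."""
        idx = self._index.get(s)
        if idx is None:
            idx = len(self.symbols)
            self.symbols.append(s)
            self._index[s] = idx
        return idx


table = SymbolTable()
labels = {"__name__": "http_requests_total", "job": "api"}
# Assumed layout: alternating name/value references, sorted by label name.
refs = [table.ref(x) for kv in sorted(labels.items()) for x in kv]
```

Because every string is stored once, a second series that shares label names or values with the first only adds references, not new strings, which is where the size savings come from.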