[feat][monitor] PIP-223: Add metrics for all rest endpoints.#21772
[feat][monitor] PIP-223: Add metrics for all rest endpoints.#21772dao-jun wants to merge 17 commits intoapache:masterfrom
Conversation
|
refers to: #18836 |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #21772 +/- ##
============================================
+ Coverage 73.57% 73.59% +0.01%
+ Complexity 32624 32153 -471
============================================
Files 1877 1878 +1
Lines 139502 139603 +101
Branches 15299 15321 +22
============================================
+ Hits 102638 102737 +99
+ Misses 28908 28878 -30
- Partials 7956 7988 +32
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Thanks for implementing this.
Please note that we have already started implementing PIP-264, which is Open Telemetry. We have added the infrastructure needed to define metrics, and we are underway of adding the first metrics.
Once we finish the first PR, can we ping you to add those metrics using OTel? Once that PR we fill finally know the naming conventions to use.
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java
Outdated
Show resolved
Hide resolved
| admin.namespaces().deleteNamespace("test/test"); | ||
| admin.tenants().deleteTenant("test"); | ||
|
|
||
| ByteArrayOutputStream output = new ByteArrayOutputStream(); |
There was a problem hiding this comment.
Save yourself this code and the intimate knowledge of how it is implemented:
Create a client like this against the local broker.
prometheusMetricsClient = new PrometheusMetricsClient("127.0.0.1", pulsar.getListenPortHTTP().get());
Get the Metrics:
Metrics metrics = prometheusMetricsClient.getMetrics();
and assert example:
Metric backlogAgeMetric =
metrics.findSingleMetricByNameAndLabels("pulsar_storage_backlog_age_seconds",
Pair.of("topic", topic1));
assertThat(backlogAgeMetric.tags).containsExactly(
entry("cluster", CLUSTER_NAME),
entry("namespace", namespace),
entry("topic", topic1));
assertThat((long) backlogAgeMetric.value).isCloseTo(expectedMessageAgeSeconds, within(2L));
Add the method you lack at Metrics
pulsar-broker/src/test/java/org/apache/pulsar/broker/stats/BrokerRestEndpointMetricsTest.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/PulsarService.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
| private RestEndpointMetricsFilter(PulsarService pulsar) { | ||
| PulsarBrokerOpenTelemetry telemetry = pulsar.getOpenTelemetry(); | ||
| Meter meter = telemetry.getMeter(); | ||
| latency = meter.histogramBuilder("pulsar_broker_rest_endpoint_latency") |
There was a problem hiding this comment.
Please look at #22058 to understand how to do:
Instrument name, description, unit.
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Show resolved
Hide resolved
|
@dragosvictor Could you please take a look that why my test keeps failing? |
| private RestEndpointMetricsFilter(PulsarService pulsar) { | ||
| PulsarBrokerOpenTelemetry telemetry = pulsar.getOpenTelemetry(); | ||
| Meter meter = telemetry.getMeter(); | ||
| latency = meter.histogramBuilder("pulsar_broker_rest_endpoint_latency") |
| Meter meter = openTelemetry.getMeter(); | ||
| latency = meter.histogramBuilder("pulsar_broker_rest_endpoint_latency") | ||
| .setDescription("Latency of REST endpoints in Pulsar broker") | ||
| .setUnit("ms") |
There was a problem hiding this comment.
Should be s, and measured in seconds: See https://opentelemetry.io/docs/specs/semconv/general/metrics/#instrument-units
| .build(); | ||
| } | ||
|
|
||
| private static volatile RestEndpointMetricsFilter instance; |
There was a problem hiding this comment.
Why does it need to be static? Can't you just created one instance of the filter and register it?
| UriRoutingContext info = (UriRoutingContext) req.getUriInfo(); | ||
| attrs = getRequestAttributes(info, statusCode); | ||
| } catch (Throwable ex) { | ||
| attrs = Attributes.of(PATH, "UNKNOWN", METHOD, req.getMethod(), CODE, (long) statusCode); |
|
|
||
| Object o = req.getProperty(REQUEST_START_TIME); | ||
| if (o instanceof Long start) { | ||
| long duration = System.currentTimeMillis() - start; |
There was a problem hiding this comment.
I read online now, and from from what I can gather, it's not recommended to use System.currentTimeMillis(). It's a native call to obtain the wall clock time. NTP for example can sync the clock and change. Not 100% sure but leap second may also change it. See: https://develotters.com/posts/how-not-to-measure-elapsed-time/
I think it is safer to use System.nanoTime()
| UriRoutingContext info = (UriRoutingContext) req.getUriInfo(); | ||
| attrs = getRequestAttributes(info, statusCode); | ||
| } catch (Throwable ex) { | ||
| attrs = Attributes.of(PATH, "UNKNOWN", METHOD, req.getMethod(), CODE, (long) statusCode); |
There was a problem hiding this comment.
I don't think it's a good practice. For example OOME will vanish like that.
I think if we don't have any specific exception in mind, no point in guarding this part.
| if (CollectionUtils.isEmpty(templates)) { | ||
| return Attributes.of(PATH, "UNKNOWN", METHOD, httpMethod, CODE, statusCode); | ||
| } | ||
| UriTemplate[] arr = templates.toArray(new UriTemplate[0]); |
There was a problem hiding this comment.
Why do you specifically need to convert this to an array? Can't you just iterate over the list using get(i)?
| UriTemplate[] arr = templates.toArray(new UriTemplate[0]); | ||
| int idx = arr.length - 1; | ||
| StringBuilder builder = new StringBuilder(); | ||
| for (; idx >= 0; idx--) { |
There was a problem hiding this comment.
Maybe for (int idx = arr.length - 1; ...?
| admin.tenants().deleteTenant("test"); | ||
|
|
||
| Collection<MetricData> metricDatas = pulsarTestContext.getOpenTelemetryMetricReader().collectAllMetrics(); | ||
| log.info("Metrics size: {}", metricDatas.size()); |
There was a problem hiding this comment.
Can be removed at this stage
|
|
||
| Collection<MetricData> metricDatas = pulsarTestContext.getOpenTelemetryMetricReader().collectAllMetrics(); | ||
| log.info("Metrics size: {}", metricDatas.size()); | ||
| Optional<MetricData> optional = metricDatas.stream().peek(m -> log.info("metric name: {}", m.getName())) |
There was a problem hiding this comment.
OpenTelemetry SDK has a beautiful AssertJ extension for MetricData. @dragosvictor used it in his PR. It's here: https://github.com/open-telemetry/opentelemetry-java/blob/f83c020d4d2a11f16be8af739790718bb3413cba/sdk/testing/src/main/java/io/opentelemetry/sdk/testing/assertj/OpenTelemetryAssertions.java#L50
PIP: #18560
Motivation
#18560
Modifications
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: dao-jun#6