Add per catalog metastore metrics to QueryStats #26900
Reviewer's Guide

This PR extends the metadata API to collect and expose per-catalog metastore call metrics in QueryStats and QueryInfo. It adds SPI methods for listing active catalogs and fetching connector metrics, wraps Hive metastore calls with timing and failure counting, integrates metrics capture into QueryStateMachine on query completion or failure, and updates connectors and tests to support and validate the new metrics field.

Class diagram for MeasuredHiveMetastore and metastore metrics integration

classDiagram
class MeasuredHiveMetastore {
-HiveMetastore delegate
-MetastoreApiCallStats allApiCallsStats
-Map<String, MetastoreApiCallStats> apiCallStats
-Ticker ticker
+Metrics getMetrics()
+<all HiveMetastore methods> (wrapped)
}
class MetastoreApiCallStats {
-TDigest timeNanosDistribution
-long totalTimeNanos
-long totalFailures
+addTime(long)
+addFailure()
+put(ImmutableMap.Builder<String, Metric<?>>, String)
}
class MeasuredMetastoreFactory {
-HiveMetastoreFactory metastoreFactory
+createMetastore(Optional<ConnectorIdentity>)
+isImpersonationEnabled()
}
MeasuredHiveMetastore --> HiveMetastore : delegates
MeasuredHiveMetastore --> MetastoreApiCallStats : uses
MeasuredMetastoreFactory --> MeasuredHiveMetastore : creates
MeasuredMetastoreFactory --> HiveMetastoreFactory : delegates
class HiveMetastore {
<<interface>>
+getMetrics() : Metrics
+<other methods>
}
MeasuredHiveMetastore ..|> HiveMetastore
class Metrics {
+Map<String, Metric<?>> metrics
}
MeasuredHiveMetastore --> Metrics : returns
MetastoreApiCallStats --> Metric : builds
class Metric {
<<interface>>
}
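
The diagram above describes a wrapper that delegates each metastore call while recording elapsed time and failures. The following is a minimal sketch of that pattern, not the PR's implementation. It assumes a hypothetical `CallStats` holder that keeps only totals (no `TDigest` distribution), a hypothetical `MeasuredCalls` helper, and a `LongSupplier` nanosecond clock standing in for Guava's `Ticker`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical stand-in for MetastoreApiCallStats: totals only, no distribution.
class CallStats
{
    private final AtomicLong totalTimeNanos = new AtomicLong();
    private final AtomicLong totalFailures = new AtomicLong();

    void addTime(long nanos) { totalTimeNanos.addAndGet(nanos); }

    void addFailure() { totalFailures.incrementAndGet(); }

    long totalTimeNanos() { return totalTimeNanos.get(); }

    long totalFailures() { return totalFailures.get(); }
}

// Hypothetical helper: times a call, counts failures, and rethrows the original exception.
class MeasuredCalls
{
    private final CallStats all = new CallStats();
    private final Map<String, CallStats> perCall = new ConcurrentHashMap<>();
    private final LongSupplier nanoClock;

    MeasuredCalls(LongSupplier nanoClock) { this.nanoClock = nanoClock; }

    <T> T measure(String callName, Supplier<T> call)
    {
        CallStats stats = perCall.computeIfAbsent(callName, ignored -> new CallStats());
        long start = nanoClock.getAsLong();
        try {
            return call.get();
        }
        catch (RuntimeException e) {
            // Failures are counted both globally and per call name
            all.addFailure();
            stats.addFailure();
            throw e;
        }
        finally {
            // Elapsed time is recorded for successful and failed calls alike
            long elapsed = nanoClock.getAsLong() - start;
            all.addTime(elapsed);
            stats.addTime(elapsed);
        }
    }

    CallStats all() { return all; }

    CallStats forCall(String callName) { return perCall.get(callName); }
}
```

A wrapped method would then read roughly as `return measured.measure("getTable", () -> delegate.getTable(name));`, with one such line per delegated method.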
Class diagram for QueryStats and catalogMetadataMetrics field

classDiagram
class QueryStats {
+Map<String, Metrics> catalogMetadataMetrics
+getCatalogMetadataMetrics()
+<other fields and methods>
}
QueryStats --> Metrics : contains
class Metrics {
+Map<String, Metric<?>> metrics
}
Metrics --> Metric : contains
class Metric {
<<interface>>
}
Hey there - I've reviewed your changes - here's some feedback:
- Consolidate the duplicated assertCountMetricExists/assertDistributionMetricExists helpers in BaseHiveConnectorTest and BaseIcebergConnectorTest into a shared test utility to reduce code duplication.
- Centralize the collectCatalogMetadataMetrics invocation in QueryStateMachine (rather than calling it separately in both transitionToFinishing and transitionToFailed) to DRY up the code and ensure consistency.
- Consider refactoring the very large MeasuredHiveMetastore class by extracting the stats‐collection logic into smaller, focused components or utility classes to improve readability and maintainability.
## Individual Comments
### Comment 1
<location> `plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java:9225-9226` </location>
<code_context>
assertQuerySucceeds("CALL system.flush_metadata_cache()");
}
+ @Test
+ public void testCatalogMetadataMetrics()
+ {
+ MaterializedResultWithPlan result = getQueryRunner().executeWithPlan(
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test for metrics with multiple catalogs.
Please add a test that runs a query across multiple catalogs to ensure metrics are tracked separately for each.
Suggested implementation:
```java
@Test
public void testCatalogMetadataMetrics()
{
MaterializedResultWithPlan result = getQueryRunner().executeWithPlan(
getSession(),
"SELECT count(*) FROM region r, nation n WHERE r.regionkey = n.regionkey");
Map<String, Metrics> metrics = getCatalogMetadataMetrics(result.queryId());
assertCountMetricExists(metrics, "iceberg", "metastore.all.time.total");
assertDistributionMetricExists(metrics, "iceberg", "metastore.all.time.distribution");
assertCountMetricExists(metrics, "iceberg", "metastore.getTable.time.total");
assertDistributionMetricExists(metrics, "iceberg", "metastore.getTable.time.distribution");
}
@Test
public void testCatalogMetadataMetricsWithMultipleCatalogs()
{
// Assume "iceberg" and "tpch" catalogs are available for testing
MaterializedResultWithPlan result = getQueryRunner().executeWithPlan(
getSession(),
"SELECT count(*) FROM iceberg.region r JOIN tpch.nation n ON r.regionkey = n.regionkey");
Map<String, Metrics> metrics = getCatalogMetadataMetrics(result.queryId());
// Assert metrics for iceberg catalog
assertCountMetricExists(metrics, "iceberg", "metastore.all.time.total");
assertDistributionMetricExists(metrics, "iceberg", "metastore.all.time.distribution");
assertCountMetricExists(metrics, "iceberg", "metastore.getTable.time.total");
assertDistributionMetricExists(metrics, "iceberg", "metastore.getTable.time.distribution");
// Assert metrics for tpch catalog (replace with actual metric names if different)
assertCountMetricExists(metrics, "tpch", "metastore.all.time.total");
assertDistributionMetricExists(metrics, "tpch", "metastore.all.time.distribution");
assertCountMetricExists(metrics, "tpch", "metastore.getTable.time.total");
assertDistributionMetricExists(metrics, "tpch", "metastore.getTable.time.distribution");
}
```
- If the "tpch" catalog does not support the same metrics, adjust the metric names or assertions accordingly.
- Ensure that the catalogs "iceberg" and "tpch" are available and configured in your test environment.
- If you use different catalogs, update the catalog names in the test.
</issue_to_address>
There are related CI failures. Moving to draft until I fix them.
Hey there - I've reviewed your changes - here's some feedback:
- The MeasuredHiveMetastore class manually wraps every HiveMetastore method, which leads to a lot of boilerplate; consider using a dynamic proxy or an abstract base wrapper to automatically instrument all methods and reduce duplication.
- The Hive and Iceberg connector tests duplicate the same metric‐assertion logic; extracting the `assertCountMetricExists` and `assertDistributionMetricExists` helpers into a shared base test would DRY up the code and centralize metric validation.
- Since connectors now rely on the new getMetrics/listActiveCatalogs SPI methods, add a quick check or lint to ensure every connector overrides these (or explicitly opts out) so no catalog is left without metrics by accident.
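
The dynamic proxy option mentioned in the first comment above can be sketched with the JDK's `java.lang.reflect.Proxy`. This is only an illustration of the technique under stated assumptions: `SimpleMetastore` is a hypothetical one-method stand-in for `HiveMetastore`, `TimingProxy` is a hypothetical helper, and plain `LongAdder` maps replace the PR's stats classes.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, tiny stand-in for the HiveMetastore interface.
interface SimpleMetastore
{
    String getTable(String name);
}

// Hypothetical helper that instruments every method of an interface.
class TimingProxy
{
    private TimingProxy() {}

    // Records total nanoseconds and failure counts per method name in the given maps.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> type, T delegate, Map<String, LongAdder> timeNanos, Map<String, LongAdder> failures)
    {
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] {type},
                (proxy, method, args) -> {
                    long start = System.nanoTime();
                    try {
                        return method.invoke(delegate, args);
                    }
                    catch (InvocationTargetException e) {
                        failures.computeIfAbsent(method.getName(), ignored -> new LongAdder()).increment();
                        // Unwrap so callers see the delegate's original exception
                        throw e.getCause();
                    }
                    finally {
                        timeNanos.computeIfAbsent(method.getName(), ignored -> new LongAdder())
                                .add(System.nanoTime() - start);
                    }
                });
    }
}
```

The trade-off is that a reflective proxy removes the per-method boilerplate but adds reflection overhead to every call and keys metrics by method name only, so overloaded methods share one entry.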
## Individual Comments
### Comment 1
<location> `core/trino-main/src/main/java/io/trino/execution/QueryStateMachine.java:392-390` </location>
<code_context>
return queryStateMachine;
}
+ private void collectCatalogMetadataMetrics()
+ {
+ // collect the metrics only once. This avoid issue with transaction being removed
+ // after the check but before the metrics collection
+ if (catalogMetadataMetricsCollected.compareAndSet(false, true)) {
+ if (session.getTransactionId().filter(transactionManager::transactionExists).isEmpty()) {
+ // The metrics collection depends on active transaction as the metrics
+ // are stored in the transactional ConnectorMetadata, but the collection can be
+ // run after the query has failed e.g., via cancel.
+ return;
+ }
+
+ ImmutableMap.Builder<String, Metrics> catalogMetadataMetrics = ImmutableMap.builder();
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider handling exceptions during metrics collection to avoid masking query state transitions.
Unexpected exceptions during metrics collection may interrupt query state transitions. Wrapping this logic in a try-catch and logging errors will improve robustness, especially during transitions like finishing or failure.
Suggested implementation:
```java
private void collectCatalogMetadataMetrics()
{
// collect the metrics only once. This avoid issue with transaction being removed
// after the check but before the metrics collection
if (catalogMetadataMetricsCollected.compareAndSet(false, true)) {
if (session.getTransactionId().filter(transactionManager::transactionExists).isEmpty()) {
// The metrics collection depends on active transaction as the metrics
// are stored in the transactional ConnectorMetadata, but the collection can be
// run after the query has failed e.g., via cancel.
return;
}
try {
ImmutableMap.Builder<String, Metrics> catalogMetadataMetrics = ImmutableMap.builder();
for (CatalogInfo activeCatalog : metadata.listActiveCatalogs(session)) {
Metrics metrics = metadata.getMetrics(session, activeCatalog.catalogName());
if (!metrics.getMetrics().isEmpty()) {
catalogMetadataMetrics.put(activeCatalog.catalogName(), metrics);
}
}
this.catalogMetadataMetrics.set(catalogMetadataMetrics.buildOrThrow());
}
catch (Exception e) {
log.error(e, "Error collecting catalog metadata metrics for query %s", queryId);
}
}
}
```
If the logger (`log`) is not already defined in this class, you should add:
```java
private static final Logger log = Logger.get(QueryStateMachine.class);
```
at the top of the class, with the appropriate import:
```java
import io.airlift.log.Logger;
```
</issue_to_address>
### Comment 2
<location> `lib/trino-metastore/src/main/java/io/trino/metastore/MeasuredHiveMetastore.java:46-54` </location>
<code_context>
+ private final HiveMetastore delegate;
+ private final MetastoreApiCallStats allApiCallsStats = new MetastoreApiCallStats();
+ private final Map<String, MetastoreApiCallStats> apiCallStats = new ConcurrentHashMap<>();
+ private final Ticker ticker = Ticker.systemTicker();
+
+ public MeasuredHiveMetastore(HiveMetastore delegate)
</code_context>
<issue_to_address>
**suggestion:** Consider allowing ticker injection for testability.
Injecting the ticker through the constructor would make it easier to test time-dependent logic.
```suggestion
private final HiveMetastore delegate;
private final MetastoreApiCallStats allApiCallsStats = new MetastoreApiCallStats();
private final Map<String, MetastoreApiCallStats> apiCallStats = new ConcurrentHashMap<>();
private final Ticker ticker;
public MeasuredHiveMetastore(HiveMetastore delegate)
{
this(delegate, Ticker.systemTicker());
}
public MeasuredHiveMetastore(HiveMetastore delegate, Ticker ticker)
{
this.delegate = requireNonNull(delegate, "delegate is null");
this.ticker = requireNonNull(ticker, "ticker is null");
}
```
</issue_to_address>
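
To illustrate why an injected ticker helps testing, here is a small sketch of a deterministic timing test. Everything in it is hypothetical: `ManualNanoClock` plays the role of a fake ticker (Guava's testing library offers a comparable `FakeTicker`), and `TimedOperation` stands in for a class such as `MeasuredHiveMetastore` that receives its clock through the constructor.

```java
import java.util.function.LongSupplier;

// Hypothetical manually advanced clock used in place of a real ticker.
class ManualNanoClock
        implements LongSupplier
{
    private long nanos;

    void advance(long deltaNanos) { nanos += deltaNanos; }

    @Override
    public long getAsLong() { return nanos; }
}

// Hypothetical timed component that receives its clock through the constructor.
class TimedOperation
{
    private final LongSupplier clock;
    private long totalTimeNanos;

    TimedOperation(LongSupplier clock) { this.clock = clock; }

    void run(Runnable operation)
    {
        long start = clock.getAsLong();
        try {
            operation.run();
        }
        finally {
            // Elapsed time comes only from the injected clock, so tests control it exactly
            totalTimeNanos += clock.getAsLong() - start;
        }
    }

    long totalTimeNanos() { return totalTimeNanos; }
}
```

With the clock injected, a test can advance time inside the measured operation and assert exact totals instead of relying on wall-clock sleeps.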
@findepi @raunaqmorarka This is ready for review. There is one CI failure, but it is unrelated.
The test (plugin/trino-lakehouse) job hung. It contains a couple of errors like this
and then
Are they related?
Yes, the impl for the
Thanks for the review, @findepi! I addressed the comments.
Missed some comments
comments addressed
@Override
public Metrics getMetrics()
{
return metastore.getMetrics();
The goal of this PR is to add Hive metastore metrics, as the metastore is often the source of performance problems.
Support for REST and Glue catalogs can be added as a follow-up.
Rebased on master to resolve conflicts.
lgtm % minor comment
Before

| Benchmark | (valuesPerHistogram) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- |
| BenchmarkHistogram.benchmarkTDigestSnapshot | 10 | avgt | 10 | 152.097 | ± 22.783 | ns/op |
| BenchmarkHistogram.benchmarkTDigestSnapshot | 100 | avgt | 10 | 350.355 | ± 13.313 | ns/op |
| BenchmarkHistogram.benchmarkTDigestSnapshot | 1000 | avgt | 10 | 413.406 | ± 6.479 | ns/op |

After

| Benchmark | (valuesPerHistogram) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- |
| BenchmarkHistogram.benchmarkTDigestSnapshot | 10 | avgt | 10 | 52.963 | ± 23.871 | ns/op |
| BenchmarkHistogram.benchmarkTDigestSnapshot | 100 | avgt | 10 | 106.029 | ± 0.304 | ns/op |
| BenchmarkHistogram.benchmarkTDigestSnapshot | 1000 | avgt | 10 | 109.722 | ± 0.327 | ns/op |
Metadata and QueryStateMachine must use the same `TransactionManager` instance.
The goal is to expose in QueryStats, per catalog, connector-specific metrics like metastore api call stats.
@findepi @raunaqmorarka The build is finally green. We should take the chance...
lgtm % test failures
Description
A slow metastore can be the root cause of slow analysis or planning. This adds explicit metrics to `QueryStats` with stats for the remote metastore calls made for a given query.
This is what it looks like in the query.json:
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: