Conversation
There was a problem hiding this comment.
Pull request overview
Adds first-class E-Series SSD cache observability by introducing new configuration + performance counters, derived metrics, and a dedicated Grafana dashboard, along with regenerated metrics documentation/metadata.
Changes:
- Add E-Series SSD cache config collector support (capacity, drive/volume mappings) and ESeriesPerf SSD cache performance polling + derived hit/allocation/utilization metrics.
- Introduce a new “E-Series: SSD Cache” Grafana dashboard wired to the new Prometheus metrics.
- Update generated metric metadata/docs and counter-definition sources to include the new SSD cache metrics.
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| mcp/metadata/eseries_metrics.json | Adds SSD cache metric descriptions for generated metadata. |
| grafana/dashboards/eseries/ssd_cache.json | New Grafana dashboard for SSD cache capacity/perf. |
| docs/storagegrid-metrics.md | Regenerated metrics doc creation date. |
| docs/ontap-metrics.md | Regenerated metrics doc creation date. |
| docs/eseries-metrics.md | Adds SSD cache metrics documentation + dashboard panel references. |
| docs/cisco-switch-metrics.md | Regenerated metrics doc creation date. |
| conf/eseriesperf/static_counter_definitions.yaml | Adds static counter definitions for SSD cache perf counters. |
| conf/eseriesperf/default.yaml | Enables SsdCache object in ESeriesPerf default config. |
| conf/eseriesperf/11.80.0/ssd_cache.yaml | New ESeriesPerf SSD cache template (statistics endpoint + plugins). |
| conf/eseries/default.yaml | Enables SsdCache object in E-Series default config. |
| conf/eseries/11.80.0/ssd_cache.yaml | New E-Series SSD cache template (config endpoint + capacity plugin). |
| cmd/tools/generate/eseries_counter.yaml | Adds SSD cache metrics to generated counter catalog source. |
| cmd/collectors/eseriesperf/testdata/ssd_cache1.json | Test data for SSD cache perf polling (poll 1). |
| cmd/collectors/eseriesperf/testdata/ssd_cache2.json | Test data for SSD cache perf polling (poll 2). |
| cmd/collectors/eseriesperf/testdata/ssd_cache_zero_io.json | Test data for zero-I/O scenario. |
| cmd/collectors/eseriesperf/testdata/ssd_cache_single_controller.json | Test data for single-controller scenario. |
| cmd/collectors/eseriesperf/plugins/ssdcachestats/ssdcachestats.go | New plugin computing SSD cache derived percent metrics. |
| cmd/collectors/eseriesperf/plugins/ssdcachestats/ssdcachestats_test.go | Unit tests for SSD cache derived-metric calculations. |
| cmd/collectors/eseriesperf/eseriesperf.go | Adds SSD cache ID discovery, URL templating, plugin loading, timestamp fallback. |
| cmd/collectors/eseriesperf/eseriesperf_test.go | Adds ESeriesPerf SSD cache tests for timestamp/rates/deltas. |
| cmd/collectors/eseries/rest/url_builder.go | Adds {ssd_cache_id} placeholder replacement support. |
| cmd/collectors/eseries/rest/client.go | Changes non-200 error handling/logging behavior. |
| cmd/collectors/eseries/plugins/ssdcachecapacity/ssdcachecapacity.go | New plugin to enrich SSD cache capacity with max/additional + drive/volume label matrices. |
| cmd/collectors/eseries/eseries.go | Registers SsdCacheCapacity plugin for the E-Series collector. |
Comments suppressed due to low confidence (1)
cmd/collectors/eseries/rest/client.go:269
- For non-200 responses (other than 401), the returned RestError is built with only StatusCode and API, and the prior log that included HTTP status/body was removed. This will make failures hard to diagnose. Consider including at least
Message(res.Status)(and optionally a sanitized/truncated response body) in the error, or re-adding structured logging at an appropriate level.
if res.StatusCode != http.StatusOK {
return nil, errs.NewRest().
StatusCode(res.StatusCode).
API(endpoint).
Build()
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 24 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 24 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
cmd/collectors/eseriesperf/eseriesperf.go:205
- This
if _, exists := ep.Prop.Metrics[timestampMetric]; !exists { ... }block is unreachable becausetimestampMetricis selected by iterating overep.Prop.Metricskeys right above, so it will always exist when non-empty. Consider simplifying by removing the redundant existence check (or settimestampMetricfrom a different source if that was the intent).
// Fallback: look for timestamp metric (e.g., statistics.timestamp for SSD cache)
if timestampMetric == "" {
for metricName := range ep.Prop.Metrics {
if strings.HasSuffix(metricName, "timestamp") {
timestampMetric = metricName
break
}
}
}
if timestampMetric != "" {
if _, exists := ep.Prop.Metrics[timestampMetric]; !exists {
ep.Prop.Metrics[timestampMetric] = &eseries.Metric{
Label: "observedTimeInMS",
Name: timestampMetric,
Exportable: false,
}
}
}
cmd/collectors/eseries/rest/client.go:269
- For non-200 (non-401) responses, the prior structured log with
status,url, and responsebodyhas been removed. Since the returnedRestErrorcurrently includes onlyStatusCodeandAPI, this makes troubleshooting API failures significantly harder in production. Consider restoring some form of diagnostic context (e.g., includeres.Status/body inRestError.Message, or log it at debug level with redaction if needed).
}
if res.StatusCode != http.StatusOK {
return nil, errs.NewRest().
StatusCode(res.StatusCode).
API(endpoint).
Build()
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
No description provided.