Merged
7 changes: 6 additions & 1 deletion conf/ems/9.6.0/ems.yaml
@@ -1041,4 +1041,9 @@ events:

  - name: smbc.pfo.completed
    exports:
      - parameters.dstpath => dst_path

  - name: callhome.data.outage.detected
    exports:
      - ^^node.name => node
      - parameters.subject => subject
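The two new exports turn fields of the `callhome.data.outage.detected` event into labels on the `ems_events` metric, which the alert rule reads back as `$labels.node` and `$labels.subject`. The following is a minimal Go sketch of that mapping under stated assumptions: `emsEvent`, `resolveExports`, and the sample values are hypothetical illustrations rather than Harvest's collector code, and the sketch simply strips the `^^` marker instead of modeling what Harvest does with it.

```go
package main

import (
	"fmt"
	"strings"
)

// emsEvent is a hypothetical, simplified view of a single EMS event.
type emsEvent struct {
	NodeName   string
	Parameters map[string]string
}

// resolveExports maps "source => label" export lines to label values.
// Leading '^' markers are stripped and otherwise ignored here; Harvest's
// own handling of them is not modeled.
func resolveExports(ev emsEvent, exports []string) map[string]string {
	labels := make(map[string]string)
	for _, e := range exports {
		src, label, ok := strings.Cut(e, "=>")
		if !ok {
			continue // skip lines without a "=>" mapping
		}
		src = strings.TrimLeft(strings.TrimSpace(src), "^")
		label = strings.TrimSpace(label)
		switch {
		case src == "node.name":
			labels[label] = ev.NodeName
		case strings.HasPrefix(src, "parameters."):
			labels[label] = ev.Parameters[strings.TrimPrefix(src, "parameters.")]
		}
	}
	return labels
}

func main() {
	// Made-up sample values, for illustration only.
	ev := emsEvent{
		NodeName:   "node-01",
		Parameters: map[string]string{"subject": "example-subject"},
	}
	exports := []string{"^^node.name => node", "parameters.subject => subject"}
	fmt.Println(resolveExports(ev, exports)) // map[node:node-01 subject:example-subject]
}
```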
24 changes: 24 additions & 0 deletions container/prometheus/ems_alert_rules.yml
@@ -531,6 +531,30 @@ groups:
          impact: "Availability"
          runbook: "https://netapp.github.io/harvest/nightly/resources/ems-alert-runbook/#nvram-battery-low"

      - alert: Data Outage Detected
        expr: last_over_time(ems_events{message="callhome.data.outage.detected"}[1d]) == 1
        labels:
          severity: >
            {{- if $labels.severity -}}
            {{- if eq $labels.severity "alert" -}}
            critical
            {{- else if eq $labels.severity "error" -}}
            warning
            {{- else if eq $labels.severity "emergency" -}}
            critical
            {{- else if eq $labels.severity "notice" -}}
            info
            {{- else if eq $labels.severity "informational" -}}
            info
            {{- else -}}
            {{ $labels.severity }}
            {{- end -}}
            {{- end -}}
        annotations:
          summary: "Call home for {{ $labels.subject }} on node {{ $labels.node }}"
          impact: "Availability"
          runbook: "https://netapp.github.io/harvest/nightly/resources/ems-alert-runbook/#data-outage-detected"

      - alert: HA Interconnect Down
        expr: last_over_time(ems_events{message="callhome.hainterconnect.down"}[1d]) == 1
        labels:
14 changes: 14 additions & 0 deletions docs/resources/ems-alert-runbook.md
@@ -401,6 +401,20 @@ Perform the following corrective actions:
2. If the battery was replaced recently or the system was non-operational for an extended period of time, monitor the battery to verify that it is charging properly.
3. Contact NetApp technical support if the battery runtime continues to decrease below critical levels, and the storage system shuts down automatically.

### Data Outage Detected

**Impact**: Availability

**EMS Event**: `callhome.data.outage.detected`

This message occurs when the system detects that it has encountered an outage prior to this boot.
If your system is configured to do so, it generates and transmits an AutoSupport (or 'call home') message to NetApp technical support and to the configured destinations.
Successful delivery of an AutoSupport message significantly improves problem determination and resolution.

**Remediation**

Contact NetApp technical support.

### NetBIOS Name Conflict

**Impact**: Availability
4 changes: 2 additions & 2 deletions integration/test/alert_rule_test.go
@@ -185,8 +185,8 @@ func parseEmsLabels(exports *node.Node) string {
 	var labels []string
 	if exports != nil {
 		for _, export := range exports.GetAllChildContentS() {
-			name, display, _, _ := template.ParseMetric(export)
-			if strings.HasPrefix(name, "parameters") {
+			_, display, _, _ := template.ParseMetric(export)
+			if display != "" {
 				labels = append(labels, display)
 			}
 		}
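The test change matters because the new event exports `^^node.name => node`, whose source is not a `parameters.*` field. Below is a minimal sketch comparing the old and new filters, assuming a hypothetical `splitExport` helper in place of Harvest's `template.ParseMetric` (whose exact parsing rules are not reproduced here).

```go
package main

import (
	"fmt"
	"strings"
)

// splitExport is a hypothetical stand-in for Harvest's template.ParseMetric.
// It strips leading '^' markers from the source and splits on "=>".
func splitExport(export string) (name, display string) {
	src, disp, _ := strings.Cut(export, "=>")
	return strings.TrimLeft(strings.TrimSpace(src), "^"), strings.TrimSpace(disp)
}

// oldLabels mirrors the pre-change filter: only parameters.* exports count.
func oldLabels(exports []string) []string {
	var labels []string
	for _, e := range exports {
		name, display := splitExport(e)
		if strings.HasPrefix(name, "parameters") {
			labels = append(labels, display)
		}
	}
	return labels
}

// newLabels mirrors the post-change filter: any export with a display name.
func newLabels(exports []string) []string {
	var labels []string
	for _, e := range exports {
		_, display := splitExport(e)
		if display != "" {
			labels = append(labels, display)
		}
	}
	return labels
}

func main() {
	exports := []string{"^^node.name => node", "parameters.subject => subject"}
	fmt.Println(oldLabels(exports)) // [subject]
	fmt.Println(newLabels(exports)) // [node subject]
}
```

Under the old prefix check the expected label set omits `node`, so presumably the test would not account for the `node` label that the Data Outage Detected summary interpolates.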