
Icinga for Windows: issues with ThresholdInterval on Windows Server 2012 #836

@Julian3452

Describe the bug

Hello everyone,
we are currently encountering issues with Icinga 2 and threshold intervals (ThresholdInterval) on Windows Server 2012 R2.

To Reproduce

I guess this can be reproduced on any Windows Server 2012 R2 host that uses ThresholdInterval.
(We already tried on several different servers.)

Expected behavior

We want to avoid flapping on the following 3 services.

(In our environment, flapping means that a service temporarily enters a 'not-OK' state for 2 minutes before returning to normal operation.)

Invoke-IcingaCheckCPU
Invoke-IcingaCheckMemory
Invoke-IcingaCheckDiskHealth (most affected service)

We want to achieve this with threshold intervals (average over 15 minutes).
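To illustrate why a 15-minute average should suppress a 2-minute spike, here is a minimal arithmetic sketch. It uses the 30-second check interval from the config validation output; the baseline value, spike value and warning threshold are hypothetical numbers chosen only for illustration, and the plain mean is an assumption about how the averaging works, not the Framework's actual implementation.

```python
from statistics import mean

CHECK_INTERVAL_S = 30   # "Interval => 30" from the registered service checks
WINDOW_S = 15 * 60      # the 15-minute threshold interval
SPIKE_S = 2 * 60        # the 2-minute 'not-OK' phase described above

BASELINE = 20.0         # hypothetical normal value, in percent
SPIKE = 100.0           # hypothetical value during the spike, in percent
WARNING = 80.0          # hypothetical warning threshold, in percent

# Number of samples collected per window and during the spike.
samples_in_window = WINDOW_S // CHECK_INTERVAL_S
samples_in_spike = SPIKE_S // CHECK_INTERVAL_S

# One 15-minute window that contains the complete spike.
window = [SPIKE] * samples_in_spike + [BASELINE] * (samples_in_window - samples_in_spike)
window_average = mean(window)

# Comparing the raw peak vs. the averaged value against the threshold.
raw_alerts = max(window) >= WARNING
averaged_alerts = window_average >= WARNING

print(f"samples per window: {samples_in_window}, spike samples: {samples_in_spike}")
print(f"15-minute average: {window_average:.2f}")
print(f"raw value alerts: {raw_alerts}, averaged value alerts: {averaged_alerts}")
```

With these assumed numbers, only 4 of the 30 samples belong to the spike, so the averaged value stays well below the threshold while the raw value would cross it.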

Current behavior

On Windows Server versions newer than 2012 R2

We configured the threshold interval as shown below under "Config validation"
and additionally set the ThresholdInterval to "15m" in the web interface.

This works without any problems.

On Windows Server 2012 R2

At first, we tried the same procedure as described above on the affected hosts.
Unfortunately, it does not work at all.
We experimented with various configurations, but none of them works correctly.
The issue disappears if only 1 service check is registered.
Is there perhaps a limitation on Windows Server 2012?

What we already tried in different ways:

  • Using -TimeIndexes 1,3,5,15 (default values in seconds) – only works with 1 registered check (web interface: ThresholdInterval 15)

  • Using -TimeIndexes 1m,3m,5m,15m (values in minutes) – same as above (web interface: ThresholdInterval 15m)

  • Using -TimeIndexes 60,180,300,900 (values in seconds instead of minutes) – same as above (web interface: ThresholdInterval 900, also tried 900s)

  • Using only a single value (e.g. -TimeIndexes 900) – same as above (web interface: ThresholdInterval 900, also tried 900s)

In addition, we updated the cache, flushed the API directory, updated the JEA profile, reinstalled the Framework, and checked the permissions of the Icinga user. Everything seems to be fine.

When registering all 3 checks, we encounter 2 different types of issues:

First one:

The services change, as expected, from the unknown state to up. Except for 1 service (the first one that is configured), they only update their value once. This value stays unchanged even after several hours or days (the "Check now" button has no effect either).

Second one:

Error:
[Failed to parse metrics over time with -ThresholdInterval "15": No data found matching the requested time index. Available indexes: [180s, 300s, 900s, 60s]]

Curiously, in both cases the Grafana graph underneath still updates and displays changing values.
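The error message in the second case lists the collected indexes in seconds, which suggests that a bare "15" is compared as 15 seconds and therefore matches none of them. The following sketch illustrates that reading. It is not the Framework's parser: the helper names `to_seconds` and `matches_collected_index` are hypothetical, and treating a number without a unit as seconds is an assumption inferred from the error text alone.

```python
import re

# Unit suffixes recognised by this sketch (assumption: s, m and h only).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(value):
    """Convert '15m', '900s' or '900' to seconds.

    ASSUMPTION: a bare number without a unit is interpreted as seconds.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([smh]?)\s*", str(value).lower())
    if match is None:
        raise ValueError(f"unrecognised time index: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS.get(unit, 1)


def matches_collected_index(requested, available):
    """Return True if the requested interval equals one of the collected indexes."""
    return to_seconds(requested) in {to_seconds(item) for item in available}


# Indexes exactly as listed in the error message above.
available = ["180s", "300s", "900s", "60s"]

for requested in ["15", "15m", "900", "900s"]:
    print(requested, "->", matches_collected_index(requested, available))
```

Under this assumption, "15" finds no match while "15m", "900" and "900s" all resolve to the collected 900s index. This only accounts for the wording of the error for "15"; it does not explain why the other values also fail on Windows Server 2012 R2 once more than 1 check is registered.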

Our Environment

  • Versions used:

vers. Agent : 2.15.0
vers. Framework : 1.13.3 (also tried 1.13.2)
vers. Plugins : 1.13.1
vers. MSSQL : 1.5.0
vers. Hyper-V : 1.3.1
vers. Cluster : 1.3.0

  • Operating System and version: Windows Server 2012 R2 Datacenter

  • Icinga Web 2 version and modules (System - About):

Icinga Web 2 Version | 2.12.4
audit | 1.0.3
businessprocess | 2.5.2
icingadb | 1.2.0
cube | 1.3.3
director | rox-prod
doc | 2.12.4
global-dashboards | 1.0
grafana | 3.1.1
incubator | 0.22.0
translation | 2.12.4

  • Config validation (icinga2 daemon):
Start-IcingaServiceCheckDeamon
-----------
No arguments defined

Invoke-IcingaCheckMemory
-----------
Arguments    =>
CheckCommand => Invoke-IcingaCheckMemory
Id           => 225104251186107973819916186121177195202524160180123241
Interval     => 30
TimeIndexes  => 1m, 3m, 5m, 15m

Invoke-IcingaCheckDiskHealth
-----------
Arguments    =>
CheckCommand => Invoke-IcingaCheckDiskHealth
Id           => 19311275010219321079161492502312398169205312620245
Interval     => 30
TimeIndexes  => 1m, 3m, 5m, 15m

Invoke-IcingaCheckCPU
-----------
Arguments    =>
CheckCommand => Invoke-IcingaCheckCPU
Id           => 5275219864641021224811420224776891459631192206
Interval     => 30
TimeIndexes  => 1m, 3m, 5m, 15m

  • PowerShell version: 5.1.14409.2001

Labels

Investigation: The team is looking into the cause of the issue
