Description
Describe the bug
Hello everyone,
We are currently encountering issues with Icinga 2 and -ThresholdInterval on Windows Server 2012 R2.
To Reproduce
I guess this can be reproduced on any Windows Server 2012 R2 host that uses -ThresholdInterval.
(We already tried it on several servers.)
Expected behavior
We want to avoid the flapping state of the following three services.
(In our environment, flapping describes a behavior where the service temporarily enters a 'not-OK' state for 2 minutes before returning to normal operation.)
Invoke-IcingaCheckCPU
Invoke-IcingaCheckMemory
Invoke-IcingaCheckDiskHealth (most affected Service)
We want to achieve this with threshold intervals (average over 15 minutes).
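To illustrate why a 15-minute average should suppress this kind of flapping, here is a minimal Python sketch. It assumes one sample every 30 seconds (matching the Interval of 30 in the config validation, if that value is in seconds) and uses made-up load values and a made-up critical threshold. It is only an arithmetic illustration, not how the Framework computes its averages internally.

```python
# Hypothetical numbers: 30 s sampling interval, 15-minute window, 2-minute spike.
SAMPLE_INTERVAL_S = 30
WINDOW_S = 15 * 60
SPIKE_S = 2 * 60

BASELINE = 20.0   # assumed normal load in percent
SPIKE = 100.0     # assumed load during the spike in percent
CRITICAL = 90.0   # assumed critical threshold in percent

window_samples = WINDOW_S // SAMPLE_INTERVAL_S   # samples in the 15-minute window
spike_samples = SPIKE_S // SAMPLE_INTERVAL_S     # samples covered by the spike

# The window ends with the spike, so the latest raw value is the spike value.
samples = [BASELINE] * (window_samples - spike_samples) + [SPIKE] * spike_samples
average = sum(samples) / len(samples)

print(f"latest value:   {samples[-1]:.1f} % -> critical: {samples[-1] >= CRITICAL}")
print(f"15 min average: {average:.1f} % -> critical: {average >= CRITICAL}")
```

With these assumed values the latest raw sample exceeds the threshold, while the 15-minute average stays well below it, which is the behavior we are hoping for.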
Current behavior
Regarding Windows Server versions newer than 2012 R2:
We configured the threshold interval as shown under config validation below
and additionally set the ThresholdInterval to "15m" in the web interface.
This works without any problems.
Regarding Windows Server 2012 R2:
At the beginning, we tried the same procedure as mentioned above on the affected hosts.
Sadly, it is not working at all.
We experimented with various configurations, but nothing seems to work correctly.
The issue disappears if only one service check is registered.
Is there perhaps a limitation on Windows Server 2012?
What we already tried, in different ways:
- Using -TimeIndexes 1,3,5,15 (default values in seconds) – only working with one registered check (web interface: ThresholdInterval 15)
- Using -TimeIndexes 1m,3m,5m,15m (values in minutes) – same as above (web interface: ThresholdInterval 15m)
- Using -TimeIndexes 60,180,300,900 (values in seconds instead of minutes) – same as above (web interface: ThresholdInterval 900, also tried 900s)
- Using only a single value (e.g. -TimeIndexes 900) – same as above (web interface: ThresholdInterval 900, also tried 900s)
In addition, we updated the cache, flushed the API directory, updated the JEA profile, reinstalled the Framework, and checked the permissions of the Icinga user. Everything seems to be fine.
When registering all three checks, we encounter two different types of issues:
First one:
The services change, as expected, from the unknown state to up. Except for one service (the first one configured), they update their value only once. This value stays unchanged even after several hours or days (the "Check now" button has no effect either).
Second one:
Error:
[Failed to parse metrics over time with -ThresholdInterval "15": No data found matching the requested time index. Available indexes: [180s, 300s, 900s, 60s]]
Curiously, in both cases the Grafana graph underneath is still updating and displays changing values.
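The error message above lists the available indexes in seconds (60s, 180s, 300s, 900s), while the requested interval was "15". The following Python sketch shows one way such a mismatch could arise. It assumes, purely for illustration, that a unit-less value is read as seconds and that the lookup requires an exact match. The helper names are hypothetical and this is not the Framework's actual code.

```python
import re

# Multipliers for the unit suffixes used in this issue (s, m), plus h for completeness.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(value: str, default_unit: str = "s") -> int:
    """Convert '15m', '900s' or '900' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s*([smh]?)\s*", value)
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    number, unit = match.groups()
    # Assumption: a value without a unit is interpreted using default_unit.
    return int(number) * UNIT_SECONDS[unit or default_unit]


def has_index(requested: str, available: list[str]) -> bool:
    """Return True if the requested interval matches one of the available indexes."""
    return to_seconds(requested) in {to_seconds(item) for item in available}


# Indexes reported in the error message.
available = ["180s", "300s", "900s", "60s"]

for requested in ("15", "15m", "900", "900s"):
    print(requested, "->", has_index(requested, available))
```

Under these assumptions only "15" fails to match, which is consistent with the error text. Whether the Framework really treats unit-less values this way would need to be confirmed by the maintainers.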
Our Environment
- Versions used:
vers. Agent : 2.15.0
vers. Framework : 1.13.3 (also tried 1.13.2)
vers. Plugins : 1.13.1
vers. MSSQL : 1.5.0
vers. Hyper-V : 1.3.1
vers. Cluster : 1.3.0
- Operating System and version: Windows Server 2012 R2 Datacenter
- Icinga Web 2 version and modules (System - About):
Icinga Web 2 Version | 2.12.4
audit | 1.0.3
businessprocess | 2.5.2
icingadb | 1.2.0
cube | 1.3.3
director | rox-prod
doc | 2.12.4
global-dashboards | 1.0
grafana | 3.1.1
incubator | 0.22.0
translation | 2.12.4
- Config validation (icinga2 daemon):
Start-IcingaServiceCheckDeamon
-----------
No arguments defined
Invoke-IcingaCheckMemory
-----------
Arguments =>
CheckCommand => Invoke-IcingaCheckMemory
Id => 225104251186107973819916186121177195202524160180123241
Interval => 30
TimeIndexes => 1m, 3m, 5m, 15m
Invoke-IcingaCheckDiskHealth
-----------
Arguments =>
CheckCommand => Invoke-IcingaCheckDiskHealth
Id => 19311275010219321079161492502312398169205312620245
Interval => 30
TimeIndexes => 1m, 3m, 5m, 15m
Invoke-IcingaCheckCPU
-----------
Arguments =>
CheckCommand => Invoke-IcingaCheckCPU
Id => 5275219864641021224811420224776891459631192206
Interval => 30
TimeIndexes => 1m, 3m, 5m, 15m
- PowerShell Version: 5.1.14409.2001