Description
We are porting various alerts from Nagios to the Prometheus ecosystem, and we've found one check that is quite useful in Nagios but seems to be missing from the node exporter. It looks at ext filesystems with the tune2fs -l command and (basically) greps for the FS Error count field.
This should normally be zero, but under certain circumstances (failing disk, filesystem bug, power outage) it will rise. Running fsck on the filesystem will fix this (normally, a reboot after a power outage will run fsck, but under certain circumstances it might not fully do it).
So I think the node exporter should do this. I've tried to find metrics about this in our node exporters and couldn't find anything under the node_filesystem_* namespace. There is node_filesystem_readonly and, according to this post, node_filesystem_device_error (but I can't see that metric here), but neither of those is the same as the error count.
Am I missing something, or is this missing from the node exporter?
Here's a copy of the check, called dsa-check-filesystems here:
#!/usr/bin/ruby
require 'filesystem'

# Filesystem types that are never checked.
ignorefs = ["NFS", "nfs", "nfs4", "nfsd", "afs", "binfmt_misc", "proc", "smbfs",
            "autofs", "iso9660", "ncpfs", "coda", "devpts", "ftpfs", "devfs",
            "mfs", "shfs", "sysfs", "cifs", "lustre_lite", "tmpfs", "usbfs",
            "udf", "fusectl", "fuse.snapshotfs", "rpc_pipefs"]

# Map each device to its type and mount point, skipping bind mounts.
mountpoints = {}
FileSystem.mounts.each do |m|
  if ((not ignorefs.include?(m.fstype)) && (m.options !~ /bind/))
    mountpoints[m.device] = { 'type' => m.fstype, 'mount' => m.mount }
  end
end

# Return a message if tune2fs reports a non-zero error count, nil otherwise.
def check_ext3(dev, mnt)
  output=%x{tune2fs -l #{dev}}
  if output =~ /FS Error count:\s*(\d+)/ and $1.to_i > 0
    return "#{dev} (#{mnt}) has #{$1} errors"
  end
end

output = []
mountpoints.keys.each do |m|
  temp = ''
  begin
    if mountpoints[m]['type'] =~ /ext/
      temp = check_ext3(m, mountpoints[m]['mount'])
    end
  rescue Exception => e
    # Errors from tune2fs are silently ignored.
  end
  if temp && (temp.length > 0)
    output << temp
  end
end

# Nagios convention: print the problems and exit non-zero, or report OK.
if output.length > 0
  puts output.join("\n")
  exit 1
end
puts "OK: All filesystems ok."
exit 0
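
Until something like this exists in the node exporter itself, one possible workaround is to feed the same value through the textfile collector. The following is only a rough sketch of that idea, not an existing collector: the metric name `ext_filesystem_error_count`, the helper names, and the output path in the trailing comment are all made up for illustration. It also assumes that a missing FS Error count line in the tune2fs -l output means no recorded errors.

```ruby
# Sketch: render the ext "FS Error count" in Prometheus text format so the
# node exporter's textfile collector can pick it up from a *.prom file.
# The metric name is hypothetical, not an existing node exporter metric.
METRIC = 'ext_filesystem_error_count'

# Extract the error count from `tune2fs -l` output.
# Assumption: if the field is absent, there are no recorded errors.
def parse_fs_error_count(tune2fs_output)
  match = tune2fs_output.match(/^FS Error count:\s*(\d+)/)
  match ? match[1].to_i : 0
end

# Parse /proc/mounts-style text into { device => mount point } for ext2/3/4.
# Only the first mount point per device is kept, which also skips bind mounts.
def ext_mounts(mounts_text)
  seen = {}
  mounts_text.each_line do |line|
    device, mount, fstype = line.split
    next unless fstype&.match?(/\Aext[234]\z/)
    seen[device] ||= mount
  end
  seen
end

# Escape backslash, double quote and newline in a label value.
def escape_label(value)
  value.gsub(/[\\"\n]/) { |c| c == "\n" ? '\\n' : "\\#{c}" }
end

def metric_line(device, mount, count)
  %(#{METRIC}{device="#{escape_label(device)}",mountpoint="#{escape_label(mount)}"} #{count})
end

# Build the full exposition text. The block receives a device name and must
# return that device's `tune2fs -l` output, or nil if it could not be read.
def render_metrics(mounts_text)
  lines = ["# HELP #{METRIC} FS Error count reported by tune2fs -l.",
           "# TYPE #{METRIC} gauge"]
  ext_mounts(mounts_text).each do |device, mount|
    output = yield(device)
    next if output.nil?
    lines << metric_line(device, mount, parse_fs_error_count(output))
  end
  lines.join("\n") + "\n"
end

# Example wiring (needs root to read the superblock; the path is hypothetical
# and must match the exporter's --collector.textfile.directory):
#   text = render_metrics(File.read('/proc/mounts')) do |dev|
#     out = IO.popen(['tune2fs', '-l', dev], err: File::NULL, &:read)
#     $?.success? ? out : nil
#   end
#   File.write('/var/lib/node_exporter/textfile/ext_errors.prom', text)
```

The value is exposed as a gauge rather than a counter because fsck resets it, and an alert would then simply fire when the value is greater than zero. Run from cron or a timer, this keeps the tune2fs call out of the scrape path.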