@mshahzeb mshahzeb commented Jun 10, 2024

Fixes: #3005

Adds:

# HELP node_filesystem_errors Number of filesystem errors encountered.
# TYPE node_filesystem_errors counter
node_filesystem_errors{device="/dev/vda2",device_error="",fstype="ext4",mountpoint="/boot"} 0

# HELP node_filesystem_warnings Number of filesystem warnings encountered.
# TYPE node_filesystem_warnings counter
node_filesystem_warnings{device="/dev/vda2",device_error="",fstype="ext4",mountpoint="/boot"} 0

# HELP node_filesystem_messages Number of filesystem log messages.
# TYPE node_filesystem_messages counter
node_filesystem_messages{device="/dev/vda2",device_error="",fstype="ext4",mountpoint="/boot"} 0

From

  • /sys/fs/ext4/<partition>/errors_count: number of ext4 errors (commit)
  • /sys/fs/ext4/<partition>/warning_count: number of ext4 warning log messages (commit)
  • /sys/fs/ext4/<partition>/msg_count: number of other ext4 log messages

and

# HELP node_disk_ioerr_total Number of IO commands that completed with an error.
# TYPE node_disk_ioerr_total counter
node_disk_ioerr_total{device="sda"} 3
node_disk_ioerr_total{device="sr0"} 29

# HELP node_disk_iodone_total Number of completed or rejected IO commands.
# TYPE node_disk_iodone_total counter
node_disk_iodone_total{device="sda"} 307
node_disk_iodone_total{device="sr0"} 4483

From

  • /sys/block/<disk>/device/ioerr_cnt: number of SCSI commands that completed with an error
  • /sys/block/<disk>/device/iodone_cnt: number of completed or rejected SCSI commands

Implements a new ext4 collector.

Corresponding procfs changes: prometheus/procfs#651

@mshahzeb
Author

Sample generated metrics file
node_metrics.txt

@BurritoWrapped

This would be wonderful to have as a feature

@gouthamve
Member

Hi @mshahzeb, thanks for looking into this! This is a great start: we now know which files to read.

node_exporter doesn't read these files directly in this codebase; instead, the parsing is abstracted in procfs: https://github.com/prometheus/procfs

/sys/block/<disk>/device/ioerr_cnt and /sys/block/<disk>/device/iodone_cnt should be added here: https://github.com/prometheus/procfs/blob/master/blockdevice/stats.go

/sys/fs/ext4/<partition> should be added to a new ext4 folder, like we did for xfs and btrfs.

@mshahzeb
Author

Thank you. I will move the code to procfs and open a PR there.

@mshahzeb
Author

PR in the works on procfs: prometheus/procfs#651

@mshahzeb mshahzeb changed the title from "Add IO stats and FS stats" to "Add IO stats and ext4 FS stats through new ext4 collector" Jul 11, 2024

@mshahzeb
Author

Procfs PR merged: prometheus/procfs#651

@mshahzeb
Author

mshahzeb commented Nov 6, 2024

Waiting for new procfs release

@discordianfish
Member

You can also use the unreleased version for now

mshahzeb added 6 commits April 1, 2025 04:07
- Updated diskstats_linux.go to replace log.Debug with slog.Logger for error logging.
- Modified ext4_linux.go to change logger type from log.Logger to *slog.Logger in the ext4Collector struct and NewExt4Collector function signature.
- Included ext4 in the list of enabled collectors in end-to-end tests.
- Added additional metrics for disk I/O operations in diskstats_linux.go and updated corresponding test fixtures.
- Enhanced error handling for reading I/O counts in diskstats collector.
- Removed unused device metrics from diskstats_linux_test.go and corresponding e2e fixture.
- Added new metrics for completed and errored I/O commands for devices sda and sr0 in both test and fixture files.
- Updated sys.ttar to include new I/O count paths for devices sda and sr0.
@mshahzeb
Author

mshahzeb commented Apr 1, 2025

Closing this PR; new PR opened at: #3295

@mshahzeb mshahzeb closed this Apr 1, 2025

Successfully merging this pull request may close these issues.

Disk and filesystem error metrics