Skip to content

feat: Add Prometheus /metrics endpoint#541

Open
prydie wants to merge 1 commit intosle118:master-v4.3from
prydie:feature/prometheus-metrics-endpoint
Open

feat: Add Prometheus /metrics endpoint#541
prydie wants to merge 1 commit intosle118:master-v4.3from
prydie:feature/prometheus-metrics-endpoint

Conversation

@prydie
Copy link
Copy Markdown

@prydie prydie commented Jan 16, 2026

Register a new HTTP handler at /metrics to expose critical system and audio health statistics in Prometheus text format. This enables real-time monitoring of playback stability and network performance without impacting audio decoding.

Metrics exposed:

  • Audio Buffers: Fill %, bytes used, and size for both Stream (network) and Output (PCM) buffers.
  • Decoder Health: Playback state, total frames decoded (monotonic), and I2S underrun count.
  • System Health: WiFi RSSI (dBm), disconnect totals, and both free heap and minimum free heap watermark.

Implementation details:

  • Use non-blocking pointer arithmetic to calculate buffer stats, avoiding mutex locks that could cause audio glitches.
  • Decouple WiFi RSSI polling via a background timer to prevent HTTP request blocking.

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @prydie, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new capability for external monitoring by adding a Prometheus-compatible '/metrics' HTTP endpoint. This allows users to gain real-time insights into the device's operational health, covering audio playback stability, network performance, and overall system resource utilization. The implementation prioritizes performance by using non-blocking techniques for data collection.

Highlights

  • Prometheus Metrics Endpoint: A new '/metrics' HTTP endpoint has been added to expose various system and audio-related statistics in Prometheus text format, enabling real-time monitoring.
  • Comprehensive Audio Monitoring: Key audio metrics are now available, including fill percentages, used bytes, and total sizes for both stream (network) and output (PCM) buffers, as well as playback state, total decoded frames, and I2S underrun counts.
  • System Health Insights: Critical system health metrics such as WiFi RSSI (signal strength), total WiFi disconnects, and heap memory usage (current free and minimum free watermark) are exposed.
  • Non-Blocking Metric Collection: Buffer statistics are calculated using non-blocking pointer arithmetic to prevent audio glitches, and WiFi RSSI polling is decoupled via a background timer to avoid blocking HTTP requests, ensuring minimal impact on performance.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Prometheus /metrics endpoint to expose system and audio health statistics. The implementation correctly uses a background timer for polling WiFi RSSI and increments counters on relevant events. My review focuses on the implementation of the metrics handler, where I've identified a critical buffer handling issue that could lead to heap corruption, as well as a maintainability issue with redefined structs. I've provided suggestions to fix these for improved safety and code quality.

Comment on lines +169 to +243
static esp_err_t metrics_get_handler(httpd_req_t *req)
{
char *resp_str = malloc(2048); // Allocate buffer for response
if (!resp_str) return ESP_FAIL;

char *p = resp_str;
size_t rem = 2048;
int n;

// Output Buffer Fill
unsigned output_bytes = 0;
if (outputbuf && outputbuf->buf) {
if (outputbuf->writep >= outputbuf->readp) {
output_bytes = outputbuf->writep - outputbuf->readp;
} else {
output_bytes = outputbuf->size - (outputbuf->readp - outputbuf->writep);
}
}
unsigned output_fill = (outputbuf && outputbuf->size > 0) ? (output_bytes * 100) / outputbuf->size : 0;

// Stream Buffer Fill
unsigned stream_bytes = 0;
if (streambuf && streambuf->buf) {
if (streambuf->writep >= streambuf->readp) {
stream_bytes = streambuf->writep - streambuf->readp;
} else {
stream_bytes = streambuf->size - (streambuf->readp - streambuf->writep);
}
}
unsigned stream_fill = (streambuf && streambuf->size > 0) ? (stream_bytes * 100) / streambuf->size : 0;

uint32_t min_free_heap = esp_get_minimum_free_heap_size();

n = snprintf(p, rem, "# HELP squeezelite_buffer_fill_percent Audio buffer fill level (0-100)\n# TYPE squeezelite_buffer_fill_percent gauge\nsqueezelite_buffer_fill_percent %u\n", output_fill);
p += n; rem -= n;

n = snprintf(p, rem, "# HELP squeezelite_buffer_bytes_used Audio buffer used bytes\n# TYPE squeezelite_buffer_bytes_used gauge\nsqueezelite_buffer_bytes_used %u\n", output_bytes);
p += n; rem -= n;

n = snprintf(p, rem, "# HELP squeezelite_buffer_size_bytes Audio buffer total size in bytes\n# TYPE squeezelite_buffer_size_bytes gauge\nsqueezelite_buffer_size_bytes %u\n", (unsigned)(outputbuf ? outputbuf->size : 0));
p += n; rem -= n;

n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_fill_percent Stream buffer fill level (0-100)\n# TYPE squeezelite_stream_buffer_fill_percent gauge\nsqueezelite_stream_buffer_fill_percent %u\n", stream_fill);
p += n; rem -= n;

n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_bytes_used Stream buffer used bytes\n# TYPE squeezelite_stream_buffer_bytes_used gauge\nsqueezelite_stream_buffer_bytes_used %u\n", stream_bytes);
p += n; rem -= n;

n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_size_bytes Stream buffer total size in bytes\n# TYPE squeezelite_stream_buffer_size_bytes gauge\nsqueezelite_stream_buffer_size_bytes %u\n", (unsigned)(streambuf ? streambuf->size : 0));
p += n; rem -= n;

n = snprintf(p, rem, "# HELP squeezelite_output_state Decoder state (-1=OFF, 0=STOPPED, 1=BUFFER, 2=RUNNING)\n# TYPE squeezelite_output_state gauge\nsqueezelite_output_state %d\n", output.state);
p += n; rem -= n;

n = snprintf(p, rem, "# HELP squeezelite_frames_decoded Total frames decoded (monotonic)\n# TYPE squeezelite_frames_decoded counter\nsqueezelite_frames_decoded %u\n", output.frames_played_dmp);
p += n; rem -= n;

n = snprintf(p, rem, "# HELP squeezelite_underruns_total Total I2S underruns/overflows\n# TYPE squeezelite_underruns_total counter\nsqueezelite_underruns_total %u\n", total_underruns);
p += n; rem -= n;

n = snprintf(p, rem, "# HELP esp32_wifi_rssi WiFi Signal Strength (dBm)\n# TYPE esp32_wifi_rssi gauge\nesp32_wifi_rssi %d\n", cached_rssi);
p += n; rem -= n;

n = snprintf(p, rem, "# HELP esp32_wifi_disconnects_total Total WiFi disconnections\n# TYPE esp32_wifi_disconnects_total counter\nesp32_wifi_disconnects_total %u\n", wifi_disconnects);
p += n; rem -= n;

n = snprintf(p, rem, "# HELP esp32_min_free_heap_bytes Minimum free heap size since boot\n# TYPE esp32_min_free_heap_bytes gauge\nesp32_min_free_heap_bytes %u\n", min_free_heap);
p += n; rem -= n;

httpd_resp_set_type(req, "text/plain");
httpd_resp_send(req, resp_str, strlen(resp_str));

free(resp_str);
return ESP_OK;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The metrics_get_handler has a few issues that should be addressed for safety and correctness:

  1. Heap Corruption Risk: The snprintf calls to build the response string do not correctly handle potential buffer truncation. If the output is larger than the remaining buffer space, p += n will advance the pointer beyond the allocated memory, leading to a heap buffer overflow in subsequent snprintf calls.
  2. Inaccurate Percentage: The fill percentage calculation (output_bytes * 100) / outputbuf->size uses integer division, which will lead to loss of precision.
  3. Race Condition: Reading readp and writep from the audio buffers without a lock is racy, which can lead to inconsistent metrics.

I'm providing a refactored version of the function that addresses these issues. It uses a do-while(0) loop for safe buffer writing and promotes the percentage calculation to 64-bit integers to preserve precision. It also copies the buffer pointers to local variables to narrow the race condition window.

static esp_err_t metrics_get_handler(httpd_req_t *req)
{
    char *resp_str = malloc(2048); // Allocate buffer for response
    if (!resp_str) return ESP_FAIL;

    char *p = resp_str;
    size_t rem = 2048;
    int n;

    do {
        // Output Buffer Fill
        unsigned output_bytes = 0;
        if (outputbuf && outputbuf->buf) {
            // This is racy, but a design choice per PR description.
            // Reading pointers to local variables reduces the window for races but doesn't eliminate them.
            uint8_t *readp = outputbuf->readp;
            uint8_t *writep = outputbuf->writep;
            if (writep >= readp) {
                output_bytes = writep - readp;
            } else {
                output_bytes = outputbuf->size - (readp - writep);
            }
        }
        unsigned output_fill = (outputbuf && outputbuf->size > 0) ? (unsigned)(((uint64_t)output_bytes * 100) / outputbuf->size) : 0;

        // Stream Buffer Fill
        unsigned stream_bytes = 0;
        if (streambuf && streambuf->buf) {
            // This is racy, but a design choice per PR description.
            uint8_t *readp = streambuf->readp;
            uint8_t *writep = streambuf->writep;
            if (writep >= readp) {
                stream_bytes = writep - readp;
            } else {
                stream_bytes = streambuf->size - (readp - writep);
            }
        }
        unsigned stream_fill = (streambuf && streambuf->size > 0) ? (unsigned)(((uint64_t)stream_bytes * 100) / streambuf->size) : 0;

        uint32_t min_free_heap = esp_get_minimum_free_heap_size();

        n = snprintf(p, rem, "# HELP squeezelite_buffer_fill_percent Audio buffer fill level (0-100)\n# TYPE squeezelite_buffer_fill_percent gauge\nsqueezelite_buffer_fill_percent %u\n", output_fill);
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

        n = snprintf(p, rem, "# HELP squeezelite_buffer_bytes_used Audio buffer used bytes\n# TYPE squeezelite_buffer_bytes_used gauge\nsqueezelite_buffer_bytes_used %u\n", output_bytes);
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

        n = snprintf(p, rem, "# HELP squeezelite_buffer_size_bytes Audio buffer total size in bytes\n# TYPE squeezelite_buffer_size_bytes gauge\nsqueezelite_buffer_size_bytes %u\n", (unsigned)(outputbuf ? outputbuf->size : 0));
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

        n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_fill_percent Stream buffer fill level (0-100)\n# TYPE squeezelite_stream_buffer_fill_percent gauge\nsqueezelite_stream_buffer_fill_percent %u\n", stream_fill);
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

        n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_bytes_used Stream buffer used bytes\n# TYPE squeezelite_stream_buffer_bytes_used gauge\nsqueezelite_stream_buffer_bytes_used %u\n", stream_bytes);
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

        n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_size_bytes Stream buffer total size in bytes\n# TYPE squeezelite_stream_buffer_size_bytes gauge\nsqueezelite_stream_buffer_size_bytes %u\n", (unsigned)(streambuf ? streambuf->size : 0));
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

        n = snprintf(p, rem, "# HELP squeezelite_output_state Decoder state (-1=OFF, 0=STOPPED, 1=BUFFER, 2=RUNNING)\n# TYPE squeezelite_output_state gauge\nsqueezelite_output_state %d\n", output.state);
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

        n = snprintf(p, rem, "# HELP squeezelite_frames_decoded Total frames decoded (monotonic)\n# TYPE squeezelite_frames_decoded counter\nsqueezelite_frames_decoded %u\n", output.frames_played_dmp);
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

        n = snprintf(p, rem, "# HELP squeezelite_underruns_total Total I2S underruns/overflows\n# TYPE squeezelite_underruns_total counter\nsqueezelite_underruns_total %u\n", total_underruns);
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

        n = snprintf(p, rem, "# HELP esp32_wifi_rssi WiFi Signal Strength (dBm)\n# TYPE esp32_wifi_rssi gauge\nesp32_wifi_rssi %d\n", cached_rssi);
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

        n = snprintf(p, rem, "# HELP esp32_wifi_disconnects_total Total WiFi disconnections\n# TYPE esp32_wifi_disconnects_total counter\nesp32_wifi_disconnects_total %u\n", wifi_disconnects);
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

        n = snprintf(p, rem, "# HELP esp32_min_free_heap_bytes Minimum free heap size since boot\n# TYPE esp32_min_free_heap_bytes gauge\nesp32_min_free_heap_bytes %u\n", min_free_heap);
        if (n < 0 || n >= rem) break;
        p += n; rem -= n;

    } while (0);

    httpd_resp_set_type(req, "text/plain");
    httpd_resp_send(req, resp_str, p - resp_str);

    free(resp_str);
    return ESP_OK;
}

Comment on lines +135 to +160
typedef enum { OUTPUT_OFF = -1, OUTPUT_STOPPED = 0, OUTPUT_BUFFER, OUTPUT_RUNNING,
OUTPUT_PAUSE_FRAMES, OUTPUT_SKIP_FRAMES, OUTPUT_START_AT } output_state;

// Partial definition of outputstate to access frames_played_dmp
struct outputstate {
output_state state;
int _ignore_format;
uint8_t channels;
const char *device;
int external;
// ALSA/PORTAUDIO assumed 0 for embedded
bool track_started;
int (* write_cb)(void);
unsigned start_frames;
unsigned frames_played;
unsigned frames_played_dmp;
};

// Partial definition of buffer to access pointers
struct buffer {
uint8_t *buf;
uint8_t *readp;
uint8_t *writep;
uint8_t *wrap;
size_t size;
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Redefining output_state, struct outputstate, and struct buffer here is fragile and a maintainability risk. These are already defined in squeezelite.h. If the original definitions in squeezelite.h change, these partial definitions will become outdated and could lead to memory corruption or incorrect behavior. It's better to include the canonical header.

To fix this, you will need to add squeezelite to REQUIRES or PRIV_REQUIRES in the CMakeLists.txt for the wifi-manager component to make the header available.

#include "squeezelite.h"

// [Remove the typedef and struct definitions from lines 135-160]

Register a new HTTP handler at `/metrics` to expose critical system and
audio health statistics in Prometheus text format. This enables
real-time monitoring of playback stability and network performance
without impacting audio decoding.

Metrics exposed:
- Audio Buffers: Fill %, bytes used, and size for both Stream (network)
  and Output (PCM) buffers.
- Decoder Health: Playback state, total frames decoded (monotonic), and
  I2S underrun count.
- System Health: WiFi RSSI (dBm), disconnect totals, and both free heap
  and minimum free heap watermark.

Implementation details:
- Use non-blocking pointer arithmetic to calculate buffer stats,
  avoiding mutex locks that could cause audio glitches.
- Decouple WiFi RSSI polling via a background timer to prevent HTTP
  request blocking.
@prydie prydie force-pushed the feature/prometheus-metrics-endpoint branch from c49a0ba to 1810b41 Compare January 16, 2026 11:16

#define EMBEDDED 1
#define LINKALL 1
#define LOOPBACK 1
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must confess I'm not much of a c programmer so this might be heretical

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant