feat: Add Prometheus /metrics endpoint#541
Conversation
Summary of ChangesHello @prydie, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new capability for external monitoring by adding a Prometheus-compatible '/metrics' HTTP endpoint. This allows users to gain real-time insights into the device's operational health, covering audio playback stability, network performance, and overall system resource utilization. The implementation prioritizes performance by using non-blocking techniques for data collection. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a Prometheus /metrics endpoint to expose system and audio health statistics. The implementation correctly uses a background timer for polling WiFi RSSI and increments counters on relevant events. My review focuses on the implementation of the metrics handler, where I've identified a critical buffer handling issue that could lead to heap corruption, as well as a maintainability issue with redefined structs. I've provided suggestions to fix these for improved safety and code quality.
| static esp_err_t metrics_get_handler(httpd_req_t *req) | ||
| { | ||
| char *resp_str = malloc(2048); // Allocate buffer for response | ||
| if (!resp_str) return ESP_FAIL; | ||
|
|
||
| char *p = resp_str; | ||
| size_t rem = 2048; | ||
| int n; | ||
|
|
||
| // Output Buffer Fill | ||
| unsigned output_bytes = 0; | ||
| if (outputbuf && outputbuf->buf) { | ||
| if (outputbuf->writep >= outputbuf->readp) { | ||
| output_bytes = outputbuf->writep - outputbuf->readp; | ||
| } else { | ||
| output_bytes = outputbuf->size - (outputbuf->readp - outputbuf->writep); | ||
| } | ||
| } | ||
| unsigned output_fill = (outputbuf && outputbuf->size > 0) ? (output_bytes * 100) / outputbuf->size : 0; | ||
|
|
||
| // Stream Buffer Fill | ||
| unsigned stream_bytes = 0; | ||
| if (streambuf && streambuf->buf) { | ||
| if (streambuf->writep >= streambuf->readp) { | ||
| stream_bytes = streambuf->writep - streambuf->readp; | ||
| } else { | ||
| stream_bytes = streambuf->size - (streambuf->readp - streambuf->writep); | ||
| } | ||
| } | ||
| unsigned stream_fill = (streambuf && streambuf->size > 0) ? (stream_bytes * 100) / streambuf->size : 0; | ||
|
|
||
| uint32_t min_free_heap = esp_get_minimum_free_heap_size(); | ||
|
|
||
| n = snprintf(p, rem, "# HELP squeezelite_buffer_fill_percent Audio buffer fill level (0-100)\n# TYPE squeezelite_buffer_fill_percent gauge\nsqueezelite_buffer_fill_percent %u\n", output_fill); | ||
| p += n; rem -= n; | ||
|
|
||
| n = snprintf(p, rem, "# HELP squeezelite_buffer_bytes_used Audio buffer used bytes\n# TYPE squeezelite_buffer_bytes_used gauge\nsqueezelite_buffer_bytes_used %u\n", output_bytes); | ||
| p += n; rem -= n; | ||
|
|
||
| n = snprintf(p, rem, "# HELP squeezelite_buffer_size_bytes Audio buffer total size in bytes\n# TYPE squeezelite_buffer_size_bytes gauge\nsqueezelite_buffer_size_bytes %u\n", (unsigned)(outputbuf ? outputbuf->size : 0)); | ||
| p += n; rem -= n; | ||
|
|
||
| n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_fill_percent Stream buffer fill level (0-100)\n# TYPE squeezelite_stream_buffer_fill_percent gauge\nsqueezelite_stream_buffer_fill_percent %u\n", stream_fill); | ||
| p += n; rem -= n; | ||
|
|
||
| n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_bytes_used Stream buffer used bytes\n# TYPE squeezelite_stream_buffer_bytes_used gauge\nsqueezelite_stream_buffer_bytes_used %u\n", stream_bytes); | ||
| p += n; rem -= n; | ||
|
|
||
| n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_size_bytes Stream buffer total size in bytes\n# TYPE squeezelite_stream_buffer_size_bytes gauge\nsqueezelite_stream_buffer_size_bytes %u\n", (unsigned)(streambuf ? streambuf->size : 0)); | ||
| p += n; rem -= n; | ||
|
|
||
| n = snprintf(p, rem, "# HELP squeezelite_output_state Decoder state (-1=OFF, 0=STOPPED, 1=BUFFER, 2=RUNNING)\n# TYPE squeezelite_output_state gauge\nsqueezelite_output_state %d\n", output.state); | ||
| p += n; rem -= n; | ||
|
|
||
| n = snprintf(p, rem, "# HELP squeezelite_frames_decoded Total frames decoded (monotonic)\n# TYPE squeezelite_frames_decoded counter\nsqueezelite_frames_decoded %u\n", output.frames_played_dmp); | ||
| p += n; rem -= n; | ||
|
|
||
| n = snprintf(p, rem, "# HELP squeezelite_underruns_total Total I2S underruns/overflows\n# TYPE squeezelite_underruns_total counter\nsqueezelite_underruns_total %u\n", total_underruns); | ||
| p += n; rem -= n; | ||
|
|
||
| n = snprintf(p, rem, "# HELP esp32_wifi_rssi WiFi Signal Strength (dBm)\n# TYPE esp32_wifi_rssi gauge\nesp32_wifi_rssi %d\n", cached_rssi); | ||
| p += n; rem -= n; | ||
|
|
||
| n = snprintf(p, rem, "# HELP esp32_wifi_disconnects_total Total WiFi disconnections\n# TYPE esp32_wifi_disconnects_total counter\nesp32_wifi_disconnects_total %u\n", wifi_disconnects); | ||
| p += n; rem -= n; | ||
|
|
||
| n = snprintf(p, rem, "# HELP esp32_min_free_heap_bytes Minimum free heap size since boot\n# TYPE esp32_min_free_heap_bytes gauge\nesp32_min_free_heap_bytes %u\n", min_free_heap); | ||
| p += n; rem -= n; | ||
|
|
||
| httpd_resp_set_type(req, "text/plain"); | ||
| httpd_resp_send(req, resp_str, strlen(resp_str)); | ||
|
|
||
| free(resp_str); | ||
| return ESP_OK; | ||
| } |
There was a problem hiding this comment.
The metrics_get_handler has a few issues that should be addressed for safety and correctness:
- Heap Corruption Risk: The
snprintfcalls to build the response string do not correctly handle potential buffer truncation. If the output is larger than the remaining buffer space,p += nwill advance the pointer beyond the allocated memory, leading to a heap buffer overflow in subsequentsnprintfcalls. - Inaccurate Percentage: The fill percentage calculation
(output_bytes * 100) / outputbuf->sizeuses integer division, which will lead to loss of precision. - Race Condition: Reading
readpandwritepfrom the audio buffers without a lock is racy, which can lead to inconsistent metrics.
I'm providing a refactored version of the function that addresses these issues. It uses a do-while(0) loop for safe buffer writing and promotes the percentage calculation to 64-bit integers to preserve precision. It also copies the buffer pointers to local variables to narrow the race condition window.
static esp_err_t metrics_get_handler(httpd_req_t *req)
{
char *resp_str = malloc(2048); // Allocate buffer for response
if (!resp_str) return ESP_FAIL;
char *p = resp_str;
size_t rem = 2048;
int n;
do {
// Output Buffer Fill
unsigned output_bytes = 0;
if (outputbuf && outputbuf->buf) {
// This is racy, but a design choice per PR description.
// Reading pointers to local variables reduces the window for races but doesn't eliminate them.
uint8_t *readp = outputbuf->readp;
uint8_t *writep = outputbuf->writep;
if (writep >= readp) {
output_bytes = writep - readp;
} else {
output_bytes = outputbuf->size - (readp - writep);
}
}
unsigned output_fill = (outputbuf && outputbuf->size > 0) ? (unsigned)(((uint64_t)output_bytes * 100) / outputbuf->size) : 0;
// Stream Buffer Fill
unsigned stream_bytes = 0;
if (streambuf && streambuf->buf) {
// This is racy, but a design choice per PR description.
uint8_t *readp = streambuf->readp;
uint8_t *writep = streambuf->writep;
if (writep >= readp) {
stream_bytes = writep - readp;
} else {
stream_bytes = streambuf->size - (readp - writep);
}
}
unsigned stream_fill = (streambuf && streambuf->size > 0) ? (unsigned)(((uint64_t)stream_bytes * 100) / streambuf->size) : 0;
uint32_t min_free_heap = esp_get_minimum_free_heap_size();
n = snprintf(p, rem, "# HELP squeezelite_buffer_fill_percent Audio buffer fill level (0-100)\n# TYPE squeezelite_buffer_fill_percent gauge\nsqueezelite_buffer_fill_percent %u\n", output_fill);
if (n < 0 || n >= rem) break;
p += n; rem -= n;
n = snprintf(p, rem, "# HELP squeezelite_buffer_bytes_used Audio buffer used bytes\n# TYPE squeezelite_buffer_bytes_used gauge\nsqueezelite_buffer_bytes_used %u\n", output_bytes);
if (n < 0 || n >= rem) break;
p += n; rem -= n;
n = snprintf(p, rem, "# HELP squeezelite_buffer_size_bytes Audio buffer total size in bytes\n# TYPE squeezelite_buffer_size_bytes gauge\nsqueezelite_buffer_size_bytes %u\n", (unsigned)(outputbuf ? outputbuf->size : 0));
if (n < 0 || n >= rem) break;
p += n; rem -= n;
n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_fill_percent Stream buffer fill level (0-100)\n# TYPE squeezelite_stream_buffer_fill_percent gauge\nsqueezelite_stream_buffer_fill_percent %u\n", stream_fill);
if (n < 0 || n >= rem) break;
p += n; rem -= n;
n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_bytes_used Stream buffer used bytes\n# TYPE squeezelite_stream_buffer_bytes_used gauge\nsqueezelite_stream_buffer_bytes_used %u\n", stream_bytes);
if (n < 0 || n >= rem) break;
p += n; rem -= n;
n = snprintf(p, rem, "# HELP squeezelite_stream_buffer_size_bytes Stream buffer total size in bytes\n# TYPE squeezelite_stream_buffer_size_bytes gauge\nsqueezelite_stream_buffer_size_bytes %u\n", (unsigned)(streambuf ? streambuf->size : 0));
if (n < 0 || n >= rem) break;
p += n; rem -= n;
n = snprintf(p, rem, "# HELP squeezelite_output_state Decoder state (-1=OFF, 0=STOPPED, 1=BUFFER, 2=RUNNING)\n# TYPE squeezelite_output_state gauge\nsqueezelite_output_state %d\n", output.state);
if (n < 0 || n >= rem) break;
p += n; rem -= n;
n = snprintf(p, rem, "# HELP squeezelite_frames_decoded Total frames decoded (monotonic)\n# TYPE squeezelite_frames_decoded counter\nsqueezelite_frames_decoded %u\n", output.frames_played_dmp);
if (n < 0 || n >= rem) break;
p += n; rem -= n;
n = snprintf(p, rem, "# HELP squeezelite_underruns_total Total I2S underruns/overflows\n# TYPE squeezelite_underruns_total counter\nsqueezelite_underruns_total %u\n", total_underruns);
if (n < 0 || n >= rem) break;
p += n; rem -= n;
n = snprintf(p, rem, "# HELP esp32_wifi_rssi WiFi Signal Strength (dBm)\n# TYPE esp32_wifi_rssi gauge\nesp32_wifi_rssi %d\n", cached_rssi);
if (n < 0 || n >= rem) break;
p += n; rem -= n;
n = snprintf(p, rem, "# HELP esp32_wifi_disconnects_total Total WiFi disconnections\n# TYPE esp32_wifi_disconnects_total counter\nesp32_wifi_disconnects_total %u\n", wifi_disconnects);
if (n < 0 || n >= rem) break;
p += n; rem -= n;
n = snprintf(p, rem, "# HELP esp32_min_free_heap_bytes Minimum free heap size since boot\n# TYPE esp32_min_free_heap_bytes gauge\nesp32_min_free_heap_bytes %u\n", min_free_heap);
if (n < 0 || n >= rem) break;
p += n; rem -= n;
} while (0);
httpd_resp_set_type(req, "text/plain");
httpd_resp_send(req, resp_str, p - resp_str);
free(resp_str);
return ESP_OK;
}| typedef enum { OUTPUT_OFF = -1, OUTPUT_STOPPED = 0, OUTPUT_BUFFER, OUTPUT_RUNNING, | ||
| OUTPUT_PAUSE_FRAMES, OUTPUT_SKIP_FRAMES, OUTPUT_START_AT } output_state; | ||
|
|
||
| // Partial definition of outputstate to access frames_played_dmp | ||
| struct outputstate { | ||
| output_state state; | ||
| int _ignore_format; | ||
| uint8_t channels; | ||
| const char *device; | ||
| int external; | ||
| // ALSA/PORTAUDIO assumed 0 for embedded | ||
| bool track_started; | ||
| int (* write_cb)(void); | ||
| unsigned start_frames; | ||
| unsigned frames_played; | ||
| unsigned frames_played_dmp; | ||
| }; | ||
|
|
||
| // Partial definition of buffer to access pointers | ||
| struct buffer { | ||
| uint8_t *buf; | ||
| uint8_t *readp; | ||
| uint8_t *writep; | ||
| uint8_t *wrap; | ||
| size_t size; | ||
| }; |
There was a problem hiding this comment.
Redefining output_state, struct outputstate, and struct buffer here is fragile and a maintainability risk. These are already defined in squeezelite.h. If the original definitions in squeezelite.h change, these partial definitions will become outdated and could lead to memory corruption or incorrect behavior. It's better to include the canonical header.
To fix this, you will need to add squeezelite to REQUIRES or PRIV_REQUIRES in the CMakeLists.txt for the wifi-manager component to make the header available.
#include "squeezelite.h"
// [Remove the typedef and struct definitions from lines 135-160]Register a new HTTP handler at `/metrics` to expose critical system and audio health statistics in Prometheus text format. This enables real-time monitoring of playback stability and network performance without impacting audio decoding. Metrics exposed: - Audio Buffers: Fill %, bytes used, and size for both Stream (network) and Output (PCM) buffers. - Decoder Health: Playback state, total frames decoded (monotonic), and I2S underrun count. - System Health: WiFi RSSI (dBm), disconnect totals, and both free heap and minimum free heap watermark. Implementation details: - Use non-blocking pointer arithmetic to calculate buffer stats, avoiding mutex locks that could cause audio glitches. - Decouple WiFi RSSI polling via a background timer to prevent HTTP request blocking.
c49a0ba to
1810b41
Compare
|
|
||
| #define EMBEDDED 1 | ||
| #define LINKALL 1 | ||
| #define LOOPBACK 1 |
There was a problem hiding this comment.
I must confess I'm not much of a c programmer so this might be heretical
Register a new HTTP handler at
/metricsto expose critical system and audio health statistics in Prometheus text format. This enables real-time monitoring of playback stability and network performance without impacting audio decoding.Metrics exposed:
Implementation details: