feat: add Prometheus metrics for service and token health #129

Open
zy6p wants to merge 2 commits into TheSmallHanCat:main from zy6p:upstream-main-tsn-monitoring

zy6p commented Apr 22, 2026

Summary

  • add a /metrics endpoint backed by prometheus-client
  • export Prometheus gauges/counters for service uptime, request outcomes, token health, credits, errors, and dashboard totals
  • enrich /health and /api/tokens so operators can also inspect token expiry and ban state without opening the UI

Details

  • add src/core/monitoring.py to render metrics and build a public health snapshot
  • track generation success/failure/cancelled outcomes and AT/ST refresh success/failure counters
  • expose aggregate metrics such as active tokens, credits, image/video/error totals, expiring tokens, and 429-banned tokens
  • expose per-token gauges for expiry, credits, errors, and current in-flight requests
  • fix Database.update_token() to allow clearing fields like ban_reason / banned_at back to NULL
  • clear stale ban metadata when a token is manually re-enabled
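
To illustrate the update_token() point above: one common way this kind of bug arises is an update helper that skips every argument equal to None, which makes it impossible to write NULL back to a column. The sketch below is not the PR's code; it uses a synchronous sqlite3 connection, a hypothetical tokens table, and a hypothetical keyword-only signature, and it shows a sentinel that separates "not passed" from "set to NULL".

```python
import sqlite3

# Sentinel meaning "caller did not pass this field". It is distinct from None,
# which here means "set the column to NULL".
_UNSET = object()


def update_token(conn, token_id, *, is_active=_UNSET, ban_reason=_UNSET, banned_at=_UNSET):
    """Update only the fields that were explicitly passed, including None."""
    fields = {"is_active": is_active, "ban_reason": ban_reason, "banned_at": banned_at}
    changes = {col: val for col, val in fields.items() if val is not _UNSET}
    if not changes:
        return
    # Column names come from the fixed dict above, never from caller input.
    assignments = ", ".join(f"{col} = ?" for col in changes)
    conn.execute(
        f"UPDATE tokens SET {assignments} WHERE id = ?",
        (*changes.values(), token_id),
    )
    conn.commit()


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tokens (id INTEGER PRIMARY KEY, is_active INTEGER, "
    "ban_reason TEXT, banned_at TEXT)"
)
conn.execute("INSERT INTO tokens VALUES (1, 0, '429', '2026-04-22T00:00:00')")
conn.execute("INSERT INTO tokens VALUES (2, 1, '429', '2026-04-22T00:00:00')")

# Re-enable token 1 and clear its stale ban metadata in one call.
update_token(conn, 1, is_active=1, ban_reason=None, banned_at=None)
row = conn.execute(
    "SELECT is_active, ban_reason, banned_at FROM tokens WHERE id = 1"
).fetchone()
```

With this shape, passing ban_reason=None clears the column, while omitting ban_reason leaves the stored value untouched.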

Validation

  • python3 -m compileall src main.py
  • git diff --check

zy6p commented Apr 28, 2026

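Some additional context on this PR to make review easier:

CI is now passing, and the branch is in sync with main with no conflicts.

The main purpose of this PR is to add a standard Prometheus /metrics endpoint to Flow2API for monitoring:

  • whether the service is alive
  • total token count / active token count
  • aggregate token credits
  • whether an AT has expired or will expire within 1 hour
  • the number of tokens disabled because of 429 responses
  • AT / ST refresh success and failure counts
  • success, failure, and cancellation counts and durations for image / video generation requests
  • the dashboard statistics: today's images, today's videos, today's errors, total images, total videos, total errors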
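
As a rough illustration of the "expired / expiring within 1 hour" aggregates in the list above, here is a minimal sketch of how the two counts could be derived from token records before being exported as gauges. The summarize_expiry helper, the dict layout, and the at_expires field name are assumptions made for this example, not the PR's actual code.

```python
from datetime import datetime, timedelta, timezone


def summarize_expiry(tokens, now, window=timedelta(hours=1)):
    """Count tokens whose AT is already expired or expires within `window`."""
    expired = 0
    expiring_soon = 0
    for token in tokens:
        expires_at = token.get("at_expires")
        if expires_at is None:
            continue  # unknown expiry: counted in neither bucket
        if expires_at <= now:
            expired += 1
        elif expires_at - now <= window:
            expiring_soon += 1
    return {"expired": expired, "expiring_soon": expiring_soon}


now = datetime(2026, 4, 28, 12, 0, tzinfo=timezone.utc)
tokens = [
    {"id": 1, "at_expires": now - timedelta(minutes=5)},   # already expired
    {"id": 2, "at_expires": now + timedelta(minutes=30)},  # expires within the hour
    {"id": 3, "at_expires": now + timedelta(hours=6)},     # healthy
    {"id": 4, "at_expires": None},                         # expiry unknown
]
summary = summarize_expiry(tokens, now)
```

Keeping "expired" and "expiring soon" as separate buckets lets an alert fire before a token actually stops working.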

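While wiring up the Prometheus metrics, I also found that the existing "today's images / today's videos / today's errors" statistics leak across days. In the original logic these daily counters share a single today_date, but when the day changes only the counter being updated at that moment is reset. For example, if images were generated yesterday and today's first request is a video generation, only today_video_count is reset; the stale today_image_count is not cleared, so the today's-images figure in /api/stats still carries yesterday's data. This affects both the admin UI and the Prometheus metrics, so I fixed it in this PR as well and added a regression test.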
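
A minimal sketch of the fix idea, assuming the counters live on one record: when the shared date changes, every daily counter is reset together, regardless of which request type triggered the rollover. The TokenStats class, the helper names, and the today_error_count field are hypothetical; only today_date, today_image_count, and today_video_count are named in this PR.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TokenStats:
    today_date: date
    today_image_count: int = 0
    today_video_count: int = 0
    today_error_count: int = 0  # assumed field name


def roll_over_if_new_day(stats, today):
    """Reset every daily counter together when the shared date changes."""
    if stats.today_date != today:
        stats.today_image_count = 0
        stats.today_video_count = 0
        stats.today_error_count = 0
        stats.today_date = today


def record_video(stats, today):
    """Count one video generation, rolling the day over first if needed."""
    roll_over_if_new_day(stats, today)
    stats.today_video_count += 1


# Yesterday: three images were generated.
stats = TokenStats(today_date=date(2026, 4, 27), today_image_count=3)
# Today: the first request is a video generation.
record_video(stats, date(2026, 4, 28))
```

After the call, the image counter is back to zero and the video counter is one, which is the scenario from the example above.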

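If you feel this PR touches too much, I can split it into two PRs, whichever you prefer:

  1. First, fix the cross-day bug in the daily statistics on its own.
  2. Then submit the Prometheus /metrics support separately.

Let me know which approach you would prefer.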
