-
Notifications
You must be signed in to change notification settings - Fork 8
feat(service_manager):将发布系统接入服务管理模块 #92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
实现HTTP中间件记录请求时延并导出到Prometheus指标 添加服务信息到指标收集器 统一代码格式和修复缩进问题 添加Prometheus Adapter API文档
重构指标注入器以支持服务版本维度 移除冗余的exported_job标签和实例ID生成 新增HTTP延迟注入器并与中间件集成
更新HTTP时延指标名称使其更简洁,同时改进健康检查脚本以支持批量检查多个端口
实现Prometheus适配器模块,包括以下主要功能: - 添加Prometheus客户端封装,支持指标查询和范围查询 - 实现指标服务层,提供指标列表获取和指标数据查询 - 添加API路由和控制器,提供RESTful接口 - 定义模型结构体和错误处理机制 - 更新依赖添加Prometheus客户端库 - 编写详细API文档说明接口使用方式
新增告警规则同步API及服务实现,支持将规则同步到Prometheus并触发重载 重构API层提取公共错误处理和工具方法到通用模块 添加相关模型定义和文档更新
- 在prometheus.yml中配置告警规则文件路径 - 修改docker-compose.yml挂载规则目录 - 重构AlertService,移除本地文件存储,直接写入容器 - 添加容器内规则文件写入的容错机制
- 将watch_time字段从AlertRuleMeta移到AlertRule中 - 移除全量同步接口,改为增量更新方式 - 实现批量更新规则元信息的API - 重构服务层代码结构,提高可维护性 - 更新文档
添加告警规则本地文件持久化功能,支持启动时加载和关闭时保存规则 重构关闭逻辑实现优雅关闭,包括保存当前规则状态 更新构建和部署脚本以处理规则文件目录 修改测试脚本以适配新的增量更新接口
添加prometheus_adapter.yml配置文件支持 重构alert_service使用配置而非环境变量 新增alert_webhook_service实现告警轮询推送 更新build.sh和deploy.sh支持配置文件部署 更新README文档说明新的webhook架构
- 新增Alertmanager API v2兼容接口用于接收Prometheus告警 - 重构告警服务架构,替换原有的轮询模式为推送模式 - 添加docker-compose配置支持Prometheus管理API - 移除过时的AlertWebhookService实现
实现删除告警规则模板及其关联元信息的功能,包括: 1. 添加DELETE /v1/alert-rules/:rule_name接口删除规则模板 2. 添加DELETE /v1/alert-rules-meta/:rule_name接口删除特定元信息 3. 更新相关文档说明删除操作的使用方法
- 在deploy.sh和build.sh中添加PID文件管理,优化服务启动和停止流程 - 修复alert_service.go中告警表达式生成的标签处理逻辑 - 使用%g代替%f格式化浮点数以避免科学计数法显示
- 新增部署任务状态常量"error"表示失败状态 - 重构部署任务表结构,使用服务名和版本作为联合主键 - 实现部署适配器处理与deploy模块的交互 - 新增部署任务API支持服务名和版本作为路径参数 - 完善部署任务创建、更新、暂停、继续和回滚逻辑 - 新增配置模块支持数据库配置加载 - 更新SQL schema文件反映部署任务表结构变更 - 优化API分组注释和代码结构
- 添加本地部署包存储路径支持,修改包URL构建逻辑 - 优化部署服务异步执行上下文传递 - 新增部署测试脚本和打包脚本 - 添加数据库schema文件用于部署模块 - 更新配置文件包含数据库连接和私钥信息
检查表达式是否已包含比较操作符,避免重复添加 更新告警Webhook服务地址配置
Summary of ChangesHello @JaD1ng, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! 本次拉取请求引入了一个新的Prometheus Adapter服务,旨在集中化指标查询和告警规则管理,从而连接内部系统与Prometheus。同时,它将部署系统集成到服务管理模块中,增强了其管理服务部署和回滚的能力。这些更改还通过添加HTTP延迟注入和详细的HTTP指标收集,改进了模拟服务的可观测性。 Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
本次 PR 引入了发布系统与服务管理模块的集成,并新增了一个完整的 prometheus_adapter
服务,功能变更范围非常大。代码整体结构清晰,特别是 prometheus_adapter
的分层设计值得肯定。然而,在实现中发现了一些严重问题,需要优先解决:
- 严重安全漏洞:
internal/deploy/config.yaml
文件中硬编码了 RSA 私钥,这是一个严重的安全风险,私钥绝不应该提交到版本控制系统中。 - 严重设计问题:
prometheus_adapter
服务通过执行docker
命令行来修改容器内的文件和重载配置。这种做法破坏了容器化的原则,非常脆弱且不安全,强耦合了应用和外部环境。 - 硬编码路径:多处代码和脚本中包含了开发者的本地绝对路径,导致代码不可移植。
此外,还存在一些配置加载逻辑混乱、数据库 schema 类型不匹配等问题。建议在合并前优先处理上述严重问题,以确保系统的安全性、稳定性和可维护性。
privateKey: | | ||
-----BEGIN RSA PRIVATE KEY----- | ||
MIICXQIBAAKBgQDZsfv1qscqYdy4vY+P4e3cAtmvppXQcRvrF1cB4drkv0haU24Y | ||
7m5qYtT52Kr539RdbKKdLAM6s20lWy7+5C0DgacdwYWd/7PeCELyEipZJL07Vro7 | ||
Ate8Bfjya+wltGK9+XNUIHiumUKULW4KDx21+1NLAUeJ6PeW+DAkmJWF6QIDAQAB | ||
AoGBAJlNxenTQj6OfCl9FMR2jlMJjtMrtQT9InQEE7m3m7bLHeC+MCJOhmNVBjaM | ||
ZpthDORdxIZ6oCuOf6Z2+Dl35lntGFh5J7S34UP2BWzF1IyyQfySCNexGNHKT1G1 | ||
XKQtHmtc2gWWthEg+S6ciIyw2IGrrP2Rke81vYHExPrexf0hAkEA9Izb0MiYsMCB | ||
/jemLJB0Lb3Y/B8xjGjQFFBQT7bmwBVjvZWZVpnMnXi9sWGdgUpxsCuAIROXjZ40 | ||
IRZ2C9EouwJBAOPjPvV8Sgw4vaseOqlJvSq/C/pIFx6RVznDGlc8bRg7SgTPpjHG | ||
4G+M3mVgpCX1a/EU1mB+fhiJ2LAZ/pTtY6sCQGaW9NwIWu3DRIVGCSMm0mYh/3X9 | ||
DAcwLSJoctiODQ1Fq9rreDE5QfpJnaJdJfsIJNtX1F+L3YceeBXtW0Ynz2MCQBI8 | ||
9KP274Is5FkWkUFNKnuKUK4WKOuEXEO+LpR+vIhs7k6WQ8nGDd4/mujoJBr5mkrw | ||
DPwqA3N5TMNDQVGv8gMCQQCaKGJgWYgvo3/milFfImbp+m7/Y3vCptarldXrYQWO | ||
AQjxwc71ZGBFDITYvdgJM1MTqc8xQek1FXn1vfpy2c6O | ||
-----END RSA PRIVATE KEY----- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmd := exec.Command("docker", "exec", containerName, "sh", "-c", | ||
fmt.Sprintf("cat > /etc/prometheus/rules/alert_rules.yml << 'EOF'\n%s\nEOF", string(data))) | ||
|
||
if output, err := cmd.CombinedOutput(); err != nil { | ||
// 如果直接写入容器失败,尝试使用临时文件+docker cp | ||
log.Warn(). | ||
Err(err). | ||
Str("output", string(output)). | ||
Msg("Failed to write directly to container, trying docker cp") | ||
|
||
// 写入临时文件 | ||
tmpFile := "/tmp/prometheus_alert_rules.yml" | ||
if err := os.WriteFile(tmpFile, data, 0644); err != nil { | ||
return fmt.Errorf("failed to write temp rules file: %w", err) | ||
} | ||
|
||
// 使用docker cp复制到容器 | ||
if err := s.syncRuleFileToContainer(tmpFile); err != nil { | ||
return fmt.Errorf("failed to sync to container: %w", err) | ||
} | ||
|
||
// 清理临时文件 | ||
os.Remove(tmpFile) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
严重设计问题:在应用内部通过 exec.Command
调用 docker
CLI 来修改容器内的文件,这是非常危险且脆弱的做法。
这种方式存在以下问题:
- 破坏容器化原则:应用不应该感知到自己运行在容器中,更不应该依赖外部的
docker
命令。 - 安全性差:如果
containerName
等变量可被外部影响,可能导致命令注入漏洞。 - 可移植性差:强依赖于运行环境中存在并配置好了
docker
CLI。 - 脆弱性:任何
docker
命令的语法变化或环境问题都可能导致应用失败。
推荐的替代方案:
- 使用卷挂载(Volume Mounts):将 Prometheus 的规则目录挂载到
prometheus_adapter
容器的一个路径下,adapter
直接读写这个挂载的文件系统即可,Prometheus 会自动感知到变化。
func NewDeployAdapter(instanceManager InstanceManager) *DeployAdapter { | ||
return &DeployAdapter{ | ||
instanceManager: instanceManager, | ||
baseURL: "/Users/dingnanjia/workspace/mock/zeroops/internal/deploy/packages", // 本地包仓库路径 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
严重配置问题:代码中硬编码了一个开发者的本地绝对路径 "/Users/dingnanjia/workspace/mock/zeroops/internal/deploy/packages"
。
这会导致代码在任何其他环境或其他开发者机器上都无法正常工作。这个路径必须通过配置文件或环境变量进行配置,使其在不同部署环境中可以被修改。
baseURL: "/Users/dingnanjia/workspace/mock/zeroops/internal/deploy/packages", // 本地包仓库路径 | |
baseURL: os.Getenv("PACKAGE_REPO_BASE_URL"), // 或者从配置中读取 |
adapter, err := prometheusadapter.NewPrometheusAdapterServer(&config.Config{}) | ||
if err != nil { | ||
log.Fatal().Err(err).Msg("Failed to create Prometheus Adapter server") | ||
} | ||
|
||
// 获取 Prometheus Adapter 内部配置的绑定地址 | ||
bindAddr := ":9999" // 默认端口 | ||
if adapter.GetBindAddr() != "" { | ||
bindAddr = adapter.GetBindAddr() | ||
} | ||
|
||
// 如果有环境变量,优先使用环境变量的端口 | ||
if port := os.Getenv("ADAPTER_PORT"); port != "" { | ||
bindAddr = ":" + port | ||
} | ||
|
||
// 更新配置(虽然已经创建了 adapter,但需要端口信息用于启动服务器) | ||
cfg := &config.Config{ | ||
Server: config.ServerConfig{ | ||
BindAddr: bindAddr, | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id VARCHAR(255) NOT NULL PRIMARY KEY, -- VARCHAR类型主键,非自增,不为空 | ||
service_name VARCHAR(255), | ||
service_version VARCHAR(255), | ||
host_id VARCHAR(255), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err.Error() == fmt.Sprintf("rule '%s' not found", ruleName) { | ||
SendErrorResponse(c, http.StatusNotFound, model.ErrorCodeInvalidParameter, | ||
err.Error(), nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func getDefaultConfig() *PrometheusAdapterConfig { | ||
return &PrometheusAdapterConfig{ | ||
Prometheus: PrometheusConfig{ | ||
Address: "http://10.210.10.33:9090", | ||
ContainerName: "mock-s3-prometheus", | ||
}, | ||
AlertWebhook: AlertWebhookConfig{ | ||
URL: "http://alert-module:8080/v1/integrations/alertmanager/webhook", | ||
PollingInterval: "10s", | ||
}, | ||
AlertRules: AlertRulesConfig{ | ||
LocalFile: "../rules/alert_rules.yml", | ||
PrometheusRulesDir: "/etc/prometheus/rules/", | ||
}, | ||
Server: ServerConfig{ | ||
BindAddr: "0.0.0.0:9999", | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log_info "打包服务..." | ||
|
||
local package_name="storage-${VERSION}.tar.gz" | ||
local project_root="/Users/dingnanjia/workspace/mock/zeroops" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# 配置参数 | ||
SERVICE_NAME="storage" | ||
BASE_URL="http://localhost:8080" | ||
PACKAGE_PATH="/Users/dingnanjia/workspace/mock/zeroops/internal/deploy/packages" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func NewDeployAdapter(instanceManager InstanceManager) *DeployAdapter { | ||
return &DeployAdapter{ | ||
instanceManager: instanceManager, | ||
baseURL: "/Users/dingnanjia/workspace/mock/zeroops/internal/deploy/packages", // 本地包仓库路径 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚨 CRITICAL: Hardcoded absolute path
This hardcoded path will fail in any environment except your local machine. This breaks production deployments, CI/CD, and other developers' environments.
Fix: Use environment variable or configuration:
baseURL := os.Getenv("PACKAGE_REPO_PATH")
if baseURL == "" {
baseURL = "./packages" // Relative default
}
|
||
// 创建元信息 | ||
if len(rule.Labels) > 0 { | ||
labelsJSON, _ := json.Marshal(rule.Labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 HIGH: Ignored JSON error causes silent data corruption
If marshaling fails, labelsJSON
will be nil
and the error is ignored. This can lead to invalid data being stored without any indication.
Fix:
labelsJSON, err := json.Marshal(rule.Labels)
if err != nil {
log.Warn().Err(err).Msg("Failed to marshal labels")
continue
}
service := &AlertService{ | ||
promClient: promClient, | ||
config: config, | ||
currentRules: []model.AlertRule{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currentRules
and currentRuleMetas
are accessed from HTTP handlers (concurrent goroutines) without any mutex protection. This can cause data races, panics, or corrupted state.
Fix: Add mutex:
type AlertService struct {
mu sync.RWMutex
currentRules []model.AlertRule
currentRuleMetas []model.AlertRuleMeta
// ...
}
Then use s.mu.Lock()
/s.mu.RLock()
in all methods that access these fields.
return &AlertmanagerService{ | ||
config: config, | ||
webhookURL: config.AlertWebhook.URL, | ||
httpClient: &http.Client{Timeout: 30 * time.Second}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚡ PERFORMANCE: Missing connection pooling
HTTP client without configured Transport
will not reuse connections, causing:
- Increased latency (TCP handshake overhead)
- Port exhaustion under load
- High memory usage
Fix:
httpClient: &http.Client{
Timeout: 30 * time.Second,
Transport: &http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
IdleConnTimeout: 90 * time.Second,
},
}
|
||
// 解析 Prometheus 发送的告警 | ||
var alerts []model.AlertmanagerAlert | ||
body, err := io.ReadAll(r.Body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔒 SECURITY: Unbounded request body read
Reading entire request body without size limits can cause OOM with large/malicious payloads.
Fix:
body, err := io.ReadAll(io.LimitReader(r.Body, 10*1024*1024)) // 10MB limit
@@ -0,0 +1,122 @@ | |||
package api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚨 SECURITY CRITICAL: No authentication or authorization
All API endpoints are completely unauthenticated. Anyone with network access can:
- Create/update/delete alert rules
- Query metrics
- Trigger deployments
This is a critical security vulnerability for production use.
Recommendations:
- Implement API key or JWT authentication middleware
- Add authorization checks for different operations (RBAC)
- Add rate limiting to prevent abuse
- Consider mTLS for service-to-service communication
This should be addressed before merging to production.
综合评审总结通过代码质量、性能、安全和文档四个维度的专业评审,发现以下关键问题: 🚨 关键问题(必须修复)
|
变更背景和解决方案
关联issue: #
文档更新(架构文档、API文档、升级文档)
Checklist