Optimize the opera log storage logic through queue #750
Conversation
Is the `while True` loop here reasonable? I'm not sure whether it consumes significant resources: when there is no user activity, this coroutine still runs indefinitely, and its polling interval depends on the timeout configuration.
@downdawn could you take a look at this?
This `while True` is definitely questionable. There are many possible solutions, but they all tend to turn into nesting dolls. The fanciest would be a Redis cache plus a scheduled task. My suggestion: an in-memory buffer plus batch updates.
@downdawn can you point out what is unreasonable about the `while True`? As I see it, this is simply a permanently running coroutine task: it waits until either the specified timeout expires or the in-memory queue accumulates the specified number of items, then batch-inserts the data into the database. It will not saturate the CPU. The default timeout is 1 second, and a longer timeout can be configured; I don't think a 1-second tick adds meaningful CPU overhead.
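The pattern described above (a long-lived coroutine that flushes a batch when either the size threshold or the timeout is reached, and sleeps at zero CPU cost while the queue is empty) can be sketched roughly as follows. All identifiers here are illustrative, not the PR's actual code:

```python
import asyncio


async def opera_log_worker(queue: asyncio.Queue, batch_size: int, timeout: float, flush) -> None:
    """Permanently running consumer (hypothetical sketch, not the PR's code).

    Blocks on the first get() while idle (no CPU spin), then collects up to
    batch_size items or until `timeout` seconds elapse, then batch-flushes.
    """
    while True:
        # Suspend until at least one item arrives; an idle queue costs nothing
        batch = [await queue.get()]
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        while len(batch) < batch_size:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
            except asyncio.TimeoutError:
                break  # overall timeout reached, flush what we have
        await flush(batch)  # e.g. a bulk INSERT; injected here for testability
```

Because the outer `queue.get()` has no timeout, the loop is fully suspended when no requests come in, which is why the `while True` by itself does not burn CPU.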
After a closer look, the `while True` here is actually fine; I'm just personally cautious about `while True`. The drawback of the in-memory buffer + batch update approach is that some logs may be lost. If distributed deployment is a goal, I think a scheme that relies on Redis for buffering and distributed locks would be better.
Also, since we're doing this, it's probably worth benchmarking and comparing the options before making a choice.
@downdawn thanks for the reply, I will publish benchmark data on the performance later.
@IAseven Many thanks, I will review this tonight.
@IAseven I have updated part of the code. If you have any questions, feel free to reply; if there are no other issues, I will merge this PR later.
backend/core/conf.py
Outdated
@@ -177,6 +177,8 @@ class Settings(BaseSettings):
        'new_password',
        'confirm_password',
    ]
    OPERA_LOG_QUEUE_MAX: int = 100
`OPERA_LOG_QUEUE_MAX`: the parameter name is a bit ambiguous. It reads as the maxsize of the whole queue, but it actually means the number of items fetched from the queue per batch.
`OPERA_LOG_QUEUE_TIMEOUT`: isn't one minute a bit too long?
- Agree
- One minute feels fine. When idle, the task won't execute too frequently; under concurrency the 1-minute timeout hardly matters, because `items` fills up very quickly.
backend/common/queue.py
Outdated
while len(results) < max_items:
    item = await queue.get()
    results.append(item)
while len(items) < max_items:
I don't quite see what the advantage of a custom timer is here. `asyncio.wait_for` can await any awaitable, and `queue.get` blocks by default until an item can be popped from the queue.
Docs: https://docs.python.org/zh-cn/3.13/library/asyncio-task.html#asyncio.wait_for
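The reviewer's point above can be demonstrated in a few lines: no hand-rolled timer is needed, because `asyncio.wait_for` imposes a deadline on the blocking `queue.get()` directly. A minimal sketch (names are illustrative):

```python
import asyncio


async def get_with_timeout(queue: asyncio.Queue, timeout: float):
    """Pop one item, or return None if nothing arrives within `timeout` seconds.

    wait_for accepts any awaitable; it cancels the pending queue.get() and
    raises TimeoutError once the deadline passes.
    """
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    except asyncio.TimeoutError:
        return None


async def demo():
    q: asyncio.Queue = asyncio.Queue()
    first = await get_with_timeout(q, timeout=0.05)   # empty queue: times out
    q.put_nowait('log-entry')
    second = await get_with_timeout(q, timeout=0.05)  # item available: returns at once
    return first, second
```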
After thinking it over carefully, the collector implementation does match the goal better: one overall timeout, instead of spawning extra coroutines.
Problem
The original log middleware called the database insert interface on every incoming request; under high concurrency this can become a database write bottleneck.
Solution
Use a local in-memory message queue to implement timeout-based batch inserts.
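On the producer side, the idea is that the middleware only enqueues the log record and returns, so the per-request path never touches the database. A hypothetical sketch (the queue name, field names, and helper are illustrative, not the PR's actual code):

```python
import asyncio

# Module-level in-memory queue shared between the middleware (producer) and
# the background batch-insert worker (consumer). A maxsize bounds memory use.
opera_log_queue: asyncio.Queue = asyncio.Queue(maxsize=10_000)


def record_opera_log(entry: dict) -> None:
    """Called from the request middleware: an O(1) enqueue, no DB round-trip."""
    try:
        opera_log_queue.put_nowait(entry)
    except asyncio.QueueFull:
        # Deliberate trade-off: drop the log rather than block the request path
        pass
```

This is also where the log-loss caveat raised earlier in the thread comes from: records still sitting in the queue are lost if the process dies before the worker flushes them.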
Benchmark performance
Benchmark script
Before the change
After the change