Introduce StickerService; add duplicate detection and cleanup #48
Reviewer's Guide
This PR refactors all sticker management into a standalone StickerService with perceptual-hash-based deduplication, garbage collection, and thumbnail grid regeneration; updates ActionHandler, MessageBuilder, PromptBuilder, and bootstrap wiring to use the new service; and extends the database layer with perceptual_hash support and similarity lookup.
Sequence diagram for sticker add with perceptual hash deduplication
sequenceDiagram
participant User
participant ActionHandler
participant StickerService
participant StickerStorageService
participant EventStorageService
User->>ActionHandler: Request add sticker (image_hash, impression)
ActionHandler->>StickerService: manage_stickers(add)
StickerService->>EventStorageService: find_event_by_image_hash(image_hash)
EventStorageService-->>StickerService: event_doc
StickerService->>StickerService: calculate_perceptual_hash(image_bytes)
StickerService->>StickerStorageService: find_similar_sticker_by_phash(perceptual_hash)
StickerStorageService-->>StickerService: similar_sticker or None
alt Similar sticker found
StickerService-->>ActionHandler: Return deduplication message
else No similar sticker
StickerService->>StickerStorageService: add_sticker(..., perceptual_hash)
StickerStorageService-->>StickerService: sticker_doc
StickerService-->>ActionHandler: Return success message
end
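The add flow in the diagram above can be sketched in a few lines. This is only an illustration under stated assumptions: `InMemoryStickerStorage` and `add_sticker_with_dedup` are hypothetical stand-ins (the PR's real `StickerStorageService` is database-backed), the `.png` filename scheme is made up, and the event lookup and perceptual-hash calculation steps are skipped by passing the hash in directly.

```python
import asyncio
from dataclasses import dataclass, field


def hamming_distance(hash1: str, hash2: str) -> int:
    """Count the differing bits between two equal-length hex hash strings."""
    return bin(int(hash1, 16) ^ int(hash2, 16)).count("1")


@dataclass
class InMemoryStickerStorage:
    """Hypothetical stand-in for StickerStorageService (no real database)."""

    stickers: list = field(default_factory=list)

    async def find_similar_sticker_by_phash(self, platform, phash_to_check, tolerance=5):
        # Linear scan; return the first sticker within the tolerance, else None.
        for sticker in self.stickers:
            if sticker["platform"] != platform:
                continue
            if hamming_distance(sticker["perceptual_hash"], phash_to_check) <= tolerance:
                return sticker
        return None

    async def add_sticker(self, platform, filename, impression, source_image_hash, perceptual_hash):
        doc = {
            "platform": platform,
            "filename": filename,
            "impression": impression,
            "source_image_hash": source_image_hash,
            "perceptual_hash": perceptual_hash,
        }
        self.stickers.append(doc)
        return doc


async def add_sticker_with_dedup(storage, platform, image_hash, impression, perceptual_hash):
    """Mirror the alt/else branch of the diagram: look up first, then add."""
    similar = await storage.find_similar_sticker_by_phash(platform, perceptual_hash)
    if similar is not None:
        return f"duplicate of {similar['filename']}"
    # Assumed filename scheme, for illustration only.
    doc = await storage.add_sticker(
        platform, f"{image_hash}.png", impression, image_hash, perceptual_hash
    )
    return f"added {doc['filename']}"
```

Checking for a similar sticker before writing anything keeps the "duplicate" branch free of side effects, which is the property the diagram's `alt` block expresses.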
Sequence diagram for sticker garbage collection on startup
sequenceDiagram
participant System
participant StickerService
participant StickerStorageService
System->>StickerService: run_garbage_collection()
StickerService->>StickerStorageService: get_all_stickers(platform)
StickerStorageService-->>StickerService: all_db_stickers
StickerService->>StickerService: Scan sticker files, compare with DB
StickerService->>StickerService: Delete orphan files
StickerService-->>System: Return summary
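The scan-compare-delete steps above amount to a set difference between the files on disk and the filenames recorded in the database. The sketch below assumes a flat sticker directory and a plain set of filenames; `find_orphan_files`, `collect_garbage`, and the `dry_run` flag are hypothetical names, not taken from the PR.

```python
from pathlib import Path


def find_orphan_files(sticker_dir: Path, db_filenames: set) -> list:
    """Return files in sticker_dir whose names are not referenced by the database."""
    return sorted(
        path
        for path in sticker_dir.iterdir()
        if path.is_file() and path.name not in db_filenames
    )


def collect_garbage(sticker_dir: Path, db_filenames: set, dry_run: bool = True) -> dict:
    """Delete orphan files (unless dry_run is set) and return a small summary."""
    orphans = find_orphan_files(sticker_dir, db_filenames)
    if not dry_run:
        for path in orphans:
            path.unlink()
    return {
        "orphans": [path.name for path in orphans],
        "deleted": 0 if dry_run else len(orphans),
    }
```

Defaulting to a dry run is a cautious choice for a routine that runs at startup: a mismatch between the database and the directory layout would otherwise delete files that are still in use.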
Class diagram for refactored sticker management
classDiagram
class ActionHandler {
- sticker_service: StickerService | None
+ set_dependencies(..., sticker_service: StickerService)
+ process_action_flow(...)
}
class StickerService {
+ get_all_stickers(platform_id)
+ get_sticker_file_path(platform_id, sticker_id)
+ manage_stickers(platform_id, params)
+ add_sticker(platform_id, params)
+ remove_sticker(platform_id, params)
+ edit_impression(platform_id, params)
+ regenerate_sticker_grid(platform_id)
+ run_garbage_collection()
}
class StickerStorageService {
+ add_sticker(platform, filename, impression, source_image_hash, perceptual_hash)
+ find_similar_sticker_by_phash(platform, phash_to_check, tolerance)
+ get_sticker_by_id(platform, sticker_id)
+ get_all_stickers(platform)
+ remove_sticker(platform, sticker_id)
+ edit_impression(platform, sticker_id, new_impression)
}
class EventStorageService {
+ find_event_by_image_hash(image_hash)
}
class MessageBuilder {
+ _add_sticker(sticker_id)
}
class PromptBuilder {
+ _build_system_prompt_blocks(...)
}
ActionHandler --> StickerService
StickerService --> StickerStorageService
StickerService --> EventStorageService
MessageBuilder --> ActionHandler
PromptBuilder --> ActionHandler
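To make the `ActionHandler --> StickerService` relationship concrete, here is a rough sketch of the delegation it implies. Everything in it is an assumption for illustration: the `*Sketch` classes are trimmed-down stand-ins, `handle_sticker_action` is a made-up method name, and the `"action"` key used for dispatch is a guess, since the diagram only shows `manage_stickers(platform_id, params)`.

```python
import asyncio
from typing import Optional


class StickerServiceSketch:
    """Hypothetical, trimmed-down stand-in for StickerService."""

    async def add_sticker(self, platform_id: str, params: dict) -> str:
        return f"add on {platform_id}"

    async def remove_sticker(self, platform_id: str, params: dict) -> str:
        return f"remove on {platform_id}"

    async def edit_impression(self, platform_id: str, params: dict) -> str:
        return f"edit on {platform_id}"

    async def manage_stickers(self, platform_id: str, params: dict) -> str:
        # Assumption: the requested operation arrives under an "action" key.
        handlers = {
            "add": self.add_sticker,
            "remove": self.remove_sticker,
            "edit": self.edit_impression,
        }
        handler = handlers.get(params.get("action"))
        if handler is None:
            return f"error: unknown sticker action {params.get('action')!r}"
        return await handler(platform_id, params)


class ActionHandlerSketch:
    """Hypothetical stand-in for ActionHandler; it holds no sticker logic itself."""

    def __init__(self) -> None:
        self.sticker_service: Optional[StickerServiceSketch] = None

    def set_dependencies(self, sticker_service: StickerServiceSketch) -> None:
        self.sticker_service = sticker_service

    async def handle_sticker_action(self, platform_id: str, params: dict) -> str:
        # Guard against use before wiring, since the field starts out as None.
        if self.sticker_service is None:
            return "error: sticker service is not configured"
        return await self.sticker_service.manage_stickers(platform_id, params)
```

Because the diagram types the field as `StickerService | None`, the handler needs some guard like the one above for the case where dependencies were never set.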
Summary of Changes
Hello @Dax233, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly refactors the application's sticker management system. The primary goal is to centralize and enhance sticker-related functionalities by introducing a dedicated StickerService. This new service encapsulates all business logic previously scattered across different modules, improving code organization and maintainability. Key enhancements include the implementation of perceptual hashing to detect and prevent the addition of duplicate or visually similar stickers, and a new garbage collection routine to clean up unreferenced sticker files on disk. The changes involve updating dependencies across various components to utilize the new service, modifying the sticker database schema to support perceptual hashes, and adding new utility functions for image processing.
Highlights
- Sticker Management Refactoring: The core logic for managing stickers (adding, removing, editing impressions, and regenerating grids) has been extracted from `ActionHandler` into a new, dedicated `StickerService`. This improves modularity and separation of concerns.
- Introduction of StickerService: A new `StickerService` has been introduced, centralizing all sticker-related business logic. This service now handles file operations, database interactions, and image processing for stickers.
- Duplicate Sticker Detection with Perceptual Hashing: The system now uses perceptual hashing (pHash) to detect visually similar stickers before adding new ones. This prevents the storage of redundant or near-duplicate images, improving collection efficiency.
- Sticker File Garbage Collection: A garbage collection mechanism has been implemented within the `StickerService` to identify and remove sticker image files from disk that are no longer referenced in the database. This keeps storage clean and efficient.
- Database Schema and Index Update: The `StickerDocument` database model has been updated to include a `perceptual_hash` field, and a new database index has been added to support efficient lookups for similar stickers.
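For readers unfamiliar with perceptual hashing, the toy example below shows why near-identical images end up with nearby hashes. Note that it swaps in a simpler technique: this is an average hash over a small grayscale grid, not the DCT-based pHash the summary refers to, and it is not the PR's `calculate_perceptual_hash`. The function names are hypothetical.

```python
def average_hash(pixels: list) -> str:
    """Hash a small grayscale grid: one bit per pixel, set when above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = "".join("1" if value > mean else "0" for value in flat)
    # Zero-pad so every hash of the same grid size has the same hex length.
    width = (len(flat) + 3) // 4
    return f"{int(bits, 2):0{width}x}"


def hamming_distance(hash1: str, hash2: str) -> int:
    """Count the differing bits between two equal-length hex hash strings."""
    return bin(int(hash1, 16) ^ int(hash2, 16)).count("1")


# An 8x8 grid whose left half is bright and right half is dark.
original = [[255] * 4 + [0] * 4 for _ in range(8)]
# The same picture with slightly different brightness levels.
recompressed = [[250] * 4 + [5] * 4 for _ in range(8)]
# A visually different picture: the halves are swapped.
inverted = [[0] * 4 + [255] * 4 for _ in range(8)]
```

Here `original` and `recompressed` hash to the same value, while `inverted` differs in every bit, so a small Hamming-distance tolerance separates the two cases.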
A sticker's journey, now refined, From handler's grasp, a service designed. With hash and cleanup, duplicates flee, A cleaner collection, for all to see.
Hey @Dax233 - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/action/services/sticker_service.py:93` </location>
<code_context>
+ async def add_sticker(self, platform_id: str, params: dict) -> str:
</code_context>
<issue_to_address>
add_sticker checks for visually similar stickers before adding a new one.
Making the similarity tolerance configurable would improve flexibility and allow for platform-specific adjustments.
Suggested implementation:
```python
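# similarity_tolerance is a Hamming-distance threshold in bits, so it is an int
# matching compare_phashes(tolerance: int = 5), not a 0-1 similarity ratio.
async def add_sticker(self, platform_id: str, params: dict, similarity_tolerance: int = 5) -> str:
    """Handle adding a sticker (with duplicate checking and a configurable similarity tolerance)."""
    image_hash = params.get("image_hash")
    impression = params.get("impression")
    if not image_hash or not impression:
        return "Error: adding a sticker requires both image_hash and impression."
    event_doc = await self.event_storage_service.find_event_by_image_hash(image_hash)
    if not event_doc:
        return f"Error: no source image found for hash '{image_hash}'."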
```
1. Wherever `add_sticker` is called, you should pass the desired `similarity_tolerance` value, either from platform config or as a parameter.
2. In the code that checks for visually similar stickers (not shown in your snippet), replace any hardcoded tolerance value with the `similarity_tolerance` parameter.
3. If you have a config system, consider fetching the tolerance from there based on `platform_id`.
</issue_to_address>
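One way to act on step 3 is a small lookup that resolves the tolerance per platform and falls back to a default. This is a sketch only: the PR does not show its config system, so `PLATFORM_PHASH_TOLERANCE`, `get_phash_tolerance`, and the platform IDs are all hypothetical. The tolerance is treated as an integer Hamming distance, in line with `compare_phashes(tolerance: int = 5)` from the same PR.

```python
from typing import Optional

DEFAULT_PHASH_TOLERANCE = 5  # same default as compare_phashes in this PR

# Hypothetical per-platform overrides; the platform IDs are made up.
PLATFORM_PHASH_TOLERANCE = {
    "example_platform_strict": 2,
    "example_platform_loose": 8,
}


def get_phash_tolerance(platform_id: str, overrides: Optional[dict] = None) -> int:
    """Return the Hamming-distance tolerance for a platform, or the default."""
    table = PLATFORM_PHASH_TOLERANCE if overrides is None else overrides
    tolerance = table.get(platform_id, DEFAULT_PHASH_TOLERANCE)
    # Reject bools explicitly: bool is a subclass of int in Python.
    if isinstance(tolerance, bool) or not isinstance(tolerance, int) or tolerance < 0:
        raise ValueError(f"invalid pHash tolerance for {platform_id!r}: {tolerance!r}")
    return tolerance
```

A caller could then pass `get_phash_tolerance(platform_id)` into the similarity lookup instead of threading an extra argument through every `add_sticker` call site.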
### Comment 2
<location> `src/common/image_utils.py:26` </location>
<code_context>
+ return None
+
+
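+def compare_phashes(hash1: str, hash2: str, tolerance: int = 5) -> bool:
+    """Compare two perceptual-hash strings by Hamming distance.
+
+    Args:
+        hash1 (str): The first hash value.
+        hash2 (str): The second hash value.
+        tolerance (int): Similarity tolerance. Images whose Hamming distance is
+            less than or equal to this value are considered similar.
+            The default of 5 is a commonly used threshold.
+
+    Returns:
+        bool: True if the images are similar.
+    """
+    if len(hash1) != len(hash2):
+        return False
+
+    # Convert the hex strings to integers to compute the Hamming distance
+    h1 = int(hash1, 16)
+    h2 = int(hash2, 16)
+
+    # Compute the Hamming distance
+    distance = bin(h1 ^ h2).count("1")
+
+    return distance <= tolerance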
</code_context>
<issue_to_address>
compare_phashes assumes both hashes are valid hexadecimal strings of equal length.
Add input validation to ensure both hashes are valid hexadecimal strings and of equal length to prevent unexpected errors.
</issue_to_address>
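A possible shape for the validation suggested in Comment 2 is sketched below. It is one reading of the suggestion, not the change that was merged: the helper `is_valid_hex_hash` is hypothetical, and returning `False` for malformed input (rather than raising) is a design assumption chosen so that a corrupt stored hash cannot abort a similarity scan.

```python
import string

_HEX_DIGITS = frozenset(string.hexdigits)


def is_valid_hex_hash(value: object) -> bool:
    """True for a non-empty string made up only of hexadecimal digits."""
    return (
        isinstance(value, str)
        and len(value) > 0
        and all(ch in _HEX_DIGITS for ch in value)
    )


def compare_phashes(hash1: str, hash2: str, tolerance: int = 5) -> bool:
    """Return True when both hashes are valid and within the Hamming tolerance."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    # Treat malformed or mismatched input as "not similar" instead of raising.
    if not (is_valid_hex_hash(hash1) and is_valid_hex_hash(hash2)):
        return False
    if len(hash1) != len(hash2):
        return False
    distance = bin(int(hash1, 16) ^ int(hash2, 16)).count("1")
    return distance <= tolerance
```

The explicit character check is stricter than wrapping `int(value, 16)` in a `try` block, because `int` also accepts a `0x` prefix, a leading sign, surrounding whitespace, and underscores between digits, none of which belong in a stored hash.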
Summary by Sourcery
Introduce a dedicated StickerService to centralize all sticker management logic, including perceptual-hash-based duplicate detection and orphaned-file garbage collection, and refactor existing components (ActionHandler, MessageBuilder, PromptBuilder) to delegate sticker operations to this new service.
New Features:
Enhancements: