First-person experience detection does not support Chinese (中文) #834
Description
Use Case
I'm running Hindsight as the memory backend for a Chinese-language AI agent system (OpenClaw). All conversations between the user and the AI assistant are conducted in Chinese. The agent performs tasks such as Reddit monitoring, SEO keyword research, and Feishu integration, so the extracted facts are predominantly in Chinese.
Problem Statement
The experience fact type was introduced in v0.4.22 to classify first-person agent actions/experiences separately from world facts. However, after upgrading to v0.4.22, my main bank has 465 memories and 0 experience facts: every single one is classified as either world or observation.
The root cause is in fact_extraction.py:
The _FIRST_PERSON_PATTERN regex only matches English patterns such as I fixed, I created, and User configured. Chinese first-person expressions such as "我修复了" ("I fixed"), "助手完成了" ("the assistant completed"), and "我发现了" ("I discovered") never match.
The LLM fact extraction prompt instructs the model to use "assistant" for first-person facts (which maps to experience). But when the conversation and extracted facts are in Chinese, the LLM tends to describe things in the third person (e.g., "助手成功写入5条记录", "the assistant successfully wrote 5 records", instead of "I wrote 5 records"), so the LLM itself rarely outputs fact_type: "assistant".
As a result, the entire experience classification pipeline is effectively non-functional for Chinese-language memory banks.
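The mismatch can be reproduced in isolation. The snippet below is a minimal sketch that uses an abbreviated, hypothetical stand-in for the English-only pattern (the name ENGLISH_ONLY and the shortened verb list are assumptions, not the actual contents of fact_extraction.py):

```python
import re

# Abbreviated, hypothetical stand-in for the English-only pattern:
# "I" or "User" at the start of a sentence, followed by a past-tense verb.
ENGLISH_ONLY = re.compile(
    r'(?:^|[.!?]\s+)(?:I|[Uu]ser)\s+'
    r'(?:fixed|created|configured|discovered|wrote)',
    re.IGNORECASE | re.MULTILINE,
)

samples = [
    "I fixed the token refresh bug.",
    "User configured the weekly schedule.",
    "我修复了 token 刷新的问题",
    "助手成功写入5条记录",
]

# True where the pattern finds a first-person marker, False otherwise.
matched = [bool(ENGLISH_ONLY.search(s)) for s in samples]

for text, hit in zip(samples, matched):
    print(f"{hit!s:5}  {text}")
```

Both English samples match and both Chinese samples do not, which is consistent with a bank that ends up with no experience facts.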
How This Feature Would Help
With proper Chinese support, the experience fact type would correctly capture agent actions and learnings, such as "我通过 app_id + app_secret 获取 token 写入了飞书表格" ("I obtained a token via app_id + app_secret and wrote to the Feishu sheet") or "助手从 Reddit 提取了75个关键词" ("the assistant extracted 75 keywords from Reddit"). This would enable better recall filtering (querying only experiences vs. world facts) and richer Mental Model generation.
Proposed Solution
1. Extend _FIRST_PERSON_PATTERN to include Chinese first-person markers:
import re as _re  # alias assumed from the _re usage below

_FIRST_PERSON_PATTERN = _re.compile(
    # English patterns (existing): "I"/"User" at sentence start + past-tense verb
    r'(?:(?:^|[.!?]\s+)'
    r'(?:I|[Uu]ser)\s+'
    r'(?:fixed|debugged|traced|patched|refactored|implemented|deployed|'
    r'discovered|found|resolved|changed|updated|modified|created|built|'
    r'configured|installed|migrated|optimized|investigated|analyzed|'
    r'tested|verified|confirmed|learned|decided|chose|designed|wrote|'
    r'added|removed|deleted|merged|committed|pushed|pulled|reviewed|'
    r'diagnosed|troubleshot|upgraded|downgraded|reverted|set\s+up|'
    r'cleaned\s+up|wrapped\s+up|finished|completed|started|began))'
    # Chinese patterns (new): subject immediately followed by a verb
    r'|(?:我|助手|助理|本人|agent)'
    r'(?:修复了|调试了|实现了|部署了|发现了|找到了|解决了|'
    r'更改了|更新了|修改了|创建了|构建了|配置了|安装了|'
    r'迁移了|优化了|调查了|分析了|测试了|验证了|确认了|'
    r'学到了|决定了|选择了|设计了|编写了|添加了|删除了|'
    r'完成了|开始了|搭建了|整理了|清理了|设置了|写入了|'
    r'提取了|采集了|抓取了|执行了|运行了|搞定了|处理了|'
    r'成功|尝试|记录了|读取了|导入了|导出了)',
    _re.IGNORECASE | _re.MULTILINE,
)
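A quick sanity check of the Chinese branch against the example sentences in this issue may be useful before merging anything. In the sketch below, STRICT is the Chinese branch as proposed (subject immediately followed by a verb). LOOSE is a hypothetical variant that is not part of the proposal: it tolerates up to 40 characters between subject and verb, and that window size is an arbitrary assumption.

```python
import re

# Chinese branch as proposed: subject immediately followed by a verb.
SUBJECTS = r'(?:我|助手|助理|本人|agent)'
VERBS = (
    r'(?:修复了|调试了|实现了|部署了|发现了|找到了|解决了|'
    r'更改了|更新了|修改了|创建了|构建了|配置了|安装了|'
    r'迁移了|优化了|调查了|分析了|测试了|验证了|确认了|'
    r'学到了|决定了|选择了|设计了|编写了|添加了|删除了|'
    r'完成了|开始了|搭建了|整理了|清理了|设置了|写入了|'
    r'提取了|采集了|抓取了|执行了|运行了|搞定了|处理了|'
    r'成功|尝试|记录了|读取了|导入了|导出了)'
)
STRICT = re.compile(SUBJECTS + VERBS, re.IGNORECASE)

# Hypothetical looser variant (an assumption, not in the proposal): allow up
# to 40 characters that are not sentence-ending punctuation or newlines
# between the subject and the verb.
GAP = r'[^。!?!?\n]{0,40}?'
LOOSE = re.compile(SUBJECTS + GAP + VERBS, re.IGNORECASE)

examples = [
    "我修复了",
    "助手成功写入5条记录",
    "助手从Reddit提取了75个关键词",
    "我通过 app_id + app_secret 获取 token 写入了飞书表格",
    "我发现 openclaw-lark 和 feishu 插件同时启用会冲突",
    "王先生要求每周一执行Reddit监控",
]

strict_hits = [bool(STRICT.search(s)) for s in examples]
loose_hits = [bool(LOOSE.search(s)) for s in examples]

for text, s_hit, l_hit in zip(examples, strict_hits, loose_hits):
    print(f"strict={s_hit!s:5} loose={l_hit!s:5} {text}")
```

Under these assumptions, the strict form matches only the first two sentences. Phrases like "从Reddit" or "通过 … 获取 token" sit between the subject and the verb, so those sentences match only with the looser variant, and "我发现" without "了" matches neither. A wider window would also raise the false-positive rate, so the right trade-off probably needs testing on real memory banks.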
2. Improve the LLM prompt for Chinese contexts by adding Chinese examples to the fact extraction prompt that show when to use fact_type: "assistant":
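示例 (Examples):
- "助手从Reddit提取了75个关键词并写入飞书表格" → fact_type: "assistant" (the assistant extracted 75 keywords from Reddit and wrote them to a Feishu sheet)
- "王先生要求每周一执行Reddit监控" → fact_type: "world" (Mr. Wang asked for Reddit monitoring to run every Monday)
- "我发现 openclaw-lark 和 feishu 插件同时启用会冲突" → fact_type: "assistant" (I found that enabling the openclaw-lark and feishu plugins together causes a conflict)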
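One way to keep such examples maintainable is to store them as data and render them into the prompt. This is only a sketch: CHINESE_FACT_TYPE_EXAMPLES and render_examples are hypothetical names, and how Hindsight actually assembles its extraction prompt is not shown in this issue.

```python
# Hypothetical few-shot examples, stored as (text, fact_type) pairs.
CHINESE_FACT_TYPE_EXAMPLES = [
    ("助手从Reddit提取了75个关键词并写入飞书表格", "assistant"),
    ("王先生要求每周一执行Reddit监控", "world"),
    ("我发现 openclaw-lark 和 feishu 插件同时启用会冲突", "assistant"),
]


def render_examples(examples):
    """Render (text, fact_type) pairs as a block to append to a prompt."""
    lines = ["示例:"]
    for text, fact_type in examples:
        lines.append(f'- "{text}" → fact_type: "{fact_type}"')
    return "\n".join(lines)


prompt_snippet = render_examples(CHINESE_FACT_TYPE_EXAMPLES)
print(prompt_snippet)
```

Keeping the examples in a list would also make it straightforward to add other languages later without touching the prompt template itself.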
Alternatives Considered
Manually post-processing facts to reclassify Chinese first-person descriptions as experience via the API. This works but is fragile and doesn't scale.
Forcing the agent to converse in English so facts are extracted in English. Not practical for Chinese-speaking users.
Priority
Nice to have
Additional Context
No response
Checklist
- I would be willing to contribute this feature