
feat: add multilingual prompt template support (closes #85) #220

Open
Jah-yee wants to merge 2 commits into HKUDS:main from Jah-yee:feat-multilingual-prompts

Jah-yee commented Mar 2, 2026

Summary

This PR adds first-class support for multilingual prompt templates, including a complete Chinese (zh) set and a small runtime manager for switching prompt languages with lazy loading and safe fallback to English.

Motivation

Right now, all prompt templates are hard-coded in English, which can reduce RAG quality and user experience for non-English users (#85). Different languages often need different phrasing for instructions and system messages to work well with LLMs. Multilingual prompts and a simple runtime selector help RAG-Anything better serve a global user base.

Changes

  • raganything/prompts_zh.py
    • Full set of Chinese prompt templates (27 keys) mirroring the existing English keys, using natural, idiomatic Chinese phrasing for LLMs.
  • raganything/prompt_manager.py
    • set_prompt_language(lang) to switch the active prompt language at runtime (e.g. "zh", "en").
    • reset_prompts() to restore the default language and state.
    • register_prompt_language(lang, prompts) to register additional languages (e.g. "ja") without changing core code.
    • Lazy loading of language modules so unused languages do not add overhead.
    • Fallback to English when a key is missing in the selected language.
  • tests/test_prompt_language.py
    • Tests for language switching, lazy loading behavior, registering a new language, fallback semantics when keys are missing, and reset_prompts().

Testing

  • Ran pytest locally including tests/test_prompt_language.py; all tests passed.
  • Manually tested switching between "en" and "zh" to confirm correct localized templates and English fallback where needed.

Thanks for your work on RAG-Anything. If you’d like different language codes, file naming, or prompt phrasing, I’m happy to revise this PR.

LarFii (Collaborator) commented Mar 4, 2026

I did a Codex-assisted review of this PR and found a few issues worth addressing before merge:

  1. Config-based activation is documented but not implemented (P1)
    In prompts_zh.py, the docstring says users can enable Chinese prompts via RAGAnythingConfig.prompt_language = "zh".
    However, this PR does not add prompt_language to RAGAnythingConfig, and I don’t see initialization wiring in RAGAnything that calls set_prompt_language(...).
    So currently this works only via manual runtime API calls, which is a doc/behavior mismatch.

  2. Language code normalization is inconsistent (P2)
    set_prompt_language() normalizes input with strip().lower(), but register_prompt_language() stores language codes as-is.
    Example: registering "FR" then switching to "fr" fails.
    Suggestion: normalize codes in register_prompt_language() too, and add a test for mixed-case registration.

  3. Global prompt mutation is not concurrency-safe (P2)
    set_prompt_language() mutates the global PROMPTS dict key-by-key.
    Under concurrent requests, readers can observe partially switched state (mixed-language prompts).
    Suggestion: make switching atomic (at least lock-protected), or better scope prompt sets per instance/request instead of global mutable state.

Jah-yee (Author) commented Mar 4, 2026

Thanks a lot for the Codex-assisted review on the multilingual prompts PR.

  • Config-based activation: RAGAnythingConfig now has a prompt_language field (backed by the PROMPT_LANGUAGE environment variable), and RAGAnything.__post_init__ reads it and calls set_prompt_language(...) at initialization time, so the documented RAGAnythingConfig.prompt_language = "zh" flow is now wired into the main pipeline.
  • Language code normalization: prompt_manager now uses a shared _normalize_language_code() helper in both set_prompt_language() and register_prompt_language(). Codes are normalized with strip().lower() at registration time as well, so registering "FR" and then switching with set_prompt_language("fr") works as expected. tests/test_prompt_language.py now includes a mixed-case registration test.
  • Concurrency safety: I introduced a threading.RLock around PROMPTS updates. set_prompt_language() now builds a resolved prompts dict (target language plus English fallback), then swaps PROMPTS and updates _current_language inside the lock; reset_prompts() does the same to restore the English baseline. This avoids partially switched mixed-language states under concurrent access.

If you’d like prompt_language to default to None (opt-in only) or prefer a different env var name, I’m happy to tweak it.
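Here is a sketch of the opt-in variant mentioned above, where `prompt_language` defaults to `None` unless the `PROMPT_LANGUAGE` environment variable is set. `ConfigSketch` and `RAGSketch` are hypothetical stand-ins for `RAGAnythingConfig` and `RAGAnything`. `set_prompt_language` is stubbed here to record its calls, and the PR's real field definition may differ.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


def _env_prompt_language() -> Optional[str]:
    # Hypothetical: an unset or empty PROMPT_LANGUAGE means "keep the English default".
    value = os.getenv("PROMPT_LANGUAGE", "").strip()
    return value or None


@dataclass
class ConfigSketch:
    """Stand-in for RAGAnythingConfig, showing only the field discussed here."""
    prompt_language: Optional[str] = field(default_factory=_env_prompt_language)


applied: List[str] = []


def set_prompt_language(lang: str) -> None:
    # Stub for the prompt manager call; records the normalized code.
    applied.append(lang.strip().lower())


@dataclass
class RAGSketch:
    """Stand-in for RAGAnything's initialization wiring."""
    config: ConfigSketch = field(default_factory=ConfigSketch)

    def __post_init__(self) -> None:
        # Opt-in: only switch languages when one was configured.
        if self.config.prompt_language:
            set_prompt_language(self.config.prompt_language)
```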
