Add dynamic expression display with clean chat interface #17
Co-authored-by: hiyouga <16256802+hiyouga@users.noreply.github.com>
43f01ac to 92b1e43
Pull request overview
This PR adds a dynamic expression display feature that parses expression and action tags from AI responses and displays corresponding character images, creating a more immersive chat experience. The implementation cleanly removes metadata tags from chat messages while using them to update the character's visual representation in real-time.
Key changes:
- Added expression image viewer using `gr.Image` component with regex-based parsing of `[Expression: ...]` and `[Action: ...]` tags
- Implemented tag removal from chat messages for clean user-facing content display
- Modified UI layout to 3:1 column ratio with expression display in right column
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| mini_ema/ui/chat_ui.py | Adds expression/action parsing methods, updates UI layout to include expression image viewer, and modifies streaming logic to handle both chat content and expression images |
| mini_ema/bot/simple_bot.py | Updates example responses to include expression and action tags demonstrating the new feature |
| README.md | Documents the new expression display feature, image generation script usage, and supported expressions/actions |

    # Clean up extra whitespace and newlines
    cleaned_content = re.sub(r"\n\s*\n\s*\n", "\n\n", cleaned_content)

The whitespace cleanup pattern only handles 3+ consecutive newlines, but after removing expression/action tags, there may be exactly 2 newlines left (e.g., when tags are at the start like `"[Expression: smile] [Action: wave]\n\n你好"`). This can leave unwanted leading newlines in the cleaned content. Consider using a more comprehensive cleanup that handles leading/trailing whitespace and consolidates multiple newlines to at most 2, for example:
- First remove the tags
- Then consolidate any sequence of 2+ newlines (with possible whitespace between them) to exactly 2 newlines
- Finally strip leading and trailing whitespace
A pattern like `r"\n\s*\n+"` -> `"\n\n"` would better handle these cases.
Suggested change:

    - # Clean up extra whitespace and newlines
    - cleaned_content = re.sub(r"\n\s*\n\s*\n", "\n\n", cleaned_content)
    + # Clean up extra whitespace and newlines:
    + # - Collapse any sequence of 2+ newlines (with optional whitespace) to exactly 2 newlines
    + # - Then strip leading and trailing whitespace
    + cleaned_content = re.sub(r"\n\s*\n+", "\n\n", cleaned_content)

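The three cleanup steps listed in the comment above can be sketched as one small function. This is a minimal illustration, not the PR's code: `clean_content` and `TAG_PATTERN` are hypothetical names, and the tag pattern is assumed from the `[Expression: ...]` / `[Action: ...]` format discussed in this review.

```python
import re

# Hypothetical combined tag pattern, assumed from the tag format in this PR.
TAG_PATTERN = re.compile(r"\[(?:Expression|Action):\s*\w+\]", re.IGNORECASE)


def clean_content(content: str) -> str:
    """Remove expression/action tags and normalize blank lines."""
    # 1. Remove the tags.
    cleaned = TAG_PATTERN.sub("", content)
    # 2. Collapse any run of 2+ newlines (optionally separated by
    #    whitespace) into exactly two newlines.
    cleaned = re.sub(r"\n\s*\n+", "\n\n", cleaned)
    # 3. Strip leading and trailing whitespace, which also drops the
    #    newlines left behind when the tags opened the message.
    return cleaned.strip()
```

Note that step 3 is what removes the leading blank lines in the example from the comment; the regex alone only normalizes runs of newlines.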

    Generated images are named in the format: `{expression}_{action}.jpg` (e.g., `smile_wave.jpg`, `sad_none.jpg`)

    **Note:** Image generation uses the Gemini API and may incur costs. Placeholder images are included by default.

The statement "Placeholder images are included by default" is potentially misleading. The implementation falls back to the default Ema avatar (`EMA_AVATAR`) when specific expression images are not found, but no placeholder images for each expression/action combination are included in the repository. Consider revising to: "Falls back to the default Ema avatar when expression images are not generated."
Suggested change:

    - **Note:** Image generation uses the Gemini API and may incur costs. Placeholder images are included by default.
    + **Note:** Image generation uses the Gemini API and may incur costs. Falls back to the default Ema avatar when expression images are not generated.

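The fallback behaviour described in this comment might look roughly like the sketch below. It assumes the `{expression}_{action}.jpg` naming from the README; `resolve_expression_image` is a hypothetical helper, and the path assigned to `EMA_AVATAR` is a placeholder, since the review only names the constant.

```python
from pathlib import Path

# Placeholder value: the review names EMA_AVATAR but not its actual path.
EMA_AVATAR = Path("assets/ema_avatar.jpg")


def resolve_expression_image(
    expression: str,
    action: str,
    img_dir: Path,
    default: Path = EMA_AVATAR,
) -> Path:
    """Return the image for an expression/action pair, or the default avatar."""
    # File names follow the README convention, e.g. smile_wave.jpg.
    candidate = img_dir / f"{expression}_{action}.jpg"
    # Fall back to the default avatar when no image was generated.
    return candidate if candidate.is_file() else default
```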

    with gr.Row():
        with gr.Column(scale=3):
            # Bot selector
            bot_selector = gr.Dropdown(
                choices=list(self.bots.keys()),
                value=list(self.bots.keys())[0],
                label="🤖 Select Bot",
                interactive=True,
            )

            # User name input
            username_input = gr.Textbox(
                value="Phoenix",
                label="👤 Username",
                placeholder="Enter username...",
                interactive=True,
            )

        with gr.Column(scale=1):

The PR description mentions "4:1 ratio layout" but the actual implementation uses `scale=3` and `scale=1`, which creates a 3:1 ratio, not 4:1. Consider updating the PR description to match the implementation or adjusting the code if 4:1 was intended.

    expression_match = re.search(r"\[Expression:\s*(\w+)\]", content, re.IGNORECASE)
    if expression_match:
        expression = expression_match.group(1).lower()

    # Try to match action pattern: [Action: <action>]
    action_match = re.search(r"\[Action:\s*(\w+)\]", content, re.IGNORECASE)
    if action_match:
        action = action_match.group(1).lower()

The regex patterns are recompiled on every call to this method. For better performance, consider compiling the regex patterns once as class-level constants and reusing them. For example, define `EXPRESSION_PATTERN = re.compile(r"\[Expression:\s*(\w+)\]", re.IGNORECASE)` and `ACTION_PATTERN = re.compile(r"\[Action:\s*(\w+)\]", re.IGNORECASE)` at the class level, then use `EXPRESSION_PATTERN.search(content)` instead of `re.search(...)`.
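A minimal sketch of the class-level constants suggested here. The class name `ExpressionParser`, the `parse` method, and the default values `"neutral"` and `"none"` are assumptions for illustration; only the two pattern names and their regexes come from the comment. One caveat: Python's `re` module keeps an internal cache of recently compiled patterns, so module-level `re.search` calls usually perform a cache lookup rather than a full recompile. Precompiling mainly removes that lookup and gives the patterns names.

```python
import re


class ExpressionParser:
    """Hypothetical holder for the precompiled patterns."""

    # Compiled once when the class body is executed, then reused.
    EXPRESSION_PATTERN = re.compile(r"\[Expression:\s*(\w+)\]", re.IGNORECASE)
    ACTION_PATTERN = re.compile(r"\[Action:\s*(\w+)\]", re.IGNORECASE)

    @classmethod
    def parse(cls, content, default_expression="neutral", default_action="none"):
        """Return (expression, action), lowercased, with assumed defaults."""
        expression, action = default_expression, default_action

        expression_match = cls.EXPRESSION_PATTERN.search(content)
        if expression_match:
            expression = expression_match.group(1).lower()

        action_match = cls.ACTION_PATTERN.search(content)
        if action_match:
            action = action_match.group(1).lower()

        return expression, action
```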
Implements a live expression viewer that parses `[Expression: ...]` and `[Action: ...]` tags from AI responses to display corresponding character images from `assets/gen_imgs/`, simulating natural conversational expressions. Tags are automatically removed from chat messages for a clean user experience.
Changes
- `gr.Image` component displaying character expressions, positioned in right column with 4:1 ratio layout
- `EXPRESSION_IMGS_DIR` environment variable for custom image directory
Implementation
Expression images follow naming convention `{expression}_{action}.jpg`. Tags are parsed and removed in a single pass:
Simplified streaming with cleaned content:
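As an illustration of the two steps named above, the sketch below parses and removes tags with a single `re.sub` call and then yields cleaned text together with an image path. It is not the PR's actual code: `parse_and_clean`, `stream_cleaned`, the combined `TAG_PATTERN`, and the default values `"neutral"` and `"none"` are all assumptions, and the Gradio wiring is omitted.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical combined pattern: group 1 is the tag kind, group 2 its value.
TAG_PATTERN = re.compile(r"\[(Expression|Action):\s*(\w+)\]", re.IGNORECASE)


def parse_and_clean(
    content: str,
    default_expression: str = "neutral",
    default_action: str = "none",
) -> Tuple[str, str, str]:
    """Extract the tags and strip them from the text in one regex pass."""
    found = {}

    def _record(match):
        # Keep the first value seen for each tag kind, then drop the tag text.
        found.setdefault(match.group(1).lower(), match.group(2).lower())
        return ""

    cleaned = TAG_PATTERN.sub(_record, content).strip()
    expression = found.get("expression", default_expression)
    action = found.get("action", default_action)
    return cleaned, expression, action


def stream_cleaned(
    messages: Iterable[str], img_dir: str = "assets/gen_imgs"
) -> Iterator[Tuple[str, str]]:
    """Yield (cleaned message, image path) pairs for each raw message."""
    for message in messages:
        cleaned, expression, action = parse_and_clean(message)
        yield cleaned, f"{img_dir}/{expression}_{action}.jpg"
```

In a UI, each yielded pair would update the chat content and the image component respectively; a fallback for missing image files would be applied on top of the returned path.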
Screenshots
Initial state with default expression:

Clean chat display - expression tags removed from messages:

The chat shows only the actual message content ("你好,我是Ema。" and "请问有什么可以帮助你的吗?") while expression metadata controls the character image display on the right.