Review app and improve performance #1

Open
Bafff wants to merge 27 commits into main from claude/app-performance-review-011CUtrkG98PYbtkcieHdt2k

@Bafff (Owner) commented Nov 11, 2025

@codex please check

claude added 27 commits November 7, 2025 16:55

Performance Optimizations:
- Add automatic retry with exponential backoff for API throttling (429) and server errors (5xx)
- Increase pagination size from 50 to 100 messages per request
- Implement parallel export processing for multiple chats (up to 3 concurrent)
- Add progress tracking for multi-chat exports

User Experience Improvements:
- Add interactive chat selection menu with chat names, types, and last activity
- Implement Jira-friendly markdown format as default with HTML cleanup
- Handle multiple search matches with interactive selection instead of errors
- Default to "jira" format for easy copy-paste into Jira tickets
- Show real-time progress indicators during export

New Features:
- formatters.py: Jira markdown formatter with HTML-to-text conversion
- interactive.py: Interactive chat selection utilities
- Support for "jira", "jira-markdown", and "markdown" format aliases

Technical Changes:
- GraphClient: Add _request_with_retry() for resilient API calls
- CLI: Use ThreadPoolExecutor for parallel exports
- Exporter: Support new Jira format with chat metadata
- README: Updated with comprehensive usage examples and feature list
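
A minimal sketch of the retry behaviour described above. It is not the PR's `_request_with_retry()`. It assumes a hypothetical `send` callable that returns an object with `status_code` and `headers`, which a `requests.Response` has. The retry count, base delay, jitter, and the handling of a numeric `Retry-After` header are assumptions.

```python
import random
import time
from typing import Any, Callable


def request_with_retry(
    send: Callable[[], Any],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Hypothetical stand-in for _request_with_retry(): retry 429/5xx with backoff."""
    attempt = 0
    while True:
        response = send()
        status = response.status_code
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt >= max_retries:
            return response  # success, client error, or retries exhausted
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # the server says how long to back off
        else:
            # Exponential backoff: base, 2*base, 4*base, ... capped at max_delay,
            # plus up to 10% random jitter so parallel workers don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, delay * 0.1)
        sleep(delay)
        attempt += 1
```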

Performance & UX Improvements:
- Add real-time progress indicator during chat loading
- Show "Loading chats... N loaded" with live count updates
- Display authentication success message
- Load ALL chats instead of limiting to 100 for complete chat list

GraphClient enhancements:
- Add progress_callback parameter to list_chats() and _paginate()
- Add max_items parameter for optional limiting
- Increase chat pagination from default to 50 per request

CLI improvements:
- Show "✓ Authenticated successfully" after login
- Display live progress: "Loading chats... 1448 loaded"
- Show final count: "✓ Loaded 1448 chats"
- Remove artificial 100-chat limit that caused missing chats

Interactive menu:
- Update to reflect full chat loading (no "showing limited" message)
- All chats now visible and sortable by lastUpdatedDateTime

Fixes issue where active chats weren't visible in the menu because
they weren't in the first 100 chats returned by Graph API.
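
A sketch of a `_paginate()` that accepts `progress_callback` and `max_items`. Graph collection responses carry their items in `value` and the next page's full URL in `@odata.nextLink`. The `get_json(url)` callable is hypothetical and stands in for an authenticated GET.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def paginate(
    get_json: Callable[[str], Dict[str, Any]],
    url: str,
    progress_callback: Optional[Callable[[int], None]] = None,
    max_items: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield items from every page of a Graph collection (hypothetical helper)."""
    loaded = 0
    next_url: Optional[str] = url
    while next_url:
        page = get_json(next_url)
        for item in page.get("value", []):
            if max_items is not None and loaded >= max_items:
                return
            yield item
            loaded += 1
        if progress_callback is not None:
            progress_callback(loaded)  # e.g. redraw "Loading chats... N loaded"
        if max_items is not None and loaded >= max_items:
            return
        # Absent on the last page, which ends the loop.
        next_url = page.get("@odata.nextLink")
```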

Performance improvements:
- Add 5-minute local cache for chat lists (~/.teams-exporter/cache/)
- First run loads from API, subsequent runs are instant
- Add --refresh-cache flag to force cache refresh

UX improvements:
- Add interactive search in chat selection menu
- Press 's' to search by chat name or participant across ALL chats
- Solves issue where recent chats don't appear in top-20 due to
  Microsoft Teams not updating lastUpdatedDateTime consistently

New module:
- cache.py: Simple file-based cache with TTL support

CLI changes:
- Shows "Loaded from cache" message when using cached data
- Shows "Loading from Microsoft Graph" when fetching fresh data
- Search feature works on all 1000+ chats, not just top-20

Example workflow:
1. Run teams-export (loads 1448 chats in ~30s, caches them)
2. Press 's' for search
3. Type "Games loading"
4. Select from matching results
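
A minimal sketch of a file-based cache with a TTL, in the spirit of cache.py. The class name, the JSON-per-key layout, and the `saved_at` field are assumptions. The default TTL uses the 24 hours adopted in a later commit; the directory would be something like `~/.teams-exporter/cache/`.

```python
import json
import time
from pathlib import Path
from typing import Any, Optional


class FileCache:
    """Hypothetical JSON-file cache whose entries expire after ttl_seconds."""

    def __init__(self, directory: Path, ttl_seconds: float = 24 * 3600):
        self.directory = Path(directory)
        self.ttl_seconds = ttl_seconds
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.directory / f"{key}.json"

    def get(self, key: str, now: Optional[float] = None) -> Optional[Any]:
        path = self._path(key)
        if not path.exists():
            return None
        entry = json.loads(path.read_text(encoding="utf-8"))
        now = time.time() if now is None else now
        if now - entry["saved_at"] > self.ttl_seconds:
            return None  # expired: the caller reloads from Microsoft Graph
        return entry["data"]

    def set(self, key: str, data: Any, now: Optional[float] = None) -> None:
        entry = {"saved_at": time.time() if now is None else now, "data": data}
        self._path(key).write_text(json.dumps(entry), encoding="utf-8")

    def clear(self, key: str) -> None:
        # Roughly what --refresh-cache would need before reloading.
        self._path(key).unlink(missing_ok=True)
```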

Critical fix: Use lastMessagePreview for accurate chat sorting
- Change from lastUpdatedDateTime to lastMessagePreview.createdDateTime
- This matches how Teams desktop client sorts chats
- Fixes issue where active chats don't appear in top-20

Technical changes:
- GraphClient: Add lastMessagePreview to $expand parameter
- interactive.py: Use lastMessagePreview.createdDateTime for sorting
- Fallback to lastUpdatedDateTime if preview unavailable

Why this matters:
Microsoft Teams doesn't always update lastUpdatedDateTime when
new messages arrive. The lastMessagePreview field contains the
actual timestamp of the last message, which is what the desktop
client uses for "most recent" sorting.

IMPORTANT: Users should run with --refresh-cache after updating
to ensure the new lastMessagePreview field is loaded from API.
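
The sort key with the fallback might look like the sketch below. It assumes each chat is the raw Graph dict returned with `$expand=lastMessagePreview`, and that timestamps are UTC strings ending in `Z`. Fractional-second precision can vary, and `datetime.fromisoformat()` before Python 3.11 rejects both that and the `Z`, so the fraction is split off by hand. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

_OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def parse_graph_time(value: str) -> datetime:
    """Parse a UTC timestamp such as "2025-11-07T16:55:00.51Z"."""
    base, _, fraction = value.rstrip("Z").partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    if fraction:
        parsed = parsed.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return parsed


def chat_activity(chat: Dict[str, Any]) -> datetime:
    # Prefer the last message's timestamp, fall back to lastUpdatedDateTime,
    # and put chats with neither at the bottom.
    preview = chat.get("lastMessagePreview") or {}
    stamp = preview.get("createdDateTime") or chat.get("lastUpdatedDateTime")
    return parse_graph_time(stamp) if stamp else _OLDEST


def sort_chats(chats: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    return sorted(chats, key=chat_activity, reverse=True)  # most recent first
```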

- Fix critical bug: Revert $top from 100 to 50 for chat messages endpoint
  (Graph API returns 400 error with $top=100, max is 50)
- Document Graph API sorting limitation in README with official source
  (explains why all chats must be loaded for correct sorting)
- Increase cache TTL from 5 minutes to 24 hours for better UX
- Add 'c' key in interactive menu to refresh cache without restart
- Update CLI to handle cache refresh action and reload chat list
- Update all documentation to reflect 24h cache and new controls
- Show cache TTL in status message with refresh instructions

Interactive menu now supports:
- Number selection (1-20)
- 's' for search
- 'c' for cache refresh (new!)
- 'q' to quit
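
One way to dispatch the four menu inputs is sketched below. `parse_menu_choice` and its action names are hypothetical. The CLI would react to `"refresh"` by reloading the chat list and to `"search"` by prompting for a query.

```python
from typing import Optional, Tuple


def parse_menu_choice(raw: str, shown: int) -> Tuple[str, Optional[int]]:
    """Map raw menu input to an action (hypothetical helper, not the PR's code).

    Returns ("select", index), ("search", None), ("refresh", None),
    ("quit", None) or ("invalid", None).
    """
    choice = raw.strip().lower()
    if choice == "s":
        return ("search", None)
    if choice == "c":
        return ("refresh", None)
    if choice == "q":
        return ("quit", None)
    if choice.isdigit() and 1 <= int(choice) <= shown:
        return ("select", int(choice) - 1)  # zero-based index into the shown list
    return ("invalid", None)
```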

Fixes issue where Jira formatting wasn't being applied in tickets.

**Changes:**
- Replace Jira Wiki Markup with standard Markdown syntax:
  - `h2.` / `h3.` → `##` / `###`
  - `{quote}` → `>` (blockquote)
  - `*bold*` → `**bold**`
  - `_italic_` → `*italic*`
  - `----` → `---`

- Add image support in attachments:
  - Images rendered as `![name](url)` for inline display
  - Other files as clickable links `[name](url)`
  - Detects images by contentType or file extension

- Change file extension from .txt to .md for better compatibility

- Update documentation to reflect standard Markdown format

The new format works seamlessly in Jira, GitHub, Confluence,
and any other Markdown-compatible platform.
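
A sketch of the image-versus-link decision. It checks `contentType` first, then falls back to the extension of the file name or URL path. The extension set mirrors the formats listed in a later commit and is an assumption, as are the function names.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension list; the commit only says "contentType or file extension".
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg", ".tiff"}


def is_image(name: str, url: str, content_type: str = "") -> bool:
    if content_type.lower().startswith("image/"):
        return True
    # Fall back to the extension of the file name, then of the URL path.
    for candidate in (name, urlparse(url).path):
        if PurePosixPath(candidate).suffix.lower() in IMAGE_EXTENSIONS:
            return True
    return False


def attachment_markdown(name: str, url: str, content_type: str = "") -> str:
    # "![name](url)" renders inline; "[name](url)" is an ordinary link.
    prefix = "!" if is_image(name, url, content_type) else ""
    return f"{prefix}[{name}]({url})"
```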

Eliminates code duplication for chat loading with progress indicator.

**Changes:**
- Add `_load_chats_with_progress()` helper function
- Use in both initial load and cache refresh ('c' command)
- Remove duplicated progress callback definitions
- Cleaner, more maintainable code

**Fix:** Progress bar now displays correctly when refreshing cache
with 'c' key in interactive menu.
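
A shared helper like `_load_chats_with_progress()` could look roughly like this. It assumes a `list_chats(progress_callback=...)` signature matching the GraphClient change described earlier, and the output stream is injectable. The carriage return redraws the count in place on one terminal line.

```python
import sys
from typing import Callable, List, TextIO


def load_chats_with_progress(
    list_chats: Callable[..., List[dict]], out: TextIO = sys.stderr
) -> List[dict]:
    """Hypothetical shared helper used for both the first load and the 'c' refresh."""

    def on_progress(count: int) -> None:
        # "\r" moves back to the start of the line so the count updates in place.
        out.write(f"\rLoading chats... {count} loaded")
        out.flush()

    chats = list_chats(progress_callback=on_progress)
    out.write(f"\n✓ Loaded {len(chats)} chats\n")
    return chats
```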

Emoji characters (like 🚀) occupy 2 terminal columns but Python's len()
counts them as 1 character, causing misalignment in formatted tables.

**Changes:**
- Add wcwidth>=0.2 dependency for proper Unicode width calculation
- Add helper functions:
  - _visual_width(): Calculate true terminal width
  - _truncate_to_width(): Truncate text respecting visual width
  - _pad_to_width(): Pad text to target visual width
- Replace all len() and simple string slicing with width-aware functions
- Apply to both main chat table and search results

**Result:**
Chat names with emoji now align properly in columns:
- Before: "Admin Site Support 🚀" causes date column to shift
- After: All columns aligned regardless of emoji presence

Changes:
- Sort messages chronologically (oldest to newest) before export
- Add interactive date range picker with presets (7/30/90/365 days)
- Support custom date range selection interactively
- Default to last year for --all and --user/--chat modes when dates not specified

User experience improvements:
- Clear preset options for common time ranges
- Ability to cancel export from date selection
- Date range confirmation before export starts
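
A sketch of how the presets and a custom range might be turned into UTC datetimes. The menu keys in `PRESET_DAYS` and both function names are hypothetical. The custom end date is treated as inclusive, so the range stops at the following midnight.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Hypothetical menu keys for the 7/30/90/365-day presets.
PRESET_DAYS = {"1": 7, "2": 30, "3": 90, "4": 365}


def preset_range(days: int, now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (start, end) covering the last `days` days, in UTC."""
    end = now if now is not None else datetime.now(timezone.utc)
    return end - timedelta(days=days), end


def custom_range(start: str, end: str) -> Tuple[datetime, datetime]:
    """Parse YYYY-MM-DD bounds; the end day is inclusive."""
    start_dt = datetime.strptime(start, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end_dt = datetime.strptime(end, "%Y-%m-%d").replace(tzinfo=timezone.utc) + timedelta(days=1)
    if start_dt >= end_dt:
        raise ValueError("start date must be on or before end date")
    return start_dt, end_dt
```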

Teams stores inline images as <img> tags in the HTML content field,
not in the attachments array. This update:
- Adds _extract_images_from_html() to parse inline images from HTML
- Modifies _strip_html() to remove <img> tags before text processing
- Updates _format_jira_message() to format inline images as Markdown

Fixes issue where messages with only inline images showed "[No content]"
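
A minimal sketch of `_extract_images_from_html()` using the standard library's `HTMLParser`. This parser also reports self-closing `<img ... />` tags through `handle_starttag`. Returning `(src, alt)` pairs and defaulting the alt text to `"image"` are assumptions.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class _ImgCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.images: List[Tuple[str, str]] = []  # (src, alt) in document order

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if attributes.get("src"):
                self.images.append((attributes["src"], attributes.get("alt") or "image"))


def extract_images_from_html(html: str) -> List[Tuple[str, str]]:
    parser = _ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.images


def inline_images_markdown(html: str) -> List[str]:
    # Markdown image syntax for each inline image found in the message body.
    return [f"![{alt}]({src})" for src, alt in extract_images_from_html(html)]
```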

Implements local image download to make exports standalone and accessible
without Microsoft authentication. Key changes:

- Add _download_attachment() to fetch images using authenticated Graph client
- Add _extract_image_urls() to find all images in messages (inline + attachments)
- Add _download_attachments() to download images to local subfolder
- Update formatters to use local paths when available (url_mapping)
- Add --download-attachments CLI flag (enabled by default)
- Images saved to {chat_name}_{date}_files/ directory
- Markdown updated to reference local files: ![alt](./files/image.png)

This resolves authentication issues with image URLs and ensures exports
work offline with all images included locally.
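
The `url_mapping` idea can be sketched as two small functions: one names a local file for each remote URL, and the other rewrites Markdown link targets. The `image_NNN` naming and the single fixed extension are simplifications. The next commit derives the extension from the response's Content-Type instead.

```python
from typing import Dict, Iterable


def build_url_mapping(urls: Iterable[str], files_dir: str, extension: str = ".png") -> Dict[str, str]:
    """Map each remote URL to a hypothetical local name such as ./<dir>/image_001.png."""
    # dict.fromkeys() drops duplicate URLs while keeping first-seen order.
    return {
        url: f"./{files_dir}/image_{index:03d}{extension}"
        for index, url in enumerate(dict.fromkeys(urls), start=1)
    }


def apply_url_mapping(markdown: str, url_mapping: Dict[str, str]) -> str:
    """Point Markdown links at local files instead of auth-protected URLs."""
    for remote, local in url_mapping.items():
        # Match the "(url)" link target so plain-text mentions stay untouched.
        markdown = markdown.replace(f"({remote})", f"({local})")
    return markdown
```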

Images were being saved with .bin extension because we weren't checking
the actual Content-Type from HTTP response headers.

Changes:
- Add _get_extension_from_mime() to map MIME types to file extensions
- Update _download_attachment() to return Content-Type from response headers
- Modify _download_attachments() to use correct extension based on MIME type
- Download to temp file first, then rename with correct extension
- Supports common image formats: png, jpg, gif, bmp, webp, svg, tiff

This ensures images are saved with proper extensions (.png, .jpg, etc.)
and can be opened directly in OS and displayed in WYSIWYG Markdown editors.
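
A sketch of the MIME-to-extension lookup and the download-to-temp-then-rename step. It strips any parameters from the Content-Type header before the lookup and falls back to `.bin`. `Path.replace()` overwrites an existing target and is atomic on the same POSIX filesystem. The helper names are hypothetical.

```python
from pathlib import Path

_MIME_TO_EXT = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/bmp": ".bmp",
    "image/webp": ".webp",
    "image/svg+xml": ".svg",
    "image/tiff": ".tiff",
}


def get_extension_from_mime(content_type: str, default: str = ".bin") -> str:
    # Headers may carry parameters, e.g. "image/png; charset=binary".
    mime = content_type.split(";", 1)[0].strip().lower()
    return _MIME_TO_EXT.get(mime, default)


def finalize_download(temp_path: Path, stem: str, content_type: str) -> Path:
    """Rename a finished temp download to <stem><extension from Content-Type>."""
    target = temp_path.with_name(stem + get_extension_from_mime(content_type))
    temp_path.replace(target)
    return target
```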

Perfect for copy-pasting into Jira/Confluence! Images are embedded
directly in the HTML as base64 data URLs, so when you copy from the
browser, images are included in the clipboard.

Features:
- Add write_html() formatter with embedded base64 images
- Add _image_to_base64() to convert local images to data URLs
- Add _format_html_message() for HTML formatting with inline styles
- Support html format in exporter (downloads attachments like jira)
- Update CLI help to include html format option
- Generated HTML can be opened in browser, copied, and pasted to Jira

Usage:
  teams-export --chat "My Chat" --format html

Then:
  1. Open the .html file in your browser
  2. Select all (Ctrl+A)
  3. Copy (Ctrl+C)
  4. Paste into Jira/Confluence (Ctrl+V)

All images will be embedded and visible!
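
A minimal sketch of `_image_to_base64()` and the `<img>` tag built from it. The MIME type is guessed from the file name and falls back to `application/octet-stream`. The alt text is HTML-escaped so it cannot break out of the attribute.

```python
import base64
import mimetypes
from html import escape
from pathlib import Path


def image_to_base64(path: Path) -> str:
    """Return a data: URL that embeds the whole file."""
    mime, _ = mimetypes.guess_type(str(path))
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def html_img(path: Path, alt: str) -> str:
    # Hypothetical helper: an <img> whose source travels inside the HTML itself.
    return f'<img src="{image_to_base64(path)}" alt="{escape(alt, quote=True)}">'
```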

Previously, HTML export showed remote URLs which required authentication.
Now images are properly converted to base64 data URLs.

Changes:
- Pass url_mapping to write_html() instead of attachments_dir
- Update _format_html_message() to accept url_mapping parameter
- Use url_mapping to resolve remote URLs to local file paths
- Convert local image files to base64 data URLs for embedding
- Apply same logic to both inline images and file attachments

This ensures that when you copy HTML from browser and paste into
Jira/Confluence, all images are embedded and visible without authentication.

When using --chat or --user without specifying dates, the default was
set to last 365 days, which exported too much data.

Changed to last 24 hours as the default, which matches the "Today"
option in the interactive date selector and is more reasonable for
typical use cases.

This fixes the issue where exports contained a year of data instead
of just today's messages.

When copying from the browser with Ctrl+A then Ctrl+C, base64 images don't
transfer properly when pasted into Jira/Confluence. Added:

- Floating "Copy to Clipboard" button in top-right corner
- JavaScript that converts base64 images to Blob URLs before copying
- Event handler for manual Ctrl+C that also converts images
- Visual feedback with success/error messages

How it works:
1. Open HTML file in browser
2. Click "📋 Copy to Clipboard" button (or use Ctrl+C)
3. JavaScript converts data:image/png;base64,... to blob URLs
4. Paste into Jira/Confluence - images now work correctly!

This solves the issue where right-click → copy image worked,
but Ctrl+A then Ctrl+C didn't properly transfer images.

Removed complex blob URL conversion and clipboard API attempts.
Using simple execCommand('copy') which should preserve rendered
content including base64 images.

- Added python-docx dependency to pyproject.toml
- Implemented write_docx() formatter with embedded images
- Updated exporter to support 'docx' and 'word' format options
- Updated CLI help text to recommend docx for Jira/Confluence
- Word format embeds images as binary data, ensuring proper
  copy-paste support in Jira/Confluence (unlike HTML base64)

Usage: teams-export --chat "Chat Name" --format docx
Then open the .docx file, select all, copy, and paste into Jira.

The messages are pre-processed before being passed to formatters,
so we need to use:
- message.get("sender") not message.get("from")
- message.get("timestamp") not message.get("createdDateTime")
- message.get("content") not message.get("body", {}).get("content")

This matches the field names used in the Jira markdown formatter.
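
The pre-processing step might flatten a raw Graph `chatMessage` as sketched below. It reads `from.user.displayName` (or `from.application`), `createdDateTime`, and `body.content`. System messages can have `from` set to null, hence the fallbacks. The function name and the `"Unknown"` placeholder are assumptions.

```python
from typing import Any, Dict


def normalize_message(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a Graph chatMessage into the keys the formatters read (hypothetical)."""
    sender = raw.get("from") or {}
    user = sender.get("user") or sender.get("application") or {}
    body = raw.get("body") or {}
    return {
        "sender": user.get("displayName") or "Unknown",
        "timestamp": raw.get("createdDateTime"),
        "content": body.get("content") or "",
        "attachments": raw.get("attachments") or [],
    }
```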

Images were only being downloaded for jira and html formats.
Added docx to the list so images are downloaded and can be
embedded in Word documents.

Changes:
- Extended _get_extension_from_mime() with 40+ MIME types
  (documents, archives, videos, audio, code files, etc.)
- Renamed _extract_image_urls() to _extract_attachment_urls()
  Returns tuples (url, is_image) to distinguish file types
- Updated _download_attachments() to handle all file types
  Shows counts: "Downloading X image(s) and Y file(s)..."
  Uses appropriate filename prefixes (image_XXX vs file_XXX)
- Added attachment support to docx formatter
  Images are embedded in the Word document
  Non-image files are shown as styled links with 📎 icon
- Jira and HTML formatters already supported non-image attachments

Now users can export all attachments from Teams chats,
including documents, PDFs, archives, etc.
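
A sketch of `_extract_attachment_urls()` returning `(url, is_image)` tuples, plus the per-type filename prefixes and the count message. It assumes Graph-style attachment dicts with `contentUrl`, `contentType`, and `name`. Entries without a `contentUrl`, such as cards, are skipped. Separate `image_NNN` / `file_NNN` counters are an assumption.

```python
from typing import Dict, List, Tuple

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg", ".tiff")


def extract_attachment_urls(attachments: List[Dict]) -> List[Tuple[str, bool]]:
    pairs = []
    for att in attachments:
        url = att.get("contentUrl")
        if not url:
            continue  # e.g. adaptive cards carry no downloadable file
        name = (att.get("name") or "").lower()
        is_image = (att.get("contentType") or "").startswith("image/") or name.endswith(IMAGE_SUFFIXES)
        pairs.append((url, is_image))
    return pairs


def plan_downloads(pairs: List[Tuple[str, bool]]) -> Tuple[Dict[str, str], str]:
    """Assign image_NNN / file_NNN stems and build the progress summary."""
    names: Dict[str, str] = {}
    images = files = 0
    for url, is_image in pairs:
        if url in names:
            continue  # the same attachment referenced twice
        if is_image:
            images += 1
            names[url] = f"image_{images:03d}"
        else:
            files += 1
            names[url] = f"file_{files:03d}"
    return names, f"Downloading {images} image(s) and {files} file(s)..."
```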

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/teams_export/cli.py
Comment on lines +315 to +337
```python
# Use parallel processing for multiple chats
if len(selected_chats) > 1:
    typer.echo(f"\nExporting {len(selected_chats)} chats in parallel...")

    def export_single_chat(chat):
        title = _chat_title(chat)
        try:
            output_path, count = export_chat(
                client,
                chat,
                start_dt,
                end_dt,
                output_dir=output_dir,
                output_format=output_format,
                download_attachments=download_attachments,
            )
            return (title, output_path, count, None)
        except Exception as exc:
            return (title, None, 0, str(exc))

    # Use ThreadPoolExecutor for parallel downloads (limited to 3 concurrent)
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = {executor.submit(export_single_chat, chat): chat for chat in selected_chats}
```

P1: Avoid sharing GraphClient across worker threads

The new parallel export path submits multiple export_chat calls to a ThreadPoolExecutor but passes the single client instance into every worker. GraphClient wraps a requests.Session, and requests does not guarantee that a Session is thread-safe. Running several exports concurrently through the same session risks sporadic connection errors, corrupted responses, or cookies/auth headers getting mixed between requests. Each worker should either create its own GraphClient/Session, or the exports should remain single-threaded.
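
A sketch of the first option: give every worker thread its own client via `threading.local`. `make_client` is a hypothetical factory, for example one that builds a fresh GraphClient (and therefore a fresh `requests.Session`) from the already-acquired token. Whether GraphClient can be constructed that way is an assumption. `executor.map` keeps input order and re-raises a worker's exception when results are read, so `work` should catch errors itself, as `export_single_chat` does.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

C = TypeVar("C")


def run_with_thread_local_clients(
    make_client: Callable[[], C],
    work: Callable[[C, object], object],
    items: Iterable[object],
    max_workers: int = 3,
) -> List[object]:
    """Run work(client, item) in a pool where each worker thread owns one client."""
    local = threading.local()

    def get_client() -> C:
        # Created lazily, once per worker thread, so no Session is shared.
        if not hasattr(local, "client"):
            local.client = make_client()
        return local.client

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda item: work(get_client(), item), items))
```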
