Performance Optimizations:
- Add automatic retry with exponential backoff for API throttling (429) and server errors (5xx)
- Increase pagination size from 50 to 100 messages per request
- Implement parallel export processing for multiple chats (up to 3 concurrent)
- Add progress tracking for multi-chat exports
User Experience Improvements:
- Add interactive chat selection menu with chat names, types, and last activity
- Implement Jira-friendly markdown format as default with HTML cleanup
- Handle multiple search matches with interactive selection instead of errors
- Default to "jira" format for easy copy-paste into Jira tickets
- Show real-time progress indicators during export
New Features:
- formatters.py: Jira markdown formatter with HTML-to-text conversion
- interactive.py: Interactive chat selection utilities
- Support for "jira", "jira-markdown", and "markdown" format aliases
Technical Changes:
- GraphClient: Add _request_with_retry() for resilient API calls
- CLI: Use ThreadPoolExecutor for parallel exports
- Exporter: Support new Jira format with chat metadata
- README: Updated with comprehensive usage examples and feature list
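The commit names `_request_with_retry()` but does not show its body. The sketch below shows retry with exponential backoff for 429 and 5xx responses. It assumes a generic zero-argument `send()` callable that returns an object with `.status_code` and `.headers`, such as a `requests.Response`. Honoring a `Retry-After` header when one is present is an added assumption, though Microsoft Graph's throttling guidance recommends it. `request_with_retry` is a hypothetical name.

```python
import random
import time
from typing import Any, Callable


def is_retryable(status_code: int) -> bool:
    """429 means throttled; any 5xx is treated as a transient server error."""
    return status_code == 429 or 500 <= status_code < 600


def request_with_retry(
    send: Callable[[], Any],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns a non-retryable response or retries run out.

    The last response is returned even if it still failed, so the caller can
    raise or report the error itself.
    """
    response = send()
    for attempt in range(max_retries):
        if not is_retryable(response.status_code):
            break
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # the server said how long to wait
        else:
            # Exponential backoff: base, 2*base, 4*base, ... capped, plus jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, 0.1 * base_delay)
        sleep(delay)
        response = send()
    return response
```

The `sleep` parameter is injectable so the backoff schedule can be tested without actually waiting.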
Performance & UX Improvements:
- Add real-time progress indicator during chat loading
- Show "Loading chats... N loaded" with live count updates
- Display authentication success message
- Load ALL chats instead of limiting to 100, for a complete chat list
GraphClient enhancements:
- Add progress_callback parameter to list_chats() and _paginate()
- Add max_items parameter for optional limiting
- Increase chat pagination from the default to 50 per request
CLI improvements:
- Show "✓ Authenticated successfully" after login
- Display live progress: "Loading chats... 1448 loaded"
- Show final count: "✓ Loaded 1448 chats"
- Remove artificial 100-chat limit that caused missing chats
Interactive menu:
- Update to reflect full chat loading (no "showing limited" message)
- All chats now visible and sortable by lastUpdatedDateTime
Fixes issue where active chats weren't visible in the menu because they weren't among the first 100 chats returned by the Graph API.
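Graph list endpoints return each page's items under `value` and the URL of the next page under `@odata.nextLink`. A pager with `progress_callback` and `max_items` can therefore be sketched as below. `fetch_page(url) -> dict` is a hypothetical stand-in for an authenticated GET that returns parsed JSON. This is not the project's `_paginate()`.

```python
from typing import Callable, Iterator, Optional


def paginate(
    fetch_page: Callable[[str], dict],
    first_url: str,
    progress_callback: Optional[Callable[[int], None]] = None,
    max_items: Optional[int] = None,
) -> Iterator[dict]:
    """Yield items across Graph pages by following @odata.nextLink."""
    url: Optional[str] = first_url
    loaded = 0
    while url:
        page = fetch_page(url)
        for item in page.get("value", []):
            yield item
            loaded += 1
            if max_items is not None and loaded >= max_items:
                if progress_callback:
                    progress_callback(loaded)
                return  # optional limit reached; stop requesting pages
        if progress_callback:
            progress_callback(loaded)  # report the running total once per page
        url = page.get("@odata.nextLink")  # absent on the last page
```

A CLI could pass something like `lambda n: print(f"\rLoading chats... {n} loaded", end="", flush=True)` as the callback to get the live counter described above.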
Performance improvements:
- Add 5-minute local cache for chat lists (~/.teams-exporter/cache/)
- First run loads from the API; subsequent runs are instant
- Add --refresh-cache flag to force a cache refresh
UX improvements:
- Add interactive search in chat selection menu
- Press 's' to search by chat name or participant across ALL chats
- Solves issue where recent chats don't appear in the top 20 because Microsoft Teams doesn't update lastUpdatedDateTime consistently
New module:
- cache.py: Simple file-based cache with TTL support
CLI changes:
- Shows "Loaded from cache" message when using cached data
- Shows "Loading from Microsoft Graph" when fetching fresh data
- Search works across all 1000+ chats, not just the top 20
Example workflow:
1. Run teams-export (loads 1448 chats in ~30s and caches them)
2. Press 's' for search
3. Type "Games loading"
4. Select from matching results
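The commit describes cache.py only as a "simple file-based cache with TTL support". The sketch below is one such cache. Each key is stored as a JSON file together with the time it was saved, and entries older than the TTL are treated as misses. The `FileCache` class and its JSON layout are hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Any, Optional


class FileCache:
    """JSON-file cache where each key expires ttl_seconds after it was saved."""

    def __init__(self, directory: Path, ttl_seconds: float = 300.0) -> None:
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.directory / f"{key}.json"

    def get(self, key: str) -> Optional[Any]:
        path = self._path(key)
        if not path.exists():
            return None
        try:
            entry = json.loads(path.read_text(encoding="utf-8"))
            saved_at = float(entry["saved_at"])
        except (OSError, ValueError, KeyError, TypeError):
            return None  # unreadable or malformed cache file counts as a miss
        if time.time() - saved_at > self.ttl_seconds:
            return None  # expired
        return entry["data"]

    def set(self, key: str, data: Any) -> None:
        entry = {"saved_at": time.time(), "data": data}
        self._path(key).write_text(json.dumps(entry), encoding="utf-8")

    def clear(self, key: str) -> None:
        """Drop one entry, e.g. when --refresh-cache is passed."""
        self._path(key).unlink(missing_ok=True)
```

For the location named in the commit, construct it with `FileCache(Path.home() / ".teams-exporter" / "cache")`. A `--refresh-cache` run would call `clear("chats")` before loading.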
Critical fix: Use lastMessagePreview for accurate chat sorting
- Change sort key from lastUpdatedDateTime to lastMessagePreview.createdDateTime
- This matches how the Teams desktop client sorts chats
- Fixes issue where active chats don't appear in the top 20
Technical changes:
- GraphClient: Add lastMessagePreview to $expand parameter
- interactive.py: Use lastMessagePreview.createdDateTime for sorting
- Fall back to lastUpdatedDateTime if the preview is unavailable
Why this matters:
Microsoft Teams doesn't always update lastUpdatedDateTime when new messages arrive. The lastMessagePreview field contains the actual timestamp of the last message, which is what the desktop client uses for "most recent" sorting.
IMPORTANT: Users should run with --refresh-cache after updating to ensure the new lastMessagePreview field is loaded from the API.
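The sort key with its fallback can be sketched as below. The sketch parses timestamps rather than comparing raw strings, because fractional seconds can vary in length (e.g. `...30Z` vs `...30.5Z`), and raw string comparison then sorts them wrongly. `parse_graph_datetime` and `chat_sort_key` are hypothetical names. Chats with neither field sort last.

```python
import re
from datetime import datetime, timezone
from typing import Optional

OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def parse_graph_datetime(value: Optional[str]) -> datetime:
    """Parse timestamps like '2024-05-02T09:30:00.1234567Z' into aware datetimes."""
    if not value:
        return OLDEST
    value = value.replace("Z", "+00:00")
    # fromisoformat() before Python 3.11 wants 3 or 6 fractional digits; use 6.
    value = re.sub(r"\.(\d+)", lambda m: "." + m.group(1)[:6].ljust(6, "0"), value)
    return datetime.fromisoformat(value)


def chat_sort_key(chat: dict) -> datetime:
    """Prefer the last message's timestamp; fall back to lastUpdatedDateTime."""
    preview = chat.get("lastMessagePreview") or {}  # may be null in the JSON
    return parse_graph_datetime(
        preview.get("createdDateTime") or chat.get("lastUpdatedDateTime")
    )


def sort_chats_most_recent_first(chats: list) -> list:
    return sorted(chats, key=chat_sort_key, reverse=True)
```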
- Fix critical bug: Revert $top from 100 to 50 for the chat messages endpoint (Graph API returns a 400 error with $top=100; the maximum is 50)
- Document Graph API sorting limitation in README with official source (explains why all chats must be loaded for correct sorting)
- Increase cache TTL from 5 minutes to 24 hours for better UX
- Add 'c' key in interactive menu to refresh cache without restart
- Update CLI to handle cache refresh action and reload chat list
- Update all documentation to reflect 24h cache and new controls
- Show cache TTL in status message with refresh instructions
Interactive menu now supports:
- Number selection (1-20)
- 's' for search
- 'c' for cache refresh (new!)
- 'q' to quit
Fixes issue where Jira formatting wasn't being applied in tickets.
**Changes:**
- Replace Jira Wiki Markup with standard Markdown syntax:
  - `h2.` / `h3.` → `##` / `###`
  - `{quote}` → `>` (blockquote)
  - `*bold*` → `**bold**`
  - `_italic_` → `*italic*`
  - `----` → `---`
- Add image support in attachments:
  - Images rendered as `![name](url)` for inline display
  - Other files as clickable links `[name](url)`
  - Detects images by contentType or file extension
- Change file extension from .txt to .md for better compatibility
- Update documentation to reflect standard Markdown format
The new format works seamlessly in Jira, GitHub, Confluence,
and any other Markdown-compatible platform.
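The sketch below shows the attachment rules above: image detection by contentType or file extension, and the two Markdown forms. It assumes Graph's attachment fields `contentType`, `contentUrl`, and `name`. The extension set and the function names are hypothetical, not the formatter's actual code.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension list; the commit only says "file extension".
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg", ".tiff"}


def is_image(attachment: dict) -> bool:
    """True if the MIME type says image/*, or the name/URL has an image extension."""
    content_type = (attachment.get("contentType") or "").lower()
    if content_type.startswith("image/"):
        return True
    name = attachment.get("name") or urlparse(attachment.get("contentUrl") or "").path
    return PurePosixPath(name).suffix.lower() in IMAGE_EXTENSIONS


def format_attachment(attachment: dict) -> str:
    name = attachment.get("name") or "attachment"
    url = attachment.get("contentUrl") or ""
    if is_image(attachment):
        return f"![{name}]({url})"  # rendered inline by Markdown viewers
    return f"[{name}]({url})"  # plain clickable link
```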
Eliminates code duplication for chat loading with progress indicator.
**Changes:**
- Add `_load_chats_with_progress()` helper function
- Use in both initial load and cache refresh ('c' command)
- Remove duplicated progress callback definitions
- Cleaner, more maintainable code
**Fix:** Progress bar now displays correctly when refreshing cache
with 'c' key in interactive menu.
Emoji characters (like 🚀) occupy 2 terminal columns, but Python's len() counts them as 1 character, causing misalignment in formatted tables.
**Changes:**
- Add wcwidth>=0.2 dependency for proper Unicode width calculation
- Add helper functions:
  - _visual_width(): Calculate true terminal width
  - _truncate_to_width(): Truncate text respecting visual width
  - _pad_to_width(): Pad text to target visual width
- Replace all len() calls and simple string slicing with width-aware functions
- Apply to both the main chat table and search results
**Result:** Chat names with emoji now align properly in columns:
- Before: "Admin Site Support 🚀" causes the date column to shift
- After: All columns aligned regardless of emoji presence
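The commit uses wcwidth for width calculation. The stand-ins below for `_visual_width()`, `_truncate_to_width()`, and `_pad_to_width()` instead use the stdlib `unicodedata.east_asian_width`, so the sketch needs no dependencies. That swap is an assumption, and it is only an approximation: multi-code-point emoji sequences, for example those joined with ZWJ, are over-counted.

```python
import unicodedata


def _char_width(ch: str) -> int:
    """Approximate terminal columns for one code point."""
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters take no column
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def visual_width(text: str) -> int:
    return sum(_char_width(ch) for ch in text)


def truncate_to_width(text: str, width: int, ellipsis: str = "…") -> str:
    """Cut text so that text + ellipsis fits in `width` columns."""
    if visual_width(text) <= width:
        return text
    budget = width - visual_width(ellipsis)
    out, used = [], 0
    for ch in text:
        w = _char_width(ch)
        if used + w > budget:
            break  # never split a wide character across the boundary
        out.append(ch)
        used += w
    return "".join(out) + ellipsis


def pad_to_width(text: str, width: int) -> str:
    """Right-pad with spaces based on columns, not len()."""
    return text + " " * max(0, width - visual_width(text))
```

For example, `len("Admin Site Support 🚀")` is 20, but the string occupies 21 columns. That one-column difference is exactly the shift described above.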
Changes:
- Sort messages chronologically (oldest to newest) before export
- Add interactive date range picker with presets (7/30/90/365 days)
- Support custom date range selection interactively
- Default to the last year for --all and --user/--chat modes when dates are not specified
User experience improvements:
- Clear preset options for common time ranges
- Ability to cancel export from date selection
- Date range confirmation before export starts
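The preset picker and the chronological sort can be sketched as below. The menu keys ("1" to "4", and "q" to cancel) are hypothetical. The sketch also assumes each message already carries a parsed, timezone-aware `datetime` under a hypothetical `created` key.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple

# Hypothetical menu keys mapped to the preset lengths named in the commit.
PRESET_DAYS: Dict[str, int] = {"1": 7, "2": 30, "3": 90, "4": 365}


def preset_range(choice: str, now: Optional[datetime] = None) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) for a preset key, or None if the user cancels.

    Raises KeyError for an unknown key; a real menu would re-prompt instead.
    """
    if choice.lower() == "q":
        return None
    end = now or datetime.now(timezone.utc)
    return end - timedelta(days=PRESET_DAYS[choice]), end


def in_range_oldest_first(messages: List[dict], start: datetime, end: datetime) -> List[dict]:
    """Keep messages inside [start, end] and sort them oldest to newest."""
    kept = [m for m in messages if start <= m["created"] <= end]
    return sorted(kept, key=lambda m: m["created"])
```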
Teams stores inline images as <img> tags in the HTML content field, not in the attachments array. This update:
- Adds _extract_images_from_html() to parse inline images from HTML
- Modifies _strip_html() to remove <img> tags before text processing
- Updates _format_jira_message() to format inline images as Markdown
Fixes issue where messages with only inline images showed "[No content]".
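The three steps above can be sketched with the stdlib `html.parser`: collect `<img>` sources, strip the tags before text processing, and emit Markdown images. The function names and the default alt text "image" are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple


class _ImgCollector(HTMLParser):
    """Collects (src, alt) pairs from every <img> tag, including <img ... />."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            src = attributes.get("src")
            if src:
                self.images.append((src, attributes.get("alt") or "image"))


def extract_images_from_html(html: str) -> List[Tuple[str, str]]:
    parser = _ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.images


_IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def strip_img_tags(html: str) -> str:
    """Remove <img> tags so they don't leak into the plain-text conversion."""
    return _IMG_TAG.sub("", html)


def images_to_markdown(images: List[Tuple[str, str]]) -> str:
    return "\n".join(f"![{alt}]({src})" for src, alt in images)
```

A message whose text is empty after `strip_img_tags()` can then be rendered from `images_to_markdown()` instead of falling back to "[No content]".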
Implements local image download to make exports standalone and accessible
without Microsoft authentication. Key changes:
- Add _download_attachment() to fetch images using authenticated Graph client
- Add _extract_image_urls() to find all images in messages (inline + attachments)
- Add _download_attachments() to download images to local subfolder
- Update formatters to use local paths when available (url_mapping)
- Add --download-attachments CLI flag (enabled by default)
- Images saved to {chat_name}_{date}_files/ directory
- Markdown updated to reference the downloaded local files
This resolves authentication issues with image URLs and ensures exports
work offline with all images included locally.
Images were being saved with a .bin extension because we weren't checking the actual Content-Type from the HTTP response headers.
Changes:
- Add _get_extension_from_mime() to map MIME types to file extensions
- Update _download_attachment() to return the Content-Type from response headers
- Modify _download_attachments() to use the correct extension based on MIME type
- Download to a temp file first, then rename with the correct extension
- Support common image formats: png, jpg, gif, bmp, webp, svg, tiff
This ensures images are saved with proper extensions (.png, .jpg, etc.) and can be opened directly by the OS and displayed in WYSIWYG Markdown editors.
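The MIME lookup and the temp-file-then-rename step can be sketched as below. The sketch uses an explicit mapping rather than `mimetypes.guess_extension()`, because on some Python versions that function has returned `.jpe` for `image/jpeg`. `extension_from_mime`, `save_with_extension`, and the `.bin` fallback are hypothetical names for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical subset covering the formats listed in the commit.
MIME_TO_EXT = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/bmp": ".bmp",
    "image/webp": ".webp",
    "image/svg+xml": ".svg",
    "image/tiff": ".tiff",
}


def extension_from_mime(content_type: Optional[str]) -> str:
    # Drop parameters such as "; charset=binary" before the lookup.
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    return MIME_TO_EXT.get(mime, ".bin")


def save_with_extension(data: bytes, content_type: Optional[str], target_dir: Path, stem: str) -> Path:
    """Write to a temp file in target_dir, then rename it to stem + extension."""
    target_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(target_dir), suffix=".part")
    final_path = target_dir / f"{stem}{extension_from_mime(content_type)}"
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp_name, final_path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # don't leave half-written .part files behind
        raise
    return final_path
```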
Perfect for copy-pasting into Jira/Confluence! Images are embedded directly in the HTML as base64 data URLs, so when you copy from the browser, images are included in the clipboard.
Features:
- Add write_html() formatter with embedded base64 images
- Add _image_to_base64() to convert local images to data URLs
- Add _format_html_message() for HTML formatting with inline styles
- Support html format in exporter (downloads attachments like jira)
- Update CLI help to include html format option
- Generated HTML can be opened in a browser, copied, and pasted into Jira
Usage:
`teams-export --chat "My Chat" --format html`
Then:
1. Open the .html file in your browser
2. Select all (Ctrl+A)
3. Copy (Ctrl+C)
4. Paste into Jira/Confluence (Ctrl+V)
All images will be embedded and visible!
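A data URL is `data:<mime>;base64,<payload>`, so converting a downloaded image takes only a few stdlib calls. The sketch below is in the spirit of `_image_to_base64()`, but `image_to_data_url` and `html_img` are hypothetical names. It guesses the MIME type from the file name.

```python
import base64
import html
import mimetypes
from pathlib import Path


def image_to_data_url(path: Path) -> str:
    """Encode a local image file as a data: URL usable in <img src=...>."""
    mime, _ = mimetypes.guess_type(path.name)
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def html_img(path: Path, alt: str = "image") -> str:
    # Inline style keeps large screenshots from overflowing the page.
    return f'<img src="{image_to_data_url(path)}" alt="{html.escape(alt)}" style="max-width:100%">'
```

The trade-off is size: base64 inflates each image by about a third, and every image lives inside the single .html file.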
Previously, HTML export showed remote URLs, which required authentication. Now images are properly converted to base64 data URLs.
Changes:
- Pass url_mapping to write_html() instead of attachments_dir
- Update _format_html_message() to accept a url_mapping parameter
- Use url_mapping to resolve remote URLs to local file paths
- Convert local image files to base64 data URLs for embedding
- Apply the same logic to both inline images and file attachments
This ensures that when you copy HTML from the browser and paste it into Jira/Confluence, all images are embedded and visible without authentication.
When using --chat or --user without specifying dates, the default was set to last 365 days, which exported too much data. Changed to last 24 hours as the default, which matches the "Today" option in the interactive date selector and is more reasonable for typical use cases. This fixes the issue where exports contained a year of data instead of just today's messages.
When copying from the browser with Ctrl+A, Ctrl+C, base64 images don't transfer properly to the Jira/Confluence clipboard.
Added:
- Floating "Copy to Clipboard" button in the top-right corner
- JavaScript that converts base64 images to Blob URLs before copying
- Event handler for manual Ctrl+C that also converts images
- Visual feedback with success/error messages
How it works:
1. Open the HTML file in a browser
2. Click the "📋 Copy to Clipboard" button (or use Ctrl+C)
3. JavaScript converts data:image/png;base64,... to blob URLs
4. Paste into Jira/Confluence; images now work correctly!
This solves the issue where right-click → Copy image worked, but Ctrl+A, Ctrl+C didn't properly transfer images.
Removed complex blob URL conversion and clipboard API attempts.
Using simple execCommand('copy') which should preserve rendered
content including base64 images.
- Added python-docx dependency to pyproject.toml
- Implemented write_docx() formatter with embedded images
- Updated exporter to support 'docx' and 'word' format options
- Updated CLI help text to recommend docx for Jira/Confluence
- Word format embeds images as binary data, ensuring proper copy-paste support in Jira/Confluence (unlike HTML base64)
Usage:
`teams-export --chat "Chat Name" --format docx`
Then open the .docx file, select all, copy, and paste into Jira.
The messages are pre-processed before being passed to formatters,
so we need to use:
- message.get("sender") not message.get("from")
- message.get("timestamp") not message.get("createdDateTime")
- message.get("content") not message.get("body", {}).get("content")
This matches the field names used in the Jira markdown formatter.
Images were only being downloaded for jira and html formats. Added docx to the list so images are downloaded and can be embedded in Word documents.
Changes:
- Extended _get_extension_from_mime() with 40+ MIME types (documents, archives, videos, audio, code files, etc.)
- Renamed _extract_image_urls() to _extract_attachment_urls()
  - Returns (url, is_image) tuples to distinguish file types
- Updated _download_attachments() to handle all file types
  - Shows counts: "Downloading X image(s) and Y file(s)..."
  - Uses appropriate filename prefixes (image_XXX vs file_XXX)
- Added attachment support to the docx formatter
  - Images are embedded in the Word document
  - Non-image files are shown as styled links with a 📎 icon
- Jira and HTML formatters already supported non-image attachments
Now users can export all attachments from Teams chats, including documents, PDFs, archives, etc.
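The step that turns the `(url, is_image)` tuples into file names and the progress message can be sketched as below. `plan_downloads` is a hypothetical helper. The sketch assumes the extension is appended later from the response's Content-Type, and that the same URL can appear both inline and in the attachments array.

```python
from typing import Dict, List, Tuple


def plan_downloads(attachments: List[Tuple[str, bool]]) -> Tuple[Dict[str, str], str]:
    """Map each unique URL to a file stem and build the progress message."""
    stems: Dict[str, str] = {}
    images = files = 0
    for url, is_image in attachments:
        if url in stems:
            continue  # download each URL only once
        if is_image:
            images += 1
            stems[url] = f"image_{images:03d}"
        else:
            files += 1
            stems[url] = f"file_{files:03d}"
    return stems, f"Downloading {images} image(s) and {files} file(s)..."
```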
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
# Use parallel processing for multiple chats
if len(selected_chats) > 1:
    typer.echo(f"\nExporting {len(selected_chats)} chats in parallel...")

    def export_single_chat(chat):
        title = _chat_title(chat)
        try:
            output_path, count = export_chat(
                client,
                chat,
                start_dt,
                end_dt,
                output_dir=output_dir,
                output_format=output_format,
                download_attachments=download_attachments,
            )
            return (title, output_path, count, None)
        except Exception as exc:
            return (title, None, 0, str(exc))

    # Use ThreadPoolExecutor for parallel downloads (limited to 3 concurrent)
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = {executor.submit(export_single_chat, chat): chat for chat in selected_chats}
```
Avoid sharing GraphClient across worker threads
The new parallel export path submits multiple export_chat calls to a ThreadPoolExecutor but passes the single client instance into every worker. GraphClient wraps a requests.Session, and requests does not guarantee that a Session is thread-safe. Running several exports concurrently through the same session risks sporadic connection errors, corrupted responses, or cookies/auth headers mixed up between requests. Either each worker should create its own GraphClient/Session, or the exports should remain single-threaded.
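One way to follow the first suggestion is to give each worker thread its own client via `threading.local`. A minimal sketch, assuming a hypothetical zero-argument `make_client` factory (for instance one that builds a new GraphClient, and therefore a new Session, from an already-acquired token; GraphClient's real constructor is not shown in this PR) and a hypothetical `export_one(client, chat)` callable standing in for export_chat:

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Iterable, List, Optional, Tuple


def export_in_parallel(
    chats: Iterable[Any],
    make_client: Callable[[], Any],
    export_one: Callable[[Any, Any], Any],
    max_workers: int = 3,
) -> List[Tuple[Any, Optional[Any], Optional[str]]]:
    """Run export_one(client, chat) concurrently with one client per worker thread."""
    local = threading.local()

    def worker(chat: Any) -> Tuple[Any, Optional[Any], Optional[str]]:
        # Create the client (and its Session) lazily, once per worker thread,
        # so no Session object is ever used by two threads.
        client = getattr(local, "client", None)
        if client is None:
            client = local.client = make_client()
        try:
            return chat, export_one(client, chat), None
        except Exception as exc:  # report per-chat failures instead of aborting
            return chat, None, str(exc)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(worker, chat) for chat in chats]
        return [future.result() for future in as_completed(futures)]
```

With `max_workers=3`, at most three clients are created regardless of how many chats are exported. Results arrive in completion order, not submission order.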
@codex please check