Chat Archive

Export AI chat conversations to durable JSON and Markdown formats.

A Chrome browser extension that extracts conversations from AI chat platforms (Claude.ai, ChatGPT, Gemini, Grok) into portable, machine-readable files. Zero network requests. Zero telemetry. Your conversations never leave your browser.

Acknowledgement and thanks to Jerren@trifall.com for hinting at use of gpt icon actions to capture user and assistant content in his security-first chat-export repo https://github.com/Trifall/chat-export

✨ Features

5 Platform Support: Claude, ChatGPT, Gemini, Grok (grok.com), Grok-X (x.com/i/grok)
Two Export Formats: JSON (canonical, schema v1.0) and Markdown with metadata
Smart Extraction: Three-pass strategy (clipboard → heuristics → ML fallback)
Privacy First: All processing happens in-browser. No data sent to external servers.
Safety Guarantees: Hard limits (500 turns, 60s timeout), integrity checks, circuit breakers
Cross-Platform: Works on Windows, macOS, Linux (any Chromium browser)

🚀 Quick Start

Installation

Download or clone this repository

git clone https://github.com/fxops-ai/chat-archive.git
cd chat-archive
git checkout chat-archive

Load in Chrome (or Edge, Brave, Vivaldi, Arc, Opera)
- Open chrome://extensions/
- Enable "Developer mode" (toggle in top-right)
- Click "Load unpacked"
- Select the chat-archive folder
- Extension icon appears in toolbar

Usage

Navigate to a supported AI chat platform:
- Claude.ai
- ChatGPT or chat.openai.com
- Gemini
- Grok or x.com/i/grok
Open an existing conversation (must have at least one exchange)
Click the Chat Archive extension icon in your toolbar
Click "Export JSON" (or "Export Markdown")
Choose where to save the file

Your conversation is now a durable, addressable artifact. Use it as:

Input to scripts or automation
Training data or evaluation sets
Documentation or knowledge base content
Backup before deleting conversations
Migration between platforms

🏗️ Architecture

Three-Pass Extraction Strategy

Chat Archive uses a resilient, multi-layer approach to handle platform DOM differences:

Pass 0: Clipboard Extraction (Primary - 95%+ success rate)

Uses each platform's native copy buttons to extract content. This is the highest-fidelity method because it retrieves the same formatted content the platform gives to users.

Claude: button[data-testid="action-bar-copy"] on both user and assistant turns
ChatGPT: button[data-testid="copy-turn-action-button"] with role from data-turn attribute
Gemini: aria-label="Copy prompt" (user) / data-test-id="copy-button" (assistant)
Grok: aria-label="Copy" with 6-signal role detection (alignment, button count, styling)
Grok-X: aria-label="Copy text" with React Native Web class fallbacks

Pass 1: Structural Heuristics (Fallback)

When clipboard extraction fails, uses 7 rule-based classifiers:

H1: Explicit role attributes (0.95 confidence)
H2: Alternating pattern detection (0.85)
H3: Text length asymmetry (0.70)
H5: Content signals (code blocks, questions) (0.65)
H7: Code block density (0.60)

Ensemble voting with 0.75 acceptance threshold. Uncertain turns flagged for manual resolution.

Pass 2: ML Classification (Deferred to v1.1+)

Optional micro-classifier (<500KB) for genuinely ambiguous cases. Not yet implemented because Pass 0+1 achieve 95%+ confidence on all verified platforms.

Export Formats

JSON (Canonical Format)

{
  "schema_version": "1.0",
  "export_metadata": {
    "source_platform": "claude.ai",
    "source_url": "https://claude.ai/chat/abc-123",
    "export_timestamp": "2026-02-15T14:30:00Z",
    "total_turns": 24,
    "flagged_turns": 0
  },
  "conversation": [
    {
      "turn": 1,
      "role": "user",
      "content": "Hello, can you help me with...",
      "classification_confidence": 0.95,
      "classification_source": "clipboard"
    }
  ]
}

Markdown

# Chat Export — Claude.ai
**Exported:** 2026-02-15 14:30 UTC
**Source:** https://claude.ai/chat/abc-123
**Turns:** 24

---

## User
Hello, can you help me with...

---

## Assistant
Of course! Here's how...

🔒 Security & Privacy

Core Guarantee

Chat Archive makes zero network requests. Conversations never leave your browser.

✅ No telemetry, analytics, or crash reporting
✅ No external API calls
✅ No background service that phones home
✅ All processing happens locally in the browser tab

Permissions Explained

Permission	Why We Need It	Risk Level
`activeTab`	Access current tab DOM for extraction	Minimal - only active tab, only on user action
`downloads`	Write export files to your device	Low - write-only, user chooses save location
`storage`	Cache user classification corrections	Low - never stores conversation content
`clipboardRead`	Read clipboard after programmatic copy-button click	Low - read-only, scoped to extraction flow
Host permissions (4 domains)	Inject content script on chat platforms	Scoped - only the 4 target domains

Safety Limits (Circuit Breakers)

MAX_TURNS: 500              // Hard cap on turns per export
MAX_EXTRACTION_TIME_MS: 60_000  // Kill switch: abort after 60 seconds
MAX_SINGLE_TURN_SIZE: 100_000   // Flag turns exceeding 100KB
SCROLL_STABILITY_THRESHOLD: 3   // Stop scrolling after 3 unchanged counts

All extraction operations are bounded by hard limits that cannot be overridden. If limits are hit, you get a partial export with clear error messages—never silently truncated data.

✅ Validation

Test Results (v0.1.0 - Feb 2026)

Platform	Test Type	Turns	Result	Confidence
Claude.ai	ARCHIVETEST	8/8	✅ Pass	0.95
ChatGPT	ARCHIVETEST	8/8	✅ Pass	0.99
Gemini	ARCHIVETEST	8/8	✅ Pass	0.99
Grok	ARCHIVETEST	8/8	✅ Pass	0.95
Grok	Real conversation	32/32	✅ Pass	0.95
Grok-X	ARCHIVETEST	8/8	✅ Pass	0.95

Tested on: Windows 11 (Chrome 132), macOS (Safari for DOM analysis, Chrome for validation)

ARCHIVETEST Conversation (standardized validation):

USER:    "ARCHIVETEST: What are the three primary colors?"
ASST:    [response about red, blue, yellow / red, green, blue]
USER:    "ARCHIVETEST: Can you write a short 2-line poem about rain?"
ASST:    [short poem response]
USER:    "ARCHIVETEST: What is 42 * 17?"
ASST:    [714, possibly with explanation]
USER:    "ARCHIVETEST: Summarize the previous conversation in one sentence."
ASST:    [summary response]

All platforms: Zero extraction errors, zero integrity warnings, all roles correctly classified.

📚 Documentation

Detailed technical documentation is available in the repository:

chat-archive-architecture.md - Complete system design, three-pass strategy, export formats, security architecture
DOM_STRATEGY_ANALYSIS.md - How the extension handles platform DOM differences, reverse-engineering methodology
Appendices in architecture.md:
- Appendix A: Grok DOM Analysis (grok.com)
- Appendix B: Grok-X DOM Analysis (x.com/i/grok)
- Appendix C: Claude DOM Analysis
- Appendix D: Gemini DOM Analysis
- Appendix E: ChatGPT DOM Analysis

Each appendix includes:

Turn container structure
Selector stability assessment (HIGH/MEDIUM/LOW ratings)
Role detection signals with confidence scores
Complete extraction implementation
Elements to exclude
Testing checklists

🛠️ Development

Building from Source

The extension uses a simple shell script to concatenate source files:

# Make the build script executable (first time only)
chmod +x build.sh

# Build content.js
./build.sh

On Windows: Use Git Bash (comes with Git for Windows) to run ./build.sh

Project Structure

chat-archive/
├── manifest.json           # Chrome extension manifest (version source of truth)
├── background.js           # Service worker (handles downloads)
├── popup.html              # Extension popup UI
├── popup.js                # Popup logic — platform detection, export trigger
├── content.js              # BUILT FILE — generated by build.sh, do not edit directly
├── build.sh                # Build script — concatenates src/ into content.js
├── test_v021_patch.js      # Unit tests for v0.2.1 HTML escaping patch
├── src/
│   ├── content.js          # Main orchestrator — routes extraction by platform
│   ├── utils/
│   │   ├── constants.js    # ⚠️ Shared utilities module (not just constants!)
│   │   │                   #    Contains: SAFETY_LIMITS, EXTENSION_VERSION,
│   │   │                   #    detectPlatform(), wait(), testClipboardAccess(),
│   │   │                   #    findActionButton(), clickCopyAndRead(),
│   │   │                   #    scrollToLoadAll(), findScrollableAncestor(),
│   │   │                   #    flagIfOversized(), platformToDisplayName()
│   │   │                   #    Used by ALL five extractors — edit with care
│   │   ├── serializer.js   # JSON + Markdown export formatting
│   │   └── filewriter.js   # Download via Chrome API + blob fallback
│   ├── extractors/
│   │   ├── claude.js       # Claude.ai extraction (Pass 0 + direct fallback)
│   │   ├── chatgpt.js      # ChatGPT extraction (Pass 0 + direct fallback)
│   │   ├── gemini.js       # Gemini extraction (Pass 0 + direct fallback)
│   │   ├── grok.js         # Grok (grok.com) extraction
│   │   └── grok-x.js       # Grok (x.com/i/grok) — different DOM, separate extractor
│   └── heuristics/
│       └── pass1.js        # Pass 1 structural classification (7 heuristic rules)
└── docs/
    ├── chat-archive-architecture.md  # Full system design and DOM analysis
    └── DOM_STRATEGY_ANALYSIS.md      # Platform selector strategy and fragility notes

⚠️ Important: src/utils/constants.js is the shared utilities module for the entire codebase. Despite its name, it contains DOM helpers, scroll handling, clipboard utilities, and button-finding logic used by all five platform extractors. When patching or replacing this file, always diff against the existing version first — stripping it to just constants will break extraction on all platforms. See: v0.2.1 post-mortem for context.

Building from Source

# Make the build script executable (first time only — Mac/Linux)
chmod +x build.sh

# Build content.js from src/ files
./build.sh

# On Windows — Git Bash is required (comes with Git for Windows)
& "C:\Program Files\Git\bin\bash.exe" build.sh

Build discipline: Always run build.sh before committing. The root content.js is what Chrome loads — committing src/ changes without rebuilding leaves the repo in an inconsistent state where source and built artifact are out of sync.

After rebuilding, reload the extension by fully removing and re-adding it in chrome://extensions/ or edge://extensions/. The reload button can cache stale builds.

Testing

Make changes to files in src/
Run build.sh to rebuild root content.js
Remove Chat Archive from chrome://extensions/ entirely
Load unpacked → select the repo root folder
Confirm the version number on the extension card matches manifest.json
Run the ARCHIVETEST conversation on target platform (see architecture doc)
Export both JSON and Markdown, validate both files
Check browser console for [Chat Archive] log messages

Changelog

Version	Date	Change
v0.2.1	Feb 2026	Fix: escape HTML entities in Markdown serializer. Patch: restore shared utilities stripped from constants.js during initial patch.
v0.2.0	Feb 2026	Phase 2: 5-platform clipboard extraction, Markdown export, heuristic classification
v0.1.0	Feb 2026	Phase 1: Claude.ai only, direct text extraction, JSON export

🗺️ Roadmap

v1.0 (Current - Phase 1 Complete)

✅ Pass 0 (clipboard) + Pass 1 (heuristics) extraction
✅ 5 platform support (Claude, ChatGPT, Gemini, Grok, Grok-X)
✅ JSON and Markdown export
✅ Safety limits and integrity checks
✅ Cross-platform validation (Windows/Mac)

v1.1 (Next)

User resolution UI for uncertain classifications
Batch export (multiple conversations from platform history pages)
Firefox port (minimal namespace polyfill for Manifest V3)
Optional ML micro-classifier (<500KB, explicit user consent)
Enhanced metadata (timestamps, model versions, regeneration tracking)

v1.2 (Future)

Artifact extraction (Claude's code/document artifacts)
Image description preservation (multimodal conversations)
Custom export templates (user-defined JSON schemas)
Conversation diffing (compare versions, track edits)

v2.0 (Vision)

Safari Web Extension (macOS/iOS if API support improves)
Encrypted export (optional password protection)
Cloud sync integration (Google Drive, Dropbox, GitHub Gists)
Conversation search and indexing

🤝 Contributing

Contributions are welcome! This project is in active development.

Areas Where Help Is Needed

Platform Testing
- Test on different browsers (Edge, Brave, Opera, Vivaldi)
- Validate on different OS versions
- Report DOM changes when platforms update their UI
New Platform Support
- Perplexity.ai
- Pi (Inflection)
- Character.AI
- Poe (multiple models)
Feature Development
- User resolution UI (flagged turns)
- Batch export from conversation history pages
- Firefox compatibility layer
Documentation
- Video tutorials
- Troubleshooting guide
- Translation to other languages

How to Contribute

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Run tests (create ARCHIVETEST conversation, export, validate)
Commit (git commit -m 'Add amazing feature')
Push to your fork (git push origin feature/amazing-feature)
Open a Pull Request

Reporting Issues

When platforms update their UI and extraction breaks:

Open an issue with:
- Platform name and URL
- Browser and OS version
- Error message from browser console (look for [Chat Archive] logs)
- Screenshots of the page structure (right-click → Inspect Element)
Include a minimal reproduction:
- Create a simple 2-turn conversation
- Attempt export
- Share what happened vs. what you expected

📄 License

[Choose your license - MIT, Apache 2.0, GPL, etc.]

This project is licensed under the License - see the LICENSE file for details.

🙏 Acknowledgments

Inspired by the need for durable conversation archives in the age of AI assistants
DOM analysis methodology informed by Chat Export for Claude and similar projects
Built with love for the AI power-user community

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: [john@fxops.ai]

Remember: This extension operates in archive state, not conversation state. The conversations you export become addressable artifacts with destinations beyond the chat interface. Use them wisely.

Chat Archive v0.1.0 - Extracting ephemeral conversations into durable knowledge.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
icons		icons
src		src
DOM_STRATEGY_ANALYSIS.md		DOM_STRATEGY_ANALYSIS.md
LICENSE		LICENSE
README.md		README.md
background.js		background.js
build.sh		build.sh
chat-archive-architecture.md		chat-archive-architecture.md
content.js		content.js
gitattributes		gitattributes
manifest.json		manifest.json
popup.html		popup.html
popup.js		popup.js
test_v021_patch.js		test_v021_patch.js

Folders and files

Latest commit

History

Repository files navigation

Chat Archive

✨ Features

🚀 Quick Start

Installation

Usage

🏗️ Architecture

Three-Pass Extraction Strategy

Pass 0: Clipboard Extraction (Primary - 95%+ success rate)

Pass 1: Structural Heuristics (Fallback)

Pass 2: ML Classification (Deferred to v1.1+)

Export Formats

🔒 Security & Privacy

Core Guarantee

Permissions Explained

Safety Limits (Circuit Breakers)

✅ Validation

Test Results (v0.1.0 - Feb 2026)

📚 Documentation

🛠️ Development

Building from Source

Project Structure

Building from Source

Testing

Changelog

🗺️ Roadmap

v1.0 (Current - Phase 1 Complete)

v1.1 (Next)

v1.2 (Future)

v2.0 (Vision)

🤝 Contributing

Areas Where Help Is Needed

How to Contribute

Reporting Issues

📄 License

🙏 Acknowledgments

📞 Support

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages