CARTA is a POC PostgreSQL data pipeline for extracting geometric and topological quantities from AI conversation data. It was built as an early exploration into whether embedding geometry and conversation tree structure carry meaningful signal beyond similarity search. The result was yes, and that exploratory work informed later, more formal detection work in RICCI.
- Processes ChatGPT conversation export JSON, including branching conversation trees
- Parses conversations into relational node and prompt-response pair structures
- Generates embeddings via the OpenAI API
- Stores conversations, nodes, and pairs in PostgreSQL with
pgvector - Supports cross-conversation semantic search and branch/query analysis through SQL functions
This repository is preserved as an honest proof-of-concept. It is runnable and inspectable, but is not presented as a polished production system.
CARTA currently persists 24 live-populated structural and analytic quantities across nodes and pairs, covering lineage, branching, semantic distance, coherence, and cross-conversation querying.
src/carta/parser/: reconstructs ChatGPT export trees into normalized node and pair recordssrc/carta/embedder/: requests embeddings and computes local semantic distance featuressrc/carta/pipeline.py: Python pipeline orchestrating parse -> embed -> store flow
src/carta/db/migrations/001_initial_setup.sql: schema, tables, and base analytical viewssrc/carta/db/migrations/002_indexes.sql: relational, JSONB, and HNSW vector indexessrc/carta/db/migrations/003-007: vector and ancestry-oriented SQL functionssrc/carta/db/migrations/008-010: branch/query analysis functionssrc/carta/db/migrations/011_access_policies.sql: access policies and analytical views
The schema currently ships as 11 SQL migration modules and 20 SQL functions. The paths table is present in the schema but intentionally left unpopulated in the current pipeline.
- Python 3.8+
- Docker Desktop or another local Docker runtime
- OpenAI API key
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .envSet OPENAI_API_KEY in .env.
./scripts/setup_db.shThis starts a local PostgreSQL container, creates the carta database, and applies all 11 migrations.
./scripts/run_sample.shThis loads test_data.json, generates embeddings, and stores the resulting conversation graph in PostgreSQL.
test_data.jsonis a synthetic ChatGPT-style export included for reproducible local testing- Expected sample result: 1 conversation, 11 nodes, 5 pairs
File-only processing:
from src.carta.pipeline import Pipeline
pipeline = Pipeline(store_to_database=False)
conversation_ids = pipeline.process_file("test_data.json", save_intermediates=False)
print(conversation_ids)Database-backed processing:
from src.carta.pipeline import Pipeline
pipeline = Pipeline(store_to_database=True)
conversation_ids = pipeline.process_file("test_data.json", save_intermediates=False)
print(conversation_ids)Proof-of-concept. No longer actively developed. Preserved as an exploratory predecessor to later work.
CARTA established that conversation geometry appears to carry meaningful behavioral signal. RICCI takes that insight further into a more comprehensive detection architecture with a more mature formal frame.
MIT License
Diesel Black, Inside The Black Box LLC