Skip to content

Diesel-Black/carta

Repository files navigation

CARTA - Conversation Archive for Recursive Thought Analysis

CARTA is a POC PostgreSQL data pipeline for extracting geometric and topological quantities from AI conversation data. It was built as an early exploration into whether embedding geometry and conversation tree structure carry meaningful signal beyond similarity search. The result was yes, and that exploratory work informed later, more formal detection work in RICCI.

Functionality

  • Processes ChatGPT conversation export JSON, including branching conversation trees
  • Parses conversations into relational node and prompt-response pair structures
  • Generates embeddings via the OpenAI API
  • Stores conversations, nodes, and pairs in PostgreSQL with pgvector
  • Supports cross-conversation semantic search and branch/query analysis through SQL functions

This repository is preserved as an honest proof-of-concept. It is runnable and inspectable, but is not presented as a polished production system.

Analytical Scope

CARTA currently persists 24 live-populated structural and analytic quantities across nodes and pairs, covering lineage, branching, semantic distance, coherence, and cross-conversation querying.

Architecture

Ingestion

  • src/carta/parser/: reconstructs ChatGPT export trees into normalized node and pair records
  • src/carta/embedder/: requests embeddings and computes local semantic distance features
  • src/carta/pipeline.py: Python pipeline orchestrating parse -> embed -> store flow

Database

  • src/carta/db/migrations/001_initial_setup.sql: schema, tables, and base analytical views
  • src/carta/db/migrations/002_indexes.sql: relational, JSONB, and HNSW vector indexes
  • src/carta/db/migrations/003-007: vector and ancestry-oriented SQL functions
  • src/carta/db/migrations/008-010: branch/query analysis functions
  • src/carta/db/migrations/011_access_policies.sql: access policies and analytical views

The schema currently ships as 11 SQL migration modules and 20 SQL functions. The paths table is present in the schema but intentionally left unpopulated in the current pipeline.

Quick Start

Prerequisites

  • Python 3.8+
  • Docker Desktop or another local Docker runtime
  • OpenAI API key

Setup Python

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp env.example .env

Set OPENAI_API_KEY in .env.

Setup Local Postgres + pgvector

./scripts/setup_db.sh

This starts a local PostgreSQL container, creates the carta database, and applies all 11 migrations.

Run The Included Sample

./scripts/run_sample.sh

This loads test_data.json, generates embeddings, and stores the resulting conversation graph in PostgreSQL.

Sample Data

  • test_data.json is a synthetic ChatGPT-style export included for reproducible local testing
  • Expected sample result: 1 conversation, 11 nodes, 5 pairs

Example Python Usage

File-only processing:

from src.carta.pipeline import Pipeline

pipeline = Pipeline(store_to_database=False)
conversation_ids = pipeline.process_file("test_data.json", save_intermediates=False)
print(conversation_ids)

Database-backed processing:

from src.carta.pipeline import Pipeline

pipeline = Pipeline(store_to_database=True)
conversation_ids = pipeline.process_file("test_data.json", save_intermediates=False)
print(conversation_ids)

Status

Proof-of-concept. No longer actively developed. Preserved as an exploratory predecessor to later work.

Relationship To RICCI

CARTA established that conversation geometry appears to carry meaningful behavioral signal. RICCI takes that insight further into a more comprehensive detection architecture with a more mature formal frame.

License

MIT License

Author

Diesel Black, Inside The Black Box LLC

About

PostgreSQL pipeline for extracting geometric and topological quantities from AI conversation trees. The exploratory predecessor to RICCI; confirmed that embedding geometry carries meaningful behavioral signal and informed the detection architecture that followed.

Topics

Resources

License

Stars

Watchers

Forks

Contributors