Abstracts Explorer

A package to download conference data and search it with LLM-based semantic search, including document retrieval and question answering.

Features

  • 📥 Download conference data from various sources (NeurIPS, ICLR, ICML, ML4PS)
  • 💾 Store data in a SQL database (SQLite or PostgreSQL) with efficient indexing
  • 🔍 Search papers by keywords, track, and other attributes
  • 🤖 Generate text embeddings for semantic search
  • 🔎 Find similar papers using AI-powered semantic similarity
  • 💬 Interactive RAG chat to ask questions about papers
  • 🎨 NEW: Cluster and visualize paper embeddings with interactive plots
  • 🌐 Web interface for browsing and searching papers
  • 🔌 NEW: MCP server for LLM-based cluster analysis
  • 🗄️ NEW: Multi-database backend support (SQLite and PostgreSQL)
  • ⚙️ Environment-based configuration with .env file support

Installation

Quick Start with Docker/Podman 🐳

The easiest way to get started with a complete stack (PostgreSQL + ChromaDB):

First create a .env file with your blablador token:

LLM_BACKEND_AUTH_TOKEN=your_blablador_token_here

Then download docker-compose.yml:

curl -o docker-compose.yml https://raw.githubusercontent.com/thawn/abstracts-explorer/refs/heads/main/docker-compose.yml

Then start the services with:

# Using Podman (recommended)
podman-compose up -d

# Or using Docker
docker-compose up -d

# Access at http://localhost:5000

The Docker Compose setup includes:

  • Web UI on port 5000 (exposed)
  • PostgreSQL for paper metadata (internal only)
  • ChromaDB for semantic search (internal only)
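
Once the containers are running, you can quickly check that the web UI answers on the exposed port. A minimal sketch using only the Python standard library (the URL is the one listed above):

import urllib.request

# The web UI is the only service exposed by the compose stack (port 5000).
with urllib.request.urlopen("http://localhost:5000", timeout=5) as response:
    print(f"Web UI responded with HTTP {response.status}")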

📖 Complete Docker/Podman Guide

Note: The container images use pre-built static vendor files. Node.js is only needed for local development if you want to rebuild CSS/JS libraries.

Traditional Installation

Requirements: Python 3.11+, uv package manager, Node.js 14+ (for web UI development)

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/thawn/abstracts-explorer.git
cd abstracts-explorer

# Install dependencies
uv sync --all-extras

# Install Node.js dependencies for web UI
npm install
npm run install:vendor

📖 Full Installation Guide

Configuration

Create a .env file to customize settings:

cp .env.example .env
# Edit .env with your preferred settings
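
The values in .env are picked up as environment variables at runtime. A minimal sketch of how you can inspect them yourself, assuming python-dotenv-style loading (the package's own configuration module may load .env differently):

import os
from dotenv import load_dotenv  # assumption: python-dotenv is installed

# Load key=value pairs from .env into the process environment.
load_dotenv()

# PAPER_DB and LLM_BACKEND_AUTH_TOKEN are the settings used elsewhere in this README.
print("PAPER_DB =", os.getenv("PAPER_DB", "data/abstracts.db"))
print("auth token set:", bool(os.getenv("LLM_BACKEND_AUTH_TOKEN")))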

📖 Configuration Guide - Complete list of settings and options

Database Backend

Abstracts Explorer supports both SQLite and PostgreSQL backends:

# Option 1: SQLite (default, no additional setup required)
PAPER_DB=data/abstracts.db

# Option 2: PostgreSQL (requires PostgreSQL server)
PAPER_DB=postgresql://user:password@localhost/abstracts

PostgreSQL Setup:

# Install PostgreSQL support
uv sync --extra postgres

# Create database
createdb abstracts

# Configure in .env
PAPER_DB=postgresql://user:password@localhost/abstracts
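
To verify that the PostgreSQL backend is picked up, you can lean on the environment-based configuration described above. This is a sketch only; it assumes DatabaseManager reads PAPER_DB from the environment, as the no-argument usage later in this README suggests:

import os

# Assumption: DatabaseManager honours the PAPER_DB environment variable.
os.environ["PAPER_DB"] = "postgresql://user:password@localhost/abstracts"

from abstracts_explorer import DatabaseManager

with DatabaseManager() as db:
    db.create_tables()  # fails early if the connection string is wrong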

📖 See Configuration Guide for more database options

Quick Start

Download Conference Data

# Download NeurIPS 2025 papers
abstracts-explorer download --year 2025

Generate Embeddings for Semantic Search

# Requires LM Studio running with embedding model loaded
abstracts-explorer create-embeddings

Cluster and Visualize Embeddings

# Cluster embeddings using K-Means (PCA reduction)
abstracts-explorer cluster-embeddings --n-clusters 8 --output clusters.json

# Cluster using t-SNE and DBSCAN
abstracts-explorer cluster-embeddings \
  --reduction-method tsne \
  --clustering-method dbscan \
  --eps 0.5 \
  --min-samples 5 \
  --output clusters.json

# Cluster using Agglomerative with distance threshold
abstracts-explorer cluster-embeddings \
  --clustering-method agglomerative \
  --distance-threshold 5.0 \
  --output clusters.json

# Cluster using Spectral clustering
abstracts-explorer cluster-embeddings \
  --clustering-method spectral \
  --n-clusters 10 \
  --output clusters.json

# The web UI includes an interactive cluster visualization tab!
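
The --output file is plain JSON, so you can also inspect it outside the web UI. A short sketch that summarizes cluster sizes, assuming the file mirrors the structure returned by perform_clustering in the Python API examples below (a list of points, each carrying a cluster label):

import json
from collections import Counter

with open("clusters.json") as f:
    clusters = json.load(f)

# Count how many papers landed in each cluster (with DBSCAN, label -1 conventionally marks noise).
sizes = Counter(point["cluster"] for point in clusters["points"])
for label, size in sorted(sizes.items()):
    print(f"Cluster {label}: {size} papers")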

Start MCP Server for Cluster Analysis

# Start MCP server for LLM-based cluster analysis
abstracts-explorer mcp-server

# The MCP server provides tools to analyze clustered papers:
# - Get most frequently mentioned topics
# - Analyze topic evolution over years
# - Find recent developments in topics
# - Generate cluster visualizations

NEW: MCP clustering tools are now integrated directly into the RAG chat! The LLM automatically uses them when appropriate to answer questions about topics, trends, and developments, so there is no need to run a separate MCP server for RAG chat usage.

Start Web Interface

abstracts-explorer web-ui
# Open http://127.0.0.1:5000 in your browser

📖 Usage Guide - Detailed examples and workflows
📖 CLI Reference - Complete command-line documentation
📖 API Reference - Python API documentation

Web Interface

The web UI provides an intuitive interface for browsing and searching papers:

  • πŸ” Search: Keyword and AI-powered semantic search
  • πŸ’¬ Chat: Interactive RAG chat with query rewriting
  • ⭐ Ratings: Save and organize interesting papers
  • πŸ“Š Filters: Filter by track, decision, event type, and more
  • 🎨 Clusters: Interactive visualization of paper embeddings (NEW!)

Web UI Screenshot: The web interface provides an intuitive way to search and explore conference papers.

Python API Examples

Download and Search Papers

from abstracts_explorer.plugins import get_plugin
from abstracts_explorer import DatabaseManager

# Download papers
neurips_plugin = get_plugin('neurips')
papers_data = neurips_plugin.download(year=2025)

# Load into database and search
with DatabaseManager() as db:
    db.create_tables()
    db.add_papers(papers_data)
    
    # Search papers
    papers = db.search_papers(keyword="deep learning", limit=5)
    for paper in papers:
        print(f"{paper['title']} by {paper['authors']}")

Semantic Search with Embeddings

from abstracts_explorer import EmbeddingsManager

with EmbeddingsManager() as em:
    em.create_collection()
    em.embed_from_database()
    
    # Find similar papers
    results = em.search_similar(
        "transformers for natural language processing",
        n_results=5
    )
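
The exact shape of results depends on the vector store backend; if it follows ChromaDB's query format (parallel lists of documents, metadatas, and distances), you could print the matches like this (a sketch with assumed field names):

# Assumption: results follows ChromaDB's query result layout.
for doc, meta, dist in zip(
    results["documents"][0],
    results["metadatas"][0],
    results["distances"][0],
):
    print(f"{meta.get('title', 'untitled')} (distance: {dist:.3f})")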

Cluster and Visualize Embeddings

from abstracts_explorer.clustering import perform_clustering

# Perform complete clustering pipeline
results = perform_clustering(
    reduction_method="tsne",      # or "pca", "umap"
    n_components=2,
    clustering_method="kmeans",    # or "dbscan", "agglomerative", "spectral", "fuzzy_cmeans"
    n_clusters=8,
    output_path="clusters.json"
)

# Access clustering results
print(f"Found {results['statistics']['n_clusters']} clusters")
for point in results['points']:
    print(f"Paper: {point['title']} -> Cluster {point['cluster']}")

📖 Complete Usage Guide - More examples and workflows

Documentation

📚 Full Documentation - Complete documentation built with Sphinx

Quick Links

Development

# Install with development dependencies
uv sync --all-extras

# Run tests
uv run pytest

# Run linters
ruff check src/ tests/
mypy src/ --ignore-missing-imports

📖 Contributing Guide - Complete development documentation

Contributing

Contributions are welcome! Please read our Contributing Guide for details on:

  • Development setup
  • Running tests and linters
  • Code style and conventions
  • Submitting pull requests

License

Apache License 2.0 - see LICENSE file for details.

Support

For issues, questions, or contributions, please open an issue or pull request on the GitHub repository.
