A package to download conference data and explore it with LLM-based semantic search, including document retrieval and question answering.
- 📥 Download conference data from various sources (NeurIPS, ICLR, ICML, ML4PS)
- 💾 Store data in SQL database (SQLite or PostgreSQL) with efficient indexing
- 🔍 Search papers by keywords, track, and other attributes
- 🤖 Generate text embeddings for semantic search
- 🔎 Find similar papers using AI-powered semantic similarity
- 💬 Interactive RAG chat to ask questions about papers
- 🎨 NEW: Cluster and visualize paper embeddings with interactive plots
- 🌐 Web interface for browsing and searching papers
- 🔌 NEW: MCP server for LLM-based cluster analysis
- 🗄️ NEW: Multi-database backend support (SQLite and PostgreSQL)
- ⚙️ Environment-based configuration with `.env` file support
The easiest way to get started with a complete stack (PostgreSQL + ChromaDB) is via Docker Compose.
First create a `.env` file with your blablador token:

```
LLM_BACKEND_AUTH_TOKEN=your_blablador_token_here
```

Then download `docker-compose.yml`:

```bash
curl -o docker-compose.yml https://raw.githubusercontent.com/thawn/abstracts-explorer/refs/heads/main/docker-compose.yml
```

Then start the services with:
```bash
# Using Podman (recommended)
podman-compose up -d

# Or using Docker
docker-compose up -d

# Access at http://localhost:5000
```

The Docker Compose setup includes (sketched below):

- Web UI on port 5000 (exposed)
- PostgreSQL for paper metadata (internal only)
- ChromaDB for semantic search (internal only)
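For orientation, the stack defined by that file looks roughly like the sketch below; the service and image names here are illustrative assumptions, so treat the downloaded `docker-compose.yml` as the source of truth:

```yaml
# Illustrative sketch, not the project's actual compose file.
services:
  web:
    image: abstracts-explorer:latest   # assumed image name
    ports:
      - "5000:5000"                    # only the web UI is exposed on the host
    env_file: .env                     # supplies LLM_BACKEND_AUTH_TOKEN
    depends_on:
      - postgres
      - chromadb
  postgres:
    image: postgres:16                 # paper metadata, internal only
  chromadb:
    image: chromadb/chroma             # semantic search index, internal only
```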
📖 Complete Docker/Podman Guide
Note: The container images use pre-built static vendor files. Node.js is only needed for local development if you want to rebuild CSS/JS libraries.
Requirements: Python 3.11+, uv package manager, Node.js 14+ (for web UI development)
```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/thawn/abstracts-explorer.git
cd abstracts-explorer

# Install dependencies
uv sync --all-extras

# Install Node.js dependencies for web UI
npm install
npm run install:vendor
```

Create a `.env` file to customize settings:

```bash
cp .env.example .env
# Edit .env with your preferred settings
```

📖 Configuration Guide - Complete list of settings and options
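For example, a minimal `.env` might contain just the two variables used elsewhere in this README (values are placeholders):

```
LLM_BACKEND_AUTH_TOKEN=your_blablador_token_here
PAPER_DB=data/abstracts.db
```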
Abstracts Explorer supports both SQLite and PostgreSQL backends:
```bash
# Option 1: SQLite (default, no additional setup required)
PAPER_DB=data/abstracts.db

# Option 2: PostgreSQL (requires PostgreSQL server)
PAPER_DB=postgresql://user:password@localhost/abstracts
```

PostgreSQL Setup:

```bash
# Install PostgreSQL support
uv sync --extra postgres

# Create database
createdb abstracts

# Configure in .env
PAPER_DB=postgresql://user:password@localhost/abstracts
```

📖 See Configuration Guide for more database options
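From Python, the backend is selected the same way; a minimal sketch, assuming `DatabaseManager` picks up `PAPER_DB` from the environment (consistent with the `.env`-based configuration described above):

```python
# Sketch: select the PostgreSQL backend before opening the database.
# Assumption: DatabaseManager honours the PAPER_DB environment variable.
import os
from abstracts_explorer import DatabaseManager

os.environ["PAPER_DB"] = "postgresql://user:password@localhost/abstracts"

with DatabaseManager() as db:
    db.create_tables()  # same API regardless of backend
```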
```bash
# Download NeurIPS 2025 papers
abstracts-explorer download --year 2025
```

```bash
# Requires LM Studio running with embedding model loaded
abstracts-explorer create-embeddings
```

```bash
# Cluster embeddings using K-Means (PCA reduction)
abstracts-explorer cluster-embeddings --n-clusters 8 --output clusters.json

# Cluster using t-SNE and DBSCAN
abstracts-explorer cluster-embeddings \
  --reduction-method tsne \
  --clustering-method dbscan \
  --eps 0.5 \
  --min-samples 5 \
  --output clusters.json

# Cluster using Agglomerative with distance threshold
abstracts-explorer cluster-embeddings \
  --clustering-method agglomerative \
  --distance-threshold 5.0 \
  --output clusters.json

# Cluster using Spectral clustering
abstracts-explorer cluster-embeddings \
  --clustering-method spectral \
  --n-clusters 10 \
  --output clusters.json

# The web UI includes an interactive cluster visualization tab!
```

```bash
# Start MCP server for LLM-based cluster analysis
abstracts-explorer mcp-server

# The MCP server provides tools to analyze clustered papers:
# - Get most frequently mentioned topics
# - Analyze topic evolution over years
# - Find recent developments in topics
# - Generate cluster visualizations
```

NEW: MCP clustering tools are now automatically integrated into the RAG chat! The LLM will automatically use clustering tools when appropriate to answer questions about topics, trends, and developments. There is no need to run a separate MCP server for RAG chat usage.
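If you do want to attach the MCP server to an external LLM client, registration typically looks like the sketch below; the exact file location and schema depend on the client, and this example assumes the widely used `mcpServers` JSON layout:

```json
{
  "mcpServers": {
    "abstracts-explorer": {
      "command": "abstracts-explorer",
      "args": ["mcp-server"]
    }
  }
}
```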
```bash
abstracts-explorer web-ui
# Open http://127.0.0.1:5000 in your browser
```

📖 Usage Guide - Detailed examples and workflows
📖 CLI Reference - Complete command-line documentation
📖 API Reference - Python API documentation
The web UI provides an intuitive interface for browsing and searching papers:
- 🔍 Search: Keyword and AI-powered semantic search
- 💬 Chat: Interactive RAG chat with query rewriting
- ⭐ Ratings: Save and organize interesting papers
- 🗂 Filters: Filter by track, decision, event type, and more
- 🎨 Clusters: Interactive visualization of paper embeddings (NEW!)
```bash
abstracts-explorer web-ui
# Open http://127.0.0.1:5000
```

The web interface provides an intuitive way to search and explore conference papers.
```python
from abstracts_explorer.plugins import get_plugin
from abstracts_explorer import DatabaseManager

# Download papers
neurips_plugin = get_plugin('neurips')
papers_data = neurips_plugin.download(year=2025)

# Load into database and search
with DatabaseManager() as db:
    db.create_tables()
    db.add_papers(papers_data)

    # Search papers
    papers = db.search_papers(keyword="deep learning", limit=5)
    for paper in papers:
        print(f"{paper['title']} by {paper['authors']}")
```

```python
from abstracts_explorer import EmbeddingsManager

with EmbeddingsManager() as em:
    em.create_collection()
    em.embed_from_database()

    # Find similar papers
    results = em.search_similar(
        "transformers for natural language processing",
        n_results=5
    )
```

```python
from abstracts_explorer.clustering import perform_clustering

# Perform complete clustering pipeline
results = perform_clustering(
    reduction_method="tsne",      # or "pca", "umap"
    n_components=2,
    clustering_method="kmeans",   # or "dbscan", "agglomerative", "spectral", "fuzzy_cmeans"
    n_clusters=8,
    output_path="clusters.json"
)

# Access clustering results
print(f"Found {results['statistics']['n_clusters']} clusters")
for point in results['points']:
    print(f"Paper: {point['title']} -> Cluster {point['cluster']}")
```

📖 Complete Usage Guide - More examples and workflows
📖 Full Documentation - Complete documentation built with Sphinx
- Installation Guide - Detailed installation instructions
- Docker/Podman Guide - Container deployment with Docker and Podman
- Usage Guide - Examples and workflows
- Configuration Guide - Environment variables and settings
- CLI Reference - Command-line interface documentation
- Plugins Guide - Plugin system and conference downloaders
- API Reference - Python API documentation
- Contributing Guide - Development setup and guidelines
```bash
# Install with development dependencies
uv sync --all-extras

# Run tests
uv run pytest

# Run linters
ruff check src/ tests/
mypy src/ --ignore-missing-imports
```

📖 Contributing Guide - Complete development documentation
Contributions are welcome! Please read our Contributing Guide for details on:
- Development setup
- Running tests and linters
- Code style and conventions
- Submitting pull requests
Apache License 2.0 - see LICENSE file for details.
For issues, questions, or contributions:
- 🐛 Report issues
- 💬 Discussions
- 📧 Contact the maintainers