Aparture

Bringing the arXiv into focus.

Aparture is a multi-stage research paper discovery and analysis tool that uses large language models (LLMs) to search arXiv and surface the preprints that matter for your research interests.

Documentation

📚 View Full Documentation (deployed automatically)

The complete documentation includes:

Getting Started - Installation, setup, and quick start guide
User Guide - Web interface, CLI automation, testing, and reports
Concepts - Multi-stage analysis, arXiv categories, model selection, NotebookLM
API Reference - CLI commands, configuration, environment variables

Running Documentation Locally

npm run docs:dev      # Start dev server at http://localhost:5173
npm run docs:build    # Build static site
npm run docs:preview  # Preview built site

Features

Core Workflow

Multi-stage filtering: Quick filter → Abstract scoring → PDF analysis
Flexible processing: Process papers from multiple arXiv categories simultaneously
Smart scoring: 0-10 scale relevance scoring with detailed justifications
Deep analysis: Full PDF content analysis for top papers with vision-capable models
Post-processing: Optional second-pass scoring for consistency
Report generation: Comprehensive markdown reports with all analyses
Podcast creation: Generate NotebookLM-optimized documents for AI podcasts
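The funnel described above can be sketched roughly as follows. This is an illustration, not Aparture's actual code: `runPipeline`, the `stages` object, and the `scoreThreshold` and `maxPdfPapers` defaults are all hypothetical names and values chosen for the example.

```javascript
// Hypothetical three-stage funnel: quick filter -> abstract scoring -> PDF analysis.
// The stage functions are passed in, so a real LLM client or a mock can back them.
async function runPipeline(papers, stages, options = {}) {
  const { scoreThreshold = 7, maxPdfPapers = 5 } = options; // assumed defaults

  // Stage 1: cheap yes/no screen to cut volume.
  const kept = [];
  for (const paper of papers) {
    if (await stages.quickFilter(paper)) kept.push(paper);
  }

  // Stage 2: score each surviving abstract on a 0-10 scale with a justification.
  const scored = [];
  for (const paper of kept) {
    const { score, justification } = await stages.scoreAbstract(paper);
    scored.push({ ...paper, score, justification });
  }

  // Stage 3: deep analysis only for the top-scoring papers.
  const top = scored
    .filter((p) => p.score >= scoreThreshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxPdfPapers);

  const analyzed = [];
  for (const paper of top) {
    analyzed.push({ ...paper, analysis: await stages.analyzePdf(paper) });
  }
  return analyzed;
}
```

Each stage only sees what the previous one let through, so the most expensive step (PDF analysis) runs on the fewest papers.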

Advanced Features

Testing modes: Dry run with mock data and minimal API testing
Visual indicators: Clear "TEST MODE" badges when using simulated data
Customizable models: Choose different LLMs for each processing stage
Batch processing: Efficient API usage with configurable batch sizes
Research criteria: Fully editable prompts for domain-specific analysis
Error handling: Automatic retries and correction mechanisms
CLI automation: Complete end-to-end automation from analysis to podcast generation
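As a rough illustration of configurable batch sizes, the helper below splits a list of papers into fixed-size groups so that each group can go out in a single API request. The name `toBatches` is hypothetical; this is a sketch of the general idea, not the project's implementation.

```javascript
// Split a list into consecutive batches of at most `size` items.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("batch size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

The last batch may be shorter than the others when the list length is not a multiple of the batch size.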

Supported Models

Anthropic: Claude Opus 4.1, Claude Sonnet 4.5, Claude Haiku 4.5
OpenAI: GPT-5, GPT-5 Mini, GPT-5 Nano
Google: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite
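Since a different model can be chosen for each stage, a per-stage assignment might look something like the sketch below. The model names come from the list above; the stage keys (`quickFilter`, `abstractScoring`, `pdfAnalysis`), the helper functions, and the particular assignment are assumptions for illustration and are not Aparture's defaults or configuration format.

```javascript
// Model names as listed above, grouped by provider.
const SUPPORTED_MODELS = {
  Anthropic: ["Claude Opus 4.1", "Claude Sonnet 4.5", "Claude Haiku 4.5"],
  OpenAI: ["GPT-5", "GPT-5 Mini", "GPT-5 Nano"],
  Google: ["Gemini 2.5 Pro", "Gemini 2.5 Flash", "Gemini 2.5 Flash-Lite"],
};

// Return the provider that offers `model`, or null if it is not listed.
function providerFor(model) {
  for (const [provider, models] of Object.entries(SUPPORTED_MODELS)) {
    if (models.includes(model)) return provider;
  }
  return null;
}

// Illustrative assignment (hypothetical stage keys): lighter models for the
// high-volume early stages, a stronger model for PDF analysis.
const stageModels = {
  quickFilter: "Gemini 2.5 Flash-Lite",
  abstractScoring: "Gemini 2.5 Flash",
  pdfAnalysis: "Gemini 2.5 Pro",
};

// List the stages whose configured model is not in SUPPORTED_MODELS.
function unsupportedStages(config) {
  return Object.entries(config)
    .filter(([, model]) => providerFor(model) === null)
    .map(([stage]) => stage);
}
```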

Quick Start

1. Install Dependencies

npm install

2. Set Up Environment Variables

Create a .env.local file:

# Access Password (required)
ACCESS_PASSWORD=your-secure-password-here

# API Keys (at least one required)
CLAUDE_API_KEY=sk-ant-your-api-key-here        # From https://console.anthropic.com/
OPENAI_API_KEY=sk-your-openai-key-here         # From https://platform.openai.com/
GOOGLE_AI_API_KEY=your-google-ai-key-here      # From https://aistudio.google.com/apikey
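The requirements above (a password plus at least one API key) could be checked at startup with something like the following. The variable names match the example file; the `checkEnv` function is a hypothetical helper, not part of Aparture.

```javascript
// Variable names follow the .env.local example; the helper itself is hypothetical.
const API_KEY_VARS = ["CLAUDE_API_KEY", "OPENAI_API_KEY", "GOOGLE_AI_API_KEY"];

function checkEnv(env) {
  const problems = [];
  if (!env.ACCESS_PASSWORD) {
    problems.push("ACCESS_PASSWORD is required");
  }
  // At least one provider key must be present and non-empty.
  const available = API_KEY_VARS.filter((name) => Boolean(env[name]));
  if (available.length === 0) {
    problems.push("set at least one of: " + API_KEY_VARS.join(", "));
  }
  return { ok: problems.length === 0, problems, available };
}
```

In a Node.js process this would be called as `checkEnv(process.env)` once the environment file has been loaded.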

3. Run Locally

Web Interface (recommended for first-time users):

npm run dev

Open http://localhost:3000 in your browser.

CLI Automation (for frequent users):

# First time: Interactive setup
npm run setup

# Test configuration
npm run test:dryrun    # Mock API test (free)
npm run test:minimal   # Real API test (~$0.50)

# Run analysis
npm run analyze        # Full workflow: report + document + podcast

See the CLI Automation Guide for details.

Web Interface

Basic Workflow

1. Enter your password to access the app
2. Select arXiv categories you want to search
3. Configure your research interests (used for relevance scoring)
4. (Optional) Enable quick filtering to reduce paper volume
5. Select AI models for each processing stage
6. Click "Start Analysis" to begin processing

Testing Your Configuration

Dry Run Test: Complete workflow with mock API responses (no costs)
Minimal API Test: Test with 3 real papers to verify API integration (~$0.10-0.50)
Look for "TEST MODE" badges to confirm you're using simulated data
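One way to picture the dry-run mode is a scorer that swaps the real API call for a canned response and tags the result so the interface can show a "TEST MODE" badge. The sketch below is an assumption about how such a switch could work; `makeScorer`, `callModel`, and the `testMode` flag are hypothetical names.

```javascript
// Hypothetical scorer factory. In dry-run mode no API is called: a canned
// score is returned and flagged with testMode so the UI can badge it.
function makeScorer({ dryRun, callModel }) {
  if (dryRun) {
    return async (paper) => ({
      score: 5,
      justification: "mock response for " + paper.id,
      testMode: true,
    });
  }
  // Real mode: delegate to the supplied model call and mark the result as live.
  return async (paper) => ({ ...(await callModel(paper)), testMode: false });
}
```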

See the Testing Guide for more details.

Generating Reports

Download Report: Export comprehensive markdown analysis
NotebookLM Integration:
- Select target podcast duration (5-30 minutes)
- Choose generation model
- Generate structured document optimized for podcast creation
- Upload to NotebookLM for audio generation

See the Reports Guide for more information.

Command Line Interface

For frequent users who prefer automation, Aparture includes comprehensive CLI tools.

Prerequisites

# Install dependencies
npm install

# Install Playwright (first time only)
npx playwright install chromium

First-Time Setup

# Interactive configuration wizard
npm run setup

This opens a browser UI where you can:

Select arXiv categories to monitor
Choose AI models for each processing stage
Set score thresholds and batch sizes
Configure NotebookLM podcast duration
Define your research interests

Settings are saved automatically for future runs.

Testing

# Mock API test (fast, no costs)
npm run test:dryrun

# Real API test with 3 papers (minimal cost ~$0.50)
npm run test:minimal

Running Analyses

# Full workflow (report + document + podcast)
npm run analyze

# Specific workflows
npm run analyze:report     # Report only (skip NotebookLM features)
npm run analyze:document   # Report + NotebookLM document (skip podcast)
npm run analyze:podcast    # Podcast only (use existing files)

First Run: Google will prompt you to log in for NotebookLM authentication. Your session will be cached for future runs.

All outputs are saved to the reports/ directory with dated filenames.
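For illustration, a dated filename could be built as below. The exact naming pattern Aparture uses is not stated here, so the `YYYY-MM-DD_kind.ext` layout and the `datedReportPath` helper are assumptions.

```javascript
// Build a path like "reports/2025-01-31_report.md".
// The naming pattern is an assumption, not Aparture's documented format.
function datedReportPath(date, kind = "report", ext = "md") {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  return `reports/${day}_${kind}.${ext}`;
}
```

Because `toISOString` works in UTC, a run shortly after local midnight may be stamped with the previous or next calendar day depending on the time zone.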

See the CLI Automation Guide for comprehensive documentation.

Deployment

Vercel

# Install Vercel CLI
npm install -g vercel

# Deploy
vercel

# Set environment variables
vercel env add ACCESS_PASSWORD
vercel env add CLAUDE_API_KEY
vercel env add OPENAI_API_KEY
vercel env add GOOGLE_AI_API_KEY

# Deploy to production
vercel --prod

For custom domains:

1. Go to Settings → Domains in the Vercel dashboard
2. Add your custom domain
3. Follow the DNS configuration instructions

See the Deployment Guide for more options.

Security

This app is password-protected to prevent unauthorized use of your API keys. The password is checked on every API call.

API keys are stored in .env.local (local development) or Vercel environment variables (production) and are never exposed to the client.
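A per-request password check of this kind might be sketched as a wrapper around each API handler. This is not the project's code: `withPassword` is a hypothetical helper, and reading the password from `req.body.password` is an assumption about the request shape. The `res.status(...).json(...)` calls follow the response helpers available in Next.js API routes.

```javascript
// Wrap a handler so every request must carry the correct password.
// Reading it from req.body.password is an assumption for illustration.
function withPassword(handler, expectedPassword) {
  return function guarded(req, res) {
    const supplied = req.body && req.body.password;
    // Reject when no password is configured or the supplied one does not match.
    if (!expectedPassword || supplied !== expectedPassword) {
      return res.status(401).json({ error: "Invalid password" });
    }
    return handler(req, res);
  };
}
```

A route would then wrap its handler with `withPassword(handler, process.env.ACCESS_PASSWORD)`, so the API keys on the server are only used after the check passes.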

API Usage Notes

Batch Processing: Abstracts are processed in configurable batches to respect rate limits
PDF Analysis: Direct multimodal analysis without text extraction
Error Recovery: Automatic retries with correction prompts for malformed responses
Cost Optimization:
- Use quick filtering to reduce volume before scoring
- Test with dry run mode before using real APIs
- Choose appropriate models for each stage
Default Models: Google Gemini models are the defaults because of their generous free tier
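The error-recovery idea (retry with a correction prompt when a response is malformed) can be sketched as below. This assumes the model is asked for JSON; `requestJson` and `callModel` are hypothetical names, and the wording of the correction prompt is made up for the example.

```javascript
// Ask the model for JSON; on a malformed reply, retry with a correction
// prompt that quotes the bad output. `callModel` is a hypothetical async
// function that takes a prompt string and returns the raw response text.
async function requestJson(callModel, prompt, maxAttempts = 3) {
  let currentPrompt = prompt;
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(currentPrompt);
    try {
      return JSON.parse(raw);
    } catch (err) {
      lastError = err;
      // Build the next prompt from the original one plus the failed reply.
      currentPrompt =
        prompt +
        "\n\nYour previous reply was not valid JSON:\n" +
        raw +
        "\nReply again with valid JSON only.";
    }
  }
  const reason = lastError ? lastError.message : "no attempts were made";
  throw new Error(`no valid JSON after ${maxAttempts} attempts: ${reason}`);
}
```

Capping the number of attempts keeps a persistently malformed response from turning into an unbounded series of paid API calls.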

See Model Selection Guide for detailed comparisons and cost analysis.

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT

Acknowledgements

Created in collaboration with Claude Sonnet 4/4.5 and Claude Opus 4.1.

Note: This tool was primarily designed to help the author (Josh Speagle) manage daily paper monitoring across multiple arXiv categories (cs, stat, astro-ph) while keeping up with literature across a wide variety of fields.

joshspeagle/aparture
