Bringing the arXiv into focus.
A multi-stage research paper discovery and analysis tool that uses large language models (LLMs) to search arXiv and surface the preprints that matter for your particular research interests.
View Full Documentation (deployed automatically)
The complete documentation includes:
- Getting Started - Installation, setup, and quick start guide
- User Guide - Web interface, CLI automation, testing, and reports
- Concepts - Multi-stage analysis, arXiv categories, model selection, NotebookLM
- API Reference - CLI commands, configuration, environment variables
npm run docs:dev # Start dev server at http://localhost:5173
npm run docs:build # Build static site
npm run docs:preview  # Preview built site

- Multi-stage filtering: Quick filter → Abstract scoring → PDF analysis
- Flexible processing: Process papers from multiple arXiv categories simultaneously
- Smart scoring: 0-10 scale relevance scoring with detailed justifications
- Deep analysis: Full PDF content analysis for top papers with vision-capable models
- Post-processing: Optional second-pass scoring for consistency
- Report generation: Comprehensive markdown reports with all analyses
- Podcast creation: Generate NotebookLM-optimized documents for AI podcasts
- Testing modes: Dry run with mock data and minimal API testing
- Visual indicators: Clear "TEST MODE" badges when using simulated data
- Customizable models: Choose different LLMs for each processing stage
- Batch processing: Efficient API usage with configurable batch sizes
- Research criteria: Fully editable prompts for domain-specific analysis
- Error handling: Automatic retries and correction mechanisms
- CLI automation: Complete end-to-end automation from analysis to podcast generation
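The multi-stage flow above (quick filter → abstract scoring → PDF analysis) can be sketched roughly as follows. This is an illustrative sketch only: function names like `quickFilter` and `scoreAbstracts` are hypothetical, not Aparture's actual API.

```javascript
// Illustrative three-stage pipeline sketch. All names are hypothetical.

function quickFilter(papers, keywords) {
  // Stage 1: cheap keyword screen to cut volume before any LLM calls.
  return papers.filter((p) =>
    keywords.some((k) => (p.title + " " + p.abstract).toLowerCase().includes(k))
  );
}

function scoreAbstracts(papers, scoreFn) {
  // Stage 2: score each abstract on a 0-10 relevance scale, highest first.
  return papers
    .map((p) => ({ ...p, score: scoreFn(p.abstract) }))
    .sort((a, b) => b.score - a.score);
}

function selectForPdfAnalysis(scored, threshold) {
  // Stage 3: only papers at or above the threshold get full PDF analysis.
  return scored.filter((p) => p.score >= threshold);
}
```

In the real tool the scoring and PDF stages are LLM calls; the point here is just the funnel shape, with each stage narrowing the set passed to the next (and more expensive) one.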
- Anthropic: Claude Opus 4.1, Claude Sonnet 4.5, Claude Haiku 4.5
- OpenAI: GPT-5, GPT-5 Mini, GPT-5 Nano
- Google: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite
npm install

Create a .env.local file:
# Access Password (required)
ACCESS_PASSWORD=your-secure-password-here
# API Keys (at least one required)
CLAUDE_API_KEY=sk-ant-your-api-key-here # From https://console.anthropic.com/
OPENAI_API_KEY=sk-your-openai-key-here # From https://platform.openai.com/
GOOGLE_AI_API_KEY=your-google-ai-key-here  # From https://aistudio.google.com/apikey

Web Interface (recommended for first-time users):
npm run dev

Open http://localhost:3000 in your browser.
CLI Automation (for frequent users):
# First time: Interactive setup
npm run setup
# Test configuration
npm run test:dryrun # Mock API test (free)
npm run test:minimal # Real API test (~$0.50)
# Run analysis
npm run analyze       # Full workflow: report + document + podcast

See the CLI Automation Guide for details.
- Enter your password to access the app
- Select arXiv categories you want to search
- Configure your research interests (used for relevance scoring)
- (Optional) Enable quick filtering to reduce paper volume
- Select AI models for each processing stage
- Click "Start Analysis" to begin processing
- Dry Run Test: Complete workflow with mock API responses (no costs)
- Minimal API Test: Test with 3 real papers to verify API integration (~$0.10-0.50)
- Look for "TEST MODE" badges to confirm you're using simulated data
See the Testing Guide for more details.
- Download Report: Export comprehensive markdown analysis
- NotebookLM Integration:
- Select target podcast duration (5-30 minutes)
- Choose generation model
- Generate structured document optimized for podcast creation
- Upload to NotebookLM for audio generation
See the Reports Guide for more information.
For frequent users who prefer automation, Aparture includes comprehensive CLI tools.
# Install dependencies
npm install
# Install Playwright (first time only)
npx playwright install chromium

# Interactive configuration wizard
npm run setup

This opens a browser UI where you can:
- Select arXiv categories to monitor
- Choose AI models for each processing stage
- Set score thresholds and batch sizes
- Configure NotebookLM podcast duration
- Define your research interests
Settings are saved automatically for future runs.
# Mock API test (fast, no costs)
npm run test:dryrun
# Real API test with 3 papers (minimal cost ~$0.50)
npm run test:minimal

# Full workflow (report + document + podcast)
npm run analyze
# Specific workflows
npm run analyze:report # Report only (skip NotebookLM features)
npm run analyze:document # Report + NotebookLM document (skip podcast)
npm run analyze:podcast   # Podcast only (use existing files)

First Run: Google will prompt you to log in for NotebookLM authentication. Your session will be cached for future runs.
All outputs are saved to the reports/ directory with dated filenames.
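As a sketch of the dated-filename convention described above (the exact naming format Aparture uses may differ, and `datedReportPath` is a hypothetical helper):

```javascript
// Hypothetical sketch of a dated report path; the real format may differ.
function datedReportPath(date, kind) {
  const stamp = date.toISOString().slice(0, 10); // e.g. "2025-01-15"
  return `reports/${stamp}-${kind}.md`;
}
```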
See the CLI Automation Guide for comprehensive documentation.
# Install Vercel CLI
npm install -g vercel
# Deploy
vercel
# Set environment variables
vercel env add ACCESS_PASSWORD
vercel env add CLAUDE_API_KEY
vercel env add OPENAI_API_KEY
vercel env add GOOGLE_AI_API_KEY
# Deploy to production
vercel --prod

For custom domains:
- Go to Settings → Domains in Vercel dashboard
- Add your custom domain
- Follow DNS configuration instructions
See the Deployment Guide for more options.
This app includes password protection to prevent unauthorized use of your API keys; the password is verified on every API call.
API keys are stored in .env.local (local development) or Vercel environment variables (production) and are never exposed to the client.
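As a sketch of the server-side check described above, a handler can compare the password sent with each request against the server-only `ACCESS_PASSWORD` environment variable before doing any work. The handler shape and field names below are assumptions for illustration, not Aparture's actual route code.

```javascript
// Hypothetical sketch: validate the access password on every API call.

function isAuthorized(requestBody, env) {
  // Reject outright if the server has no password configured.
  if (!env.ACCESS_PASSWORD) return false;
  return requestBody.password === env.ACCESS_PASSWORD;
}

function handleAnalyzeRequest(requestBody, env) {
  if (!isAuthorized(requestBody, env)) {
    return { status: 401, body: { error: "Invalid password" } };
  }
  // Provider keys (env.CLAUDE_API_KEY, etc.) are only read here, server-side,
  // and never included in the response returned to the client.
  return { status: 200, body: { ok: true } };
}
```

The key property is that both the password and the API keys live only in server environment variables, so nothing secret ever ships to the browser.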
- Batch Processing: Abstracts processed in configurable batches to respect rate limits
- PDF Analysis: Direct multimodal analysis without text extraction
- Error Recovery: Automatic retries with correction prompts for malformed responses
- Cost Optimization:
- Use quick filtering to reduce volume before scoring
- Test with dry run mode before using real APIs
- Choose appropriate models for each stage
- Default Models: Google Gemini models are the defaults thanks to their generous free tier
See Model Selection Guide for detailed comparisons and cost analysis.
Contributions are welcome! Please feel free to submit issues or pull requests.
MIT
Created in collaboration with Claude Sonnet 4/4.5 and Claude Opus 4.1.
Note: This tool was primarily designed to help the author (Josh Speagle) manage daily paper monitoring across multiple arXiv categories (cs, stat, astro-ph) while keeping up with literature across a wide variety of fields.