spignotti · Apr 10, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,50 +2,83 @@
 
 All notable changes to this project will be documented in this file.
 
-The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [0.2.0] - 2026-03-23
-
+## [1.0.0] - 2026-04-09
 
 ### Added
-
-- V0.1.0 release polish - critical and major fixes
-
-- **screening:** Add global top-percent selection for deep analysis 
-
-
-### Documentation
-
-- **readme:** Update config and screening mode guidance
-
+- **Multi-source discovery**: Support for both Semantic Scholar and OpenAlex APIs
+  - Configurable discovery sources via `discovery_sources` setting
+  - OpenAlex adapter with field mapping
+  - Global deduplication using DOI match and fuzzy title matching
+  - Source tracking (s2, openalex, both, citation_expansion)
+
+- **Citation graph expansion**: Optional post-ranking stage to discover frequently referenced works
+  - Configurable via `expand_citations` and `min_cross_refs` settings
+  - Adds cross-referenced papers as recommended reading
+
+- **Zotero integration**: Export top papers directly to Zotero library
+  - Support for user and group libraries
+  - Automatic PDF attachment
+  - Custom tagging and collection assignment
+  - Configurable via `zotero_*` settings
+
+- **Run-quality telemetry**: Comprehensive metrics collection
+  - `RunMetrics` and `StageMetrics` models
+  - Per-stage timing, input/output counts, error tracking
+  - Aggregate statistics (candidates, screened, analyzed, exported)
+  - Source breakdown and PDF status tracking
+  - Written to `metrics.json` in output directory
+
+- **Manual PDF injection**: Support for providing your own PDFs
+  - `--inject-pdfs` CLI flag
+  - Configurable via `inject_pdf_dir` setting
+  - Matching by `paper_id` or DOI-based filename
+  - Useful for papers behind paywalls
+
+- **Token-budgeted PDF extraction**: Intelligent text extraction
+  - Replaces fixed first/last pages heuristic
+  - Keyword-based page scoring
+  - Configurable token budget
+  - Falls back gracefully when extraction fails
+
+- **Abstract-fallback screening**: Multi-signal screening for papers without abstracts
+  - Uses title, venue, citation count, year, and PDF excerpts
+  - Conservative scoring bias toward inclusion
+  - Dedicated `screening_fallback.md` prompt
+
+- **Robust error handling**: Resilience against external failures
+  - `parse_llm_json()` helper with comprehensive validation
+  - `retry_with_backoff()` decorator for API calls
+  - Configurable retry settings (`max_retries`, `retry_base_delay`)
+  - Graceful degradation when LLM returns malformed JSON
+
+- **Security improvements**:
+  - Path sanitization via `safe_filename()` utility
+  - Atomic state persistence using temp file + `os.replace`
+
+### Changed
+- **PDF tracking**: Replaced `pdf_downloaded: bool` with richer fields
+  - `pdf_path: str | None` - relative path to PDF
+  - `pdf_status: Literal["not_attempted", "downloaded", "unavailable", "user_provided"]`
+  - `data_completeness: Literal["full", "abstract_only", "metadata_only"]`
+
+- **Version source**: Single-source version via `importlib.metadata`
+  - Removed hardcoded version from `__init__.py`
+  - Version now sourced from `pyproject.toml`
+
+- **Configuration**: Added `litresearch.toml.example` with all new options
+  - Renamed existing `litresearch.toml` to example file
+  - Real config files now gitignored
 
 ### Fixed
-
-- **s2:** Enforce 1 rps throttling across S2 stages
-
-
-### Maintenance
-
-- Migrate to opencode workflow
-
-
-### ci
-
-- **release:** Add environment for trusted publisher
-
-
-## [0.1.0] - 2026-03-09
-
-
-### Added
-
-- **ci:** Add oss release workflows
-
-
-### Maintenance
-
-- Initial project setup
-
-- Release v0.1.0
-
+- **Resume bug**: Fixed crash when resuming from `current_stage="start"`
+- **State persistence**: Atomic writes prevent state corruption on interrupt
+- **JSON parsing**: Proper handling of missing keys and validation errors in LLM responses
+- **Path traversal**: Sanitized `paper_id` usage in filenames
+
+### Dependencies
+- Added `pyalex>=0.15` for OpenAlex integration
+- Added `pyzotero>=1.6` for Zotero export
+- Added `rapidfuzz` for fuzzy title matching (optional, falls back to difflib)
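
The changelog above describes global deduplication by DOI match with fuzzy title matching, and notes a `difflib` fallback when `rapidfuzz` is absent. The following is a minimal sketch of that idea using only `difflib`. The function names, the dictionary keys (`doi`, `title`, `source`), and the 0.9 similarity threshold are assumptions made for illustration and are not taken from the project's code.

```python
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(cleaned.split())


def is_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    # Assumed rule: when both records carry a DOI, the DOI alone decides.
    doi_a = (a.get("doi") or "").lower()
    doi_b = (b.get("doi") or "").lower()
    if doi_a and doi_b:
        return doi_a == doi_b
    # Otherwise fall back to fuzzy comparison of normalized titles.
    ratio = SequenceMatcher(
        None, normalize_title(a["title"]), normalize_title(b["title"])
    ).ratio()
    return ratio >= threshold


def deduplicate(papers: list[dict]) -> list[dict]:
    unique: list[dict] = []
    for paper in papers:
        match = next((u for u in unique if is_duplicate(u, paper)), None)
        if match is None:
            unique.append(dict(paper))  # copy so the input is not mutated
        elif match["source"] != paper["source"]:
            # Seen in more than one source: record provenance as "both".
            match["source"] = "both"
    return unique
```

This pairwise loop is quadratic in the number of candidates, which is usually acceptable for a few hundred papers; larger sets would likely need an index on DOI and a blocking key on titles.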
diff --git a/README.md b/README.md
@@ -6,10 +6,41 @@ ranked, and exported paper sets with structured reports.
 
 ## Overview
 - Generates search facets and academic queries from one or more research questions
-- Searches Semantic Scholar for candidate papers
+- Discovers candidates from Semantic Scholar and OpenAlex
 - Screens and analyzes papers with an LLM through LiteLLM
-- Ranks papers and exports reports, references, JSON data, and PDFs
-- Supports resume via a saved `state.json`
+- Supports citation graph expansion for frequently referenced works
+- Ranks papers and exports reports, references, JSON data, PDFs, and metrics
+- Supports robust resume via a saved `state.json`
+
+## What's New in v1.0.0
+
+### Multi-source discovery (S2 + OpenAlex)
+- Use `discovery_sources = ["s2", "openalex"]` for broader coverage.
+- Candidates are deduplicated across sources and source provenance is tracked.
+
+### Citation graph expansion
+- Optional expansion stage adds highly cross-referenced papers after ranking.
+- Configure with `expand_citations` and `min_cross_refs`.
+
+### Zotero export
+- Export top papers to Zotero user or group libraries.
+- Supports collection assignment, tags, and PDF attachment when available.
+
+### PDF injection
+- Bring your own PDFs with `--inject-pdfs` or `inject_pdf_dir`.
+- Match files by `{paper_id}.pdf` or DOI-based filenames.
+
+### Run metrics and telemetry
+- Every run writes `metrics.json` with stage timings and aggregate counts.
+- Includes source breakdown plus PDF availability and usage metrics.
+
+### Resume behavior improvements
+- Improved resume reliability from `state.json` checkpoints.
+- Safer state persistence with atomic writes.
+
+### Token-budgeted PDF extraction
+- Configurable extraction strategy supports token budgets for LLM context limits.
+- Falls back gracefully when PDFs are unavailable or extraction is limited.
 
 ## Installation
 ```bash
@@ -59,6 +90,7 @@ output/
   references.bib
   references.ris
   data.json
+  metrics.json
   papers/
   state.json
 ```
@@ -90,6 +122,12 @@ Resume an interrupted run:
 litresearch resume output/state.json
 ```
 
+Inject local PDFs for papers you already have:
+
+```bash
+litresearch run "Your research question" --inject-pdfs /path/to/pdfs
+```
+
 Inspect current configuration:
 
 ```bash
@@ -108,26 +146,44 @@ Supported environment variables:
 - `ANTHROPIC_API_KEY`
 - `OPENROUTER_API_KEY`
 - `S2_API_KEY`
+- `ZOTERO_API_KEY`
 - `S2_TIMEOUT`
 - `S2_REQUESTS_PER_SECOND`
 - `SCREENING_SELECTION_MODE`
 - `SCREENING_TOP_PERCENT`
 - `SCREENING_TOP_K`
 - `SCREENING_THRESHOLD`
 
-Example `litresearch.toml`:
+Start from the full example config:
+
+```bash
+cp litresearch.toml.example litresearch.toml
+```
+
+Key options include:
 
 ```toml
 default_model = "openai/gpt-4o-mini"
+llm_timeout = 120
+max_retries = 3
+retry_base_delay = 1.0
+discovery_sources = ["s2"]
 screening_selection_mode = "top_percent"
 screening_top_percent = 0.3
 screening_threshold = 60
 top_n = 20
 max_results_per_query = 20
+expand_citations = false
+min_cross_refs = 3
+zotero_export = false
 s2_timeout = 10
 s2_requests_per_second = 1.0
+pdf_extraction_mode = "budget"
+pdf_token_budget = 4000
 pdf_first_pages = 4
 pdf_last_pages = 2
+abstract_fallback = true
+# inject_pdf_dir = "/path/to/pdfs"
 output_dir = "output"
 ```
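
The `max_retries` and `retry_base_delay` options above pair naturally with exponential backoff. The sketch below shows one common interpretation, where the delay doubles after each failed attempt. It is not the project's `retry_with_backoff()` decorator, whose signature is not shown in this diff; the name `backoff_retry`, its parameters, and the doubling schedule are assumptions.

```python
import functools
import time


def backoff_retry(max_retries: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Retry a callable, waiting base_delay * 2**attempt between attempts."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Out of retries: surface the last error to the caller.
                    if attempt == max_retries:
                        raise
                    sleep(base_delay * 2 ** attempt)

        return wrapper

    return decorator
```

With `max_retries = 3` and `retry_base_delay = 1.0`, this schedule would wait 1, 2, and 4 seconds before giving up on the fourth failure. The injectable `sleep` argument exists only to make the behavior easy to test without real delays.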
 
@@ -140,12 +196,25 @@ Semantic Scholar tuning:
 - `s2_timeout`: request timeout in seconds
 - `s2_requests_per_second`: global request rate cap across S2 endpoints
 
+Discovery tuning:
+- `discovery_sources`: choose `s2`, `openalex`, or both
+- `openalex_email`: optional contact email used to join the OpenAlex polite pool
+
+Citation expansion tuning:
+- `expand_citations`: enable or disable expansion stage
+- `min_cross_refs`: minimum number of cross-references in the citation graph required for a paper to be included
+
+Zotero export tuning:
+- `zotero_export`: enable export integration
+- `zotero_library_id`, `zotero_library_type`, `zotero_collection_key`, `zotero_tag`: target library (user or group), collection, and tag for exported items
+
 ## Output Files
 - `report.md`: main literature review report with research questions, search summary, top papers, and synthesis
 - `paper_analyses.md`: detailed per-paper analysis for all analyzed papers
 - `references.bib`: BibTeX for ranked papers when citation data is available
 - `references.ris`: RIS export for citation managers
 - `data.json`: machine-readable export of the pipeline state
+- `metrics.json`: per-stage timings and aggregate run metrics
 - `papers/`: downloaded open-access PDFs for ranked papers
 - `state.json`: resumable pipeline checkpoint
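
The changelog in this commit states that `state.json` is persisted atomically using a temp file plus `os.replace`. A minimal sketch of that pattern follows, assuming the state is a JSON-serializable dictionary. The function name `write_state_atomically` and the temp-file naming are hypothetical and not taken from the project.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state_atomically(path: Path, state: dict) -> None:
    path = Path(path)
    # Create the temp file in the target directory so os.replace stays
    # on one filesystem, which is what makes the rename atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        # Clean up the temp file on any failure, including Ctrl-C.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

An interrupt during the write leaves the previous `state.json` untouched, which is the property a resume command depends on.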
 
@@ -156,5 +225,5 @@ uv run litresearch --help
 ```
 
 ## Status
-This is an MVP-oriented proof of concept intended to answer one question clearly:
-is the end-to-end literature research workflow useful enough to keep investing in?
+`v1.0.0` delivers a production-ready core workflow for automated literature research,
+including multi-source discovery, ranking, export, and operational telemetry.