feat: v0.1.0 release polish - critical and major fixes #4
Merged
- Guard json.loads() in analysis.py with try/except JSONDecodeError
- Add s2_timeout config setting (default 10s) with retry=False for S2 client
- Prevent PDF double-download by saving during analysis and marking pdf_downloaded
- Skip already-downloaded PDFs in export stage

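A minimal sketch of the `json.loads()` guard described in this commit. The helper name `parse_llm_json` is hypothetical and not taken from the PR; the only behavior assumed from the description is "return `None` and print a warning on malformed LLM output". Rejecting non-object replies is an extra assumption added for illustration.

```python
import json
from typing import Any, Optional


def parse_llm_json(raw: str) -> Optional[dict[str, Any]]:
    """Parse an LLM reply as a JSON object; return None if it is unusable.

    Hypothetical helper: the name and the dict-only check are assumptions,
    not code from the PR.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Malformed reply: warn and let the caller skip this paper.
        print(f"warning: could not parse LLM response as JSON: {exc}")
        return None
    # Valid JSON that is not an object (e.g. a list) is treated as unusable too.
    return data if isinstance(data, dict) else None
```

A caller would then check for `None` before reading fields, so a single bad reply skips one paper rather than aborting the run.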
- Refactor _build_settings to use immutable Settings(**overrides) pattern
- Add --overwrite flag to run command
- Auto-increment output directory name when directory exists and is populated
- Add tests for collision detection and overwrite behavior

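One way the auto-increment behavior in this commit could work, as a sketch only. The function names `resolve_output_dir` and `_is_populated` are hypothetical, and the rule "a directory counts as populated when it contains at least one entry" is an assumption; the PR text only states that a populated directory leads to a suffixed name unless `--overwrite` is given.

```python
from pathlib import Path


def _is_populated(path: Path) -> bool:
    """True if path is an existing directory with at least one entry (assumed rule)."""
    return path.is_dir() and any(path.iterdir())


def resolve_output_dir(base: Path, overwrite: bool = False) -> Path:
    """Pick the directory to write into (hypothetical helper, not from the PR)."""
    # With overwrite, or when the target is missing or empty, use it as-is.
    if overwrite or not _is_populated(base):
        return base
    # Otherwise try base-2, base-3, ... until an unpopulated name is found.
    n = 2
    while True:
        candidate = base.with_name(f"{base.name}-{n}")
        if not _is_populated(candidate):
            return candidate
        n += 1
```

Under these assumptions, a populated `output` directory yields `output-2`, while passing `overwrite=True` returns `output` unchanged.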
- Write ScreeningResult with score=0 for papers without abstract
- Wrap call_llm in try/except LLMError in query_gen with clear error message

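The no-abstract handling in this commit might look roughly like the sketch below. The `ScreeningResult` class here is a stand-in: only `relevance_score` and `rationale` are named in the PR description, while `paper_id` and the function `screen_without_abstract` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScreeningResult:
    """Stand-in model; paper_id is an assumed field, not taken from the PR."""
    paper_id: str
    relevance_score: int
    rationale: str


def screen_without_abstract(paper_id: str, abstract: Optional[str]) -> Optional[ScreeningResult]:
    """Return a zero-score result when there is no abstract, else None.

    Returning None signals that the caller should go on to LLM screening.
    """
    if abstract is None or not abstract.strip():
        return ScreeningResult(paper_id, 0, "no abstract available")
    return None
```

Recording an explicit zero-score result keeps such papers visible in the screening output, which is the stated goal of replacing the silent skip.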
- Rename litresearch.toml to litresearch.toml.example (git mv)
- Add html.unescape() for title, abstract, venue in Paper.from_s2()

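For context on the `html.unescape()` change, here is a small sketch of entity unescaping that tolerates missing fields. The helper `clean_text` is hypothetical; how `Paper.from_s2()` actually handles `None` values is not shown in the PR and is assumed here.

```python
import html
from typing import Optional


def clean_text(value: Optional[str]) -> Optional[str]:
    """Decode HTML entities in a metadata field; pass None through unchanged.

    Hypothetical helper: the None handling is an assumption.
    """
    if value is None:
        return None
    return html.unescape(value)


# Example: an entity-encoded title as it might arrive from an API response.
print(clean_text("Questions &amp; Answers on caf&eacute; culture"))
```

The same helper could be applied to title, abstract, and venue alike, since `html.unescape()` leaves strings without entities untouched.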
- Test query generation with successful LLM response and error handling
- Test screening behavior for no-abstract papers and JSON parse failures
- Test discovery S2 client configuration and paper deduplication

- Add comment for BATCH_SIZE in enrichment.py
- Add run summary block in pipeline.py with timing and counts
- Change screening_threshold default from 40 to 60 with documentation
Summary
This PR implements all critical and major fixes from FEATURE.md to prepare litresearch for v0.1.0 publication.
Critical Fixes
- JSON parsing error handling - Wrapped `json.loads()` in both `_screen_paper` and `_analyze_paper` with try/except `JSONDecodeError`. Returns `None` and prints warning on malformed LLM responses instead of crashing.
- Semantic Scholar timeout/retry - Added `s2_timeout` config setting (default 10s). S2 client is now created with `timeout=settings.s2_timeout, retry=False` in both discovery and enrichment stages to prevent 14-minute hangs.
- PDF double-download prevention - PDFs are now saved during analysis stage to `papers/` directory and marked `pdf_downloaded=True`. Export stage skips already-downloaded papers.

Major Fixes
- Immutable Settings construction - Refactored `_build_settings()` to use `Settings(**overrides)` pattern instead of post-init mutation.
- Output directory collision handling - Added `--overwrite` flag and auto-increment logic. When output directory exists and is populated, automatically uses `output-2`, `output-3`, etc.
- No-abstract paper handling - Papers without abstracts now get a `ScreeningResult` with `relevance_score=0` and `rationale="no abstract available"` instead of being silently skipped.
- LLMError handling in query_gen - Wrapped `call_llm()` in try/except with clear error message on failure.
- Config file hygiene - Renamed `litresearch.toml` to `litresearch.toml.example` (already in `.gitignore`).
- HTML entity unescaping - Applied `html.unescape()` to `title`, `venue`, and `abstract` fields in `Paper.from_s2()`.
- Stage-level test coverage - Added 3 new test files with comprehensive coverage:
  - `test_stages_query_gen.py` - query generation and error handling
  - `test_stages_screening.py` - screening behavior and no-abstract handling
  - `test_stages_discovery.py` - S2 client config and deduplication

Minor Polish
- `BATCH_SIZE` comment documenting S2 batch endpoint limit
- `screening_threshold` default from 40 to 60 with documentation

Validation
All nox sessions pass: