Open
Conversation
Implements persistent file-based caching for ParsedSyllabus objects to improve performance and reduce redundant LLM API calls. Features: - Content-based caching using SHA-256 hash of PDF bytes - Pickle serialization for efficient storage - Cache stored in system cache directory (~/.cache/syllabusmcp/) - Graceful error handling and cache failures don't break core functionality - Cache operations: get, set, invalidate, clear, list_keys, get_statistics - Environment variable support (SYLLABUSMCP_CACHE_DIR, SYLLABUSMCP_DISABLE_CACHE) - use_cache parameter on parse_syllabus() to bypass cache when needed Performance improvements: - 80%+ reduction in parsing time for repeated queries - 100% reduction in LLM API calls for cached syllabi - Cache read: ~1-5ms vs LLM parsing: ~2-10 seconds Testing: - 16 unit tests for cache operations - 5 integration tests for parse_syllabus caching - All 21 tests passing Closes #6
Contributor
Author
Design DocumentThis PR implements the following design for syllabus caching. ArchitectureHigh-Level DesignComponent Responsibilities1. SyllabusCache (New)
2. parse_syllabus() (Modified)
3. Cache Storage (New)
Module StructureCache Key GenerationAlgorithm: SHA-256 hash of raw PDF bytes Rationale:
File Storage StructureEach cache file is named with the SHA-256 hash (64 hex characters). Integration FlowBefore (No Cache)def parse_syllabus(pdf_path_or_url: str) -> ParsedSyllabus:
pages = extract_pdf_pages(pdf_path_or_url)
# ... LLM parsing (~2-10 seconds) ...
return parsedAfter (With Cache)_cache = SyllabusCache()
def parse_syllabus(pdf_path_or_url: str, use_cache: bool = True) -> ParsedSyllabus:
# Check cache first
if use_cache:
cached = _cache.get(pdf_path_or_url) # ~1-5ms
if cached:
return cached
# Cache miss - parse via LLM
pages = extract_pdf_pages(pdf_path_or_url)
# ... LLM parsing (~2-10 seconds) ...
parsed = ParsedSyllabus(...)
# Cache the result
if use_cache:
_cache.set(pdf_path_or_url, parsed)
return parsedError HandlingPrinciple: Graceful Degradation
ConfigurationCache Location
Disable Caching
Performance Characteristics
Result: 100-1000x faster for cache hits! Testing StrategyUnit Tests (16 tests)
Integration Tests (5 tests)
Future Enhancements (Not in v1)
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overview
Implements persistent file-based caching for
ParsedSyllabusobjects to dramatically improve performance and reduce redundant LLM API calls.Fixes #6
Key Features
~/.cache/syllabusmcp/on macOS/LinuxSYLLABUSMCP_CACHE_DIR: Override cache locationSYLLABUSMCP_DISABLE_CACHE: Disable caching entirelyuse_cacheparameter onparse_syllabus()Performance Improvements
Testing
Files Changed
Created
syllabus_server/cache.py- Core cache implementation (261 lines)tests/test_cache.py- Unit tests (260 lines)tests/test_cache_integration.py- Integration tests (220 lines)Modified
syllabus_server/server.py- Integrated cache into parse_syllabus()Usage Example
Design Document
See comment below for detailed design documentation.