feat: V3 Phase 0-5 implementation + integration testing audit #8
Merged
sylvanding merged 9 commits into main on Mar 15, 2026
…mprovements
- Fix adjacent chunk assembly producing duplicated/disordered context (P0)
- Fix apply_resolution allowing skipped papers to be imported (P0)
- Fix KaTeX CSS missing import for formula rendering
- Fix PaddleOCR hardcoded English lang, default to ch (Chinese+English)
- Fix index_node N+1 query with batch PaperChunk loading
- Add RAGService.retrieve_only() to avoid redundant LLM calls in chat pipeline
- Make retrieve_node top_k configurable via ChatState
- Unify OCR chunking to use semantic chunk_text() instead of per-page splits
- Add section and chunk_type to index metadata for richer RAG retrieval
- Unify dedup thresholds in config.py, align pipeline with DedupService
- Replace MemorySaver with AsyncSqliteSaver for persistent pipeline checkpoints
- Remove duplicate /knowledge-bases API routes
- Create LLMConfigResolver for unified LLM config across chat/RAG/writing
Made-with: Cursor
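The adjacent chunk assembly fix listed above can be illustrated with a minimal sketch. It assumes each chunk carries a paper ID and a position index. The `Chunk` class, its field names, and `assemble_context` are hypothetical stand-ins, not the project's actual code. The idea is to collect neighbour keys into a set (so overlapping windows cannot duplicate a chunk) and then sort by position (so the context is never disordered).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical stand-in for a PaperChunk row; field names are assumptions.
    paper_id: int
    index: int
    text: str


def assemble_context(
    hits: list[Chunk],
    by_key: dict[tuple[int, int], Chunk],
    window: int = 1,
) -> list[Chunk]:
    """Expand each hit with its neighbours, then dedupe and order by position."""
    seen: set[tuple[int, int]] = set()
    for hit in hits:
        for idx in range(hit.index - window, hit.index + window + 1):
            key = (hit.paper_id, idx)
            if key in by_key:  # skip positions past either end of the paper
                seen.add(key)
    # Sorting (paper_id, index) tuples restores reading order per paper.
    return [by_key[key] for key in sorted(seen)]
```

Two hits at positions 1 and 2 of the same paper share neighbours, yet each chunk appears once in the result and in document order.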
… architecture
- Add MinerU HTTP client (mineru_client.py) for standalone API service
- Refactor OCRService with async process_pdf_async() method: MinerU → pdfplumber → PaddleOCR fallback
- Add chunk_mineru_markdown() for structured Markdown parsing into text/table/figure_caption chunks
- Extend PaperChunk model with has_formula and figure_path fields + Alembic migration
- Update pipeline nodes (ocr_node, index_node) and paper_processor to use new async path
- Propagate has_formula/figure_path metadata through RAG indexing pipeline
- Add MinerU deployment guide (docs/deployment/mineru-setup.md)
- Update config defaults: pdf_parser, mineru_api_url, mineru_backend (pipeline), mineru_timeout
Made-with: Cursor
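The MinerU → pdfplumber → PaddleOCR fallback order named above could be structured roughly as follows. This is a sketch under assumptions, not the actual `process_pdf_async()` body: the three parser coroutines are simulated stubs, and `parse_with_fallback` and `CHAIN` are hypothetical names.

```python
import asyncio
from collections.abc import Awaitable, Callable

Parser = Callable[[str], Awaitable[str]]


async def parse_with_fallback(
    pdf_path: str, parsers: list[tuple[str, Parser]]
) -> tuple[str, str]:
    """Try each parser in order; return (name, text) from the first that yields text."""
    errors: list[str] = []
    for name, parser in parsers:
        try:
            text = await parser(pdf_path)
        except Exception as exc:  # a real service would catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty output")
    raise RuntimeError("all parsers failed: " + "; ".join(errors))


# Simulated parsers (assumptions for illustration only).
async def _mineru(path: str) -> str:
    raise ConnectionError("MinerU service unreachable")


async def _pdfplumber(path: str) -> str:
    return ""  # e.g. a scanned PDF with no text layer


async def _paddleocr(path: str) -> str:
    return "recognised text"


CHAIN = [("mineru", _mineru), ("pdfplumber", _pdfplumber), ("paddleocr", _paddleocr)]
```

Calling `asyncio.run(parse_with_fallback("paper.pdf", CHAIN))` walks the chain until one parser returns non-empty text. Treating empty output as a failure is a design assumption here, so that scanned PDFs fall through to OCR.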
- PRD v3: overview, chat logic, knowledge base, settings, architecture, innovation features, implementation roadmap, code audit, technical deep dive
- Phase 4 plan: smart autocomplete, citation graph, literature review, PDF AI assistant
Made-with: Cursor
Phase 4A - Smart Autocomplete:
- CompletionService with LLM-based input prediction
- POST /chat/complete endpoint with debounce support
- CompletionSuggestion component with ghost text + Tab/Esc handling
- ChatInput integration with AbortController
Phase 4B - Citation Graph:
- CitationGraphService with Semantic Scholar API (S2 ID/DOI/title resolution, tenacity retry)
- GET /papers/{id}/citation-graph endpoint
- react-force-graph-2d visualization with node detail panel
- PapersPage integration with graph dialog
Phase 4C - Auto Literature Review:
- WritingService.generate_literature_review() three-step pipeline (outline → RAG → streaming)
- POST /writing/review-draft/stream SSE endpoint with section/citation events
- WritingPage review tab with streaming, copy, download
Phase 4D - PDF Reader AI Assistant:
- GET /papers/{id}/pdf file serving endpoint
- ChatStreamRequest extended with paper_id/selected_text
- PDFViewer (react-pdf + pdf.js) with zoom/paging/text selection/scanned detection
- PDFReaderLayout with react-resizable-panels
- SelectionQA sidebar with explain/translate/find-citations quick actions
- PDFReaderPage route + PapersPage entry button
Made-with: Cursor
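For the Phase 4C review-draft stream, the section/citation events mentioned above could be framed on the wire roughly like this. The payload fields (`title`, `paper_id`), the final `done` event, and both function names are assumptions for illustration, not the endpoint's actual schema. Only the `event:`/`data:` framing follows the Server-Sent Events format.

```python
import json
from collections.abc import Iterator


def sse_event(event: str, data: dict) -> str:
    """Serialise one SSE frame: event name, single-line JSON data, blank line."""
    return f"event: {event}\ndata: {json.dumps(data, ensure_ascii=False)}\n\n"


def review_stream(sections: list[dict]) -> Iterator[str]:
    """Yield a section frame, then one citation frame per cited paper."""
    for section in sections:
        yield sse_event("section", {"title": section["title"]})
        for citation in section.get("citations", []):
            yield sse_event("citation", {"paper_id": citation})
    yield sse_event("done", {})  # hypothetical end-of-stream marker
```

A generator like this can be handed to a streaming response with the `text/event-stream` media type. Keeping the framing in one helper makes it easy to mock in tests.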
Testing:
- Add unit tests for CompletionService, CitationGraphService, EmbeddingService
- Add PDF metadata extraction tests and pipeline E2E tests
- Extend WritingService stream endpoint tests with mocked SSE
- 51 tests passing
Performance:
- Vite manualChunks code splitting (react-vendor, query, ai-sdk, etc.)
- rollup-plugin-visualizer for bundle analysis
- useThrottledValue hook for SSE streaming render throttling (80ms)
- Batch queries in WritingService (IN clause replaces N+1 db.get)
Security:
- slowapi rate limiting middleware (120 req/min default)
- Global exception handler with sanitised production errors
- PDF path traversal guard and magic number validation
- APP_DEBUG defaults to false for production safety
Documentation:
- Phase 4 API docs (EN + ZH) for completion, citation-graph, review-draft
- Deployment guide with security checklist
- Feature guide for Phase 4 capabilities
Made-with: Cursor
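The PDF path traversal guard and magic number validation listed under Security might look something like the sketch below. `resolve_pdf` and its parameters are hypothetical names, and the actual endpoint may differ. The two checks are that the resolved path stays inside the storage root and that the file starts with the `%PDF-` signature.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def resolve_pdf(storage_root: Path, relative_name: str) -> Path:
    """Return a safe path to a stored PDF, or raise if either check fails."""
    root = storage_root.resolve()
    # resolve() collapses ".." segments and symlinks before the containment check.
    candidate = (root / relative_name).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise PermissionError("path escapes storage root")
    with candidate.open("rb") as fh:
        if fh.read(len(PDF_MAGIC)) != PDF_MAGIC:
            raise ValueError("file is not a PDF")
    return candidate
```

Resolving before comparing matters: a naive string-prefix check on the unresolved path would accept `store/../secret.pdf`.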
- Add 14 missing config items to .env.example (LLM providers, embedding, OCR)
- Fix VitePress sidebar: add 9 missing API entries + Deployment guide (EN/ZH)
- Fix deployment.md broken link to MinerU setup guide
- Add Phase 4 endpoints to README API overview
- Fix LangGraph checkpointer returning context manager instead of saver
- Remove 3 invalid KB alias tests (routes never implemented)
- Create testing guide with coverage report (229 tests passing)
- Create brainstorm and plan documents for integration testing audit
Made-with: Cursor
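The checkpointer fix above belongs to a general class of bug that can be shown with the standard library alone. In this sketch `FakeSaver` and `open_saver` are hypothetical stand-ins, not LangGraph classes. It assumes only that the factory returns an async context manager that must be entered to obtain the saver.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager


class FakeSaver:
    """Stand-in for a checkpoint saver; not the LangGraph class."""

    def __init__(self) -> None:
        self.open = True

    async def put(self, key: str, value: str) -> None:
        if not self.open:
            raise RuntimeError("saver is closed")


@asynccontextmanager
async def open_saver():
    saver = FakeSaver()
    try:
        yield saver
    finally:
        saver.open = False  # resources are released when the context exits


async def main() -> tuple[bool, bool]:
    wrong = open_saver()  # the bug: this is a context manager, not a saver
    has_put_before = hasattr(wrong, "put")
    async with AsyncExitStack() as stack:
        # The fix: enter the context and keep the yielded saver.
        saver = await stack.enter_async_context(open_saver())
        await saver.put("thread-1", "state")
        has_put_after = hasattr(saver, "put")
    return has_put_before, has_put_after
```

An `AsyncExitStack` is one way to keep the saver open across an application's lifetime while still closing it on shutdown. Whether the project uses this approach is not stated in the commit message.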
- Add e2e/integration.spec.ts covering project creation, project detail navigation, writing page, discovery page, settings page, history/tasks pages, and cross-page navigation (9 tests)
- Fix chat-flow.spec.ts: use .first() for ambiguous "New" button selector
- Fix kb-paper-flow.spec.ts: use .first() for ambiguous "aside nav" locator
- Update testing guide docs with E2E coverage table and API curl test results
- All 19 E2E tests pass, 29 backend API endpoints verified via curl
Made-with: Cursor
Backend:
- Format test_paper_processor.py with ruff
Frontend TypeScript:
- Remove unused imports (ZoomIn, ZoomOut, Maximize, useTranslation)
- Fix ForceGraph2D type compatibility with object-based callbacks
- Fix react-resizable-panels: direction→orientation, autoSaveId→id
- Remove unused projectId destructuring in SelectionQA
- Add initial values to useRef calls for React 19 strict types
- Replace Array.findLast with reverse().find() for ES2022 target
- Remove incompatible dataPartSchemas from useChat options
- Add explicit GraphData type to citation graph useQuery
VitePress docs:
- Remove Vue template interpolation conflict ({{ in inline code)
- Add dead link ignore patterns for internal reference documents
Made-with: Cursor
PlaygroundPage: the "New Chat" button appears in both the sidebar and the header, causing a getByRole strict mode violation.
WritingPage: the "综述生成" (review generation) tab and the "Generate" action button both match the /generate|生成/i pattern.
Made-with: Cursor
Summary
Key Changes
Backend (FastAPI)
Frontend (React + TypeScript)
Documentation
Configuration
Testing
Backend
Frontend
Integration
Post-Deploy Monitoring & Validation
No additional operational monitoring required: development environment only, no production deployment.