feat: V3 Phase 0-5 implementation + integration testing audit #8

Merged
sylvanding merged 9 commits into main from feat/integration-testing-audit on Mar 15, 2026

@sylvanding
Owner

Summary

  • Phase 0: Bug fixes and architecture improvements (MinerU OCR integration, PDF parsing dual-tier architecture)
  • Phase 1-3: V3 PRD documentation, core feature implementation (chat, RAG, search, keywords, dedup, writing, subscriptions)
  • Phase 4: Innovation features — citation graph visualization, PDF reader with selection QA, smart autocomplete, streaming literature review, frontend performance optimization
  • Phase 5: Polish & release — security hardening (rate limiting, global error handler, PDF validation), frontend code splitting, VitePress documentation, deployment guide
  • Integration Testing & Audit: backend API testing (229 pytest tests, plus 29 endpoints verified via curl), E2E Playwright testing (19 tests), configuration alignment, documentation synchronization

Key Changes

Backend (FastAPI)

  • CitationGraphService + Semantic Scholar integration
  • CompletionService for smart autocomplete
  • WritingService streaming literature review (SSE)
  • MinerU PDF parsing client + dual-tier OCR
  • Rate limiting middleware (slowapi)
  • LLM config resolver for multi-provider support
  • 229 backend tests passing

Frontend (React + TypeScript)

  • Citation graph visualization (react-force-graph-2d)
  • PDF reader with resizable panels + selection QA
  • Smart autocomplete overlay in ChatInput
  • Streaming literature review in WritingPage
  • Performance: code splitting, throttled values, deferred rendering
  • 19 E2E Playwright tests passing

Documentation

  • Complete V3 PRD (8 documents)
  • VitePress sidebar aligned with 16 API docs
  • Deployment guide + MinerU setup guide
  • Testing guide with full coverage tables
  • 5 best practices research documents

Configuration

  • .env.example updated with all LLM providers, embedding, OCR settings
  • pyproject.toml + package.json dependencies verified

Testing

Backend

  • 229 pytest tests: all pass
  • 29 API endpoints verified via curl (LLM_PROVIDER=mock)
  • Ruff lint: zero errors

Frontend

  • 19 Playwright E2E tests: all pass
  • TypeScript type check: clean
  • Pages verified: Playground, Knowledge Bases, Settings, History, Tasks, Project Detail, Discovery, Writing

Integration

  • Frontend ↔ Backend: project creation, paper CRUD, settings, chat
  • All pages load without JS errors
  • Cross-page navigation verified

Post-Deploy Monitoring & Validation

No additional operational monitoring required: development environment only, no production deployment.

…mprovements

- Fix adjacent chunk assembly producing duplicated/disordered context (P0)
- Fix apply_resolution allowing skipped papers to be imported (P0)
- Fix KaTeX CSS missing import for formula rendering
- Fix PaddleOCR hardcoded English lang, default to ch (Chinese+English)
- Fix index_node N+1 query with batch PaperChunk loading
- Add RAGService.retrieve_only() to avoid redundant LLM calls in chat pipeline
- Make retrieve_node top_k configurable via ChatState
- Unify OCR chunking to use semantic chunk_text() instead of per-page splits
- Add section and chunk_type to index metadata for richer RAG retrieval
- Unify dedup thresholds in config.py, align pipeline with DedupService
- Replace MemorySaver with AsyncSqliteSaver for persistent pipeline checkpoints
- Remove duplicate /knowledge-bases API routes
- Create LLMConfigResolver for unified LLM config across chat/RAG/writing

Made-with: Cursor
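
The first fix in the commit above (adjacent chunk assembly producing duplicated or disordered context) can be illustrated with a minimal sketch. The `Chunk` type, the `store` lookup, and the `window` parameter are hypothetical stand-ins, not the project's actual models; the point is only that neighbours are collected into a set of keys and then sorted, so overlapping windows cannot duplicate or reorder chunks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    paper_id: int
    index: int  # position of the chunk within its paper (assumed field)
    text: str


def assemble_context(
    hits: list[Chunk],
    store: dict[tuple[int, int], Chunk],
    window: int = 1,
) -> list[Chunk]:
    """Expand each hit with its neighbours, dedupe, and restore document order."""
    keys: set[tuple[int, int]] = set()
    for hit in hits:
        for idx in range(hit.index - window, hit.index + window + 1):
            key = (hit.paper_id, idx)
            if key in store:  # skip neighbours that fall outside the paper
                keys.add(key)
    # Sorting by (paper_id, index) gives a stable, duplicate-free ordering.
    return [store[key] for key in sorted(keys)]
```

With hits on chunks 2 and 1 of the same paper and `window=1`, the overlapping neighbourhoods collapse to chunks 0 through 3, each appearing once and in order.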
… architecture

- Add MinerU HTTP client (mineru_client.py) for standalone API service
- Refactor OCRService with async process_pdf_async() method: MinerU → pdfplumber → PaddleOCR fallback
- Add chunk_mineru_markdown() for structured Markdown parsing into text/table/figure_caption chunks
- Extend PaperChunk model with has_formula and figure_path fields + Alembic migration
- Update pipeline nodes (ocr_node, index_node) and paper_processor to use new async path
- Propagate has_formula/figure_path metadata through RAG indexing pipeline
- Add MinerU deployment guide (docs/deployment/mineru-setup.md)
- Update config defaults: pdf_parser, mineru_api_url, mineru_backend (pipeline), mineru_timeout

Made-with: Cursor
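
The MinerU → pdfplumber → PaddleOCR fallback described in the commit above can be sketched as an ordered list of async parsers. This is a sketch under stated assumptions, not the project's `process_pdf_async()`: the `parse_with_fallback` helper and the three `fake_*` parsers are hypothetical, and treating empty output as a failure is an assumption about how a scanned PDF would fall through to OCR.

```python
import asyncio
from collections.abc import Awaitable, Callable

Parser = Callable[[str], Awaitable[str]]


async def parse_with_fallback(
    pdf_path: str, parsers: list[tuple[str, Parser]]
) -> tuple[str, str]:
    """Try each parser in order; return (parser_name, text) from the first success."""
    errors: list[str] = []
    for name, parser in parsers:
        try:
            text = await parser(pdf_path)
        except Exception as exc:  # a real service would catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty output")
    raise RuntimeError("all parsers failed: " + "; ".join(errors))


# Hypothetical stand-ins for the three tiers.
async def fake_mineru(path: str) -> str:
    raise ConnectionError("service unavailable")


async def fake_pdfplumber(path: str) -> str:
    return ""  # e.g. a scanned PDF with no text layer


async def fake_paddleocr(path: str) -> str:
    return "recognised text"


def demo() -> tuple[str, str]:
    tiers = [
        ("mineru", fake_mineru),
        ("pdfplumber", fake_pdfplumber),
        ("paddleocr", fake_paddleocr),
    ]
    return asyncio.run(parse_with_fallback("paper.pdf", tiers))
```

Keeping the tiers as data rather than nested `try` blocks makes the order easy to change from configuration.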
- PRD v3: overview, chat logic, knowledge base, settings, architecture,
  innovation features, implementation roadmap, code audit, technical deep dive
- Phase 4 plan: smart autocomplete, citation graph, literature review, PDF AI assistant

Made-with: Cursor
Phase 4A - Smart Autocomplete:
- CompletionService with LLM-based input prediction
- POST /chat/complete endpoint with debounce support
- CompletionSuggestion component with ghost text + Tab/Esc handling
- ChatInput integration with AbortController

Phase 4B - Citation Graph:
- CitationGraphService with Semantic Scholar API (S2 ID/DOI/title resolution, tenacity retry)
- GET /papers/{id}/citation-graph endpoint
- react-force-graph-2d visualization with node detail panel
- PapersPage integration with graph dialog

Phase 4C - Auto Literature Review:
- WritingService.generate_literature_review() three-step pipeline (outline → RAG → streaming)
- POST /writing/review-draft/stream SSE endpoint with section/citation events
- WritingPage review tab with streaming, copy, download

Phase 4D - PDF Reader AI Assistant:
- GET /papers/{id}/pdf file serving endpoint
- ChatStreamRequest extended with paper_id/selected_text
- PDFViewer (react-pdf + pdf.js) with zoom/paging/text selection/scanned detection
- PDFReaderLayout with react-resizable-panels
- SelectionQA sidebar with explain/translate/find-citations quick actions
- PDFReaderPage route + PapersPage entry button

Made-with: Cursor
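
The section/citation events mentioned for the review-draft stream in Phase 4C can be sketched as Server-Sent Events frames. The event names `section` and `citation` come from the commit message; the payload fields, the closing `done` event, and both function names are assumptions made for illustration.

```python
import json
from collections.abc import Iterable, Iterator


def sse_frame(event: str, data: dict) -> str:
    """Format one SSE frame: an event name, a single-line JSON payload, a blank line."""
    return f"event: {event}\ndata: {json.dumps(data, ensure_ascii=False)}\n\n"


def review_events(
    sections: Iterable[tuple[str, str, list[str]]]
) -> Iterator[str]:
    """Yield a 'section' frame per section, then a 'citation' frame per cited paper."""
    for title, text, citations in sections:
        yield sse_frame("section", {"title": title, "text": text})
        for paper_id in citations:
            yield sse_frame("citation", {"paper_id": paper_id})
    yield sse_frame("done", {})  # assumed terminator so clients know when to stop
```

Because `json.dumps` escapes newlines inside strings, each payload stays on one `data:` line, which keeps the frame valid even when section text spans several paragraphs.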
Testing:
- Add unit tests for CompletionService, CitationGraphService, EmbeddingService
- Add PDF metadata extraction tests and pipeline E2E tests
- Extend WritingService stream endpoint tests with mocked SSE
- 51 tests passing

Performance:
- Vite manualChunks code splitting (react-vendor, query, ai-sdk, etc.)
- rollup-plugin-visualizer for bundle analysis
- useThrottledValue hook for SSE streaming render throttling (80ms)
- Batch queries in WritingService (IN clause replaces N+1 db.get)

Security:
- slowapi rate limiting middleware (120 req/min default)
- Global exception handler with sanitised production errors
- PDF path traversal guard and magic number validation
- APP_DEBUG defaults to false for production safety
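
The PDF path traversal guard and magic number validation listed above might look roughly like the following. This is a sketch under assumptions, not the project's code: `resolve_inside` and `looks_like_pdf` are hypothetical names, and checking that the file begins with `%PDF-` is stricter than some PDF readers, which tolerate leading bytes before the header.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def resolve_inside(base_dir: Path, relative_name: str) -> Path:
    """Resolve relative_name under base_dir, rejecting anything that escapes it."""
    base = base_dir.resolve()
    candidate = (base / relative_name).resolve()
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise ValueError("path escapes storage directory")
    return candidate


def looks_like_pdf(head: bytes) -> bool:
    """Check the first bytes of an upload for the PDF header."""
    return head.startswith(PDF_MAGIC)
```

Resolving both paths before comparing them handles `..` segments and absolute names in the same check.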

Documentation:
- Phase 4 API docs (EN + ZH) for completion, citation-graph, review-draft
- Deployment guide with security checklist
- Feature guide for Phase 4 capabilities

Made-with: Cursor
- Add 14 missing config items to .env.example (LLM providers, embedding, OCR)
- Fix VitePress sidebar: add 9 missing API entries + Deployment guide (EN/ZH)
- Fix deployment.md broken link to MinerU setup guide
- Add Phase 4 endpoints to README API overview
- Fix LangGraph checkpointer returning context manager instead of saver
- Remove 3 invalid KB alias tests (routes never implemented)
- Create testing guide with coverage report (229 tests passing)
- Create brainstorm and plan documents for integration testing audit

Made-with: Cursor
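
The checkpointer fix in the commit above (a context manager returned instead of a saver) is a common pattern when a factory is an async context manager. The sketch below reproduces the shape of the bug with a hypothetical `FakeSaver` and `open_saver`; it does not use LangGraph itself, and the `AsyncExitStack` approach is one possible fix, not necessarily the one applied in this PR.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager


class FakeSaver:
    """Hypothetical stand-in for a checkpoint saver."""

    def __init__(self) -> None:
        self.open = True


@asynccontextmanager
async def open_saver(conn_string: str):
    saver = FakeSaver()
    try:
        yield saver
    finally:
        saver.open = False  # release the connection on exit


def get_checkpointer_buggy(conn_string: str):
    # Bug: returns the context manager object, not the saver it would yield.
    return open_saver(conn_string)


async def get_checkpointer(stack: AsyncExitStack, conn_string: str) -> FakeSaver:
    # Enter the context manager and return the saver; the stack closes it later.
    return await stack.enter_async_context(open_saver(conn_string))


async def demo() -> tuple[bool, bool, bool]:
    buggy = get_checkpointer_buggy("checkpoints.db")
    async with AsyncExitStack() as stack:
        saver = await get_checkpointer(stack, "checkpoints.db")
        open_during = saver.open
    return isinstance(buggy, FakeSaver), open_during, saver.open
```

Tying the stack to application startup and shutdown keeps the saver open for the lifetime of the pipeline instead of a single `async with` block.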
- Add e2e/integration.spec.ts covering project creation, project detail
  navigation, writing page, discovery page, settings page, history/tasks
  pages, and cross-page navigation (9 tests)
- Fix chat-flow.spec.ts: use .first() for ambiguous "New" button selector
- Fix kb-paper-flow.spec.ts: use .first() for ambiguous "aside nav" locator
- Update testing guide docs with E2E coverage table and API curl test results
- All 19 E2E tests pass, 29 backend API endpoints verified via curl

Made-with: Cursor
Backend:
- Format test_paper_processor.py with ruff

Frontend TypeScript:
- Remove unused imports (ZoomIn, ZoomOut, Maximize, useTranslation)
- Fix ForceGraph2D type compatibility with object-based callbacks
- Fix react-resizable-panels: direction→orientation, autoSaveId→id
- Remove unused projectId destructuring in SelectionQA
- Add initial values to useRef calls for React 19 strict types
- Replace Array.findLast (ES2023) with reverse().find() to stay within the ES2022 target
- Remove incompatible dataPartSchemas from useChat options
- Add explicit GraphData type to citation graph useQuery

VitePress docs:
- Remove Vue template interpolation conflict ({{ in inline code)
- Add dead link ignore patterns for internal reference documents

Made-with: Cursor
PlaygroundPage: "New Chat" button appears in both sidebar and header,
causing getByRole strict mode violation.
WritingPage: "综述生成" tab and "Generate" action button both match
/generate|生成/i pattern.

Made-with: Cursor
@sylvanding sylvanding merged commit 4d6b811 into main Mar 15, 2026
4 checks passed