Focus: Verify capability as an AI Context Architect & Engineering Lead.
Goal: Build a system that parses PDF documents (text/image/table), constructs a retrieval workflow, and provides accurate answers via a multimodal chat interface.
Why this test?
We are looking for a leader who can design the "Context" for LLMs, not just write code. We want to see your ability to design robust Multimodal Workflows and demonstrate Technical Leadership through architectural decisions.
- Context Engineering: Ability to design data structures (Chunking, Embedding, Metadata) for optimal LLM understanding.
- Multimodal Workflow: Ability to orchestrate text, images, and tables into a single coherent workflow.
- Technical Leadership: Ability to diagnose bottlenecks and ensure quality and performance.
The following items are already implemented and provided:
- Docker Compose configuration (PostgreSQL+pgvector, Redis, Backend, Frontend)
- Database schema and models (SQLAlchemy)
- API base structure (FastAPI)
- Frontend base structure (Next.js + TailwindCSS)
- `Document` - Uploaded document information
- `DocumentChunk` - Text chunks (with vector embeddings)
- `DocumentImage` - Extracted images
- `DocumentTable` - Extracted tables
- `Conversation` - Chat sessions
- `Message` - Chat messages
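For orientation, here is a hedged sketch of what the provided `DocumentChunk` model likely resembles; the actual schema ships with the repo, so treat the column names and the 768-dim embedding here as illustrative only:

```python
# Illustrative sketch only -- the real models are provided in the repo.
from pgvector.sqlalchemy import Vector
from sqlalchemy import ForeignKey, Integer, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class DocumentChunk(Base):
    __tablename__ = "document_chunks"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    document_id: Mapped[int] = mapped_column(ForeignKey("documents.id"))
    content: Mapped[str] = mapped_column(Text)
    # 768 dimensions is illustrative; match whichever embedding model you pick.
    embedding = mapped_column(Vector(768))
```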
- `POST /api/documents/upload` - Upload document
- `GET /api/documents` - List documents
- `GET /api/documents/{id}` - Document details
- `DELETE /api/documents/{id}` - Delete document
- `POST /api/chat` - Send chat message
- `GET /api/conversations` - List conversations
- `GET /api/conversations/{id}` - Get conversation history
- `/` - Home (document list)
- `/upload` - Document upload
- `/chat` - Chat interface
- `/documents/[id]` - Document details
Scale: 3 Core Features + 3 Design Deliverables
Location: `backend/app/services/document_processor.py`
Goal: Parse PDF and structure data for Retrieval.
```python
class DocumentProcessor:
    async def process_document(self, file_path: str, document_id: int) -> Dict[str, Any]:
        """
        Implementation steps:
        1. Parse PDF using Docling
        2. Extract and chunk text (Context Engineering)
        3. Extract images/tables and create meaningful metadata linkage
        """
        pass
```
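As a starting point, here is a minimal sketch of `process_document`, assuming Docling's `DocumentConverter` and `HybridChunker`; the chunk dict keys (`chunk_index`) and the return shape are illustrative, not dictated by the provided schema:

```python
# A minimal sketch, assuming Docling's DocumentConverter / HybridChunker APIs.
from typing import Any, Dict, List

from docling.chunking import HybridChunker
from docling.document_converter import DocumentConverter


class DocumentProcessor:
    def __init__(self) -> None:
        self.converter = DocumentConverter()
        self.chunker = HybridChunker()  # token-aware, structure-preserving chunking

    async def process_document(self, file_path: str, document_id: int) -> Dict[str, Any]:
        # 1. Parse the PDF into Docling's unified document model.
        result = self.converter.convert(file_path)
        doc = result.document

        # 2. Chunk text while keeping heading/caption context attached.
        chunks: List[Dict[str, Any]] = [
            {"text": chunk.text, "chunk_index": i, "document_id": document_id}
            for i, chunk in enumerate(self.chunker.chunk(doc))
        ]

        # 3. Tables and pictures come back as structured items; persist them
        #    with positional metadata so retrieval can link them to nearby chunks.
        tables = list(doc.tables)
        pictures = list(doc.pictures)

        return {"chunks": chunks, "num_tables": len(tables), "num_images": len(pictures)}
```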
Location: `backend/app/services/vector_store.py`
Goal: Store and retrieve embeddings of multimodal data.
```python
class VectorStore:
    async def store_text_chunks(self, chunks: List[Dict[str, Any]], document_id: int) -> int:
        pass

    async def search_similar(self, query: str, document_id: Optional[int] = None, k: int = 5) -> List[Dict[str, Any]]:
        """
        Retrieve context with high relevance.
        Think about how to retrieve 'Tables' or 'Images' associated with text chunks.
        """
        pass
```
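For the retrieval side, here is a minimal sketch of a pgvector similarity query, assuming an async SQLAlchemy session, a `document_chunks` table with a vector column named `embedding`, and a pre-computed query embedding; the table and column names are illustrative:

```python
# A minimal sketch, assuming pgvector's `<=>` cosine-distance operator and
# an illustrative document_chunks(content, embedding) table.
from typing import Any, Dict, List, Optional

from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession


async def search_similar(
    session: AsyncSession,
    query_embedding: List[float],
    document_id: Optional[int] = None,
    k: int = 5,
) -> List[Dict[str, Any]]:
    # `<=>` is pgvector's cosine-distance operator; lower means more similar.
    sql = """
        SELECT id, document_id, content,
               embedding <=> CAST(:query AS vector) AS distance
        FROM document_chunks
        WHERE (:doc_id IS NULL OR document_id = :doc_id)
        ORDER BY distance
        LIMIT :k
    """
    rows = await session.execute(
        text(sql),
        {"query": str(query_embedding), "doc_id": document_id, "k": k},
    )
    return [dict(row._mapping) for row in rows]
```

Linked tables and images can then be fetched in a second query keyed on the returned chunks' document/page metadata.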
Location: `backend/app/services/chat_engine.py`
Goal: Generate grounded answers using retrieved context.
```python
class ChatEngine:
    async def process_message(self, conversation_id: int, message: str, document_id: Optional[int] = None) -> Dict[str, Any]:
        """
        Orchestrate the RAG flow:
        1. Retrieve context (Text + Image + Table)
        2. Construct Prompt (Prompt Engineering)
        3. Generate Response via LLM
        """
        pass
```
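To make the flow concrete, here is a minimal sketch of the RAG orchestration, assuming the `search_similar` above and a hypothetical `llm_complete()` provider wrapper (one possible Ollama-backed version is sketched in the hints at the end of this document):

```python
# A minimal sketch of the RAG flow; prompt wording and return shape are illustrative.
from typing import Any, Dict, Optional

# from app.services.llm import llm_complete  # hypothetical wrapper, sketched later

SYSTEM_PROMPT = (
    "Answer using ONLY the provided context. Cite the numbered context blocks "
    "you used. If the answer is not in the context, say you don't know."
)


class ChatEngine:
    def __init__(self, vector_store) -> None:
        self.vector_store = vector_store

    async def process_message(
        self, conversation_id: int, message: str, document_id: Optional[int] = None
    ) -> Dict[str, Any]:
        # 1. Retrieve text chunks (plus any tables/images linked to them).
        hits = await self.vector_store.search_similar(message, document_id=document_id, k=5)

        # 2. Construct the prompt: number each context block so the answer can cite sources.
        context = "\n\n".join(f"[{i + 1}] {hit['content']}" for i, hit in enumerate(hits))
        prompt = f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {message}"

        # 3. Generate a grounded answer via the LLM (hypothetical provider wrapper).
        answer = await llm_complete(prompt)

        return {"conversation_id": conversation_id, "answer": answer, "sources": hits}
```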
In addition to code, please include the following in your README.md or in separate markdown files (e.g., DESIGN.md):
- Chunking Strategy: Why did you choose this specific chunk size/overlap? Did you consider semantic chunking?
- Multimodal Linking: How did you logically link extracted images/tables to text? (e.g., spatial proximity, explicit references in text?)
- Quality Metrics: How will you measure the accuracy of answers? (e.g., "If I were to build an eval pipeline, I would check X, Y, Z...")
- Metrics: Mention specific metrics like RAGAS (faithfulness, answer relevance) or LLM-as-a-judge approaches.
- Management: Did you hardcode prompts? If you were to scale this, how would you manage prompt versions?
- Proposal: Suggest a strategy to separate prompts from code (e.g., external templates, a prompt registry); a sketch follows this list.
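As one possible answer to the Proposal item, here is a minimal sketch of a file-based prompt registry, assuming a `prompts/` directory of versioned YAML files (e.g., `prompts/rag_answer/v2.yaml` with a `template` key); the names and layout are illustrative:

```python
# A minimal sketch of prompt/code separation via versioned YAML templates.
from pathlib import Path

import yaml  # PyYAML


class PromptRegistry:
    """Loads versioned prompt templates from disk instead of hardcoding them."""

    def __init__(self, root: str = "prompts") -> None:
        self.root = Path(root)

    def get(self, name: str, version: str = "latest") -> str:
        folder = self.root / name
        if version == "latest":
            # Sort v1.yaml, v2.yaml, ... numerically, not lexicographically.
            path = max(folder.glob("v*.yaml"), key=lambda p: int(p.stem[1:]))
        else:
            path = folder / f"{version}.yaml"
        return yaml.safe_load(path.read_text())["template"]


# Usage (hypothetical):
#   template = PromptRegistry().get("rag_answer", version="v2")
#   prompt = template.format(context=context, question=message)
```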
- Design Patterns & Abstraction
- Readability for Junior Devs
- Error Handling & Resilience
- Document Parsing (Docling)
- Vector Search Accuracy
- Multimodal RAG Flow
- Context Engineering: Logic behind chunking & metadata
- Evaluation: Depth of thought on quality assurance
- Prompt Management: Maturity of prompt handling
- README Quality (explanation of design choices)
- UI/UX interaction flow
- API Documentation
- GitHub Repository
- Source Code
- Documentation (README.md + DESIGN.md)
- Screenshots & Results (using 1706.03762v7.pdf)
📄 Test Document: "Attention Is All You Need" Paper
Use this document to demonstrate:
- Architecture Diagram Retrieval: Can it find Figure 1?
- Table Data Retrieval: Can it answer questions about BLEU scores?
- Technical Context: Can it explain "Self-Attention" using specific sections?
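To make the quality check concrete, here is a minimal probe harness over these three questions. It assumes `POST /api/chat` accepts a `message`/`document_id` JSON body and returns an `answer` field (the exact request/response shape depends on your implementation); the expected keywords are taken from the paper:

```python
# A minimal sketch of automated probes; the /api/chat payload shape is an assumption.
import asyncio

import httpx

# (question, expected keyword) pairs drawn from the test paper.
PROBES = [
    ("Show me the model architecture diagram.", "Figure 1"),
    ("What BLEU score does the big Transformer achieve on EN-DE?", "28.4"),
    ("How does the paper define self-attention?", "scaled dot-product"),
]


async def run_probes(base_url: str, document_id: int) -> None:
    async with httpx.AsyncClient(base_url=base_url, timeout=60.0) as client:
        for question, expected in PROBES:
            resp = await client.post(
                "/api/chat",
                json={"message": question, "document_id": document_id},
            )
            answer = resp.json().get("answer", "")  # adapt to your response shape
            # Cheap keyword check; borderline cases can escalate to an LLM judge.
            verdict = "PASS" if expected.lower() in answer.lower() else "REVIEW"
            print(f"[{verdict}] {question}")


if __name__ == "__main__":
    asyncio.run(run_probes("http://localhost:8000", document_id=1))
```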
Good luck! We look forward to seeing your Technical Leadership and Architectural Insight.
Refer to the service skeleton files for detailed implementation guidance:
- `backend/app/services/document_processor.py`
- `backend/app/services/vector_store.py`
- `backend/app/services/chat_engine.py`
You can use Ollama (free, local) or Gemini/Groq (free tier) to implement this without cost.
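If you go the Ollama route, the hypothetical `llm_complete()` helper referenced in the ChatEngine sketch could look like this (the model name and URL are local defaults):

```python
# A minimal sketch of an Ollama-backed completion wrapper.
import httpx


async def llm_complete(prompt: str, model: str = "llama3") -> str:
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
        )
        resp.raise_for_status()
        # With stream=False, Ollama returns the full completion under "response".
        return resp.json()["response"]
```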
Q: Where is the test PDF?
A: https://arxiv.org/pdf/1706.03762.pdf
Q: Can I use LangChain/LlamaIndex?
A: Yes, but we value your ability to explain how it works under the hood.
Version: 2.0 (Updated for AI Context Architect Role) Last Updated: 2026-01-12 Author: InterOpera-Apps Hiring Team