Multimodal Document Chat System - Coding Test (AI Context Architect & Engineering Lead)

Project Overview

Focus: Verify capability as an AI Context Architect & Engineering Lead.
Goal: Build a system that parses PDF documents (text/image/table), constructs a retrieval workflow, and provides accurate answers via a multimodal chat interface.

Why this test?
We are looking for a leader who can design the "Context" for LLMs, not just write code. We want to see your ability to design robust Multimodal Workflows and demonstrate Technical Leadership through architectural decisions.

Core Competencies to Evaluate

Context Engineering: Ability to design data structures (Chunking, Embedding, Metadata) for optimal LLM understanding.
Multimodal Workflow: Ability to orchestrate text, images, and tables into a single coherent workflow.
Technical Leadership: Ability to diagnose bottlenecks and ensure quality and performance.

Provided Components (Starting Point)

The following items are already implemented and provided:

Infrastructure Setup

Docker Compose configuration (PostgreSQL+pgvector, Redis, Backend, Frontend)
Database schema and models (SQLAlchemy)
API base structure (FastAPI)
Frontend base structure (Next.js + TailwindCSS)

Database Models

Document - Uploaded document information
DocumentChunk - Text chunks (with vector embeddings)
DocumentImage - Extracted images
DocumentTable - Extracted tables
Conversation - Chat sessions
Message - Chat messages

API Endpoints (Skeleton provided)

POST /api/documents/upload - Upload document
GET /api/documents - List documents
GET /api/documents/{id} - Document details
DELETE /api/documents/{id} - Delete document
POST /api/chat - Send chat message
GET /api/conversations - List conversations
GET /api/conversations/{id} - Get conversation history

Frontend Pages (Layout only)

/ - Home (document list)
/upload - Document upload
/chat - Chat interface
/documents/[id] - Document details

Core Features to Implement (Your Job)

Scale: 3 Core Features + 3 Design Deliverables

Part A: System Implementation

1. Document Processing Pipeline (Critical)

Location: backend/app/services/document_processor.py
Goal: Parse PDF and structure data for Retrieval.

from typing import Any, Dict


class DocumentProcessor:
    async def process_document(self, file_path: str, document_id: int) -> Dict[str, Any]:
        """
        Implementation steps:
        1. Parse PDF using Docling
        2. Extract and chunk text (Context Engineering)
        3. Extract images/tables and create meaningful metadata linkage
        """
        pass
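
As an illustration of step 2, the following is a minimal sketch of fixed-size chunking with overlap. It assumes the page text has already been extracted (for example by Docling) and is passed in as a plain string. The names `Chunk` and `chunk_words`, the metadata keys, and the default `size`/`overlap` values are hypothetical choices for this example, not part of the provided skeleton.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Chunk:
    """A piece of text plus the metadata needed to trace it back to its source."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def chunk_words(text: str, page: int, size: int = 200, overlap: int = 40) -> List[Chunk]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[Chunk] = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        metadata = {"page": page, "chunk_index": len(chunks), "word_start": start}
        chunks.append(Chunk(" ".join(window), metadata))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, page=1, size=4, overlap=1):
        print(chunk.metadata["chunk_index"], chunk.text)
```

Word-based windows are only one option. Splitting on section or paragraph boundaries may preserve meaning better, and that trade-off is the kind of decision the design document asks you to justify.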

2. Vector Store Integration (Critical)

Location: backend/app/services/vector_store.py
Goal: Store vector projections (embeddings) of multimodal data.

from typing import Any, Dict, List, Optional


class VectorStore:
    async def store_text_chunks(self, chunks: List[Dict[str, Any]], document_id: int) -> int:
        pass

    async def search_similar(self, query: str, document_id: Optional[int] = None, k: int = 5) -> List[Dict[str, Any]]:
        """
        Retrieve context with high relevance.
        Think about how to retrieve 'Tables' or 'Images' associated with text chunks.
        """
        pass
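
To make the retrieval idea concrete, here is a small in-memory sketch of cosine-similarity ranking followed by a lookup of tables linked to each hit. It is not the pgvector implementation: in the real service the ranking would more likely run inside PostgreSQL (pgvector provides a cosine distance operator, `<=>`). The dictionary keys `embedding` and `related_table_ids`, and the helper names, are assumptions made for this example.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunks: List[Dict[str, Any]], k: int = 5) -> List[Dict[str, Any]]:
    """Return the k chunks most similar to the query, each with a `score` field added."""
    scored = [
        {**chunk, "score": cosine_similarity(query_vec, chunk["embedding"])}
        for chunk in chunks
    ]
    scored.sort(key=lambda c: c["score"], reverse=True)
    return scored[:k]


def attach_related(results: List[Dict[str, Any]], tables_by_id: Dict[int, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Add the tables referenced by each chunk's (hypothetical) `related_table_ids` metadata."""
    enriched = []
    for result in results:
        table_ids = result.get("related_table_ids", [])
        tables = [tables_by_id[t] for t in table_ids if t in tables_by_id]
        enriched.append({**result, "tables": tables})
    return enriched


if __name__ == "__main__":
    chunks = [
        {"id": 1, "embedding": [1.0, 0.0], "related_table_ids": [7]},
        {"id": 2, "embedding": [0.0, 1.0]},
        {"id": 3, "embedding": [1.0, 1.0]},
    ]
    hits = attach_related(top_k([1.0, 0.0], chunks, k=2), {7: {"caption": "Table 7"}})
    for hit in hits:
        print(hit["id"], round(hit["score"], 3), hit["tables"])
```

The same pattern extends to images: store the linked element IDs on the chunk at ingestion time, then resolve them after the vector search.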

3. Multimodal Chat Engine (Critical)

Location: backend/app/services/chat_engine.py
Goal: Generate grounded answers using retrieved context.

from typing import Any, Dict, Optional


class ChatEngine:
    async def process_message(self, conversation_id: int, message: str, document_id: Optional[int] = None) -> Dict[str, Any]:
        """
        Orchestrate the RAG flow:
        1. Retrieve context (Text + Image + Table)
        2. Construct Prompt (Prompt Engineering)
        3. Generate Response via LLM
        """
        pass
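
Step 2 of the flow above could look roughly like the sketch below, which assembles retrieved text, table, and image items into one numbered context block. The function name `build_prompt`, the item keys (`type`, `page`, `content`), and the instruction wording are all assumptions for illustration; the LLM call itself is left out.

```python
from typing import Any, Dict, List


def build_prompt(question: str, context: List[Dict[str, Any]]) -> str:
    """Format retrieved items as numbered sources and append the user question."""
    blocks = []
    for index, item in enumerate(context, start=1):
        kind = item.get("type", "text")   # "text", "table", or "image"
        page = item.get("page", "?")
        blocks.append(f"[{index}] ({kind}, page {page})\n{item['content']}")
    joined = "\n\n".join(blocks) if blocks else "(no context retrieved)"
    return (
        "Answer the question using only the numbered context below. "
        "Cite sources as [n]. If the context is insufficient, say so.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Page numbers and contents here are placeholders, not values from the test PDF.
    retrieved = [
        {"type": "text", "page": 3, "content": "An example paragraph about attention."},
        {"type": "table", "page": 8, "content": "| Model | Score |\n| A | 1.0 |"},
        {"type": "image", "page": 3, "content": "Caption: an example architecture diagram."},
    ]
    print(build_prompt("What does the diagram show?", retrieved))
```

Numbering the sources lets the answer cite them, which also makes it possible for the frontend to show the matching image or table next to the response.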

Part B: Design & Leadership (New Requirements)

In addition to code, please include the following in your README.md or separate markdown files (e.g., DESIGN.md).

1. [Design Choice Document]

Chunking Strategy: Why did you choose this specific chunk size/overlap? Did you consider semantic chunking?
Multimodal Linking: How did you logically link extracted images/tables to text? (e.g., spatial proximity, explicit references in text?)
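
One possible answer to the linking question is sketched below: look for explicit mentions such as "Figure 1" or "Table 2" in a chunk, and fall back to same-page proximity when there are none. The regular expression, the function names, and the element keys (`kind`, `number`, `page`) are assumptions for this example rather than part of the provided models.

```python
import re
from typing import Any, Dict, List, Set, Tuple

# Matches "Figure 1", "Fig. 1", and "Table 2" (case-insensitive).
REFERENCE_PATTERN = re.compile(r"\b(Figure|Fig\.|Table)\s*(\d+)", re.IGNORECASE)


def find_references(text: str) -> Set[Tuple[str, int]]:
    """Return the set of (kind, number) pairs explicitly mentioned in the text."""
    refs: Set[Tuple[str, int]] = set()
    for kind, number in REFERENCE_PATTERN.findall(text):
        label = "table" if kind.lower() == "table" else "figure"
        refs.add((label, int(number)))
    return refs


def link_elements(chunk_text: str, chunk_page: int, elements: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Prefer explicitly referenced elements; otherwise fall back to elements on the same page."""
    refs = find_references(chunk_text)
    explicit = [e for e in elements if (e["kind"], e["number"]) in refs]
    if explicit:
        return explicit
    return [e for e in elements if e["page"] == chunk_page]


if __name__ == "__main__":
    # Made-up page numbers, used only to exercise both code paths.
    elements = [
        {"kind": "figure", "number": 1, "page": 3},
        {"kind": "table", "number": 2, "page": 8},
    ]
    print(link_elements("Figure 1 shows the model.", 5, elements))
    print(link_elements("No explicit mention here.", 8, elements))
```

Explicit references tend to be more precise than proximity, so this sketch only uses the page-level fallback when no reference is found.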

2. [Evaluation Pipeline Design]

Quality Metrics: How will you measure the accuracy of answers? (e.g., "If I were to build an eval pipeline, I would check X, Y, Z...")
Specific Metrics: Name concrete metrics, such as those in RAGAS (faithfulness, answer relevance), or describe an LLM-as-a-judge approach.
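
As a starting point for such a pipeline, the sketch below computes two cheap signals that need no LLM: a retrieval hit rate at k, and a lexical-support score for the answer. Both are stand-ins chosen for this example. In particular, `lexical_support` is a simple token-overlap proxy and is not the RAGAS faithfulness metric; an LLM-as-a-judge step would replace or complement it in practice.

```python
import re
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def hit_rate_at_k(retrieved_ids: List[List[int]], relevant_ids: List[Set[int]], k: int = 5) -> float:
    """Fraction of questions for which at least one relevant chunk is in the top k results."""
    if not retrieved_ids:
        return 0.0
    hits = sum(
        1 for got, want in zip(retrieved_ids, relevant_ids) if want & set(got[:k])
    )
    return hits / len(retrieved_ids)


def lexical_support(answer: str, context: str) -> float:
    """Share of answer tokens that also occur in the retrieved context (crude grounding proxy)."""
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & _tokens(context)) / len(answer_tokens)


if __name__ == "__main__":
    print(hit_rate_at_k([[1, 2, 3], [4, 5, 6]], [{3}, {9}], k=3))
    print(lexical_support("red green blue", "The palette is red and blue."))
```

A low lexical-support score does not prove a hallucination, and a high one does not prove correctness, so scores like these are best used to flag cases for closer review.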

3. [Prompt Versioning Strategy]

Management: Did you hardcode prompts? If you were to scale this, how would you manage prompt versions?
Proposal: Suggest a strategy to separate prompts from code (e.g., external templates, prompt registry).
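
One way to separate prompts from code is a small registry keyed by prompt name and version, sketched below with the standard library's `string.Template`. The class names, the version labels, and the in-memory storage are assumptions for illustration; a real setup might load the templates from files or a database instead.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    template: Template


class PromptRegistry:
    """Stores prompt templates by (name, version) and remembers the latest version per name."""

    def __init__(self) -> None:
        self._prompts: Dict[Tuple[str, str], PromptVersion] = {}
        self._latest: Dict[str, str] = {}

    def register(self, name: str, version: str, text: str) -> None:
        if (name, version) in self._prompts:
            raise ValueError(f"{name} {version} is already registered")
        self._prompts[(name, version)] = PromptVersion(name, version, Template(text))
        self._latest[name] = version  # the most recently registered version is the default

    def render(self, name: str, version: Optional[str] = None, **variables: str) -> str:
        chosen = version or self._latest[name]
        return self._prompts[(name, chosen)].template.substitute(**variables)


if __name__ == "__main__":
    registry = PromptRegistry()
    registry.register("rag_answer", "v1", "Q: $question")
    registry.register("rag_answer", "v2", "Context: $context\nQ: $question")
    print(registry.render("rag_answer", question="What is shown?", context="..."))
    print(registry.render("rag_answer", version="v1", question="What is shown?"))
```

Pinning a version at the call site makes it possible to log which prompt produced which answer, and to compare versions in the evaluation pipeline.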

Evaluation Criteria (100 points)

1. Code Quality & Leadership (25 points)

Design Patterns & Abstraction
Readability for Junior Devs
Error Handling & Resilience

2. Feature Implementation (30 points)

Document Parsing (Docling)
Vector Search Accuracy
Multimodal RAG Flow

3. Systems Design & Engineering (30 points)

Context Engineering: Logic behind chunking & metadata
Evaluation: Depth of thought on quality assurance
Prompt Management: Maturity of prompt handling

4. UI/UX & Documentation (15 points)

README Quality (Design choices explanation)
UI/UX interaction flow
API Documentation

Submission Requirements

GitHub Repository
Source Code
Documentation (README.md + DESIGN.md)
Screenshots & Results (using 1706.03762v7.pdf)

Sample PDF for Testing

📄 Test Document: "Attention Is All You Need" Paper

Use this document to demonstrate:

Architecture Diagram Retrieval: Can it find Figure 1?
Table Data Retrieval: Can it answer questions about BLEU scores?
Technical Context: Can it explain "Self-Attention" using specific sections?

Good luck! We look forward to seeing your Technical Leadership and Architectural Insight.

Implementation Guidelines

Refer to the service skeleton files for detailed implementation guidance:

backend/app/services/document_processor.py
backend/app/services/vector_store.py
backend/app/services/chat_engine.py

Free LLM Options (For Development)

You can use Ollama (free, local) or Gemini/Groq (free tier) to implement this without cost.

FAQ

Q: Where is the test PDF?
A: https://arxiv.org/pdf/1706.03762.pdf

Q: Can I use LangChain/LlamaIndex?
A: Yes, but we value your ability to explain how it works under the hood.

Version: 2.0 (Updated for AI Context Architect Role)
Last Updated: 2026-01-12
Author: InterOpera-Apps Hiring Team
