Build a full-stack application using RAG (Retrieval-Augmented Generation) technology:
- Next.js as the frontend framework
- FastAPI as the backend API layer
- An intelligent Q&A system over a PDF financial statement document
- A vector database for document retrieval, paired with generative AI
Parse and embed the provided FinancialStatement_2025_I_AADIpdf.pdf file, then build a system where users can ask questions about the financial statement and the AI generates answers by retrieving relevant information.
- Fork this repository to your own GitHub account
- Clone your forked repository to your local machine
- Complete the coding challenge following the requirements below
- Push your completed solution to your forked repository
- Send your repository URL via email when completed
```bash
# Fork this repository on GitHub (click "Fork" button)
# Then clone your forked repository
git clone https://github.com/YOUR_USERNAME/coding-test-2nd.git
cd coding-test-2nd

# Start development...
```

When you complete the assignment:
- Ensure your code is pushed to your forked repository
- Test that your application runs correctly
- Send your GitHub repository URL via email
- Include any additional setup instructions if needed
- Parse PDF file to text and split into chunks
- Convert each chunk to vector embeddings and store in vector database
- Implement retrieval system that embeds user questions and searches for relevant document chunks
- Implement generation system that combines retrieved context with questions and sends to LLM
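The chunking step above can be prototyped in plain Python before reaching for a library. The sketch below is illustrative only — names like `split_into_chunks`, `chunk_size`, and `overlap` are not from any framework, and in practice you would likely use LangChain's text splitters instead:

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks so context isn't lost at chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

# Example: 1200 characters with a 400-character stride yields 3 chunks
pieces = split_into_chunks("A" * 1200, chunk_size=500, overlap=100)
```

Overlap matters for financial statements in particular, where a figure and its label can straddle a chunk boundary.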
- Implement the following endpoints using FastAPI:
  - POST /api/upload: PDF file upload and vectorization processing
  - GET /api/documents: Retrieve processed document information
  - POST /api/chat: Generate RAG-based answers to questions
  - GET /api/chunks: Retrieve document chunks and metadata (optional)
- Configure CORS to allow API calls from Next.js app
- Integrate with vector database (e.g., Chroma, FAISS, Pinecone, etc.)
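The CORS requirement is a one-time middleware setup in FastAPI. A minimal fragment for `main.py`, assuming the Next.js dev server runs at its default `http://localhost:3000`:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow the Next.js dev server (assumed to be at localhost:3000) to call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```

Restrict `allow_origins` rather than using `"*"` if you enable credentials, since browsers reject wildcard origins with credentialed requests.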
- Implement user-friendly chat interface using Next.js
- Real-time Q&A functionality (chat format)
- Document upload status display and processing progress
- Display referenced document chunk sources with answers
- Loading states and error handling
- Document Processing: PyPDF2, pdfplumber, or LangChain document loaders
- Embedding Models: OpenAI embeddings, Sentence Transformers, or HuggingFace embeddings
- Vector Database: ChromaDB (local), FAISS, or Pinecone
- LLM: OpenAI GPT, Google Gemini, Anthropic Claude, or open-source models
- Frameworks: LangChain or LlamaIndex (for RAG pipeline construction)
- Multi-PDF file support
- Conversation history maintenance and context continuity
- Answer quality evaluation and feedback system
- Visual highlighting of document chunks
- Financial metrics calculator integration
- Chart and graph generation functionality
- OpenAI API: GPT-3.5/4 (free credits provided)
- Google Gemini API: Free tier available
- Anthropic Claude: Free credits provided
- Cohere: Free API available
- Hugging Face: Free open-source models
- OpenAI Embeddings: text-embedding-ada-002
- Cohere Embeddings: Free tier available
- Sentence Transformers: Open-source models for local execution
- Hugging Face Embeddings: Various free models available
- ChromaDB: Free local and cloud usage
- FAISS: Free open-source by Meta
- Weaviate: Free cloud tier available
- Pinecone: Free starter plan available
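Before committing to any of these, the store's contract (add embeddings, search by cosine similarity) can be prototyped in pure Python. `InMemoryVectorStore` below is an illustrative stand-in, not any library's API:

```python
import math

class InMemoryVectorStore:
    """Toy stand-in for Chroma/FAISS: holds (embedding, text) pairs, ranks by cosine similarity."""

    def __init__(self):
        self._entries = []  # list of (embedding, text) tuples

    def add(self, embedding, text):
        self._entries.append((embedding, text))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query, top_k=3):
        """Return the top_k (score, text) pairs most similar to the query embedding."""
        scored = [(self._cosine(query, emb), text) for emb, text in self._entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]

store = InMemoryVectorStore()
store.add([1.0, 0.0], "revenue chunk")
store.add([0.0, 1.0], "cash flow chunk")
results = store.search([0.9, 0.1], top_k=1)  # closest to the "revenue" direction
```

Swapping in a real store later only changes `add` and `search` internals; the rest of the pipeline is unaffected.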
```
coding-test-2nd/
├── backend/
│   ├── main.py               # FastAPI application
│   ├── models/               # Data models
│   │   └── schemas.py        # Pydantic schemas
│   ├── services/             # RAG service logic
│   │   ├── pdf_processor.py  # PDF processing and chunking
│   │   ├── vector_store.py   # Vector database integration
│   │   └── rag_pipeline.py   # RAG pipeline
│   ├── requirements.txt      # Python dependencies
│   └── config.py             # Configuration file
├── frontend/
│   ├── pages/                # Next.js pages
│   │   ├── index.tsx         # Main page
│   │   └── _app.tsx          # App component
│   ├── components/           # React components
│   │   ├── ChatInterface.tsx
│   │   └── FileUpload.tsx
│   ├── styles/               # CSS files
│   │   └── globals.css       # Global styles
│   ├── package.json          # Node.js dependencies
│   ├── tsconfig.json         # TypeScript configuration
│   ├── next.config.js        # Next.js configuration
│   ├── next-env.d.ts         # Next.js type definitions
│   └── .eslintrc.json        # ESLint configuration
├── data/
│   └── FinancialStatement_2025_I_AADIpdf.pdf
└── README.md
```
```bash
# Clone repository
git clone <your-repository-url>
cd coding-test-2nd

# Set up Python virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

cd backend

# Install dependencies
pip install -r requirements.txt
```

Set up environment variables (create a `.env` file):

```
OPENAI_API_KEY=your_openai_api_key
VECTOR_DB_PATH=./vector_store
PDF_UPLOAD_PATH=../data
```
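These variables map naturally onto the skeleton's `config.py`. A minimal sketch using only the standard library (a package such as python-dotenv could load the `.env` file itself, but is not required for the pattern):

```python
import os

# Fall back to the defaults shown above when the env vars are unset
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
VECTOR_DB_PATH = os.getenv("VECTOR_DB_PATH", "./vector_store")
PDF_UPLOAD_PATH = os.getenv("PDF_UPLOAD_PATH", "../data")
```

Keeping all configuration in one module means services import settings from `config` instead of reading the environment directly.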
```bash
# Run server
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

Then, in a separate terminal:

```bash
cd frontend
```
```bash
# Install dependencies
npm install

# Run development server
npm run dev
```

Note: If you encounter TypeScript/linting errors:
- Make sure `npm install` completed successfully
- The project includes all necessary configuration files (`tsconfig.json`, `.eslintrc.json`, `next-env.d.ts`)
- Check that all dependencies are properly installed in `node_modules`
POST /api/upload — upload a PDF file and store it in the vector database:

```bash
# Process and vectorize PDF file via API
curl -X POST "http://localhost:8000/api/upload" \
  -F "file=@../data/FinancialStatement_2025_I_AADIpdf.pdf"
```

Request body:

```json
{
  "file": "multipart/form-data"
}
```

POST /api/chat — generate a RAG-based answer to a question. Request (`chat_history` is optional):

```json
{
  "question": "What is the total revenue for 2025?",
  "chat_history": []
}
```

Response:

```json
{
  "answer": "The total revenue for 2025 is 123.4 billion won...",
  "sources": [
    {
      "content": "Related document chunk content",
      "page": 1,
      "score": 0.85
    }
  ],
  "processing_time": 2.3
}
```

GET /api/documents — retrieve processed document information. Response:

```json
{
  "documents": [
    {
      "filename": "FinancialStatement_2025_I_AADIpdf.pdf",
      "upload_date": "2024-01-15T10:30:00Z",
      "chunks_count": 125,
      "status": "processed"
    }
  ]
}
```

- PDF processing and chunking quality
- Embedding and vector search accuracy
- LLM integration and answer quality
- Code readability and maintainability
- Modularization and separation of concerns
- Error handling and logging
- Intuitive chat interface
- Real-time feedback and loading states
- Answer source display and reliability
- API design and documentation
- Performance optimization
- Scalable architecture
- Your forked GitHub repository with complete implementation
- All source code (frontend, backend, configurations)
- Updated documentation with any additional setup instructions
- Runnable demo that works locally
- Complete your implementation in your forked repository
- Test thoroughly to ensure everything works
- Push all changes to your GitHub repository
- Send an email with your repository URL to the designated contact
- Complete frontend and backend implementation
- All necessary configuration files
- Clear installation and execution instructions
- Any additional documentation or notes
- Demo video showing your system in action
- Performance analysis or optimization notes
- Future improvement suggestions
Your system should be able to handle questions like these about the financial statement PDF:
- "What is the total revenue for 2025?"
- "What is the year-over-year operating profit growth rate?"
- "What are the main cost items?"
- "How is the cash flow situation?"
- "What is the debt ratio?"
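Questions like these are answered by stuffing the retrieved chunks into the LLM prompt alongside the question. The template below is one illustrative way to combine them (the exact wording and the `build_prompt` helper are up to you, not a fixed API):

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved context chunks with the user's question for the LLM."""
    context = "\n\n".join(f"[Chunk {i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is the total revenue for 2025?",
    ["Total revenue for 2025 was 123.4 billion won.", "Operating profit grew 12% YoY."],
)
```

Numbering the chunks in the prompt makes it easy to ask the model to cite its sources, which feeds the "display referenced chunk sources" requirement.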
Frontend TypeScript Errors:
- Ensure `npm install` was completed successfully
- Check that the `node_modules` directory exists and is populated
- Verify all configuration files are present
Backend Import Errors:
- Activate the Python virtual environment
- Install all requirements: `pip install -r requirements.txt`
- Check Python path and module imports
CORS Issues:
- Ensure backend CORS settings allow frontend origin
- Check that API endpoints are accessible from frontend
Build a smarter document Q&A system with RAG technology!