This project demonstrates a Retrieval Augmented Generation (RAG) system for answering questions about a GitHub repository. It uses ChromaDB for vector similarity search, OpenAI's text-embedding-ada-002 for embedding text, and CodeBERT for embedding code.
The system processes a GitHub repository, extracts textual and code chunks, generates embeddings for each chunk using OpenAI's text-embedding-ada-002 and CodeBERT, and stores them in ChromaDB collections. When a user asks a question, the system retrieves the most relevant chunks based on semantic similarity, constructs a prompt incorporating the retrieved context, and uses OpenAI's gpt-3.5-turbo to generate an answer.
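To make the retrieval step concrete, here is a minimal sketch of ranking chunks by semantic similarity. It is not the project's code: it uses plain Python and cosine similarity as a stand-in for the nearest-neighbour query that ChromaDB performs, and the metric ChromaDB actually applies depends on how the collection is configured. The function names (`cosine_similarity`, `top_k_chunks`) and the toy 3-dimensional vectors are hypothetical; real text-embedding-ada-002 vectors have 1536 dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k (text, score) pairs most similar to query_vec, best first."""
    scored = [(text, cosine_similarity(query_vec, emb)) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy "embeddings" with hypothetical values, for illustration only.
    store = [
        ("README: how to install", [1.0, 0.0, 0.0]),
        ("def parse_repo(): ...", [0.0, 1.0, 0.0]),
        ("README: setup steps", [0.9, 0.1, 0.0]),
    ]
    for text, score in top_k_chunks([1.0, 0.0, 0.0], store, k=2):
        print(f"{score:.3f}  {text}")
```

In the real system the query vector would come from the same embedding model used for the stored chunks, since similarity scores are only meaningful between vectors from the same model.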
- GitHub Data Parsing: The github_parser.py script uses the GitHub API to retrieve files from a specified repository. It then splits the content into chunks, differentiating between textual and code files.
- Embedding Generation: The embedding.py script generates embeddings for both textual and code chunks. Textual embeddings are created using OpenAI's text-embedding-ada-002, while code embeddings are generated using CodeBERT.
- ChromaDB Storage: The chromaDB_setup.py script sets up two ChromaDB collections: one for textual embeddings and one for code embeddings. The embeddings and corresponding metadata are added to these collections.
- Query Processing: The helper.ipynb notebook contains the core logic for answering user questions. It retrieves the most similar chunks from ChromaDB, constructs a RAG prompt, and uses OpenAI's gpt-3.5-turbo to generate the answer.
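The parsing step above separates textual files from code files before chunking. The following is a rough sketch of that idea under stated assumptions, not the contents of github_parser.py: the extension sets, the line-packing strategy, the 500-character default, and all function names are hypothetical choices made for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical extension sets; the real parser may classify files differently.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".tsx", ".java", ".go", ".rs", ".c", ".cpp"}
TEXT_EXTENSIONS = {".md", ".txt", ".rst"}


def classify_file(path: str) -> str:
    """Return 'code', 'text', or 'other' based on the file extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in CODE_EXTENSIONS:
        return "code"
    if suffix in TEXT_EXTENSIONS:
        return "text"
    return "other"


def split_into_chunks(content: str, max_chars: int = 500) -> List[str]:
    """Greedily pack whole lines into chunks of at most max_chars characters.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in content.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def chunk_repository(files: Dict[str, str], max_chars: int = 500) -> Dict[str, List[dict]]:
    """Map {path: content} to separate lists of text chunks and code chunks."""
    result: Dict[str, List[dict]] = {"text": [], "code": []}
    for path, content in files.items():
        kind = classify_file(path)
        if kind == "other":
            continue  # skip images, binaries, and unknown types
        for index, chunk in enumerate(split_into_chunks(content, max_chars)):
            result[kind].append({"path": path, "chunk_index": index, "content": chunk})
    return result
```

Keeping the path and chunk index alongside each chunk gives the storage step metadata to attach, so an answer can later be traced back to the file it came from.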
- Install Dependencies:
    pip install -r requirements.txt
- Set API Keys: Create a .env file and add your OpenAI and GitHub API keys:
    OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
    GITHUB_API_KEY="YOUR_GITHUB_API_KEY"
- Run the Notebook: Execute the helper.ipynb notebook. This will parse the specified GitHub repository, generate embeddings, create ChromaDB collections, and allow you to ask questions.
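Before running the notebook, it can help to confirm that both keys are actually set. The sketch below is an optional, standard-library-only check and makes no claim about how the notebook itself loads the .env file (it may well use a dedicated library). The helper names `read_env_file` and `missing_keys` are hypothetical, and the parser only handles simple KEY=VALUE lines.

```python
import os
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = ("OPENAI_API_KEY", "GITHUB_API_KEY")


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as "..." or '...'.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str]) -> List[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    return [k for k in REQUIRED_KEYS if not env.get(k) or env[k].startswith("YOUR_")]


if __name__ == "__main__":
    # Values exported in the shell take precedence over the .env file.
    exported = {k: v for k, v in os.environ.items() if k in REQUIRED_KEYS}
    merged = {**read_env_file(), **exported}
    missing = missing_keys(merged)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```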
A neo‑brutalist frontend lives in ./frontend, built with Next.js 13+/16 App Router, TypeScript, and Tailwind CSS.
Prerequisites:
- Node.js >= 18.18.0 (Node 20+ recommended)
Install and run in development:
cd frontend
npm install
npm run dev
# The app will start at http://localhost:3000
Build and start production server:
cd frontend
npm run build
npm start
# Serves the built app on http://localhost:3000
Notes:
- Tailwind is preconfigured (see frontend/tailwind.config.ts and frontend/app/globals.css).
- UI components are under frontend/components/ui (Button, Input).
- Main page with input → processing → dashboard/chat flow is at frontend/app/page.tsx.
The helper.ipynb notebook provides a simple interface for querying the system. You can modify the query variable to test different questions. The system will return the AI's response based on the context found in the repository.
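As an illustration of how a question and its retrieved context might be combined, here is a small sketch of prompt construction. It is an assumption-laden example, not the notebook's actual prompt: the function name `build_rag_prompt`, the chunk schema (`path` and `content` keys), the instruction wording, and the character budget are all hypothetical.

```python
from typing import List


def build_rag_prompt(query: str, chunks: List[dict], max_context_chars: int = 4000) -> str:
    """Combine retrieved chunks and the user's question into one prompt string.

    Each chunk is a dict with 'path' and 'content' keys (hypothetical schema),
    and chunks are assumed to be ordered from most to least relevant.
    """
    sections: List[str] = []
    used = 0
    for chunk in chunks:
        section = f"[{chunk['path']}]\n{chunk['content'].strip()}"
        if used + len(section) > max_context_chars:
            break  # stop before the context budget is exceeded
        sections.append(section)
        used += len(section)
    context = "\n\n".join(sections) if sections else "(no relevant context found)"
    return (
        "Answer the question using only the repository context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


if __name__ == "__main__":
    query = "How do I install the dependencies?"
    retrieved = [{"path": "README.md", "content": "pip install -r requirements.txt"}]
    print(build_rag_prompt(query, retrieved))
```

Labelling each chunk with its file path lets the model cite where an answer came from, and the character budget is a crude guard against exceeding the model's context window; a token-based limit would be more precise.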
- More sophisticated chunk splitting strategies.
- Improved prompt engineering for better AI responses.
- Integration with other knowledge sources.
- Enhanced error handling and logging.
Contributions are welcome! Please open an issue or submit a pull request.