RAG Pipeline - Code Structure

Project Structure

Askly/
├── config/
│   ├── __init__.py
│   └── config.py              # Configuration settings
├── processors/
│   ├── __init__.py
│   ├── pdf_processor.py       # PDF downloading and text extraction
│   └── text_processor.py      # Text processing and chunking
├── models/
│   ├── __init__.py
│   ├── embedding_manager.py   # Embedding creation and management
│   ├── retrieval_system.py    # Semantic search and retrieval
│   └── llm_manager.py         # LLM loading and text generation
├── utils/
│   ├── __init__.py
│   └── utils.py               # Utility functions
├── rag_pipeline.py                # Main pipeline orchestrator
├── data/                          # PDF files and raw data
├── models/                        # Downloaded models
├── outputs/                       # Generated embeddings and outputs
├── main.py                        # Command-line interface
├── run_rag.py                     # Simple runner script
└── README.md                      # This file

Module Descriptions

Core Pipeline

rag_pipeline.py: Main orchestrator that coordinates all components
main.py: Command-line interface with multiple run modes
run_rag.py: Simple script for quick testing

Configuration

config/config.py: All configuration settings, paths, and parameters
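
As a rough illustration of what a centralized settings module can look like, the sketch below groups paths and parameters in one frozen dataclass. Every field name and default value here (PipelineConfig, chunk_size, embeddings.csv, and so on) is a hypothetical placeholder; the actual contents of config/config.py are not shown in this README.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PipelineConfig:
    """Illustrative settings container; all names and defaults are assumptions."""

    data_dir: Path = Path("data")        # where PDFs and raw data live
    models_dir: Path = Path("models")    # where downloaded models are stored
    outputs_dir: Path = Path("outputs")  # where embeddings are written
    chunk_size: int = 10                 # sentences per chunk (assumed value)
    temperature: float = 0.7             # sampling temperature (assumed value)
    max_tokens: int = 256                # generation length limit (assumed value)

    @property
    def embeddings_path(self) -> Path:
        # Derived path, so the file name is defined in exactly one place.
        return self.outputs_dir / "embeddings.csv"


config = PipelineConfig()
print(config.embeddings_path)
```

Freezing the dataclass keeps settings read-only after creation, which helps when several modules share the same object.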

Data Processing

processors/pdf_processor.py: Downloads PDFs and extracts text
processors/text_processor.py: Processes text, splits sentences, creates chunks
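
The sentence-splitting and chunking step can be sketched roughly as follows. The dependency list suggests spaCy may handle sentence splitting in the real module; this sketch swaps in a naive regular expression so it runs with the standard library only. The function names and the chunk size are assumptions, not the project's actual API.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def chunk_sentences(sentences: list[str], chunk_size: int = 3) -> list[str]:
    """Join consecutive sentences into chunks of at most chunk_size sentences."""
    return [
        " ".join(sentences[i:i + chunk_size])
        for i in range(0, len(sentences), chunk_size)
    ]


text = "Protein builds tissue. Fat stores energy. Carbs fuel cells. Water hydrates."
chunks = chunk_sentences(split_sentences(text), chunk_size=3)
for chunk in chunks:
    print(chunk)
```

Grouping by sentence count rather than by character count keeps each chunk on sentence boundaries, which tends to make retrieved passages easier to read.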

AI Models

models/embedding_manager.py: Creates and manages text embeddings
models/retrieval_system.py: Performs semantic search and retrieval
models/llm_manager.py: Loads and manages the language model
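
To show the idea behind semantic search, the sketch below ranks chunks by cosine similarity between a query vector and each chunk vector. It uses pure Python and toy three-dimensional vectors in place of real sentence embeddings; the function names are hypothetical, and the actual retrieval_system.py may score and rank differently.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return (chunk_index, score) pairs for the k most similar chunks."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings produced by an embedding model.
chunk_vecs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query_vec = [1.0, 0.1, 0.0]
results = top_k(query_vec, chunk_vecs, k=2)
for index, score in results:
    print(index, round(score, 3))
```

With real embeddings the same ranking is usually done as one matrix operation on the GPU or with NumPy; the loop form here is only meant to make the scoring explicit.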

Utilities

utils/utils.py: Helper functions for text processing, model management, etc.

Usage Examples

Command Line Interface

# Interactive mode
python main.py --mode interactive

# Demo mode with predefined questions
python main.py --mode demo

# Single question
python main.py --mode single --question "What are macronutrients?"

# Custom settings
python main.py --mode single --question "What is protein?" --temperature 0.5 --max-tokens 512
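
The flags used above could be parsed with argparse along these lines. The flag names and mode values come from the commands shown; the default values, the help text, and the rule that single mode requires a question are assumptions made for this sketch, not confirmed behavior of main.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="RAG pipeline command-line interface")
    parser.add_argument("--mode", choices=["interactive", "demo", "single"],
                        default="interactive")            # default is assumed
    parser.add_argument("--question", help="Question to answer in single mode")
    parser.add_argument("--temperature", type=float, default=0.7)  # default is assumed
    parser.add_argument("--max-tokens", type=int, default=256)     # default is assumed
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed validation: single mode makes no sense without a question.
    if args.mode == "single" and not args.question:
        parser.error("--question is required when --mode is single")
    return args


args = parse_args(["--mode", "single", "--question", "What is protein?",
                   "--temperature", "0.5", "--max-tokens", "512"])
print(args.mode, args.temperature, args.max_tokens)
```

Note that argparse exposes --max-tokens as args.max_tokens, converting the hyphen to an underscore.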

Simple Runner

# Quick start
python run_rag.py

Programmatic Usage

from rag_pipeline import RAGPipeline

# Initialize pipeline
pipeline = RAGPipeline()

# Setup (downloads PDF, creates embeddings, loads models)
pipeline.setup_pipeline()

# Ask questions
answer = pipeline.ask("What are the macronutrients?")
print(answer)

# Search without generation
results = pipeline.search("protein sources")

Key Features

Modular Design: Each component is separate and can be used independently
Configuration Management: All settings centralized in config.py
Error Handling: Comprehensive error handling throughout
Multiple Interfaces: CLI, programmatic, and interactive modes
GPU Support: Automatic GPU detection and model optimization
Caching: Saves embeddings to avoid recomputation
Extensible: Easy to add new models or processing steps
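
The caching feature can be pictured as a load-or-compute step: reuse embeddings saved on disk when they match the current chunks, otherwise compute and save them. The sketch below is a minimal version of that pattern using JSON and a toy embedding function; the file name, the storage format, and all function names are assumptions and may not match how the project stores its embeddings.

```python
import json
import tempfile
from pathlib import Path


def load_or_compute_embeddings(chunks, cache_path, embed_fn):
    """Return cached embeddings if they match chunks, else compute and save them."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        if cached["chunks"] == chunks:  # cache is only valid for identical input
            return cached["embeddings"]
    embeddings = [embed_fn(chunk) for chunk in chunks]
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"chunks": chunks, "embeddings": embeddings}
    cache_path.write_text(json.dumps(payload), encoding="utf-8")
    return embeddings


calls = []


def toy_embed(text):
    # Stand-in for a real embedding model; records each call for demonstration.
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "outputs" / "embeddings.json"
    chunks = ["protein sources", "fat stores energy"]
    first = load_or_compute_embeddings(chunks, cache_file, toy_embed)
    second = load_or_compute_embeddings(chunks, cache_file, toy_embed)

print(len(calls))  # the second call reads from the cache, so no extra embed calls
```

Storing the chunks next to the embeddings lets the loader detect stale caches when the source text changes.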

Dependencies

The code requires the same dependencies as the original notebook:

PyMuPDF (fitz)
sentence-transformers
transformers
torch
pandas
numpy
spacy
tqdm
requests

File Organization

Each module has a specific responsibility:

PDF Processing: Downloads and extracts text from PDFs
Text Processing: Splits text into sentences and chunks
Embedding Management: Creates and stores text embeddings
Retrieval: Finds relevant documents for queries
LLM Management: Generates answers using language models
Pipeline: Orchestrates all components together

quangkmhd/Askly
