Skip to content

gustavokch/rag_ctb

Repository files navigation

RAG System for Legislation Study with Gemini 2.5 Pro

System Architecture Overview

PDF Document → Text Extraction → Chunking → Embedding → Vector DB → RAG Pipeline → Gemini 2.5 Pro → Response with Citations & Confidence

1. Document Processing Pipeline

PDF Text Extraction

See document_processing.py

Intelligent Chunking Strategy

See document_processing.py

2. Vector Database Setup

Using Chroma DB (Local & Free)

See vector_database.py

3. RAG Pipeline Implementation

See rag_pipeline.py

4. Complete System Setup

See system_setup.py

5. Enhanced Features

Query Expansion

See enhanced_features.py

Citation Verification

See enhanced_features.py

6. Web Interface (Optional)

See web_interface.py

7. Cost Optimization Tips

  • Batch embedding generation to reduce API calls
  • Cache embeddings locally to avoid regenerating
  • Use smaller chunks for more precise retrieval
  • Implement query similarity to avoid duplicate processing
  • Set up rate limiting for API calls

8. Evaluation Metrics

def evaluate_system(rag_system, test_questions: List[dict]):
    """Evaluate system performance"""
    metrics = {
        'accuracy': [],
        'confidence_calibration': [],
        'citation_accuracy': []
    }
    
    for qa_pair in test_questions:
        result = rag_system.ask_question(qa_pair['question'])
        
        # Compare with ground truth
        accuracy = calculate_answer_similarity(
            result['answer'], 
            qa_pair['expected_answer']
        )
        metrics['accuracy'].append(accuracy)
        
        # More evaluation logic...
    
    return metrics

This system provides a robust foundation for studying legislation with precise answers, proper citations, and confidence scoring. The modular design allows you to customize each component based on your specific needs and computational resources.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages