


StudentsBot 🎓

An intelligent chatbot to provide information about courses, exams, services and procedures of Università Cattolica using RAG (Retrieval-Augmented Generation) technologies.

🎓 Academic Context

This work was developed by Eleonora Farolfi as part of her thesis, under the supervision of Prof. Andrea Belli at Università Cattolica del Sacro Cuore.

🚀 Features

  • Interactive chat for natural language questions
  • Batch processing of queries from Excel/CSV/TXT files
  • Automatic indexing of crawled documents
  • Multi-language support (Italian/English)
  • Integrated debug modes for development

πŸ› οΈ Technologies

  • LangChain - Framework for AI applications
  • Google Gemini - Language model (LLM)
  • FAISS - Vector store for semantic search
  • Pandas - Excel file processing
  • BeautifulSoup - HTML parsing for web crawling
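Conceptually, a RAG bot like this one embeds the user's question, asks the FAISS vector store for the most similar document chunks, and passes those chunks to Gemini as context. The sketch below illustrates only that retrieve-then-prompt step, under heavy simplifying assumptions: a bag-of-words counter stands in for a real embedding model, brute-force cosine similarity stands in for a FAISS index, and the helper names (embed, retrieve, build_prompt) and the prompt wording are hypothetical, not taken from bot_review.py.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (\\w also matches accented letters such as 'à')."""
    return re.findall(r"\w+", text.lower())


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (hypothetical stand-in
    for a real embedding model)."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Brute-force nearest neighbours: the kind of search a flat FAISS index
    performs, here over toy vectors instead of real embeddings."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"
```

In the real bot, a LangChain FAISS vector store built from learned embeddings would presumably replace embed and retrieve; the idea of grounding the answer in the top-k retrieved chunks stays the same.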

📦 Installation

Prerequisites

  • Python 3.8+
  • Google Gemini API Key

Quick Setup

# Clone the repository
git clone https://github.com/nluninja/studentsbot.git
cd studentsbot

# Automatic setup
source activate_studentsbot.sh

# Configure the API key (add this line to .env)
GOOGLE_API_KEY=your_api_key_here

# Start the bot
studentsbot

Manual Setup

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your GOOGLE_API_KEY

🚀 Usage

Interactive Mode

# Initial configuration
python bot_review.py --index_only       # Create vectorstore
python bot_review.py --interactive      # Start chat

# Guided configuration
python bot_review.py                    # Default mode

# Full help
python bot_review.py --help

Batch Processing

# From Excel file (column A)
python batch_query.py "data/domande chatbot.xlsx" risultati.json --verbose

# From text file
python batch_query.py data/queries.txt risultati.csv

# Extract questions from Excel
python extract_queries.py  # creates data/queries.txt
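The batch tools take one question per line (TXT) or per row (CSV, Excel column A). A minimal loader for the two plain-text formats might look like the sketch below; load_queries is a hypothetical helper, not the actual code of batch_query.py, and it assumes UTF-8 files and a CSV without a header row. Excel input would additionally need pandas (for example pandas.read_excel), which is left out here to keep the sketch dependency-free.

```python
from __future__ import annotations

import csv
from pathlib import Path


def load_queries(path: str | Path) -> list[str]:
    """Return the questions in a .txt file (one per non-empty line) or in the
    first column of a .csv file (assumed to have no header row).

    Hypothetical helper for illustration; batch_query.py may differ.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        lines = path.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            # Keep the first cell of every non-empty row.
            return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]
    raise ValueError(f"Unsupported input format: {suffix}")
```

Each loaded question would then be sent to the bot in turn, and the answers written to the output file (risultati.json or risultati.csv in the examples above).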

Web Crawling

# Data collection from website
python crawler.py
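A crawler has two jobs per page: keep the readable text and collect the links to visit next. The project uses BeautifulSoup for the parsing; to keep this sketch dependency-free it swaps in the standard library's html.parser instead. PageExtractor and extract_page are hypothetical names, and the sketch does not fetch pages, follow links, or write files the way crawler.py presumably does.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collect visible text and absolute outgoing links from one HTML page."""

    SKIP = {"script", "style", "noscript"}  # tags whose content is not prose

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.text_parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract_page(html: str, base_url: str) -> tuple[str, list[str]]:
    """Return (plain text, links) for one page."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.text_parts), parser.links
```

The extracted text is what would end up as a document in output_crawler/, and the links feed the crawl queue.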

📊 Response Evaluation

The project includes several tools to evaluate the quality of chatbot responses:

1. Automatic Evaluation (rageval.py)

# Complete evaluation with multiple metrics
python rageval.py risultati.json

# Save detailed results
python rageval.py risultati.json valutazione_dettagliata.json

Calculated metrics:

  • Text similarity (difflib SequenceMatcher)
  • ROUGE-1, ROUGE-2, ROUGE-L (n-gram overlap and LCS)
  • BLEU score (clipped n-gram precision with a brevity penalty)
  • Keyword overlap (precision, recall, F1)
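To make the metric list concrete, here is a compact, self-contained sketch of each score computed between a chatbot answer (candidate) and a reference answer. It is an illustration under stated assumptions, not the code of rageval.py: tokenization is plain lowercase whitespace splitting, keyword overlap uses no stopword filtering, and BLEU uses add-one smoothing so that short answers do not collapse to zero. The real script may tokenize, smooth, or filter differently.

```python
from __future__ import annotations

import math
from collections import Counter
from difflib import SequenceMatcher


def tokens(text: str) -> list[str]:
    return text.lower().split()  # assumption: simple whitespace tokenization


def text_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()  # character-level ratio in [0, 1]


def ngrams(toks: list[str], n: int) -> Counter:
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1: overlap of n-gram counts (Counter & keeps the minimum)."""
    c, r = ngrams(tokens(candidate), n), ngrams(tokens(reference), n)
    overlap = sum((c & r).values())
    p = overlap / sum(c.values()) if c else 0.0
    rec = overlap / sum(r.values()) if r else 0.0
    return f1(p, rec)


def lcs_len(x: list[str], y: list[str]) -> int:
    """Longest common subsequence length, one DP row at a time."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if xi == yj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    c, r = tokens(candidate), tokens(reference)
    lcs = lcs_len(c, r)
    return f1(lcs / len(c) if c else 0.0, lcs / len(r) if r else 0.0)


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    c, r = tokens(candidate), tokens(reference)
    if not c:
        return 0.0
    log_p = 0.0
    for n in range(1, max_n + 1):
        cn, rn = ngrams(c, n), ngrams(r, n)
        clipped = sum((cn & rn).values())
        total = max(sum(cn.values()), 1)
        # Add-one smoothing (an assumption) avoids log(0) for short answers.
        log_p += math.log((clipped + 1) / (total + 1)) / max_n
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / len(c))
    return bp * math.exp(log_p)


def keyword_overlap(candidate: str, reference: str) -> dict:
    c, r = set(tokens(candidate)), set(tokens(reference))
    hit = len(c & r)
    p = hit / len(c) if c else 0.0
    rec = hit / len(r) if r else 0.0
    return {"precision": p, "recall": rec, "f1": f1(p, rec)}
```

The brevity penalty is what keeps a very short answer that happens to share a few words with the reference from scoring as well as a complete one.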

2. LLM-as-Judge Evaluation (llm_as_judge.py)

# Use an LLM to judge semantic equivalence
python llm_as_judge.py risultati.json

# Save detailed judgments
python llm_as_judge.py risultati.json -o giudizi_llm.json

LLM Judge features:

  • Semantic judgment of whether an answer matches the reference, beyond surface word overlap
  • Confidence score for each judgment
  • Written reasoning for each decision
  • Robust error handling
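The judge pattern boils down to: put the question, the reference answer and the chatbot answer into one prompt, ask for a structured verdict, and parse it defensively. The sketch below shows that pattern under stated assumptions: the prompt text, the JSON keys (equivalent, confidence, reasoning) and the Judgment type are hypothetical, and the LLM call is abstracted as any function from prompt string to reply string, so no real Gemini API is used here.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt; doubled braces are literal braces after str.format.
PROMPT = """You are grading a university chatbot.
Question: {question}
Reference answer: {reference}
Chatbot answer: {answer}
Reply with JSON only: {{"equivalent": true or false, "confidence": 0.0-1.0, "reasoning": "..."}}"""


@dataclass
class Judgment:
    equivalent: bool | None  # None when no valid verdict could be obtained
    confidence: float
    reasoning: str
    error: str | None = None


def judge(question: str, reference: str, answer: str,
          llm: Callable[[str], str]) -> Judgment:
    prompt = PROMPT.format(question=question, reference=reference, answer=answer)
    try:
        raw = llm(prompt)
        # Models often wrap JSON in extra text, so grab the outermost {...}.
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if match is None:
            raise ValueError("no JSON object in reply")
        data = json.loads(match.group(0))
        verdict = data["equivalent"]
        if not isinstance(verdict, bool):
            raise ValueError(f"non-boolean verdict: {verdict!r}")
        # Clamp confidence into [0, 1] in case the model goes out of range.
        conf = min(max(float(data.get("confidence", 0.0)), 0.0), 1.0)
        return Judgment(verdict, conf, str(data.get("reasoning", "")))
    except Exception as exc:  # API errors, bad JSON, missing keys...
        return Judgment(None, 0.0, "", error=f"{type(exc).__name__}: {exc}")
```

Returning a Judgment with an error field, instead of raising, lets a batch run over all the questions finish even when a few model calls fail.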

πŸ“ Project Structure

studentsbot/
β”œβ”€β”€ πŸ€– bot_review.py           # Main bot with interface
β”œβ”€β”€ πŸ“Š batch_query.py          # Batch query processing
β”œβ”€β”€ πŸ•·οΈ crawler.py              # Web crawler for data collection
β”œβ”€β”€ πŸ“‹ extract_queries.py      # Extract questions from Excel
β”œβ”€β”€ πŸ“Š rageval.py              # Complete evaluation (ROUGE, BLEU, etc)
β”œβ”€β”€ 🧠 llm_as_judge.py         # Semantic evaluation with LLM
β”œβ”€β”€ πŸ“ data/                  # Input and test data
β”‚   β”œβ”€β”€ πŸ“„ domande chatbot.xlsx  # Excel file with questions
β”‚   └── πŸ“ queries.txt          # Extracted questions (56 questions)
β”œβ”€β”€ πŸ“ output_crawler/        # Crawled documents (150+ files)
β”œβ”€β”€ πŸ—„οΈ index/                 # FAISS vectorstore (generated)
β”œβ”€β”€ 🐍 venv/                  # Python virtual environment
β”œβ”€β”€ βš™οΈ activate_studentsbot.sh # Automatic setup script
β”œβ”€β”€ πŸ“¦ requirements.txt       # Python dependencies
β”œβ”€β”€ πŸ“– README.md              # Documentation
β”œβ”€β”€ 🚫 .gitignore            # Files to ignore
β”œβ”€β”€ βš™οΈ .env.example          # Environment variables template
└── πŸ“„ LICENSE               # MIT License

🎯 Available Commands

Main Bot

| Command | Description |
|---|---|
| python bot_review.py --help | Show complete help |
| python bot_review.py --interactive | Direct chat (fast) |
| python bot_review.py --index_only | Indexing only |
| python bot_review.py | Guided configuration |

Batch Processing

| Format | Example |
|---|---|
| Excel | python batch_query.py data/domande.xlsx risultati.json |
| CSV | python batch_query.py data/domande.csv risultati.json |
| TXT | python batch_query.py data/queries.txt risultati.csv |

🔧 Configuration

.env File

GOOGLE_API_KEY=your_gemini_api_key_here
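At startup the bot needs GOOGLE_API_KEY from either the environment or this file. A project like this would typically rely on a library such as python-dotenv for that (an assumption, not confirmed here); the stdlib-only sketch below just shows the lookup order. read_env_file and get_api_key are hypothetical helpers, and the parser handles only simple KEY=VALUE lines with optional quotes and # comments.

```python
from __future__ import annotations

import os
from pathlib import Path


def read_env_file(path: str | Path = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    p = Path(path)
    if not p.exists():
        return values
    for line in p.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. GOOGLE_API_KEY="abc" -> abc.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str | Path = ".env") -> str:
    # A real environment variable takes precedence over the .env file.
    key = os.environ.get("GOOGLE_API_KEY") or read_env_file(path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env")
    return key
```

Failing early with a clear message is friendlier than letting the first Gemini request fail with an authentication error.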

Bot Parameters (bot_review.py)

MARKDOWN_DIR = "output_crawler"     # Documents directory
VECTORSTORE_PATH = "index"          # FAISS vectorstore path
MODEL_NAME_LLM = "gemini-2.5-pro"   # Main model
BATCH_SIZE = 100                    # Indexing batch size
BATCH_WAIT = 2                      # Pause between batches (seconds)
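BATCH_SIZE and BATCH_WAIT suggest that documents are indexed in chunks with a pause in between, presumably to stay within the embedding API's rate limits. The sketch below shows that loop generically: index_in_batches is a hypothetical helper, and add_batch is any function that indexes one batch. With LangChain, one plausible choice is a FAISS vector store's add_documents method, followed by save_local(VECTORSTORE_PATH) once all batches are in; that wiring is an assumption, not taken from bot_review.py.

```python
from __future__ import annotations

import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def index_in_batches(
    docs: Sequence[T],
    add_batch: Callable[[Sequence[T]], None],
    batch_size: int = 100,   # mirrors BATCH_SIZE
    batch_wait: float = 2,   # mirrors BATCH_WAIT, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send docs to add_batch in chunks of batch_size, pausing between chunks.

    Returns the number of batches sent.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = 0
    for start in range(0, len(docs), batch_size):
        if batches:
            sleep(batch_wait)  # pause only between batches, not before the first
        add_batch(docs[start:start + batch_size])
        batches += 1
    return batches
```

Injecting sleep as a parameter keeps the function testable without real delays; with 150+ crawled documents split into chunks and the defaults above, the pauses add only a few seconds per run.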

πŸ› Debug and Development

For code debugging:

  1. Verbose mode: pass --verbose (as in the batch_query.py example above) for detailed output
  2. Sample testing: create test files with only a few questions for quick debugging
  3. Logs: check the error messages printed in the terminal
  4. VS Code: configure the debugger as you prefer

📊 Sample Questions

  • "What are the active master's degree courses at Università Cattolica?"
  • "What exams are there in the first year of Data Analytics?"
  • "How can I enroll in an Erasmus program?"
  • "What are the career opportunities for Economics?"

πŸ—‚οΈ Available Data

  • 150+ documents crawled from the official university website
  • 56 test questions extracted from the Excel file
  • Master's degree courses from all campuses
  • Student services and procedures
  • International programs and internships

🤝 Contributing

  1. Fork the project
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

πŸ“ License

This project is released under the MIT License. See the LICENSE file for details.

🆘 Support

If you have problems or questions:

  1. Check the documentation
  2. Search the existing Issues
  3. Open a new issue if necessary

πŸ™ Acknowledgments


⭐ Star this project if it was useful to you!
