An intelligent chatbot that answers questions about the courses, exams, services and procedures of Università Cattolica, built on Retrieval-Augmented Generation (RAG).
This work was developed by Eleonora Farolfi as part of her thesis, under the supervision of Prof. Andrea Belli at Università Cattolica del Sacro Cuore.
- Interactive chat for natural language questions
- Batch processing of queries from Excel/CSV/TXT files
- Automatic indexing of crawled documents
- Multi-language support (Italian/English)
- Integrated debug modes for development
- LangChain - Framework for AI applications
- Google Gemini - Language model (LLM)
- FAISS - Vector store for semantic search
- Pandas - Excel file processing
- BeautifulSoup - Web crawling
- Python 3.8+
- Google Gemini API Key
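As a rough illustration of the retrieval step in RAG, the sketch below ranks a few in-memory documents against a question and builds a prompt from the best match. It uses bag-of-words cosine similarity from the standard library as a stand-in, not the FAISS and Gemini pipeline this project relies on. The sample documents and the names `tokenize`, `cosine` and `retrieve` are hypothetical and exist only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=2):
    """Return up to k documents that share words with the question, best first."""
    q_vec = Counter(tokenize(question))
    scored = [(cosine(q_vec, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


# Hypothetical documents, standing in for the crawled pages.
DOCS = [
    "Erasmus enrollment requires an online application before the deadline.",
    "The Data Analytics first year includes statistics and programming exams.",
    "The library is open from Monday to Saturday.",
]

context = retrieve("How can I enroll in an Erasmus program?", DOCS, k=1)
# The retrieved text is placed in the prompt that would be sent to the LLM.
prompt = "Answer using this context:\n" + "\n".join(context)
```

In a real deployment the word counts would be replaced by embedding vectors and the linear scan by a vector index, but the retrieve-then-prompt flow stays the same.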
# Clone the repository
git clone https://github.com/nluninja/studentsbot.git
cd studentsbot
# Automatic setup
source activate_studentsbot.sh
# Configure API key (edit .env)
GOOGLE_API_KEY=your_api_key_here
# Start the bot
studentsbot
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your GOOGLE_API_KEY
# Initial configuration
python bot_review.py --index_only # Create vectorstore
python bot_review.py --interactive # Start chat
# Guided configuration
python bot_review.py # Default mode
# Full help
python bot_review.py --help
# From Excel file (column A)
python batch_query.py "data/domande chatbot.xlsx" risultati.json --verbose
# From text file
python batch_query.py data/queries.txt risultati.csv
# Extract questions from Excel
python extract_queries.py # creates data/queries.txt
# Data collection from website
python crawler.py
The project includes several tools to evaluate the quality of chatbot responses:
# Complete evaluation with multiple metrics
python rageval.py risultati.json
# Save detailed results
python rageval.py risultati.json valutazione_dettagliata.json
Calculated metrics:
- Text similarity (difflib SequenceMatcher)
- ROUGE-1, ROUGE-2, ROUGE-L (n-gram overlap and LCS)
- BLEU score (precision with brevity penalty)
- Keyword overlap (precision, recall, F1)
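Two of the metrics listed above can be sketched with the standard library alone. The example assumes whitespace tokenization and clipped unigram counts, which also makes the recall value a ROUGE-1 recall. `rageval.py` may tokenize or normalize differently, and the function names here are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def text_similarity(reference, candidate):
    """Character-level similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, reference, candidate).ratio()


def overlap_prf(reference_tokens, candidate_tokens):
    """Precision, recall and F1 of the clipped token overlap."""
    ref, cand = Counter(reference_tokens), Counter(candidate_tokens)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref & cand).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "enrollment opens in september".split()
candidate = "enrollment opens in october every year".split()

# Three shared tokens: precision 3/6, recall 3/4.
p, r, f1 = overlap_prf(reference, candidate)
similarity = text_similarity(" ".join(reference), " ".join(candidate))
```

Precision penalizes answers padded with extra words, while recall penalizes answers that omit words from the reference, so reporting both alongside F1 gives a more complete picture than a single score.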
# Use LLM model to judge semantic equivalence
python llm_as_judge.py risultati.json
# Save detailed judgments
python llm_as_judge.py risultati.json -o giudizi_llm.json
LLM Judge features:
- Intelligent semantic evaluation
- Confidence score for each judgment
- Detailed reasoning for decisions
- Robust error handling
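The confidence score, reasoning and error handling listed above could be combined roughly as follows. This is a sketch under stated assumptions: it supposes the judge model is asked to reply with a JSON object containing `equivalent`, `confidence` and `reasoning` keys, which is a hypothetical format and not necessarily what `llm_as_judge.py` uses.

```python
import json


def parse_judgment(raw):
    """Parse a judge reply into a dict, with a safe default on malformed input."""
    fallback = {"equivalent": False, "confidence": 0.0, "reasoning": "unparseable reply"}
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return fallback
    # Reject replies that are not an object with a boolean verdict.
    if not isinstance(data, dict) or not isinstance(data.get("equivalent"), bool):
        return fallback
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    # Clamp the confidence to the [0, 1] range.
    confidence = min(1.0, max(0.0, confidence))
    return {
        "equivalent": data["equivalent"],
        "confidence": confidence,
        "reasoning": str(data.get("reasoning", "")),
    }


good = parse_judgment('{"equivalent": true, "confidence": 0.9, "reasoning": "Same deadline."}')
bad = parse_judgment("not json at all")
```

Returning a default judgment instead of raising keeps a long evaluation run going when a single reply is malformed; the affected items can then be found by their `reasoning` value.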
studentsbot/
├── bot_review.py               # Main bot with interface
├── batch_query.py              # Batch query processing
├── crawler.py                  # Web crawler for data collection
├── extract_queries.py          # Extract questions from Excel
├── rageval.py                  # Complete evaluation (ROUGE, BLEU, etc.)
├── llm_as_judge.py             # Semantic evaluation with LLM
├── data/                       # Input and test data
│   ├── domande chatbot.xlsx    # Excel file with questions
│   └── queries.txt             # Extracted questions (56 questions)
├── output_crawler/             # Crawled documents (150+ files)
├── index/                      # FAISS vectorstore (generated)
├── venv/                       # Python virtual environment
├── activate_studentsbot.sh     # Automatic setup script
├── requirements.txt            # Python dependencies
├── README.md                   # Documentation
├── .gitignore                  # Files to ignore
├── .env.example                # Environment variables template
└── LICENSE                     # MIT License
| Command | Description |
|---|---|
| python bot_review.py --help | Show complete help |
| python bot_review.py --interactive | Direct chat (fast) |
| python bot_review.py --index_only | Indexing only |
| python bot_review.py | Guided configuration |
| Format | Example |
|---|---|
| Excel | python batch_query.py data/domande.xlsx risultati.json |
| CSV | python batch_query.py data/domande.csv risultati.json |
| TXT | python batch_query.py data/queries.txt risultati.csv |
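For the TXT and CSV inputs in the table above, reading the questions might look like the sketch below. It assumes one question per line in TXT files and questions in the first column of CSV files. `load_queries` is a hypothetical helper rather than a function from `batch_query.py`, and Excel input is left out because it needs a third-party reader such as Pandas.

```python
import csv
import tempfile
from pathlib import Path


def load_queries(path):
    """Read one question per line (TXT) or from the first column (CSV)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        lines = path.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            rows = csv.reader(handle)
            return [row[0].strip() for row in rows if row and row[0].strip()]
    raise ValueError(f"Unsupported query file type: {suffix}")


# Demonstration with a temporary file, so the example has no external inputs.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "queries.txt"
    sample.write_text("What exams are there?\n\nHow can I enroll?\n", encoding="utf-8")
    queries = load_queries(sample)
```

Blank lines and empty cells are skipped so that stray whitespace in the input file does not produce empty queries.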
GOOGLE_API_KEY=your_gemini_api_key_here
MARKDOWN_DIR = "output_crawler" # Documents directory
VECTORSTORE_PATH = "index" # FAISS vectorstore path
MODEL_NAME_LLM = "gemini-2.5-pro" # Main model
BATCH_SIZE = 100 # Indexing batch size
BATCH_WAIT = 2 # Pause between batches (seconds)
For code debugging:
- Verbose mode: Use --verbose in commands for detailed output
- Sample testing: Create test files with a few questions for quick debugging
- Logs: Check error messages in terminal
- VSCode: Configure your debug environment as preferred
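The `BATCH_SIZE` and `BATCH_WAIT` settings in the configuration suggest that documents are indexed in groups with a pause between them, presumably to stay within API rate limits (that motive is an assumption). A minimal sketch of such a loop follows. `batched`, `index_in_batches` and the `add_batch` callback are hypothetical names, and the callback stands in for whatever call adds documents to the vector store.

```python
import time

BATCH_SIZE = 100  # Indexing batch size (value from the configuration)
BATCH_WAIT = 2    # Pause between batches, in seconds


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_in_batches(chunks, add_batch, size=BATCH_SIZE, wait=BATCH_WAIT, sleep=time.sleep):
    """Send chunks to `add_batch` in groups, pausing between groups."""
    batches = list(batched(chunks, size))
    for number, batch in enumerate(batches, start=1):
        add_batch(batch)
        if number < len(batches):  # no pause after the last batch
            sleep(wait)
    return len(batches)


# Hypothetical usage: record batch sizes instead of calling a real vector store,
# and replace the sleep with a no-op so the example runs instantly.
seen = []
count = index_in_batches(list(range(250)), lambda b: seen.append(len(b)), sleep=lambda s: None)
```

Passing `sleep` as a parameter is a design choice for this sketch: it lets a quick debug run skip the waiting without changing the configuration values.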
- "What are the active master's degree courses at Università Cattolica?"
- "What exams are there in the first year of Data Analytics?"
- "How can I enroll in an Erasmus program?"
- "What are the career opportunities for Economics?"
- 150+ documents crawled from official website
- 56 test questions extracted from Excel
- Master's degree courses from all campuses
- Student services and procedures
- International programs and internships
- Fork the project
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is released under the MIT License. See the LICENSE file for details.
If you have problems or questions:
- Check the documentation
- Search in Issues
- Open a new issue if necessary
- LangChain for the AI framework
- Google for the Gemini API
- Università Cattolica for the data
⭐ Star this project if it was useful to you!