StudentsBot 🎓

An intelligent chatbot that answers questions about courses, exams, services, and procedures at Università Cattolica, using Retrieval-Augmented Generation (RAG).

🎓 Academic Context

This project was developed by Eleonora Farolfi as part of a thesis project, under the supervision of Prof. Andrea Belli at Università Cattolica del Sacro Cuore.

🚀 Features

Interactive chat for natural language questions
Batch processing of queries from Excel/CSV/TXT files
Automatic indexing of crawled documents
Multi-language support (Italian/English)
Integrated debug modes for development

🛠️ Technologies

LangChain - Framework for AI applications
Google Gemini - Language model (LLM)
FAISS - Vector store for semantic search
Pandas - Excel file processing
BeautifulSoup - HTML parsing for the web crawler
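
To illustrate what "semantic search" over a vector store means, here is a minimal sketch in plain Python. It is not FAISS's API and not the project's code: brute-force cosine similarity stands in for the index, and the file names and three-dimensional vectors are made-up toy values (real embeddings have hundreds of dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings, for illustration only.
docs = {
    "erasmus.md": [0.9, 0.1, 0.0],
    "exams.md": [0.1, 0.9, 0.1],
    "careers.md": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], docs, k=2))
```

In a RAG pipeline, the retrieved documents are then passed to the language model as context for the answer.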

📦 Installation

Prerequisites

Python 3.8+
Google Gemini API Key

Quick Setup

# Clone the repository
git clone https://github.com/nluninja/studentsbot.git
cd studentsbot

# Automatic setup
source activate_studentsbot.sh

# Configure API key: add this line to .env
GOOGLE_API_KEY=your_api_key_here

# Start the bot
studentsbot

Manual Setup

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your GOOGLE_API_KEY

🚀 Usage

Interactive Mode

# Initial configuration
python bot_review.py --index_only       # Create vectorstore
python bot_review.py --interactive      # Start chat

# Guided configuration
python bot_review.py                    # Default mode

# Full help
python bot_review.py --help

Batch Processing

# From Excel file (column A)
python batch_query.py "data/domande chatbot.xlsx" risultati.json --verbose

# From text file
python batch_query.py data/queries.txt risultati.csv

# Extract questions from Excel
python extract_queries.py  # creates data/queries.txt
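
How batch_query.py reads its input is not shown here. As a rough sketch under that caveat, loading one question per line from a TXT file, or from the first column of a CSV file, could look like the following. The function name `load_queries` is hypothetical; Excel input would need an extra library such as pandas and is left out.

```python
import csv
from pathlib import Path


def load_queries(path):
    """Read questions from a .txt file (one per line) or a .csv file (first column)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        lines = path.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            return [row[0].strip() for row in csv.reader(handle)
                    if row and row[0].strip()]
    raise ValueError(f"Unsupported file type: {suffix}")
```

Blank lines are skipped, so a trailing newline in the input does not produce an empty query.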

Web Crawling

# Data collection from website
python crawler.py
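
The internals of crawler.py are not documented here. To give an idea of the core step of a crawler, the sketch below collects same-site links from a page. It deliberately uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra dependencies; the class and function names, and the example URL, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def same_site_links(html, base_url):
    """Return unique links that stay on the same host as base_url, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    unique = []
    for link in collector.links:
        if urlparse(link).netloc == host and link not in unique:
            unique.append(link)
    return unique
```

A crawler would call this on each downloaded page and add the returned links to its queue of pages to visit.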

📊 Response Evaluation

The project includes several tools to evaluate the quality of chatbot responses:

1. Automatic Evaluation (rageval.py)

# Complete evaluation with multiple metrics
python rageval.py risultati.json

# Save detailed results
python rageval.py risultati.json valutazione_dettagliata.json

Calculated metrics:

Text similarity (difflib SequenceMatcher)
ROUGE-1, ROUGE-2, ROUGE-L (n-gram overlap and LCS)
BLEU score (precision with brevity penalty)
Keyword overlap (precision, recall, F1)
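
As a rough illustration of how some of these metrics are computed, the sketch below implements text similarity with `difflib.SequenceMatcher`, ROUGE-N, and ROUGE-L. It assumes simple lowercase whitespace tokenization and is not taken from rageval.py, whose exact tokenization and scoring may differ.

```python
from collections import Counter
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def _prf(overlap, ref_total, cand_total):
    """Precision, recall and F1 from an overlap count."""
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def _ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """ROUGE-N: clipped n-gram overlap between reference and candidate."""
    ref = _ngrams(reference.lower().split(), n)
    cand = _ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())
    return _prf(overlap, sum(ref.values()), sum(cand.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L: scores based on the longest common subsequence of tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return _prf(_lcs_length(ref, cand), len(ref), len(cand))
```

For example, comparing "the exam is in june" with "the exam is in july" shares four of five unigrams, so ROUGE-1 precision, recall, and F1 all come out at 0.8.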

2. LLM-as-Judge Evaluation (llm_as_judge.py)

# Use an LLM to judge semantic equivalence
python llm_as_judge.py risultati.json

# Save detailed judgments
python llm_as_judge.py risultati.json -o giudizi_llm.json

LLM Judge features:

Intelligent semantic evaluation
Confidence score for each judgment
Detailed reasoning for decisions
Robust error handling
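
The sketch below shows one way a judge's reply could be turned into a verdict with a confidence score and reasoning, while tolerating malformed output. The JSON schema (`equivalent`, `confidence`, `reasoning`) and the function name are assumptions made for illustration; the actual format used by llm_as_judge.py is not documented here.

```python
import json
import re


def parse_judgment(raw_reply):
    """Extract a verdict from an LLM reply, falling back safely on bad output."""
    fallback = {"equivalent": None, "confidence": 0.0,
                "reasoning": "unparseable reply"}
    # Models often wrap JSON in extra text, so grab the outermost {...} span.
    match = re.search(r"\{.*\}", raw_reply, flags=re.DOTALL)
    if not match:
        return fallback
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    # Clamp to [0, 1] in case the model returns an out-of-range value.
    confidence = min(1.0, max(0.0, confidence))
    verdict = data.get("equivalent")
    return {
        "equivalent": verdict if isinstance(verdict, bool) else None,
        "confidence": confidence,
        "reasoning": str(data.get("reasoning", "")),
    }
```

Returning a fallback record instead of raising keeps a long batch evaluation running when a single reply is malformed.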

📁 Project Structure

studentsbot/
├── 🤖 bot_review.py           # Main bot with interface
├── 📊 batch_query.py          # Batch query processing
├── 🕷️ crawler.py              # Web crawler for data collection
├── 📋 extract_queries.py      # Extract questions from Excel
├── 📊 rageval.py              # Complete evaluation (ROUGE, BLEU, etc)
├── 🧠 llm_as_judge.py         # Semantic evaluation with LLM
├── 📁 data/                  # Input and test data
│   ├── 📄 domande chatbot.xlsx  # Excel file with questions
│   └── 📝 queries.txt          # Extracted questions (56 questions)
├── 📁 output_crawler/        # Crawled documents (150+ files)
├── 🗄️ index/                 # FAISS vectorstore (generated)
├── 🐍 venv/                  # Python virtual environment
├── ⚙️ activate_studentsbot.sh # Automatic setup script
├── 📦 requirements.txt       # Python dependencies
├── 📖 README.md              # Documentation
├── 🚫 .gitignore            # Files to ignore
├── ⚙️ .env.example          # Environment variables template
└── 📄 LICENSE               # MIT License

🎯 Available Commands

Main Bot

| Command | Description |
|---|---|
| `python bot_review.py --help` | Show complete help |
| `python bot_review.py --interactive` | Direct chat (fast) |
| `python bot_review.py --index_only` | Indexing only |
| `python bot_review.py` | Guided configuration |

Batch Processing

| Format | Example |
|---|---|
| Excel | `python batch_query.py data/domande.xlsx risultati.json` |
| CSV | `python batch_query.py data/domande.csv risultati.json` |
| TXT | `python batch_query.py data/queries.txt risultati.csv` |

🔧 Configuration

.env File

GOOGLE_API_KEY=your_gemini_api_key_here
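
As a minimal sketch of how a script could pick up this key, the code below reads `KEY=value` lines from `.env` using only the standard library. Projects commonly use a package such as python-dotenv for this; whether bot_review.py does so is not stated here, and the helper names are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(path).get("GOOGLE_API_KEY")
    if not key or key == "your_gemini_api_key_here":
        raise RuntimeError("GOOGLE_API_KEY is not set; edit your .env file")
    return key
```

Failing early with a clear message when the placeholder value is still in place avoids a less obvious authentication error later.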

Bot Parameters (bot_review.py)

MARKDOWN_DIR = "output_crawler"     # Documents directory
VECTORSTORE_PATH = "index"          # FAISS vectorstore path
MODEL_NAME_LLM = "gemini-2.5-pro"   # Main model
BATCH_SIZE = 100                    # Indexing batch size
BATCH_WAIT = 2                      # Pause between batches (seconds)
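
To show how `BATCH_SIZE` and `BATCH_WAIT` might interact during indexing, here is a hedged sketch of a batching loop. The pause is presumably there to stay within API rate limits, but that is an assumption, and `index_in_batches` and `add_batch` are hypothetical names rather than functions from bot_review.py.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def index_in_batches(chunks, add_batch, batch_size=100, batch_wait=2):
    """Send chunks to add_batch in groups, pausing between groups.

    add_batch is any callable that embeds and stores one batch.
    Returns the number of chunks processed.
    """
    total = 0
    for number, batch in enumerate(batched(chunks, batch_size)):
        if number > 0:
            time.sleep(batch_wait)  # pause before every batch except the first
        add_batch(batch)
        total += len(batch)
    return total
```

With 250 chunks and the defaults above, this makes three calls (100, 100, and 50 chunks) with two pauses of 2 seconds between them.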

🐛 Debug and Development

For code debugging:

Verbose mode: Add --verbose to a command for detailed output
Sample testing: Create test files with a few questions for quick debugging
Logs: Check the error messages printed in the terminal
VSCode: Configure your debug environment as preferred

📊 Sample Questions

"What are the active master's degree courses at Università Cattolica?"
"What exams are there in the first year of Data Analytics?"
"How can I enroll in an Erasmus program?"
"What are the career opportunities for Economics?"

🗂️ Available Data

150+ documents crawled from the official website
56 test questions extracted from Excel
Master's degree courses from all campuses
Student services and procedures
International programs and internships

🤝 Contributing

Fork the project
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📝 License

This project is released under the MIT License. See the LICENSE file for details.

🆘 Support

If you have problems or questions:

Check the documentation
Search the existing Issues
Open a new issue if necessary

🙏 Acknowledgments

LangChain for the AI framework
Google for the Gemini API
Università Cattolica for the data

⭐ Star this project if it was useful to you!
