The data was gathered from the following sources:
- Games and Sports Council
- MnC Council
- Science and Technology Council
- Academics
- Academics and Career Council
- Campus Life
- Students' Gymkhana
- Course Overview
Sources scraped during testing but excluded from the final dataset because they lacked relevant or extractable information:
- VOX IITK
- ICS Homepage
- CSE Department
- EE Faculty
- Academics at IIT Kanpur
- DOAA Homepage
- ICS FAQ
- IITK Main Website
- SPO – Companies FAQ
- SPO – Students FAQ
- PG Manual PDF
- UG Manual PDF
Tools and models used:
- Web Scraping: requests, BeautifulSoup
- Text Embedding: sentence-transformers (all-MiniLM-L6-v2)
- Similarity Matching: Cosine similarity with top-k filtering
- Question Answering: deepset/roberta-base-squad2 (extractive answers), google/flan-t5-base (answer elaboration)
- Frontend Interface: Streamlit
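As a rough illustration of the similarity-matching step, here is a minimal sketch of cosine similarity with top-k filtering. It is not the repository's actual code: the function names `cosine_similarity` and `top_k_matches`, the `min_score` cutoff, and the toy 3-dimensional vectors are all assumptions made for the example. In the real pipeline the vectors would be sentence embeddings, and the selected chunks would be passed on to the question-answering model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k_matches(query_vec, chunk_vecs, k=3, min_score=0.0):
    """Return up to k (index, score) pairs, best first, with score >= min_score."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    kept = [pair for pair in scored if pair[1] >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Toy 3-dimensional vectors standing in for real sentence embeddings (hypothetical data).
chunks = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

matches = top_k_matches(query, chunks, k=2, min_score=0.5)
for index, score in matches:
    print(index, round(score, 3))
```

Filtering by a minimum score before taking the top k keeps weakly related chunks out of the context even when fewer than k good matches exist.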
To run the chatbot locally:
- Clone this repository
    git clone https://github.com/pripky/Pulpnet_240221.git
    cd Pulpnet_240221
- Create a virtual environment (recommended)
    python -m venv venv
    venv\Scripts\activate      # On Windows
    source venv/bin/activate   # On Mac/Linux
- Install the required packages
    pip install -r requirements.txt
- Run with Streamlit
    streamlit run app.py
If you're running the chatbot for the first time, some models will be downloaded automatically. Depending on your internet speed, this may take a few minutes.