Pulpnet_240221

IITK InfoBot

Pages Scraped:

The data was gathered from the following sources:

Institute Counselling Service (ICS)

Vox Populi – IITK 101 Series

IIT Kanpur Official Academic Page

Some additional sources were scraped during testing but left out of the final dataset because they lacked relevant or extractable information.

Methods employed:

  • Web Scraping: requests, BeautifulSoup
  • Text Embedding: sentence-transformers (all-MiniLM-L6-v2)
  • Similarity Matching: Cosine similarity with top-k filtering
  • Question Answering:
    • deepset/roberta-base-squad2 (extractive answers)
    • google/flan-t5-base (answer elaboration)
  • Frontend Interface: Streamlit
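
The similarity-matching step listed above can be illustrated with a minimal sketch. It is not the code from app.py: the function names, the `k` and `min_score` parameters, and the toy 3-dimensional vectors are assumptions made for illustration. In the actual chatbot the vectors would presumably come from the all-MiniLM-L6-v2 model rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, chunks, k=3, min_score=0.0):
    """Return up to k (score, text) pairs, best match first.

    Hypothetical helper: scores every chunk against the query,
    drops chunks below min_score, and keeps the k highest scores.
    """
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for vec, text in zip(chunk_vecs, chunks)
    ]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data standing in for real sentence embeddings (assumption).
chunks = [
    "Counselling service hours",
    "Course registration rules",
    "Hostel mess menu",
]
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.8, 0.6, 0.0]

results = top_k_chunks(query_vec, chunk_vecs, chunks, k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

The chunks selected this way would then serve as context for the question-answering models. Raising `min_score` is one possible way to avoid answering from weakly related text.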

Instructions on how to run the chatbot:

  1. Clone this repository
    git clone https://github.com/pripky/Pulpnet_240221.git
    cd Pulpnet_240221
  2. Create a virtual environment (recommended)
    python -m venv venv
    venv\Scripts\activate  # On Windows
    source venv/bin/activate  # On Mac/Linux
  3. Install the required packages
    pip install -r requirements.txt
  4. Run the app with Streamlit
    streamlit run app.py

If you're running the chatbot for the first time, some models will be downloaded automatically. Depending on your internet speed, this may take a few minutes.
