This is a simple chatbot app that answers questions about all things Georgia Tech, from housing and registration to meal plans. It indexes all .gatech.edu domains.
- Semantic search on sites by name and content
- Time-ranged search on pages by update date
- Question answering about the contents of a webpage
- Support for image/audio content coming soon!
This will be hosted at XXX when ready!
Install necessary packages:
pip install -r requirements.txt
conda install anaconda::ffmpeg  # for whisper, which needs the ffmpeg executable

Use the processor to process a folder of files and index them in Elasticsearch: python processor/processor.py. You only need to run this once each time the directory is updated.
Create a .env file in the app directory with your OpenAI API key:
OPENAI_API_KEY=sk-...
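The app presumably reads this key from the environment at startup. A minimal standard-library sketch of loading a .env file is shown below; the helper name `load_env` is an illustration only (the app may well use python-dotenv instead):

```python
import os

# Minimal sketch: load KEY=VALUE pairs from a .env file into the environment.
# The helper name is hypothetical; python-dotenv offers the same behavior.
def load_env(path=".env"):
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                # Don't clobber variables already set in the real environment
                os.environ.setdefault(key.strip(), value.strip())

if os.path.exists(".env"):
    load_env()  # call before creating the OpenAI client
```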
In the project root directory:
- Run Scraper: scrapy crawl crawler_spider
- Run Visualiser: python visualise/visualiser.py
- Download Files: python downloader/main.py
- Run Processor: python processor/processor.py
- Run Chainlit: cd app && chainlit run app.py -w
The -w flag runs the app in watch mode, so you can edit the app and see your changes without restarting it.
import os
import sys

# Add the project root to sys.path so CONFIG can be imported from any subdirectory
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
import CONFIG
- Crawler crawls the website and stores it as a NDJSON file.
- Visualiser creates a map of the crawled webpages.
- Proxy is a proxy finder.
- Downloader reads the NDJSON file and downloads the pages it lists.
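The crawler-to-downloader hand-off above can be sketched with a small NDJSON reader; note that any field names in the records are project-specific and not shown here:

```python
import json

# Sketch: iterate over records in an NDJSON crawl file, one JSON object per line
def read_ndjson(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

The downloader would loop over these records and fetch each URL in turn.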
- sqlite3
- networkx
- scrapy
- html2text
- bs4
- dash
- pandas
Ctrl-C
- (Press it only once for a clean shutdown, allowing the "resume" feature to work.)
- Fix proxy code.
- Remove headers and footers when converting to text.
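For the second item, one plausible approach (a sketch only, not the project's actual implementation) is to drop `<header>`, `<footer>`, and `<nav>` elements with bs4 before handing the HTML to html2text:

```python
from bs4 import BeautifulSoup

# Sketch: strip page chrome before text conversion.
# The tag list is an assumption; site-specific class selectors may be needed too.
def strip_chrome(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(["header", "footer", "nav"]):
        tag.decompose()  # delete the element and its contents
    return str(soup)
```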
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.10.0
docker run -d --name elasticsearch \
-p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
-e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:8.10.0
docker start elasticsearch
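Once the container is running, you can confirm the node is reachable. A standard-library sketch (no credentials needed, since xpack.security.enabled=false above):

```python
import json
import urllib.request
from urllib.error import URLError

# Sketch: ping the local Elasticsearch node started by the docker commands above
def es_is_up(url="http://localhost:9200"):
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            # The root endpoint returns cluster info including a "version" field
            return "version" in json.load(resp)
    except (URLError, OSError, ValueError):
        return False
```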