This is a simple chatbot app that answers questions about all things Georgia Tech, from housing and registration to meal plans. It indexes all .gatech.edu domains.
- Semantic search on sites by name and content
- Time-ranged search on pages by update date
- Question answering about the contents of a webpage
- Support for image/audio content coming soon!
This will be hosted at XXX when ready!
Install necessary packages:
pip install -r requirements.txt
conda install anaconda::ffmpeg  # for whisper, which needs the ffmpeg executable

Use the processor to process a folder of files and index them in Elasticsearch: python processor/processor.py. You only need to run this once each time the directory is updated.
Create a .env file in the app directory with your OpenAI API key:
OPENAI_API_KEY=sk-...
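The app presumably reads this key from the environment at startup. A minimal standard-library sketch of loading a .env file is shown below; the helper name `load_env` is an illustration only (the app may well use python-dotenv instead):

```python
import os

# Minimal sketch: load KEY=VALUE pairs from a .env file into the environment.
# The helper name is hypothetical; python-dotenv offers the same behavior.
def load_env(path=".env"):
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                # Don't clobber variables already set in the real environment
                os.environ.setdefault(key.strip(), value.strip())

if os.path.exists(".env"):
    load_env()  # call before creating the OpenAI client
```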
In the project root directory:
- Run Scraper: scrapy crawl crawler_spider
- Run Visualiser: python visualise/visualiser.py
- Download Files: python downloader/main.py
- Run Processor: python processor/processor.py
- Run Chainlit: cd app && chainlit run app.py -w
The -w flag runs the app in watch mode, so you can edit the app and see your changes without restarting it.
import os
import sys

# Add the project root to sys.path so CONFIG can be imported from any subdirectory
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
import CONFIG
- Crawler crawls the website and stores it as a NDJSON file.
- Visualiser creates a map of the crawled webpages.
- Proxy is a proxy finder.
- Downloader reads the NDJSON file and downloads the pages it lists.
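The crawler-to-downloader hand-off above can be sketched with a small NDJSON reader; note that any field names in the records are project-specific and not shown here:

```python
import json

# Sketch: iterate over records in an NDJSON crawl file, one JSON object per line
def read_ndjson(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

The downloader would loop over these records and fetch each URL in turn.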
- sqlite3
- networkx
- scrapy
- html2text
- bs4
- dash
- pandas
Ctrl-C
- (Press it only once for a clean shutdown, allowing the "resume" feature to work.)
- Fix proxy code.
- Remove headers and footers when converting to text.
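For the second item, one plausible approach (a sketch only, not the project's actual implementation) is to drop `<header>`, `<footer>`, and `<nav>` elements with bs4 before handing the HTML to html2text:

```python
from bs4 import BeautifulSoup

# Sketch: strip page chrome before text conversion.
# The tag list is an assumption; site-specific class selectors may be needed too.
def strip_chrome(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(["header", "footer", "nav"]):
        tag.decompose()  # delete the element and its contents
    return str(soup)
```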
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.10.0
docker run -d --name elasticsearch \
-p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
-e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:8.10.0
docker start elasticsearch
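Once the container is running, you can confirm the node is reachable. A standard-library sketch (no credentials needed, since xpack.security.enabled=false above):

```python
import json
import urllib.request
from urllib.error import URLError

# Sketch: ping the local Elasticsearch node started by the docker commands above
def es_is_up(url="http://localhost:9200"):
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            # The root endpoint returns cluster info including a "version" field
            return "version" in json.load(resp)
    except (URLError, OSError, ValueError):
        return False
```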