NUBot is an intelligent chatbot designed to assist students and visitors with queries related to Northeastern University, such as courses, faculty, co-op opportunities, and more. It utilizes a Retrieval-Augmented Generation (RAG) approach to provide instant, accurate responses.
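As a rough illustration of the RAG pattern NUBot follows (embed a corpus, retrieve the most relevant passage for a query, and let a local LLM answer from that context), here is a minimal sketch. It assumes FAISS, sentence-transformers, and the Mistral model served through Ollama, all of which appear later in this guide; the actual implementation lives in `src/backend`.

```python
# Minimal RAG sketch (illustrative only; see src/backend for the real implementation).
# Assumes: pip install sentence-transformers faiss-cpu ollama, and `ollama pull mistral`.
import faiss
import ollama
from sentence_transformers import SentenceTransformer

docs = [
    "Northeastern's co-op program places students in six-month work terms.",
    "CS 5200 covers database management systems.",
]

# Embed the corpus and build a FAISS inner-product index over the vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

# Retrieve the most relevant document for a query, then ask Mistral to answer from it.
query = "How does co-op work at Northeastern?"
_, ids = index.search(model.encode([query], normalize_embeddings=True), k=1)
context = docs[ids[0][0]]
reply = ollama.chat(model="mistral", messages=[
    {"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"},
])
print(reply["message"]["content"])
```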
Before setting up the project, install the necessary dependencies and software.
- Install the Python debugger extension in VS Code.
- Instant responses to academic-related queries.
- Scalable and efficient system for handling high query volumes.
- Continuous updates via cloud deployment.
- Download Ollama software.
- Install the software on your system.
- Open a terminal or command prompt.
- Run the following command:
ollama pull mistral
This will download the model locally.
- Open a terminal.
- Clone the Git repository:
git clone https://github.com/udishadc/NUBot
- Navigate to the cloned repository directory.
- If you encounter metadata-generation issues, visit rustup.rs, copy the suggested command, and run it in your terminal; Rust may be required to build sentence-transformers or Airflow.
- Run the following command to install all necessary dependencies:
pip install .
- Open a terminal.
- Run one of the following commands:
python -m src.backend.api
or
python3 -m src.backend.api
This starts the backend service using Flask.
- Open another terminal.
- Run the following command:
streamlit run src/frontend/app.py
- Open a browser and navigate to:
localhost:8501
This will launch the frontend interface.
- Enter a query and wait for the response.
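If you want to verify the backend without the Streamlit UI, you can call the Flask service directly. The endpoint path and payload below are assumptions for illustration; check `src/backend/api.py` for the actual route and port.

```python
# Hypothetical smoke test against the Flask backend (verify the real route in src/backend/api.py).
import requests

# Assumes Flask is listening on its default port 5000 and exposes a /query endpoint.
response = requests.post(
    "http://localhost:5000/query",
    json={"query": "What co-op opportunities are available for CS students?"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```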
- Airflow requires Docker and takes a long time to run.
- To install and run Airflow:
- Install Docker.
- Follow Airflow setup steps.
- In this project, Airflow only runs the web-scraping DAG; it does not create embeddings for the corpus.
Prefect is platform-independent (Windows, macOS, Linux) and lightweight.
- Open a terminal.
- Run the following command:
prefect server start
- Open a browser and navigate to:
localhost:4200
This opens the Prefect UI to monitor the workflows.
- Open another terminal.
- Execute the flow file:
python -m src.prefectWorkflows.scraper_flow
- This runs the scraping workflow, which can be monitored in Prefect UI.
If the run completes successfully, the flow run appears with a Completed status in the Prefect UI.

- Prefect flows run manually by default.
- After deployment, you can schedule the workflow using Prefect Cloud.
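As a sketch of what scheduling can look like, a Prefect 2.x flow can also be served on a cron schedule straight from Python. The flow body and names below are illustrative placeholders, not the project's actual scraper_flow:

```python
# Illustrative Prefect flow with a cron schedule (placeholder names, not the real scraper_flow).
from prefect import flow, task


@task
def scrape() -> list[str]:
    # Placeholder for the actual scraping logic.
    return ["page-1", "page-2"]


@flow(name="scraper-flow")
def scraper_flow() -> None:
    pages = scrape()
    print(f"Scraped {len(pages)} pages")


if __name__ == "__main__":
    # Starts a long-lived process that triggers the flow every day at 02:00.
    scraper_flow.serve(name="nightly-scrape", cron="0 2 * * *")
```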
- Open a terminal.
- Run the following command to test the scraper workflow:
python -m unittest tests.test_scraper_flow
- Run the following command to test the preprocessing of data:
python -m unittest tests.test_preprocess_data
- unittest: A built-in Python testing framework used for writing and running test cases.
- mock: Part of Python's unittest.mock module, used to simulate dependencies and isolate test cases.
- MagicMock: A powerful feature of mock that allows the simulation of objects, methods, and their return values during testing.
These tests help ensure the correctness of data preprocessing and workflow execution in the pipeline.
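For reference, a test in this style usually replaces the expensive dependency with a MagicMock and asserts on the interaction. The snippet below is a self-contained illustration, not the project's actual test code:

```python
# Illustrative unittest + MagicMock example (not the actual tests in tests/).
import unittest
from unittest.mock import MagicMock


def run_scraper(fetcher) -> int:
    """Toy stand-in for a scraping workflow: fetch pages and return the count."""
    pages = fetcher.fetch_all()
    return len(pages)


class TestScraperFlow(unittest.TestCase):
    def test_run_scraper_counts_pages(self):
        # MagicMock simulates the fetcher, so no real HTTP requests are made.
        fetcher = MagicMock()
        fetcher.fetch_all.return_value = ["page-1", "page-2"]

        self.assertEqual(run_scraper(fetcher), 2)
        fetcher.fetch_all.assert_called_once()


if __name__ == "__main__":
    unittest.main()
```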
- Install and open Docker.
- Run the following command to build the project:
docker compose -f 'docker-compose-airflow.yaml' build
- Initialize Airflow:
docker compose -f 'docker-compose-airflow.yaml' up airflow-init
- Start Airflow:
docker compose -f 'docker-compose-airflow.yaml' up
- Open a browser and navigate to localhost:8080.
- Locate the DAG "web_scraping", run it, and wait until the status shows Success (dark green color).
To stop Airflow, open a new terminal and run:
docker compose -f 'docker-compose-airflow.yaml' down
- To start in detached mode:
docker compose -f 'docker-compose-airflow.yaml' up -d
- Or run in the foreground:
docker compose -f 'docker-compose-airflow.yaml' up
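For orientation, the "web_scraping" DAG referenced above is a standard Airflow 2.x TaskFlow DAG. The sketch below shows the general shape with placeholder tasks; the real DAG lives in the project's dags/ folder and its tasks may differ.

```python
# Sketch of a TaskFlow-style DAG named "web_scraping" (placeholder tasks; see the project's dags/ folder).
from datetime import datetime

from airflow.decorators import dag, task


@dag(dag_id="web_scraping", start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
def web_scraping():
    @task
    def scrape_pages() -> list[str]:
        # Fetch raw HTML from the university pages (placeholder).
        return ["<html>...</html>"]

    @task
    def save_corpus(pages: list[str]) -> None:
        # Persist scraped pages for downstream preprocessing (placeholder).
        print(f"Saved {len(pages)} pages")

    save_corpus(scrape_pages())


web_scraping()
```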
To install Prefect with all dependencies, run:
pip install -U prefect
Start the Prefect UI server on port 4200 by running:
prefect server start
Once started, access the UI at: http://localhost:4200
Run the DAG script in src/prefectWorkflows using one of the following commands:
python scraper_flow.py
# OR
python -m src.prefectWorkflows.scraper_flow
After execution, refresh the Prefect UI at http://localhost:4200 to see the running DAG.
To deploy and run workflows anywhere, first log in to Prefect Cloud:
prefect cloud login
Then register the flows and deploy them accordingly.
By following these steps, you can efficiently run, test, and manage your Prefect workflows both locally and in the cloud.
NuBot utilizes the Python logging module for tracking events and debugging information across different services. Additionally, we have implemented custom exception handling to manage errors efficiently and provide meaningful error messages.
We have integrated logging into all major services, ensuring that all critical operations are recorded. The following services include dedicated logging mechanisms:
- Frontend
- Backend
- Data Processing
- Other Processing Services
All logs generated by NuBot can be found at:
NuBot/logs
These logs help in debugging issues, monitoring application performance, and ensuring system reliability.
We use the logging module to capture important information, including:
- INFO: General application flow (e.g., API calls, processing steps)
- WARNING: Potential issues that may require attention
- ERROR: Errors that prevent certain operations from completing
- DEBUG: Detailed information useful for debugging purposes
Each service contains a dedicated logging file where all events are recorded systematically.
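A per-service logger set up this way looks roughly like the following; the file name and format string are illustrative, and the exact configuration may differ between services:

```python
# Sketch of a per-service logger writing under NuBot/logs (file name and format are illustrative).
import logging
import os

LOG_DIR = "logs"
os.makedirs(LOG_DIR, exist_ok=True)

logging.basicConfig(
    filename=os.path.join(LOG_DIR, "backend.log"),  # one log file per service
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)

logger = logging.getLogger(__name__)
logger.info("Backend service started")     # INFO: general application flow
logger.warning("Cache miss rate is high")  # WARNING: potential issue
```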
NuBot has a structured exception handling framework to ensure robust error management. Each service includes an exception handling module that captures and processes errors efficiently.
We have implemented a CustomException class to manage unexpected errors across different components. This ensures that:
- Errors are logged appropriately
- Useful error messages are provided
- The system remains stable and operational
If an exception occurs, it is logged along with relevant details:
import logging
import sys

try:
    ...  # some code execution
except Exception as e:
    logging.error(f"An error occurred: {str(e)}")
    raise CustomException(e, sys)

This structured approach ensures better traceability of issues and easier debugging.
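The CustomException(e, sys) call above follows a common pattern: passing the sys module lets the exception pull the active traceback and embed the file name and line number in its message. A sketch of such a class, assuming this pattern rather than quoting the project's actual implementation:

```python
# Sketch of a CustomException in this style (the project's actual class may differ).
import sys


class CustomException(Exception):
    def __init__(self, error: Exception, error_detail):
        # error_detail is the sys module; exc_info() returns the traceback of the
        # exception currently being handled, so raise this inside an except block.
        _, _, tb = error_detail.exc_info()
        file_name = tb.tb_frame.f_code.co_filename
        super().__init__(f"Error in {file_name}, line {tb.tb_lineno}: {error}")
```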
- Logging is implemented across all major services (Frontend, Backend, Data Processing, etc.).
- Logs can be accessed at NuBot/logs.
- Custom exception handling ensures robust error management.
- All critical events and errors are recorded systematically.
Before setting up DVC with Google Cloud Storage (GCS), ensure you have the following:
- Google Cloud SDK installed and authenticated.
- A Google Cloud Storage (GCS) bucket created.
- DVC installed (pip install dvc[gdrive] for Google Drive or pip install dvc[gs] for Google Cloud Storage).
- Service account key (if using authentication via service account).
# Authenticate with Google Cloud
gcloud auth application-default login
- Generate a service account key from the Google Cloud Console:
- Go to IAM & Admin > Service Accounts.
- Select or create a service account with Storage Admin permissions.
- Generate a JSON key and download it.
- Set the environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-key.json"
Run gcloud auth application-default login unless you use a service account or another way to authenticate.
To use custom authentication or further configure your DVC remote, set any supported config parameter with:
dvc remote modify --local myremote credentialpath 'path/to/project-XXX.json'
If you haven't initialized DVC in your project, do so with:
dvc init
- Run the following command to set up a GCS bucket as a DVC remote:
dvc remote add -d gcsremote gs://your-gcs-bucket-name
- Verify the configuration by checking .dvc/config (keep your bucket name as modelembeddings):
[core]
remote = gcsremote
[remote "gcsremote"]
url = gs://<your-gcs-bucket-name>
Once the remote storage is set up, you can push your dataset or model files to GCS using DVC:
dvc add faiss_index/
git add faiss_index.dvc .gitignore
git commit -m "Track data with DVC"
dvc push
To retrieve the dataset from the GCS remote, use:
dvc pull
To monitor experiments and model versions, run MLflow UI in a separate terminal:
mlflow ui
- Ensure you have Storage Admin access to the GCS bucket.
- If authentication fails, try re-authenticating using:
gcloud auth application-default login
- If using a service account, verify the JSON key path is correct:
echo $GOOGLE_APPLICATION_CREDENTIALS
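If you want runs to show up in the MLflow UI mentioned above, a minimal tracking call looks like this; the experiment, parameter, and metric names are illustrative:

```python
# Minimal MLflow tracking sketch (experiment, parameter, and metric names are illustrative).
import mlflow

mlflow.set_experiment("nubot-rag")
with mlflow.start_run():
    mlflow.log_param("embedding_model", "all-MiniLM-L6-v2")
    mlflow.log_metric("retrieval_hit_rate", 0.87)
```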
You're all set! Your DVC is now connected to Google Cloud Storage for versioning your datasets and models.
By leveraging logging and structured exception handling, NuBot maintains stability, traceability, and better error resolution.