pierrepo/biopyassistant

AI-powered conversational agent designed to help biology students learn the Python programming language.

Made with Python. BSD 3-Clause License.

Introduction

This conversational agent (chatbot) is designed to help biology students learn the Python programming language. It is based on OpenAI models and answers questions related to Python programming.

The chatbot uses the Retrieval-Augmented Generation (RAG) methodology to build its responses from a Python course (Markdown files available in the bioinfo-prog/cours-python repository).
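To make the RAG idea concrete, here is a toy sketch of the two steps involved: retrieve the most relevant course chunks, then build a prompt that contains them. It is not the project's implementation. The function names `retrieve` and `build_prompt` are hypothetical, and the word-overlap score is a stand-in for the embedding similarity search that a vector database performs.

```python
def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the question.

    Assumption: word overlap replaces embedding similarity for illustration.
    """
    question_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the retrieved text."""
    joined = "\n\n".join(context)
    return (
        "Answer using only this course material:\n\n"
        f"{joined}\n\nQuestion: {question}"
    )
```

The resulting prompt would then be sent to the language model, which grounds its answer in the retrieved course material.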

Setup

To install BioPyAssistant and its dependencies, follow these steps:

Clone the repository

git clone https://github.com/pierrepo/biopyassistant.git
cd biopyassistant

Activate the environment

We use uv to manage dependencies and the project environment.

Sync dependencies:

uv sync

Copy the raw Markdown files of the Python course:

git clone --depth 1 https://github.com/bioinfo-prog/cours-python.git
rm -f data/course_raw/*.md
cp cours-python/cours/*.md data/course_raw/
rm -rf cours-python

Process raw Markdown files

rm -f data/course_processed/*.md
uv run src/parse_clean_markdown.py --config data/chapters_and_levels.yaml

In this step, Python comments (#) are slightly changed to avoid confusion with Markdown headers (#, ##...), and headers are numbered (from ## Title to ## 1.1 Title). Processed Markdown files are stored in data/course_processed.
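The header numbering described above can be sketched as follows. This is a minimal illustration and not the code of src/parse_clean_markdown.py: the function name `number_headers` and its `chapter` parameter are hypothetical, and only level-2 headers are handled. Lines inside fenced code blocks are skipped so that Python comments are not mistaken for headers.

```python
import re

FENCE = "`" * 3  # Markdown code fence delimiter
H2_RE = re.compile(r"^## (.+)$")


def number_headers(markdown: str, chapter: int) -> str:
    """Prefix each level-2 header with '<chapter>.<n>', ignoring fenced code."""
    counter = 0
    in_fence = False
    numbered = []
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            # Entering or leaving a fenced code block.
            in_fence = not in_fence
        elif not in_fence:
            match = H2_RE.match(line)
            if match:
                counter += 1
                line = f"## {chapter}.{counter} {match.group(1)}"
        numbered.append(line)
    return "\n".join(numbered)
```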

Add OpenAI and OpenRouter API keys

Create a .env file with valid OpenAI and OpenRouter API keys:

OPENAI_API_KEY=<your-openai-api-key>
OPENROUTER_API_KEY=<your-openrouter-api-key>

Remark: This .env file is ignored by git.
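Before running the next steps, it can help to check that both keys are visible to Python. The snippet below is a small sketch with a hypothetical helper, `missing_api_keys`. It assumes the variables from the .env file have already been loaded into the process environment, which the project's scripts may handle in their own way.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENROUTER_API_KEY")


def missing_api_keys(env=None) -> list[str]:
    """Return the names of required keys that are absent or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

An empty list means both keys are set.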

Create the vector database

uv run src/create_database.py --course-yaml data/chapters_and_levels.yaml \
                              --chroma-path chroma_db \
                              --embedding-model text-embedding-3-large \
                              --model-provider openai

This command will create a Chroma vector database from the processed Markdown files. All files will be split into chunks of 1000 characters with an overlap of 200 characters.

Remark: The vector database is saved on disk.
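To illustrate what chunks of 1000 characters with an overlap of 200 characters mean, here is a simple sliding-window sketch. The function `split_into_chunks` is hypothetical. The splitter actually used by src/create_database.py may behave differently, for example by preferring to cut at paragraph or sentence boundaries.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows where consecutive windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # distance between the starts of two windows
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With the default values, each new chunk starts 800 characters after the previous one, so the last 200 characters of a chunk are repeated at the beginning of the next. The overlap reduces the risk that an explanation is cut in half at a chunk boundary.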

Usage (command line interface)

uv run python src/query_chatbot.py --query "Your question here" \
                  --level "user_level" \
                  --model "model_name" \
                  --provider-llm "provider_name" \
                  --include-metadata

Options

  • 📚 User Level: Specify the user's Python knowledge level to tailor the chatbot's responses. Choose between: beginner, intermediate, advanced.
  • 🤖 Model Selection: Choose the language model for the query. Examples: gpt-4o, deepseek/deepseek-v3.2, etc.
  • 🌐 LLM Provider: Specify the provider of the language model. Choose between: openai, openrouter.
  • 📝 Include Metadata: Include metadata in the response, such as the sources of the answer. By default, metadata is excluded.
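For readers curious how such options are typically declared, here is a sketch using argparse from the standard library. It mirrors the flags and choices listed above, but it is not the parser from src/query_chatbot.py. Marking every option except --include-metadata as required is an assumption, since the actual defaults are not documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser for the documented command-line options."""
    parser = argparse.ArgumentParser(description="Query the chatbot (illustrative sketch).")
    parser.add_argument("--query", required=True, help="Question to ask the chatbot.")
    parser.add_argument(
        "--level",
        required=True,  # assumption: the real script may define a default
        choices=["beginner", "intermediate", "advanced"],
        help="Python knowledge level of the user.",
    )
    parser.add_argument("--model", required=True, help="Model name, e.g. gpt-4o.")
    parser.add_argument(
        "--provider-llm",
        required=True,  # assumption: the real script may define a default
        choices=["openai", "openrouter"],
        help="Provider of the language model.",
    )
    parser.add_argument(
        "--include-metadata",
        action="store_true",  # metadata is excluded unless the flag is given
        help="Include the sources of the answer in the response.",
    )
    return parser
```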

Example:

uv run python src/query_chatbot.py --query "What is the difference between list and set ?" \
                  --level "advanced" \
                  --model "gpt-4o" \
                  --provider-llm "openai" \
                  --include-metadata

This command asks the chatbot the question "What is the difference between list and set ?" for an advanced user, using the gpt-4o model from the openai provider. The response will include metadata about the sources of the answer.

Output:

Query:
What is the difference between list and set ?

Response:
A list is an ordered collection of elements, while a set is an unordered collection of unique elements. In a list, the order of elements is preserved, and duplicate elements are allowed. In contrast, a set does not preserve the order of elements, and duplicate elements are not allowed. Additionally, a set is optimized for membership testing and eliminating duplicate elements, making it more efficient for certain operations than a list.

For more information, you can refer to the following sources:
- Chapter ... (Link to the source : ...)
- Chapter ... (Link to the source : ...)

Usage (web interface)

uv run streamlit run src/streamlit_app.py

This will run the Streamlit app in your web browser.
