Building an LLM-based RAG chatbot for the Central Bank.
We are finalists of this competition. Get to know us:
- `dataset_preprocessing` - cleaning the corpus of scraped texts
- `custom_qa_chain.py` - custom question-answer generation chain compatible with LangChain
- `qa_generation.ipynb` - generating questions for validating and fine-tuning the solution
- `num2text.ipynb` - number preprocessing
- `chroma_db` - all files related to launching and populating ChromaDB
- `clickhouse` - writing data to ClickHouse from a CSV file
- `simularity.py` - script for finding relevant documents
- `embeddings.py` - obtaining embeddings for the database
- `gui.ipynb` - the main file with the application front-end
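The retrieval step (`simularity.py`) ranks documents by how close their embeddings (produced by `embeddings.py`) are to the query embedding. A minimal sketch of that idea, with hypothetical function names and toy vectors (not code from the repo):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_documents(query_vec, doc_vecs, k=3):
    # Rank document vectors by similarity to the query vector,
    # returning (index, score) pairs, best first.
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k_documents([1.0, 0.1], docs, k=2))
```

In production the same ranking is delegated to the database (ChromaDB or ClickHouse below), which stores the precomputed document embeddings.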
Starting the server:
- Launch the server in LM Studio
```shell
choco install ngrok
ngrok config add-authtoken [authtoken]
ngrok http --domain=live-relaxed-oryx.ngrok-free.app 1234
```
Interacting with the LLM: https://github.com/Data-Squad-of-Scoofs/cb-purple-hack/blob/db/inference_llm.ipynb
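LM Studio exposes an OpenAI-compatible HTTP API on the port forwarded by ngrok above (1234). A hedged sketch of querying it through the standard `/v1/chat/completions` route; the model name, system prompt, and URL are placeholders, not values from the notebook:

```python
import json
import urllib.request

def build_chat_payload(question, model="local-model", temperature=0.2):
    # Request body in the OpenAI chat-completions format that LM Studio accepts.
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": "Answer using the retrieved context."},
            {"role": "user", "content": question},
        ],
    }

def ask_llm(base_url, question):
    # POST the payload to the OpenAI-compatible endpoint served by LM Studio.
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_chat_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the server from the previous section to be running):
# ask_llm("https://live-relaxed-oryx.ngrok-free.app", "What is the key rate?")
```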
We implemented several options and eventually chose ClickHouse.
Launching ChromaDB in Docker:
```shell
docker pull chromadb/chroma
docker run -p 8000:8000 chromadb/chroma
```
Connecting:
```shell
pip install chromadb
```

```python
import chromadb

client = chromadb.HttpClient(host=your_ip, port=your_port)
```
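Once connected, populating and querying a collection looks roughly like this. The collection name, sample texts, and the `chunk_ids` helper are illustrative, and the sketch assumes a running ChromaDB instance (see the Docker commands above):

```python
def chunk_ids(prefix, n):
    # Stable string ids for n document chunks, e.g. ["doc-0", "doc-1"].
    return [f"{prefix}-{i}" for i in range(n)]

def populate_and_query(host, port):
    # Sketch only: requires a reachable ChromaDB server.
    import chromadb  # pip install chromadb

    client = chromadb.HttpClient(host=host, port=port)
    collection = client.get_or_create_collection("cb_docs")
    texts = ["The key rate was raised.", "Inflation report for Q1."]
    collection.add(documents=texts, ids=chunk_ids("doc", len(texts)))
    # Chroma embeds query_texts with its default embedding function
    # and returns the nearest stored documents.
    return collection.query(query_texts=["key rate"], n_results=1)
```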
ClickHouse was launched via clickhouse.cloud, but it can also be run in Docker. In our setup it operated faster than ChromaDB.
Connecting:
```shell
pip install clickhouse_connect
```

```python
import clickhouse_connect

client = clickhouse_connect.get_client(
    host=your_ip, port=your_port, username=your_user_name, password=your_password
)
```
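With the client in hand, relevant documents can be ranked directly in SQL using ClickHouse's built-in `cosineDistance` function. A hedged sketch, assuming a table that stores text chunks alongside their embedding vectors; the table and column names are illustrative, not the project's schema:

```python
def build_search_query(table="cb_docs", top_k=5):
    # SQL ranking rows by cosine distance between the stored embedding
    # and a query embedding bound server-side as the {query_vec} parameter.
    return (
        f"SELECT text, cosineDistance(embedding, {{query_vec:Array(Float32)}}) AS dist "
        f"FROM {table} ORDER BY dist ASC LIMIT {top_k}"
    )

def search(client, query_vec, table="cb_docs", top_k=5):
    # client is a clickhouse_connect client (see the connection snippet above).
    result = client.query(
        build_search_query(table, top_k),
        parameters={"query_vec": query_vec},
    )
    return result.result_rows
```

Pushing the ranking into the database avoids shipping every embedding to the application, which is one reason ClickHouse proved faster for us.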