Building an LLM-based RAG chatbot for the Central Bank.
We are finalists of this competition. Get to know us:
- `dataset_preprocessing` - cleaning the corpus of scraped texts
- `custom_qa_chain.py` - custom question-answer generation chain compatible with LangChain
- `qa_generation.ipynb` - generating questions for validating and fine-tuning the solution
- `num2text.ipynb` - number preprocessing
- `chroma_db` - all files related to launching and populating ChromaDB
- `clickhouse` - writing data to ClickHouse from a CSV file
- `simularity.py` - script for finding relevant documents
- `embeddings.py` - obtaining embeddings for the database
- `gui.ipynb` - the main file with the application front-end
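The retrieval step (`simularity.py`) ranks documents by how close their embeddings (produced by `embeddings.py`) are to the query embedding. A minimal sketch of that idea, with hypothetical function names and toy vectors (not code from the repo):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_documents(query_vec, doc_vecs, k=3):
    # Rank document vectors by similarity to the query vector,
    # returning (index, score) pairs, best first.
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k_documents([1.0, 0.1], docs, k=2))
```

In production the same ranking is delegated to the database (ChromaDB or ClickHouse below), which stores the precomputed document embeddings.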
Starting the server:
- Launch the server in LM Studio
```shell
choco install ngrok
ngrok config add-authtoken [authtoken]
ngrok http --domain=live-relaxed-oryx.ngrok-free.app 1234
```
Interacting with the LLM: https://github.com/Data-Squad-of-Scoofs/cb-purple-hack/blob/db/inference_llm.ipynb
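LM Studio exposes an OpenAI-compatible HTTP API on the port forwarded by ngrok above (1234). A hedged sketch of querying it through the standard `/v1/chat/completions` route; the model name, system prompt, and URL are placeholders, not values from the notebook:

```python
import json
import urllib.request

def build_chat_payload(question, model="local-model", temperature=0.2):
    # Request body in the OpenAI chat-completions format that LM Studio accepts.
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": "Answer using the retrieved context."},
            {"role": "user", "content": question},
        ],
    }

def ask_llm(base_url, question):
    # POST the payload to the OpenAI-compatible endpoint served by LM Studio.
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_chat_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the server from the previous section to be running):
# ask_llm("https://live-relaxed-oryx.ngrok-free.app", "What is the key rate?")
```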
We implemented several options and eventually chose ClickHouse.
Launching ChromaDB in Docker:
```shell
docker pull chromadb/chroma
docker run -p 8000:8000 chromadb/chroma
```
Connecting:
```shell
pip install chromadb
```

```python
import chromadb

client = chromadb.HttpClient(host=your_ip, port=your_port)
```
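Once connected, populating and querying a collection looks roughly like this. The collection name, sample texts, and the `chunk_ids` helper are illustrative, and the sketch assumes a running ChromaDB instance (see the Docker commands above):

```python
def chunk_ids(prefix, n):
    # Stable string ids for n document chunks, e.g. ["doc-0", "doc-1"].
    return [f"{prefix}-{i}" for i in range(n)]

def populate_and_query(host, port):
    # Sketch only: requires a reachable ChromaDB server.
    import chromadb  # pip install chromadb

    client = chromadb.HttpClient(host=host, port=port)
    collection = client.get_or_create_collection("cb_docs")
    texts = ["The key rate was raised.", "Inflation report for Q1."]
    collection.add(documents=texts, ids=chunk_ids("doc", len(texts)))
    # Chroma embeds query_texts with its default embedding function
    # and returns the nearest stored documents.
    return collection.query(query_texts=["key rate"], n_results=1)
```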
ClickHouse was launched via clickhouse.cloud, but it can also be run in Docker. In our setup it operated faster than ChromaDB.
Connecting:
```shell
pip install clickhouse_connect
```

```python
import clickhouse_connect

client = clickhouse_connect.get_client(
    host=your_ip, port=your_port, username=your_user_name, password=your_password
)
```
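With the client in hand, relevant documents can be ranked directly in SQL using ClickHouse's built-in `cosineDistance` function. A hedged sketch, assuming a table that stores text chunks alongside their embedding vectors; the table and column names are illustrative, not the project's schema:

```python
def build_search_query(table="cb_docs", top_k=5):
    # SQL ranking rows by cosine distance between the stored embedding
    # and a query embedding bound server-side as the {query_vec} parameter.
    return (
        f"SELECT text, cosineDistance(embedding, {{query_vec:Array(Float32)}}) AS dist "
        f"FROM {table} ORDER BY dist ASC LIMIT {top_k}"
    )

def search(client, query_vec, table="cb_docs", top_k=5):
    # client is a clickhouse_connect client (see the connection snippet above).
    result = client.query(
        build_search_query(table, top_k),
        parameters={"query_vec": query_vec},
    )
    return result.result_rows
```

Pushing the ranking into the database avoids shipping every embedding to the application, which is one reason ClickHouse proved faster for us.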