This repository contains the implementation of the chatHPC pipeline, covering the end-to-end development and deployment cycle of LLM applications on HPC.
HPC documentation and OLCF help tickets (the help tickets contain private and sensitive information):
- OLCF: https://github.com/olcf/olcf-user-docs
- ALCF: https://github.com/argonne-lcf/user-guides
- NERSC: https://gitlab.com/NERSC/nersc.gitlab.io
Pre-processing scripts are included for both documents and tickets.
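As a rough illustration of the document side of this step, the sketch below cleans and chunks documentation files into JSONL records; the directory name, chunk size, and output schema are assumptions, not the repository's actual script.

```python
# Minimal pre-processing sketch (assumptions: docs are Markdown/RST files,
# output is one JSON record per chunk; the real scripts may differ).
import json
import re
from pathlib import Path

CHUNK_CHARS = 2000  # assumed chunk size

def clean(text: str) -> str:
    """Collapse whitespace; the real scripts likely strip markup too."""
    return re.sub(r"\s+", " ", text).strip()

def chunks(text: str, size: int = CHUNK_CHARS):
    for i in range(0, len(text), size):
        yield text[i:i + size]

with open("docs_corpus.jsonl", "w") as out:
    for path in Path("olcf-user-docs").rglob("*"):
        if path.suffix not in {".rst", ".md"}:
            continue
        body = clean(path.read_text(errors="ignore"))
        for piece in chunks(body):
            out.write(json.dumps({"source": str(path), "text": piece}) + "\n")
```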
The pre-training on the HPC documents uses FORGE, and sample input data is provided. The model configuration is the same as that of FORGE-13B.
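Since FORGE builds on a GPT-NeoX-style training stack, the pre-training input is plausibly JSONL with a `text` field; the sketch below shows that conversion with exact-duplicate filtering. The file names and schema are assumptions.

```python
# Sketch: convert pre-processed chunks into the {"text": ...} JSONL that a
# GPT-NeoX-style pre-training pipeline binarizes; dedupe exact repeats.
# File names and the JSONL schema are assumptions.
import hashlib
import json

seen = set()
with open("docs_corpus.jsonl") as src, open("pretrain_input.jsonl", "w") as dst:
    for line in src:
        text = json.loads(line)["text"]
        digest = hashlib.sha1(text.encode()).hexdigest()
        if digest in seen:
            continue  # skip exact duplicate chunks
        seen.add(digest)
        dst.write(json.dumps({"text": text}) + "\n")
```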
The instruction set for fine-tuning is generated following the SI^2 method, which is detailed here.
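The SI^2 specifics are in the linked reference; purely as an illustration of seed-prompted instruction generation (not the SI^2 method itself), one generation round might look like the following, where the endpoint, served model name, and prompt are all assumptions.

```python
# Illustration only: seed-prompted instruction generation in the spirit of
# self-instruct pipelines. This is NOT the SI^2 implementation; the endpoint,
# model name, and prompt are assumptions.
import openai  # any OpenAI-compatible endpoint, e.g. a local FastChat server

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

seed = "How do I submit a batch job on Frontier?"
resp = client.chat.completions.create(
    model="forge-13b-instruct",  # assumed served model name
    messages=[{
        "role": "user",
        "content": f"Write 5 new HPC user questions similar to: {seed}",
    }],
)
print(resp.choices[0].message.content)
```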
This step includes supervised fine-tuning (SFT), reward modeling, and reinforcement learning from human feedback (RLHF), which are detailed here.
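As a rough sketch of the SFT stage only (reward modeling and RLHF are omitted), using Hugging Face transformers; the checkpoint path, prompt format, and hyperparameters are assumptions.

```python
# Minimal SFT sketch with Hugging Face transformers; paths, prompt format,
# and hyperparameters are assumptions, and reward modeling / RLHF are omitted.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("forge-13b")  # assumed local checkpoint
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("forge-13b")

def fmt(ex):
    text = f"### Instruction:\n{ex['instruction']}\n### Response:\n{ex['output']}"
    return tok(text, truncation=True, max_length=1024)

ds = load_dataset("json", data_files="instructions.jsonl")["train"]
ds = ds.map(fmt, remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments("sft-out", per_device_train_batch_size=1,
                           num_train_epochs=3, learning_rate=2e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```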
Our setup includes the following (sketches of the embedding and LlamaCPP steps follow this list):
- Generate the embedding database on the HPC documents with this script
- Test the retrieval QA with LlamaCPP on Frontier
- RAG with the FORGE model on Frontier and FastChat
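A minimal sketch of the embedding-database step, assuming a UAE checkpoint served through sentence-transformers and a persistent ChromaDB collection; the model ID, paths, and collection name are assumptions.

```python
# Sketch: build a ChromaDB collection over pre-processed doc chunks using a
# UAE embedding model; model ID, paths, and collection name are assumptions.
import json
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("WhereIsAI/UAE-Large-V1")  # assumed UAE checkpoint
client = chromadb.PersistentClient(path="emb_db")
col = client.get_or_create_collection("hpc_docs")

with open("docs_corpus.jsonl") as f:
    for i, line in enumerate(f):
        rec = json.loads(line)
        col.add(
            ids=[str(i)],
            documents=[rec["text"]],
            embeddings=[embedder.encode(rec["text"]).tolist()],
            metadatas=[{"source": rec["source"]}],
        )
```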
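And for the LlamaCPP retrieval-QA test, a minimal sketch with the llama-cpp-python bindings; the GGUF checkpoint path, context, and sampling settings are assumptions.

```python
# Sketch: answer a retrieved-context question locally with llama-cpp-python;
# the GGUF checkpoint path and sampling settings are assumptions.
from llama_cpp import Llama

llm = Llama(model_path="forge-13b.gguf", n_ctx=4096)

context = "Frontier batch jobs are submitted with sbatch ..."  # retrieved docs
question = "How do I submit a batch job on Frontier?"
out = llm(
    f"Answer the question at the end. Use following input if the answer is "
    f"related and be verbatim about links. Ignore the input if it is not "
    f"related.\nInput:\n{context}\n{question}",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```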
The prompt with retrieved documents as context follows:
Answer the question at the end. Use following input if the answer is related and be verbatim about links. Ignore the input if it is not related.
Input:
{docs}
{prompt}

Both web and API interfaces to the fine-tuned models are provided. The detailed steps are provided here.
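To illustrate how the template above is filled at query time, the sketch below retrieves top-k chunks from the ChromaDB collection and queries a FastChat-style OpenAI-compatible endpoint; the URL, served model name, and k are assumptions.

```python
# Sketch: retrieve top-k chunks, fill the prompt template above, and query a
# FastChat-style OpenAI-compatible server; endpoint and model name assumed.
import chromadb
import openai
from sentence_transformers import SentenceTransformer

TEMPLATE = (
    "Answer the question at the end. Use following input if the answer is "
    "related and be verbatim about links. Ignore the input if it is not "
    "related.\nInput:\n{docs}\n{prompt}"
)

embedder = SentenceTransformer("WhereIsAI/UAE-Large-V1")
col = chromadb.PersistentClient(path="emb_db").get_collection("hpc_docs")
llm = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

question = "How do I submit a batch job on Frontier?"
hits = col.query(query_embeddings=[embedder.encode(question).tolist()],
                 n_results=3)
context = "\n".join(hits["documents"][0])

resp = llm.chat.completions.create(
    model="forge-13b-chathpc",  # assumed served model name
    messages=[{"role": "user",
               "content": TEMPLATE.format(docs=context, prompt=question)}],
)
print(resp.choices[0].message.content)
```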
We evaluate the following components of the pipeline:
- model performance
- retriever accuracy (see the sketch after this list)
- continuous learning

The detailed steps are provided here.
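As one way to sanity-check retriever accuracy, the sketch below computes recall@k over labeled question/gold-source pairs; the eval file format and k are assumptions, not the repository's evaluation protocol.

```python
# Sketch of a retriever-accuracy check: recall@k over labeled (question,
# gold-source) pairs. The eval file format and k are assumptions.
import json
import chromadb
from sentence_transformers import SentenceTransformer

K = 3
embedder = SentenceTransformer("WhereIsAI/UAE-Large-V1")
col = chromadb.PersistentClient(path="emb_db").get_collection("hpc_docs")

hits_at_k, total = 0, 0
with open("retriever_eval.jsonl") as f:  # {"question": ..., "gold_source": ...}
    for line in f:
        ex = json.loads(line)
        res = col.query(
            query_embeddings=[embedder.encode(ex["question"]).tolist()],
            n_results=K,
        )
        sources = [m["source"] for m in res["metadatas"][0]]
        hits_at_k += ex["gold_source"] in sources
        total += 1

print(f"recall@{K}: {hits_at_k / total:.3f}")
```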
The plots of the results are generated using this script, and the corresponding raw job logs can be downloaded.
The fine-tuned FORGE-13B model (HF format), retriever (based on UAE), and embedding database (ChromaDB format) built on the HPC documents and OLCF help tickets are available below:
| Model/DB | Link |
|---|---|
| FORGE-13B | download |
| Retriever | download |
| Embedding DB | download |
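If you use chatHPC in your work, please cite: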
@article{Yin2024,
author = {Junqi Yin and Jesse Hines and Emily Herron and Tirthankar Ghosal and Hong Liu and Suzanne Prentice and Vanessa Lama and Feiyi Wang},
title = {chatHPC: Empowering HPC users with large language models},
journal = {The Journal of Supercomputing},
volume = {81},
number = {1},
pages = {194},
year = {2024},
month = {November},
doi = {10.1007/s11227-024-06637-1},
url = {https://doi.org/10.1007/s11227-024-06637-1},
issn = {1573-0484}
}
