Evals is a synthetic data generation and evaluation framework for LLMs and RAG applications.
It has two main modules:
- datagen
- eval
A high-level architecture diagram of evals:

*(Architecture diagram)*
To get started with evals, follow these steps:

- Clone the repository to your local machine.
- Install the necessary dependencies by running `pip install -r requirements.txt` in the project directory.
- Create a copy of `config/config.toml.template` and name it `config/config.toml`.
- Update the following sections in the `config.toml` file:
  - `MISC`
    - Configure your SSL cert file location.
  - `DATAGEN`
    - Set the `DATA_DIR` variable, which controls the location of the data corpus to generate synthetic data from; it is relative to the `datagen/data/` directory. In other words, add your data directories in there and specify their name in the variable.
    - The `GEN_PROVIDER` variable allows choosing between `azure` or `vertex`.
    - Add the rest of the variables desired for generative purposes.
  - `DATAEVAL`
    - `EVAL_TESTS` takes a list of evaluation tests supported by the framework. The possible options are `AnswerRelevancy`, `Hallucination`, `Faithfulness`, `Bias`, `Toxicity`, `Correctness`, `Coherence`, `PromptInjection`, `PromptBreaking`, and `PromptLeakage`.
    - The `EVAL_PROVIDER` variable allows choosing between `azure` or `vertex`.
    - Add the rest of the variables required for the model you want to use as judge for evaluations.
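Putting the steps above together, a filled-in `config.toml` might look like the following sketch. The section and variable names come from the steps above; the values, and the `SSL_CERT_FILE` key name, are illustrative assumptions rather than defaults shipped with the template:

```toml
[MISC]
# Key name and path are illustrative; see config.toml.template for the actual key.
SSL_CERT_FILE = "/etc/ssl/certs/ca-certificates.crt"

[DATAGEN]
DATA_DIR = "my_corpus"   # a directory you added under datagen/data/
GEN_PROVIDER = "azure"   # or "vertex"

[DATAEVAL]
EVAL_TESTS = ["AnswerRelevancy", "Faithfulness", "Toxicity"]
EVAL_PROVIDER = "vertex" # or "azure"
```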
To run the synthetic data generation module:

- Modify/adapt the sample client provided (`datagen/client.py`)
- Run `python -m datagen.client`
- The synthetically generated data will be stored in the `datagen/qa_out/` directory as a CSV file with the format:

  ```csv
  question,context,ground_truth
  ```
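Before feeding a generated file to the eval module, you can sanity-check that it carries the expected columns. A minimal sketch; the helper below is not part of the framework:

```python
import csv

# Columns produced by datagen (and expected by the eval module).
REQUIRED_COLUMNS = ["question", "context", "ground_truth"]

def has_expected_header(path: str) -> bool:
    """Return True if the CSV file at `path` starts with the expected columns."""
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    return header == REQUIRED_COLUMNS
```

Point it at the CSV file written to `datagen/qa_out/` to verify the header row before running evaluations.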
To run the eval module:

- Modify/adapt the sample client provided (`eval/client.py`)
  - The input data needs to match the format of the data produced by the synthetic data generation module (`question,context,ground_truth`).
  - The `ground_truth` column may or may not be used depending on the `use_answers_from_dataset` setting. When set to `False`, the framework ignores that column and generates new outputs using the configured generative model.
- Start MLflow by running `mlflow ui --port 5000`
- Run `python -m eval.client`
- Monitor and analyse the eval results in your local MLflow interface at http://localhost:5000
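If you want to evaluate your own dataset instead of datagen output, you only need a CSV with the same three columns. A minimal sketch, using the standard library; the file name and rows are illustrative:

```python
import csv

# Illustrative rows and file name; replace with your own data.
rows = [
    {
        "question": "What modules does evals provide?",
        "context": "Evals has two main modules: datagen and eval.",
        "ground_truth": "datagen and eval",
    },
]

with open("my_eval_input.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "context", "ground_truth"])
    writer.writeheader()
    writer.writerows(rows)
```

The resulting file matches the `question,context,ground_truth` format the eval client expects; whether `ground_truth` is used then depends on `use_answers_from_dataset`.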