
Commit 58549d8

Add code example
1 parent 92381c8 commit 58549d8


docs/integrations/braintrust.md

Lines changed: 50 additions & 1 deletion
@@ -5,8 +5,57 @@ title: Braintrust

[Braintrust](https://www.braintrustdata.com) is the enterprise-grade stack for building AI products. They provide tools including evaluations, a prompt playground, dataset management, and tracing.

It's easy to use Braintrust to evaluate AI retrieval apps built with Chroma. Braintrust provides TypeScript and Python libraries to run and log evaluations.

- [Tutorial: Evaluate Chroma Retrieval app w/ Braintrust](https://www.braintrustdata.com/docs/examples/rag)

Example evaluation script in Python (refer to the tutorial above to get the full implementation):

```python
from braintrust import Eval
from openai import OpenAI

# LevenshteinScorer is autoevals' string-distance scorer
from autoevals import LevenshteinScorer

PROJECT_NAME = "Chroma_Eval"

client = OpenAI()
leven_evaluator = LevenshteinScorer()

async def pipeline_a(input, hooks=None):
    # Get a relevant fact from Chroma
    # (`collection` is the Chroma collection built in the tutorial above)
    relevant = collection.query(
        query_texts=[input],
        n_results=1,
    )
    relevant_text = ','.join(relevant["documents"][0])
    prompt = """
    You are an assistant called BT. Help the user.
    Relevant information: {relevant}
    Question: {question}
    Answer:
    """.format(question=input, relevant=relevant_text)
    messages = [{"role": "system", "content": prompt}]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0,
        max_tokens=100,
    )
    result = response.choices[0].message.content
    return result

# Run an evaluation and log to Braintrust
# (`await` requires an async context, e.g. a notebook cell)
await Eval(
    PROJECT_NAME,
    # define your test cases
    data=lambda: [{"input": "What is my eye color?", "expected": "Brown"}],
    # define your retrieval pipeline w/ Chroma above
    task=pipeline_a,
    # use a prebuilt scoring function or define your own :)
    scores=[leven_evaluator],
)
```
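
The pipeline above assumes a Chroma `collection` that already holds the relevant facts; the tutorial walks through building it. As a rough sketch (the in-memory client, collection name, and seed document here are illustrative assumptions, not part of the tutorial), it could look like this:

```python
import chromadb

# Hypothetical setup for the `collection` used in `pipeline_a`.
# An in-memory client is used here; a persistent client works the same way.
chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="eval_facts")

# Seed one fact so the example test case has something to retrieve.
collection.add(
    ids=["fact-1"],
    documents=["The user's eye color is brown."],
)
```
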
Learn more in their [docs](https://www.braintrustdata.com/docs).
