This repository was archived by the owner on May 10, 2024. It is now read-only.

Commit 2874cad: Merge branch 'main' into patch-1 (2 parents: 31ea625 + 09b3a34)

File tree

14 files changed: +277 −33 lines


docs/api/index.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ Chroma currently maintains 1st party clients for Python and JavaScript. For other

`Client` - is the object that wraps a connection to a backing Chroma DB

- `Collection` - is the object that wraps a collectiom
+ `Collection` - is the object that wraps a collection

<div class="special_table"></div>

docs/embeddings.md

Lines changed: 4 additions & 1 deletion
@@ -13,10 +13,13 @@ Chroma provides lightweight wrappers around popular embedding providers, making

|              | Python    | JS            |
|--------------|-----------|---------------|
| [OpenAI](/embeddings/openai) |||
+ | [Google Generative AI](/embeddings/google-gemini) |||
| [Cohere](/embeddings/cohere) |||
| [Google PaLM](/embeddings/google-palm) |||
| [Hugging Face](/embeddings/hugging-face) |||
| [Instructor](/embeddings/instructor) |||
+ | [Hugging Face Embedding Server](/embeddings/hugging-face-embedding-server) |||
+ | [Jina AI](/embeddings/jinaai) |||

We welcome pull requests to add new Embedding Functions to the community.

@@ -114,7 +117,7 @@ You can create your own embedding function to use with Chroma, it just needs to

from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
-     def __call__(self, texts: Documents) -> Embeddings:
+     def __call__(self, input: Documents) -> Embeddings:
          # embed the documents somehow
          return embeddings
```
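As a rough, self-contained sketch of that contract (the `HashedEmbeddingFunction` name is hypothetical; a real implementation would subclass `EmbeddingFunction` from `chromadb` and call an actual embedding model), `__call__` takes documents in and returns one float vector per document:

```python
import hashlib
import math

class HashedEmbeddingFunction:
    """Toy stand-in for a chromadb EmbeddingFunction: maps each document
    to a fixed-size vector derived from its SHA-256 hash. This only
    illustrates the __call__ shape; it is not a meaningful embedding."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def __call__(self, input: list[str]) -> list[list[float]]:
        embeddings = []
        for doc in input:
            digest = hashlib.sha256(doc.encode("utf-8")).digest()
            # Take `dim` bytes of the digest and scale each into [0, 1]
            vec = [b / 255.0 for b in digest[: self.dim]]
            # L2-normalize so cosine similarity behaves sensibly
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            embeddings.append([x / norm for x in vec])
        return embeddings

ef = HashedEmbeddingFunction(dim=8)
vectors = ef(["document1", "document2"])
```

The same object can then be passed wherever an `embedding_function` is expected, since Chroma only calls it with a list of documents.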

docs/embeddings/google-gemini.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@

---
---

# Google Generative AI

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<div class="select-language">Select a language</div>

<Tabs queryString groupId="lang">
<TabItem value="py" label="Python"></TabItem>
<TabItem value="js" label="JavaScript"></TabItem>
</Tabs>

Chroma provides a convenient wrapper around Google's Generative AI embedding API. This embedding function runs remotely on Google's servers, and requires an API key.

You can get an API key by signing up for an account at [Google MakerSuite](https://makersuite.google.com/).

<Tabs queryString groupId="lang" className="hideTabSwitcher">
<TabItem value="py" label="Python">

This embedding function relies on the `google-generativeai` python package, which you can install with `pip install google-generativeai`.

```python
# import
import chromadb
from chromadb.utils import embedding_functions

# use directly
google_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key="YOUR_API_KEY")
google_ef(["document1","document2"])

# pass documents to query for .add and .query
collection = client.create_collection(name="name", embedding_function=google_ef)
collection = client.get_collection(name="name", embedding_function=google_ef)
```

You can view a more [complete example](https://github.com/chroma-core/chroma/tree/main/examples/gemini) of chatting over documents with Gemini embedding and language models.

For more info, please visit the [official Google python docs](https://ai.google.dev/tutorials/python_quickstart).

</TabItem>
<TabItem value="js" label="JavaScript">

This embedding function relies on the `@google/generative-ai` npm package, which you can install with `yarn add @google/generative-ai`.

```javascript
import { ChromaClient, GoogleGenerativeAiEmbeddingFunction } from 'chromadb'
const embedder = new GoogleGenerativeAiEmbeddingFunction({googleApiKey: "<YOUR API KEY>"})

// use directly
const embeddings = await embedder.generate(["document1","document2"])

// pass documents to query for .add and .query
const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
const collectionGet = await client.getCollection({name:"name", embeddingFunction: embedder})
```

You can view a more [complete example using Node](https://github.com/chroma-core/chroma/blob/main/clients/js/examples/node/app.js).

For more info, please visit the [official Google JS docs](https://ai.google.dev/tutorials/node_quickstart).

</TabItem>

</Tabs>
docs/embeddings/hugging-face-embedding-server.md

Lines changed: 67 additions & 0 deletions

@@ -0,0 +1,67 @@

---
---

# Hugging Face Text Embedding Server

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<div class="select-language">Select a language</div>

<Tabs queryString groupId="lang">
<TabItem value="py" label="Python"></TabItem>
<TabItem value="js" label="JavaScript"></TabItem>
</Tabs>

Chroma provides a convenient wrapper for the Hugging Face Text Embedding Server, a standalone server that provides text embeddings via a REST API. You can read more about it [**here**](https://github.com/huggingface/text-embeddings-inference).

## Setting Up The Server

To run the embedding server locally, run the following command from the root of the Chroma repository. The docker compose command will run Chroma and the embedding server together.

```bash
docker compose -f examples/server_side_embeddings/huggingface/docker-compose.yml up -d
```

or

```bash
docker run -p 8001:80 -d --rm --name huggingface-embedding-server ghcr.io/huggingface/text-embeddings-inference:cpu-0.3.0 --model-id BAAI/bge-small-en-v1.5 --revision main
```

:::note
The above docker command will run the server with the `BAAI/bge-small-en-v1.5` model. You can find more information about running the server in docker [**here**](https://github.com/huggingface/text-embeddings-inference#docker).
:::

## Usage

<Tabs queryString groupId="lang" className="hideTabSwitcher">
<TabItem value="py" label="Python">

This embedding function relies on the `requests` python package, which you can install with `pip install requests`.

```python
from chromadb.utils.embedding_functions import HuggingFaceEmbeddingServer
huggingface_ef = HuggingFaceEmbeddingServer(url="http://localhost:8001/embed")
```

The embedding model is configured on the server side. Check the docker-compose file in `examples/server_side_embeddings/huggingface/docker-compose.yml` for an example of how to configure the server.
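Under the hood, the wrapper simply POSTs JSON to the server's `/embed` route. As a minimal sketch of that request shape (the `{"inputs": [...]}` body is an assumption based on the text-embeddings-inference README, and `build_embed_request` is a hypothetical helper, not part of chromadb):

```python
import json

def build_embed_request(url: str, documents: list[str]) -> tuple[str, dict, bytes]:
    """Assemble the pieces of a text-embeddings-inference /embed call.
    The {"inputs": [...]} body shape is an assumption from the TEI docs."""
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"inputs": documents}).encode("utf-8")
    return url, headers, body

url, headers, body = build_embed_request(
    "http://localhost:8001/embed", ["document1", "document2"]
)
```

Any HTTP client (here, the `requests` package) can then send these pieces to the running server and receive one embedding per input document.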
</TabItem>
<TabItem value="js" label="JavaScript">

```javascript
import {HuggingFaceEmbeddingServerFunction} from 'chromadb';
const embedder = new HuggingFaceEmbeddingServerFunction({url: "http://localhost:8001/embed"})

// use directly
const embeddings = embedder.generate(["document1", "document2"])

// pass documents to query for .add and .query
const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
const collectionGet = await client.getCollection({name: "name", embeddingFunction: embedder})
```

</TabItem>
</Tabs>

docs/embeddings/jinaai.md

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
---
2+
---
3+
4+
# Jina AI
5+
6+
import Tabs from '@theme/Tabs';
7+
import TabItem from '@theme/TabItem';
8+
9+
<div class="select-language">Select a language</div>
10+
11+
<Tabs queryString groupId="lang">
12+
<TabItem value="py" label="Python"></TabItem>
13+
<TabItem value="js" label="JavaScript"></TabItem>
14+
</Tabs>
15+
16+
Chroma provides a convenient wrapper around JinaAI's embedding API. This embedding function runs remotely on JinaAI's servers, and requires an API key. You can get an API key by signing up for an account at [JinaAI](https://jina.ai/embeddings/).
17+
18+
<Tabs queryString groupId="lang" className="hideTabSwitcher">
19+
<TabItem value="py" label="Python">
20+
21+
This embedding function relies on the `requests` python package, which you can install with `pip install requests`.
22+
23+
```python
24+
jinaai_ef = embedding_functions.JinaEmbeddingFunction(
25+
api_key="YOUR_API_KEY",
26+
model_name="jina-embeddings-v2-base-en"
27+
)
28+
jinaai_ef(input=["This is my first text to embed", "This is my second document"])
29+
```
30+
31+
You can pass in an optional `model_name` argument, which lets you choose which Jina model to use. By default, Chroma uses `jina-embedding-v2-base-en`.
32+
33+
</TabItem>
34+
<TabItem value="js" label="JavaScript">
35+
36+
```javascript
37+
const {JinaEmbeddingFunction} = require('chromadb');
38+
const embedder = new JinaEmbeddingFunction({
39+
jinaai_api_key: 'jina_****',
40+
model_name: 'jina-embeddings-v2-base-en',
41+
});
42+
43+
// use directly
44+
const embeddings = embedder.generate(['document1', 'document2']);
45+
46+
// pass documents to query for .add and .query
47+
const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
48+
const collectionGet = await client.getCollection({name:"name", embeddingFunction: embedder})
49+
```
50+
</TabItem>
51+
</Tabs>

docs/getting-started.md

Lines changed: 1 addition & 1 deletion
@@ -224,7 +224,7 @@ Find [chromadb on npm](https://www.npmjs.com/package/chromadb).

## 📚 Next steps

- Chroma is designed to be simple enough to get started with quickly and flexible enough to meet many use-cases. You can use your own embedding models, query Chroma with your own embeddings, and filter on metadata. To learn more about Chroma, check out the [Usage Guide](./usage-guide.md) and [API Reference](./api-reference.md).
- - Chroma is integrated in [LangChain](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html?highlight=chroma#langchain.vectorstores.Chroma) (`python` and `js`), making it easy to build AI applications with Chroma. Check out the [integrations](./integrations.md) page to learn more.
+ - Chroma is integrated in [LangChain](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html?highlight=chroma#langchain.vectorstores.Chroma) (`python` and `js`), making it easy to build AI applications with Chroma. Check out the [integrations](./integrations) page to learn more.
- You can [deploy a persistent instance](./deployment) of Chroma to an external server, to make it easier to work on larger projects or with a team.

## Coming Soon

docs/integrations/index.md

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ We welcome pull requests to add new Integrations to the community.

| [🦜️🔗 Langchain](/integrations/langchain) |||
| [🦙 LlamaIndex](/integrations/llama-index) || :soon: |
| [Braintrust](/integrations/braintrust) |||
+ | [🔭 OpenLLMetry](/integrations/openllmetry) || :soon: |

*Coming soon* - integrations with LangSmith, JinaAI, and more.

docs/integrations/openllmetry.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@

---
slug: /integrations/openllmetry
title: 🔭 OpenLLMetry
---

## 🔭 OpenLLMetry

[OpenLLMetry](https://www.traceloop.com/openllmetry) provides observability for systems using Chroma. It allows tracing calls to Chroma, OpenAI, and other services.
It gives visibility to query and index calls as well as LLM prompts and completions.
For more information on how to use OpenLLMetry, see the [OpenLLMetry docs](https://www.traceloop.com/docs/openllmetry).

<img src="/img/openllmetry.png" />

### Example

Install the OpenLLMetry SDK by running:

```bash
pip install traceloop-sdk
```

Then, initialize the SDK in your application:

```python
from traceloop.sdk import Traceloop

Traceloop.init()
```

### Configuration

OpenLLMetry can be configured to send traces to any observability platform that supports OpenTelemetry, such as Datadog, Honeycomb, Dynatrace, and New Relic. See the [OpenLLMetry docs](https://www.traceloop.com/openllmetry/provider/chroma) for more information.

docs/observability.md

Lines changed: 25 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,21 +5,43 @@ title: "👀 Observability"
55

66
# 👀 Observability
77

8+
## Backend Observability
9+
810
Chroma is instrumented with [OpenTelemetry](https://opentelemetry.io/) hooks for observability.
911

1012
:::tip Telemetry vs Observability
1113
"[Telemetry](/telemetry)" refers to anonymous product usage statistics we collect. "Observability" refers to metrics, logging, and tracing which can be used by anyone operating a Chroma deployment. Observability features listed on this page are **never** sent back to Chroma; they are for end-users to better understand how their Chroma deployment is behaving.
1214
:::
1315

14-
## Available Observability
16+
### Available Observability
1517

16-
Chroma currently only exports OpenTelemtry [traces](https://opentelemetry.io/docs/concepts/signals/traces/). Traces allow a Chroma operator to understand how requests flow through the system and quickly identify bottlenecks.
18+
Chroma currently only exports OpenTelemetry [traces](https://opentelemetry.io/docs/concepts/signals/traces/). Traces allow a Chroma operator to understand how requests flow through the system and quickly identify bottlenecks.
1719

18-
## Configuration
20+
### Configuration
1921

2022
Tracing is configured with four environment variables:
2123

2224
- `CHROMA_OTEL_COLLECTION_ENDPOINT`: where to send observability data. Example: `api.honeycomb.com`.
2325
- `CHROMA_OTEL_SERVICE_NAME`: Service name for OTel traces. Default: `chromadb`.
2426
- `CHROMA_OTEL_COLLECTION_HEADERS`: Headers to use when sending observability data. Often used to send API and app keys.
2527
- `CHROMA_OTEL_GRANULARITY`: A value from the [OpenTelemetryGranularity enum](https://github.com/chroma-core/chroma/tree/main/chromadb/telemetry/opentelemetry/__init__.py). Specifies how detailed tracing should be.
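For instance, a deployment shipping traces to a Honeycomb-style backend might export the four variables like this (endpoint, header name, and granularity value are placeholder assumptions; substitute your backend's values and a real value from the granularity enum):

```shell
# Placeholder values: substitute your own backend endpoint, auth header,
# and a granularity value from the OpenTelemetryGranularity enum.
export CHROMA_OTEL_COLLECTION_ENDPOINT="api.honeycomb.com"
export CHROMA_OTEL_SERVICE_NAME="chromadb"
export CHROMA_OTEL_COLLECTION_HEADERS='{"x-honeycomb-team":"YOUR_API_KEY"}'
export CHROMA_OTEL_GRANULARITY="all"
```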
+ ## Local Observability Stack (🐳👀📚)
+
+ Chroma also comes with a local observability stack. The stack is composed of Chroma Server (the one you know and ❤️), the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector), and [Zipkin](https://zipkin.io/).
+
+ To start the stack, run from the root of the repo:
+
+ ```bash
+ docker compose -f examples/observability/docker-compose.local-observability.yml up --build -d
+ ```
+
+ Once the stack is running, you can access Zipkin at http://localhost:9411.
+
+ :::tip Traces
+ Traces in Zipkin will start appearing after you make a request to Chroma.
+ :::
+
+ ## Client (SDK) Observability
+
+ See the [OpenLLMetry Integration](/integrations/openllmetry).

docs/reference/Collection.md

Lines changed: 1 addition & 2 deletions
@@ -51,8 +51,7 @@ Add embeddings to the data store.

- `ValueError` - If you don't provide either embeddings or documents
- `ValueError` - If the length of ids, embeddings, metadatas, or documents don't match
- `ValueError` - If you don't provide an embedding function and don't provide embeddings
- - `ValueError` - If you provide both embeddings and documents
- - `ValueError` - If you provide an id that already exists
+ - `DuplicateIDError` - If you provide an id that already exists

### get