This repository was archived by the owner on May 10, 2024. It is now read-only.

Commit 2874cad: Merge branch 'main' into patch-1 (2 parents: 31ea625 + 09b3a34)

File tree

14 files changed: +277 −33 lines


docs/api/index.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ Chroma currently maintains 1st party clients for Python and JavaScript. For other

`Client` - is the object that wraps a connection to a backing Chroma DB

- `Collection` - is the object that wraps a collectiom
+ `Collection` - is the object that wraps a collection

<div class="special_table"></div>

docs/embeddings.md

Lines changed: 4 additions & 1 deletion
@@ -13,10 +13,13 @@ Chroma provides lightweight wrappers around popular embedding providers, making

|              | Python    | JS            |
|--------------|-----------|---------------|
| [OpenAI](/embeddings/openai) |||
+ | [Google Generative AI](/embeddings/google-gemini) |||
| [Cohere](/embeddings/cohere) |||
| [Google PaLM](/embeddings/google-palm) |||
| [Hugging Face](/embeddings/hugging-face) |||
| [Instructor](/embeddings/instructor) |||
+ | [Hugging Face Embedding Server](/embeddings/hugging-face-embedding-server) |||
+ | [Jina AI](/embeddings/jinaai) |||

We welcome pull requests to add new Embedding Functions to the community.

@@ -114,7 +117,7 @@ You can create your own embedding function to use with Chroma, it just needs to

from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
-     def __call__(self, texts: Documents) -> Embeddings:
+     def __call__(self, input: Documents) -> Embeddings:
          # embed the documents somehow
          return embeddings
```
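As a rough, self-contained sketch of that contract (the `HashedEmbeddingFunction` name is hypothetical; a real implementation would subclass `EmbeddingFunction` from `chromadb` and call an actual embedding model), `__call__` takes documents in and returns one float vector per document:

```python
import hashlib
import math

class HashedEmbeddingFunction:
    """Toy stand-in for a chromadb EmbeddingFunction: maps each document
    to a fixed-size vector derived from its SHA-256 hash. This only
    illustrates the __call__ shape; it is not a meaningful embedding."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def __call__(self, input: list[str]) -> list[list[float]]:
        embeddings = []
        for doc in input:
            digest = hashlib.sha256(doc.encode("utf-8")).digest()
            # Take `dim` bytes of the digest and scale each into [0, 1]
            vec = [b / 255.0 for b in digest[: self.dim]]
            # L2-normalize so cosine similarity behaves sensibly
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            embeddings.append([x / norm for x in vec])
        return embeddings

ef = HashedEmbeddingFunction(dim=8)
vectors = ef(["document1", "document2"])
```

The same object can then be passed wherever an `embedding_function` is expected, since Chroma only calls it with a list of documents.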

docs/embeddings/google-gemini.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@

---
---

# Google Generative AI

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<div class="select-language">Select a language</div>

<Tabs queryString groupId="lang">
<TabItem value="py" label="Python"></TabItem>
<TabItem value="js" label="JavaScript"></TabItem>
</Tabs>

Chroma provides a convenient wrapper around Google's Generative AI embedding API. This embedding function runs remotely on Google's servers, and requires an API key.

You can get an API key by signing up for an account at [Google MakerSuite](https://makersuite.google.com/).

<Tabs queryString groupId="lang" className="hideTabSwitcher">
<TabItem value="py" label="Python">

This embedding function relies on the `google-generativeai` python package, which you can install with `pip install google-generativeai`.

```python
# import
import chromadb
from chromadb.utils import embedding_functions

# use directly
google_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key="YOUR_API_KEY")
google_ef(["document1","document2"])

# pass documents to query for .add and .query
collection = client.create_collection(name="name", embedding_function=google_ef)
collection = client.get_collection(name="name", embedding_function=google_ef)
```

You can view a more [complete example](https://github.com/chroma-core/chroma/tree/main/examples/gemini) of chatting over documents with Gemini embedding and language models.

For more info, please visit the [official Google python docs](https://ai.google.dev/tutorials/python_quickstart).

</TabItem>
<TabItem value="js" label="JavaScript">

This embedding function relies on the `@google/generative-ai` npm package, which you can install with `yarn add @google/generative-ai`.

```javascript
import { ChromaClient, GoogleGenerativeAiEmbeddingFunction } from 'chromadb'
const embedder = new GoogleGenerativeAiEmbeddingFunction({googleApiKey: "<YOUR API KEY>"})

// use directly
const embeddings = await embedder.generate(["document1","document2"])

// pass documents to query for .add and .query
const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
const collectionGet = await client.getCollection({name:"name", embeddingFunction: embedder})
```

You can view a more [complete example using Node](https://github.com/chroma-core/chroma/blob/main/clients/js/examples/node/app.js).

For more info, please visit the [official Google JS docs](https://ai.google.dev/tutorials/node_quickstart).

</TabItem>

</Tabs>
docs/embeddings/hugging-face-embedding-server.md

Lines changed: 67 additions & 0 deletions

@@ -0,0 +1,67 @@

---
---

# Hugging Face Text Embedding Server

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<div class="select-language">Select a language</div>

<Tabs queryString groupId="lang">
<TabItem value="py" label="Python"></TabItem>
<TabItem value="js" label="JavaScript"></TabItem>
</Tabs>

Chroma provides a convenient wrapper for the Hugging Face Text Embedding Server, a standalone server that provides text embeddings via a REST API. You can read more about it [**here**](https://github.com/huggingface/text-embeddings-inference).

## Setting Up The Server

To run the embedding server locally, run the following command from the root of the Chroma repository. The docker compose command will run Chroma and the embedding server together.

```bash
docker compose -f examples/server_side_embeddings/huggingface/docker-compose.yml up -d
```

or

```bash
docker run -p 8001:80 -d --rm --name huggingface-embedding-server ghcr.io/huggingface/text-embeddings-inference:cpu-0.3.0 --model-id BAAI/bge-small-en-v1.5 --revision main
```

:::note
The above docker command will run the server with the `BAAI/bge-small-en-v1.5` model. You can find more information about running the server in docker [**here**](https://github.com/huggingface/text-embeddings-inference#docker).
:::

## Usage

<Tabs queryString groupId="lang" className="hideTabSwitcher">
<TabItem value="py" label="Python">

This embedding function relies on the `requests` python package, which you can install with `pip install requests`.

```python
from chromadb.utils.embedding_functions import HuggingFaceEmbeddingServer
huggingface_ef = HuggingFaceEmbeddingServer(url="http://localhost:8001/embed")
```

The embedding model is configured on the server side. Check the docker-compose file in `examples/server_side_embeddings/huggingface/docker-compose.yml` for an example of how to configure the server.
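Under the hood, the wrapper simply POSTs JSON to the server's `/embed` route. As a minimal sketch of that request shape (the `{"inputs": [...]}` body is an assumption based on the text-embeddings-inference README, and `build_embed_request` is a hypothetical helper, not part of chromadb):

```python
import json

def build_embed_request(url: str, documents: list[str]) -> tuple[str, dict, bytes]:
    """Assemble the pieces of a text-embeddings-inference /embed call.
    The {"inputs": [...]} body shape is an assumption from the TEI docs."""
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"inputs": documents}).encode("utf-8")
    return url, headers, body

url, headers, body = build_embed_request(
    "http://localhost:8001/embed", ["document1", "document2"]
)
```

Any HTTP client (here, the `requests` package) can then send these pieces to the running server and receive one embedding per input document.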
</TabItem>
<TabItem value="js" label="JavaScript">

```javascript
import {HuggingFaceEmbeddingServerFunction} from 'chromadb';
const embedder = new HuggingFaceEmbeddingServerFunction({url: "http://localhost:8001/embed"})

// use directly
const embeddings = embedder.generate(["document1", "document2"])

// pass documents to query for .add and .query
const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
const collectionGet = await client.getCollection({name: "name", embeddingFunction: embedder})
```

</TabItem>
</Tabs>

docs/embeddings/jinaai.md

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
---
2+
---
3+
4+
# Jina AI
5+
6+
import Tabs from '@theme/Tabs';
7+
import TabItem from '@theme/TabItem';
8+
9+
<div class="select-language">Select a language</div>
10+
11+
<Tabs queryString groupId="lang">
12+
<TabItem value="py" label="Python"></TabItem>
13+
<TabItem value="js" label="JavaScript"></TabItem>
14+
</Tabs>
15+
16+
Chroma provides a convenient wrapper around JinaAI's embedding API. This embedding function runs remotely on JinaAI's servers, and requires an API key. You can get an API key by signing up for an account at [JinaAI](https://jina.ai/embeddings/).
17+
18+
<Tabs queryString groupId="lang" className="hideTabSwitcher">
19+
<TabItem value="py" label="Python">
20+
21+
This embedding function relies on the `requests` python package, which you can install with `pip install requests`.
22+
23+
```python
24+
jinaai_ef = embedding_functions.JinaEmbeddingFunction(
25+
api_key="YOUR_API_KEY",
26+
model_name="jina-embeddings-v2-base-en"
27+
)
28+
jinaai_ef(input=["This is my first text to embed", "This is my second document"])
29+
```
30+
31+
You can pass in an optional `model_name` argument, which lets you choose which Jina model to use. By default, Chroma uses `jina-embedding-v2-base-en`.
32+
33+
</TabItem>
34+
<TabItem value="js" label="JavaScript">
35+
36+
```javascript
37+
const {JinaEmbeddingFunction} = require('chromadb');
38+
const embedder = new JinaEmbeddingFunction({
39+
jinaai_api_key: 'jina_****',
40+
model_name: 'jina-embeddings-v2-base-en',
41+
});
42+
43+
// use directly
44+
const embeddings = embedder.generate(['document1', 'document2']);
45+
46+
// pass documents to query for .add and .query
47+
const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
48+
const collectionGet = await client.getCollection({name:"name", embeddingFunction: embedder})
49+
```
50+
</TabItem>
51+
</Tabs>

docs/getting-started.md

Lines changed: 1 addition & 1 deletion
@@ -224,7 +224,7 @@ Find [chromadb on npm](https://www.npmjs.com/package/chromadb).

## 📚 Next steps

- Chroma is designed to be simple enough to get started with quickly and flexible enough to meet many use-cases. You can use your own embedding models, query Chroma with your own embeddings, and filter on metadata. To learn more about Chroma, check out the [Usage Guide](./usage-guide.md) and [API Reference](./api-reference.md).
- - Chroma is integrated in [LangChain](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html?highlight=chroma#langchain.vectorstores.Chroma) (`python` and `js`), making it easy to build AI applications with Chroma. Check out the [integrations](./integrations.md) page to learn more.
+ - Chroma is integrated in [LangChain](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html?highlight=chroma#langchain.vectorstores.Chroma) (`python` and `js`), making it easy to build AI applications with Chroma. Check out the [integrations](./integrations) page to learn more.
- You can [deploy a persistent instance](./deployment) of Chroma to an external server, to make it easier to work on larger projects or with a team.

## Coming Soon

docs/integrations/index.md

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ We welcome pull requests to add new Integrations to the community.

| [🦜️🔗 Langchain](/integrations/langchain) |||
| [🦙 LlamaIndex](/integrations/llama-index) || :soon: |
| [Braintrust](/integrations/braintrust) |||
+ | [🔭 OpenLLMetry](/integrations/openllmetry) || :soon: |

*Coming soon* - integrations with LangSmith, JinaAI, and more.

docs/integrations/openllmetry.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@

---
slug: /integrations/openllmetry
title: 🔭 OpenLLMetry
---

## 🔭 OpenLLMetry

[OpenLLMetry](https://www.traceloop.com/openllmetry) provides observability for systems using Chroma. It allows tracing calls to Chroma, OpenAI, and other services.
It gives visibility to query and index calls as well as LLM prompts and completions.
For more information on how to use OpenLLMetry, see the [OpenLLMetry docs](https://www.traceloop.com/docs/openllmetry).

<img src="/img/openllmetry.png" />

### Example

Install the OpenLLMetry SDK by running:

```bash
pip install traceloop-sdk
```

Then, initialize the SDK in your application:

```python
from traceloop.sdk import Traceloop

Traceloop.init()
```

### Configuration

OpenLLMetry can be configured to send traces to any observability platform that supports OpenTelemetry, such as Datadog, Honeycomb, Dynatrace, and New Relic. See the [OpenLLMetry docs](https://www.traceloop.com/openllmetry/provider/chroma) for more information.

docs/observability.md

Lines changed: 25 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,21 +5,43 @@ title: "👀 Observability"
55

66
# 👀 Observability
77

8+
## Backend Observability
9+
810
Chroma is instrumented with [OpenTelemetry](https://opentelemetry.io/) hooks for observability.
911

1012
:::tip Telemetry vs Observability
1113
"[Telemetry](/telemetry)" refers to anonymous product usage statistics we collect. "Observability" refers to metrics, logging, and tracing which can be used by anyone operating a Chroma deployment. Observability features listed on this page are **never** sent back to Chroma; they are for end-users to better understand how their Chroma deployment is behaving.
1214
:::
1315

14-
## Available Observability
16+
### Available Observability
1517

16-
Chroma currently only exports OpenTelemtry [traces](https://opentelemetry.io/docs/concepts/signals/traces/). Traces allow a Chroma operator to understand how requests flow through the system and quickly identify bottlenecks.
18+
Chroma currently only exports OpenTelemetry [traces](https://opentelemetry.io/docs/concepts/signals/traces/). Traces allow a Chroma operator to understand how requests flow through the system and quickly identify bottlenecks.
1719

18-
## Configuration
20+
### Configuration
1921

2022
Tracing is configured with four environment variables:
2123

2224
- `CHROMA_OTEL_COLLECTION_ENDPOINT`: where to send observability data. Example: `api.honeycomb.com`.
2325
- `CHROMA_OTEL_SERVICE_NAME`: Service name for OTel traces. Default: `chromadb`.
2426
- `CHROMA_OTEL_COLLECTION_HEADERS`: Headers to use when sending observability data. Often used to send API and app keys.
2527
- `CHROMA_OTEL_GRANULARITY`: A value from the [OpenTelemetryGranularity enum](https://github.com/chroma-core/chroma/tree/main/chromadb/telemetry/opentelemetry/__init__.py). Specifies how detailed tracing should be.
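For instance, a deployment shipping traces to a Honeycomb-style backend might export the four variables like this (endpoint, header name, and granularity value are placeholder assumptions; substitute your backend's values and a real value from the granularity enum):

```shell
# Placeholder values: substitute your own backend endpoint, auth header,
# and a granularity value from the OpenTelemetryGranularity enum.
export CHROMA_OTEL_COLLECTION_ENDPOINT="api.honeycomb.com"
export CHROMA_OTEL_SERVICE_NAME="chromadb"
export CHROMA_OTEL_COLLECTION_HEADERS='{"x-honeycomb-team":"YOUR_API_KEY"}'
export CHROMA_OTEL_GRANULARITY="all"
```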
+ ## Local Observability Stack (🐳👀📚)
+
+ Chroma also comes with a local observability stack. The stack is composed of Chroma Server (the one you know and ❤️), the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector), and [Zipkin](https://zipkin.io/).
+
+ To start the stack, run from the root of the repo:
+
+ ```bash
+ docker compose -f examples/observability/docker-compose.local-observability.yml up --build -d
+ ```
+
+ Once the stack is running, you can access Zipkin at http://localhost:9411.
+
+ :::tip Traces
+ Traces in Zipkin will start appearing after you make a request to Chroma.
+ :::
+
+ ## Client (SDK) Observability
+
+ See the [OpenLLMetry Integration](/integrations/openllmetry).

docs/reference/Collection.md

Lines changed: 1 addition & 2 deletions
@@ -51,8 +51,7 @@ Add embeddings to the data store.

- `ValueError` - If you don't provide either embeddings or documents
- `ValueError` - If the length of ids, embeddings, metadatas, or documents don't match
- `ValueError` - If you don't provide an embedding function and don't provide embeddings
- - `ValueError` - If you provide both embeddings and documents
- - `ValueError` - If you provide an id that already exists
+ - `DuplicateIDError` - If you provide an id that already exists

### get