This repository was archived by the owner on May 10, 2024. It is now read-only.

Commit c00b432

Merge branch 'main' of https://github.com/JoanFM/docs into docs-add-jina-embedding-func
2 parents 5c07db8 + fe6b52d commit c00b432

31 files changed: +774 -259 lines changed

docs/about.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 15
 ---

 # 👽 About

docs/api-reference.md

Lines changed: 7 additions & 2 deletions
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 6
 title: "📖 API Cheatsheet"
 ---

@@ -157,7 +157,12 @@ Run `chroma run --path /db_path` to run the Chroma backend as a standalone serve
 ## Initialize client - JS

 ```javascript
-import { ChromaClient } from "chromadb";
+// CJS
+const { ChromaClient } = require("chromadb");
+
+// ESM
+import { ChromaClient } from 'chromadb'
+
 const client = new ChromaClient();
 ```

docs/api/index.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+---
+slug: /api
+title: 🔧 API
+---
+
+# 🔧 API
+
+## Client APIs
+
+Chroma currently maintains 1st party clients for Python and Javscript. For other clients in other languages, use their repos for documentation.
+
+`Client` - is the object that wraps a connection to a backing Chroma DB
+
+`Collection` - is the object that wraps a collectiom
+
+
+<div class="special_table"></div>
+
+| | Client | Collection |
+|--------------|-----------|---------------|
+| Python | [Client](/reference/Client) | [Collection](/reference/Collection) |
+| Javascript | [Client](/js_reference/Client) | [Collection](/reference/Collection) |
+
+***
+
+## Backend API
+
+Chroma's backend Swagger REST API docs are viewable by running Chroma and navigating to `http://localhost:8000/docs`.
+
+```
+pip install chromadb
+chroma run
+open http://localhost:8000/docs
+```
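
Reviewer note (not part of the commit): the new Backend API section says the Swagger docs are served at `http://localhost:8000/docs` once `chroma run` is up. A minimal sketch of how one might check that from Python, using only the standard library. The helper name `docs_reachable` is hypothetical, and the default host and port are assumed from the URL quoted in the diff.

```python
import urllib.error
import urllib.request


def docs_reachable(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/docs answers with HTTP 200."""
    url = base_url.rstrip("/") + "/docs"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Swagger docs reachable:", docs_reachable())
```

The function returns `False` rather than raising when no server is listening, so it can be used as a simple readiness check before opening the page.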

docs/contributing.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 10
+sidebar_position: 14
 title: "🍻 Contributing"
 ---

docs/deployment.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 8
+sidebar_position: 9
 title: "☁️ Deployment"
 ---

docs/embeddings.md

Lines changed: 20 additions & 231 deletions
@@ -4,53 +4,41 @@ sidebar_position: 4

 # 🧬 Embeddings

-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
-
-<div class="select-language">Select a language</div>
-
-<Tabs queryString groupId="lang">
-<TabItem value="py" label="Python"></TabItem>
-<TabItem value="js" label="JavaScript"></TabItem>
-</Tabs>
-
-***
-
 Embeddings are the A.I-native way to represent any kind of data, making them the perfect fit for working with all kinds of A.I-powered tools and algorithms. They can represent text, images, and soon audio and video. There are many options for creating embeddings, whether locally using an installed library, or by calling an API.

 Chroma provides lightweight wrappers around popular embedding providers, making it easy to use them in your apps. You can set an embedding function when you create a Chroma collection, which will be used automatically, or you can call them directly yourself.

-<Tabs queryString groupId="lang" className="hideTabSwitcher">
-<TabItem value="py" label="Python">
+<div class="special_table"></div>

-To get Chroma's embedding functions, import the `chromadb.utils.embedding_functions` module.
+| | Python | JS |
+|--------------|-----------|---------------|
+| [OpenAI](/embeddings/openai) |||
+| [Cohere](/embeddings/cohere) |||
+| [Google PaLM](/embeddings/google-palm) |||
+| [Hugging Face](/embeddings/hugging-face) |||
+| [Instructor](/embeddings/instructor) |||

-```python
-from chromadb.utils import embedding_functions
-```
+We welcome pull requests to add new Embedding Functions to the community.

+***

 ## Default: all-MiniLM-L6-v2

 By default, Chroma uses the [Sentence Transformers](https://www.sbert.net/) `all-MiniLM-L6-v2` model to create embeddings. This embedding model can create sentence and document embeddings that can be used for a wide variety of tasks. This embedding function runs locally on your machine, and may require you download the model files (this will happen automatically).

 ```python
+from chromadb.utils import embedding_functions
 default_ef = embedding_functions.DefaultEmbeddingFunction()
 ```

-:::tip
+:::note
 Embedding functions can linked to a collection, which are used whenever you call `add`, `update`, `upsert` or `query`. You can also be use them directly which can be handy for debugging.
 ```py
 val = default_ef(["foo"])
 ```
 -> [[0.05035809800028801, 0.0626462921500206, -0.061827320605516434...]]
 :::

-</TabItem>
-
-
-<TabItem value="js" label="JavaScript">
-

 <!--
 ## Transformers.js
@@ -83,9 +71,6 @@ const embedder = new TransformersEmbeddingFunction();

 ``` -->

-</TabItem>
-</Tabs>
-
 <Tabs queryString groupId="lang" className="hideTabSwitcher">
 <TabItem value="py" label="Python">

@@ -105,216 +90,21 @@ You can pass in an optional `model_name` argument, which lets you choose which S
 </Tabs>


-## OpenAI
-
-Chroma provides a convenient wrapper around OpenAI's embedding API. This embedding function runs remotely on OpenAI's servers, and requires an API key. You can get an API key by signing up for an account at [OpenAI](https://openai.com/api/).
-
-<Tabs queryString groupId="lang" className="hideTabSwitcher">
-<TabItem value="py" label="Python">
-
-This embedding function relies on the `openai` python package, which you can install with `pip install openai`.
-
-```python
-openai_ef = embedding_functions.OpenAIEmbeddingFunction(
-api_key="YOUR_API_KEY",
-model_name="text-embedding-ada-002"
-)
-```
-
-To use the OpenAI embedding models on other platforms such as Azure, you can use the `api_base` and `api_type` parameters:
-```python
-openai_ef = embedding_functions.OpenAIEmbeddingFunction(
-api_key="YOUR_API_KEY",
-api_base="YOUR_API_BASE_PATH",
-api_type="azure",
-api_version="YOUR_API_VERSION",
-model_name="text-embedding-ada-002"
-)
-```
-
-</TabItem>
-<TabItem value="js" label="JavaScript">
-
-```javascript
-const {OpenAIEmbeddingFunction} = require('chromadb');
-const embedder = new OpenAIEmbeddingFunction({openai_api_key: "apiKey"})
-
-// use directly
-const embeddings = embedder.generate(["document1","document2"])
-
-// pass documents to query for .add and .query
-const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
-const collection = await client.getCollection({name: "name", embeddingFunction: embedder})
-```
-
-</TabItem>
-
-</Tabs>
-
-
-You can pass in an optional `model_name` argument, which lets you choose which OpenAI embeddings model to use. By default, Chroma uses `text-embedding-ada-002`. You can see a list of all available models [here](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).
-
-## Cohere
-
-Chroma also provides a convenient wrapper around Cohere's embedding API. This embedding function runs remotely on Cohere’s servers, and requires an API key. You can get an API key by signing up for an account at [Cohere](https://dashboard.cohere.ai/welcome/register).
-
-<Tabs queryString groupId="lang" className="hideTabSwitcher">
-<TabItem value="py" label="Python">
-
-This embedding function relies on the `cohere` python package, which you can install with `pip install cohere`.
-
-```python
-cohere_ef = embedding_functions.CohereEmbeddingFunction(api_key="YOUR_API_KEY", model_name="large")
-cohere_ef(texts=["document1","document2"])
-```
-
-</TabItem>
-<TabItem value="js" label="JavaScript">
-
-```javascript
-const {CohereEmbeddingFunction} = require('chromadb');
-const embedder = new CohereEmbeddingFunction("apiKey")
-
-// use directly
-const embeddings = embedder.generate(["document1","document2"])
-
-// pass documents to query for .add and .query
-const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
-const collectionGet = await client.getCollection({name:"name", embeddingFunction: embedder})
-```
-
-</TabItem>
-
-</Tabs>
-
-
-
-You can pass in an optional `model_name` argument, which lets you choose which Cohere embeddings model to use. By default, Chroma uses `large` model. You can see the available models under `Get embeddings` section [here](https://docs.cohere.ai/reference/embed).
-
-### Multilingual model example
-
-<Tabs queryString groupId="lang" className="hideTabSwitcher">
-<TabItem value="py" label="Python">
-
-```python
-cohere_ef = embedding_functions.CohereEmbeddingFunction(
-api_key="YOUR_API_KEY",
-model_name="multilingual-22-12")
-
-multilingual_texts = [ 'Hello from Cohere!', 'مرحبًا من كوهير!',
-'Hallo von Cohere!', 'Bonjour de Cohere!',
-'¡Hola desde Cohere!', 'Olá do Cohere!',
-'Ciao da Cohere!', '您好,来自 Cohere!',
-'कोहेरे से नमस्ते!' ]
-
-cohere_ef(texts=multilingual_texts)
-
-```
-
-</TabItem>
-<TabItem value="js" label="JavaScript">
-
-```javascript
-const {CohereEmbeddingFunction} = require('chromadb');
-const embedder = new CohereEmbeddingFunction("apiKey")
-
-multilingual_texts = [ 'Hello from Cohere!', 'مرحبًا من كوهير!',
-'Hallo von Cohere!', 'Bonjour de Cohere!',
-'¡Hola desde Cohere!', 'Olá do Cohere!',
-'Ciao da Cohere!', '您好,来自 Cohere!',
-'कोहेरे से नमस्ते!' ]
-
-const embeddings = embedder.generate(multilingual_texts)
-
-
-```
-
-
-</TabItem>
-
-</Tabs>
-
-
-
-For more information on multilingual model you can read [here](https://docs.cohere.ai/docs/multilingual-language-models).
-
-## Instructor models
-
-The [instructor-embeddings](https://github.com/HKUNLP/instructor-embedding) library is another option, especially when running on a machine with a cuda-capable GPU. They are a good local alternative to OpenAI (see the [Massive Text Embedding Benchmark](https://huggingface.co/blog/mteb) rankings). The embedding function requires the InstructorEmbedding package. To install it, run ```pip install InstructorEmbedding```.
-
-There are three models available. The default is `hkunlp/instructor-base`, and for better performance you can use `hkunlp/instructor-large` or `hkunlp/instructor-xl`. You can also specify whether to use `cpu` (default) or `cuda`. For example:
-
-```python
-#uses base model and cpu
-ef = embedding_functions.InstructorEmbeddingFunction()
-```
-or
-```python
-ef = embedding_functions.InstructorEmbeddingFunction(
-model_name="hkunlp/instructor-xl", device="cuda")
-```
-Keep in mind that the large and xl models are 1.5GB and 5GB respectively, and are best suited to running on a GPU.
-
-## Google PaLM API models
-
-[Google PaLM APIs](https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html) are currently in private preview, but if you are part of this preview, you can use them with Chroma via the `GooglePalmEmbeddingFunction`.
-
-To use the PaLM embedding API, you must have `google.generativeai` Python package installed and have the API key. To use:
-
-```python
-palm_embedding = embedding_functions.GooglePalmEmbeddingFunction(
-api_key=api_key, model=model_name)
-
-```
-
-## HuggingFace
-
-Chroma also provides a convenient wrapper around HuggingFace's embedding API. This embedding function runs remotely on HuggingFace's servers, and requires an API key. You can get an API key by signing up for an account at [HuggingFace](https://huggingface.co/).
-
-<Tabs queryString groupId="lang" className="hideTabSwitcher">
-<TabItem value="py" label="Python">
-
-This embedding function relies on the `requests` python package, which you can install with `pip install requests`.
-
-```python
-huggingface_ef = embedding_functions.HuggingFaceEmbeddingFunction(
-api_key="YOUR_API_KEY",
-model_name="sentence-transformers/all-MiniLM-L6-v2"
-)
-```
-
-You can pass in an optional `model_name` argument, which lets you choose which HuggingFace model to use. By default, Chroma uses `sentence-transformers/all-MiniLM-L6-v2`. You can see a list of all available models [here](https://huggingface.co/models).
-
-</TabItem>
-<TabItem value="js" label="JavaScript">
-</TabItem>
-</Tabs>
-
-## Jina AI
-
-<Tabs queryString groupId="lang" className="hideTabSwitcher">
-<TabItem value="py" label="Python">
+***

-Chroma provides a convenient wrapper around JinaAI's embedding API. This embedding function runs remotely on JinaAI's servers, and requires an API key. You can get an API key by signing up for an account at [JinaAI](https://jina.ai/embeddings/).

-This embedding function relies on the `requests` python package, which you can install with `pip install requests`.
+## Custom Embedding Functions

-```python
-jinaai_ef = embedding_functions.JinaEmbeddingFunction(
-api_key="YOUR_API_KEY",
-model_name="jina-embeddings-v2-base-en"
-)
-jinaai_ef(input=["This is my first text to embed", "This is my second document"])
-```
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';

-You can pass in an optional `model_name` argument, which lets you choose which Jina model to use. By default, Chroma uses `jina-embedding-v2-base-en`.
+<div class="select-language">Select a language</div>

-</TabItem>
-<TabItem value="js" label="JavaScript">
-</TabItem>
+<Tabs queryString groupId="lang">
+<TabItem value="py" label="Python"></TabItem>
+<TabItem value="js" label="JavaScript"></TabItem>
 </Tabs>

-## Custom Embedding Functions
-
 <Tabs queryString groupId="lang" className="hideTabSwitcher">
 <TabItem value="py" label="Python">

@@ -356,4 +146,3 @@ class MyEmbeddingFunction {

 </Tabs>

-We welcome pull requests to add new Embedding Functions to the community.
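
Reviewer note (not part of the commit): the body of the "Custom Embedding Functions" section is elided from this diff, so the following is only a rough, standard-library sketch of the shape such a function takes: a callable that accepts a list of documents and returns one vector per document. The class name `HashEmbeddingFunction`, the local `Documents` and `Embeddings` aliases, and the SHA-256 scheme are all hypothetical. The `input` keyword mirrors the `jinaai_ef(input=[...])` call removed above, while the Cohere snippet uses `texts=`, so the keyword name is an assumption too. The vectors carry no semantic meaning.

```python
import hashlib
from typing import List

# Local stand-ins for the document and embedding types (assumption: plain
# lists of strings and lists of float vectors).
Documents = List[str]
Embeddings = List[List[float]]


class HashEmbeddingFunction:
    """Toy embedding function: maps each document to a fixed-size vector."""

    def __init__(self, dim: int = 8) -> None:
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32 (SHA-256 yields 32 bytes)")
        self.dim = dim

    def __call__(self, input: Documents) -> Embeddings:
        vectors: Embeddings = []
        for doc in input:
            digest = hashlib.sha256(doc.encode("utf-8")).digest()
            # Scale each byte from [0, 255] to [-1.0, 1.0].
            vectors.append([b / 127.5 - 1.0 for b in digest[: self.dim]])
        return vectors


ef = HashEmbeddingFunction(dim=4)
vectors = ef(["foo", "bar"])
print(len(vectors), len(vectors[0]))  # prints "2 4"
```

Because the output is deterministic, a stub like this can be handy in tests where downloading a real model is undesirable. A production function would delegate to an actual embedding model instead.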
