From 775d6eb1a89900a72bce770480b546a3f2c796b2 Mon Sep 17 00:00:00 2001
From: Rowena
Date: Tue, 25 Nov 2025 18:06:26 +0100
Subject: [PATCH 1/4] feat(genapi): how to query reranking models

---
 .../how-to/query-reranking-models.mdx | 101 ++++++++++++++++++
 1 file changed, 101 insertions(+)
 create mode 100644 pages/generative-apis/how-to/query-reranking-models.mdx

diff --git a/pages/generative-apis/how-to/query-reranking-models.mdx b/pages/generative-apis/how-to/query-reranking-models.mdx
new file mode 100644
index 0000000000..8b3599ac26
--- /dev/null
+++ b/pages/generative-apis/how-to/query-reranking-models.mdx
@@ -0,0 +1,101 @@
+---
+title: How to query reranking models
+description: Learn how to interact with powerful reranking models using Scaleway's Generative APIs service.
+tags: generative-apis ai-data reranking-models
+dates:
+  validation: 2025-11-25
+  posted: 2025-11-25
+---
+import Requirements from '@macros/iam/requirements.mdx'
+
+Scaleway's Generative APIs service allows users to interact with powerful reranking models hosted on the platform.
+
+Reranking models are designed to be used in **retrieval** pipelines, e.g. RAG, retrieval of previous conversations, or agents needing to choose the best tool call or search result from multiple results. Reanking models filter all the results retrieved, score their relevance, and keep only those that are the most pertinent to the initial query.
+
+For example: a query to a fast (but imprecise) model may return a list of 100 documents. A specialized reranking model can then evaluate these documents more deeply, score each on how well it matches the query, and return only the 10 most relevant documents to the first model to be used in answering the query.
+
+This approach takes advantage of the strengths of each model: one that is fast but not specialized, which can generate candidates quickly, and another that is slow but specialized, to refine these candidates.
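The two-stage flow described above can be sketched in plain Python. This is a toy illustration only: `fast_retrieve` and `rerank_score` are hypothetical stand-ins for a real retrieval system and a real reranking model.

```python
# Toy sketch of a retrieve-then-rerank pipeline: a fast, imprecise first pass
# produces many candidates, and a more careful scorer keeps only the best few.

def fast_retrieve(query, corpus, limit=100):
    # Cheap first pass: keep any document sharing at least one word with the query.
    terms = set(query.lower().split())
    hits = [d for d in corpus if terms & set(d.lower().split())]
    return hits[:limit]

def rerank_score(query, doc):
    # Stand-in for a reranking model: word-overlap ratio as a relevance score.
    terms = set(query.lower().split())
    words = set(doc.lower().split())
    return len(terms & words) / max(len(words), 1)

def rerank(query, candidates, top_n=10):
    # Deeper second pass: score every candidate and keep the top_n best.
    ranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return ranked[:top_n]

corpus = ["the pacific ocean is vast", "cooking pasta quickly", "ocean size rankings"]
candidates = fast_retrieve("ocean size", corpus)
best = rerank("ocean size", candidates, top_n=2)
print(best)  # ['ocean size rankings', 'the pacific ocean is vast']
```

A real pipeline would replace `rerank_score` with a call to a reranking model; the shape of the flow (many cheap candidates in, a few well-scored documents out) stays the same.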
+It can result in reduced context windows, and therefore improved relevance and faster overall query processing time.
+
+
+In the case of using an embedding model such as `qwen3-embedding` for reranking, note that the [Embedding API](TODO) and [Reranking API](TODO) are functionally equivalent. This is because the generated embedding vectors are normalized - meaning that the reranking score corresponds directly to the **cosine similarity** between vectors, which, for normalized vectors, is identical to the **dot product**.
+
+In practical terms:
+
+- Query vector: `qv = embedding(query)`
+- Document vector: `dv = embedding(document content)`
+- Relevance score: `score = (qv, dv)` (dot product)
+
+Therefore, if you're performing repeated relevance scoring, you can streamline your workflow as follows:
+
+- Use the **Embeddings API** to generate and store document vectors in a vector database (e.g. **pgvector**).
+- At query time, compute the query vector and retrieve the most relevant documents using **dot product or cosine similarity** directly in the vector database, rather than using the Reranking API.
+
+This approach is particularly effective when **reranking results from non-embedding retrieval systems** such as:
+- Full-text search (e.g., using **BM25** in OpenSearch)
+- Graph-based search
+- Hybrid search pipelines combining multiple retrieval strategies
+
+
+There are several ways to interact with reranking models:
+- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), allowing you to test models, adapt parameters, and observe how these changes affect the output in real-time.
+- Via the [Rerank API](https://www.scaleway.com/en/developers/api/generative-apis/TODO)
+
+
+
+- A Scaleway account logged into the [console](https://console.scaleway.com)
+- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
+- A valid [API key](/iam/how-to/create-api-keys/) for API authentication
+- Python 3.7+ installed on your system
+
+## How to query reranking models via the playground
+
+See our [quickstart](/generative-apis/quickstart/#start-with-the-generative-apis-playground) for full details on accessing and testing models in the Scaleway console Generative APIs playground.
+
+## How to query reranking models via API
+
+You can query reranking programmatically using your favorite tools or languages.
+
+In the example that follows, we will use the OpenAI Python client.
+
+### Installing the OpenAI SDK
+
+Install the OpenAI SDK using pip:
+
+```bash
+pip install openai
+```
+
+### Initializing the client
+
+Initialize the OpenAI client with your base URL and API key:
+
+```python
+from openai import OpenAI
+
+# Initialize the client with your base URL and API key
+client = OpenAI(
+    base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
+    api_key="" # Your unique API secret key from Scaleway
+)
+```
+
+### Reranking documents
+
+You can now generate a reranking of a given set of documents as follows:
+
+```bash
+curl https://api.scaleway.ai/v1/rerank \
+  -H "Authorization: Bearer $SCW_SECRET_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "qwen3-embedding-8b",
+    "query": "What is the biggest area of water on earth?",
+    "documents": [
+      "The Pacific is approximately 165 million km²",
+      "Oceans can be sorted by size: Pacific, Atlantic, Indian",
+      "The Atlantic is a very large ocean.",
+      "The deepest pool on earth is 96 000 m²"
+    ],
+    "top_n": 3
+  }'
+```
\ No newline at end of file

From b3055a3be3c6879c7898dab4c5cdfe9953334fe7 Mon Sep 17 00:00:00 2001
From: Rowena
Date: Wed, 26 Nov 2025 13:29:04 +0100
Subject: [PATCH 2/4] fix(genapis): started amendments

---
 .../how-to/query-reranking-models.mdx | 31 +++++++------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/pages/generative-apis/how-to/query-reranking-models.mdx b/pages/generative-apis/how-to/query-reranking-models.mdx
index 8b3599ac26..5a0455247a 100644
--- a/pages/generative-apis/how-to/query-reranking-models.mdx
+++ b/pages/generative-apis/how-to/query-reranking-models.mdx
@@ -10,13 +10,22 @@ import Requirements from '@macros/iam/requirements.mdx'

Scaleway's Generative APIs service allows users to interact with powerful reranking models hosted on the platform.
+
+
+- A Scaleway account logged into the [console](https://console.scaleway.com)
+- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
+- A valid [API key](/iam/how-to/create-api-keys/) for API authentication
+
+## Understanding reranking models
+
Reranking models are designed to be used in **retrieval** pipelines, e.g. RAG, retrieval of previous conversations, or agents needing to choose the best tool call or search result from multiple results. Reanking models filter all the results retrieved, score their relevance, and keep only those that are the most pertinent to the initial query.

For example: a query to a fast (but imprecise) model may return a list of 100 documents. A specialized reranking model can then evaluate these documents more deeply, score each on how well it matches the query, and return only the 10 most relevant documents to the first model to be used in answering the query.

This approach takes advantage of the strengths of each model: one that is fast but not specialized, which can generate candidates quickly, and another that is slow but specialized, to refine these candidates.
It can result in reduced context windows, and therefore improved relevance and faster overall query processing time.
-
+## Using embedding models for reranking
+
In the case of using an embedding model such as `qwen3-embedding` for reranking, note that the [Embedding API](TODO) and [Reranking API](TODO) are functionally equivalent. This is because the generated embedding vectors are normalized - meaning that the reranking score corresponds directly to the **cosine similarity** between vectors, which, for normalized vectors, is identical to the **dot product**.

In practical terms:
@@ -34,26 +43,10 @@ This approach is particularly effective when **reranking results from non-embedd
- Full-text search (e.g., using **BM25** in OpenSearch)
- Graph-based search
- Hybrid search pipelines combining multiple retrieval strategies
-
-
-There are several ways to interact with reranking models:
-- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), allowing you to test models, adapt parameters, and observe how these changes affect the output in real-time.
-- Via the [Rerank API](https://www.scaleway.com/en/developers/api/generative-apis/TODO)
-
-
-
-- A Scaleway account logged into the [console](https://console.scaleway.com)
-- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
-- A valid [API key](/iam/how-to/create-api-keys/) for API authentication
-- Python 3.7+ installed on your system
-
-## How to query reranking models via the playground
-
-See our [quickstart](/generative-apis/quickstart/#start-with-the-generative-apis-playground) for full details on accessing and testing models in the Scaleway console Generative APIs playground.
-## How to query reranking models via API
+## How to query reranking models via the API

-You can query reranking programmatically using your favorite tools or languages.
+You can query reranking models programmatically using your favorite tools or languages.

In the example that follows, we will use the OpenAI Python client.

From cfe3e2eac449853bd6da2f3c1edc777c9477c6e9 Mon Sep 17 00:00:00 2001
From: Rowena
Date: Wed, 26 Nov 2025 14:32:13 +0100
Subject: [PATCH 3/4] feat(genapis): finished rerank stuff

---
 .../how-to/query-reranking-models.mdx | 102 ++++++++++--------
 pages/generative-apis/menu.ts | 4 +
 2 files changed, 63 insertions(+), 43 deletions(-)

diff --git a/pages/generative-apis/how-to/query-reranking-models.mdx b/pages/generative-apis/how-to/query-reranking-models.mdx
index 5a0455247a..2ba15c0834 100644
--- a/pages/generative-apis/how-to/query-reranking-models.mdx
+++ b/pages/generative-apis/how-to/query-reranking-models.mdx
@@ -26,7 +26,7 @@ This approach takes advantage of the strengths of each model: one that is fast b

## Using embedding models for reranking

-In the case of using an embedding model such as `qwen3-embedding` for reranking, note that the [Embedding API](TODO) and [Reranking API](TODO) are functionally equivalent. This is because the generated embedding vectors are normalized - meaning that the reranking score corresponds directly to the **cosine similarity** between vectors, which, for normalized vectors, is identical to the **dot product**.
+In the case of using an embedding model such as `qwen3-embedding` for reranking, note that the [Embedding API](https://www.scaleway.com/en/developers/api/generative-apis/#path-embeddings-create-an-embedding) and [Reranking API](https://www.scaleway.com/en/developers/api/generative-apis/#path-rerank-create-a-reranking) are functionally equivalent.
This is because the generated embedding vectors are normalized - meaning that the reranking score corresponds directly to the **cosine similarity** between vectors, which, for normalized vectors, is identical to the **dot product**.

In practical terms:

- Query vector: `qv = embedding(query)`
- Document vector: `dv = embedding(document content)`
- Relevance score: `score = (qv, dv)` (dot product)

Therefore, if you're performing repeated relevance scoring, you can streamline your workflow as follows:

- Use the **Embeddings API** to generate and store document vectors in a vector database (e.g. **pgvector**).
-- At query time, compute the query vector and retrieve the most relevant documents using **dot product or cosine similarity** directly in the vector database, rather than using the Reranking API.
+- At query time, compute the query vector and retrieve the most relevant documents using **dot product or cosine similarity** directly in the vector database, rather than using the Rerank API.

This approach is particularly effective when **reranking results from non-embedding retrieval systems** such as:
- Full-text search (e.g., using **BM25** in OpenSearch)
- Graph-based search
- Hybrid search pipelines combining multiple retrieval strategies

## How to query reranking models via the API

-You can query reranking models programmatically using your favorite tools or languages.
+The example below sends a cURL request to the Rerank API to generate a reranking of a given set of documents.

-In the example that follows, we will use the OpenAI Python client.
+Ensure you have saved your [API key](/iam/how-to/create-api-keys/) in a `$SCW_SECRET_KEY` environment variable.
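The equivalence above is easy to check numerically: once vectors are scaled to unit length, the dot product and the cosine similarity are the same number. A minimal sketch (the vectors below are made up for illustration; in practice they would come from the Embeddings API):

```python
import math

def norm(v):
    # Euclidean length of a vector.
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    # Scale a vector to unit length.
    n = norm(v)
    return [x / n for x in v]

def dot(a, b):
    # Dot product of two vectors.
    return sum(x * y for x, y in zip(a, b))

# Toy vectors standing in for normalized embedding outputs.
qv = normalize([0.2, 0.5, 0.8])
dv = normalize([0.6, 0.1, 0.7])

score_dot = dot(qv, dv)
score_cosine = dot(qv, dv) / (norm(qv) * norm(dv))

# For unit vectors the two relevance scores coincide.
print(abs(score_dot - score_cosine) < 1e-12)  # True
```

This is why storing normalized document vectors in a database such as pgvector and ranking by dot product gives the same ordering as cosine similarity.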
-### Installing the OpenAI SDK
-
-Install the OpenAI SDK using pip:
-
-```bash
-pip install openai
-```
-
-### Initializing the client
-
-Initialize the OpenAI client with your base URL and API key:
-
-```python
-from openai import OpenAI
-
-# Initialize the client with your base URL and API key
-client = OpenAI(
-    base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
-    api_key="" # Your unique API secret key from Scaleway
-)
-```
-
-### Reranking documents
-
-You can now generate a reranking of a given set of documents as follows:
+
+As the Rerank API is not an OpenAI API, it cannot be used with the Python `openai` client.
+

```bash
curl https://api.scaleway.ai/v1/rerank \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-embedding-8b",
    "query": "What is the biggest area of water on earth?",
    "documents": [
      "The Pacific is approximately 165 million km²",
      "Oceans can be sorted by size: Pacific, Atlantic, Indian",
      "The Atlantic is a very large ocean.",
      "The deepest pool on earth is 96 000 m²"
    ],
    "top_n": 3
  }'
```
+
+The response should be similar to the following:
+
+```json
{
  "id": "rerank-a89e6d7b8b97492ea81569c65fbfff49",
  "model": "qwen3-embedding-8b",
  "usage": {
    "total_tokens": 99
  },
  "results": [
    {
      "index": 1,
      "document": {
        "text": "Oceans can be sorted by size: Pacific, Atlantic, Indian",
        "multi_modal": null
      },
      "relevance_score": 0.6456239223480225
    },
    {
      "index": 2,
      "document": {
        "text": "The Atlantic is a very large ocean.",
        "multi_modal": null
      },
      "relevance_score": 0.6264235377311707
    },
    {
      "index": 0,
      "document": {
        "text": "The Pacific is approximately 165 million km²",
        "multi_modal": null
      },
      "relevance_score": 0.6059925556182861
    }
  ]
}
```
\ No newline at end of file

diff --git a/pages/generative-apis/menu.ts b/pages/generative-apis/menu.ts
index a73432efa1..35d0c7f384 100644
--- a/pages/generative-apis/menu.ts
+++ b/pages/generative-apis/menu.ts
@@ -41,6 +41,10 @@ export const generativeApisMenu = {
     {
       label: 'Query audio models',
       slug: 'query-audio-models'
+    },
+    {
+      label: 'Query reranking models',
+      slug: 'query-reranking-models'
     },
     {
       label: 'Use structured outputs',

From 68e59e1cf0c17ff08a37bef4597d501a5d2bf443 Mon Sep 17 00:00:00 2001
From: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
Date: Wed, 26 Nov 2025 17:58:31 +0100
Subject: [PATCH 4/4] Apply suggestions from code review

Co-authored-by: Benedikt Rollik

---
 pages/generative-apis/how-to/query-reranking-models.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/generative-apis/how-to/query-reranking-models.mdx b/pages/generative-apis/how-to/query-reranking-models.mdx
index 2ba15c0834..7fd4716eeb 100644
--- a/pages/generative-apis/how-to/query-reranking-models.mdx
+++ b/pages/generative-apis/how-to/query-reranking-models.mdx
@@ -18,7 +18,7 @@ Scaleway's Generative APIs service allows users to interact with powerful rerank

## Understanding reranking models

-Reranking models are designed to be used in **retrieval** pipelines, e.g. RAG, retrieval of previous conversations, or agents needing to choose the best tool call or search result from multiple results. Reanking models filter all the results retrieved, score their relevance, and keep only those that are the most pertinent to the initial query.
+Reranking models are designed to be used in **retrieval** pipelines, e.g. RAG, retrieval of previous conversations, or agents needing to choose the best tool call or search result from multiple results. Reranking models filter all the results retrieved, score their relevance, and keep only those that are the most pertinent to the initial query.

For example: a query to a fast (but imprecise) model may return a list of 100 documents. A specialized reranking model can then evaluate these documents more deeply, score each on how well it matches the query, and return only the 10 most relevant documents to the first model to be used in answering the query.
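Since the Rerank API is plain HTTP, the cURL example above can also be issued from Python with only the standard library. This is a sketch under the same assumptions as the cURL example: the endpoint, model name, and payload are taken from it, and the API key is read from the `SCW_SECRET_KEY` environment variable.

```python
import json
import os
import urllib.request

# Same payload as the cURL example: rerank four documents against a query.
payload = {
    "model": "qwen3-embedding-8b",
    "query": "What is the biggest area of water on earth?",
    "documents": [
        "The Pacific is approximately 165 million km²",
        "Oceans can be sorted by size: Pacific, Atlantic, Indian",
        "The Atlantic is a very large ocean.",
        "The deepest pool on earth is 96 000 m²",
    ],
    "top_n": 3,
}

req = urllib.request.Request(
    "https://api.scaleway.ai/v1/rerank",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('SCW_SECRET_KEY', '')}",
        "Content-Type": "application/json",
    },
)

# Only send the request when an API key is actually configured.
if os.environ.get("SCW_SECRET_KEY"):
    with urllib.request.urlopen(req) as resp:
        for result in json.load(resp)["results"]:
            # Results come back sorted by relevance_score, highest first.
            print(result["relevance_score"], result["document"]["text"])
```

The `results`, `relevance_score`, and `document.text` fields match the response shape shown in the JSON example above.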