feat(ai): add how to query rerank models #5873
Open: RoRoJ wants to merge 4 commits into `main` from `MTA-6721`
110 changes: 110 additions & 0 deletions
pages/generative-apis/how-to/query-reranking-models.mdx
---
title: How to query reranking models
description: Learn how to interact with powerful reranking models using Scaleway's Generative APIs service.
tags: generative-apis ai-data reranking-models
dates:
  validation: 2025-11-25
  posted: 2025-11-25
---
import Requirements from '@macros/iam/requirements.mdx'

Scaleway's Generative APIs service allows users to interact with powerful reranking models hosted on the platform.

<Requirements />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- A valid [API key](/iam/how-to/create-api-keys/) for API authentication

## Understanding reranking models

Reranking models are designed for **retrieval** pipelines. Examples include RAG, retrieval of previous conversations, and agents that need to choose the best tool call or search result from multiple candidates. A reranking model scores every retrieved result for relevance to the initial query and keeps only the most pertinent ones.

For example, a query to a fast but imprecise model may return a list of 100 documents. A specialized reranking model can then evaluate these documents more deeply and score how well each one matches the query. It then returns only the 10 most relevant documents to the first model, to be used in answering the query.

This approach takes advantage of the strengths of each model. A fast, general-purpose model generates candidates quickly, and a slower, specialized model refines them. This can shrink the context that must be processed, which improves relevance and reduces overall query processing time.
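The two-stage idea can be sketched in a few lines of Python. This is a minimal illustration, not Scaleway's implementation. `retrieve_then_rerank` and `toy_overlap_score` are hypothetical names. The word-overlap scorer only stands in for a real reranking model, such as one called through the Rerank API.

```python
from __future__ import annotations

from typing import Callable, Sequence


def retrieve_then_rerank(
    query: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
    top_n: int,
) -> list[tuple[int, str, float]]:
    """Score every first-stage candidate against the query and keep the top_n.

    `score` stands in for a slower, more precise reranking model.
    Returns (original_index, document, score) tuples, best first.
    """
    scored = [(i, doc, score(query, doc)) for i, doc in enumerate(candidates)]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_n]


def toy_overlap_score(query: str, doc: str) -> float:
    """Hypothetical scorer: fraction of query words that also appear in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


# Candidates as a fast first-stage retriever might return them.
first_stage = [
    "The Pacific is the largest ocean",
    "Pools can be deep",
    "The Atlantic is a large ocean",
]
for idx, doc, s in retrieve_then_rerank("largest ocean", first_stage, toy_overlap_score, top_n=2):
    print(idx, round(s, 2), doc)
```

Keeping the scorer as a parameter separates the pipeline shape (generate candidates, score, cut to `top_n`) from whichever reranking model does the scoring.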
## Using embedding models for reranking

When using an embedding model such as `qwen3-embedding` for reranking, note that the [Embedding API](https://www.scaleway.com/en/developers/api/generative-apis/#path-embeddings-create-an-embedding) and [Reranking API](https://www.scaleway.com/en/developers/api/generative-apis/#path-rerank-create-a-reranking) are functionally equivalent. This is because the generated embedding vectors are normalized. The reranking score therefore corresponds directly to the **cosine similarity** between the query and document vectors, which, for normalized vectors, is identical to their **dot product**.

In practical terms:

- Query vector: `qv = embedding(query)`
- Document vector: `dv = embedding(document content)`
- Relevance score: `score = dot(qv, dv)` (dot product)
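To check the equivalence numerically, here is a small Python sketch. It uses toy three-dimensional vectors in place of real `embedding(...)` outputs, which is an assumption for illustration only. Real embedding vectors are much longer and would come from the Embeddings API. After normalization, the dot product and cosine similarity agree.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the two vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for qv = embedding(query) and dv = embedding(document content).
qv = normalize([3.0, 4.0, 0.0])
dv = normalize([1.0, 2.0, 2.0])

# For unit-length vectors, both scores match (up to floating-point rounding).
print(dot(qv, dv), cosine_similarity(qv, dv))
```

Without normalization, the raw dot product of these two vectors would be 11 rather than 11/15. This is why the equivalence depends on the embedding vectors being normalized.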
Therefore, if you're performing repeated relevance scoring, you can streamline your workflow as follows:
- Use the **Embeddings API** to generate and store document vectors in a vector database (e.g. **pgvector**).
- At query time, compute the query vector and retrieve the most relevant documents using **dot product or cosine similarity** directly in the vector database, rather than using the Rerank API.
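The workflow above can be sketched with a tiny in-memory store. `InMemoryVectorStore` is a hypothetical stand-in for a real vector database such as pgvector, and it does not use pgvector's API. The toy vectors stand in for vectors returned by the Embeddings API. Vectors are normalized once at insertion, so the dot product computed at query time equals cosine similarity.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


class InMemoryVectorStore:
    """Hypothetical, minimal stand-in for a vector database such as pgvector."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    @staticmethod
    def _normalize(v: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in v]

    def add(self, doc_id: str, vector: Sequence[float]) -> None:
        # Indexing time: store unit-length document vectors once.
        self._vectors[doc_id] = self._normalize(vector)

    def search(self, query_vector: Sequence[float], k: int) -> List[Tuple[str, float]]:
        # Query time: with unit-length vectors, the dot product equals cosine similarity.
        q = self._normalize(query_vector)
        scores = (
            (doc_id, sum(a * b for a, b in zip(q, v)))
            for doc_id, v in self._vectors.items()
        )
        # Keep only the k highest-scoring documents.
        return heapq.nlargest(k, scores, key=lambda item: item[1])


store = InMemoryVectorStore()
store.add("pacific", [0.9, 0.1, 0.0])
store.add("atlantic", [0.7, 0.3, 0.1])
store.add("pool", [0.0, 0.2, 0.9])
print(store.search([1.0, 0.0, 0.0], k=2))
```

In a real deployment, the similarity computation and top-k selection would run inside the vector database rather than in application code. This sketch only shows the logic.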
This approach is particularly effective when **reranking results from non-embedding retrieval systems** such as:

- Full-text search (e.g., using **BM25** in OpenSearch)
- Graph-based search
- Hybrid search pipelines combining multiple retrieval strategies
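As a rough illustration of reranking candidates from a non-embedding retriever, the sketch below takes a ranked list of document IDs and reorders it by dot product with stored embedding vectors. The list is a hypothetical BM25 result. `rerank_by_embedding`, the document IDs, and the two-dimensional vectors are all assumptions for illustration. The stored vectors are assumed to be unit-length, as the Embeddings API output is described above.

```python
from typing import Dict, List, Sequence, Tuple


def rerank_by_embedding(
    query_vector: Sequence[float],
    candidate_ids: List[str],
    doc_vectors: Dict[str, Sequence[float]],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Reorder candidates from another retriever by embedding similarity.

    Assumes every vector is already unit-length, so the dot product
    equals cosine similarity. Raises KeyError if a candidate has no stored vector.
    """
    scored = [
        (doc_id, sum(q * d for q, d in zip(query_vector, doc_vectors[doc_id])))
        for doc_id in candidate_ids
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical BM25 output, best keyword match first.
bm25_candidates = ["doc_a", "doc_b", "doc_c"]
# Toy unit-length vectors standing in for stored Embeddings API output.
stored_vectors = {
    "doc_a": [0.0, 1.0],
    "doc_b": [1.0, 0.0],
    "doc_c": [0.6, 0.8],
}
print(rerank_by_embedding([1.0, 0.0], bm25_candidates, stored_vectors, top_n=2))
```

The keyword order (`doc_a` first) is replaced by the embedding order. Only documents the first stage already retrieved are scored.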
## How to query reranking models via the API

The example below uses cURL to send a request to the Rerank API, which reranks a given set of documents against a query.

Ensure you have saved your [API key](/iam/how-to/create-api-keys/) in the `SCW_SECRET_KEY` environment variable.

<Message type="note">
  As the Rerank API is not part of the OpenAI API, it cannot be used with the Python `openai` client.
</Message>
```bash
curl https://api.scaleway.ai/v1/rerank \
  -H "Authorization: Bearer $SCW_SECRET_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-embedding-8b",
    "query": "What is the biggest area of water on earth ?",
    "documents": [
      "The Pacific is approximately 165 million km²",
      "Oceans can be sorted by size: Pacific, Atlantic, Indian",
      "The Atlantic is a very large ocean.",
      "The deepest pool on earth is 96 000 m²"
    ],
    "top_n": 3
  }'
```
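Because the `openai` client cannot be used here, one option in Python is to send the same HTTP request with the standard library. The sketch below assumes only what the cURL example shows: the endpoint, the headers, and the JSON body. `build_rerank_request` and `rerank` are hypothetical helper names, and the 30-second timeout is an arbitrary choice. Error handling is left out.

```python
import json
import os
import urllib.request
from typing import Any, Dict, List

RERANK_URL = "https://api.scaleway.ai/v1/rerank"


def build_rerank_request(
    api_key: str, model: str, query: str, documents: List[str], top_n: int
) -> urllib.request.Request:
    """Build the same POST request as the cURL example, without sending it."""
    payload = {"model": model, "query": query, "documents": documents, "top_n": top_n}
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rerank(
    query: str, documents: List[str], top_n: int = 3, model: str = "qwen3-embedding-8b"
) -> Dict[str, Any]:
    """Send the request with the key from SCW_SECRET_KEY and return the parsed JSON body."""
    req = build_rerank_request(os.environ["SCW_SECRET_KEY"], model, query, documents, top_n)
    # Arbitrary timeout (seconds) so the call cannot hang indefinitely.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, calling `rerank` with the query and the four documents from the cURL command sends the same body. It returns the response as a Python dictionary.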
The response should be similar to the following:

```json
{
  "id": "rerank-a89e6d7b8b97492ea81569c65fbfff49",
  "model": "qwen3-embedding-8b",
  "usage": {
    "total_tokens": 99
  },
  "results": [
    {
      "index": 1,
      "document": {
        "text": "Oceans can be sorted by size: Pacific, Atlantic, Indian",
        "multi_modal": null
      },
      "relevance_score": 0.6456239223480225
    },
    {
      "index": 2,
      "document": {
        "text": "The Atlantic is a very large ocean.",
        "multi_modal": null
      },
      "relevance_score": 0.6264235377311707
    },
    {
      "index": 0,
      "document": {
        "text": "The Pacific is approximately 165 million km²",
        "multi_modal": null
      },
      "relevance_score": 0.6059925556182861
    }
  ]
}
```
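In the example above, each result's `index` matches the position of the document in the request's `documents` array. The Python sketch below uses that field to map results back to the submitted documents, working from an abridged copy of the example response. `ranked_documents` is a hypothetical helper. It sorts by `relevance_score` explicitly instead of assuming the results always arrive in order.

```python
from typing import Any, Dict, List, Tuple

# Documents in the order they were sent in the cURL request above.
documents = [
    "The Pacific is approximately 165 million km²",
    "Oceans can be sorted by size: Pacific, Atlantic, Indian",
    "The Atlantic is a very large ocean.",
    "The deepest pool on earth is 96 000 m²",
]

# Abridged copy of the example response: only the fields used below.
response: Dict[str, Any] = {
    "results": [
        {"index": 1, "relevance_score": 0.6456239223480225},
        {"index": 2, "relevance_score": 0.6264235377311707},
        {"index": 0, "relevance_score": 0.6059925556182861},
    ]
}


def ranked_documents(resp: Dict[str, Any], docs: List[str]) -> List[Tuple[int, str, float]]:
    """Return (index, document, relevance_score) tuples, highest score first."""
    results = sorted(resp["results"], key=lambda r: r["relevance_score"], reverse=True)
    # `index` points back into the list of documents that was submitted.
    return [(r["index"], docs[r["index"]], r["relevance_score"]) for r in results]


for index, text, score in ranked_documents(response, documents):
    print(f"{score:.3f}  [{index}] {text}")
```

Only three results appear because the request set `top_n` to 3, so the fourth document (index 3) is not returned.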