This repository was archived by the owner on Jan 29, 2024. It is now read-only.
For simplicity (and to leverage our GPUs), we can manually compute the paragraph embeddings of the articles in our Elasticsearch database.
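A minimal sketch of the offline step, assuming an `articles` index with a `dense_vector` field named `embedding` (both names are assumptions). In production the `embed` function would call our sentence-transformer model (e.g. `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`); here a deterministic placeholder keeps the sketch self-contained:

```python
import hashlib

EMBEDDING_DIM = 384  # dimension of e.g. all-MiniLM-L6-v2; an assumption


def embed(paragraph: str) -> list[float]:
    """Placeholder for the real sentence-transformer encoder.

    A hash-based dummy vector stands in for model.encode(paragraph)
    so the sketch runs without GPUs or the model weights.
    """
    digest = hashlib.sha256(paragraph.encode("utf-8")).digest()  # 32 bytes
    return [b / 255.0 for b in digest] * (EMBEDDING_DIM // len(digest))


def to_bulk_actions(articles):
    """Yield Elasticsearch bulk actions, one per paragraph.

    `articles` is an iterable of (article_id, [paragraphs]) pairs; the
    actions can be fed to elasticsearch.helpers.bulk().
    """
    for article_id, paragraphs in articles:
        for i, paragraph in enumerate(paragraphs):
            yield {
                "_index": "articles",  # index name is an assumption
                "_id": f"{article_id}-{i}",
                "_source": {
                    "article_id": article_id,
                    "paragraph": paragraph,
                    "embedding": embed(paragraph),
                },
            }


actions = list(to_bulk_actions([("a1", ["First paragraph.", "Second one."])]))
```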
At query time, however, it makes sense to deploy our sentence-transformer embedding model on Kubernetes, so that it can scale with load and avoid downtime when users make their queries.
Seldon seems like a good solution for easily deploying our model on Kubernetes and exposing a RESTful API to handle user requests.
Note that the goal of this stage of the Information Retrieval pipeline is to quickly retrieve a certain number of potentially relevant documents (e.g. ~1000); we don't care too much about these candidates being ranked very accurately, since that happens in the subsequent re-ranking stage. So it could be a good idea to use a smaller, faster sentence embedding model.
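The query-time service could be sketched with Seldon Core's Python wrapper, which expects a class exposing a `predict` method and serves it over REST once the container is deployed. The model name, lazy loading, and class name below are assumptions, not a final design:

```python
class SentenceEmbedder:
    """Seldon Core Python-wrapper style model.

    Once packaged in a container and referenced from a SeldonDeployment,
    the wrapper calls predict() and exposes it as a RESTful endpoint.
    """

    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        # Model name is an assumption; a smaller distilled model could be
        # swapped in here for faster candidate retrieval.
        self.model_name = model_name
        self._model = None

    def load(self):
        # Deferred import so the class stays importable without the
        # sentence-transformers package (e.g. in unit tests).
        from sentence_transformers import SentenceTransformer

        self._model = SentenceTransformer(self.model_name)

    def predict(self, X, features_names=None):
        # X: a list/array of query strings; returns one embedding per query.
        if self._model is None:
            self.load()
        return self._model.encode(list(X))
```

The lazy `load()` also keeps container startup observable: the heavy model download happens once, on the first request or an explicit warm-up call, rather than at import time.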
Actions

- Investigate whether there are (better?) alternatives to Seldon.
- Deploy our sentence embedding model on Kubernetes using the best framework that we find.