This repository was archived by the owner on Jan 29, 2024. It is now read-only.
As explained in the sentence-transformers docs (here), the typical structure of an Information Retrieval pipeline consists of two stages: the top K results of the first stage are fed into the second stage for a more refined ranking.
Retrieval Bi-Encoder — Typically a symmetric model: the same neural network encodes the query and each candidate sentence independently, and the similarity between the two embeddings is then computed via cosine similarity, Euclidean distance, or dot product.
Re-Ranker Cross-Encoder — Typically an asymmetric model (especially for asymmetric tasks where the query is much shorter than the documents, e.g. retrieving candidate contexts for a given question): the query and a candidate sentence are concatenated and fed to a single neural network, which directly outputs a similarity score.
Why not just stop after the first retrieval stage? Because, hopefully, "the advantage of Cross-Encoders is the higher performance, as they perform attention across the query and the document" (here).
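The two-stage control flow can be sketched without any model at all. Below is a minimal, self-contained illustration: stage 1 ranks all documents by cosine similarity between precomputed embeddings (the bi-encoder's job), and stage 2 re-scores only the top-K survivors with a separate scoring function (standing in for a cross-encoder). The function names, the toy embeddings, and the `cross_score` callback are all illustrative, not part of the sentence-transformers API.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve_then_rerank(query_emb, doc_embs, cross_score, top_k=2):
    # Stage 1: bi-encoder retrieval -- rank all documents by the
    # similarity of their embeddings to the query embedding.
    stage1 = sorted(range(len(doc_embs)),
                    key=lambda i: cos_sim(query_emb, doc_embs[i]),
                    reverse=True)[:top_k]
    # Stage 2: re-rank only the top-K candidates with the (more
    # expensive) cross-scoring function.
    return sorted(stage1, key=cross_score, reverse=True)

# Toy example: doc 0 wins stage 1 on embedding similarity,
# but the re-ranker prefers doc 1.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
rerank_scores = {0: 0.2, 1: 0.9}
print(retrieve_then_rerank(query, docs, lambda i: rerank_scores[i]))  # → [1, 0]
```

With sentence-transformers, stage 1 would use a `SentenceTransformer` to produce the embeddings and stage 2 a `CrossEncoder` to score each (query, candidate) pair; the flow is otherwise the same.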
Set up a dataset/experiment to evaluate the performance of simple retrieval vs. retrieval + re-ranking.
Evaluate results using various re-ranking models (here). Models trained on the QNLI dataset (here) could be particularly relevant: they are trained to predict, on the SQuAD dataset, whether a context contains the answer to a given question.
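One simple way to run the comparison above is to score both orderings with mean reciprocal rank (MRR). The sketch below assumes a single relevant document per query; the function name and the toy data are illustrative, and in the real experiment the two ranking dicts would come from the retrieval-only and retrieval + re-ranking pipelines respectively.

```python
def mrr(rankings, relevant):
    """Mean reciprocal rank.

    rankings: dict mapping query id -> ranked list of doc ids
    relevant: dict mapping query id -> the single relevant doc id
    """
    total = 0.0
    for q, ranked in rankings.items():
        rank = ranked.index(relevant[q]) + 1  # 1-based rank of the hit
        total += 1.0 / rank
    return total / len(rankings)

# Toy comparison: re-ranking moves the relevant doc (id 0) to the top.
relevant = {"q1": 0}
retrieval_only = {"q1": [1, 0, 2]}   # relevant doc at rank 2 -> MRR 0.5
reranked = {"q1": [0, 1, 2]}         # relevant doc at rank 1 -> MRR 1.0
print(mrr(retrieval_only, relevant), mrr(reranked, relevant))
```

A gain in MRR (or a similar metric such as recall@K) for the re-ranked ordering would quantify how much the cross-encoder stage actually buys over retrieval alone.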