Improve the quality of searches #236

@gaurav

This comes down to the exact eDisMax query we use. As a general rule, tweaking any part of this expression results in some queries improving while others get worse.

params = {
    "query": {
        "edismax": {
            "query": query,
            # qf = query fields, i.e. which fields to search for the input terms and how much to boost a match in each.
            # https://solr.apache.org/guide/solr/latest/query-guide/dismax-query-parser.html#qf-query-fields-parameter
            "qf": "preferred_name_exactish^250 names_exactish^100 preferred_name^25 names^10",
            # pf = phrase fields, i.e. how should we boost these fields if they contain the entire search phrase.
            # https://solr.apache.org/guide/solr/latest/query-guide/dismax-query-parser.html#pf-phrase-fields-parameter
            "pf": "preferred_name_exactish^300 names_exactish^200 preferred_name^30 names^20",
            # Boosts
            "bq": [],
            "boost": [
                # The boost is multiplied with score -- calculating the log() reduces how quickly this increases
                # the score for increasing clique identifier counts.
                "log(sum(clique_identifier_count, 1))"
            ],
        },
    },
    "sort": "score DESC, clique_identifier_count DESC, curie_suffix ASC",
    "limit": limit,
    "offset": offset,
    "filter": filters,
    "fields": "*, score",
    "params": inner_params,
}
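To get a feel for what the multiplicative boost does, here is a minimal sketch that mirrors `log(sum(clique_identifier_count, 1))` in plain Python. It assumes Solr's `log()` function is base 10 (as I understand the function query documentation) and that the result is multiplied directly into the score, as the comment above says. `boost_factor` is a hypothetical helper name and is not part of our code.

```python
import math


def boost_factor(clique_identifier_count: int) -> float:
    """Approximate the Solr expression log(sum(clique_identifier_count, 1)).

    Assumption: Solr's log() is base 10, so math.log10 is used here.
    """
    return math.log10(clique_identifier_count + 1)


# Print the multiplier for a few illustrative counts.
for count in (0, 1, 9, 99, 999):
    print(count, round(boost_factor(count), 3))
```

If those assumptions hold, the multiplier is below 1 for counts under 9 (so small cliques have their score reduced rather than boosted) and it is 0 for a count of 0, which would zero out the score entirely. That may be worth checking against the actual range of `clique_identifier_count` in the index.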

This might behave very differently if we switched to Elasticsearch (#182).
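Since tweaking any part of the expression helps some queries and hurts others, one option is to score each change against a fixed set of test queries before adopting it. The sketch below is only an illustration: `SearchFn`, `reciprocal_rank`, and `compare` are hypothetical names, the search functions are stand-ins for calls to the real service with two different parameter sets, and the CURIEs in the example are made-up placeholders.

```python
from typing import Callable, Dict, List

# Hypothetical type: takes a query string, returns CURIEs in ranked order.
SearchFn = Callable[[str], List[str]]


def reciprocal_rank(results: List[str], expected: str) -> float:
    """Return 1/rank of the expected CURIE, or 0.0 if it is not in the results."""
    for rank, curie in enumerate(results, start=1):
        if curie == expected:
            return 1.0 / rank
    return 0.0


def compare(baseline: SearchFn, candidate: SearchFn,
            cases: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort test queries into improved, worsened, and unchanged buckets."""
    report: Dict[str, List[str]] = {"improved": [], "worsened": [], "unchanged": []}
    for query, expected in cases.items():
        before = reciprocal_rank(baseline(query), expected)
        after = reciprocal_rank(candidate(query), expected)
        if after > before:
            report["improved"].append(query)
        elif after < before:
            report["worsened"].append(query)
        else:
            report["unchanged"].append(query)
    return report


# Placeholder data only: two fake queries and two stub rankers.
example_cases = {"query a": "EX:1", "query b": "EX:2"}


def stub_baseline(query: str) -> List[str]:
    return ["EX:1", "EX:2"]


def stub_candidate(query: str) -> List[str]:
    return ["EX:2", "EX:1"]


example_report = compare(stub_baseline, stub_candidate, example_cases)
print(example_report)
```

With the stub rankers, "query a" lands in `worsened` and "query b" in `improved`, which is the trade-off described above. The same harness could in principle compare the Solr backend against an Elasticsearch one, as long as both are wrapped as a `SearchFn`.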
