
Search fields: issues with tokenized vs phrasal searching #265

@wlpotter


I'm not sure what OpenSearch makes possible here, but the way search strings are currently parsed and matched causes a few problems.

For example, a search that contains "of" matches any document with "of" in the searched field. This is an issue across modules, but especially for person names: searching for "John of Ephesus" in all fields yields over 2,000 results, and searching only the "person name" field yields 1,150+ results. Few of these are relevant.
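To illustrate why a word like "of" inflates the results, here is a minimal pure-Python sketch of three matching strategies. It assumes the field is tokenized by lowercasing and splitting on non-word characters, and that "match any shared token" approximates an OR-style full-text match (OpenSearch's `match` query defaults to `"operator": "or"`). The function names and the sample records are hypothetical, not Syriaca.org data or code.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (a stand-in for a real analyzer)."""
    return re.findall(r"\w+", text.lower())


def match_any(query: str, doc: str) -> bool:
    """OR semantics: a hit if the document shares at least one token with the query."""
    return bool(set(tokenize(query)) & set(tokenize(doc)))


def match_all(query: str, doc: str) -> bool:
    """AND semantics: every query token must appear somewhere in the document."""
    return set(tokenize(query)) <= set(tokenize(doc))


def match_phrase(query: str, doc: str) -> bool:
    """Phrase semantics: the query tokens must appear contiguously and in order."""
    q, d = tokenize(query), tokenize(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Hypothetical person-name records, for illustration only.
names = [
    "John of Ephesus",
    "John, bishop of Ephesus",
    "Jacob of Serugh",
    "Ephrem of Nisibis",
    "Council of Ephesus",
]

query = "John of Ephesus"
print([n for n in names if match_any(query, n)])     # all five: each one contains "of"
print([n for n in names if match_all(query, n)])     # ['John of Ephesus', 'John, bishop of Ephesus']
print([n for n in names if match_phrase(query, n)])  # ['John of Ephesus']
```

Under OR semantics the shared token "of" alone is enough for every record to match, which mirrors the thousands of hits described above.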

CBSS has similar issues with "of", as well as with multi-token author names. For example, searching for the author "Kees den Biesen" also returns other authors with "den" in their names.
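For multi-token names, OpenSearch's query DSL has stricter alternatives to an OR-style `match`. Setting `"operator": "and"` requires every term to be present, and `match_phrase` requires the terms to be adjacent and in order. The sketch below only builds the request bodies as Python dicts. The field name `author` is a hypothetical placeholder, not the actual CBSS mapping.

```python
import json


def or_match(field: str, text: str) -> dict:
    # Default match behavior: any single term (e.g. "den") is enough for a hit.
    return {"query": {"match": {field: {"query": text}}}}


def and_match(field: str, text: str) -> dict:
    # Every analyzed term must be present, in any order or position.
    return {"query": {"match": {field: {"query": text, "operator": "and"}}}}


def phrase_match(field: str, text: str) -> dict:
    # Terms must appear adjacent and in the given order.
    return {"query": {"match_phrase": {field: {"query": text}}}}


# "author" is a hypothetical field name used for illustration.
for build in (or_match, and_match, phrase_match):
    print(json.dumps(build("author", "Kees den Biesen")))
```

Each body could be sent to an index's `_search` endpoint. Which one fits depends on how strict a given field should be.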

At the same time, I don't think we want to disallow tokenization entirely.

  • maybe we can declare 'stop words' that are ignored/removed when searching?
  • maybe we can implement and document a search syntax in which, e.g., "quoted strings" are treated as phrases?
  • maybe certain fields work that way but not others, though this seems tricky to document clearly, and I'm unsure whether we can even maintain hard-and-fast rules
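The first two ideas can be combined. The following is a minimal sketch that assumes the search string is parsed in the application before it reaches OpenSearch. Quoted segments become `match_phrase` clauses and keep every word. Bare words have stop words removed and are then required together with `"operator": "and"`. The stop-word list, `build_query`, and the field names are hypothetical. OpenSearch can also handle parts of this natively: `simple_query_string` treats double-quoted text as a phrase, and a `stop` token filter in a field's analyzer removes configured stop words during analysis.

```python
import json
import re

# Hypothetical stop-word list; a real setup might instead configure
# OpenSearch's `stop` token filter in the field analyzer.
STOP_WORDS = {"of", "the", "and"}


def build_query(user_input: str, field: str) -> dict:
    """Quoted text -> phrase clause; bare words -> stop words dropped, rest required."""
    must = []
    loose_terms = []
    # Each match is either a "quoted phrase" (group 1) or a bare word (group 2).
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', user_input):
        if phrase:
            # Quoted segment: keep stop words and require the words in order.
            must.append({"match_phrase": {field: phrase}})
        elif word.lower() not in STOP_WORDS:
            loose_terms.append(word)
    if loose_terms:
        must.append({"match": {field: {"query": " ".join(loose_terms), "operator": "and"}}})
    if not must:
        # Input was only stop words; an empty bool query would match every document.
        return {"query": {"match_none": {}}}
    return {"query": {"bool": {"must": must}}}


# Hypothetical field names, for illustration only.
print(json.dumps(build_query("John of Ephesus", "person_name"), indent=2))
print(json.dumps(build_query('"Kees den Biesen"', "author"), indent=2))
```

Parsing in the application keeps the rules explicit and easier to document per field (the third idea). `simple_query_string` keeps the logic inside OpenSearch but also exposes its other operators to users.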

Metadata


Assignees

No one assigned

Labels

  • All Modules: This issue affects all Syriaca.org modules.
  • meeting-agenda-item
  • question: Further information is requested
  • search/browse: Issues related to search and browse, likely requiring OpenSearch configuration
