combine ngram and full-token fields

I think this has been discussed in the past but there was no issue open for it.

**background**

For many fields we have two 'subfields': one which contains the complete token (for search and autocomplete) and one which contains the prefix ngrams (for the final autocomplete token).

In the case of the `name` field this is implemented as separate fields (eg. `name.en` and `phrase.en`); for other fields it's implemented as a 'subfield' (eg. `parent.continent` and `parent.continent.ngram`).

At some point I'd love to fix that and make it more consistent, but that's a different issue ;)

**proposal**

On reflection we can provide both prefix and exact token matching using a single field.

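The trick is simply to append an [end of text](https://www.compart.com/en/unicode/U+0003) character (U+0003) to each token when indexing, and again to the query token when searching for exact matches. The prefix ngrams of an indexed token then include one final ngram ending in that character, so a query term carrying the character can only match the complete token, while a query term without it still matches as a prefix.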
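To make the idea concrete, here is a minimal sketch in plain Python rather than an actual Elasticsearch analyzer. The function names (`edge_ngrams`, `index_terms`, `matches_prefix`, `matches_exact`) are hypothetical and only illustrate how a single set of indexed terms could serve both kinds of lookup.

```python
ETX = "\u0003"  # Unicode 'end of text' control character


def edge_ngrams(token, min_len=1):
    """Return every prefix of `token` that is at least `min_len` characters long."""
    return {token[:i] for i in range(min_len, len(token) + 1)}


def index_terms(token):
    """Terms stored in the single combined field for one token.

    The sentinel is appended before ngramming, so the longest ngram
    is the complete token followed by ETX.
    """
    return edge_ngrams(token + ETX)


def matches_prefix(query, token):
    """Prefix match: the query is looked up without the sentinel."""
    return query in index_terms(token)


def matches_exact(query, token):
    """Exact match: the query is looked up with the sentinel appended."""
    return (query + ETX) in index_terms(token)
```

With this sketch, `matches_prefix("lon", "london")` is true but `matches_exact("lon", "london")` is false, since `"lon" + ETX` is not a prefix of `"london" + ETX`.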

**the 'work'**

Supporting this would require some code changes, but most of them could be hidden from the application by a query-time analyzer which adds the 'end of text' character when required.

Synonyms may also need some consideration to ensure that they continue to operate as expected.
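As a rough illustration of what such a query-time step might do, the sketch below treats every token except the last as complete, and leaves the final token open as a prefix when autocompleting. The function `analyze_query`, its `autocomplete` flag and the whitespace tokenisation are all assumptions made for the example; a real implementation would live in the Elasticsearch analysis chain.

```python
ETX = "\u0003"  # Unicode 'end of text' control character


def analyze_query(text, autocomplete):
    """Turn query text into terms for the single combined field.

    Complete tokens get the sentinel so they only match whole indexed
    tokens. When `autocomplete` is true the final token is left without
    it, so it matches as a prefix.
    """
    tokens = text.split()  # assumption: simple whitespace tokenisation
    if not tokens:
        return []
    if autocomplete:
        return [t + ETX for t in tokens[:-1]] + [tokens[-1]]
    return [t + ETX for t in tokens]
```

For example, `analyze_query("new yo", autocomplete=True)` marks `new` as an exact token and leaves `yo` as a prefix, while the application itself never needs to know about the sentinel.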

**summary**

The pro is that we could simplify the field mapping by removing the per-field duplication currently required to support ngrams. This would in turn clean up the query logic, which would no longer need to be aware of the different names of the ngram fields.

The con is that we would introduce a new convention, and the code would need to be adapted to accommodate it.

combine ngram and full-token fields #477
