[DO NOT MERGE] Sch 1990 remove registry index #3610
Draft
The "document_series" field has been renamed to "document_collections" [1]. There are no documents in elasticsearch with this field populated, so we no longer need the registry that expands the field. [1] alphagov/whitehall#915
Registries are used to expand fields on elasticsearch documents before they are presented to search API. "document_collection" is a migrated format, so the field "document_collections" should be expanded using content from the govuk index, instead of the government index. See PR where "document_collections" field was reinstated for additional context: #3215
The "policy_areas" field has been deprecated and is no longer being indexed into elasticsearch [1]. There are no documents in elasticsearch with this field populated in the govuk index, so we no longer need the registry that expands the field. [1] alphagov/whitehall#5666
Registries are used (in part) to expand fields on elasticsearch
documents before they are presented to search API. This works by
mapping the slug stored in the field to documents of the relevant
format, and taking additional information from the mapped document.
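As a rough illustration of the mechanism described above, the following is a minimal sketch, not the actual search-api code. The method name `expand_field` and the shape of the `registry` hash (slug mapped to a document of the relevant format) are assumptions made for this example.

```ruby
# Hypothetical sketch of registry field expansion; names are illustrative
# and are not taken from search-api.
#
# registry: Hash mapping a slug to a document of the relevant format.
def expand_field(slugs, registry)
  slugs.map do |slug|
    mapped = registry[slug]
    # Take additional information from the mapped document when one exists;
    # otherwise keep only the slug.
    mapped ? mapped.merge("slug" => slug) : { "slug" => slug }
  end
end

registry = {
  "bulgaria" => { "title" => "Bulgaria", "link" => "/world/bulgaria" },
}

expanded = expand_field(["bulgaria"], registry)
puts expanded.inspect
```

Here the slug stored on the search document is the only key needed to pull in the extra fields, which is why the expansion depends on the mapped documents having a slug.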
The "world_location" format is no longer being used: there
are no documents with this format stored in elasticsearch.
However, the field "world_locations" is used in many places
(for example as a filter on many finders) and the registry field
expansion changes the format of the field [1][2] from how it appears in
the raw elasticsearch document, which looks like:
"world_locations" : ["bulgaria"],
to something that looks like:
"world_locations": [{"slug": "bulgaria"}],
We don't yet understand the consequences of changing the formatting
on this field, so we want to preserve it. As a quick way to keep this
formatting change, but remove references to the registry_index (as we
prepare to deprecate the government index), we can use the govuk
index to 'expand' the field. Even though it adds no additional
information (as before), this will preserve the formatting change.
[1] https://github.com/alphagov/search-api/blob/main/lib/search/presenters/entity_expander.rb#L75
[2] https://github.com/alphagov/search-api/blob/main/lib/search/presenters/field_presenter.rb#L33
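The formatting change described in this commit can be sketched as follows. This is an assumption-laden illustration rather than the code in entity_expander.rb: `expand_slugs` and `empty_lookup` are hypothetical names, and the empty lookup stands in for a registry that finds no "world_location" documents.

```ruby
# Hypothetical sketch: expansion with a lookup that returns nothing still
# turns each slug string into a hash, which is the formatting change we
# want to preserve.
def expand_slugs(slugs, lookup)
  slugs.map { |slug| (lookup[slug] || {}).merge("slug" => slug) }
end

raw = { "world_locations" => ["bulgaria"] }
empty_lookup = {} # no documents with the "world_location" format exist

presented = raw.merge(
  "world_locations" => expand_slugs(raw["world_locations"], empty_lookup),
)
puts presented.inspect
```

Under these assumptions the presented value is a list of hashes containing only the slug, matching the second form shown above, even though no additional information is added.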
Now that all registries pull information from the govuk index, we can remove the registry_index that points to the government index.
In elasticsearch.yml there is a mapping of a "registry_index" to the government index. This reference is used in code that looks up linked documents to support expanded links for four fields that could exist on an elasticsearch document: document_series, document_collections, policy_areas and world_locations.
The policy_areas and document_series fields have been deprecated, meaning we no longer need the link expansion and so it's being removed. For the other two, we're switching to look up additional information from the govuk index instead of the government index. We're then deleting the reference to the registry_index, in preparation for deprecating the government index.
Jira ticket: https://gov-uk.atlassian.net/browse/SCH-1990
Migration of document_collections isn't working because documents with this format don't have the slug field populated. This needs sorting before merging.
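To illustrate why a missing slug blocks the migration, here is a small sketch under the assumption that the registry indexes documents by their slug field. The document contents and the `by_slug` name are hypothetical and do not come from the govuk index.

```ruby
# Hypothetical sketch: a slug-keyed lookup cannot include documents whose
# slug field is not populated, so expansion finds nothing for them.
docs = [
  { "title" => "Example collection", "format" => "document_collection", "slug" => nil },
]

by_slug = docs.each_with_object({}) do |doc, index|
  slug = doc["slug"]
  # Documents without a slug cannot be keyed, so they are skipped.
  index[slug] = doc unless slug.nil? || slug.empty?
end

match = by_slug["example-collection"]
puts match.inspect
```

With no slug on the document_collection documents, the lookup stays empty and no extra information can be attached to the "document_collections" field.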