Skip to content

Comments

Detect altnames that are a substring of name.default#548

Draft
orangejulius wants to merge 2 commits intomasterfrom
detect-prefix-altnames
Draft

Detect altnames that are a substring of name.default#548
orangejulius wants to merge 2 commits intomasterfrom
detect-prefix-altnames

Conversation

@orangejulius
Copy link
Member

@orangejulius orangejulius commented Feb 4, 2021

This change is an attempt to mitigate scoring penalties applied to documents with alternate names (#507).

It handles the case where an alt name is merely a substring of the main name, for example on the Union Square subway stop in OSM:

image

Alt names like this don't add much value: they don't allow searching on any new terms, but do throw off the scoring. Even when we fix the scoring issue, duplicate alt names that add no value still take up space, so this change should be useful for a while.

The change comes in 2 parts, each in their own commit:

  • what is intended to be a pure refactoring of the tag mapping logic. Rather than loop through all tags in whatever order they come, start with the tags that can go into name.default, then the other name.* fields, and finally the rest. This makes it easier to write logic that looks for duplicates
  • the actual change to check for the substring match, which is quite small

I'd be happy to extend this in the future with other near-identical alt names, such as handling Mc Donalds vs McDonalds or ignoring quotes or other special characters like in pelias/api#1488.

"name": {
"default": "IPOH Asian House"
},
"default": "IPOH Asian House" },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-issue: this is weirdly indented

Copy link
Member Author

@orangejulius orangejulius Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, oops! My fault! I had a try at editing the fixture manually since there were only a few changes.

Should have left it to the machines.

Copy link
Member

@missinglink missinglink Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, it's really not an issue, I just noticed it a few times in the PR, like the trim( tags[key])) catches my eye.

the Pelias code styling used to be more hit-and-miss, I've been using autoformatting in my editor for a while now which hopefully fixes a bunch of the little ones.

at some point I'd still love to fully adopt standardJS.

This makes it easier to add custom logic by working through the tags in
a specified order.
This handles the case where one alt name is a substring fully contained
in another.
@orangejulius
Copy link
Member Author

I came across this PR today and wanted to see if it still made a difference, so I've rebased it and kicked off a planet build to test things out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants