Detect altnames that are a substring of name.default #548
orangejulius wants to merge 2 commits into master
e775ef6 to 9a0b934
"name": {
  "default": "IPOH Asian House"
},
"default": "IPOH Asian House" },
non-issue: this is weirdly indented
oh, oops! My fault! I had a try at editing the fixture manually since there were only a few changes.
Should have left it to the machines.
haha, it's really not an issue. I just noticed it a few times in the PR; for example, the trim( tags[key])) catches my eye.
the Pelias code styling used to be more hit-and-miss. I've been using autoformatting in my editor for a while now, which hopefully fixes a bunch of the little ones.
at some point I'd still love to fully adopt standardJS.
9a0b934 to 481144e
This makes it easier to add custom logic by working through the tags in a specified order.
This handles the case where one alt name is a substring fully contained in another.
481144e to 9c364a5
I came across this PR today and wanted to see if it still made a difference, so I've rebased it and kicked off a planet build to test things out.
This change is an attempt to mitigate scoring penalties applied to documents with alternate names (#507).
It handles the case where an alt name is merely a substring of the main name, for example on the Union Square subway stop in OSM:
Alt names like this don't add much value: they don't allow searching on any new terms, but do throw off the scoring. Even when we fix the scoring issue, duplicate alt names that add no value still take up space, so this change should be useful for a while.
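As a rough illustration of the check described above, here is a minimal sketch. It is not the code from this PR: the function names `isSubstringOfDefault` and `filterAltNames` are hypothetical, the sample names are made up, and the case-insensitive, whitespace-trimmed comparison is an assumption rather than confirmed behaviour.

```javascript
// Hypothetical helper, NOT the actual Pelias implementation.
// Returns true when altName is fully contained in defaultName, meaning it
// adds no new search terms. Comparing case-insensitively after trimming
// is an assumption made for this sketch.
function isSubstringOfDefault(defaultName, altName) {
  if (typeof defaultName !== 'string' || typeof altName !== 'string') {
    return false;
  }
  const haystack = defaultName.trim().toLowerCase();
  const needle = altName.trim().toLowerCase();
  // An empty alt name is not treated as a match here.
  if (needle.length === 0) {
    return false;
  }
  return haystack.includes(needle);
}

// Keep only the alt names that are not contained in the default name.
function filterAltNames(defaultName, altNames) {
  return altNames.filter((alt) => !isSubstringOfDefault(defaultName, alt));
}

// Made-up example data: the first alt name is dropped, the second is kept.
const kept = filterAltNames('Example Street Station', ['Example Street', 'Central Hub']);
console.log(kept);
```

Note that a plain substring test also matches partial words; whether that is desirable depends on how the real implementation defines a duplicate.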
The change comes in 2 parts, each in their own commit:
1. Work through the tags in a specified order: name.default, then the other name.* fields, and finally the rest. This makes it easier to write logic that looks for duplicates.
2. Handle the case where one alt name is a substring fully contained in another.

I'd be happy to extend this in the future with other near-identical alt names, such as handling Mc Donalds vs McDonalds, or ignoring quotes or other special characters like in pelias/api#1488.
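The processing order from the first commit (name.default, then the other name.* fields, then the rest) could be sketched roughly as below. This is only an illustration under assumptions: `rank` and `orderFields` are hypothetical names, and the flat list of dotted field names is a simplification, not the importer's real data structure.

```javascript
// Hypothetical sketch, NOT the actual Pelias implementation.
// Assigns a priority to a dotted field name:
//   0 = name.default, 1 = any other name.* field, 2 = everything else.
function rank(field) {
  if (field === 'name.default') {
    return 0;
  }
  if (field.startsWith('name.')) {
    return 1;
  }
  return 2;
}

// Returns a new array of field names sorted by priority.
// Array.prototype.sort is stable in ES2019+, so fields with the same
// rank keep their original relative order.
function orderFields(fields) {
  return [...fields].sort((a, b) => rank(a) - rank(b));
}

// Illustrative field names only.
console.log(orderFields(['address.street', 'name.alt', 'name.default', 'name.en']));
```

Handling name.default first means that, by the time the other name.* fields are visited, the default name is already known and can be compared against.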