getLanguagesInTerritory should apply a threshold or allow consumers to do so #134

@nemobis

mw.uls.getFrequentLanguageList blindly appends whatever $.uls.data.getLanguagesInTerritory( countryCode ) returns to the list of "common languages" for a territory.
Looking deeper into why the languages suggested for Italy are so wrong (https://bugzilla.wikimedia.org/62346): in addition to the issues already reported to CLDR, there is the problem that we're not applying any threshold.

For instance, CLDR tells us that hr is spoken by 0.0057% of the population, which is probably correct, but hr nevertheless gets into the list of "common" languages, which is absurd. I know that if the data were better, picking the top 7-9 languages (as the compact links feature does) would hide this issue, but it would still make sense to cut the long tail, whether the threshold is 1%, 0.1% or 0.01% of the population.

The implementation doesn't matter. Some alternatives to cutting the tail in getLanguagesInTerritory:

  1. the output could contain some data (like the population data in CLDR) so that mw.uls.getFrequentLanguageList can do the filtering on its own, or
  2. there could be a new jquery.uls function, wrapping getLanguagesInTerritory, which cuts the tail and would be used by mw.uls.getFrequentLanguageList.
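To make the second alternative concrete, here is a minimal sketch of what such a tail-cutting helper might look like. It is not existing jquery.uls API: the function name filterLanguagesByThreshold, the input shape (language code mapped to percentage of the territory's population) and the sample figures for it and de are all assumptions made for illustration; only the 0.0057% figure for hr comes from the CLDR data mentioned above.

```javascript
// Hypothetical input shape: language code -> percentage of the territory's
// population speaking it. Only the hr value is taken from this issue;
// the it and de values are invented for illustration.
var sampleLanguagePercentages = {
	it: 95,
	de: 0.5,
	hr: 0.0057
};

/**
 * Hypothetical helper (not part of jquery.uls): return the language codes
 * whose share of the population is at least minPercent, most widely
 * spoken first.
 *
 * @param {Object} percentages Map of language code to population percentage
 * @param {number} minPercent Threshold in percent, e.g. 1, 0.1 or 0.01
 * @return {string[]} Language codes at or above the threshold
 */
function filterLanguagesByThreshold( percentages, minPercent ) {
	return Object.keys( percentages )
		.filter( function ( code ) {
			return percentages[ code ] >= minPercent;
		} )
		.sort( function ( a, b ) {
			return percentages[ b ] - percentages[ a ];
		} );
}
```

With this sample data, a threshold of 0.01 keeps it and de and drops hr, while a threshold of 0.001 would let hr back in. The same filter would work for the first alternative too, provided getLanguagesInTerritory exposed the population data alongside the language codes.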
