Description
mw.uls.getFrequentLanguageList blindly appends whatever $.uls.data.getLanguagesInTerritory( countryCode ) returns to the list of "common languages" for a territory.
If we look deeper into why the languages suggested for Italy are so wrong (https://bugzilla.wikimedia.org/62346), in addition to the issues already reported to CLDR, there is the issue that we're not applying any threshold.
For instance, CLDR tells us that hr is spoken by 0.0057% of the population, which is probably correct, but hr nevertheless gets into the list of "common" languages, which is absurd. I know that if the data were better, picking the top 7-9 languages (as the compact links feature does) would hide this issue, but it would still make sense to cut the long tail, be it at a threshold of 1%, 0.1% or 0.01% of the population.
The implementation doesn't matter. Some alternatives to cutting the tail in getLanguagesInTerritory:
- the output could contain some data (like the population data in CLDR) so that mw.uls.getFrequentLanguageList can do a filtering on its own, or
- it could be a new jquery.uls function, wrapping getLanguagesInTerritory, which cuts the tail and would be used by mw.uls.getFrequentLanguageList.
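
To make the two alternatives concrete, here is a rough, self-contained sketch. Everything in it is hypothetical: the sample data object, the function names getLanguagesWithShare and getCommonLanguagesInTerritory, and the minPercent parameter are not part of jquery.uls, and the data shape is only an assumption about how population shares could be exposed. Only the hr figure is taken from this report; the other percentages are placeholders.

```javascript
// Hypothetical sample data: territory -> language code -> percent of population.
// Only the hr value (0.0057) comes from this report; the others are placeholders,
// not real CLDR figures.
var sampleTerritoryData = {
	IT: { it: 95, de: 0.5, hr: 0.0057 }
};

// Alternative 1 (hypothetical): return the languages together with their
// population share, so that the caller can do its own filtering.
function getLanguagesWithShare( countryCode, data ) {
	var shares = data[ countryCode ] || {};
	return Object.keys( shares ).map( function ( code ) {
		return { code: code, percent: shares[ code ] };
	} );
}

// Alternative 2 (hypothetical): a wrapper that cuts the long tail and returns
// only language codes, ordered by share, largest first.
function getCommonLanguagesInTerritory( countryCode, data, minPercent ) {
	return getLanguagesWithShare( countryCode, data )
		.filter( function ( entry ) {
			return entry.percent >= minPercent;
		} )
		.sort( function ( a, b ) {
			return b.percent - a.percent;
		} )
		.map( function ( entry ) {
			return entry.code;
		} );
}

// With a 0.1 % threshold, hr is dropped from the list for IT.
var commonForItaly = getCommonLanguagesInTerritory( 'IT', sampleTerritoryData, 0.1 );
```

Keeping the threshold as a parameter leaves the choice between 1, 0.1 and 0.01 % to the caller, so mw.uls.getFrequentLanguageList could tune it without another change to the data layer.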