
bug: sjs-latn duplication causing database build failure #312

@mcdurdin

Description

Running /var/www/html/tools/db/build/search-prepare-data-4.sql
Array
(
    [0] => 23000
    [1] => 2627
    [2] => [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Violation of PRIMARY KEY constraint 'PK__t_langta__DC101C0099698B5B'. Cannot insert duplicate key in object 'k0.t_langtag'. The duplicate key value is (sjs-latn).
    [3] => 01000
    [4] => 3621
    [5] => [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]The statement has been terminated.
)
Failure: PDOException: SQLSTATE[23000]: [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Violation of PRIMARY KEY constraint 'PK__t_langta__DC101C0099698B5B'. Cannot insert duplicate key in object 'k0.t_langtag'. The duplicate key value is (sjs-latn). in /var/www/html/tools/db/build/build.inc.php:199
Stack trace:
#0 /var/www/html/tools/db/build/build.inc.php(199): PDO->exec()
#1 /var/www/html/tools/db/build/build.inc.php(74): BuildDatabaseClass->sqlrun()
#2 /var/www/html/tools/db/build/build_cli.php(30): BuildDatabaseClass->BuildDatabase()
#3 {main}

Reported by @darcywong00; reproduced by me.


This issue arises when two .keyboard_info files reference the same language tag but give it different names. In this instance, (at least) two BCP 47 tags are going to cause trouble, shared by the amazigh_latin and amazigh_lat keyboards:

From amazigh_latin.kps:

<Language ID="sjs-Latn">Senhaja De Srair (Latin)</Language>
<Language ID="tia-Latn">Tidikelt Tamazight (Latin)</Language>

The language name in amazigh_latin.keyboard_info is constructed from the bcp47 data by kmc-keyboard-info, so it has:

    "sjs-Latn": {
      "examples": [],
      "languageName": "Senhaja Berber",
      "scriptName": "Latin",
      "displayName": "Senhaja Berber (Latin)"
    },

From amazigh_lat.keyboard_info (legacy):

    "sjs-Latn": {
      "displayName": "Senhaja De Srair (Latin)",
      "languageName": "Senhaja De Srair",
      "scriptName": "Latin"
    },
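A mismatch like this can be detected before the database build runs. The following is a minimal sketch, not part of the existing tooling: it assumes the per-keyboard language entries have already been loaded into dicts shaped like the excerpts above, and `find_conflicting_tags` is a hypothetical helper name.

```python
from collections import defaultdict


def find_conflicting_tags(keyboards):
    """Return {lowercased tag: sorted display names} for tags with more than one name.

    `keyboards` maps a keyboard id to its language entries, shaped like the
    .keyboard_info excerpts above. Tags are compared case-insensitively,
    because the failing key in the build log is the lowercased `sjs-latn`.
    """
    names_by_tag = defaultdict(set)
    for languages in keyboards.values():
        for tag, info in languages.items():
            names_by_tag[tag.lower()].add(info["displayName"])
    return {
        tag: sorted(names)
        for tag, names in names_by_tag.items()
        if len(names) > 1
    }


# Sample data copied from the two .keyboard_info excerpts in this issue.
keyboards = {
    "amazigh_latin": {
        "sjs-Latn": {
            "examples": [],
            "languageName": "Senhaja Berber",
            "scriptName": "Latin",
            "displayName": "Senhaja Berber (Latin)",
        },
    },
    "amazigh_lat": {
        "sjs-Latn": {
            "displayName": "Senhaja De Srair (Latin)",
            "languageName": "Senhaja De Srair",
            "scriptName": "Latin",
        },
    },
}

conflicts = find_conflicting_tags(keyboards)
print(conflicts)
```

With the two entries above, the only conflicting tag reported is `sjs-latn`, with both display names listed.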

But this difference only trips us up because langtags.json does not include an sjs-Latn tag, only sjs-Zyyy:

    {
        "full": "sjs-Zyyy-MA",
        "iana": [ "Senhaja De Srair" ],
        "iso639_3": "sjs",
        "name": "Senhaja Berber",
        "names": [ "Senhaja De Srair", "Senhaja de Srair", "Senhaja de Srayer Berber", "Senhajiya", "Shelha", "Shelha n Jbala", "Shilha", "Shilha Barbarya", "Shilha n Jbala", "Tajeblit", "Tamazight", "Tamazight n Jbala", "Tasenhajit", "Ššelḥa" ],
        "region": "MA",
        "regionname": "Morocco",
        "script": "Zyyy",
        "sldr": false,
        "tag": "sjs",
        "tags": [ "sjs-MA", "sjs-Zyyy" ],
        "unwritten": true,
        "windows": "sjs-Zyyy"
    },

Our search-prepare-data-4.sql script canonicalizes langtags from keyboards when there is no matching tag in langtags.json. So this appears to be the first time that two keyboards have used a tag that is not listed in langtags.json and have also given mismatching descriptions for that tag.
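The failure mode can be reproduced in miniature. This is only a sketch under stated assumptions: SQLite stands in for SQL Server, the `t_keyboard_langtag` staging table and both insert statements are hypothetical (the actual SQL in search-prepare-data-4.sql is not shown here), and picking `MIN(name)` is just one arbitrary way to choose a single name per tag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging table holding the tags as declared by each keyboard.
conn.execute("CREATE TABLE t_keyboard_langtag (keyboard_id TEXT, tag TEXT, name TEXT)")
# Target table with the tag as primary key, as in the error message.
conn.execute("CREATE TABLE t_langtag (tag TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO t_keyboard_langtag VALUES (?, ?, ?)",
    [
        ("amazigh_latin", "sjs-Latn", "Senhaja Berber (Latin)"),
        ("amazigh_lat", "sjs-Latn", "Senhaja De Srair (Latin)"),
    ],
)


def insert_naive(conn):
    # DISTINCT over (tag, name) still yields two rows for the same tag
    # when the names differ, which violates the primary key.
    conn.execute(
        "INSERT INTO t_langtag (tag, name) "
        "SELECT DISTINCT LOWER(tag), name FROM t_keyboard_langtag"
    )


def insert_deduplicated(conn):
    # Grouping by the canonical tag guarantees one row per tag;
    # MIN(name) is an arbitrary but deterministic tie-break.
    conn.execute(
        "INSERT INTO t_langtag (tag, name) "
        "SELECT LOWER(tag), MIN(name) FROM t_keyboard_langtag GROUP BY LOWER(tag)"
    )


try:
    insert_naive(conn)
    naive_failed = False
except sqlite3.IntegrityError:
    naive_failed = True

insert_deduplicated(conn)
rows = conn.execute("SELECT tag, name FROM t_langtag").fetchall()
print(naive_failed, rows)
```

Which name should win for a tag missing from langtags.json is a separate design question; the sketch only shows that collapsing to one row per canonical tag avoids the primary key violation.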
