Allow multiple texts with same URN #130

@sciepsilon

LingView assumes each FLEx and ELAN file will have a distinct URN value. (The URN is the unique 36-character ID that FLEx or ELAN assigns to each file, such as 97b8ab3b-d2a5-428a-aa68-0aa304ba1c44.) When two files in the data directory have the same URN value, LingView silently omits one of them from the Index of Texts page.

In the wild, two files sometimes do share a URN value. This issue was brought to our attention on the Kaqchikel site in June 2023, where it affected the Solola_2013_spk01_IZ_se_convertio_tigre.eaf and Solola_2013_spk02_CP_educacion_bilingue.eaf files and possibly others. We don't know why these files have the same URN, but one theory is that each file was created by copying the previous ELAN file and then replacing its contents.

There are several actions we could take to mitigate this issue:

  • (Easy) Have LingView print an error message in the preprocessing step if there are two files with the same URN. Include the file name of each of the affected files so it's easier for the user to fix.
  • (Hard) Alternatively, change LingView's internal ID for each file from the bare URN to something derived from the filename. Handle special characters in the filename carefully, so they can't break the site.
  • (Hard) Investigate what causes ELAN to assign the same URN to multiple files. Reach out to the developers of ELAN and ask if they consider this to be a bug, and whether they're willing to fix it.
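A rough sketch of the first two options, written in Python purely for illustration (this is not LingView's code). It assumes the preprocessing step has already collected a mapping from file name to URN. The function names `find_duplicate_urns`, `format_duplicate_errors`, and `filename_to_id` are hypothetical, as is the choice to append a short hash so that two filenames differing only in special characters still get distinct IDs.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import PurePath


def find_duplicate_urns(urn_by_file):
    """Return {urn: [filenames]} for every URN shared by two or more files.

    `urn_by_file` is an assumed input: a dict mapping each data file's
    name to the URN read from it during preprocessing.
    """
    files_by_urn = defaultdict(list)
    for filename, urn in urn_by_file.items():
        files_by_urn[urn].append(filename)
    return {
        urn: sorted(files)
        for urn, files in files_by_urn.items()
        if len(files) > 1
    }


def format_duplicate_errors(duplicates):
    """Build one error line per duplicated URN, naming every affected file."""
    return [
        f"ERROR: URN {urn} is shared by: {', '.join(files)}"
        for urn, files in sorted(duplicates.items())
    ]


def filename_to_id(filename):
    """Derive an ID from the filename that uses only [A-Za-z0-9_-].

    Runs of other characters collapse to "_". A short hash of the full
    original filename is appended so that names which sanitize to the
    same text (e.g. "a b.eaf" and "a_b.eaf") still get different IDs.
    """
    stem = PurePath(filename).stem
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()[:8]
    return f"{safe}-{digest}" if safe else digest
```

With something like this, preprocessing could print every line returned by `format_duplicate_errors(find_duplicate_urns(...))` and stop, rather than silently dropping a text. If we go the filename route instead, an ID such as the one from `filename_to_id` should be safe to use in URLs and HTML attributes, though existing links built from URNs would need a migration plan.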

Metadata

Labels: enhancement; hours or days (tasks that will take more than an hour, but less than 20 hours)
