
WALDO publication sample assignment has incorrect lookup behavior #273

@gregsoos

Description


WALDO has a webform where publications can be assigned to samples or genetic analyses. There appears to be a problem with how the form creates or modifies these publication labels.

It seems to me that the webform looks up the publication labels assigned to the sample identified by the sample_id or external_id in the user-uploaded file. If no such label is found, it creates a new one. If exactly one is found, it overwrites it with the content from the user's file. If more than one is found, it raises a MultipleObjectsReturned exception. (Note: I am inferring this from observed behavior; I have not actually dug into the code.)
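
The observed behavior can be reproduced with a small in-memory model. This is a minimal sketch inferred only from the behavior described above, not from WALDO's code: the `PublicationLabel` fields and the `upsert_by_sample` function are hypothetical, and `MultipleObjectsReturned` is a local stand-in for Django's exception of the same name. In Django terms the pattern resembles calling `update_or_create` with only the sample in the lookup, which raises `MultipleObjectsReturned` whenever more than one row matches; whether WALDO actually uses that call is an assumption.

```python
from dataclasses import dataclass


class MultipleObjectsReturned(Exception):
    """Local stand-in for Django's MultipleObjectsReturned."""


@dataclass
class PublicationLabel:
    # Hypothetical fields; WALDO's real model may differ.
    sample_id: str
    genetic_id: str
    publication: str


def upsert_by_sample(labels, sample_id, genetic_id, publication):
    """Observed behavior: match on the sample only, ignoring genetic_id."""
    matches = [label for label in labels if label.sample_id == sample_id]
    if len(matches) > 1:
        raise MultipleObjectsReturned(
            f"get() returned more than one PublicationLabels -- it returned {len(matches)}!"
        )
    if matches:
        # Exactly one match: overwrite it with the uploaded row.
        matches[0].genetic_id = genetic_id
        matches[0].publication = publication
        return matches[0]
    # No match: create a new label.
    label = PublicationLabel(sample_id, genetic_id, publication)
    labels.append(label)
    return label


# Hypothetical IDs: a non-_d and a _d analysis of the same sample.
labels = []
upsert_by_sample(labels, "S1", "S1.DG", "PubA")    # no match: creates a label
upsert_by_sample(labels, "S1", "S1_d.DG", "PubA")  # one match: overwrites it
print(len(labels), labels[0].genetic_id)  # prints: 1 S1_d.DG
```

Because the lookup key is only the sample, a second analysis of the same sample can never get its own label: it either overwrites the single existing one or trips the duplicate check.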

That lookup would have been essentially correct when everything in WALDO was still organized by sample, but now that genetic analyses must also be considered, it no longer seems correct. I think the publication label lookup should use both the sample (from sample_id and/or external_id in the upload form) and the genetic analysis (from genetic_id).
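
The proposed lookup can be sketched by making the key the (sample, genetic analysis) pair instead of the sample alone. This is a minimal in-memory sketch under stated assumptions: `PublicationLabel` and `upsert_by_sample_and_analysis` are hypothetical names, and the sample and genetic IDs are placeholders. In Django, the analogous change would presumably be adding the genetic analysis to the `update_or_create` lookup kwargs and backing it with a `UniqueConstraint` on both fields, but the actual model and field names in WALDO are not known here.

```python
from dataclasses import dataclass


@dataclass
class PublicationLabel:
    # Hypothetical fields; WALDO's real model may differ.
    sample_id: str
    genetic_id: str
    publication: str


def upsert_by_sample_and_analysis(labels, sample_id, genetic_id, publication):
    """Proposed behavior: look up a label by (sample_id, genetic_id) together.

    Assumes at most one label per (sample, genetic analysis) pair, the
    invariant a unique constraint on those two columns would enforce.
    """
    for label in labels:
        if label.sample_id == sample_id and label.genetic_id == genetic_id:
            # Same sample and same genetic analysis: update this label only.
            label.publication = publication
            return label
    # Any other genetic analysis of the sample gets its own label, so
    # existing labels for sibling analyses are left untouched.
    label = PublicationLabel(sample_id, genetic_id, publication)
    labels.append(label)
    return label


# Hypothetical IDs: a non-_d and a _d analysis of the same sample.
labels = []
upsert_by_sample_and_analysis(labels, "S1", "S1.DG", "PubA")
upsert_by_sample_and_analysis(labels, "S1", "S1_d.DG", "PubA")
print(len(labels))  # prints: 2
```

With this key, re-uploading the same row is still idempotent (it updates in place), while a reprocessed or damage-restricted analysis of an already-published sample gets a separate label.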

As it currently stands, we cannot assign publications to genetic analyses attached to samples that have already been published. Consider, for example, the genetic ID "B_Australian-3_v66.DG". This is a reprocessing of "B_Australian-3.DG" and should be assigned to the "MallickReichNature2016" publication. If we upload the form with only the genetic_id "B_Australian-3_v66.DG" filled in and sample_id and external_id left blank, the form appears to look up a sample with an empty-string ID, resulting in the following error:

DoesNotExist at /samples/publication_sample_update
Sample matching query does not exist.
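
One way to avoid querying for a sample with an empty-string ID is to treat blank cells as missing and fall back to the sample that owns the genetic analysis. This is a hypothetical sketch, not WALDO's code: `resolve_sample_id` and its two dictionaries are illustrative stand-ins for database lookups, and "S1" is a placeholder sample ID rather than the real sample behind "B_Australian-3_v66.DG".

```python
def _clean(value):
    """Treat None, empty, and whitespace-only cells as missing (None)."""
    return (value or "").strip() or None


def resolve_sample_id(row, sample_by_external_id, sample_by_genetic_id):
    """Pick the sample for one upload row without ever looking up "".

    The two mappings are hypothetical stand-ins for database lookups.
    Raises KeyError if a supplied ID is unknown, ValueError if all are blank.
    """
    sample_id = _clean(row.get("sample_id"))
    if sample_id is not None:
        return sample_id
    external_id = _clean(row.get("external_id"))
    if external_id is not None:
        return sample_by_external_id[external_id]
    genetic_id = _clean(row.get("genetic_id"))
    if genetic_id is not None:
        # Fall back to the sample the genetic analysis belongs to.
        return sample_by_genetic_id[genetic_id]
    raise ValueError("row has no sample_id, external_id, or genetic_id")


row = {"sample_id": "", "external_id": "", "genetic_id": "B_Australian-3_v66.DG"}
print(resolve_sample_id(row, {}, {"B_Australian-3_v66.DG": "S1"}))  # prints: S1
```

Resolving the sample this way only removes the empty-string lookup; the label lookup itself would still need to include the genetic analysis to avoid the duplicate-label error.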

On the other hand, if we also include the external ID by putting "B_Australian-3" in the external_id column, we instead get the following error:

MultipleObjectsReturned at /samples/publication_sample_update
get() returned more than one PublicationLabels -- it returned 2!

If instead we were handling a genetic analysis whose sample had only one corresponding publication label, the webform would simply overwrite that existing label. This means, for example, that when we publish both the non-damage-restricted and damage-restricted versions of external data, the form can publish only one of the _d and non-_d versions. Including both in the upload form causes only the second one to actually be published, because the first label is simply overwritten.
