Verify image_gene_node_attributes.tsv generation is valid #1

@coleslaw481

Description

Double check we are generating this output correctly.

For example, here is a fragment of the samples file:

filename,if_plate_id,position,sample,status,locations,antibody,ensembl_ids,gene_names
/archive/1270/1270_C11_1_,1270,C11,1,35,"Nucleoli,Nucleoplasm",HPA068294,"ENSG00000116251,ENSG00000163584","RPL22,RPL22L1"
/archive/1270/1270_C11_2_,1270,C11,2,35,"Nucleoli,Nucleoplasm",HPA068294,"ENSG00000116251,ENSG00000163584","RPL22,RPL22L1"
/archive/1542/1542_A12_1_,1542,A12,1,35,Nucleoli,HPA048060,ENSG00000163584,RPL22L1
/archive/1542/1542_A12_2_,1542,A12,2,35,Nucleoli,HPA048060,ENSG00000163584,RPL22L1

and here is the matching fragment of the unique file:

antibody,ensembl_ids,gene_names,atlas_name,locations,n_location
HPA068294,"ENSG00000116251,ENSG00000163584","RPL22,RPL22L1",U-2 OS,"Nucleoli,Nucleoplasm",2
HPA048060,ENSG00000163584,RPL22L1,U-2 OS,Nucleoli,1

For version 0.1.0 the resulting image_gene_node_attributes.tsv contains this:

name	represents	ambiguous	antibody	filename
RPL22L1	ensembl:ENSG00000163584	RPL22,RPL22L1	HPA068294,HPA048060	1542_A12_2_,1542_A12_1_,1270_C11_2_,1270_C11_1_
RPL22	ensembl:ENSG00000116251	RPL22,RPL22L1	HPA068294	1270_C11_2_,1270_C11_1_
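
The rows above are consistent with collecting every antibody and every image for each Ensembl ID. Below is a minimal Python sketch of that rule, run against the samples fragment from this issue. `aggregate_by_gene` is a hypothetical helper, not the project's actual code, and filling the ambiguous column only from rows that list more than one gene is an assumption.

```python
import csv
import io

# Samples fragment copied from this issue.
SAMPLES_CSV = '''\
filename,if_plate_id,position,sample,status,locations,antibody,ensembl_ids,gene_names
/archive/1270/1270_C11_1_,1270,C11,1,35,"Nucleoli,Nucleoplasm",HPA068294,"ENSG00000116251,ENSG00000163584","RPL22,RPL22L1"
/archive/1270/1270_C11_2_,1270,C11,2,35,"Nucleoli,Nucleoplasm",HPA068294,"ENSG00000116251,ENSG00000163584","RPL22,RPL22L1"
/archive/1542/1542_A12_1_,1542,A12,1,35,Nucleoli,HPA048060,ENSG00000163584,RPL22L1
/archive/1542/1542_A12_2_,1542,A12,2,35,Nucleoli,HPA048060,ENSG00000163584,RPL22L1
'''


def aggregate_by_gene(samples_text):
    """Collect all antibodies and image basenames seen for each Ensembl ID.

    Hypothetical helper: it mirrors the shape of the 0.1.0 output, but it is
    not taken from the project's source.
    """
    genes = {}
    for row in csv.DictReader(io.StringIO(samples_text)):
        ensembl_ids = row["ensembl_ids"].split(",")
        gene_names = row["gene_names"].split(",")
        basename = row["filename"].rsplit("/", 1)[-1]
        for ensembl_id, gene_name in zip(ensembl_ids, gene_names):
            entry = genes.setdefault(
                ensembl_id,
                {"name": gene_name, "ambiguous": set(),
                 "antibody": set(), "filename": set()},
            )
            # Assumption: only multi-gene rows contribute to "ambiguous".
            if len(gene_names) > 1:
                entry["ambiguous"].update(gene_names)
            entry["antibody"].add(row["antibody"])
            entry["filename"].add(basename)
    return genes


if __name__ == "__main__":
    for ensembl_id, entry in sorted(aggregate_by_gene(SAMPLES_CSV).items()):
        print("\t".join([
            entry["name"],
            "ensembl:" + ensembl_id,
            ",".join(sorted(entry["ambiguous"])),
            ",".join(sorted(entry["antibody"])),
            ",".join(sorted(entry["filename"])),
        ]))
```

Under this rule RPL22L1 picks up both antibodies and all four images, which matches the 0.1.0 rows apart from the ordering of values within a cell.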

The original Jupyter notebook implementation produced this for image_gene_node_attributes.tsv:

name	represents	ambiguous	antibody	filename
RPL22	ensembl:ENSG00000116251	RPL22,RPL22L1	HPA068294	1270_C11_1_
RPL22L1	ensembl:ENSG00000163584	RPL22,RPL22L1	HPA048060	1542_A12_1_

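The two outputs differ: version 0.1.0 lists every antibody and every image filename for each gene, while the notebook lists a single antibody and a single filename per gene. Are we wrong, or is the Jupyter notebook implementation wrong?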
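
To pinpoint where the two files diverge, the fragments can be compared cell by cell. This is a rough Python sketch using only the rows quoted in this issue. `parse_rows` and `extra_values` are hypothetical helper names, and treating each comma-separated cell as an unordered set is an assumption.

```python
import csv
import io

HEADER = "name\trepresents\tambiguous\tantibody\tfilename\n"

# Rows copied from the version 0.1.0 output quoted in this issue.
V010_TSV = HEADER + (
    "RPL22L1\tensembl:ENSG00000163584\tRPL22,RPL22L1\tHPA068294,HPA048060\t"
    "1542_A12_2_,1542_A12_1_,1270_C11_2_,1270_C11_1_\n"
    "RPL22\tensembl:ENSG00000116251\tRPL22,RPL22L1\tHPA068294\t"
    "1270_C11_2_,1270_C11_1_\n"
)

# Rows copied from the Jupyter notebook output quoted in this issue.
NOTEBOOK_TSV = HEADER + (
    "RPL22\tensembl:ENSG00000116251\tRPL22,RPL22L1\tHPA068294\t1270_C11_1_\n"
    "RPL22L1\tensembl:ENSG00000163584\tRPL22,RPL22L1\tHPA048060\t1542_A12_1_\n"
)

COMPARED_COLUMNS = ("antibody", "filename")


def parse_rows(tsv_text):
    """Map gene name -> {column: set of comma-separated values}."""
    rows = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        rows[row["name"]] = {
            col: set(row[col].split(",")) for col in COMPARED_COLUMNS
        }
    return rows


def extra_values(first_tsv, second_tsv):
    """Values present in first_tsv but missing from second_tsv, per gene."""
    first, second = parse_rows(first_tsv), parse_rows(second_tsv)
    return {
        name: {
            col: values - second.get(name, {}).get(col, set())
            for col, values in columns.items()
        }
        for name, columns in first.items()
    }


if __name__ == "__main__":
    for name, columns in sorted(extra_values(V010_TSV, NOTEBOOK_TSV).items()):
        for col, values in columns.items():
            print(name, col, sorted(values))
```

Swapping the two arguments reports values that appear only in the notebook output; for these fragments that direction comes back empty for both genes.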

Metadata


Labels: bug (Something isn't working)
