Hi,
The perturbations in the Frangieh 2021 processing notebook are derived from the sgRNA names using this regex line:
adata.obs['perturbation_name'] = [re.sub('[_123]+', '', s) for s in adata.obs['sgRNA']]
The pattern should be changed to '_[123]+'. The line is meant to strip the guide number, but right now it also strips any 1, 2, or 3 in the gene name.
Unique values in adata.obs["sgRNA"]:
Index(['A2M_1', 'A2M_2', 'A2M_3', 'ACSL3_1', 'ACSL3_2', 'ACSL3_3', 'ACTA2_1',
'ACTA2_2', 'ACTA2_3', 'AEBP1_1',
...
'VDAC2_3', 'WBP2_1', 'WBP2_2', 'WBP2_3', 'WNT7A_1', 'WNT7A_2',
'WNT7A_3', 'XAGE1A_1', 'XAGE1A_2', 'XAGE1A_3'],
dtype='object', length=818)
In the processed data file, the current line turns the first gene, A2M, into AM. This loses information and makes matching guides to the data difficult.
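For reference, here is a minimal sketch comparing the two patterns on a few guide names from the listing above. The `guides` list and the variable names are only for illustration and are not taken from the notebook.

```python
import re

# A few guide names copied from the adata.obs["sgRNA"] listing above.
guides = ["A2M_1", "ACSL3_2", "ACTA2_3", "WNT7A_1", "XAGE1A_3"]

# Current pattern: a character class, so it removes every run of
# '_', '1', '2', or '3' anywhere in the string, including inside gene names.
current = [re.sub('[_123]+', '', s) for s in guides]

# Proposed pattern: an underscore followed by guide digits,
# so only the "_<guide number>" suffix is removed.
proposed = [re.sub('_[123]+', '', s) for s in guides]

for g, c, p in zip(guides, current, proposed):
    print(f"{g}: current={c} proposed={p}")
```

With the current pattern, any gene containing a 1, 2, or 3 is altered (for example, ACSL3 becomes ACSL), while the proposed pattern leaves the gene names intact.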