Building on #107, consolidate several issues (e.g., duplicate_rsid, discrepant_XY) into one dataframe with the following columns / dtypes:
| Column | pandas dtype |
|---|---|
| rsid | pd.StringDtype() |
| chrom | pd.CategoricalDtype() |
| pos | pd.UInt32Dtype() |
| genotype | pd.CategoricalDtype() |
| duplicate_rsid | pd.BooleanDtype() |
| discrepant_loci | pd.BooleanDtype() |
| discrepant_XY | pd.BooleanDtype() |
| heterozygous_MT | pd.BooleanDtype() |
| discrepant_vcf_position | pd.BooleanDtype() |
| discrepant_merge_position | pd.BooleanDtype() |
| discrepant_merge_genotype | pd.BooleanDtype() |
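As a rough sketch of the schema above, the column/dtype pairs could be kept in one mapping and used to build an empty issues dataframe. The names `ISSUE_COLUMNS`, `ISSUES_DTYPES`, and `empty_issues_frame` are hypothetical and only for illustration; they are not part of the existing code.

```python
import pandas as pd

# Hypothetical names; the issue only specifies the columns and dtypes.
ISSUE_COLUMNS = [
    "duplicate_rsid",
    "discrepant_loci",
    "discrepant_XY",
    "heterozygous_MT",
    "discrepant_vcf_position",
    "discrepant_merge_position",
    "discrepant_merge_genotype",
]

ISSUES_DTYPES = {
    "rsid": pd.StringDtype(),
    "chrom": pd.CategoricalDtype(),
    "pos": pd.UInt32Dtype(),
    "genotype": pd.CategoricalDtype(),
    # Every issue flag is a nullable boolean column.
    **{col: pd.BooleanDtype() for col in ISSUE_COLUMNS},
}


def empty_issues_frame() -> pd.DataFrame:
    """Return a zero-row dataframe with the proposed columns and dtypes."""
    return pd.DataFrame(
        {name: pd.Series([], dtype=dtype) for name, dtype in ISSUES_DTYPES.items()}
    )


issues = empty_issues_frame()
print(issues.dtypes)
```

Keeping the dtypes in a single mapping would also let loaded or merged data be cast with `df.astype(ISSUES_DTYPES)`, assuming the values fit the target dtypes.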
More than one issue column could be True for a given row, and getting SNPs with a particular issue (e.g., discrepant_XY) could be handled by filtering the issues dataframe on that column.
rsids could appear more than once in this dataframe. However, if an rsid has two or more rows that are equivalent (same values for chrom, pos, and genotype), their issues should be consolidated into one row, with the issue columns flagging the issue(s).
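One possible way to do the consolidation and filtering described above is a groupby on the identifying columns with a logical OR over the issue flags. This is a minimal sketch, not the project's implementation: `KEY_COLUMNS` and `consolidate_issues` are hypothetical names, the sample data is made up, and for brevity the key columns are left as plain Python types instead of the proposed dtypes.

```python
import pandas as pd

# Hypothetical helper names and made-up sample rows, for illustration only.
KEY_COLUMNS = ["rsid", "chrom", "pos", "genotype"]


def consolidate_issues(issues: pd.DataFrame) -> pd.DataFrame:
    """Collapse rows with identical rsid/chrom/pos/genotype, OR-ing their flags."""
    issue_columns = [c for c in issues.columns if c not in KEY_COLUMNS]
    # Assumption: a missing flag means the issue was not raised for that row.
    flags = issues[issue_columns].astype("boolean").fillna(False).astype(bool)
    combined = pd.concat([issues[KEY_COLUMNS], flags], axis=1)
    merged = (
        combined.groupby(KEY_COLUMNS, sort=False, dropna=False, observed=True)[
            issue_columns
        ]
        .any()  # True if any of the equivalent rows flagged the issue
        .reset_index()
    )
    return merged.astype({c: "boolean" for c in issue_columns})


raw = pd.DataFrame(
    {
        "rsid": ["rs1", "rs1", "rs1", "rs2"],
        "chrom": ["1", "1", "1", "X"],
        "pos": [100, 100, 105, 200],
        "genotype": ["AA", "AA", "AA", "AG"],
        "duplicate_rsid": [True, False, True, False],
        "discrepant_XY": [False, False, False, True],
        "discrepant_merge_position": [False, True, False, False],
    }
)

consolidated = consolidate_issues(raw)

# SNPs with a given issue are then a simple boolean filter.
discrepant_xy = consolidated[consolidated["discrepant_XY"]]
print(consolidated)
print(discrepant_xy)
```

Here the two equivalent `rs1` rows at position 100 collapse into one row with both `duplicate_rsid` and `discrepant_merge_position` set, while the `rs1` row at position 105 stays separate because its `pos` differs.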