SNPs with missing genotypes result in all-NA GWAS output #8

@xiaolongsong00

Hello,

I am using FarmCPUpp with my own genotype (GD), genotype map (GM), and phenotype (Y) files.
When I run farmcpu(), some SNPs in the GWAS output have NA for all of their results.

From my observation, these NA results happen whenever the corresponding SNP in the GD file contains at least one missing value (NA).
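
For reference, here is a minimal sketch of how the affected SNPs can be listed. It uses a small made-up matrix (the names `gd`, `snp1` to `snp3`, and `ind1` to `ind4` are illustrative only) rather than my real data, and it assumes missing genotypes are coded as NA.

```r
# Toy numeric genotype matrix: rows are individuals, columns are SNPs.
# All names and values here are invented for illustration.
gd <- matrix(c(0, 1, 2, 0,      # snp1: complete
               2, NA, 1, 1,     # snp2: one missing genotype
               0, 0, NA, NA),   # snp3: two missing genotypes
             nrow = 4,
             dimnames = list(paste0("ind", 1:4), c("snp1", "snp2", "snp3")))

# Count missing genotypes per SNP (column).
na_per_snp <- colSums(is.na(gd))

# Names of SNPs with at least one missing genotype.
snps_with_na <- names(na_per_snp)[na_per_snp > 0]
print(snps_with_na)
```

With a file-backed big.matrix such as myGD, the same check should work on `myGD[, ]`, which pulls the data into an ordinary R matrix, provided it fits in memory.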

My questions

  • Does FarmCPUpp currently support handling SNPs with missing values?
  • Should missing values in the numeric GD file be represented as NA, -9, or something else?
  • If missing data are not supported, what is the recommended pre-processing approach (e.g. imputation, filtering)?
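
To make the last question concrete, this is the kind of pre-processing I have in mind: drop SNPs with too many missing genotypes, then fill the remaining gaps with the per-SNP mean. It is only a sketch of a generic approach, not something I know FarmCPUpp recommends. The function `filter_and_impute`, its arguments, and the 0.25 threshold are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of FarmCPUpp): filter SNPs by missing rate,
# then mean-impute whatever is still missing.
filter_and_impute <- function(gd, max_missing = 0.25, missing_code = NULL) {
  # Optionally recode a sentinel value such as -9 to NA first.
  if (!is.null(missing_code)) {
    gd[!is.na(gd) & gd == missing_code] <- NA
  }
  # Keep only SNPs whose fraction of missing genotypes is at or below the threshold.
  miss_rate <- colMeans(is.na(gd))
  gd <- gd[, miss_rate <= max_missing, drop = FALSE]
  # Replace each remaining NA with the mean of the observed genotypes for that SNP.
  for (j in seq_len(ncol(gd))) {
    na_idx <- is.na(gd[, j])
    if (any(na_idx)) {
      gd[na_idx, j] <- mean(gd[, j], na.rm = TRUE)
    }
  }
  gd
}

# Toy data with -9 as the missing code (values invented for illustration).
gd <- matrix(c(0, 1, 2, 0,
               2, -9, 1, 1,
               0, 0, -9, -9),
             nrow = 4,
             dimnames = list(paste0("ind", 1:4), c("snp1", "snp2", "snp3")))

clean <- filter_and_impute(gd, max_missing = 0.25, missing_code = -9)
print(clean)
```

If SNPs are dropped this way, the GM table would presumably need the same subset, for example `myGM[myGM$SNP %in% colnames(clean), ]`, so that GD and GM still describe the same markers.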

Example of my workflow

library(bigmemory)
library(FarmCPUpp)

# Phenotypes (Y) and SNP map (GM)
myY <- read.table("taxa.txt", header = TRUE, stringsAsFactors = FALSE)
myGM <- read.table("mdp_SNP_information.txt", header = TRUE, stringsAsFactors = FALSE)

# Numeric genotypes (GD) as a file-backed big.matrix; columns are named after the SNPs in GM
myGD <- read.big.matrix("mdp_numeric.txt",
                        type = "double", sep = "\t", header = TRUE,
                        col.names = myGM$SNP, ignore.row.names = FALSE,
                        has.row.names = TRUE,
                        backingfile = "mdp_numeric.bin",
                        descriptorfile = "mdp_numeric.desc")

# Run the GWAS with default settings
myResults <- farmcpu(Y = myY, GD = myGD, GM = myGM)



Could you please clarify:

What is the correct way to represent missing genotypes in the numeric GD file?

If missing values are not supported, is imputation required before running FarmCPUpp?

Thank you for your time and for maintaining this package!
