
Discussion R scripts NIPT

mdijkstra edited this page Oct 6, 2014 · 1 revision
  • Currently the data are not divided into a training set and a test set. The dispersion of the z-scores is therefore underestimated.
  • Arbitrary values. With a train/test split we could optimise them based on the data:
      • bin size (50k)
      • chi^2 (3.5)
      • 1.15 * CV
      • 4 models
      • each model has 4 predictors (could use adjusted R^2 or Bayesian variable selection)
  • CV = sigma / mu is a biased estimate for small samples; according to http://en.wikipedia.org/wiki/Coefficient_of_variation it should be (1 + 1/(4n)) * sigma / mu for normally distributed data, with n the sample size
  • CV_observed is based on ratios of observed vs. predicted fractions of reads, whereas CV_theoretical is based on the absolute number of reads in the sample. These seem to be different units?!
  • According to http://en.wikipedia.org/wiki/Standard_score, the Z-score has the SD in the denominator, not the CV! So I guess you mean SD_theoretical and SD_observed?
  • Z-scores seem to be bimodal or even trimodal, i.e., not normally distributed! How come?
  • The different prediction models should (1) agree and (2) be weighted to determine the end result
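
To make the points about the CV correction and the Z-score denominator concrete, here is a minimal sketch, written in Python rather than R. It is not taken from the NIPT scripts: the function names, the train/test split and the read fractions are all hypothetical and made up for illustration. It computes the small-sample corrected CV, a Z-score that uses the mean and SD of a separate training set, and the same score rewritten as (ratio - 1) / CV.

```python
import math
import statistics


def corrected_cv(values):
    """CV estimate with the small-sample correction (1 + 1/(4n)) * s / mean."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    return (1 + 1 / (4 * n)) * sd / mean


def z_score(value, reference):
    """Standard score of `value` using mean and SD of a separate reference set."""
    mean = statistics.fmean(reference)
    sd = statistics.stdev(reference)
    return (value - mean) / sd


def z_score_from_ratio(value, reference):
    """Same score written as (ratio - 1) / CV, using the uncorrected CV."""
    mean = statistics.fmean(reference)
    cv = statistics.stdev(reference) / mean
    return (value / mean - 1) / cv


# Hypothetical chromosome read fractions (made-up numbers, not real data).
train = [0.0130, 0.0128, 0.0131, 0.0129, 0.0132, 0.0130]
test = [0.0131, 0.0139]

print("corrected CV (train):", corrected_cv(train))
for x in test:
    # The test samples play no part in estimating the mean or the SD.
    print(x, z_score(x, train), z_score_from_ratio(x, train))
```

The two score functions agree because (x - mu) / sigma = (x / mu - 1) / (sigma / mu). A CV in the denominator is therefore only consistent with the standard score when the numerator is a relative deviation (a ratio minus one) and both are computed from the same quantity; whether that holds in the scripts is exactly the question raised above.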
