Discussion R scripts NIPT
- Currently the data are not divided into a training and a test set; as a result, the dispersion of the z-score is underestimated (see the train/test sketch after this list).
- Several values are currently arbitrary; given a train/test split they could be optimised on the data:
  - bin size (50k)
  - chi^2 (3.5)
  - 1.15 * CV
  - 4 models
  - 4 predictors per model (adjusted R^2 or Bayesian variable selection could be used instead; see the model-weighting sketch after this list)
- CV = sigma / mu is biased; according to http://en.wikipedia.org/wiki/Coefficient_of_variation the corrected estimate is (1 + 1/(4n)) * sigma / mu (see the CV sketch after this list).
- CV_observed is based on ratios of observed vs. predicted fractions of reads, while CV_theoretical is based on the absolute number of reads in the sample. These seem to be different units?!
- According to http://en.wikipedia.org/wiki/Standard_score, the z-score has the SD in the denominator, not the CV! So I guess SD_theoretical and SD_observed are meant? (See the z-score sketch after this list.)
- The z-scores seem to be bimodal or even trimodal, i.e. not normally distributed! How come? (The z-score sketch below includes a quick distribution check.)
- The different prediction models should (1) agree with each other and (2) be weighted to determine the end result (see the model-weighting sketch after this list).
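
A minimal sketch of the train/test split suggested in the first point. The data frame `bins` and its columns `sample_id`, `observed` and `predicted`, as well as the 70/30 split, are assumptions for illustration, not names taken from the current scripts.

```r
# Sketch only: assumes a data.frame `bins` with one row per bin and the
# (hypothetical) columns sample_id, observed and predicted.
set.seed(1)
sample_ids <- unique(bins$sample_id)
train_ids  <- sample(sample_ids, size = floor(0.7 * length(sample_ids)))

train <- bins[bins$sample_id %in% train_ids, ]
test  <- bins[!(bins$sample_id %in% train_ids), ]

# Fit the prediction model(s) on the training samples only, then estimate
# the z-score dispersion on the held-out samples, so it is not
# underestimated by reusing the same data for fitting and evaluation.
ratios <- test$observed / test$predicted
z_sd   <- sd(ratios)
```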
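
A small sketch of the bias-corrected coefficient of variation from the Wikipedia page cited above; `x` can be any numeric vector, e.g. read counts per bin.

```r
# Corrected CV per http://en.wikipedia.org/wiki/Coefficient_of_variation:
# (1 + 1/(4n)) * sd(x) / mean(x); the plain sd/mean estimate is biased low.
cv_corrected <- function(x) {
  n <- length(x)
  (1 + 1 / (4 * n)) * sd(x) / mean(x)
}
```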
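
A sketch of the z-score with the standard deviation in the denominator, plus a quick visual check for the multimodality noted above; the vector `z` of computed z-scores is assumed to exist.

```r
# z-score per http://en.wikipedia.org/wiki/Standard_score:
# the denominator is the standard deviation, not the CV.
z_score <- function(x, mu, sigma) (x - mu) / sigma

# Quick look at whether the z-scores are unimodal and roughly normal.
# `z` is assumed to be the vector of z-scores produced by the pipeline.
hist(z, breaks = 50, main = "Z-score distribution")
plot(density(z))
qqnorm(z); qqline(z)
```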
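
A sketch of one way to make the four models both agree and contribute with weights, here weighting by adjusted R^2 as one possible choice. The matrix `pred` (one column of predictions per model) and the vector `adj_r2` are assumptions for illustration.

```r
# pred:   n x 4 matrix, one column of predictions per model (assumed)
# adj_r2: adjusted R^2 of each of the four models (assumed)
weights  <- adj_r2 / sum(adj_r2)
combined <- as.vector(pred %*% weights)   # weighted end result

# Simple agreement check: flag samples where the four models disagree.
disagreement <- apply(pred, 1, sd)
suspect <- which(disagreement > 2 * median(disagreement))
```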