This repository contains data used in the vignettes of coRdon, the R/Bioconductor package for codon usage analysis. Due to size limitations, these files are not part of the package's own repository.
The HD59.fasta and LD94.fasta files contain DNA sequences from the gut
metagenomes of a healthy individual and a liver cirrhosis patient, respectively.
These data are a randomly selected part of a larger set of sequence data from
the extensive gut microbiome-wide association study (Qin et al., 2014),
available from the European Nucleotide Archive (ENA) under the accession number
ERP005860. Sequences in each sample were previously quality-filtered and
assembled, and the assemblies were used to predict ORFs, which were then
annotated with KO (KEGG Orthology) functions (Fabijanić and Vlahoviček, 2016).
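
As a minimal sketch of how these files might be loaded for codon usage analysis, the snippet below wraps coRdon's `readSet()` and `codonTable()` in a small helper. The helper name `load_codon_table` is hypothetical and not part of coRdon, and the local file paths assume the FASTA files have already been downloaded into the working directory. Whether `getKO()` returns annotations depends on the KO identifiers being present in the sequence headers, which is assumed here.

```r
# Sketch only: load_codon_table() is a hypothetical helper, not a coRdon function.
# Assumes the coRdon package is installed from Bioconductor.
load_codon_table <- function(fasta_path) {
  # readSet() reads the sequences from a FASTA file into a DNAStringSet
  dna <- coRdon::readSet(file = fasta_path)
  # codonTable() tabulates codon counts for each sequence
  coRdon::codonTable(dna)
}

# Run only if the files are present in the working directory (assumed paths).
if (file.exists("HD59.fasta") && file.exists("LD94.fasta")) {
  HD59 <- load_codon_table("HD59.fasta")
  LD94 <- load_codon_table("LD94.fasta")
  # Inspect the first few KO annotations, assuming they were parsed from headers
  print(head(coRdon::getKO(HD59)))
}
```

The functions are called with the `coRdon::` prefix so that the helper can be defined without attaching the package first.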
Qin, N., Yang, F., Li, A., Prifti, E., Chen, Y., Shao, L., Guo, J., Le Chatelier, E., Yao, J., Wu, L., Zhou, J., Ni, S., Liu, L., Pons, N., Batto, J. M., Kennedy, S. P., Leonard, P., Yuan, C., Ding, W., Chen, Y., Hu, X., Zheng, B., Qian, G., Xu, W., Ehrlich, S. D., Zheng, S. and Li, L. (2014) Alterations of the human gut microbiome in liver cirrhosis. Nature, 513(7516), pp. 59–64.
Fabijanić, M. and Vlahoviček, K. (2016) Big Data, Evolution, and Metagenomes: Predicting Disease from Gut Microbiota Codon Usage Profiles. In: Carugo, O. and Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol. 1415. Humana Press, New York, NY.