binGO-GS

Genomic Prediction of Arabidopsis thaliana Using SNP Subset Selected by Integrating GO-terms and Combinational Optimization in Bins

Functional characteristics

We propose an improved genomic selection (GS) method called binGO-GS. It integrates prior biological knowledge from gene ontology (GO) with a novel bin-based variable selection algorithm to identify a subset of SNP markers that affect phenotypic variation. The aim is to reduce genotyping costs and improve genomic prediction accuracy. We evaluated binGO-GS on quantitative traits from two A. thaliana datasets with nearly 1,000 samples and over half a million SNPs. With support vector regression as the prediction model, the marker subset selected by binGO-GS significantly improved prediction accuracy across all traits, both compared with using all markers (p = 0.0134) and compared with using randomly selected markers (p = 4.54×10⁻²⁷). In each of six other reference models, the binGO-GS subset also significantly improved accuracy compared with using all markers in that model. The selected markers are consistent with the polygenic hypothesis that many minor-effect genes underlie complex quantitative traits, and the markers selected for identical or similar morphological traits show similar trends in quantity and distribution.
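To make the idea of bin-based selection more concrete, the Python sketch below groups SNPs into fixed-width physical bins and keeps the highest-scoring markers in each bin. It is only an illustration under stated assumptions: the bin width, the per-SNP scores, and the function names are hypothetical, and the top-k-per-bin rule is a simple stand-in, not the combinational optimization that binGO-GS performs within bins.

```python
from collections import defaultdict


def assign_bins(snps, bin_size):
    """Group SNPs into fixed-width physical bins.

    snps: iterable of (snp_id, chrom, pos) tuples, pos in base pairs.
    Returns {(chrom, bin_index): [snp_id, ...]} with SNPs in input order.
    """
    bins = defaultdict(list)
    for snp_id, chrom, pos in snps:
        bins[(chrom, pos // bin_size)].append(snp_id)
    return dict(bins)


def top_k_per_bin(bins, scores, k):
    """Keep the k highest-scoring SNPs from each bin.

    Hypothetical selection rule for illustration only; binGO-GS uses
    combinational optimization within bins instead.
    """
    selected = []
    for key in sorted(bins):
        ranked = sorted(bins[key], key=lambda s: scores[s], reverse=True)
        selected.extend(ranked[:k])
    return selected


if __name__ == "__main__":
    # Toy SNPs: (id, chromosome, position); scores are made-up importance values.
    snps = [("s1", "1", 100), ("s2", "1", 900), ("s3", "1", 1500), ("s4", "2", 50)]
    scores = {"s1": 0.2, "s2": 0.7, "s3": 0.1, "s4": 0.4}
    bins = assign_bins(snps, bin_size=1000)
    print(bins)
    print(top_k_per_bin(bins, scores, k=1))
```

In a real analysis the scores might come from marginal association statistics, but that choice, like `bin_size=1000`, is an assumption of this example rather than a setting taken from binGO-GS.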

Installation procedure

Getting Started

Prerequisites

PLINK 1.7
RStudio
GTF annotation file
R packages:
clusterProfiler
org.At.tair.db
data.table
rrBLUP

Steps to Install

Clone the repository:

git clone https://github.com/ZhijunBioinf/binGO-GS.git
cd binGO-GS

Install dependencies:

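pip install -r requirements.txt

The R packages listed under Prerequisites are not installed by pip and must be installed from within R; clusterProfiler and org.At.tair.db are distributed through Bioconductor rather than CRAN.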

Usage

Organize your genotypic data, phenotypic data, and GTF annotation file:

genotypic_train.csv
phenotypic_test.csv
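Integrating GO terms presumably requires linking SNPs to annotated genes, which is what the GTF file is for. The Python sketch below shows one way to do that: it reads gene records from GTF lines (tab-separated, 9 columns, 1-based inclusive coordinates, `gene_id` in the attribute column) and keeps SNPs that fall inside a given set of target genes. The gene IDs, coordinates, and function names are hypothetical. In binGO-GS the target genes would presumably come from GO annotations (clusterProfiler and org.At.tair.db are listed as prerequisites), which is not shown here. The sketch also assumes chromosome names in the GTF match those in the genotype data.

```python
import re
from collections import defaultdict

# Extracts the value of gene_id "..." from the GTF attribute column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def load_gene_intervals(gtf_lines, feature="gene"):
    """Return {chrom: [(start, end, gene_id), ...]} from GTF text lines.

    Only records whose third column equals `feature` are kept.
    Coordinates are 1-based and inclusive, as in the GTF format.
    """
    genes = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        match = GENE_ID.search(cols[8])
        if match:
            genes[cols[0]].append((int(cols[3]), int(cols[4]), match.group(1)))
    return dict(genes)


def snps_in_target_genes(snps, genes, target_gene_ids):
    """Keep SNPs (snp_id, chrom, pos) lying inside any gene in target_gene_ids."""
    kept = []
    for snp_id, chrom, pos in snps:
        for start, end, gene_id in genes.get(chrom, []):
            if gene_id in target_gene_ids and start <= pos <= end:
                kept.append(snp_id)
                break  # one overlapping target gene is enough
    return kept


if __name__ == "__main__":
    toy_gtf = [
        "# toy annotation",
        "\t".join(["1", "toy", "gene", "1000", "5000", ".", "+", ".", 'gene_id "geneA";']),
        "\t".join(["1", "toy", "gene", "6000", "9000", ".", "-", ".", 'gene_id "geneB";']),
    ]
    genes = load_gene_intervals(toy_gtf)
    snps = [("s1", "1", 4000), ("s2", "1", 7000), ("s3", "1", 10000)]
    print(snps_in_target_genes(snps, genes, {"geneA"}))
```

The per-SNP linear scan keeps the sketch short; for half a million SNPs, sorting the intervals per chromosome and searching them with `bisect` would be faster. Some GTF files contain only exon or CDS records, in which case `feature` must be set accordingly.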

Train the Model

To train the model, run:

Rscript Explore_num_SNPS.R
Rscript Find_target_subset.R
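Because rrBLUP is among the prerequisites, the Python sketch below illustrates how the prediction accuracy of a marker subset could be compared with that of all markers. It fits RR-BLUP-style ridge regression with a fixed shrinkage parameter and reports the Pearson correlation between predicted and observed phenotypes on a held-out set, a common accuracy measure in GS. It is not a description of what Explore_num_SNPS.R or Find_target_subset.R do. The fixed `lam`, the simulated genotypes, and the "selected" subset are assumptions; rrBLUP's `mixed.solve` estimates the variance components by REML instead of fixing their ratio.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Ridge regression on marker matrix X (n x p) with an unpenalized intercept.

    For a fixed lam = sigma_e^2 / sigma_u^2 this yields the RR-BLUP marker
    effects; centering X and y is equivalent to leaving the mean unpenalized.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    u = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ u
    return intercept, u


def prediction_accuracy(X_train, y_train, X_test, y_test, lam=1.0):
    """Pearson correlation between predicted and observed test phenotypes."""
    intercept, u = fit_ridge(X_train, y_train, lam)
    y_pred = intercept + X_test @ u
    return float(np.corrcoef(y_pred, y_test)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 500
    X = rng.integers(0, 3, size=(n, p)).astype(float)  # simulated 0/1/2 genotypes
    effects = np.zeros(p)
    effects[:20] = rng.normal(0, 1, 20)  # 20 simulated causal markers
    y = X @ effects + rng.normal(0, 2, n)
    train, test = np.arange(150), np.arange(150, 200)
    subset = np.arange(50)  # hypothetical "selected" marker subset
    acc_all = prediction_accuracy(X[train], y[train], X[test], y[test], lam=10.0)
    acc_sub = prediction_accuracy(X[train][:, subset], y[train],
                                  X[test][:, subset], y[test], lam=10.0)
    print(f"all markers: r = {acc_all:.3f}; subset: r = {acc_sub:.3f}")
```

With real data, the subset would be the markers returned by the selection step, and accuracies would normally be averaged over repeated cross-validation splits rather than taken from a single split.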

Contributing

This project was developed by:

Qingfang Ba (1468222359@qq.com) - Implementation
Zhijun Dai (daizhijun@hunau.edu.cn) - Supervisor
We welcome contributions from the community! Feel free to submit pull requests or open issues.
