This repository contains data and analysis code supporting the manuscript:
“Genetic dissection of southern corn leaf blight resistance in sweet corn through genome-wide association studies and genomic selection.”
- 1.Data/ – Raw and processed datasets (phenotypic data, BLUEs, covariates, genotypic calls) used in all analyses.
- 2.Codes/ – Scripts used for phenotypic modeling, genome-wide association studies (GWAS), and genomic selection (GS) analyses described in the manuscript.
- 3.Results/ – Outputs from GWAS and genomic selection analyses, including association statistics, diagnostic plots, and model summaries.
- 4.LeafCV/ – Computer vision–based pipeline for estimating diseased leaf area (DLA) from images of SCLB-infected leaves.
| File name | Description |
|---|---|
| `19Pheno_SCLB.csv` | Phenotypic data for southern corn leaf blight (SCLB) from the 2019 field trial, including plot-level disease scores and trial metadata. |
| `20Pheno_SCLB.csv` | Phenotypic data for SCLB from the 2020 field trial. |
| `21Pheno_SCLB.csv` | Phenotypic data for SCLB from the 2021 field trial. |
| `23Pheno_SCLB.csv` | Phenotypic data for SCLB from the 2023 field trial. |
| `24Pheno_SCLB.csv` | Phenotypic data for SCLB from the 2024 field trial. |
| `dat_mtm.csv` | Combined multi-trial phenotype matrix across years, formatted for mixed-model and genomic selection analyses. |
| `dat2019.csv` | Phenotypic data for SCLB (manual scoring) and DLA (computer vision–based) from the 2019 field trial. |
| `21Pheno.csv` | Phenotypic data for SCLB (manual scoring) and DLA (computer vision–based) from the 2021 field trial. |
| `2019_BLUEs.csv` | Single-environment BLUEs for SCLB (visual scoring) and DLA (CV-based) from the 2019 field trial. |
| `2020_BLUEs.csv` | Single-environment BLUEs for SCLB (visual scoring) from the 2020 field trial. |
| `2021_BLUEs.csv` | Single-environment BLUEs for SCLB (visual scoring) and DLA (CV-based) from the 2021 field trial. |
| `2023_BLUEs.csv` | Single-environment BLUEs for SCLB (visual scoring) from the 2023 field trial. |
| `2024_BLUEs.csv` | Single-environment BLUEs for SCLB (visual scoring) from the 2024 field trial. |
| `5yr_SCLB_BLUEs_v2.csv` | Multi-environment BLUEs for SCLB (visual scoring) across the 2019, 2020, 2021, 2023, and 2024 field trials. |
| `2yr_DLA_Log_BLUEs_v2.csv` | Multi-environment BLUEs for DLA (CV-based) across the 2019 and 2021 field trials. |
| `BLUEs_for_EMMAX/` | BLUEs formatted as input files for GWAS using EMMAX (MLM). |
| `BLUEs_for_GAPIT/` | BLUEs formatted as input files for GWAS using GAPIT (FarmCPU). |
| `sweetcallsCV/` | Covariates for GWAS analyses (e.g., su1, sh2, bt1, se1). |
| `BLUE4Env.RData` | R data object containing BLUEs structured by environment for multi-environment modeling. |
| `ECData_WMat.RData` | Environmental covariates and W-matrix used in environment-aware genomic prediction models. |
| `Ia453_sweetcap_v0.4_16M.BN.kinf` | Kinship matrix derived from the SweetCAP ~16M SNP dataset for EMMAX-based GWAS. |
| `SNP_data.Rdata` | R data object containing genotype data for the 128,202 SNPs used in genomic selection. |
| `emmax_cov_3COV3PC.tsv` | Covariate file for EMMAX GWAS analyses, including three biological covariates and three principal components. |
| `relMatrix.RData` | Genomic relationship matrix used in GBLUP and other mixed-model analyses. |
Notes:
- These files were generated using scripts in `2.Codes/`.
- RData objects preserve internal data structures required for reproducible downstream analyses.
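The per-trial files listed above encode the trial year in two ways: a two-digit prefix (`19Pheno_SCLB.csv`, `21Pheno.csv`) or a four-digit year (`2019_BLUEs.csv`, `dat2019.csv`). When looping over trials, a small helper can recover the year from the file name. The following is a minimal Python sketch, not part of the repository; the function name `trial_year` is hypothetical, and it assumes the naming patterns shown in the table hold for every per-trial file.

```python
import re
from typing import Optional

# Naming patterns taken from the file table:
#   four-digit year anywhere in the name: 2019_BLUEs.csv, dat2019.csv
#   two-digit prefix before "Pheno":      19Pheno_SCLB.csv, 21Pheno.csv
_FOUR_DIGIT_YEAR = re.compile(r"(?<!\d)(20\d{2})(?!\d)")
_TWO_DIGIT_PREFIX = re.compile(r"^(\d{2})Pheno")


def trial_year(filename: str) -> Optional[int]:
    """Return the trial year encoded in a file name, or None if there is none.

    Multi-year files such as 5yr_SCLB_BLUEs_v2.csv carry no single year
    and therefore return None.
    """
    match = _FOUR_DIGIT_YEAR.search(filename)
    if match:
        return int(match.group(1))
    match = _TWO_DIGIT_PREFIX.match(filename)
    if match:
        return 2000 + int(match.group(1))
    return None
```

Files that span several trials or are not tied to a trial (for example `dat_mtm.csv` or the kinship matrix) yield `None`, so they can be filtered out before grouping by year.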
| File name | Description |
|---|---|
| `1.EMMAX_GWAS2.sh` | Shell script for GWAS using EMMAX with a linear mixed model (LMM) and kinship correction. |
| `gapitmodv.R` | R script implementing GWAS using GAPIT with the FarmCPU model. |
| `SCLB_BLUEs_models.R` | R script for fitting mixed models and estimating BLUEs for SCLB and DLA_Log phenotypes across environments. |
| `function.R` | Modified GAPIT `function.R` to accommodate ~16M SNPs from the SweetCAP whole-genome resequencing dataset. |
| `RUNME_GetEC_W_BayesB_CV*_revisedforpaper.R` | R scripts for genomic prediction using BayesB, incorporating environmental covariates under CV0 and CV1 schemes. |
| `RUNME_GetEC_W_GBLUP_CV*_revisedforpaper.R` | R scripts for genomic prediction using GBLUP, incorporating environmental covariates under CV0 and CV1 schemes. |
Notes:
- `RUNME` scripts define specific modeling and cross-validation scenarios.
- Cross-validation schemes follow standard genomic prediction benchmarking frameworks (CV0, CV1).
- Scripts rely on input files prepared in `1.Data/`.
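For readers unfamiliar with the CV0 and CV1 labels, the sketch below shows how the two schemes are commonly defined in the genomic prediction literature: CV1 predicts genotypes that were never phenotyped (whole genotypes are held out), while CV0 predicts previously tested genotypes in an environment that was not observed (one environment is held out at a time). This is an illustrative Python sketch under those assumptions, not the repository's R implementation; the function names `cv1_folds` and `cv0_folds` and the `(genotype, environment)` record layout are hypothetical, and the `RUNME` scripts may differ in detail.

```python
import random
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (genotype ID, environment label)
Split = Tuple[List[Record], List[Record]]  # (training records, test records)


def cv1_folds(records: List[Record], k: int = 5, seed: int = 0) -> List[Split]:
    """CV1 sketch: assign whole genotypes to folds.

    Every record of a test genotype is removed from training, so the
    model never sees a phenotype for the genotypes it predicts.
    """
    genotypes = sorted({g for g, _ in records})
    rng = random.Random(seed)
    rng.shuffle(genotypes)
    fold_of = {g: i % k for i, g in enumerate(genotypes)}
    splits = []
    for fold in range(k):
        train = [r for r in records if fold_of[r[0]] != fold]
        test = [r for r in records if fold_of[r[0]] == fold]
        splits.append((train, test))
    return splits


def cv0_folds(records: List[Record]) -> Dict[str, Split]:
    """CV0 sketch: leave one environment out at a time.

    Genotypes in the held-out environment may still appear in the
    training set through their records from other environments.
    """
    environments = sorted({e for _, e in records})
    return {
        env: (
            [r for r in records if r[1] != env],
            [r for r in records if r[1] == env],
        )
        for env in environments
    }
```

In this repository's setting, the environments would correspond to the field-trial years, so a CV0 split holds out one year at a time.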
This directory contains final outputs from GWAS and genomic selection analyses.
| Directory | Description |
|---|---|
| `GAPIT_output/` | GWAS results generated using GAPIT (FarmCPU), including Manhattan plots, QQ plots, and SNP association tables. |
| `sweetcap_16M_3COV3PC/` | GWAS results from EMMAX / LMM-based analyses using the SweetCAP ~16M SNP dataset, adjusted for three covariates and three principal components. |
| `GenomicSelection/` | Results from BayesB and GBLUP genomic prediction models evaluated under CV0 and CV1 schemes. |
| `README.md` | Detailed documentation describing result structure and interpretation. |
Notes:
- Results correspond to phenotypes and models described in the manuscript.
- Only final outputs are retained; intermediate files have been removed.
This directory contains a computer vision–based pipeline for estimating diseased leaf area (DLA) from images of SCLB-infected sweet corn leaves.
- Scripts and inputs are organized by year (e.g., `Spring19`, `Spring21`).
- DLA estimates generated here are incorporated into phenotypic datasets in `1.Data/`.
- These phenotypes are subsequently used in GWAS and genomic selection analyses.
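As a rough illustration of what a DLA estimate represents, the Python sketch below computes the percentage of leaf pixels classified as lesion, given two binary masks for the same image. It is a simplified stand-in, not the pipeline in this directory: the function name `diseased_leaf_area`, the mask representation (nested sequences of 0/1), and the assumption that segmentation has already been done are all hypothetical. Any transformation applied downstream (such as the one behind the `DLA_Log` phenotype) is not shown.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixel labels


def diseased_leaf_area(leaf_mask: Mask, lesion_mask: Mask) -> float:
    """Return diseased leaf area as a percentage of total leaf area.

    Only lesion pixels that fall inside the leaf mask are counted, so
    background pixels mislabeled as lesion do not inflate the estimate.
    """
    leaf_pixels = 0
    lesion_pixels = 0
    for leaf_row, lesion_row in zip(leaf_mask, lesion_mask):
        for is_leaf, is_lesion in zip(leaf_row, lesion_row):
            if is_leaf:
                leaf_pixels += 1
                if is_lesion:
                    lesion_pixels += 1
    if leaf_pixels == 0:
        raise ValueError("leaf mask contains no leaf pixels")
    return 100.0 * lesion_pixels / leaf_pixels
```

Restricting the count to pixels inside the leaf mask keeps the estimate bounded between 0 and 100.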
For additional details, please refer to the README files within each subdirectory or the associated manuscript.