Linfeng Chen, Shouping Yang, Susan Araya, Charles Quigley, Earl Taliercio, Rouf Mian, James E Specht, Brian W Diers, Qijian Song.
Abstract
KEY MESSAGE: Software that achieves high imputation accuracy in soybean was identified. The imputed dataset could significantly narrow the genomic intervals controlling traits, thus greatly improving the efficiency of candidate gene identification.
Genotype imputation is a strategy to increase the marker density of existing datasets without additional genotyping. We compared the imputation performance of BEAGLE 5.0, IMPUTE 5 and AlphaPlantImpute and tested software parameters that may help to improve imputation accuracy in soybean populations. Several factors, including marker density, extent of linkage disequilibrium (LD) and minor allele frequency (MAF), were examined for their effects on imputation accuracy across the different software. Our results showed that AlphaPlantImpute had higher imputation accuracy than BEAGLE 5.0 or IMPUTE 5 in each soybean family tested, especially when the study progeny were genotyped with an extremely low number of markers. LD extent, MAF and reference panel size were positively correlated with imputation accuracy; a minimum of 50 markers per chromosome and SNPs with MAF > 0.2 in the soybean lines were required to avoid a significant loss of imputation accuracy. Using the software, we imputed 5176 lines of the soybean nested mapping population (NAM) with the high-density markers of the 40 parents. The dataset containing 423,419 markers for 5176 lines and 40 parents was deposited at SoyBase. The imputed NAM dataset was further examined for its improvement of mapping quantitative trait loci (QTL) controlling soybean seed protein content. Most of the QTL identified were at identical or similar positions in the initial and imputed datasets; however, QTL intervals were greatly narrowed. The resulting genotypic dataset of the NAM population will facilitate QTL mapping of traits and downstream applications. The information will also help to improve genotype imputation accuracy in self-pollinated crops.
Year: 2022 PMID: 35275252 PMCID: PMC9110473 DOI: 10.1007/s00122-022-04070-7
Source DB: PubMed Journal: Theor Appl Genet ISSN: 0040-5752 Impact factor: 5.574
Fig. 1 The effect of the parameter ne on imputation accuracy for BEAGLE 5.0 (a) and IMPUTE 5 (b). The tests were conducted for the study panel with 50 SNPs per chromosome. The red and green squares indicate imputation accuracy under the default and adapted settings, respectively
Fig. 2 Genotype imputation accuracy based on different numbers of SNPs in study panels. BEAGLE 5.0 and IMPUTE 5 were used for population-based imputation, while AlphaPlantImpute was used for bi-parental-based imputation. The imputation accuracy of AlphaPlantImpute with and without filling is displayed
Fig. 3 Imputation accuracy and missing rate under genotype probability (GP) cutoff values from 0.5 to 0.9 for BEAGLE 5.0 and IMPUTE 5. GP = 0 means no GP filtering. The tests were conducted for study panels with 5, 50 and 160 SNPs per chromosome
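The trade-off shown in Fig. 3 can be illustrated with a minimal sketch. It assumes that accuracy is the proportion of retained calls matching the true genotype and that a call is set to missing when its highest genotype probability falls below the cutoff; the function name `filter_by_gp` and the toy data are hypothetical and not taken from the study.

```python
def filter_by_gp(calls, max_gp, truth, cutoff):
    """Return (accuracy, missing_rate) after dropping calls whose best GP < cutoff.

    calls   -- imputed genotype calls (e.g. 0/1/2 allele dosages)
    max_gp  -- highest genotype probability reported for each call
    truth   -- true genotypes, known here because the data were masked on purpose
    cutoff  -- GP threshold; 0 keeps every call (no filtering)
    """
    kept = [(c, t) for c, g, t in zip(calls, max_gp, truth) if g >= cutoff]
    missing_rate = 1 - len(kept) / len(calls)
    # Accuracy is computed only over the calls that survive the filter.
    accuracy = sum(c == t for c, t in kept) / len(kept) if kept else float("nan")
    return accuracy, missing_rate


# Toy example (made-up values): six imputed calls at one site.
calls = [0, 2, 1, 0, 2, 2]
max_gp = [0.99, 0.55, 0.70, 0.95, 0.85, 0.60]
truth = [0, 0, 1, 0, 2, 1]

for cutoff in (0.0, 0.7, 0.9):
    acc, miss = filter_by_gp(calls, max_gp, truth, cutoff)
    print(f"GP >= {cutoff}: accuracy={acc:.2f}, missing rate={miss:.2f}")
```

In this toy data a stricter cutoff removes the low-confidence errors but leaves more genotypes missing, which is the kind of trade-off the figure examines.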
Fig. 4 The effect of linkage disequilibrium and minor allele frequency on imputation accuracy for BEAGLE 5.0 and IMPUTE 5. The tests were conducted for the study panel with 50 SNPs per chromosome. Linkage disequilibrium was measured by r² between the imputed marker and the closest marker in the study panel. The minor allele frequency was calculated for the imputed markers
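As a rough illustration of the two quantities in Fig. 4, the sketch below computes the minor allele frequency of a marker and r² between two markers. It assumes genotypes are coded as 0/1/2 allele dosages and takes r² as the squared Pearson correlation of those dosages, a common approximation; the record does not state which estimator the authors used, and the function names and data are hypothetical.

```python
from statistics import mean


def maf(dosages):
    """Minor allele frequency from 0/1/2 allele dosages of diploid lines."""
    p = mean(dosages) / 2  # frequency of the counted allele
    return min(p, 1 - p)


def r_squared(x, y):
    """Squared Pearson correlation between the dosages of two markers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # r2 is undefined for a monomorphic marker
    return cov * cov / (vx * vy)


# Toy example (made-up values): eight inbred lines, homozygous calls only.
study_marker = [0, 0, 2, 2, 0, 2, 2, 0]    # closest genotyped marker
imputed_marker = [0, 0, 2, 2, 0, 2, 0, 0]  # marker being imputed

print("MAF of imputed marker:", maf(imputed_marker))
print("r2 with closest study marker:", r_squared(study_marker, imputed_marker))
```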
Fig. 5 Imputation performance for a germplasm population. (a) To simulate study panels, 10% or 50% of SNPs were randomly masked in every entry of 100 accessions drawn from 500 soybean accessions. Different numbers of individuals were used to study the effect of reference panel size. (b) Genotype data with missing levels from 50 to 10% were simulated to study imputation performance for datasets with sporadic missing data
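The masking step described for Fig. 5 might be simulated along the following lines. This is only a sketch under stated assumptions: each accession is a plain list of genotype calls, masked sites are marked with `None`, and a fixed fraction of sites is hidden per accession. The helper `mask_genotypes` and its `seed` parameter are hypothetical.

```python
import random


def mask_genotypes(genotypes, fraction, seed=0):
    """Hide a random fraction of one accession's SNP calls.

    Returns the masked list (None marks a hidden call) and the set of
    masked indices, so imputed values can later be compared with the truth.
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    n_mask = round(len(genotypes) * fraction)
    masked_idx = set(rng.sample(range(len(genotypes)), n_mask))
    masked = [None if i in masked_idx else g for i, g in enumerate(genotypes)]
    return masked, masked_idx


# Toy example (made-up values): one accession with 20 SNP calls.
accession = [0, 2, 2, 0, 0, 2, 0, 2, 2, 2, 0, 0, 2, 0, 2, 0, 0, 2, 2, 0]

for fraction in (0.1, 0.5):
    masked, idx = mask_genotypes(accession, fraction)
    print(f"{fraction:.0%} masked -> {len(idx)} hidden sites")
```

After imputation, the calls at the indices in `masked_idx` would be compared with the original values to estimate accuracy.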
Fig. 6 QTL identified for soybean seed protein content in joint linkage analysis. The QTL detected from the original dataset and the imputed dataset based on the SoySNP50K BeadChip are marked on chromosomes by their confidence intervals. Green squares indicate the original dataset and red squares the imputed dataset