Rujian Sun, Bincheng Sun, Yu Tian, Shanshan Su, Yong Zhang, Wanhai Zhang, Jingshun Wang, Ping Yu, Bingfu Guo, Huihui Li, Yanfei Li, Huawei Gao, Yongzhe Gu, Lili Yu, Yansong Ma, Erhu Su, Qiang Li, Xingguo Hu, Qi Zhang, Rongqi Guo, Shen Chai, Lei Feng, Jun Wang, Huilong Hong, Jiangyuan Xu, Xindong Yao, Jing Wen, Jiqiang Liu, Yinghui Li, Lijuan Qiu.
Abstract
KEY MESSAGE: We developed the ZDX1 high-throughput functional soybean array for high-accuracy evaluation and selection of both parents and progeny, which can greatly accelerate soybean breeding.
Microarray technology facilitates rapid, accurate, and economical genotyping. Here, using resequencing data from 2214 representative soybean accessions, we developed the high-throughput functional array ZDX1, containing 158,959 SNPs, covering 90.92% of soybean genes and sites related to important traits. Using the array, we genotyped a total of 817 accessions in three subpopulations: candidate parental lines, parental lines, and their progeny from practical breeding. Fixed SNPs were identified in the progeny, indicating artificial selection during the breeding process. By identifying functional sites of target traits, novel soybean cyst nematode-resistant progeny and maturity-related novel sources were identified by allele combinations, demonstrating that functional sites provide an efficient method for the rapid screening of desirable traits or gene sources. Notably, we found that the breeding index (BI) was a good indicator for progeny selection. Superior progeny were derived from the combination of distantly related parents, with at least one parent having a higher BI. Furthermore, new combinations based on good performance were proposed for further breeding after excluding redundant and closely related parents. Genomic best linear unbiased prediction (GBLUP) was the best-performing method, and it achieved the highest accuracy in predicting four traits when using SNPs in genic regions rather than whole-genome or intergenic SNPs. Using progeny to expand the training population improved the prediction accuracy by 32.1%. Collectively, these versatile assays demonstrated that the functional ZDX1 array provides effective information for the design and optimization of a breeding pipeline for accelerated soybean breeding.
Year: 2022 PMID: 35187586 PMCID: PMC9033737 DOI: 10.1007/s00122-022-04043-w
Source DB: PubMed Journal: Theor Appl Genet ISSN: 0040-5752 Impact factor: 5.574
Fig. 1 Summary information content of the ZDX1 array. a Pipeline of single nucleotide polymorphism (SNP) identification and selection for the ZDX1 array. b The distribution of SNP loci on the soybean chromosomes. c The percentage of gene coverage in the ZDX1 array, the SoySNP50K array, the 180 K AXIOM® array, and the NJAU 355 K SoySNP array. d The number of SNPs belonging to different minor allele frequency (MAF) classes based on 2214 soybean accessions. e Venn diagram showing the overlap of SNP positions between the ZDX1, SoySNP50K, 180 K AXIOM®, and NJAU 355 K SoySNP arrays
Allelic combinations at the rhg1-a, Rhg4, and GmSNAP11 loci
| Combination | rhg1-a | Rhg4 | GmSNAP11 | Number of parental lines | Number of candidate parental lines | Number of progeny |
|---|---|---|---|---|---|---|
| Com1 | GG | GG | TT | 0 | 6 | 1 |
| Com2 | CC | CC | CC | 76 | 162 | 557 |
| Com3 | CC | CC | TT | 0 | 0 | 3 |
| Com4 | GG | CC | TT | 0 | 0 | 2 |
| Com5 | CC | GG | CC | 1 | 0 | 6 |
| Com6 | GG | CC | CC | 0 | 0 | 1 |
| Com7 | CG | CC | CC | 0 | 0 | 1 |
| Com8 | CG | GC | TC | 0 | 1 | 0 |
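The counts in this table can be cross-checked against the group sizes quoted elsewhere in the record (77 parental lines, 169 candidate parental lines, 817 accessions in total). The following is a minimal sketch of that arithmetic; the names `ROWS`, `group_totals`, and `rare_combinations` are illustrative and do not come from the paper, and the three genotype strings are simply kept in the order in which they appear in each table row.

```python
# Rows of the allelic-combination table: (combination, genotypes at the three
# loci in table order, counts of parental / candidate parental / progeny lines).
ROWS = [
    ("Com1", ("GG", "GG", "TT"), (0, 6, 1)),
    ("Com2", ("CC", "CC", "CC"), (76, 162, 557)),
    ("Com3", ("CC", "CC", "TT"), (0, 0, 3)),
    ("Com4", ("GG", "CC", "TT"), (0, 0, 2)),
    ("Com5", ("CC", "GG", "CC"), (1, 0, 6)),
    ("Com6", ("GG", "CC", "CC"), (0, 0, 1)),
    ("Com7", ("CG", "CC", "CC"), (0, 0, 1)),
    ("Com8", ("CG", "GC", "TC"), (0, 1, 0)),
]
GROUPS = ("parental lines", "candidate parental lines", "progeny")


def group_totals(rows):
    """Sum the accession counts over all combinations, group by group."""
    return tuple(sum(counts[i] for _, _, counts in rows) for i in range(len(GROUPS)))


def rare_combinations(rows, dominant="Com2"):
    """Total accessions per combination, leaving out the dominant combination."""
    return {name: sum(counts) for name, _, counts in rows if name != dominant}


totals = group_totals(ROWS)
for group, total in zip(GROUPS, totals):
    print(f"{group}: {total}")          # 77, 169, and 571 respectively
print("all accessions:", sum(totals))   # 817
print("outside Com2:", sum(rare_combinations(ROWS).values()))  # 22
```

The column sums reproduce the 77 parental and 169 candidate parental lines, and the remaining 571 progeny bring the total to the 817 genotyped accessions; only 22 accessions carry a combination other than Com2.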
Fig. 2 Analysis of genetic diversity of breeding population and screening of fixed sites in breeding improvement. a Linkage disequilibrium (LD) decay, shown as r² against the physical distance between single nucleotide polymorphisms (SNPs), in parental lines, candidate parental lines, and progeny. b Principal component analysis (PCA) of 77 parental lines and 169 candidate parental lines based on kinship. Individuals from the same species are shown in the same color. c A scatter plot showing the minor allele frequencies (MAFs) for the parental lines and candidate parental lines at 6579 sites with the MAF of progeny = 0
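The screening behind panel c can be illustrated with a short sketch: compute the MAF of each site within each group, keep the sites whose progeny MAF is 0, and report the parental and candidate parental MAFs at those sites. This is only an illustration under stated assumptions. The function names, the genotype encoding (two-letter diploid calls, `"NN"` or `None` for missing), and the toy genotypes are all hypothetical and are not taken from the study.

```python
from collections import Counter


def minor_allele_frequency(calls):
    """MAF at one biallelic site from diploid calls such as "AA", "AG", "GG".

    Missing calls (None or "NN") are skipped; a monomorphic site gives 0.0.
    """
    alleles = Counter()
    for call in calls:
        if call in (None, "NN"):
            continue
        alleles.update(call)  # counts both alleles of the diploid call
    total = sum(alleles.values())
    if total == 0:
        raise ValueError("no called genotypes at this site")
    if len(alleles) < 2:
        return 0.0
    return min(alleles.values()) / total


def fixed_site_scatter(sites):
    """For sites with progeny MAF = 0, return (parental MAF, candidate MAF)."""
    points = {}
    for site_id, groups in sites.items():
        if minor_allele_frequency(groups["progeny"]) == 0.0:
            points[site_id] = (
                minor_allele_frequency(groups["parental"]),
                minor_allele_frequency(groups["candidate"]),
            )
    return points


# Hypothetical toy genotypes, not data from the study.
TOY_SITES = {
    "snp_1": {
        "parental": ["AA", "AG", "GG", "AA"],
        "candidate": ["AA", "AA", "AG", "AA"],
        "progeny": ["AA", "AA", "AA"],
    },
    "snp_2": {
        "parental": ["CC", "CT", "TT", "CC"],
        "candidate": ["CC", "CC", "CT", "TT"],
        "progeny": ["CC", "CT", "TT"],
    },
}
print(fixed_site_scatter(TOY_SITES))  # {'snp_1': (0.375, 0.125)}
```

In the toy data only `snp_1` is fixed in the progeny, so it is the only site that would appear as a point in a scatter plot of this kind.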
Fig. 3 Mean value of parents and progeny, and the rate over best-parent of progeny for five traits plotted against genetic distance. The blue diamonds represent the average parental values, the red circles represent the average progeny values, and the yellow triangles represent the rate over best-parent of progeny. The genetic distance is the mean value under different rates over best-parent; rhd represents the correlation coefficient between the rate over best-parent of progeny and the genetic relationship between parents; and rpo represents the correlation coefficient between the mean value of progeny and the mean value of parents. Beginning maturity (R7), 100-seed weight (SW), seed yield (SY)
Fig. 4 The relationship between the top 10% of progeny in multiple traits and their parental lines. The blue box in the center is the top 10% of progeny with high breeding index (BI) values. They are arranged in order from high to low from left to right. The BI values are given below the box. The parents of these lines are classified by BI value; the top third of lines with the highest BI values are the high parents; the middle third are the medium parents; and the bottom third are the low parents. The bar graph at the bottom shows the kinship between the parental lines
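The ranking described in this caption (top 10% of progeny by BI, parents split into thirds by BI) can be sketched as below. How BI itself is computed is not given in this record, so the BI values here are hypothetical placeholders, as are the function names and the handling of group sizes that do not divide evenly.

```python
def tertile_classes(bi_by_line):
    """Label each line 'high', 'medium', or 'low' by BI rank (thirds)."""
    ranked = sorted(bi_by_line, key=bi_by_line.get, reverse=True)
    n = len(ranked)
    labels = {}
    for rank, line in enumerate(ranked):
        if rank < n / 3:
            labels[line] = "high"
        elif rank < 2 * n / 3:
            labels[line] = "medium"
        else:
            labels[line] = "low"
    return labels


def top_fraction(bi_by_line, fraction=0.10):
    """Lines with the highest BI values, ordered from high to low."""
    ranked = sorted(bi_by_line, key=bi_by_line.get, reverse=True)
    keep = max(1, round(len(ranked) * fraction))  # assumption: keep at least one
    return ranked[:keep]


# Hypothetical BI values, not data from the study.
parents = {"P1": 0.9, "P2": 0.8, "P3": 0.6, "P4": 0.5, "P5": 0.3, "P6": 0.1}
progeny = {f"F{i}": i / 20 for i in range(1, 21)}

print(tertile_classes(parents))
# {'P1': 'high', 'P2': 'high', 'P3': 'medium', 'P4': 'medium', 'P5': 'low', 'P6': 'low'}
print(top_fraction(progeny))  # ['F20', 'F19']
```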
Fig. 5 Different strategies based on the ZDX1 array in genomic selection. a The prediction accuracy (rGS) of three models for five traits with 100 repetitions using fivefold cross-validation. The prediction accuracy is shown as the mean value ± standard deviation. b Prediction accuracy of selected sites for gene region, whole genome, and intergenic region markers. The prediction accuracy is shown as the mean value ± standard deviation. c Simulating the process of predicting progeny performance by parental resources in actual breeding and the prediction process after using progeny to expand the training population. d Prediction accuracy for five traits for the 246 parents (Training Population I) and 246 parents + 141 progeny (Training Population II) used as training populations for prediction. Genomic best linear unbiased prediction (GBLUP), pedigree-based best linear unbiased prediction (ABLUP), combined best linear unbiased prediction (HBLUP), beginning maturity (R7), 100-seed weight (SW), seed yield (SY)
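The evaluation in panel a (repeated fivefold cross-validation, accuracy reported as mean ± standard deviation) can be sketched in plain Python. This is an illustration, not the authors' pipeline. It assumes a kernel-ridge form of GBLUP with a fixed variance ratio `lam` (in practice variance components are usually estimated, for example by REML), a simple centred `XX'/m` relationship matrix, and accuracy taken as the Pearson correlation between pooled out-of-fold predictions and observed values. All names and the simulated data are hypothetical.

```python
import random
import statistics


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def kinship(x):
    """Marker-based relationship matrix from column-centred genotypes."""
    n, m = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(m)]
    z = [[row[j] - means[j] for j in range(m)] for row in x]
    return [[sum(zi[k] * zj[k] for k in range(m)) / m for zj in z] for zi in z]


def predict(k, y, train, test, lam=1.0):
    """GBLUP-style prediction; lam = sigma_e^2 / sigma_g^2 is an assumed constant."""
    mu = statistics.fmean(y[i] for i in train)
    a = [[k[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(a, [y[i] - mu for i in train])
    return [mu + sum(k[t][j] * w for j, w in zip(train, alpha)) for t in test]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu_u, mu_v = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def repeated_cv(k, y, folds=5, reps=100, seed=0):
    """Return one accuracy (predicted vs observed) per repetition."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    accuracies = []
    for _ in range(reps):
        rng.shuffle(idx)
        pred = [0.0] * len(y)
        for f in range(folds):
            test = idx[f::folds]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            for i, p in zip(test, predict(k, y, train, test)):
                pred[i] = p
        accuracies.append(pearson(pred, y))
    return accuracies


# Hypothetical simulated data (0/1/2 allele counts), not data from the study.
sim = random.Random(1)
geno = [[sim.choice((0, 1, 2)) for _ in range(30)] for _ in range(40)]
effects = [sim.gauss(0, 1) for _ in range(30)]
pheno = [sum(g * b for g, b in zip(row, effects)) + sim.gauss(0, 1) for row in geno]
acc = repeated_cv(kinship(geno), pheno, reps=10)
print(f"rGS = {statistics.fmean(acc):.2f} ± {statistics.stdev(acc):.2f}")
```

Restricting `geno` to a subset of columns (for example, markers in genic regions) before calling `kinship` would give the kind of marker-set comparison shown in panel b; the example uses `reps=10` only to keep the run short.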
Fig. 6 Optimized scheme for using genome-wide molecular marker breeding combined with array screening. Germplasm resources are introduced from a resource bank, redundant accessions are eliminated through genetic diversity analysis, and accessions with excellent alleles are retained. Germplasm accessions with higher breeding index (BI) values are used as one of the candidate parents in cross breeding, and the superior resources are further screened for those with highly distant genetic relationships for cross breeding. A microarray is then used for F1 identification, hybrid segregation combined with phenotypic selection, and whole-genome selection. Germplasm with high breeding values with multiple excellent traits can also be used as recurrent parents. When germplasm with specific traits is used for backcross improvement, functional markers can be used for foreground selection, and microarrays can be used for genome-wide background scanning, combined with phenotypes for selection, resulting in the selection of excellent stable lines. The green dashed boxes indicate the commonly used breeding method, and the boxes enclosed by solid yellow lines represent the improved scheme proposed in this study