Fabio Cericola, Ingo Lenk, Dario Fè, Stephen Byrne, Christian S Jensen, Morten G Pedersen, Torben Asp, Just Jensen, Luc Janss.
Abstract
Ryegrass single plants, bi-parental family pools, and multi-parental family pools are often genotyped on the basis of allele frequencies, using genotyping-by-sequencing (GBS) assays. GBS assays can be performed at low coverage depth to reduce costs. However, reducing the coverage depth increases the proportion of missing data and reduces the accuracy with which the allele frequency at each locus is identified. As a consequence of the latter, genomic relationship matrices (GRMs) will be biased. This bias in GRMs affects variance estimates and the accuracy of GBLUP for genomic prediction (GBLUP-GP). We derived equations that describe the bias from low-coverage sequencing as an effect of binomial sampling of sequence reads, and allowed for any ploidy level of the sample considered. This allowed us to combine individual and pool genotypes in one GRM, treating a pool genotype as a polyploid genotype with a ploidy equal to the total ploidy level of the parents of the pool. Using simulated data, we verified the magnitude of the GRM bias at different coverage depths for three different kinds of ryegrass breeding material: individual genotypes from single plants, pool genotypes from F2 families, and pool genotypes from synthetic varieties. To better handle missing data, we also tested imputation procedures that are suited to analyzing allele-frequency genomic data. The relative advantages of the bias correction and the imputation of missing data were evaluated using real data. We examined a large dataset, including single plants, F2 families, and synthetic varieties genotyped in three GBS assays, each with a different coverage depth, and evaluated them for heading date, crown rust resistance, and seed yield. Cross-validations were used to test the accuracy of GBLUP approaches, demonstrating the feasibility of predicting across different types of breeding material.
Bias-corrected GRMs increased predictive accuracies compared with standard approaches to constructing GRMs. Among the imputation methods we tested, the random forest method yielded the highest predictive accuracy. The combination of these two methods resulted in a meaningful increase in predictive ability (up to 0.09). The possibility of predicting across individuals and pools provides new opportunities for improving ryegrass breeding schemes.
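The binomial-sampling effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' derivation or code; it is a minimal example under stated assumptions: a single non-inbred diploid plant, unlinked loci in Hardy-Weinberg equilibrium, known population allele frequencies, the same read depth at every locus, and a GRM diagonal scaled as 2 Σ(x − p)² / Σ p(1 − p). The function name `simulate_diagonals` and the correction term x̂(1 − x̂)/(depth − 1), a generic unbiased estimate of the binomial sampling variance, are illustrative choices and should not be read as the published equations.

```python
import random


def simulate_diagonals(n_loci=20000, depth=4, seed=1):
    """Return (true, observed, corrected) GRM diagonal for one diploid plant.

    Hypothetical helper for illustration only; all modelling choices
    (HWE, known p, constant depth, scaling) are assumptions of this sketch.
    """
    if depth < 2:
        raise ValueError("the simple correction used here needs depth >= 2")
    rng = random.Random(seed)
    num_true = num_obs = num_corr = denom = 0.0
    for _ in range(n_loci):
        # Population allele frequency at this locus (assumed known).
        p = rng.uniform(0.1, 0.9)
        # Diploid genotype under HWE, as an allele frequency: 0, 0.5 or 1.
        x = ((rng.random() < p) + (rng.random() < p)) / 2
        # Binomial sampling of `depth` reads from the plant's alleles.
        k = sum(rng.random() < x for _ in range(depth))
        x_hat = k / depth
        # Unbiased estimate of the sampling variance x(1 - x) / depth.
        sampling_var = x_hat * (1 - x_hat) / (depth - 1)
        num_true += (x - p) ** 2
        num_obs += (x_hat - p) ** 2
        num_corr += (x_hat - p) ** 2 - sampling_var
        denom += p * (1 - p)
    scale = 2 / denom
    return num_true * scale, num_obs * scale, num_corr * scale


if __name__ == "__main__":
    for d in (2, 4, 8, 16):
        true_d, obs_d, corr_d = simulate_diagonals(depth=d)
        print(f"depth={d:2d} true={true_d:.3f} observed={obs_d:.3f} corrected={corr_d:.3f}")
```

Under these assumptions the uncorrected diagonal is expected to be inflated by roughly 1/depth relative to the value computed from true allele frequencies, while subtracting the estimated sampling variance brings it back close to that value. Real GBS data add complications that this sketch ignores, such as variable depth per locus, missing loci, and pooled samples of higher ploidy.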
Keywords: Perennial ryegrass; family pools; genomic prediction; genomic relationship matrix; genotyping by sequencing; missing value imputation; sequencing depth
Year: 2018 PMID: 29619038 PMCID: PMC5871745 DOI: 10.3389/fpls.2018.00369
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. The average GRM diagonal elements observed for F2 families, as a function of the population size of the F1 and the F2 families.
Figure 2. Results of the simulation study. Diagonal elements of the GRM at different coverage depths (ST) are shown before (Db: orange) and after (Dc: blue) correcting for low-ST bias. The red dashed lines represent the diagonal element of the GRM calculated with true allele frequencies.
Figure 3. The diagonal elements of GRMs plotted against the sample-average coverage depth (ST). Different breeding materials are colored as follows: biparental F2 families (green dots), multiparental synthetic varieties (blue dots), and single plants (red dots). (A) shows GRM diagonal elements before low-ST bias correction; (B) shows GRM diagonal elements after low-ST bias correction.
Figure 4. Predictive ability (PA) estimated with the leave-one-out cross-validation strategy. Results obtained by using three different imputation strategies (mean imputation MNi, k-nearest-neighbor kNNi, and random forest RFi) and two bias-correction procedures for the allele-frequency estimates (biased diagonal Db and corrected diagonal Dc), for three different traits (heading date HD, crown rust resistance CRR, and seed yield SY) in F2 families (pools), SYNthetic varieties (pools), and Single Plants.
Figure 5. Predictive ability (PA) estimated with the across-set cross-validation procedure. Results obtained by using three different imputation strategies (mean imputation MNi, k-nearest-neighbor kNNi, and random forest RFi) and two bias-correction procedures for the allele-frequency estimates (biased diagonal Db and corrected diagonal Dc), for three different traits (heading date HD, crown rust resistance CRR, and seed yield SY) in F2 families (pools), SYNthetic varieties (pools), and Single Plants.