Bettina Lado, Ivan Matus, Alejandra Rodríguez, Luis Inostroza, Jesse Poland, François Belzile, Alejandro del Pozo, Martín Quincke, Marina Castro, Jarislav von Zitzewitz.
Abstract
In crop breeding, interest in predicting the field performance of candidate cultivars has increased with recent advances in molecular breeding technologies. However, the complexity of the wheat genome poses challenges for applying next-generation sequencing to molecular marker identification. We applied genotyping-by-sequencing, a recently developed method for identifying single-nucleotide polymorphisms, to the genomes of 384 wheat (Triticum aestivum) genotypes that were field tested under three different water regimes in Mediterranean climatic conditions: rain-fed only, mild water stress, and fully irrigated. We identified 102,324 single-nucleotide polymorphisms in these genotypes, and the phenotypic data were used to train and test genomic selection models intended to predict yield, thousand-kernel weight, number of kernels per spike, and heading date. Phenotypic data showed marked spatial variation. Therefore, different models were tested to correct the trends observed in the field. A mixed model using moving means as a covariate was found to best fit the data. When we applied the genomic selection models, the accuracy of predicted traits increased with spatial adjustment. Multiple genomic selection models were tested, and a Gaussian kernel model gave the highest accuracy. The best predictions between environments were obtained when data from different years were used to train the model. Our results confirm that genotyping-by-sequencing is an effective tool for obtaining genome-wide information for crops with complex genomes, that these data are efficient for predicting traits, and that correction of spatial variation is crucial for increasing prediction accuracy in genomic selection models.
Keywords: GBLUP; GenPred; Shared data resources; genomic selection; genotyping-by-sequencing; quantitative trait locus; single nucleotide polymorphism; spatial correction; wheat
Year: 2013 PMID: 24082033 PMCID: PMC3852373 DOI: 10.1534/g3.113.007807
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Description of models used to adjust the phenotypic data
| Model Name | Model Expression |
|---|---|
| IB | $y = g_i + \mathrm{rep}_j + \mathrm{bl(rep)}_{ijk} + e_{ijk}$ |
| RC | $y = g_i + \mathrm{rep}_j + \mathrm{fil(rep)}_{jk} + \mathrm{col(rep)}_{jl} + e_{ijkl}$ |
| RCB_MVNG | $y = g_i + \beta x_j + \mathrm{rep}_k + e_{ijk}$ |
| MVNG | $y = u + \beta x_i + e_i$ |
IB, incomplete blocks, field design; g, treatment; rep, repetitions; bl(rep), incomplete blocks nested in repetitions; e, residual; u, general mean; RC, row by column model; fil(rep), rows nested in repetitions; col(rep), columns nested in repetitions; RCB_MVNG, random complete block model with moving means as covariable; x, covariable, computed as the phenotypic value of a plot minus the mean of the neighboring plots within the grid; MVNG, linear regression model with moving means as covariable.
Figure 1. Diagram to calculate the covariable xi. Yi is the phenotypic value in the plot. The neighboring plots are indicated in gray.
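The covariable described above (plot value minus the mean of its neighbors) and the MVNG regression y = u + βx + e can be sketched as follows. This is a minimal illustration, not the authors' implementation: the neighborhood shape is not specified in this record, so the four-neighbor `NEIGHBORS` pattern, the function names, and the toy 3 × 3 field are all assumptions.

```python
import numpy as np

def moving_mean_covariate(field, offsets):
    """x_i = Y_i minus the mean of the neighboring plots given by (row, col) offsets."""
    field = np.asarray(field, dtype=float)
    n_rows, n_cols = field.shape
    x = np.full_like(field, np.nan)
    for r in range(n_rows):
        for c in range(n_cols):
            # Collect neighbors that fall inside the field and are not missing.
            vals = [
                field[r + dr, c + dc]
                for dr, dc in offsets
                if 0 <= r + dr < n_rows
                and 0 <= c + dc < n_cols
                and not np.isnan(field[r + dr, c + dc])
            ]
            if vals:
                x[r, c] = field[r, c] - np.mean(vals)
    return x

def fit_mvng(y, x):
    """Ordinary least squares fit of y = u + beta * x + e; returns (u, beta, residuals)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return coef[0], coef[1], residuals

# Assumed neighborhood: the plots directly above, below, left, and right.
NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

# Toy field of phenotypic values (hypothetical data).
field = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])

x = moving_mean_covariate(field, NEIGHBORS)
u, beta, resid = fit_mvng(field.ravel(), x.ravel())
```

Passing the neighborhood as a list of offsets keeps the sketch independent of the actual grid used in Figure 1, which would replace `NEIGHBORS`.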
Figure 2. PCoA from the dissimilarity matrix calculated with genetic data. Points in red represent advanced lines from the INIA Chile breeding program, green points identify lines from CIMMYT, and blue points denote advanced lines and prebreeding lines from the INIA Uruguay breeding program.
Figure 3. GBS-based SNP distribution along different wheat chromosomes.
Figure 4. Plot residuals along the field for each model analysis for the Santa Rosa irrigated trial. The color scale shows the value of residuals as indicated. (A) Residuals for incomplete blocks, field design; (B) residuals for RC; (C) residuals for RCB_MVNG; (D) residuals for MVNG.
Figure 5. Plot residuals along the field for each model analysis for the Santa Rosa nonirrigated trial. The color scale shows the value of residuals as indicated. (A) Residuals for incomplete blocks, field design; (B) residuals for RC; (C) residuals for RCB_MVNG; (D) residuals for MVNG.
Broad sense heritability for each field trial
| Trial | Trait | 2011: H2 IB | 2011: H2 RC | 2011: H2 RCB_MVNG / MVNG | 2012: H2 |
|---|---|---|---|---|---|
| SR_FI | GY | 0.42 | 0.44 | 0.57 | 0.622 |
| SR_FI | TKW | 0.88 | 0.88 | 0.89 | — |
| SR_FI | DH | 0.95 | 0.95 | 0.95 | — |
| SR_FI | NKS | 0.76 | 0.76 | 0.74 | — |
| SR_MWS | GY | 0.33 | 0.37 | 0.56 | 0.641 |
| SR_MWS | TKW | 0.79 | 0.81 | 0.82 | — |
| SR_MWS | DH | 0.93 | 0.93 | 0.93 | — |
| SR_MWS | NKS | 0.74 | 0.75 | 0.75 | — |
| C_SWS | GY | — | — | — | 0.340 |
H2, broad sense heritability; IB, incomplete blocks, field design; RC, row by column model; RCB_MVNG, random complete block model with moving means as covariable; MVNG, linear regression model with moving means as covariable; SR_FI, Santa Rosa under Full Irrigation; SR_MWS, Santa Rosa under Mild Water Stress; C_SWS, Cauquenes under severe water stress; GY, grain yield; TKW, thousand kernel weight; DH, days to heading; NKS, number of kernels per spike.
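The estimator behind these H2 values is not given in this record. A commonly used entry-mean form, shown here purely as an assumption and with hypothetical symbols, is:

$$
H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2 / r}
$$

where $\sigma_g^2$ is the genotypic variance, $\sigma_e^2$ the residual variance, and $r$ the number of repetitions. Under this form, a spatial model that absorbs field trends lowers $\sigma_e^2$ and so raises $H^2$, which is consistent with GY in SR_FI rising from 0.42 (IB) to 0.57 (RCB_MVNG / MVNG).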
Accuracy of predictions for each trial in 2011 using random training sets with 100 independent randomizations
| Trial | Trait | Kernel | IB | RC | RCB_MVNG | MVNG |
|---|---|---|---|---|---|---|
| SR_FI | GY | RR | 0.298 ± 0.117 | 0.296 ± 0.119 | 0.319 ± 0.114 | 0.319 ± 0.113 |
| SR_FI | GY | GAUSS | 0.312 ± 0.117 | 0.310 ± 0.120 | 0.325 ± 0.117 | 0.326 ± 0.116 |
| SR_FI | TKW | RR | 0.780 ± 0.056 | 0.780 ± 0.056 | 0.777 ± 0.057 | 0.843 ± 0.040 |
| SR_FI | TKW | GAUSS | 0.786 ± 0.055 | 0.786 ± 0.055 | 0.782 ± 0.056 | 0.847 ± 0.039 |
| SR_FI | DH | RR | 0.409 ± 0.109 | 0.409 ± 0.109 | 0.405 ± 0.109 | 0.579 ± 0.123 |
| SR_FI | DH | GAUSS | 0.436 ± 0.111 | 0.436 ± 0.111 | 0.433 ± 0.111 | 0.614 ± 0.121 |
| SR_FI | NKS | RR | 0.479 ± 0.114 | 0.479 ± 0.114 | 0.484 ± 0.115 | 0.665 ± 0.077 |
| SR_FI | NKS | GAUSS | 0.487 ± 0.119 | 0.487 ± 0.119 | 0.492 ± 0.120 | 0.669 ± 0.075 |
| SR_MWS | GY | RR | 0.236 ± 0.141 | 0.275 ± 0.147 | 0.231 ± 0.127 | 0.347 ± 0.134 |
| SR_MWS | GY | GAUSS | 0.231 ± 0.144 | 0.273 ± 0.150 | 0.260 ± 0.128 | 0.370 ± 0.132 |
| SR_MWS | TKW | RR | 0.759 ± 0.061 | 0.762 ± 0.061 | 0.757 ± 0.058 | 0.841 ± 0.034 |
| SR_MWS | TKW | GAUSS | 0.764 ± 0.059 | 0.767 ± 0.059 | 0.761 ± 0.057 | 0.845 ± 0.034 |
| SR_MWS | DH | RR | 0.398 ± 0.110 | 0.399 ± 0.110 | 0.396 ± 0.110 | 0.563 ± 0.134 |
| SR_MWS | DH | GAUSS | 0.423 ± 0.108 | 0.423 ± 0.108 | 0.423 ± 0.108 | 0.604 ± 0.134 |
| SR_MWS | NKS | RR | 0.464 ± 0.115 | 0.466 ± 0.114 | 0.458 ± 0.114 | 0.608 ± 0.088 |
| SR_MWS | NKS | GAUSS | 0.483 ± 0.111 | 0.485 ± 0.111 | 0.478 ± 0.111 | 0.608 ± 0.086 |
IB, incomplete blocks, field design; RC, row by column model; RCB_MVNG, random complete block model with moving means as covariable; MVNG, linear regression model with moving means as covariable; SR_FI, Santa Rosa under full irrigation; GY, grain yield; RR, Ridge regression kernel; GAUSS, Gaussian kernel; TKW, thousand kernel weight; DH, days to heading; NKS, number of kernels per spike; SR_MWS, Santa Rosa under mild water stress.
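To make the RR and GAUSS labels concrete, the sketch below builds a linear (ridge regression) kernel and a Gaussian kernel from a marker matrix and predicts held-out lines by kernel ridge regression. It is a hedged illustration under stated assumptions: the kernel scaling, the bandwidth `h`, the regularization `lam`, the function names, and the random toy data are not taken from the paper.

```python
import numpy as np

def linear_kernel(markers):
    """RR-type kernel: cross-products of centered marker scores, scaled by marker count."""
    z = markers - markers.mean(axis=0)
    return z @ z.T / markers.shape[1]

def gaussian_kernel(markers, h=1.0):
    """Gaussian kernel exp(-h * d^2 / median(d^2)) on squared Euclidean marker distances."""
    z = markers - markers.mean(axis=0)
    sq = np.sum(z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    # Assumed scaling: median of the off-diagonal squared distances.
    scale = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-h * d2 / scale)

def kernel_predict(kernel, y, train, test, lam=1.0):
    """Kernel ridge regression: fit on the training lines, predict the test lines."""
    mu = y[train].mean()
    k_tt = kernel[np.ix_(train, train)]
    alpha = np.linalg.solve(k_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + kernel[np.ix_(test, train)] @ alpha

# Toy data (hypothetical): 20 lines, 50 biallelic markers, random phenotypes.
rng = np.random.default_rng(0)
M = rng.integers(0, 2, size=(20, 50)).astype(float)
y = rng.normal(size=20)

K_rr = linear_kernel(M)
K_gauss = gaussian_kernel(M)
train, test = np.arange(15), np.arange(15, 20)
pred = kernel_predict(K_gauss, y, train, test)
```

Both kernels feed the same prediction routine, so the RR versus GAUSS comparison in the table amounts to swapping the kernel matrix.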
Accuracy of prediction for yield in 2012 using random training sets with 100 independent randomizations
| Trial | RR | GAUSS |
|---|---|---|
| SR_FI | 0.487 ± 0.093 | 0.516 ± 0.086 |
| SR_MWS | 0.617 ± 0.078 | 0.626 ± 0.077 |
| C_SWS | 0.382 ± 0.104 | 0.378 ± 0.104 |
RR, Ridge regression kernel; GAUSS, Gaussian kernel; SR_FI, Santa Rosa under Full Irrigation; SR_MWS, Santa Rosa under Mild Water Stress; C_SWS, Cauquenes under Severe Water Stress.
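The "mean ± standard deviation over 100 independent randomizations" reported in these tables can be reproduced in outline as below. This is a sketch under assumptions: accuracy is taken as the Pearson correlation between predicted and observed values in the held-out lines, and the 80% training fraction, `lam`, the function name, and the simulated data are hypothetical, since this record does not state them.

```python
import numpy as np

def random_split_accuracy(kernel, y, n_reps=100, train_frac=0.8, lam=1.0, seed=0):
    """Mean and SD of test-set correlation over repeated random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    accuracies = []
    for _ in range(n_reps):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        # Kernel ridge regression fitted on the training lines only.
        mu = y[train].mean()
        k_tt = kernel[np.ix_(train, train)]
        alpha = np.linalg.solve(k_tt + lam * np.eye(n_train), y[train] - mu)
        y_hat = mu + kernel[np.ix_(test, train)] @ alpha
        accuracies.append(np.corrcoef(y_hat, y[test])[0, 1])
    accuracies = np.array(accuracies)
    return accuracies.mean(), accuracies.std(ddof=1)

# Simulated data (hypothetical): 120 lines, 60 markers with additive effects.
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(120, 60)).astype(float)
Z = M - M.mean(axis=0)
K = Z @ Z.T / M.shape[1]
y = Z @ rng.normal(size=60) + rng.normal(scale=0.5, size=120)

mean_acc, sd_acc = random_split_accuracy(K, y)
print(f"{mean_acc:.3f} ± {sd_acc:.3f}")
```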
Accuracy of predictions between different environments
| Case | SR_FI2011 | | SR_MWS2011 | | C_SWS2012 | | SR_FI2012 | | SR_MWS2012 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.292 | 0.319 | 0.263 | 0.294 | 0.414 | 0.405 | − | − | − | − |
| 2 | 0.221 | 0.234 | 0.192 | 0.205 | − | − | 0.626 | 0.637 | − | − |
| 3 | 0.291 | 0.312 | 0.251 | 0.275 | − | − | − | − | 0.761 | 0.760 |
| 4 | 0.569 | 0.641 | − | − | 0.319 | 0.310 | 0.592 | 0.624 | − | − |
| 5 | 0.626 | 0.681 | − | − | 0.258 | 0.249 | − | − | 0.622 | 0.619 |
| 6 | 0.628 | 0.718 | − | − | − | − | 0.403 | 0.426 | 0.458 | 0.453 |
| 7 | − | − | 0.560 | 0.639 | 0.329 | 0.326 | 0.592 | 0.620 | − | − |
| 8 | − | − | 0.604 | 0.662 | 0.271 | 0.269 | − | − | 0.610 | 0.615 |
| 9 | − | − | 0.624 | 0.693 | − | − | 0.430 | 0.445 | 0.466 | 0.465 |
| 10 | − | − | − | − | 0.088 | 0.109 | 0.330 | 0.358 | 0.303 | 0.325 |
In each case (1−10), two environments were used to train the prediction model. SR_FI2011, Santa Rosa fully irrigated in 2011; SR_MWS2011, Santa Rosa mild water stress in 2011; C_SWS2012, Cauquenes severe water stress in 2012; SR_FI2012, Santa Rosa fully irrigated in 2012; SR_MWS2012, Santa Rosa mild water stress in 2012.
The training sets were 1: SR_FI2012/SR_MWS2012; 2: C_SWS2012/SR_MWS2012; 3: C_SWS2012/SR_FI2012; 4: SR_MWS2011/SR_MWS2012; 5: SR_MWS2011/SR_FI2012; 6: SR_MWS2011/C_SWS2012; 7: SR_FI2011/SR_MWS2012; 8: SR_FI2011/SR_FI2012; 9: SR_FI2011/C_SWS2012; 10: SR_FI2011/SR_MWS2011.
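The ten cases can be read as every possible pair of the five environments, with the three remaining environments being the ones predicted in each row of the table. The short sketch below encodes the training sets listed above and checks that reading; the variable and function names are illustrative only.

```python
from itertools import combinations

# Environments in the column order of the table.
ENVIRONMENTS = ["SR_FI2011", "SR_MWS2011", "C_SWS2012", "SR_FI2012", "SR_MWS2012"]

# Training sets as listed in the table legend (case number -> pair of environments).
TRAINING_SETS = {
    1: ("SR_FI2012", "SR_MWS2012"),
    2: ("C_SWS2012", "SR_MWS2012"),
    3: ("C_SWS2012", "SR_FI2012"),
    4: ("SR_MWS2011", "SR_MWS2012"),
    5: ("SR_MWS2011", "SR_FI2012"),
    6: ("SR_MWS2011", "C_SWS2012"),
    7: ("SR_FI2011", "SR_MWS2012"),
    8: ("SR_FI2011", "SR_FI2012"),
    9: ("SR_FI2011", "C_SWS2012"),
    10: ("SR_FI2011", "SR_MWS2011"),
}

def predicted_environments(case):
    """Environments not used for training in the given case, in table column order."""
    train = TRAINING_SETS[case]
    return [env for env in ENVIRONMENTS if env not in train]

# The ten training sets cover all C(5, 2) = 10 pairs of environments.
all_pairs = {frozenset(pair) for pair in combinations(ENVIRONMENTS, 2)}
covered = {frozenset(pair) for pair in TRAINING_SETS.values()}

for case in TRAINING_SETS:
    print(case, TRAINING_SETS[case], "->", predicted_environments(case))
```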