José Crossa1, Diego Jarquín2, Jorge Franco3, Paulino Pérez-Rodríguez4, Juan Burgueño5, Carolina Saint-Pierre5, Prashant Vikram5, Carolina Sansaloni5, Cesar Petroli5, Deniz Akdemir6, Clay Sneller7, Matthew Reynolds5, Maria Tattaris5, Thomas Payne5, Carlos Guzman5, Roberto J Peña5, Peter Wenzl5, Sukhwinder Singh1.
Abstract
This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D, and heat, H) for the highly heritable traits days to heading (DTH) and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, "diversity" and "prediction", each comprising 10% or 20% of the total collections. Accounting for population structure decreased prediction accuracy by 15-20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The prediction core set gave prediction accuracy similar to that of the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials.
Keywords: A × E: accession × environment interaction; GenPred; Gene bank accessions; cross-validations; genomic prediction; genomic selection; reference core subsets; shared data resources
Year: 2016 PMID: 27172218 PMCID: PMC4938637 DOI: 10.1534/g3.116.029637
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Phenotypic traits of Mexican and Iranian gene bank landrace collections, number of accessions, number of markers, and heritability (h²) of the trait
| Trait | Number of Accessions | Number of Markers | Heritability (h²) |
|---|---|---|---|
| Mexican collection | |||
| Days to heading (DTH) | 8481 | 23,747 | 0.556 |
| Days to maturity (DTM) | 8481 | 23,747 | 0.554 |
| Plant height (PHT) | 8414 | 23,756 | 0.345 |
| Grain yield per square meter (GYSM) | 8142 | 23,740 | 0.339 |
| Thousand-kernel weight (TKW) | 8102 | 23,855 | 0.583 |
| Test weight (TW) | 8102 | 23,855 | 0.527 |
| Grain hardness (GH) | 7863 | 23,574 | 0.448 |
| Grain protein (GP) | 8101 | 23,849 | 0.508 |
| SDS sedimentation (SDS) | 8093 | 23,946 | 0.504 |
| Iranian collection | |||
| Days to heading (DTH) | 2374 | 39,758 | 0.827 |
| Days to maturity (DTM) | 2374 | 39,758 | 0.822 |
| Thousand-kernel weight (TKW) | 2000 | 33,709 | 0.833 |
| Test weight (TW) | 2000 | 33,709 | 0.754 |
| Grain hardness (GH) | 2000 | 33,709 | 0.839 |
| Grain protein (GP) | 2000 | 33,709 | 0.625 |
| Grain length (GL) | 2000 | 33,709 | 0.881 |
| SDS sedimentation (SDS) | 2000 | 33,709 | 0.681 |
| Grain width (GW) | 2000 | 33,709 | 0.848 |
| Plant height (PHT) | 2000 | 33,709 | 0.434 |
Figure 1. Plot of the (A) first vs. second principal component (PC1 vs. PC2) from the marker data for Mexican landraces; (B) first vs. third principal component (PC1 vs. PC3) from the marker data for Mexican landraces; (C) second vs. third principal component (PC2 vs. PC3) from the marker data for Mexican landraces; (D) cumulative variance of the various principal components; (E) first vs. second principal component (PC1 vs. PC2) from the marker data for Iranian landraces; (F) first vs. third principal component (PC1 vs. PC3) from the marker data for Iranian landraces; (G) second vs. third principal component (PC2 vs. PC3) from the marker data for Iranian landraces; (H) cumulative variance of the various principal components.
Accounting for population structure
| Trait | TRN20-TST80 | 10% Diversity Core | 20% Diversity Core | 10% Prediction Core | 20% Prediction Core |
|---|---|---|---|---|---|
| Mexican landraces | |||||
| Plant height (PHT) | 0.407 (0.006) | 0.359 | 0.412 | 0.353 | 0.405 |
| Thousand-kernel weight (TKW) | 0.677 (0.007) | 0.644 | 0.654 | 0.652 | 0.663 |
| Test weight (TW) | 0.498 (0.008) | 0.457 | 0.478 | 0.462 | 0.497 |
| Grain hardness (GH) | 0.458 (0.008) | 0.404 | 0.450 | 0.420 | 0.458 |
| Grain protein (GP) | 0.516 (0.009) | 0.471 | 0.497 | 0.461 | 0.512 |
| SDS sedimentation (SDS) | 0.571 (0.007) | 0.542 | 0.539 | 0.4531 | 0.553 |
| Grain yield per square meter (GYSM) | 0.460 (0.006) | 0.434 | 0.451 | 0.422 | 0.451 |
| Iranian landraces | |||||
| Plant height (PHT) | 0.166 (0.027) | 0.112 | 0.182 | 0.141 | 0.154 |
| Thousand-kernel weight (TKW) | 0.519 (0.017) | 0.463 | 0.468 | 0.445 | 0.475 |
| Test weight (TW) | 0.437 (0.020) | 0.392 | 0.399 | 0.391 | 0.379 |
| Grain hardness (GH) | 0.528 (0.017) | 0.447 | 0.520 | 0.386 | 0.463 |
| Grain protein (GP) | 0.412 (0.023) | 0.417 | 0.408 | 0.385 | 0.386 |
| Grain length (GL) | 0.662 (0.016) | 0.593 | 0.647 | 0.612 | 0.628 |
| Grain width (GW) | 0.502 (0.019) | 0.417 | 0.475 | 0.419 | 0.443 |
| SDS sedimentation (SDS) | 0.390 (0.021) | 0.352 | 0.377 | 0.305 | 0.369 |
Mean correlation between predicted and observed values across 30 random cross-validation partitions (SD in parentheses) for a training set of 20% (TRN20) and a testing set of 80% (TST80), of the total Mexican and Iranian collections for several traits measured in a single environment using the GBLUP model. Correlation using 10% diversity and prediction cores, and 20% diversity and prediction cores as training sets to predict the remaining 90% and 80% of the collections.
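The TRN20-TST80 scheme described in the caption above can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline: the helper names (`gblup_predict`, `random_cv`), the marker simulation, and the fixed variance ratio `lam` (residual variance over genomic variance, treated as known rather than estimated) are all assumptions made for the example.

```python
import numpy as np

def gblup_predict(G, y, train_idx, test_idx, lam=1.0):
    """GBLUP-style prediction of test records from training records.

    lam is an assumed residual-to-genomic variance ratio (hypothetical,
    fixed here instead of being estimated from the data).
    """
    G_tt = G[np.ix_(train_idx, train_idx)]   # training x training relationships
    G_st = G[np.ix_(test_idx, train_idx)]    # testing x training relationships
    mu = y[train_idx].mean()
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return mu + G_st @ alpha

def random_cv(G, y, n_partitions=30, train_frac=0.2, lam=1.0, seed=0):
    """Correlation between predicted and observed values for each random partition."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_trn = int(round(train_frac * n))
    cors = []
    for _ in range(n_partitions):
        perm = rng.permutation(n)
        trn, tst = perm[:n_trn], perm[n_trn:]
        y_hat = gblup_predict(G, y, trn, tst, lam)
        cors.append(np.corrcoef(y_hat, y[tst])[0, 1])
    return np.array(cors)

# Simulated stand-in for a landrace collection (not the study's data).
rng = np.random.default_rng(1)
n, p = 300, 50
X = rng.choice([0.0, 1.0, 2.0], size=(n, p))   # marker scores
X -= X.mean(axis=0)                            # center each marker
G = X @ X.T / p                                # genomic relationship matrix
g = X @ rng.normal(size=p)                     # simulated genetic values
y = g + rng.normal(scale=g.std(), size=n)      # phenotype with noise

cors = random_cv(G, y)
print(f"mean r = {cors.mean():.3f}, SD = {cors.std(ddof=1):.3f}")
```

The mean and SD printed at the end correspond to the "0.407 (0.006)" style of entry in the TRN20-TST80 column; for a core-set column, the same prediction step would be run once with the core accessions as the training indices.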
Not accounting for population structure
| Trait | TRN20-TST80 | 10% Diversity Core | 20% Diversity Core | 10% Prediction Core | 20% Prediction Core |
|---|---|---|---|---|---|
| Mexican landraces | |||||
| Plant height (PHT) | 0.451 (0.006) | 0.409 | 0.463 | 0.401 | 0.454 |
| Thousand-kernel weight (TKW) | 0.767 (0.004) | 0.747 | 0.756 | 0.752 | 0.752 |
| Test weight (TW) | 0.687 (0.004) | 0.668 | 0.677 | 0.671 | 0.687 |
| Grain hardness (GH) | 0.617 (0.006) | 0.586 | 0.614 | 0.596 | 0.613 |
| Grain protein (GP) | 0.733 (0.004) | 0.715 | 0.729 | 0.710 | 0.729 |
| SDS sedimentation (SDS) | 0.599 (0.006) | 0.573 | 0.570 | 0.563 | 0.584 |
| Grain yield per square meter (GYSM) | 0.570 (0.005) | 0.556 | 0.568 | 0.540 | 0.564 |
| Iranian landraces | |||||
| Plant height (PHT) | 0.260 (0.023) | 0.207 | 0.271 | 0.1540 | 0.224 |
| Thousand-kernel weight (TKW) | 0.598 (0.014) | 0.560 | 0.557 | 0.4751 | 0.548 |
| Test weight (TW) | 0.567 (0.013) | 0.532 | 0.554 | 0.3790 | 0.534 |
| Grain hardness (GH) | 0.618 (0.013) | 0.575 | 0.610 | 0.4628 | 0.522 |
| Grain protein (GP) | 0.493 (0.017) | 0.497 | 0.484 | 0.3864 | 0.471 |
| Grain length (GL) | 0.683 (0.013) | 0.626 | 0.667 | 0.6281 | 0.626 |
| Grain width (GW) | 0.688 (0.012) | 0.645 | 0.669 | 0.4432 | 0.645 |
| SDS sedimentation (SDS) | 0.456 (0.017) | 0.427 | 0.445 | 0.3689 | 0.3938 |
Mean correlation between predicted and observed values across 30 random cross-validation partitions (SD in parentheses) for a training set of 20% (TRN20) and a testing set of 80% (TST80), of the total Mexican and Iranian collections for several traits measured in a single environment using the GBLUP model. Correlation using 10% diversity and prediction cores and 20% diversity and prediction cores as training sets to predict the remaining 90% and 80% of the collections.
Figure 2. Box plot of days to heading (DTH) and days to maturity (DTM) measured in drought (D) and heat (H) environments for (A) Mexican landraces and (B) Iranian landraces.
Accounting for population structure in Mexican and Iranian landraces
Columns E to Res. are estimated variance components; columns % A to % Res. are percentages of within-environment variance.
| Model | E | A | G | A × E | G × E | Res. | % A | % G | % A × E | % G × E | % Res. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mexican landraces | | | | | | | | | | | |
| Days to Heading | | | | | | | | | | | |
| M1:E + A | 1.137 | 0.025 | | | | 0.038 | 39.43 | | | | 60.56 |
| M2:E + A + G | 0.814 | 0.004 | 0.015 | | | 0.034 | 8.23 | 27.94 | | | 63.81 |
| M3:E + A + G + G × E | 0.690 | 0.006 | 0.013 | | 0.013 | 0.017 | 13.34 | 26.48 | | 26.60 | 33.56 |
| M4:E + A + G + G × E + A × E | 0.635 | 0.006 | 0.014 | 0.005 | 0.012 | 0.012 | 12.06 | 28.15 | 10.80 | 23.85 | 25.12 |
| Days to Maturity | | | | | | | | | | | |
| M1:E + A | 1.075 | 0.124 | | | | 0.120 | 50.83 | | | | 49.16 |
| M2:E + A + G | 0.747 | 0.014 | 0.078 | | | 0.116 | 6.84 | 37.38 | | | 55.76 |
| M3:E + A + G + G × E | 0.622 | 0.037 | 0.066 | | 0.041 | 0.052 | 18.81 | 33.65 | | 20.96 | 26.56 |
| M4:E + A + G + G × E + A × E | 0.566 | 0.035 | 0.069 | 0.015 | 0.038 | 0.039 | 18.04 | 35.01 | 7.72 | 19.58 | 19.63 |
| Iranian landraces | | | | | | | | | | | |
| Days to Heading | | | | | | | | | | | |
| M1:E + A | 1.122 | 0.046 | | | | 0.139 | 24.92 | | | | 75.08 |
| M2:E + A + G | 0.816 | 0.012 | 0.042 | | | 0.12 | 7.13 | 23.92 | | | 68.95 |
| M3:E + A + G + G × E | 0.702 | 0.014 | 0.031 | | 0.067 | 0.044 | 9.06 | 19.94 | | 42.77 | 28.23 |
| M4:E + A + G + G × E + A × E | 0.595 | 0.013 | 0.034 | 0.016 | 0.059 | 0.036 | 8.02 | 21.43 | 10.04 | 37.56 | 22.95 |
| Days to Maturity | | | | | | | | | | | |
| M1:E + A | 1.103 | 0.03 | | | | 0.041 | 42.26 | | | | 57.74 |
| M2:E + A + G | 0.853 | 0.008 | 0.021 | | | 0.037 | 12.11 | 32.05 | | | 55.84 |
| M3:E + A + G + G × E | 0.733 | 0.008 | 0.019 | | 0.018 | 0.015 | 13.62 | 31.65 | | 30.02 | 24.71 |
| M4:E + A + G + G × E + A × E | 0.637 | 0.007 | 0.019 | 0.006 | 0.015 | 0.013 | 11.96 | 32.26 | 9.54 | 24.72 | 21.51 |
Estimated variance components for four models (M1–M4) and percentage of within-environment variance accounted for by each random effect of the corresponding model using the full data for two traits, days to heading (DTH) and days to maturity (DTM). E, environment; A, accession; G, genomic (marker); A × E, accession × environment; G × E, genomic × environment; Res. residual.
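The percentage columns can be approximately recovered from the variance components. The sketch below assumes that each percentage is the component divided by the sum of all non-environment components (A, G, A × E, G × E, and residual); this formula is inferred from the reported numbers, not stated in the caption, and the function name is hypothetical. Because the published components are rounded to three decimals, the recomputed percentages differ from the reported ones by up to about one percentage point.

```python
def within_env_percentages(components):
    """Share of within-environment variance for each random effect.

    Assumption: the environment variance (E) is excluded from the total.
    """
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}

# M4, days to heading, Mexican landraces, accounting for population structure.
m4_dth_mexican = {"A": 0.006, "G": 0.014, "A x E": 0.005, "G x E": 0.012, "Res": 0.012}
reported = {"A": 12.06, "G": 28.15, "A x E": 10.80, "G x E": 23.85, "Res": 25.12}

pct = within_env_percentages(m4_dth_mexican)
for name in m4_dth_mexican:
    print(f"{name:6s} recomputed {pct[name]:6.2f}   reported {reported[name]:6.2f}")
```

Under this reading, the genomic main effect plus G × E accounts for roughly half of the within-environment variance in that row.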
Not accounting for population structure in Mexican and Iranian landraces
Columns E to Res. are estimated variance components; columns % A to % Res. are percentages of within-environment variance.
| Model | E | A | G | A × E | G × E | Res. | % A | % G | % A × E | % G × E | % Res. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mexican landraces | | | | | | | | | | | |
| Days to Heading | | | | | | | | | | | |
| M1:E + A | 1.127 | 0.051 | | | | 0.038 | 57.35 | | | | 42.64 |
| M2:E + A + G | 0.804 | 0.004 | 0.015 | | | 0.033 | 8.15 | 28.87 | | | 62.97 |
| M3:E + A + G + G × E | 0.680 | 0.006 | 0.013 | | 0.013 | 0.016 | 13.05 | 27.20 | | 26.54 | 33.18 |
| M4:E + A + G + G × E + A × E | 0.625 | 0.005 | 0.014 | 0.005 | 0.011 | 0.012 | 11.74 | 29.05 | 10.71 | 23.60 | 24.88 |
| Days to Maturity | | | | | | | | | | | |
| M1:E + A | 1.034 | 0.230 | | | | 0.106 | 68.39 | | | | 31.60 |
| M2:E + A + G | 0.711 | 0.012 | 0.071 | | | 0.102 | 6.88 | 38.16 | | | 54.95 |
| M3:E + A + G + G × E | 0.587 | 0.032 | 0.061 | | 0.036 | 0.046 | 18.16 | 34.78 | | 20.79 | 26.24 |
| M4:E + A + G + G × E + A × E | 0.531 | 0.030 | 0.064 | 0.013 | 0.034 | 0.034 | 17.29 | 36.35 | 7.75 | 19.20 | 19.39 |
| Iranian landraces | | | | | | | | | | | |
| Days to Heading | | | | | | | | | | | |
| M1:E + A | 1.075 | 0.038 | | | | 0.041 | 48.35 | | | | 51.65 |
| M2:E + A + G | 0.821 | 0.008 | 0.021 | | | 0.036 | 12.00 | 32.74 | | | 55.26 |
| M3:E + A + G + G × E | 0.727 | 0.008 | 0.019 | | 0.018 | 0.015 | 13.41 | 32.13 | | 29.99 | 24.47 |
| M4:E + A + G + G × E + A × E | 0.649 | 0.007 | 0.020 | 0.006 | 0.015 | 0.013 | 11.65 | 32.84 | 9.47 | 24.59 | 21.44 |
| Days to Maturity | | | | | | | | | | | |
| M1:E + A | 1.055 | 0.064 | | | | 0.138 | 31.64 | | | | 68.36 |
| M2:E + A + G | 0.778 | 0.012 | 0.043 | | | 0.118 | 7.11 | 24.57 | | | 68.14 |
| M3:E + A + G + G × E | 0.686 | 0.014 | 0.032 | | 0.067 | 0.043 | 8.93 | 20.46 | | 42.80 | 27.81 |
| M4:E + A + G + G × E + A × E | 0.603 | 0.012 | 0.034 | 0.015 | 0.058 | 0.035 | 7.82 | 21.91 | 9.94 | 37.38 | 22.95 |
Estimated variance components for four models and percentage of within-environment variance accounted for by each random effect of the corresponding model using the full data for two traits, days to heading (DTH) and days to maturity (DTM). E, environment; A, accession; G, genomic (marker); A × E, accession × environment; G × E, genomic × environment; Res., residual.
Accounting for population structure in Mexican and Iranian landraces
| Trait | Model | TRN20-TST80 | 10% Diversity Core | 20% Diversity Core | 10% Prediction Core | 20% Prediction Core |
|---|---|---|---|---|---|---|
| Mexican collection | ||||||
| DTH | M1:E + A | 0.002 (0.009) | −0.005 | 0.004 | −0.005 | −0.001 |
| | M2:E + A + G | 0.508 (0.005) | 0.461 | 0.489 | 0.477 | 0.503 |
| | M3:E + A + G + G × E | 0.599 (0.004) | 0.559 | 0.580 | 0.565 | 0.597 |
| | M4:E + A + G + G × E + A × E | 0.600 (0.004) | 0.555 | 0.579 | 0.568 | 0.597 |
| DTM | M1:E + A | 0.001 (0.000) | −0.008 | 0.003 | 0.002 | 0.003 |
| | M2:E + A + G | 0.527 (0.005) | 0.482 | 0.511 | 0.484 | 0.513 |
| | M3:E + A + G + G × E | 0.596 (0.004) | 0.558 | 0.584 | 0.553 | 0.586 |
| | M4:E + A + G + G × E + A × E | 0.596 (0.004) | 0.558 | 0.581 | 0.558 | 0.584 |
| Iranian collection | ||||||
| DTH | M1:E + A | 0.001 (0.023) | 0.010 | −0.014 | 0.005 | −0.003 |
| | M2:E + A + G | 0.403 (0.035) | 0.344 | 0.389 | 0.389 | 0.397 |
| | M3:E + A + G + G × E | 0.552 (0.033) | 0.504 | 0.551 | 0.511 | 0.544 |
| | M4:E + A + G + G × E + A × E | 0.551 (0.034) | 0.496 | 0.545 | 0.514 | 0.548 |
| DTM | M1:E + A | 0.000 (0.021) | 0.013 | 0.003 | −0.009 | 0.004 |
| | M2:E + A + G | 0.450 (0.052) | 0.371 | 0.450 | 0.400 | 0.419 |
| | M3:E + A + G + G × E | 0.551 (0.029) | 0.493 | 0.551 | 0.502 | 0.519 |
| | M4:E + A + G + G × E + A × E | 0.548 (0.026) | 0.485 | 0.542 | 0.506 | 0.525 |
Mean correlation (SD in parentheses) across 30 random partitions between observed and predicted values for two traits, days to heading (DTH) and days to maturity (DTM), across two environments, using 20% training (TRN20) and 80% testing (TST80) sets drawn from the Mexican and Iranian collections, for four models (M1–M4). The core-set columns give correlations between observed and predicted values for the 10% and 20% diversity and prediction core sets.
Models: E, Environment; A, accession; G, genomic relationship; A × E, accession × environment interaction; G × E, genomic × environment interaction.
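How the G × E term enters such a model is not spelled out in this record. One common construction, assumed here purely for illustration, takes the covariance of the interaction effect as the element-wise (Hadamard) product of the genomic covariance among records and the environment covariance among records. The function name and the toy inputs below are hypothetical.

```python
import numpy as np

def interaction_kernel(G, acc_idx, env_idx, n_env):
    """Record-level genomic and genomic x environment covariance matrices.

    Assumption: G x E covariance = (Z_g G Z_g') * (Z_e Z_e'), element-wise.
    """
    Z_g = np.eye(G.shape[0])[acc_idx]   # records x accessions incidence matrix
    Z_e = np.eye(n_env)[env_idx]        # records x environments incidence matrix
    K_g = Z_g @ G @ Z_g.T               # genomic main-effect covariance
    K_e = Z_e @ Z_e.T                   # 1 where two records share an environment
    return K_g, K_g * K_e               # Hadamard product gives the G x E kernel

# Two accessions observed in two environments (e.g. drought and heat).
G = np.array([[1.0, 0.5],
              [0.5, 1.0]])
acc_idx = np.array([0, 1, 0, 1])
env_idx = np.array([0, 0, 1, 1])

K_g, K_ge = interaction_kernel(G, acc_idx, env_idx, n_env=2)
print(K_ge)
```

Under this assumption the interaction kernel is block diagonal: genomic similarity is shared only between records from the same environment, so the G × E effect can differ for the same accession under drought and under heat.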
Not accounting for population structure in Mexican and Iranian landraces
| Trait | Model | TRN20-TST80 | 10% Diversity Core | 20% Diversity Core | 10% Prediction Core | 20% Prediction Core |
|---|---|---|---|---|---|---|
| Mexican collection | ||||||
| DTH | M1:E + A | 0.000 (0.010) | −0.012 | 0.013 | −0.009 | 0.001 |
| | M2:E + A + G | 0.696 (0.004) | 0.676 | 0.692 | 0.677 | 0.689 |
| | M3:E + A + G + G × E | 0.738 (0.004) | 0.718 | 0.732 | 0.718 | 0.735 |
| | M4:E + A + G + G × E + A × E | 0.739 (0.004) | 0.717 | 0.732 | 0.720 | 0.735 |
| DTM | M1:E + A | −0.002 (0.009) | −0.016 | 0.009 | −0.004 | 0.005 |
| | M2:E + A + G | 0.737 (0.003) | 0.721 | 0.735 | 0.717 | 0.733 |
| | M3:E + A + G + G × E | 0.762 (0.003) | 0.745 | 0.759 | 0.740 | 0.757 |
| | M4:E + A + G + G × E + A × E | 0.762 (0.003) | 0.746 | 0.758 | 0.742 | 0.756 |
| Iranian collection | ||||||
| DTH | M1:E + A | −0.003 (0.021) | 0.015 | 0.008 | −0.015 | 0.019 |
| | M2:E + A + G | 0.525 (0.088) | 0.475 | 0.521 | 0.503 | 0.516 |
| | M3:E + A + G + G × E | 0.600 (0.056) | 0.552 | 0.600 | 0.553 | 0.577 |
| | M4:E + A + G + G × E + A × E | 0.598 (0.052) | 0.544 | 0.591 | 0.558 | 0.581 |
| DTM | M1:E + A | −0.001 (0.023) | 0.0035 | −0.0086 | 0.000 | 0.009 |
| | M2:E + A + G | 0.469 (0.106) | 0.4396 | 0.4557 | 0.469 | 0.475 |
| | M3:E + A + G + G × E | 0.587 (0.032) | 0.5406 | 0.5889 | 0.549 | 0.581 |
| | M4:E + A + G + G × E + A × E | 0.587 (0.030) | 0.5337 | 0.5841 | 0.556 | 0.584 |
Mean correlation (SD in parentheses) across 30 random partitions between observed and predicted values for two traits, days to heading (DTH) and days to maturity (DTM), across two environments, using 20% training (TRN20) and 80% testing (TST80) sets drawn from the Mexican and Iranian collections, for four models (M1–M4). The core-set columns give correlations between observed and predicted values for the 10% and 20% diversity and prediction core sets.
Models: E, Environment; A, accession; G, genomic relationship; A × E, accession × environment interaction; G × E, genomic × environment interaction.
Average percent change in prediction accuracy of 10% and 20% diversity and prediction cores vs. prediction accuracy of random cross-validation TRN20-TST80 (first four columns) and percent change in prediction accuracy between 10% diversity core vs. 10% prediction core, and between 20% diversity core vs. 20% prediction core for traits measured in one or in two environments for Mexican and Iranian collections
| Trait-Collection | 10% Diversity vs. TRN20-TST80 | 10% Prediction vs. TRN20-TST80 | 20% Diversity vs. TRN20-TST80 | 20% Prediction vs. TRN20-TST80 | 10% Diversity vs. 10% Prediction | 20% Diversity vs. 20% Prediction |
|---|---|---|---|---|---|---|
| Accounting for population structure | ||||||
| One environment - Mexican collection | 8.0 | 10.3 | 2.7 | 1.2 | 2.4 | −1.6 |
| One environment - Iranian collection | 13.1 | 14.9 | 2.8 | 8.7 | 1.2 | 5.9 |
| Two environments - Mexican collection | 7.5 | 6.5 | 3.0 | 1.2 | −1.3 | −1.9 |
| Two environments - Iranian collection | 12.1 | 7.6 | 0.9 | 3.3 | −5.3 | 2.4 |
| Not accounting for population structure | ||||||
| One environment - Mexican collection | 4.1 | 4.7 | 0.9 | 0.9 | 0.6 | −0.1 |
| One environment - Iranian collection | 7.5 | 25.5 | 1.94 | 9.5 | 1.9 | 19.5 |
| Two environments - Mexican collection | 2.51 | 2.72 | 0.59 | 0.63 | 0.13 | 0.03 |
| Two environments - Iranian collection | 8.30 | 5.11 | 0.81 | 1.46 | −3.48 | 0.63 |
Traits evaluated in one environment when accounting for population structure (from Table 2), traits evaluated in two environments when accounting for population structure (from Table 4), traits evaluated in one environment when not accounting for population structure (from Table A1), traits evaluated in two environments when not accounting for population structure (from Table A3).
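The averages in the first four columns can be reproduced from the per-trait correlations. The sketch below assumes the percent change for a trait is 100 × (TRN20-TST80 accuracy − core accuracy) / TRN20-TST80 accuracy, averaged over traits. This formula is inferred rather than stated, but it recovers the reported 8.0 and 2.7 for the Mexican collection in one environment when accounting for population structure. The function name is hypothetical; the numbers are copied from the single-environment table for that case.

```python
def mean_percent_drop(reference, core):
    """Average percent loss in accuracy of a core set relative to a reference."""
    drops = [100.0 * (reference[t] - core[t]) / reference[t] for t in reference]
    return sum(drops) / len(drops)

# Mexican landraces, one environment, accounting for population structure.
trn20_tst80 = {"PHT": 0.407, "TKW": 0.677, "TW": 0.498, "GH": 0.458,
               "GP": 0.516, "SDS": 0.571, "GYSM": 0.460}
diversity_10 = {"PHT": 0.359, "TKW": 0.644, "TW": 0.457, "GH": 0.404,
                "GP": 0.471, "SDS": 0.542, "GYSM": 0.434}
diversity_20 = {"PHT": 0.412, "TKW": 0.654, "TW": 0.478, "GH": 0.450,
                "GP": 0.497, "SDS": 0.539, "GYSM": 0.451}

drop_10 = mean_percent_drop(trn20_tst80, diversity_10)
drop_20 = mean_percent_drop(trn20_tst80, diversity_20)
print(f"10% diversity core: {drop_10:.1f}%   20% diversity core: {drop_20:.1f}%")
```

Positive values mean the core set predicts less accurately than random TRN20-TST80 cross-validation; a trait such as PHT in the 20% diversity core, where the core is slightly more accurate, contributes a negative term to the average.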