Gancho T Slavov, Rick Nipper, Paul Robson, Kerrie Farrar, Gordon G Allison, Maurice Bosch, John C Clifton-Brown, Iain S Donnison, Elaine Jensen.
Abstract
• Increasing demands for food and energy require a step change in the effectiveness, speed and flexibility of crop breeding. Therefore, the aim of this study was to assess the potential of genome-wide association studies (GWASs) and genomic selection (i.e. phenotype prediction from a genome-wide set of markers) to guide fundamental plant science and to accelerate breeding in the energy grass Miscanthus.
• We generated over 100,000 single-nucleotide variants (SNVs) by sequencing restriction site-associated DNA (RAD) tags in 138 Miscanthus sinensis genotypes, and related SNVs to phenotypic data for 17 traits measured in a field trial.
• Confounding by population structure and relatedness was severe in naïve GWAS analyses, but mixed-linear models robustly controlled for these effects and allowed us to detect multiple associations that reached genome-wide significance. Genome-wide prediction accuracies tended to be moderate to high (average of 0.57), but varied dramatically across traits. As expected, predictive abilities increased linearly with the size of the mapping population, but reached a plateau when the number of markers used for prediction exceeded 10,000–20,000, and tended to decline, but remain significant, when cross-validations were performed across subpopulations.
• Our results suggest that the immediate implementation of genomic selection in Miscanthus breeding programs may be feasible.
Keywords: Miscanthus sinensis; RAD-Seq; genome-wide association studies (GWAS); genomic selection; molecular markers; single-nucleotide variants
Year: 2013 PMID: 24308815 PMCID: PMC4284002 DOI: 10.1111/nph.12621
Source DB: PubMed Journal: New Phytol ISSN: 0028-646X Impact factor: 10.151
Fig. 1. Geographical distribution and principal component (PC) analysis of population genetic structure for 138 Miscanthus sinensis genotypes using 14 073 single-nucleotide variant loci. The percentages of the total variation explained by each PC are shown in parentheses.
Table 1. Phenotypic traits measured in 142 Miscanthus sinensis genotypes
| Trait | Definition | QST | Geo (r) | PC1 (P value) | PC2 (P value) |
|---|---|---|---|---|---|
| Phenology | |||||
| | Date of flowering stage 1: day of year when the first flag leaf emerged | ||||
| | Average senescence score (0–10) throughout the growing season | 0.04 | 0.03 (0.752) | ||
| Morphology/biomass | |||||
| | Largest plant diameter measured at ground level (mm) | 0.03 | Long (0.18) | ||
| | Estimated total dry weight (g) | 0.01 | Alt ( | ||
| | Ligule-to-tip length along the central vein of the youngest leaf with a ligule (cm) | ||||
| | Blade width at half-leaf length for the leaf used to measure | 0.00 | Lat ( | 0.09 (0.270) | |
| | Height from the ground to the point of ‘inflection’ of the majority of leaves (cm) | 0.18 | Lat ( | ||
| | Estimated moisture content based on a subsample (%) | 0.01 (0.946) | |||
| | Additive combination of | 0.02 | |||
| | Three-category score reflecting leaf angle relative to the vertical | 0.06 | Alt (0.19) | ||
| | Four-category score reflecting stem angle relative to the vertical | 0.00 | |||
| | Diameter 10–15 cm from the ground of a randomly chosen stem (mm) | 0.00 | 0.08 (0.375) | ||
| | Length of the tallest stem (cm) | ||||
| | Number of stems with ≥ 50% canopy height across the middle of the plant | 0.00 | Lat (0.19) | 0.08 (0.370) | 0.02 (0.787) |
| Cell wall composition | |||||
| | Gravimetrically measured cellulose content (% dry weight) | 0.02 (0.793) | |||
| | Gravimetrically measured hemicellulose content (% dry weight) | 0.06 | Long (0.22) | 0.13 (0.117) | |
| | Gravimetrically measured lignin content (% dry weight) | 0.00 | Long ( | 0.00 (0.959) | 0.03 (0.770) |
Trait, phenotypic traits measured in 2007 (.7), 2008 (.8) or 2009 (.9) (i.e. after two, three or four growing seasons, respectively). Detailed phenotyping protocols have been described by Slavov .
QST, genetic differentiation between ‘Continent’ and ‘Japan’ subpopulations. Values exceeding the empirical 95th percentile of putatively neutral differentiation (i.e. FST = 0.23) are shown in bold (Slavov ).
Geo(r), geographical coordinate with strongest correlation and Pearson's correlation coefficient (Slavov ). Values with two-sided P < 0.05 are shown in bold.
PC1 (P value), Pearson's correlation with the first eigenvector of population structure (Fig. 1) and two-sided P value. Values with two-sided P < 0.05 are shown in bold.
PC2 (P value), Pearson's correlation with the second eigenvector of population structure (Fig. 1) and two-sided P value. Values with two-sided P < 0.05 are shown in bold.
Fig. 2. Genetic (below diagonal) and phenotypic (above diagonal) correlations among 17 phenotypic traits measured in 138 Miscanthus sinensis genotypes. Values on the main diagonal correspond to , where H2 is the broad-sense heritability of the 17 traits (Table 4).
Table 4. Performance of genome-wide prediction in 138 Miscanthus sinensis genotypes based on single-nucleotide variant (SNV) markers filtered using liberal criteria (Table 2)
| Trait | H2 | r (SD), Sorghum | Accu (SD), Sorghum | r (SD), Miscanthus | Accu (SD), Miscanthus |
|---|---|---|---|---|---|
| Phenology | |||||
| | 0.89 | 0.76 (0.02) | 0.81 (0.02) | 0.78 (0.01) | 0.82 (0.02) |
| | 0.83 | 0.64 (0.01) | 0.71 (0.01) | 0.64 (0.01) | 0.71 (0.01) |
| Morphology/biomass | |||||
| | 0.52 | 0.27 (0.05) | 0.38 (0.06) | 0.29 (0.04) | 0.40 (0.06) |
| | 0.54 | 0.06 (0.05) | 0.09 (0.07) | 0.04 (0.06) | 0.05 (0.08) |
| | 0.65 | 0.67 (0.01) | 0.83 (0.01) | 0.66 (0.01) | 0.82 (0.01) |
| | 0.64 | 0.52 (0.02) | 0.65 (0.03) | 0.56 (0.01) | 0.70 (0.02) |
| | 0.77 | 0.35 (0.03) | 0.40 (0.03) | 0.34 (0.02) | 0.39 (0.03) |
| | 0.59 | 0.70 (0.01) | 0.92 (0.01) | 0.73 (0.01) | 0.95 (0.01) |
| | 0.48 | 0.39 (0.03) | 0.57 (0.04) | 0.43 (0.02) | 0.62 (0.03) |
| | 0.50 | 0.46 (0.03) | 0.65 (0.05) | 0.47 (0.02) | 0.66 (0.03) |
| | 0.48 | 0.37 (0.02) | 0.53 (0.03) | 0.40 (0.02) | 0.58 (0.03) |
| | 0.60 | 0.51 (0.03) | 0.66 (0.04) | 0.50 (0.02) | 0.65 (0.03) |
| | 0.88 | 0.65 (0.01) | 0.69 (0.01) | 0.63 (0.01) | 0.68 (0.02) |
| | 0.51 | 0.17 (0.04) | 0.23 (0.06) | 0.27 (0.03) | 0.39 (0.04) |
| Cell wall composition | |||||
| | 0.79 | 0.62 (0.02) | 0.70 (0.02) | 0.61 (0.02) | 0.69 (0.02) |
| | 0.60 | 0.25 (0.03) | 0.32 (0.04) | 0.18 (0.04) | 0.24 (0.05) |
| | 0.66 | 0.43 (0.02) | 0.53 (0.03) | 0.35 (0.02) | 0.43 (0.03) |
| Average (SD) | 0.64 | 0.46 (0.20) | 0.57 (0.22) | 0.46 (0.20) | 0.57 (0.23) |
All predictive abilities and accuracies are based on 100 random 10-fold cross-validations (i.e. using a training population with N = 124 genotypes).
Trait, phenotypic trait as defined in Table 1.
H2, broad-sense heritability (see the Materials and Methods section).
r (SD), average predictive ability and standard deviation across 100 random 10-fold cross-validations based on 53 174 SNVs obtained from alignments to the Sorghum bicolor genome.
Accu (SD), average accuracy of genome-wide prediction and standard deviation across 100 random 10-fold cross-validations based on 53 174 SNVs obtained from alignments to the S. bicolor genome.
r (SD), average predictive ability and standard deviation across 100 random 10-fold cross-validations based on 121 771 SNVs obtained from alignments to an M. sinensis pseudo-reference.
Accu (SD), average accuracy of genome-wide prediction and standard deviation across 100 random 10-fold cross-validations based on 121 771 SNVs obtained from alignments to an M. sinensis pseudo-reference.
Average (SD), overall average and standard deviation across traits.
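The relationship between the r and Accu columns can be illustrated with a short sketch. It assumes the common convention that accuracy is predictive ability divided by the square root of broad-sense heritability; this excerpt defers the exact definition to the Materials and Methods section, so the formula and the function name `prediction_accuracy` are assumptions. The tabulated values appear consistent with it to within rounding.

```python
import math


def prediction_accuracy(predictive_ability, h2):
    """Assumed convention: accuracy = predictive ability / sqrt(H2)."""
    if not 0 < h2 <= 1:
        raise ValueError("heritability must lie in (0, 1]")
    return predictive_ability / math.sqrt(h2)


# First phenology row of the table: H2 = 0.89, r = 0.76 (Sorghum-aligned SNVs)
print(round(prediction_accuracy(0.76, 0.89), 2))  # 0.81, as tabulated
```

Because the tabulated r and H2 values are themselves rounded, recomputed accuracies for other rows may differ from the table in the second decimal place.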
Table 2. Filtering criteria for single-nucleotide variant (SNV) data from 138 Miscanthus sinensis genotypes
| Filtering criteria | Stringent | Liberal |
|---|---|---|
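| Q | 15 | NA |
| Min depth | 14 | 3 |
| Min ave depth | NA | 6 |
| Missing (%) | 10 | 20 |
| Minor alleles | NA | 3 |
| Max \|FIS\| | 0.25 | NA |
| Min het reads | 0.05 | NA |
| No. of alleles | 2 | 2 |
| Statistics | | |
| No. of loci (Sorg) | 20 127 | 53 174 |
| Ave MAF (Sorg) | 0.027 | 0.174 |
| Ave r2 (Sorg) | 0.24 | 0.26 |
| No. of loci (Misc) | 30 755 | 121 771 |
| Ave MAF (Misc) | 0.031 | 0.115 |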
Q, minimum Phred-like SNV quality score (Li ).
Min depth, minimum number of reads.
Min ave depth, minimum average number of reads across all genotypes.
Missing (%), maximum percentage of missing genotype data allowed for a given locus.
Minor alleles, minimum number of copies of the minor allele among all genotypes.
Max |FIS|, maximum deviation of observed genotype frequencies from Hardy–Weinberg expectations, F = 1 – Ho/He, where Ho and He are the observed and expected heterozygosities.
Min het reads, minimum proportion of reads supporting the less frequent allele in a heterozygous genotype.
No. of alleles, number of SNV alleles detected.
No. of loci (Sorg), number of SNVs that passed all filtering criteria based on alignments to the Sorghum bicolor genome.
Ave MAF (Sorg), average minor allele frequency for SNVs that passed all filtering criteria based on alignments to the Sorghum bicolor genome.
Ave r2 (Sorg), average linkage disequilibrium (r2, calculated as genotypic correlation) for pairs of loci with MAF ≥ 0.10 that aligned within 1 kb of one another in the Sorghum bicolor genome.
No. of loci (Misc), number of SNVs that passed all filtering criteria based on alignments to an M. sinensis pseudo-reference.
Ave MAF (Misc), average minor allele frequency for SNVs that passed all filtering criteria based on alignments to an M. sinensis pseudo-reference.
NA, not applicable.
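As a rough illustration of the locus-level criteria defined above (missing data, minor-allele copies and |FIS| with F = 1 – Ho/He), the following sketch computes these quantities for one biallelic locus and applies thresholds. The genotype coding (0/1/2 copies of the alternate allele, `None` for missing), the function names and the example thresholds are assumptions for illustration, not the authors' pipeline; read-depth and quality filters are not covered.

```python
def locus_stats(genotypes):
    """Summarise one biallelic locus coded 0/1/2; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    missing_pct = 100.0 * (len(genotypes) - n) / len(genotypes)
    if n == 0:
        return {"missing_pct": missing_pct, "minor_copies": 0, "maf": 0.0, "fis": None}
    alt = sum(called)                            # copies of the alternate allele
    minor_copies = min(alt, 2 * n - alt)
    p = alt / (2 * n)
    ho = sum(1 for g in called if g == 1) / n    # observed heterozygosity
    he = 2 * p * (1 - p)                         # expected heterozygosity
    fis = 1 - ho / he if he > 0 else None        # undefined for monomorphic loci
    return {"missing_pct": missing_pct, "minor_copies": minor_copies,
            "maf": minor_copies / (2 * n), "fis": fis}


def passes(stats, max_missing_pct, min_minor_copies=0, max_abs_fis=None):
    """Apply the thresholds; a criterion set to None (or 0) is not enforced."""
    if stats["missing_pct"] > max_missing_pct:
        return False
    if stats["minor_copies"] < min_minor_copies:
        return False
    if max_abs_fis is not None:
        if stats["fis"] is None or abs(stats["fis"]) > max_abs_fis:
            return False
    return True


# Toy locus with ten genotypes, one of them missing
stats = locus_stats([0, 0, 1, 2, None, 1, 0, 0, 0, 0])
print(passes(stats, max_missing_pct=20, min_minor_copies=3))   # lenient thresholds
print(passes(stats, max_missing_pct=10, max_abs_fis=0.25))     # stricter thresholds
```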
Fig. 3. Chromosome-wide distribution of Miscanthus sinensis single-nucleotide variant loci detected based on alignments to Sorghum bicolor and filtered using ‘stringent’ (a) and ‘liberal’ (b) criteria (Table 2). Each line corresponds to a 1-Mb interval.
Table 3. Markers with significant phenotypic associations (false discovery rate < 0.05) in 138 Miscanthus sinensis genotypes
| Chromosome | Position | P | Q | MAF | PVE | Trait | Gene | Description/annotation |
|---|---|---|---|---|---|---|---|---|
| 1 | 3 776 666 | 3.31E-06 | 0.03 | 0.01 | 0.12 | | Sb01g004700 | ATVAMP725 |
| 1 | 3 789 996 | 2.98E-06 | 0.03 | 0.02 | 0.13 | | Sb01g004720 | Aminoacyl-tRNA synthetase family |
| 1 | 68 013 320 | 3.38E-06 | 0.03 | 0.01 | 0.12 | | Sb01g044850 | Unknown protein |
| 2 | 67 477 249 | 1.14E-06 | 0.03 | 0.01 | 0.16 | | Sb02g032850 | Unknown protein |
| 2 | 67 477 259 | 1.14E-06 | 0.03 | 0.01 | 0.16 | | Sb02g032850 | Unknown protein |
| 3 | 65 293 239 | 2.79E-06 | 0.03 | 0.05 | 0.02 | | Sb03g037310 | ATCDPMEK |
| 4 | 4 150 586 | 1.57E-06 | 0.04 | 0.06 | 0.18 | | NA | NA |
| 6 | 50 249 485 | 3.35E-06 | 0.03 | 0.02 | 0.16 | | Sb06g020830 | Protein kinase family protein |
| 9 | 24 591 741 | 1.84E-06 | 0.03 | 0.03 | 0.26 | | NA | NA |
| 9 | 46 480 666 | 4.07E-06 | 0.04 | 0.15 | 0.02 | | Sb09g018620 | Hydroxyproline-rich glycoprotein family protein |
Associations with Bonferroni-corrected genome-wide significance (α = 0.05) are shown in bold. Only results for markers detected from alignments to the Sorghum bicolor genome are shown.
Chromosome, Sorghum bicolor chromosome to which the marker was aligned.
Position, Sorghum bicolor chromosome position to which the marker was aligned.
P, P value from genome-wide association study (GWAS) analysis using the efficient mixed-model association expedited approach (EMMAX), including the kinship matrix and the first two eigenvectors of population structure (see the Materials and Methods section).
Q, false discovery rate calculated using the q-value R package (Dabney & Storey, 2013).
MAF, minor allele frequency.
PVE, naïve estimate of the proportion of variance explained based on simple linear regression (see the Materials and Methods section).
Trait, phenotypic trait as defined in Table 1.
Gene, Sorghum bicolor gene to which the marker was aligned.
Significant at genome-wide α = 0.05 after Bonferroni correction based on EMMAX analyses (see the Materials and Methods section).
Included in the optimal model according to the multiple Bonferroni criterion in multi-locus mixed-model (MLMM) analyses (see the Materials and Methods section).
Included in the optimal model according to the multiple Bonferroni criterion in MLMM analyses including the first two eigenvectors of population structure (see the Materials and Methods section).
NA, not applicable (markers aligning to putatively intergenic positions).
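The two multiple-testing criteria mentioned in these notes can be sketched as follows. The Bonferroni threshold is simply α divided by the number of tests. For the false discovery rate, the source used the q-value R package; the sketch below substitutes the simpler Benjamini–Hochberg adjustment, which is a related but different procedure, so its output should not be expected to reproduce the tabulated Q values. The function names and the marker count in the usage line are illustrative assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold giving family-wise error rate alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative only: 10 000 is a made-up number of tested markers
print(bonferroni_threshold(0.05, 10_000))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```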
Fig. 4. Performance of genome-wide prediction in a population of 138 Miscanthus sinensis genotypes using single-nucleotide variant markers obtained from alignments to the Sorghum bicolor genome. Predictive ability as a function of training population size (a) and number of markers used (b). All data points are averages across 100 random cross-validations.
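A minimal sketch of how predictive ability from repeated random 10-fold cross-validation might be estimated is given below. It uses plain ridge regression on a marker matrix via NumPy rather than the rrBLUP software, a fixed penalty `lam`, and the correlation between pooled out-of-fold predictions and observed phenotypes; all of these choices, the function names and the simulated data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Fit ridge regression on centred markers; return (marker means, intercept, effects)."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - mu_y))
    return mu_x, mu_y, beta


def predictive_ability(X, y, lam=1.0, k=10, reps=100, seed=0):
    """Mean and SD, across repetitions, of cor(predicted, observed) from k-fold CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rs = []
    for _ in range(reps):
        folds = np.array_split(rng.permutation(n), k)
        pred = np.empty(n)
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            mu_x, mu_y, beta = ridge_fit(X[train], y[train], lam)
            pred[test] = mu_y + (X[test] - mu_x) @ beta
        rs.append(np.corrcoef(pred, y)[0, 1])
    return float(np.mean(rs)), float(np.std(rs))


# Simulated data only: 138 genotypes, 200 markers coded 0/1/2, 20 with an effect
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(138, 200)).astype(float)
effects = np.zeros(200)
effects[:20] = rng.normal(size=20)
y = X @ effects + rng.normal(scale=2.0, size=138)
print(predictive_ability(X, y, lam=50.0, reps=5))
```

With 138 genotypes, each 10-fold split leaves roughly 124 genotypes for training, matching the training population size stated for the cross-validations. Solving the marker-by-marker system directly is only practical for small marker sets; tens of thousands of SNVs would call for a kinship-based (dual) formulation instead.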
Fig. 5. Effect of population structure on the performance of genome-wide prediction in a population of 70 Miscanthus sinensis genotypes with known sampling locations using single-nucleotide variants obtained from alignments to the Sorghum bicolor genome. Cross-validations across subpopulations (red bars) were performed using genotypes from Japan (N = 43) as a training population and genotypes from China and South Korea (N = 27) as a test population. Random cross-validations (gray bars) were performed by randomly selecting the same numbers of genotypes in the training and test populations from the total set of 138 genotypes. All data points are averages, and error bars correspond to standard deviations across 100 random cross-validations. Only traits with cross-subpopulation predictive abilities exceeding (1) 0.10 and (2) the standard deviation from the random cross-validations for the respective trait are shown.
Fig. 6. Effect of marker selection on the performance of genome-wide prediction of average senescence score (a) and total dry weight (b) in a population of 138 Miscanthus sinensis genotypes based on single-nucleotide variants (SNVs) detected from alignments to the Sorghum bicolor genome. ‘Random’ (circles, black line), randomly selected markers; ‘GWAS’ (triangles, red line), markers with the lowest genome-wide association study P values within the training population; ‘rrBLUP’ (squares, green line), markers with the highest estimated effects within the training population based on ridge regression. All data points are averages ± SD across 100 random cross-validations. Dashed lines correspond to predictive abilities based on all 53 174 SNVs.