Zhikai Liang1, Shashi K Gupta2, Cheng-Ting Yeh3, Yang Zhang1, Daniel W Ngu1, Ramesh Kumar4, Hemant T Patil5, Kanulal D Mungra6, Dev Vart Yadav4, Abhishek Rathore2, Rakesh K Srivastava2, Rajeev Gupta2, Jinliang Yang1, Rajeev K Varshney2, Patrick S Schnable3, James C Schnable7.
Abstract
Pearl millet is a non-model grain and fodder crop adapted to extremely hot and dry environments globally. In India, a great deal of public and private sector investment has focused on developing pearl millet single cross hybrids based on the cytoplasmic-genetic male sterility (CMS) system, while in Africa most pearl millet production relies on open pollinated varieties. Pearl millet lines were phenotyped at both the inbred parent and hybrid stages. Many breeding efforts focus on phenotypic selection of inbred parents to generate improved parental lines and hybrids. This study evaluated two genotyping techniques and four genomic selection schemes in pearl millet. Although 6× more sequencing data were generated per sample for RAD-seq than for tGBS, tGBS yielded more than 2× as many informative SNPs (defined as those having MAF > 0.05) as RAD-seq. A genomic prediction scheme utilizing only data from hybrids generated median prediction accuracies of 0.73-0.74 (1000-grain weight), 0.87-0.89 (days to flowering time), 0.48-0.51 (grain yield) and 0.72-0.73 (plant height). For traits with little to no heterosis, hybrid-only and hybrid/inbred prediction schemes performed almost equivalently. For traits with significant mid-parent heterosis, the direct inclusion of phenotypic data from inbred lines significantly (P < 0.05) reduced prediction accuracy when all lines were analyzed together. However, when inbred and hybrid trait values were both scored relative to the mean trait values for the respective populations, the inclusion of inbred phenotypic datasets moderately improved genomic predictions of the hybrid genomic estimated breeding values. Here we show that modern approaches to genotyping by sequencing can enable genomic selection in pearl millet.
While historical pearl millet breeding records include a wealth of phenotypic data from inbred lines, we demonstrate that the naive incorporation of this data into a hybrid breeding program can reduce prediction accuracy, while controlling for the effects of heterosis per se allowed inbred genotype and trait data to improve the accuracy of genomic estimated breeding values for pearl millet hybrids.
Keywords: GenPred; Genomic Selection; Shared Data Resources; genotyping; hybrid breeding; pearl millet
Year: 2018 PMID: 29794163 PMCID: PMC6027876 DOI: 10.1534/g3.118.200242
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Comparison between RAD-seq and tGBS genotyping technologies
| | RAD-seq | tGBS |
|---|---|---|
| Total number of samples genotyped | 372 | 384 |
| Sequencing platform | Paired-end, Illumina HiSeq 2000 | Single-end, Ion Proton |
| Average (Median) Reads/Sample after QC | 12,221,976 (12,097,256) | 1,793,300 (1,365,265) |
| Average (Median) Sequence/Sample after QC | 965,295,176 (955,561,340) | 195,057,311 (146,026,776) |
| Average (Median) missing rate / SNP | 41.39% (41.67%) | 58.65% (63.02%) |
| Average (Median) Proportion Het Calls / SNP before imputation | 2.05% (0.42%) | 4.12% (3.82%) |
| Average (Median) Proportion Het Calls / SNP after imputation | 1.63% (0.53%) | 4.72% (2.86%) |
| Average (Median) MAF / SNP before imputation | 1.89% (1.18%) | 11.69% (5.43%) |
| Average (Median) MAF / SNP after imputation | 1.24% (0.67%) | 10.37% (3.26%) |
| Total SNPs | 649,067 | 73,291 |
| SNPs with MAF >0.05 after imputation | 15,306 | 32,463 |
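
The last row of the table depends on a minor allele frequency (MAF) filter. The following is a minimal sketch of how such a filter can be computed, assuming genotypes are coded as 0/1/2 copies of one allele with NaN for missing calls. The coding, the function names, and the toy matrix are illustrative assumptions and not the study's actual pipeline.

```python
import numpy as np

def minor_allele_frequency(genotypes):
    """Per-SNP MAF from a (samples x SNPs) matrix coded 0/1/2, NaN = missing."""
    g = np.asarray(genotypes, dtype=float)
    # Frequency of the counted allele, ignoring missing calls.
    # Each diploid call carries two alleles, hence the division by 2.
    alt_freq = np.nanmean(g, axis=0) / 2.0
    # The minor allele is whichever of the two alleles is rarer.
    return np.minimum(alt_freq, 1.0 - alt_freq)

def informative_snps(genotypes, threshold=0.05):
    """Boolean mask of SNPs whose MAF exceeds the threshold."""
    return minor_allele_frequency(genotypes) > threshold

# Hypothetical toy data: 4 samples x 3 SNPs, one missing call.
toy = np.array([
    [0, 0, 2],
    [0, 1, 2],
    [0, np.nan, 0],
    [0, 2, 2],
])
maf = minor_allele_frequency(toy)
mask = informative_snps(toy)
```

In the toy matrix the first SNP is monomorphic and is dropped, while the other two pass. By the table's own counts, 32,463 tGBS SNPs pass this threshold against 15,306 for RAD-seq, a ratio of about 2.1.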
Figure 2. (A) Proportion of phenotypic variance explained by genotype, location (considered as an environmental factor), and genotype by location (GxE) interaction for either inbred pearl millet lines or hybrid pearl millet lines. (B) Phenotypic investigation of the four studied traits in the pearl millet population. *** correlation significant at p ≤ 0.001, ** correlation significant at p ≤ 0.01, and * correlation significant at p ≤ 0.05. (C) Distribution of observed mid-parent heterosis for each of the four traits scored in this study.
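
The caption for panel C refers to mid-parent heterosis without giving a formula. One common definition, assumed here, expresses the hybrid's deviation from the mean of its two parents relative to that mean. The function name and the trait values below are hypothetical.

```python
def mid_parent_heterosis(hybrid, parent_a, parent_b):
    """Relative mid-parent heterosis: (F1 - MP) / MP.

    Assumed definition; a study may instead report the absolute
    difference (F1 - MP) or a percentage.
    """
    mid_parent = (parent_a + parent_b) / 2.0
    return (hybrid - mid_parent) / mid_parent

# Hypothetical trait values: the hybrid exceeds the parental mean of 20 by 10.
mph = mid_parent_heterosis(hybrid=30.0, parent_a=18.0, parent_b=22.0)
```

A value near zero corresponds to the "little to no heterosis" case described in the abstract, and a clearly positive value to the high-heterosis case.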
Figure 1. Four approaches taken to training and testing genomic prediction schemes. Scheme 1 (M1) uses different sets of four-fifths of the inbred phenotypic data to build a model, which is tested by comparing predicted and measured traits for all hybrids. Scheme 2 (M2) is conventional fivefold cross validation, where the hybrids tested are divided into five equal parts, and the genomic estimated breeding values for hybrids in each part are predicted using a model trained with the other four parts of the dataset. Scheme 3A (M3A) follows the same strategy outlined for M2, with the training set extended to include the phenotypic and genotypic data for the inbred lines from M1. Scheme 3B (M3B) follows the same strategy as M3A but normalizes for the separate mean trait values of the inbred and hybrid populations prior to combining them into the training dataset.
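
The M2, M3A, and M3B schemes described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: plain ridge regression stands in for whatever genomic prediction model was actually used, and the penalty `lam`, the function names, and the simulated data are all hypothetical. Scheme M1 is omitted for brevity.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge regression on a marker matrix; returns (intercept, marker effects)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta

def fivefold_accuracy(X_hyb, y_hyb, X_inb=None, y_inb=None, center=False,
                      k=5, seed=0, lam=1.0):
    """Median Pearson r between predicted and observed hybrid values over k folds.

    No inbreds:          hybrid-only cross validation (M2-like).
    Inbreds supplied:    inbreds join every training set (M3A-like).
    center=True as well: each population's own mean is subtracted first (M3B-like).
    """
    if center:
        # Simplification: the hybrid mean is taken over all hybrids, test fold included.
        y_hyb = y_hyb - y_hyb.mean()
        if y_inb is not None:
            y_inb = y_inb - y_inb.mean()
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y_hyb)), k)
    accuracies = []
    for test_idx in folds:
        train = np.ones(len(y_hyb), dtype=bool)
        train[test_idx] = False
        X_tr, y_tr = X_hyb[train], y_hyb[train]
        if X_inb is not None:
            X_tr = np.vstack([X_tr, X_inb])
            y_tr = np.concatenate([y_tr, y_inb])
        intercept, beta = ridge_fit(X_tr, y_tr, lam)
        pred = intercept + X_hyb[test_idx] @ beta
        accuracies.append(np.corrcoef(pred, y_hyb[test_idx])[0, 1])
    return float(np.median(accuracies))

# Simulated data (hypothetical): hybrids carry a constant +10 shift to mimic heterosis.
rng = np.random.default_rng(1)
n_hyb, n_inb, n_snp = 100, 50, 20
effects = rng.normal(size=n_snp)
X_hyb = rng.integers(0, 3, size=(n_hyb, n_snp)).astype(float)
X_inb = rng.choice([0.0, 2.0], size=(n_inb, n_snp))  # inbreds are homozygous
y_hyb = X_hyb @ effects + 10.0 + rng.normal(scale=0.5, size=n_hyb)
y_inb = X_inb @ effects + rng.normal(scale=0.5, size=n_inb)

acc_m2 = fivefold_accuracy(X_hyb, y_hyb)
acc_m3a = fivefold_accuracy(X_hyb, y_hyb, X_inb, y_inb)
acc_m3b = fivefold_accuracy(X_hyb, y_hyb, X_inb, y_inb, center=True)
```

The only difference between the M3A-like and M3B-like calls is the `center` flag, which mirrors the caption's description of normalizing each population to its own mean before pooling. The simulated accuracies say nothing about the values reported in the study.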
Figure 3. Distribution of missing data rates (A, D), heterozygosity (B, E), and minor allele frequency (C, F) for SNPs identified and scored in either the RAD-seq or tGBS dataset. A-C summarize raw SNP data prior to imputation. D-F show densities for the same characteristics subsequent to imputation. Because no missing sites remained after imputation, panel D is blank. The dashed line in C and F indicates the cutoff of MAF = 0.05 for SNPs which were utilized in downstream genomic prediction.
Figure 4. Prediction accuracy for each of four phenotypes scored in this pearl millet population employing the four schemes outlined in Figure 1 using tGBS SNP calls. Scheme 3A (M3A) employed absolute predicted trait values for inbreds and hybrids to train a genomic prediction model, while scheme 3B (M3B) employed predicted trait data for inbreds and hybrids calculated relative to the separate mean trait values for inbred and hybrid lines.
Figure 5. A proposed model for the decrease in genomic prediction accuracy for high heterosis traits when inbred individuals are introduced into training populations. (A) Distribution of BLUP scores for yield for populations of hybrid and inbred individuals based on a combined BLUP analysis. (B) Distribution of scores for a hypothetical marker having an equally large effect size in inbred and hybrid individuals. When allele frequencies differ between these populations, the ratio of hybrid to inbred individuals may vary between the groups of individuals with genotype AA or with genotype BB. (C) Distribution of scores for a hypothetical marker with no effect on trait value. (D) Distribution of BLUP scores for yield for populations of hybrid and inbred individuals based on a separate BLUP analysis for hybrid and inbred individuals.