Katherine W Jordan1,2, Peter J Bradbury3, Zachary R Miller4, Moses Nyine1, Fei He1, Max Fraser5, Jim Anderson5, Esten Mason6, Andrew Katz6, Stephen Pearce6, Arron H Carter7, Samuel Prather7, Michael Pumphrey7, Jianli Chen8, Jason Cook9, Shuyu Liu10, Jackie C Rudd10, Zhen Wang10, Chenggen Chu10, Amir M H Ibrahim10, Jonathan Turkus11, Eric Olson11, Ragupathi Nagarajan12, Brett Carver12, Liuling Yan12, Ellie Taagen4, Mark Sorrells4, Brian Ward13, Jie Ren1,14, Alina Akhunova1,14, Guihua Bai2, Robert Bowden2, Jason Fiedler15, Justin Faris15, Jorge Dubcovsky16, Mary Guttieri2, Gina Brown-Guedira13, Ed Buckler3, Jean-Luc Jannink3, Eduard D Akhunov1.
Abstract
To improve the efficiency of high-density genotype data storage and imputation in bread wheat (Triticum aestivum L.), we applied the Practical Haplotype Graph (PHG) tool. The Wheat PHG database was built using whole-exome capture sequencing data from a diverse set of 65 wheat accessions. Population haplotypes were inferred for the reference genome intervals defined by the boundaries of the high-quality gene models. Missing genotypes in the inference panels, composed of wheat cultivars or recombinant inbred lines genotyped by exome capture, genotyping-by-sequencing (GBS), or whole-genome skim-seq sequencing approaches, were imputed using the Wheat PHG database. Though imputation accuracy varied depending on the method of sequencing and coverage depth, we found 92% imputation accuracy with 0.01× sequence coverage, which was slightly lower than the accuracy obtained using 0.5× sequence coverage (96.6%). Compared to Beagle, on average, PHG imputation was ∼3.5% (P-value < 2 × 10⁻¹⁴) more accurate, and showed 27% higher accuracy at imputing a rare haplotype introgressed from a wild relative into wheat. We found reduced accuracy of imputation with independent 2× GBS data (88.6%), which increased to 89.2% with the inclusion of parental haplotypes in the database. The accuracy reduction with GBS is likely associated with the small overlap between GBS markers and the exome capture dataset, which was used for constructing PHG. The highest imputation accuracy was obtained with exome capture for the wheat D genome, which also showed the highest levels of linkage disequilibrium and proportion of identity-by-descent regions among accessions in the PHG database. We demonstrate that genetic mapping based on genotypes imputed using PHG identifies SNPs with a broader range of effect sizes that together explain a higher proportion of genetic variance for heading date and meiotic crossover rate compared to previous studies.
Keywords: Practical Haplotype Graph; exome capture; genotype imputation; skim-seq; wheat
Year: 2022 PMID: 34751373 PMCID: PMC9210282 DOI: 10.1093/g3journal/jkab390
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
Figure 1. Genetic diversity of WC65 accessions of wheat and its diploid and tetraploid relatives used for developing the Wheat PHG. (A) Neighbor-joining tree of WC65 accessions used for constructing the Wheat PHG. (B) The rate of LD decay in the A, B, and D genomes of wheat. (C) The length of pair-wise IBD between the parental lines from different breeding programs used in WheatCAP.
Estimates of genetic diversity (π), minor allele frequency (MAF), Tajima’s D and linkage disequilibrium in the WC65 population used for constructing the Wheat PHG
| Diversity statistic | A genome | B genome | D genome |
|---|---|---|---|
| No. SNPs | 430,050 | 504,260 | 523,011 |
| MAF | 0.116 | 0.122 | 0.050 |
| π (per bp) | 0.175 | 0.182 | 0.076 |
| Tajima’s D | –0.673 | –0.552 | –2.192 |
| LD | 12.2 Mb | 9.8 Mb | 20.0 Mb |
Distance at which LD drops to half of its initial value (r² ≤ 0.33).
Figure 2. The accuracy of imputation using the Wheat PHG. (A) The impact of sequence coverage and the method of imputation on accuracy for DS75. (B) Accuracy of imputation using GBS sequencing at different coverage levels and different database haplotype representation. (C) Accuracy of imputation for alleles with different minor allele frequency for matched samples using GBS and skim-sequencing, n = 24.
Comparison of imputation accuracy between PHG and Beagle using exome capture data
| DS75 accession | PHG 0.5× | PHG 0.1× | PHG 0.01× | Beagle 0.1× | Beagle 0.01× |
|---|---|---|---|---|---|
| Arthur | 95.4% | 93.8% | 88.5% | 90.4% | 86.4% |
| Alice | 96.7% | 95.8% | 91.5% | 92.3% | 88.9% |
| Antero | 97.1% | 96.4% | 91.9% | 93.6% | 89.5% |
| Bess | 96.0% | 94.5% | 89.2% | 91.1% | 86.6% |
| Branson | 96.0% | 94.4% | 87.7% | 91.3% | 87.5% |
| Bolles | 96.8% | 95.4% | 90.1% | 88.6% | 93.3% |
| BrawlCLPlus | 96.3% | 94.9% | 91.3% | 92.5% | 88.6% |
| Byrd | 96.8% | 96.0% | 92.7% | 93.4% | 88.9% |
| Camelot | 98.0% | 98.2% | 97.5% | 92.4% | 88.0% |
| Danby | 96.6% | 95.8% | 92.2% | 93.4% | 88.5% |
| Decade | 96.3% | 95.3% | 91.1% | 92.5% | 88.7% |
| Denali | 96.4% | 95.5% | 92.0% | 92.2% | 88.2% |
| DoubleCLPlus | 96.9% | 95.8% | 90.6% | 93.1% | 89.0% |
| Duster | 97.7% | 97.7% | 97.1% | 89.3% | 93.0% |
| Expedition | 97.0% | 96.1% | 92.7% | 93.5% | 89.0% |
| Forefront | 96.3% | 95.0% | 89.6% | 88.0% | 91.7% |
| Freeman | 96.4% | 95.6% | 91.4% | 92.8% | 87.5% |
| Glacier | 96.4% | 94.6% | 88.2% | 91.7% | 87.4% |
| Gallagher | 96.4% | 95.2% | 89.9% | 91.3% | 86.7% |
| Goodstreak | 97.2% | 96.0% | 91.1% | 93.7% | 88.9% |
| Hilliard | 95.9% | 94.3% | 89.0% | 91.2% | 86.9% |
| Hunter | 95.2% | 93.9% | 87.8% | 89.7% | 85.7% |
| Hatcher | 96.0% | 95.4% | 90.3% | 92.4% | 88.2% |
| Ideal | 96.1% | 95.7% | 91.2% | 91.6% | 87.7% |
| Jamestown | 96.1% | 93.2% | 89.7% | 91.2% | 86.0% |
| Jagger | 95.9% | 94.4% | 90.6% | 84.2% | 75.6% |
| Jagalene | 97.6% | 98.0% | 98.1% | 93.0% | 87.8% |
| Jerry | 96.8% | 95.8% | 91.5% | 93.3% | 88.8% |
| KS061193K-2 | 97.5% | 97.8% | 97.9% | 93.6% | 88.5% |
| KS090387K-20 | 97.6% | 97.9% | 96.2% | 92.1% | 87.3% |
| KS13H-9 | 96.9% | 96.0% | 90.7% | 93.1% | 88.7% |
| KS14H-180-4 | 97.0% | 96.2% | 91.1% | 93.0% | 88.8% |
| KanMark | 98.1% | 98.2% | 97.1% | 93.3% | 89.5% |
| Kharkof | 96.2% | 94.5% | 90.4% | 92.6% | 88.6% |
| LCSChrome | 96.3% | 95.5% | 90.1% | 91.9% | 86.9% |
| Linkert | 97.0% | 96.0% | 91.5% | 90.1% | 93.8% |
| Lonerider | 97.6% | 95.9% | 91.0% | 92.6% | 87.7% |
| Mace | 96.7% | 95.6% | 90.2% | 93.1% | 88.7% |
| Mattern | 96.6% | 95.4% | 91.9% | 92.5% | 87.9% |
| McGill | 96.7% | 95.6% | 90.9% | 93.0% | 89.0% |
| Millenium | 96.8% | 95.8% | 91.6% | 92.8% | 88.7% |
| Mott | 96.4% | 95.4% | 90.4% | 93.2% | 89.6% |
| NE10589 | 96.8% | 96.4% | 91.9% | 93.1% | 88.1% |
| NUPlains | 97.9% | 98.0% | 96.7% | 93.7% | 89.7% |
| NW13493 | 96.6% | 95.6% | 90.7% | 92.6% | 87.4% |
| OK11D25056 | 96.8% | 95.4% | 91.2% | 92.9% | 88.9% |
| OK12716Red | 96.5% | 95.5% | 90.9% | 92.5% | 87.4% |
| OK13209 | 96.9% | 95.7% | 91.0% | 93.0% | 88.7% |
| OK13621 | 96.9% | 95.9% | 91.5% | 92.2% | 87.3% |
| OK11709W-139122 | 96.7% | 95.8% | 91.9% | 92.8% | 89.2% |
| Oahe | 96.4% | 95.4% | 91.1% | 92.6% | 88.9% |
| Overley | 97.2% | 97.3% | 97.2% | 89.4% | 92.9% |
| Pembroke | 95.1% | 93.3% | 87.7% | 89.4% | 85.3% |
| Panhandle | 96.2% | 95.1% | 90.4% | 92.2% | 87.4% |
| Prevail | 96.5% | 95.4% | 89.8% | 91.8% | 89.7% |
| Redfield | 96.5% | 95.6% | 90.8% | 92.9% | 88.5% |
| Robidoux | 96.9% | 95.9% | 91.5% | 93.2% | 89.6% |
| SD08080 | 96.7% | 95.7% | 90.7% | 92.7% | 88.5% |
| Scout66 | 96.9% | 95.9% | 92.4% | 93.7% | 89.6% |
| Snowmass | 96.6% | 95.7% | 91.0% | 93.0% | 88.3% |
| TAM114 | 96.7% | 95.8% | 92.0% | 92.8% | 89.3% |
| TAM203 | 96.1% | 95.2% | 91.1% | 91.5% | 86.9% |
| TAM204 | 95.8% | 94.9% | 90.9% | 92.1% | 87.7% |
| TAM303 | 96.0% | 94.9% | 91.6% | 90.9% | 87.1% |
| TAM304 | 96.7% | 95.2% | 90.1% | 92.3% | 88.6% |
| TAM305 | 96.4% | 95.6% | 90.9% | 91.9% | 87.1% |
| Traverse | 96.7% | 95.1% | 90.3% | 90.5% | 86.6% |
| Tribute | 95.6% | 94.1% | 87.0% | 89.6% | 85.0% |
| TX11A001295 | 96.9% | 96.2% | 93.8% | 92.4% | 87.4% |
| TX12M4068 | 96.5% | 95.2% | 91.6% | 92.0% | 87.4% |
| WB-Redhawk | 97.7% | 97.6% | 98.1% | 93.0% | 88.6% |
| Wesley | 97.0% | 95.9% | 91.9% | 93.9% | 89.9% |
| Yellowstone | 95.8% | 94.7% | 91.1% | 94.7% | 93.2% |
| Zenda | 97.7% | 97.7% | 97.5% | 93.1% | 88.4% |
| Average | 96.6% | 95.7% | 91.7% | 92.1% | 88.3% |
Cultivars used in PHG database construction.
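The Average row offers a quick sanity check on the abstract's statement that PHG was about 3.5% more accurate than Beagle. The sketch below is illustrative only: it hard-codes the averages from the table above, the dictionary layout and function name are hypothetical, and it does not reproduce the per-accession test behind the reported P-value.

```python
# Average imputation accuracy (%) copied from the "Average" row of the table above.
# The dictionary layout and function name are illustrative assumptions,
# not part of the PHG or Beagle tooling.
AVERAGES = {
    "PHG": {"0.1x": 95.7, "0.01x": 91.7},
    "Beagle": {"0.1x": 92.1, "0.01x": 88.3},
}


def mean_gain(averages):
    """Mean PHG-minus-Beagle accuracy gap over the coverages both tools share."""
    shared = sorted(set(averages["PHG"]) & set(averages["Beagle"]))
    gaps = [averages["PHG"][c] - averages["Beagle"][c] for c in shared]
    return sum(gaps) / len(gaps)


gain = mean_gain(AVERAGES)
print(f"Mean PHG gain over Beagle: {gain:.1f} percentage points")
```

With the tabulated averages, the gap is 3.6 percentage points at 0.1× and 3.4 at 0.01×, for a mean of 3.5, in line with the ∼3.5% reported in the abstract.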
The accuracy of DS75 imputation in different wheat genomes
| Wheat genome | PHG 0.1× | Beagle 0.1× | PHG 0.01× | Beagle 0.01× |
|---|---|---|---|---|
| Total | 95.7% | 92.1% | 91.7% | 88.3% |
| A | 95.1% | 91.2% | 90.3% | 85.4% |
| B | 94.9% | 90.4% | 89.9% | 85.5% |
| D | 97.4% | 96.6% | 95.3% | 94.6% |
Accuracies for each approach are based on matching germplasm (EC: n = 75; Beagle: n = 75).
Comparison of imputation using complexity reduced sequencing technologies
| Dataset | GBS70 | GBS70 | NAMgbs | NAMgbs | NAMskim |
|---|---|---|---|---|---|
| Coverage | 1× | 2.5× | 1× | 1× | 0.1× |
| Avg. reads/sample | 1.85 million | 5 million | 1.85 million | 1.85 million | 6.1 million |
| Database status | Independent | Independent | Semi-dep. | Dependent | Semi-dep. |
| Imputation accuracy | 86.9% | 88.6% | 89.2% | 90.1% | 85.3% |
Paired-end sequencing.
Relationship between minor allele frequency and the accuracy of imputation for reduced complexity semi-dependent datasets
| Minor allele frequency (MAF) | 0–0.1 | 0.1–0.2 | 0.2–0.3 | 0.3–0.4 | 0.4–0.5 | >0.1 |
|---|---|---|---|---|---|---|
| No. sites | 1,029,330 | 156,251 | 97,013 | 73,001 | 66,296 | 392,561 |
| NAMgbs accuracy | 0.8707 | 0.9226 | 0.9168 | 0.9078 | 0.9126 | 0.9134 |
| NAMskim accuracy | 0.8015 | 0.8560 | 0.8782 | 0.8789 | 0.8900 | 0.8760 |
| Matched | 0.8763 | 0.9172 | 0.9102 | 0.8994 | 0.8992 | 0.9084 |
Summary of all groups where MAF > 0.1.
The sites within each MAF frequency bin were determined by frequency in the PHG database.
Data from NAMgbs for the same 24 lines sequenced for NAMskim.
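The relationship between the individual MAF bins and the ">0.1" summary column can be checked with a little arithmetic. The following is a minimal sketch that uses only the "No. sites" row of the table above; the variable and function names are hypothetical and chosen for illustration.

```python
# Site counts per MAF bin, copied from the "No. sites" row of the table above.
# Bin labels use ASCII hyphens; all names here are illustrative assumptions.
SITES_PER_BIN = {
    "0-0.1": 1_029_330,
    "0.1-0.2": 156_251,
    "0.2-0.3": 97_013,
    "0.3-0.4": 73_001,
    "0.4-0.5": 66_296,
}
REPORTED_ABOVE_0_1 = 392_561  # value in the ">0.1" summary column


def sites_above(bins, excluded="0-0.1"):
    """Sum the site counts of every bin except the rarest one."""
    return sum(n for label, n in bins.items() if label != excluded)


total_above = sites_above(SITES_PER_BIN)
share_rare = SITES_PER_BIN["0-0.1"] / sum(SITES_PER_BIN.values())

print(f"Sites with MAF > 0.1: {total_above:,} (reported: {REPORTED_ABOVE_0_1:,})")
print(f"Share of sites with MAF 0-0.1: {share_rare:.1%}")
```

The four bins above 0.1 sum to 392,561, matching the summary column, and roughly 72% of the sites fall into the rarest bin, where both NAMgbs and NAMskim show their lowest accuracy.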
Figure 3. Relationship between the true and predicted phenotypes. Significant markers were identified by stepwise regression on heading date, total number of crossovers per line (TCO), and total number of distal crossovers per line (dCO) phenotypes.