Yingjie Gao, Zhiquan Yang, Wenqian Yang, Yanbo Yang, Jing Gong, Qing-Yong Yang, Xiaohui Niu
Abstract
Genotype imputation is a process that estimates missing genotypes from the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs), boost the power to detect genetic associations and make it easier to combine data across genetic studies. However, high-quality reference panels have been lacking for most plants, which greatly hinders the application of genotype imputation. Here, we developed Plant-ImputeDB (http://gong_lab.hzau.edu.cn/Plant_imputeDB/), a comprehensive database with reference panels of 12 plant species for online genotype imputation, SNP and block search, and free download. By integrating genotype data and whole-genome resequencing data of plants from various studies and databases, the current Plant-ImputeDB provides high-quality reference panels of 12 plant species, comprising ∼69.9 million SNPs from 34 244 samples. It also provides an easy-to-use online imputation tool that offers a choice of two popular programs designed specifically for genotype imputation (Beagle and Minimac3). In addition, Plant-ImputeDB accepts submissions of different types of genomic variations and provides free and open access to all publicly available data in support of related research worldwide. Overall, Plant-ImputeDB may serve as an important resource for plant genotype imputation and greatly facilitate plant genetic research.
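To make the idea of "estimating missing genotypes from reference haplotypes" concrete, here is a deliberately simplified sketch. It is not the algorithm used by Beagle or Minimac3, which rely on hidden Markov models that treat each target haplotype as a mosaic of reference haplotypes. The sketch assumes a single phased target haplotype with alleles coded 0/1, a tiny hypothetical reference panel (`REFERENCE_HAPLOTYPES`) and a hypothetical helper `impute_haplotype`; it fills each missing allele by copying from the one reference haplotype that best matches the target at the observed sites.

```python
from typing import List, Optional

# Hypothetical toy reference panel: each row is one phased haplotype
# (alleles coded 0/1) over the same ordered SNP positions.
REFERENCE_HAPLOTYPES: List[List[int]] = [
    [0, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
]


def impute_haplotype(target: List[Optional[int]],
                     reference: List[List[int]]) -> List[int]:
    """Fill None entries in `target` by copying alleles from the reference
    haplotype with the fewest mismatches at the observed (non-None) sites.
    Ties go to the earliest haplotype in `reference`."""
    observed = [i for i, allele in enumerate(target) if allele is not None]
    best = min(reference,
               key=lambda hap: sum(hap[i] != target[i] for i in observed))
    return [best[i] if allele is None else allele
            for i, allele in enumerate(target)]


# Sites 1, 3 and 5 are missing; the second reference haplotype matches
# all observed sites, so its alleles are copied into the gaps.
print(impute_haplotype([1, None, 1, None, 0, None], REFERENCE_HAPLOTYPES))
# [1, 0, 1, 0, 0, 1]
```

Copying from a single best match ignores recombination, which is why real tools model switches between reference haplotypes and work from unphased genotypes rather than a ready-made haplotype.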
Year: 2021; PMID: 33137192; PMCID: PMC7779032; DOI: 10.1093/nar/gkaa953
Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971
Data summary in Plant-ImputeDB
| Species | NCBI taxonomy ID | Assembly version | Number of chromosomes | Reference panel: number of samples | Reference panel: number of SNPs |
|---|---|---|---|---|---|
| *Arabidopsis thaliana* | 3702 | TAIR10 | 5 | 2029 | 2 963 242 |
| *Brassica napus* | 3708 | ZS11 v0 | 19 | 991 | 9 141 089 |
| *Phaseolus vulgaris* | 3885 | PhaVulg1_0 | 11 | 628 | 4 811 097 |
| *Gossypium hirsutum* | 3635 | TM-1 UTX_v2.0 | 26 | 686 | 3 149 846 |
| *Cucumis sativus* | 3659 | Cucumber (Gy14) v2 | 7 | 1234 | 21 154 |
| *Cucurbita pepo* | 3664 | Cucurbita pepo v4.1 | 20 | 830 | 41 888 |
| *Zea mays* | 4577 | AGPv3 | 10 | 1210 | 35 073 758 |
| *Cucumis melo* | 3656 | Melon (DHL92) v3.5.1 | 12 | 2084 | 26 011 |
| *Oryza sativa* Japonica Group | 39947 | IRGSP-1.0 | 12 | 3240 | 4 897 277 |
| *Glycine max* | 3847 | Wm82.a2 | 20 | 20 087 | 39 636 |
| *Citrullus lanatus* | 3654 | Watermelon (97103) v2 | 11 | 414 | 8 816 591 |
| *Triticum aestivum* | 4565 | IWGSC v1.0 | 21 | 811 | 942 041 |
Figure 1. Construction of plant reference panels in Plant-ImputeDB. (A) Data collection. (B) Data processing. (C–F) Database content and web interface.
Imputation accuracy using reference panels in Plant-ImputeDB
| Species | Beagle: imputed SNPs (mean ± SD) | Beagle: increased fold | Beagle: CR (mean ± SD) | Beagle: R² (mean ± SD) | Minimac3: imputed SNPs (mean ± SD) | Minimac3: increased fold | Minimac3: CR (mean ± SD) | Minimac3: R² (mean ± SD) |
|---|---|---|---|---|---|---|---|---|
| Arabidopsis | 2 792 659 ± 5127 | 27.93 | 0.9906 ± 0.0002 | 0.9411 ± 0.0014 | 2 884 385 ± 5631 | 28.84 | 0.9912 ± 0.0002 | 0.9448 ± 0.0010 |
| Oilseed rape | 4 604 327 ± 69 131 | 46.04 | 0.8857 ± 0.0016 | 0.7717 ± 0.0037 | 1 412 928 ± 58 660 | 14.31 | 0.9286 ± 0.0022 | 0.8135 ± 0.0023 |
| Common bean | 3 289 257 ± 13 766 | 32.89 | 0.9584 ± 0.0012 | 0.8973 ± 0.0017 | 4 152 965 ± 76 476 | 41.53 | 0.9798 ± 0.0017 | 0.9717 ± 0.0018 |
| Cotton | 2 927 154 ± 76 601 | 29.27 | 0.9810 ± 0.0032 | 0.9615 ± 0.0057 | 2 935 382 ± 751 456 | 29.35 | 0.9848 ± 0.0084 | 0.9588 ± 0.0095 |
| Maize | 21 336 638 ± 142 290 | 213.37 | 0.9396 ± 0.0017 | 0.7996 ± 0.0069 | 7 827 635 ± 266 095 | 78.28 | 0.9502 ± 0.0015 | 0.8363 ± 0.0077 |
| Rice | 4 996 975 ± 1960 | 49.97 | 0.9538 ± 0.0009 | 0.9416 ± 0.0011 | 3 570 124 ± 64 495 | 35.70 | 0.9655 ± 0.0010 | 0.9420 ± 0.0016 |
| Watermelon | 8 058 314 ± 510 335 | 80.58 | 0.9861 ± 0.0040 | 0.8675 ± 0.0398 | 7 628 587 ± 468 864 | 76.29 | 0.9903 ± 0.0032 | 0.9102 ± 0.0375 |
| Bread wheat | 496 703 ± 121 523 | 4.97 | 0.9890 ± 0.0019 | 0.9534 ± 0.0036 | 580 923 ± 129 173 | 5.81 | 0.9878 ± 0.0019 | 0.9560 ± 0.0034 |
| Cucumber | 6090 ± 59 | 1.52 | 0.9332 ± 0.0021 | 0.8099 ± 0.0045 | 13 350 ± 193 | 3.34 | 0.9413 ± 0.0010 | 0.8210 ± 0.0066 |
| Zucchini | 17 729 ± 326 | 3.55 | 0.9081 ± 0.0027 | 0.7588 ± 0.0025 | 27 853 ± 458 | 5.57 | 0.9171 ± 0.0026 | 0.7712 ± 0.0030 |
| Muskmelon | 6856 ± 48 | 1.37 | 0.9043 ± 0.0007 | 0.7582 ± 0.0030 | 10 387 ± 86 | 2.08 | 0.9277 ± 0.0003 | 0.7602 ± 0.0014 |
| Soybean | 33 808 ± 15 | 6.76 | 0.9697 ± 0.0008 | 0.9099 ± 0.0024 | 39 453 ± 37 | 7.89 | 0.9788 ± 0.0007 | 0.9419 ± 0.0023 |
CR: concordance rate between true and imputed genotypes.
R²: squared correlation between true and imputed genotypes.
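The accuracy metrics defined in the table notes (CR and R²), together with the "increased fold" column, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the evaluation code behind the table: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) at sites whose true genotypes are known, the same hard calls feed both metrics (imputation tools often report R² from dosages instead), and the function names and sample genotypes are hypothetical. For "increased fold", the published ratios are consistent with dividing the number of imputed SNPs by the number of input SNPs, with 100 000 input SNPs for the larger panels (for example, 2 792 659 / 100 000 ≈ 27.93 for Arabidopsis with Beagle); that denominator is inferred from the arithmetic and is not stated in this text.

```python
from typing import Sequence


def concordance_rate(true: Sequence[int], imputed: Sequence[int]) -> float:
    """CR: fraction of sites where the imputed genotype equals the true one."""
    if len(true) != len(imputed) or not true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == i for t, i in zip(true, imputed)) / len(true)


def squared_correlation(true: Sequence[float], imputed: Sequence[float]) -> float:
    """R^2: squared Pearson correlation between true and imputed values."""
    if len(true) != len(imputed) or len(true) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(true)
    mean_t = sum(true) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true, imputed))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        raise ValueError("R^2 is undefined when either input is constant")
    return cov * cov / (var_t * var_i)


def increased_fold(n_imputed: int, n_input: int) -> float:
    """Ratio of imputed SNPs to input SNPs (interpretation assumed)."""
    return n_imputed / n_input


# Hypothetical true and imputed genotypes at six sites (0/1/2 coding).
true_gt = [0, 1, 2, 2, 1, 0]
imputed_gt = [0, 1, 2, 1, 1, 0]
print(round(concordance_rate(true_gt, imputed_gt), 4))     # 0.8333
print(round(squared_correlation(true_gt, imputed_gt), 4))  # 0.7941
# Assumed denominator of 100 000 input SNPs (inferred, not from the source).
print(round(increased_fold(2_792_659, 100_000), 2))        # 27.93
```

In this toy example one heterozygote-for-homozygote error lowers CR to 5/6 but lowers R² further, to 27/34, because R² also penalises how far the wrong call is from the true allele count. That sensitivity is one reason R² values in the table are consistently lower than CR values.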
Figure 2. Overview of the Plant-ImputeDB database. (A) Main modules in Plant-ImputeDB, including ‘Imputation’, ‘Reference Panel’ and ‘Download’ modules. (B) Online genotype imputation in the Plant-ImputeDB database. (C) Browsing of SNPs based on genomic region. (D) Browsing of genomic blocks based on gene ID. (E) ‘Download’ function of Plant-ImputeDB.