Abbas Mikhchi, Mahmood Honarvar, Nasser Emam Jomeh Kashan, Saeed Zerehdaran, Mehdi Aminafshar.
Abstract
BACKGROUND: Genotype imputation is an important process for predicting unknown genotypes: it uses a reference population with dense genotypes to predict missing genotypes, for both human and animal genetic variation, at low cost. Machine learning methods, especially boosting methods, have been used in genetic studies to explore the underlying genetic profile of disease and to build models capable of predicting missing values of a marker.
Keywords: Boosting methods; Computation time; Imputation accuracy; Trios
Year: 2016 PMID: 26740888 PMCID: PMC4702368 DOI: 10.1186/s40781-015-0081-1
Source DB: PubMed Journal: J Anim Sci Technol ISSN: 2055-0391
Fig. 1 Genotype imputation within a trio
Mean of imputation accuracy for Boosting methods in various versions on the four different datasets
| Data set | Density | Sample size | Version | AB | LB | TB |
|---|---|---|---|---|---|---|
| G1 | 5 k | 100 | NA10 | 0.9843 | 0.9954 | 0.9611 |
| G1 | 5 k | 100 | NA30 | 0.9883 | 0.9947 | 0.9638 |
| G1 | 5 k | 100 | NA50 | 0.9822 | 0.9909 | 0.9621 |
| G1 | 5 k | 100 | NA70 | 0.9777 | 0.9829 | 0.9583 |
| G1 | 5 k | 100 | NA90 | 0.9211 | 0.9303 | 0.9246 |
| G2 | 10 k | 100 | NA10 | 0.9861 | 0.9981 | 0.9702 |
| G2 | 10 k | 100 | NA30 | 0.9886 | 0.9978 | 0.9697 |
| G2 | 10 k | 100 | NA50 | 0.9912 | 0.9970 | 0.9679 |
| G2 | 10 k | 100 | NA70 | 0.9898 | 0.9939 | 0.9647 |
| G2 | 10 k | 100 | NA90 | 0.9653 | 0.9714 | 0.9523 |
| G3 | 5 k | 500 | NA10 | 0.9859 | 0.9967 | 0.9650 |
| G3 | 5 k | 500 | NA30 | 0.9885 | 0.9952 | 0.9650 |
| G3 | 5 k | 500 | NA50 | 0.9877 | 0.9926 | 0.9638 |
| G3 | 5 k | 500 | NA70 | 0.9800 | 0.9848 | 0.9618 |
| G3 | 5 k | 500 | NA90 | 0.9288 | 0.9383 | 0.9362 |
| G4 | 10 k | 500 | NA10 | 0.9787 | 0.9983 | 0.9706 |
| G4 | 10 k | 500 | NA30 | 0.9799 | 0.9977 | 0.9692 |
| G4 | 10 k | 500 | NA50 | 0.9830 | 0.9967 | 0.9665 |
| G4 | 10 k | 500 | NA70 | 0.9877 | 0.9959 | 0.9634 |
| G4 | 10 k | 500 | NA90 | 0.9706 | 0.9767 | 0.9552 |
NA10: 10 % of genotype is missing per offspring, NA30: 30 % of genotype is missing per offspring, NA50: 50 % of genotype is missing per offspring, NA70: 70 % of genotype is missing per offspring, NA90: 90 % of genotype is missing per offspring, Bold: Mean of different versions in each dataset
AB AdaBoost, LB LogitBoost, TB TotalBoost
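The NA10 to NA90 versions can be illustrated with a minimal Python sketch that masks a fraction of an offspring's genotypes and scores an imputer on the masked positions only. This is an illustration under stated assumptions, not the authors' pipeline: the 0/1/2 genotype coding, the helper names `mask_genotypes` and `imputation_accuracy`, the definition of accuracy as the proportion of correctly imputed masked genotypes, and the most-common-genotype placeholder imputer are all hypothetical. A boosting classifier would take the place of the placeholder.

```python
import random


def mask_genotypes(genotypes, missing_fraction, rng):
    """Return a copy of `genotypes` with the given fraction set to None."""
    n_missing = round(len(genotypes) * missing_fraction)
    masked_idx = set(rng.sample(range(len(genotypes)), n_missing))
    return [None if i in masked_idx else g for i, g in enumerate(genotypes)]


def imputation_accuracy(true_genotypes, masked, imputed):
    """Fraction of masked positions whose imputed value matches the truth.

    Assumed definition: the source does not spell out its accuracy formula.
    """
    positions = [i for i, g in enumerate(masked) if g is None]
    if not positions:
        raise ValueError("no masked positions to score")
    correct = sum(1 for i in positions if imputed[i] == true_genotypes[i])
    return correct / len(positions)


rng = random.Random(0)

# Hypothetical offspring genotypes, coded as 0, 1 or 2 copies of an allele.
truth = [rng.choice([0, 1, 2]) for _ in range(1000)]

# NA90 scenario: 90 % of the offspring's genotypes are missing.
masked = mask_genotypes(truth, 0.90, rng)

# Placeholder imputer (NOT a boosting method): fill each missing call
# with the most common genotype among the observed calls.
observed = [g for g in masked if g is not None]
fill = max(set(observed), key=observed.count)
imputed = [fill if g is None else g for g in masked]

accuracy = imputation_accuracy(truth, masked, imputed)
print(f"masked: {masked.count(None)}, accuracy: {accuracy:.4f}")
```

Scoring only the masked positions keeps the observed genotypes, which any imputer reproduces trivially, from inflating the result.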
Fig. 2 The effect of the sample size and SNP density on imputation accuracy
Average imputation runtime on four datasets (seconds)
| Data set | Sample size | Density | Version | AB | LB | TB |
|---|---|---|---|---|---|---|
| G1 | 100 | 5 K | NA90 | 2930 | 3055 | 6975 |
| G2 | 100 | 10 K | NA90 | 6511 | 6788 | 13956 |
| G3 | 500 | 5 K | NA90 | 3460 | 3665 | 10221 |
| G4 | 500 | 10 K | NA90 | 7601 | 7802 | 23521 |
NA90: 90 % of genotype is missing per offspring
AB AdaBoost, LB LogitBoost, TB TotalBoost
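To make the runtime comparison easier to read, the following short Python sketch computes how much slower each method is relative to AdaBoost on each dataset. The runtimes are copied from the table above; the dictionary layout and the helper name `slowdown_vs` are illustrative choices, not part of the source.

```python
# Average runtimes in seconds for the NA90 version, copied from the table.
runtime_seconds = {
    "G1": {"AB": 2930, "LB": 3055, "TB": 6975},
    "G2": {"AB": 6511, "LB": 6788, "TB": 13956},
    "G3": {"AB": 3460, "LB": 3665, "TB": 10221},
    "G4": {"AB": 7601, "LB": 7802, "TB": 23521},
}


def slowdown_vs(baseline, method, table):
    """Per-dataset ratio of `method` runtime to `baseline` runtime."""
    return {ds: row[method] / row[baseline] for ds, row in table.items()}


lb_vs_ab = slowdown_vs("AB", "LB", runtime_seconds)
tb_vs_ab = slowdown_vs("AB", "TB", runtime_seconds)

for ds in runtime_seconds:
    print(f"{ds}: LB/AB = {lb_vs_ab[ds]:.2f}, TB/AB = {tb_vs_ab[ds]:.2f}")
```

On these figures, LogitBoost stays within a few percent of AdaBoost, while TotalBoost takes more than twice as long on every dataset.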