Abstract
BACKGROUND: Accurate genotype calling for high-throughput Illumina data is an important step in extracting more genetic information for large-scale genome-wide association studies. Many popular calling algorithms use mixture models to infer the genotypes of a large number of single nucleotide polymorphisms (SNPs) quickly and efficiently. In practice, mixture models are mostly restricted to inferring genotypes for common SNPs, whose minor allele frequencies are relatively large. Accurately genotyping rare variants remains challenging, especially for rare variants whose genotype boundaries are not clearly defined.
Keywords: Dirichlet Process Gaussian mixture model; Gaussian mixture model; Genotype; HapMap; Rare variants; Single nucleotide polymorphism
Year: 2016 PMID: 27343118 PMCID: PMC4921002 DOI: 10.1186/s12863-016-0398-x
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Fig. 1 Performance of the DP Gaussian Mixture Model on genotyping three rare SNPs (rs1003505, rs1004262, rs1009148)
Fig. 2 Performance of the DP Gaussian Mixture Model with reference SNP selection on genotyping three rare SNPs (rs10084633, rs1003945, rs1008185)
Fig. 3 Performance of the reference SNP selection on genotyping three rare SNPs (rs10084633, rs1003945, rs1008185)
Comparison of call rate and concordance rate among GenCall, GenoSNP and M-D
| Algorithm 1 | Algorithm 2 | Call rate, Algorithm 1 (%) | Call rate, Algorithm 2 (%) | Concordance (%) |
|---|---|---|---|---|
| GenCall | M-D | 96.71 | 99.71 | 99.93 |
| GenoSNP | M-D | 99.12 | 99.71 | 99.65 |
| GenCall | GenoSNP | 96.71 | 99.12 | 99.71 |
Note: Call rate and concordance rate are given as percentages (%); M-D: a new model calling procedure
Comparison of call rates and accuracy on HapMap samples for all SNPs
| Criterion | Item | GenCall (%) | GenoSNP (%) | M-D (%) |
|---|---|---|---|---|
| All SNPs | Call rate | 96.79 | 99.14 | 99.78 |
| All SNPs | Accuracy | 96.63 | 98.52 | 99.44 |
Note: M-D: a new model calling procedure; Call rate: the percentage of valid genotypes; Accuracy: the percentage of genotypes that are consistent between each calling method and the gold standard
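The two metrics defined in the note above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the use of `None` for a no-call, and the decision to compute agreement only over positions called in both sets are all assumptions made for illustration, since the record does not specify how missing calls enter the denominator. The same agreement calculation could be applied between two calling methods to obtain a concordance rate.

```python
from typing import Optional, Sequence

# Hypothetical encoding: a genotype is a string such as "AA", "AB" or "BB";
# None marks a no-call (an invalid genotype).
Genotype = Optional[str]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Percentage of genotypes that are valid, i.e. not a no-call."""
    if not calls:
        raise ValueError("no genotypes supplied")
    valid = sum(1 for c in calls if c is not None)
    return 100.0 * valid / len(calls)


def agreement_rate(calls: Sequence[Genotype],
                   reference: Sequence[Genotype]) -> float:
    """Percentage of identical genotypes between two call sets.

    Assumption: only positions called in BOTH sets are compared.
    With a gold standard as `reference` this plays the role of accuracy;
    with a second calling method it plays the role of concordance.
    """
    if len(calls) != len(reference):
        raise ValueError("call sets must have the same length")
    pairs = [(a, b) for a, b in zip(calls, reference)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no positions called in both sets")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    method = ["AA", "AB", None, "BB"]   # toy calls, not data from the study
    gold = ["AA", "AB", "BB", "AB"]
    print(f"call rate: {call_rate(method):.2f} %")
    print(f"accuracy:  {agreement_rate(method, gold):.2f} %")
```

Excluding no-calls from the agreement denominator keeps call rate and accuracy independent of each other; counting no-calls as errors would be an equally defensible convention and would lower the accuracy figure.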
Comparisons of call rates and accuracy on HapMap samples for three SNP groups
| Class | Prop (%) | Item | GenCall (%) | GenoSNP (%) | M-D (%) |
|---|---|---|---|---|---|
| g1 | 88.60 | Call rate | 96.59 | 99.13 | 99.77 |
| g1 | 88.60 | Accuracy | 96.40 | 98.44 | 99.31 |
| g2 | 4.03 | Call rate | 97.62 | 99.56 | 99.75 |
| g2 | 4.03 | Accuracy | 97.53 | 99.45 | 99.59 |
| g3 | 7.37 | Call rate | 96.60 | 99.14 | 99.70 |
| g3 | 7.37 | Accuracy | 96.45 | 98.71 | 99.40 |
Note: M-D: a new model calling procedure; Call rate: the percentage of valid genotypes; Accuracy: the percentage of genotypes that are consistent between each calling method and the gold standard; Class: the three SNP categories (g1, g2 and g3); Prop: the percentage of SNPs belonging to each of the three groups
Comparison of Hardy-Weinberg Equilibrium test results among GenCall, GenoSNP and M-D
| Population | Num-Sample | Algorithm | # of failed SNPs |
|---|---|---|---|
| AA I | 2005 | GenCall | 224 |
| AA I | 2005 | GenoSNP | 907 |
| AA I | 2005 | M-D | 422 |
| AA II | 83 | GenCall | 20 |
| AA II | 83 | GenoSNP | 254 |
| AA II | 83 | M-D | 80 |
| EA I | 867 | GenCall | 486 |
| EA I | 867 | GenoSNP | 1024 |
| EA I | 867 | M-D | 643 |
| EA II | 158 | GenCall | 40 |
| EA II | 158 | GenoSNP | 348 |
| EA II | 158 | M-D | 133 |
Note: AA I: African-Americans not of Hispanic origin; AA II: African-Americans of Hispanic origin; EA I: European Americans not of Hispanic origin; EA II: European Americans of Hispanic origin; Num-Sample: the number of subjects within each population; Algorithm: the three algorithms compared in this table (GenCall, GenoSNP and M-D); # of failed SNPs: the number of SNPs that fail the Hardy-Weinberg Equilibrium test within each population
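As an illustration of what failing the Hardy-Weinberg Equilibrium test can mean for a single SNP, here is a minimal sketch of the classic one-degree-of-freedom chi-square test on genotype counts. This record does not state which test variant or significance threshold the authors used, so the chi-square form, the function name `hwe_chi_square` and the example counts are assumptions. For rare variants with small expected counts, an exact test is often preferred over the chi-square approximation.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test of Hardy-Weinberg Equilibrium for one biallelic SNP.

    Takes the observed counts of the three genotypes and returns
    (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")

    # Allele frequencies estimated from the observed genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip classes with zero expected count (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of the chi-square distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Toy counts, not data from the study.
    chi2, p_value = hwe_chi_square(50, 0, 50)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

A SNP would then be counted as failed when its p-value falls below a chosen threshold; that threshold is a study-specific choice not given here.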