Abstract
BACKGROUND: A key challenge in analyzing high-throughput single nucleotide polymorphism (SNP) arrays is the accurate inference of genotypes for SNPs with low minor allele frequencies. A number of calling algorithms have been developed to infer genotypes for common SNPs, but their performance is limited when calling rare SNPs. The existing algorithms can be broadly classified into three categories: population-based methods, SNP-based methods, and hybrids of the two approaches. Although the hybrid approach performs relatively better, analyzing rare SNPs remains challenging.
Year: 2015 PMID: 26634345 PMCID: PMC4669649 DOI: 10.1186/s12859-015-0824-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparisons of call rates and concordance on HapMap samples for two allocation designs
| Design | Item | GenCall | GenoSNP | M3 | M3-S |
|---|---|---|---|---|---|
| 1:1 | Call Rate (%) | 96.60 | 99.13 | 99.76 | 99.64 |
| 1:1 | Accuracy (%) | 96.41 | 98.47 | 99.23 | 99.33 |
| 2:1 | Call Rate (%) | 96.57 | 99.15 | 99.77 | 99.65 |
| 2:1 | Accuracy (%) | 96.39 | 98.49 | 99.24 | 99.38 |
Note: 1:1: 71 individuals are in the training set, and 70 subjects are in the testing set; 2:1: 94 individuals are in the training set, and 47 subjects are in the testing set; M3-S: M3 incorporating samples with known genotypes; Call Rate: the percentage of valid genotypes; Accuracy: the percentage of consistent genotypes.
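The two metrics defined in the note can be illustrated with a minimal Python sketch. The function names, the use of `None` for an uncalled genotype, and the choice to count uncalled genotypes as inconsistent in the accuracy denominator are all assumptions made for illustration; the note does not specify the denominator.

```python
from typing import Optional, Sequence


def call_rate(calls: Sequence[Optional[str]]) -> float:
    """Percentage of genotypes that received a valid (non-missing) call."""
    if not calls:
        raise ValueError("no genotypes supplied")
    valid = sum(1 for c in calls if c is not None)
    return 100.0 * valid / len(calls)


def accuracy(calls: Sequence[Optional[str]], reference: Sequence[str]) -> float:
    """Percentage of genotypes whose call matches the reference genotype.

    Assumption: an uncalled genotype (None) counts as inconsistent, so the
    denominator is the total number of genotypes.
    """
    if len(calls) != len(reference):
        raise ValueError("calls and reference must have the same length")
    if not calls:
        raise ValueError("no genotypes supplied")
    consistent = sum(1 for c, r in zip(calls, reference) if c is not None and c == r)
    return 100.0 * consistent / len(calls)


# Hypothetical toy data: four genotypes, one of them uncalled.
calls = ["AA", "AB", None, "BB"]
reference = ["AA", "AB", "BB", "AB"]
print(call_rate(calls), accuracy(calls, reference))
```

Under this convention accuracy can never exceed the call rate, which matches the pattern in the table, but that agreement does not confirm the authors' exact definition.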
Comparisons of call rates and concordance on HapMap samples for rare variants under 600, 1500, and 3000 simulated observations and the allocation design 2:1
| SNPs | # SNP | Item | M3-S (600) | M3-S (1500) | M3-S (3000) |
|---|---|---|---|---|---|
| MAF < 0.1 | 4364 | Call Rate (%) | 99.69 | 99.65 | 99.59 |
| MAF < 0.1 | 4364 | Accuracy (%) | 99.33 | 99.24 | 99.22 |
| MAF < 0.05 | 2329 | Call Rate (%) | 99.72 | 99.71 | 99.68 |
| MAF < 0.05 | 2329 | Accuracy (%) | 99.34 | 99.26 | 99.28 |
| MAF < 0.01 | 597 | Call Rate (%) | 99.59 | 99.86 | 99.81 |
| MAF < 0.01 | 597 | Accuracy (%) | 99.04 | 99.06 | 99.23 |
Note: M3-S: M3 incorporating samples with known genotypes; 600, 1500, 3000: the number of simulated observations; Call Rate: the percentage of valid genotypes; Accuracy: the percentage of consistent genotypes; # SNP: the number of SNPs whose MAFs are less than 0.1, 0.05 or 0.01, respectively.
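The MAF thresholds that define the three SNP groups are nested (every SNP with MAF < 0.01 also has MAF < 0.05 and MAF < 0.1), which is why the SNP counts shrink from 4364 to 597. As a rough sketch, the MAF of a biallelic SNP can be computed from its genotype calls as below. The `"AA"`/`"AB"`/`"BB"` genotype encoding and the helper names are assumptions for illustration, not taken from the source.

```python
from typing import Iterable, List, Optional

# Thresholds used in the table, from loosest to strictest.
MAF_THRESHOLDS = (0.1, 0.05, 0.01)


def minor_allele_frequency(genotypes: Iterable[Optional[str]]) -> float:
    """MAF of one biallelic SNP from genotype strings such as "AA", "AB", "BB".

    Uncalled genotypes (None) are skipped.
    """
    a_count = b_count = 0
    for g in genotypes:
        if g is None:
            continue
        a_count += g.count("A")
        b_count += g.count("B")
    total = a_count + b_count
    if total == 0:
        raise ValueError("no called genotypes")
    return min(a_count, b_count) / total


def maf_groups(maf: float) -> List[str]:
    """Labels of every (nested) rare-SNP group that a SNP falls into."""
    return [f"MAF < {t}" for t in MAF_THRESHOLDS if maf < t]


# Hypothetical toy data: 7 copies of allele A and 1 copy of allele B.
maf = minor_allele_frequency(["AA", "AA", "AB", "AA", None])
print(maf, maf_groups(maf))
```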
Comparisons of call rates and concordance on HapMap samples for rare variants among GenCall, GenoSNP, M3 and M3-S
| SNPs | # SNP | Item | GenCall | GenoSNP | M3 | M3-S |
|---|---|---|---|---|---|---|
| MAF < 0.1 | 4364 | Call Rate (%) | 95.89 | 99.02 | 99.70 | 99.59 |
| MAF < 0.1 | 4364 | Accuracy (%) | 95.65 | 98.28 | 99.19 | 99.22 |
| MAF < 0.05 | 2329 | Call Rate (%) | 96.44 | 98.89 | 99.64 | 99.68 |
| MAF < 0.05 | 2329 | Accuracy (%) | 96.15 | 98.02 | 99.08 | 99.28 |
| MAF < 0.01 | 597 | Call Rate (%) | 94.37 | 98.90 | 99.53 | 99.81 |
| MAF < 0.01 | 597 | Accuracy (%) | 93.89 | 97.28 | 98.60 | 99.23 |
Note: M3-S: M3 incorporating samples with known genotypes; Call Rate: the percentage of valid genotypes; Accuracy: the percentage of consistent genotypes; # SNP: the number of SNPs whose MAFs are less than 0.1, 0.05 or 0.01, respectively.
Fig. 1 Illustration of how different sizes of simulated data improve the calling results of one rare SNP (rs1003505)
Fig. 2 Illustration of how M3-S improves the calling results of three rare SNPs (rs1003505, rs1003676, and rs1008185)
Comparisons of call rate and concordance of whole SNPs among GenCall, GenoSNP, M3 and M3-S
| Algorithm 1 | Algorithm 2 | Call rate (%), Algorithm 1 | Call rate (%), Algorithm 2 | Concordance (%) |
|---|---|---|---|---|
| GenCall | GenoSNP | 96.71 | 99.12 | 99.71 |
| GenCall | M3 | 96.71 | 99.71 | 99.85 |
| GenCall | M3-S | 96.71 | 99.56 | 99.85 |
| GenoSNP | M3 | 99.12 | 99.71 | 99.41 |
| GenoSNP | M3-S | 99.12 | 99.56 | 99.45 |
| M3 | M3-S | 99.71 | 99.56 | 99.57 |
Note: Call rate and concordance are given in percent (%); M3-S: M3 incorporating samples with known genotypes; Algorithm 1 and Algorithm 2: the two algorithms compared in each row, drawn from GenCall, GenoSNP, M3 and M3-S.
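Pairwise concordance between two calling algorithms can be sketched as follows. This assumes a common convention in which only genotypes called by both algorithms enter the comparison; the source does not state which denominator was used, and the function name and `None` encoding for no-calls are hypothetical.

```python
from typing import Optional, Sequence


def pairwise_concordance(
    calls_1: Sequence[Optional[str]], calls_2: Sequence[Optional[str]]
) -> float:
    """Percentage of jointly called genotypes on which two algorithms agree.

    Assumption: genotypes left uncalled (None) by either algorithm are
    excluded from the denominator.
    """
    if len(calls_1) != len(calls_2):
        raise ValueError("both call sets must cover the same genotypes")
    both_called = [
        (a, b) for a, b in zip(calls_1, calls_2) if a is not None and b is not None
    ]
    if not both_called:
        raise ValueError("no genotype was called by both algorithms")
    agree = sum(1 for a, b in both_called if a == b)
    return 100.0 * agree / len(both_called)


# Hypothetical toy data: three genotypes are called by both algorithms,
# and the two algorithms agree on two of them.
algo_1 = ["AA", "AB", None, "BB", "AB"]
algo_2 = ["AA", "BB", "AA", "BB", None]
print(round(pairwise_concordance(algo_1, algo_2), 2))
```

With this convention, concordance between two algorithms can be higher than either algorithm's call rate, as it is in several rows of the table.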
Comparisons of Hardy-Weinberg Equilibrium Test among GenCall, GenoSNP, M3 and M3-S
| Population | Num-Sample | GenCall | GenoSNP | M3 | M3-S (600) | M3-S (1500) | M3-S (3000) |
|---|---|---|---|---|---|---|---|
| AA I | 2005 | 224 | 907 | 432 | 481 | 646 | 822 |
| AA II | 83 | 20 | 255 | 64 | 59 | 61 | 65 |
| EA I | 867 | 486 | 1024 | 636 | 639 | 690 | 770 |
| EA II | 158 | 40 | 348 | 109 | 98 | 106 | 129 |
Note: AA I: African-Americans not of Hispanic Origin; AA II: African-Americans of Hispanic Origin; EA I: European Americans not of Hispanic Origin; EA II: European Americans of Hispanic Origin; Num-Sample: the number of subjects within each population
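This excerpt does not define what the counts in the table measure or which variant of the test was applied. As general background only, a Hardy-Weinberg equilibrium check for a single biallelic SNP is often done with a one-degree-of-freedom chi-square goodness-of-fit test on the three genotype counts. The sketch below shows that standard test; the function name and inputs are assumptions for illustration, not the authors' procedure.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square HWE test (1 degree of freedom) for one biallelic SNP.

    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip genotype classes with zero expected count (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts that match HWE proportions exactly.
print(hwe_chi_square(25, 50, 25))
```

For rare SNPs the expected count of the minor homozygote is small, so an exact test is usually preferred over the chi-square approximation in practice.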