Alexandros Iliadis, John Watkinson, Dimitris Anastassiou, Xiaodong Wang.
Abstract
BACKGROUND: In genome-wide association studies, thousands of individuals are genotyped in hundreds of thousands of single nucleotide polymorphisms (SNPs). Statistical power can be increased when haplotypes, rather than three-valued genotypes, are used in analysis, so the problem of haplotype phase inference (phasing) is particularly relevant. Several phasing algorithms have been developed for data from unrelated individuals, based on different models, some of which have been extended to father-mother-child "trio" data.
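To illustrate why phasing is needed: a three-valued genotype records only how many copies of an allele an individual carries at each SNP, not which of the two chromosomes carries them. A genotype with k heterozygous sites (k ≥ 1) is therefore consistent with 2^(k-1) unordered haplotype pairs. The minimal sketch below enumerates them, assuming 0/1/2 coding as the count of the alternate allele; the function name `phasings` is hypothetical.

```python
from itertools import product


def phasings(genotype):
    """Return all unordered haplotype pairs consistent with a 0/1/2 genotype.

    Assumed coding: 0 and 2 are homozygous, 1 is heterozygous.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    # Try every assignment of alleles to the first haplotype at heterozygous sites.
    for bits in product((0, 1), repeat=len(het_sites)):
        h1 = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
        for site, b in zip(het_sites, bits):
            h1[site] = b
        # The second haplotype is whatever remains of the genotype.
        h2 = [g - a for g, a in zip(genotype, h1)]
        # Sort the pair so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


print(phasings([1, 1]))
```

With two heterozygous sites there are two distinct phasings, (00, 11) and (01, 10); the number grows exponentially with the number of heterozygous sites, which is why phasing algorithms rely on population models rather than exhaustive enumeration.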
Year: 2010 PMID: 20727218 PMCID: PMC2939632 DOI: 10.1186/1471-2156-11-78
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Average Transmission Error Rate For Phasing Trios
| Method | ST1 | ST2 | ST3 |
|---|---|---|---|
| PHASE | 0.0013 | 0.0013 | 0.0145 |
| BEAGLE (R = 1) | 0.0235 | 0.0318 | 0.0426 |
| BEAGLE (R = 4) | 0.0150 | 0.0148 | 0.0344 |
| TDS | 0.0039 | 0.0065 | 0.0320 |
| 2SNP | 0.4377 | 0.4868 | 0.4861 |
Average number of Incorrect Trios per dataset
| Method | ST1 | ST2 | ST3 |
|---|---|---|---|
| PHASE | 0.3 | 0.4 | 2.45 |
| BEAGLE (R = 1) | 3.75 | 5.8 | 6.4 |
| BEAGLE (R = 4) | 1.95 | 2.9 | 5.45 |
| TDS | 0.95 | 1.6 | 5.4 |
| 2SNP | 25.9 | 28.6 | 28 |
Average Transmission Error Rate For Phasing Trios with 1% Missing Rate
| Method | ST1 | ST2 | ST3 |
|---|---|---|---|
| PHASE | 0.0031 | 0.0023 | 0.0161 |
| BEAGLE (R = 1) | 0.0213 | 0.0248 | 0.0354 |
| BEAGLE (R = 4) | 0.0093 | 0.0133 | 0.0278 |
| TDS | 0.0094 | 0.0116 | 0.0348 |
| 2SNP | 0.3038 | 0.3486 | 0.3169 |
Average number of Incorrect Trios per dataset with 1% Missing Rate
| Method | ST1 | ST2 | ST3 |
|---|---|---|---|
| PHASE | 0.6 | 0.475 | 2.653 |
| BEAGLE (R = 1) | 3.6054 | 5.25 | 6.4661 |
| BEAGLE (R = 4) | 1.7464 | 3.1321 | 4.8893 |
| TDS | 1.7521 | 2.7018 | 5.7768 |
| 2SNP | 26.05 | 28.55 | 28.2 |
Average Transmission error rate for 100 and 1000 Trios as a function of the number of markers
| Method | Trios | 200 markers | 400 markers | 1000 markers | 6000 markers |
|---|---|---|---|---|---|
| TDS | 100 | 0.00063 | 0.00075 | 0.0015 | 0.0023 |
| TDS | 1000 | 0.00042 | 0.0008 | 0.0015 | 0.0023 |
| BEAGLE | 100 | 0.0013 | 0.0013 | 0.0021 | 0.0024 |
| BEAGLE | 1000 | 0.00011 | 0.00033 | 0.0005 | 0.0007 |
| 2SNP | 100 | 0.1094 | 0.2855 | 0.3916 | 0.4315 |
| 2SNP | 1000 | 0.1733 | 0.2524 | 0.3836 | 0.4117 |
Timing Results (seconds)
| Method | ST1 | ST2 | ST3 |
|---|---|---|---|
| PHASE | 8452 | 4932 | 5464 |
| BEAGLE (R = 1) | 2.59 | 2.73 | 2.95 |
| BEAGLE (R = 4) | 2.80 | 3.18 | 3.27 |
| TDS | 1.99 | 2.48 | 2.61 |
| 2SNP | 0.63 | 0.6 | 0.59 |
Timing Results (seconds) with 1% Missing Rate
| Method | ST1 | ST2 | ST3 |
|---|---|---|---|
| PHASE | 8613 | 5220 | 5831 |
| BEAGLE (R = 1) | 2.6744 | 2.9873 | 3.2409 |
| BEAGLE (R = 4) | 2.9233 | 3.2858 | 3.4429 |
| TDS | 2.0643 | 2.5815 | 2.7484 |
| 2SNP | 0.67 | 0.63 | 0.6 |
Average Timing Results in seconds for 100 and 1000 Trios as a function of the number of markers
| Method | Trios | 200 markers | 400 markers | 1000 markers | 6000 markers |
|---|---|---|---|---|---|
| TDS | 100 | 2.8 | 5 | 14.4 | 113.6 |
| TDS | 1000 | 31.8 | 63.3 | 156.2 | 1257.4 |
| BEAGLE | 100 | 3.7 | 5.6 | 15.2 | 118.4 |
| BEAGLE | 1000 | 12.7 | 31.6 | 291.8 | 1952.4 |
| 2SNP | 100 | 3 | 8.9 | 28.7 | 180.7 |
| 2SNP | 1000 | 33.4 | 116.2 | 399.8 | 3008.2 |
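The timing results above can also be read as scaling ratios. The short script below only divides entries copied from that table: growth across markers (6000 versus 200, a 30-fold increase) and across trios (1000 versus 100, a 10-fold increase, at 6000 markers). The helper names `growth` and `trio_scaling` are hypothetical, and the ratios describe these particular measurements, not asymptotic complexity.

```python
# Average running times in seconds, copied from the table above and keyed by
# (method, number of trios); columns are 200, 400, 1000 and 6000 markers.
markers = [200, 400, 1000, 6000]
times = {
    ("TDS", 100): [2.8, 5, 14.4, 113.6],
    ("TDS", 1000): [31.8, 63.3, 156.2, 1257.4],
    ("BEAGLE", 100): [3.7, 5.6, 15.2, 118.4],
    ("BEAGLE", 1000): [12.7, 31.6, 291.8, 1952.4],
    ("2SNP", 100): [3, 8.9, 28.7, 180.7],
    ("2SNP", 1000): [33.4, 116.2, 399.8, 3008.2],
}


def growth(method, trios):
    """Time at 6000 markers divided by time at 200 markers (30x more markers)."""
    t = times[(method, trios)]
    return t[-1] / t[0]


def trio_scaling(method):
    """Time for 1000 trios divided by time for 100 trios, both at 6000 markers."""
    return times[(method, 1000)][-1] / times[(method, 100)][-1]


for method in ("TDS", "BEAGLE", "2SNP"):
    print(method,
          round(growth(method, 100), 1),
          round(growth(method, 1000), 1),
          round(trio_scaling(method), 1))
```

By this arithmetic, with 1000 trios the TDS running time rises about 40-fold over the 30-fold increase in markers, against about 150-fold for BEAGLE; going from 100 to 1000 trios at 6000 markers multiplies the TDS time by about 11, versus about 16.5 for BEAGLE and 2SNP.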
Average Allelic Imputation Error Rate For Simulated datasets
| Method | ST1 | ST2 | ST3 |
|---|---|---|---|
| PHASE | 0.0063 | 0.0145 | 0.0133 |
| BEAGLE (R = 1) | 0.0124 | 0.0255 | 0.0249 |
| BEAGLE (R = 4) | 0.0101 | 0.0224 | 0.0223 |
| TDS | 0.0124 | 0.0271 | 0.0266 |
| 2SNP | 0.0741 | 0.0855 | 0.0983 |
Average Allelic Imputation Error Rate and Timing Results for HapMap datasets
| Method | Allelic Imputation Error Rate | Time (s) |
|---|---|---|
| PHASE | 0.0051 | 5360 |
| BEAGLE (R = 1) | 0.0129 | 3.156 |
| BEAGLE (R = 4) | 0.0112 | 3.339 |
| TDS | 0.0134 | 2.53 |
| 2SNP | 0.0831 | 0.685 |
Figure 1. Example of TDS. We process three trios sequentially. In each trio, the first two genotypes are those of the parents and the third is that of the child. The possible solutions of each trio are listed next to it and numbered 1 and 2. In each possible solution, the first two haplotypes are the transmitted and the untransmitted haplotype from the first parent, and the remaining two are the same for the second parent. At each step we keep only K = 2 streams, called "surviving streams". 1) The first trio has two possible solutions. 2) a) The second trio has two possible solutions, giving four possible combinations of a solution from the first trio with a solution from the second. The indices below each stream show which solution of each trio it was created from; for example, stream s1-2 was created from the first solution of the first trio and the second solution of the second. With each stream we associate a weight, as described in the Methods section. b) We keep only the K = 2 streams with the highest weights (the surviving streams), which we consider the most probable at this point. 3) The third trio has two possible solutions. a) Each of them is appended to the end of each of the two surviving streams. Streams are named as before; for example, stream s2-1-1 comes from appending solution 1 of the third trio to stream s2-1. b) Again we keep only the two streams with the highest weights, s2-1-1 and s2-1-2.
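The stream procedure in Figure 1 is in effect a beam search over per-trio solutions. The minimal sketch below first enumerates a trio's solutions as (transmitted, untransmitted) haplotypes for each parent, then extends and prunes streams to the K best. Assumptions: genotypes are coded 0/1/2 as alternate-allele counts, with no missing data and no Mendelian errors; `site_options`, `trio_solutions`, `weight` and `surviving_streams` are hypothetical names; and `weight` is a placeholder parsimony score, not the weight defined in the paper's Methods section. Under these assumptions only SNPs where both parents and the child are heterozygous are ambiguous, so a trio with m such SNPs has 2^m solutions.

```python
from itertools import product


def site_options(gf, gm, gc):
    """All (tf, uf, tm, um) allele assignments at one SNP of a trio.

    tf/uf: allele transmitted/untransmitted by the first parent,
    tm/um: the same for the second parent; the child carries tf + tm.
    """
    opts = []
    for tf, tm in product((0, 1), repeat=2):
        uf, um = gf - tf, gm - tm
        if uf in (0, 1) and um in (0, 1) and tf + tm == gc:
            opts.append((tf, uf, tm, um))
    return opts


def trio_solutions(father, mother, child):
    """Enumerate solutions as four haplotypes (T1, U1, T2, U2).

    Returns [] if some SNP is Mendelian-inconsistent.
    """
    per_site = [site_options(f, m, c) for f, m, c in zip(father, mother, child)]
    solutions = []
    for choice in product(*per_site):
        # Transpose the per-SNP 4-tuples into four haplotypes.
        solutions.append(tuple(tuple(col) for col in zip(*choice)))
    return solutions


def weight(chosen):
    """HYPOTHETICAL stand-in score, not the paper's weight: streams that
    reuse fewer distinct haplotypes score higher (parsimony)."""
    return -len({h for solution in chosen for h in solution})


def surviving_streams(trios, K=2):
    """Process trios in order, keeping the K highest-weight streams.

    Each stream is (solution indices, chosen solutions); the indices mirror
    the stream names in Figure 1, e.g. (2, 1, 1) for s2-1-1.
    """
    streams = [((), ())]
    for father, mother, child in trios:
        sols = trio_solutions(father, mother, child)
        # Append every solution of this trio to every surviving stream.
        candidates = [(idx + (j + 1,), chosen + (sol,))
                      for idx, chosen in streams
                      for j, sol in enumerate(sols)]
        # Stable sort: ties keep the order in which candidates were built.
        candidates.sort(key=lambda s: weight(s[1]), reverse=True)
        streams = candidates[:K]
    return streams


trios = [([1, 0], [1, 2], [1, 1]),
         ([1, 1], [0, 1], [1, 1]),
         ([2, 1], [1, 1], [1, 1])]
for idx, _ in surviving_streams(trios, K=2):
    print("s" + "-".join(map(str, idx)))
```

Each trio in this toy input has exactly one fully heterozygous SNP and hence two solutions, as in Figure 1, so every step builds four candidate streams and keeps two. Keeping only K streams bounds the work per trio at K times the number of that trio's solutions, instead of the product over all trios processed so far.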
Average Transmission Error Rate for Equal Block Partitioning TDS (Equal TDS)
| Method | ST1 | ST2 | ST3 |
|---|---|---|---|
| TDS | 0.0039 | 0.0065 | 0.0320 |
| Equal TDS | 0.0113 | 0.0085 | 0.0360 |
Average number of Incorrect Trios per dataset for Equal Block Partitioning TDS (Equal TDS)
| Method | ST1 | ST2 | ST3 |
|---|---|---|---|
| TDS | 0.95 | 1.6 | 5.4 |
| Equal TDS | 1.6 | 1.7 | 5.6 |