Abstract
In this report, we compared haplotyping approaches based on families and on unrelated individuals, using the simulated rheumatoid arthritis (RA) data in Problem 3 from Genetic Analysis Workshop (GAW) 15. We chose one representative program for each approach: PedPhase for families and fastPHASE for unrelated individuals. PedPhase is a rule-based method that focuses on the haplotyping constraints within each pedigree and solves them using integer linear programming. fastPHASE is a statistical method based on the clustering property of haplotypes in a population over short regions. Family information is generally believed to yield more accurate phasing results, at the considerably higher cost of genotyping additional family members. Our results indicate that, although it relies only on the constraints within each family (with four members) individually, PedPhase has better phasing accuracy than fastPHASE, even when the total numbers of genotyped individuals are the same. For missing genotype imputation, however, fastPHASE performs better than PedPhase because it takes population information into consideration. The relative influence of family constraints and population information on haplotyping accuracy shown in this report provides an empirical basis for assessing the trade-off of genotyping family data under different settings.
Year: 2007 PMID: 18466555 PMCID: PMC2367580 DOI: 10.1186/1753-6561-1-s1-s55
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
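To make the idea of within-family haplotyping constraints concrete, the following is a minimal sketch of Mendelian phasing in a single father-mother-child trio. It is not PedPhase's integer linear programming formulation; it only illustrates how parental genotypes can fix the phase of a child. The function names, the 0/1/2 genotype coding (count of allele 1 at a biallelic SNP), and the assumption that the input is complete and Mendelian-consistent are all illustrative choices, not details from the report.

```python
from typing import List, Optional, Sequence, Tuple

Phase = Optional[Tuple[int, int]]  # (paternal allele, maternal allele) or None


def phase_child_site(father: int, mother: int, child: int) -> Phase:
    """Phase one biallelic site of the child from trio genotypes.

    Genotypes are coded as the count of allele 1 (0, 1, or 2).
    Assumes no missing data and Mendelian-consistent genotypes.
    """
    if child in (0, 2):
        # Homozygous child: both haplotypes carry the same allele.
        allele = child // 2
        return (allele, allele)
    if father in (0, 2):
        # Homozygous father can transmit only one allele.
        paternal = father // 2
        return (paternal, 1 - paternal)
    if mother in (0, 2):
        # Homozygous mother can transmit only one allele.
        maternal = mother // 2
        return (1 - maternal, maternal)
    # All three are heterozygous: this site is not resolved by the trio alone.
    return None


def phase_child(father: Sequence[int], mother: Sequence[int],
                child: Sequence[int]) -> List[Phase]:
    """Apply the single-site rule to every site along the chromosome."""
    return [phase_child_site(f, m, c) for f, m, c in zip(father, mother, child)]
```

Sites where all three individuals are heterozygous stay unresolved under this rule, which is one place where additional relatives or population information could contribute.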
Running time and error rates of fastPHASE on chromosome 6
| Individuals | Missing rate | Running time (sec) | Genotype inference error rate | Heterozygous switch error rate | Point-wise error rate |
| 100 | 0.00 | 1437 | ------a | 0.4835 | 0.1569 |
| 100 | 0.05 | 1563 | 0.1106 | 0.4926 | 0.1618 |
| 100 | 0.10 | 1380 | 0.1202 | 0.4967 | 0.1639 |
| 100 | 0.15 | 1671 | 0.1267 | 0.4829 | 0.1614 |
| 200 | 0.00 | 4679 | ------ | 0.4964 | 0.1609 |
| 200 | 0.05 | 4620 | 0.0922 | 0.4960 | 0.1620 |
| 200 | 0.10 | 4673 | 0.0979 | 0.4966 | 0.1635 |
| 200 | 0.15 | 4382 | 0.1064 | 0.4961 | 0.1631 |
| 400 | 0.00 | 9417 | ------ | 0.5035 | 0.1634 |
| 400 | 0.05 | 9411 | 0.0904 | 0.5093 | 0.1666 |
| 400 | 0.10 | 9360 | 0.0966 | 0.4880 | 0.1609 |
| 400 | 0.15 | 8524 | 0.1024 | 0.4959 | 0.1652 |
a ------: the genotype inference error rate is not measured because this row has no missing genotypes.
Running time and error rates of PedPhase on chromosome 6
| Individuals | Missing rate | Running time (sec) | Genotype inference error rate | Heterozygous switch error rate | Point-wise error rate |
| 100 | 0.00 | 10 | ------a | 0.0061 | 0.0011 |
| 100 | 0.05 | 11 | 0.1165 | 0.0215 | 0.0088 |
| 100 | 0.10 | 15 | 0.1439 | 0.0412 | 0.0190 |
| 100 | 0.15 | 16 | 0.1731 | 0.0560 | 0.0279 |
| 200 | 0.00 | 24 | ------ | 0.0058 | 0.0022 |
| 200 | 0.05 | 25 | 0.1223 | 0.0231 | 0.0099 |
| 200 | 0.10 | 27 | 0.1291 | 0.0398 | 0.0181 |
| 200 | 0.15 | 35 | 0.1513 | 0.0517 | 0.0256 |
| 400 | 0.00 | 51 | ------ | 0.0056 | 0.0015 |
| 400 | 0.05 | 55 | 0.1240 | 0.0232 | 0.0094 |
| 400 | 0.10 | 61 | 0.1418 | 0.0353 | 0.0179 |
| 400 | 0.15 | 66 | 0.1628 | 0.0459 | 0.0238 |
a ------: the genotype inference error rate is not measured because this row has no missing genotypes.
Figure 1. Comparison of running time and error rates between fastPHASE and PedPhase on chromosome 6. Panels 1, 2, 3, and 4 show the running time, genotype inference error rate, heterozygous switch error rate, and point-wise error rate of PedPhase and fastPHASE for the different testing categories; for example, 200_0.10 means 200 individuals with a missing rate of 0.10.
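The two phasing error rates reported above can be sketched in a few lines. This record does not give the exact definitions used in the report, so the code below follows one common convention and should be read as an assumption: the switch error rate counts phase flips between consecutive heterozygous sites, and the point-wise error rate counts wrongly phased heterozygous sites after choosing the better of the two haplotype labelings. The function names are hypothetical, and the sketch assumes the predicted genotype is correct at every heterozygous site.

```python
from typing import List, Sequence


def het_sites(true_h1: Sequence[int], true_h2: Sequence[int]) -> List[int]:
    """Indices of sites where the two true haplotypes differ."""
    return [i for i, (a, b) in enumerate(zip(true_h1, true_h2)) if a != b]


def switch_error_rate(true_h1: Sequence[int], true_h2: Sequence[int],
                      pred_h1: Sequence[int]) -> float:
    """Phase flips between consecutive heterozygous sites, per adjacent pair."""
    sites = het_sites(true_h1, true_h2)
    if len(sites) < 2:
        return 0.0
    # True where the predicted first haplotype follows the true first haplotype.
    agree = [pred_h1[i] == true_h1[i] for i in sites]
    switches = sum(1 for x, y in zip(agree, agree[1:]) if x != y)
    return switches / (len(sites) - 1)


def pointwise_error_rate(true_h1: Sequence[int], true_h2: Sequence[int],
                         pred_h1: Sequence[int]) -> float:
    """Fraction of heterozygous sites phased wrongly under the better labeling."""
    sites = het_sites(true_h1, true_h2)
    if not sites:
        return 0.0
    mismatches = sum(1 for i in sites if pred_h1[i] != true_h1[i])
    # Haplotype labels are arbitrary, so score against whichever labeling fits best.
    return min(mismatches, len(sites) - mismatches) / len(sites)
```

Under this convention a single switch in the middle of a chromosome gives a low switch error rate but a high point-wise error rate, so the two measures capture different aspects of phasing quality.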