Shuying Sue Li, Jacob Jen-Hao Cheng, Lue Ping Zhao.
Abstract
BACKGROUND: The completion of the HapMap project has stimulated further development of haplotype-based methodologies for disease association studies. A key aspect of such development is the statistical inference of individual diplotypes from unphased genotypes. Several methods for inferring haplotypes have been developed, but they have not been evaluated extensively to determine which method not only performs well but can also be easily incorporated into downstream haplotype-based association analyses. In this paper, we attempt such an evaluation. We compared the two leading Bayesian methods, implemented in PHASE and HAPLOTYPER, with the two leading empirical methods, implemented in PL-EM and HPlus. We applied these methods to real data, namely the dense X-chromosome genotypes of 30 European and 30 African trios provided by the International HapMap Project, and to simulated genotype data. Our conclusions are based on these analyses.
Year: 2007 PMID: 17261196 PMCID: PMC1803795 DOI: 10.1186/1471-2156-8-2
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
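To make "inferring diplotypes from unphased genotypes" concrete, the sketch below implements the plain expectation-maximization (EM) estimator of haplotype frequencies, the idea that PL-EM builds on as its name indicates. It is a minimal illustration under stated assumptions, not the code of PL-EM, HPlus, PHASE or HAPLOTYPER: it enumerates every compatible haplotype pair (PL-EM instead uses partition-ligation to scale to longer blocks), assumes Hardy-Weinberg equilibrium and no missing data, and uses a hypothetical 0/1/2 genotype coding. The function names are illustrative only.

```python
from collections import defaultdict
from itertools import product


def compatible_diplotypes(genotype):
    """List the unordered haplotype pairs consistent with one unphased genotype.

    Assumed coding (hypothetical, not from the paper): 0 and 2 are the two
    homozygotes, 1 is a heterozygote; haplotypes are tuples of 0/1 alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het_sites, bits):
            h1[site], h2[site] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, max_iter=200, tol=1e-8):
    """Plain EM estimate of haplotype frequencies, assuming Hardy-Weinberg equilibrium."""
    candidates = [compatible_diplotypes(g) for g in genotypes]
    haplotypes = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: posterior probability of each compatible diplotype
            weights = [freq[a] * freq[b] * (1.0 if a == b else 2.0) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: frequency = expected haplotype count / number of chromosomes
        new_freq = {h: counts[h] / n_chrom for h in haplotypes}
        converged = max(abs(new_freq[h] - freq[h]) for h in haplotypes) < tol
        freq = new_freq
        if converged:
            break
    return freq


def most_likely_diplotype(genotype, freq):
    """Assign the compatible diplotype with the highest estimated probability."""
    return max(compatible_diplotypes(genotype),
               key=lambda p: freq[p[0]] * freq[p[1]] * (1.0 if p[0] == p[1] else 2.0))


genotypes = [(0, 0), (2, 2), (1, 1)]
freq = em_haplotype_frequencies(genotypes)
print(most_likely_diplotype((1, 1), freq))  # ((0, 0), (1, 1))
```

In this toy example the double heterozygote is resolved as (0, 0)/(1, 1) because those haplotypes are supported by the two homozygous individuals; the inferred diplotype is what a downstream association analysis would consume.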
Performance of haplotyping methods in analyzing 500 randomly selected haplotype blocks from the X-chromosome genotypes of the 30 European mothers in the HapMap data.
| Mean | 0.989 | 0.990 | 0.986 | 0.991 |
| Median | 1.0 | 1.0 | 1.0 | 1.0 |
| Standard deviation | 0.029 | 0.024 | 0.040 | 0.024 |
| Range | (0.733, 1.0) | (0.833, 1.0) | (0.292, 1.0) | (0.733, 1.0) |
| Mean | 0.989 | 0.990 | 0.990 | 0.991 |
| Median | 1.0 | 1.0 | 1.0 | 1.0 |
| Standard deviation | 0.029 | 0.025 | 0.040 | 0.025 |
| Range | (0.733, 1.0) | (0.833, 1.0) | (0.283, 1.0) | (0.733, 1.0) |
| | 98 | 20 | 9935 | 422 |
*: HAPLOTYPER failed to resolve 18 of the 500 haplotype blocks.
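The rows above summarize one score per haplotype block by its mean, median, standard deviation and range. A minimal sketch of that summary follows, assuming a hypothetical list of per-block scores in which `None` marks a block a program failed to resolve. Excluding failed blocks, and using the sample (n - 1) standard deviation, are assumptions; the record does not state how the authors handled either.

```python
import statistics


def summarize_blocks(scores):
    """Mean, median, SD and range of per-block scores, skipping failed blocks.

    `scores` holds one value in [0, 1] per haplotype block, or None where a
    program failed to resolve the block (as HAPLOTYPER did for some blocks).
    """
    resolved = [s for s in scores if s is not None]
    return {
        "mean": statistics.mean(resolved),
        "median": statistics.median(resolved),
        # sample SD (n - 1 denominator); the source does not say which was used
        "sd": statistics.stdev(resolved),
        "range": (min(resolved), max(resolved)),
        "failed": len(scores) - len(resolved),
    }


print(summarize_blocks([1.0, 1.0, 0.8, None]))
```

Reporting the number of failed blocks alongside the statistics keeps a program that skips hard blocks from looking better than one that attempts them all.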
Performance of haplotyping programs in analyzing 500 randomly selected haplotype blocks from the X-chromosome genotypes of the 30 African mothers in the HapMap data.
| Mean | 0.988 | 0.988 | 0.987 | 0.998 |
| Median | 1.0 | 1.0 | 1.0 | 1.0 |
| Standard deviation | 0.025 | 0.024 | 0.027 | 0.025 |
| Range | (0.833, 1.0) | (0.833, 1.0) | (0.689, 1.0) | (0.833, 1.0) |
| Mean | 0.987 | 0.987 | 0.987 | 0.987 |
| Median | 1.0 | 1.0 | 1.0 | 1.0 |
| Standard deviation | 0.027 | 0.027 | 0.030 | 0.027 |
| Range | (0.800, 1.0) | (0.800, 1.0) | (0.683, 1.0) | (0.800, 1.0) |
| | 15 | 7 | 1174 | 37 |
*: HAPLOTYPER failed to resolve 2 of the 500 haplotype blocks.
Figure 1. The relationship between the performance of haplotyping methods and the percentage of individuals with uncertain haplotypes. The plots show the performance (similarity index and prediction rate) of the empirical methods (PL-EM and HPlus) and the Bayesian methods (PHASE and HAPLOTYPER) in analyzing the 500 randomly selected haplotype blocks from the X-chromosome genotypes of the 30 European mothers.
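One simple way to quantify how many individuals have uncertain haplotypes in a block is to count those whose genotype is compatible with more than one diplotype. With no missing data, an individual heterozygous at k SNPs has 2^(k-1) compatible diplotypes when k is at least 1, so phase is ambiguous exactly when k is 2 or more. The sketch below assumes this heterozygosity-based definition and a hypothetical 0/1/2 coding with 1 = heterozygous; the paper's own definition is not given in this record and may differ (for example, it could account for missing genotypes).

```python
def n_heterozygous(genotype):
    """Count heterozygous SNPs (coded 1 under the assumed 0/1/2 coding)."""
    return sum(1 for g in genotype if g == 1)


def percent_phase_ambiguous(genotypes):
    """Percentage of individuals whose genotype admits more than one diplotype.

    Without missing data, phase is ambiguous exactly when an individual is
    heterozygous at two or more SNPs in the block.
    """
    ambiguous = sum(1 for g in genotypes if n_heterozygous(g) >= 2)
    return 100.0 * ambiguous / len(genotypes)


block = [(0, 1, 2), (1, 1, 0), (1, 2, 1), (0, 0, 0)]
print(percent_phase_ambiguous(block))  # 50.0
```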
Figure 2. The relationship between the performance of haplotyping methods and the linkage disequilibrium (LD) of the haplotypes within blocks. The plots show the performance (similarity index and prediction rate) of the empirical methods (PL-EM and HPlus) and the Bayesian methods (PHASE and HAPLOTYPER) in analyzing the 500 randomly selected haplotype blocks from the X-chromosome genotypes of the 30 European mothers.
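The record does not say which LD measure was used for Figure 2. As one standard option, the sketch below computes r^2 between two biallelic SNPs from phased haplotypes, with D = p_AB - p_A p_B and r^2 = D^2 / (p_A(1 - p_A) p_B(1 - p_B)), and averages it over all SNP pairs in a block. The block-level average and the choice of r^2 rather than D' are assumptions made for illustration.

```python
from itertools import combinations


def pairwise_r2(haplotypes, i, j):
    """r^2 between biallelic SNPs i and j, from phased 0/1 haplotypes."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n
    d = p_ij - p_i * p_j  # the LD coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    # r^2 is undefined when a SNP is monomorphic; report 0 in that case
    return d * d / denom if denom > 0 else 0.0


def mean_block_r2(haplotypes):
    """Average r^2 over all SNP pairs in a block (one possible block-level LD summary)."""
    n_snps = len(haplotypes[0])
    values = [pairwise_r2(haplotypes, i, j) for i, j in combinations(range(n_snps), 2)]
    return sum(values) / len(values)


print(pairwise_r2([(0, 0), (1, 1), (0, 0), (1, 1)], 0, 1))  # 1.0
```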
Performance of haplotyping programs on simulated data based on selected X-chromosome genotypes of the 30 European mothers in the HapMap data.
| 13 | | | | | | | | | | | | |
| 100 | 0.969 | 0.972 | 0.985 | 0.979 | 0.966 | 0.970 | 0.977 | 0.978 | 0.64 | 0.16 | 53.26 | 2.61 |
| 150 | 0.981 | 0.982 | 0.989 | 0.981 | 0.973 | 0.978 | 0.989 | 0.980 | 0.69 | 0.23 | 88.57 | 2.64 |
| 200 | 0.986 | 0.987 | 0.993 | 0.984 | 0.985 | 0.986 | 0.992 | 0.982 | 0.86 | 0.30 | 114.88 | 5.25 |
| 250 | 0.988 | 0.989 | 0.994 | 0.979 | 0.986 | 0.987 | 0.992 | 0.977 | 0.92 | 0.38 | 157.55 | 6.45 |
| 300 | 0.927 | 0.992 | 0.996 | 0.984 | 0.991 | 0.992 | 0.992 | 0.982 | 0.89 | 0.40 | 194.63 | 7.77 |
| 16 | | | | | | | | | | | | |
| 100 | 0.931 | 0.933 | 0.945 | 0.930 | 0.897 | 0.901 | 0.893 | 0.905 | 3.38 | 1.34 | 108.74 | 3.31 |
| 150 | 0.948 | 0.951 | 0.958 | 0.943 | 0.909 | 0.913 | 0.894 | 0.919 | 12.02 | 1.83 | 180.08 | 4.78 |
| 200 | 0.961 | 0.963 | 0.967 | 0.955 | 0.923 | 0.926 | 0.907 | 0.932 | 34.81 | 2.21 | 249.74 | 6.06 |
| 250 | 0.968 | 0.968 | 0.970 | 0.956 | 0.930 | 0.931 | 0.912 | 0.934 | 61.47 | 2.46 | 332.92 | 7.27 |
| 300 | 0.956 | 0.971 | 0.972 | 0.956 | 0.928 | 0.928 | 0.895 | 0.932 | 64.17 | 2.85 | 421.54 | 8.67 |
| 12 | | | | | | | | | | | | |
| 100 | 0.908 | 0.918 | 0.925 | 0.913 | 0.866 | 0.873 | 0.859 | 0.884 | 1.46 | 0.36 | 76.49 | 3.49 |
| 150 | 0.940 | 0.943 | 0.945 | 0.933 | 0.896 | 0.899 | 0.862 | 0.899 | 1.52 | 0.49 | 127.48 | 5.08 |
| 200 | 0.956 | 0.958 | 0.957 | 0.938 | 0.904 | 0.906 | 0.880 | 0.901 | 1.61 | 0.55 | 173.51 | 6.61 |
| 250 | 0.964 | 0.964 | 0.963 | 0.944 | 0.915 | 0.917 | 0.897 | 0.909 | 1.64 | 0.69 | 202.33 | 8.85 |
| 300 | 0.971 | 0.971 | 0.969 | 0.948 | 0.921 | 0.921 | 0.893 | 0.914 | 1.67 | 0.75 | 278.34 | 10.13 |
*: Analysis results for the 30 mothers' X-chromosome genotypes from the HapMap data.
+: HAPLO is short for HAPLOTYPER
Performance of haplotyping programs on data simulated under a coalescent model with population mutation rate θ = 4Neμ = 4.
| 25 | 100 | 0.943 | 0.947 | 0.982 | 0.976 | 0.941 | 0.945 | 0.981 | 0.976 | 1.39 | 0.18 | 122.89 | 2.52 |
| 25 | 150 | 0.955 | 0.960 | 0.988 | 0.986 | 0.952 | 0.957 | 0.988 | 0.986 | 2.56 | 0.23 | 185.86 | 3.09 |
| 26 | 200 | 0.967 | 0.971 | 0.991 | 0.988 | 0.964 | 0.968 | 0.988 | 0.988 | 5.51 | 0.29 | 283.28 | 4.36 |
| 29 | 250 | 0.974 | 0.977 | 0.992 | 0.986 | 0.973 | 0.976 | 0.993 | 0.980 | 12.37 | 0.36 | 429.03 | 5.75 |
+: HAPLO is short for HAPLOTYPER
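To give the population mutation rate θ = 4Neμ a concrete meaning, a standard result for the neutral, infinite-sites coalescent is that a sample of n sequences carries on average E[S] = θ (1 + 1/2 + ... + 1/(n - 1)) segregating sites. The sketch below evaluates that expectation. It assumes θ = 4 refers to the whole simulated region rather than to a single site, and that each diploid individual contributes two sequences; neither assumption is stated in this record, and the function name is hypothetical.

```python
def expected_segregating_sites(theta, n_sequences):
    """E[S] = theta * sum_{i=1}^{n-1} 1/i under the neutral infinite-sites coalescent."""
    return theta * sum(1.0 / i for i in range(1, n_sequences))


# e.g. 100 diploid individuals -> 200 sampled sequences (assumed mapping)
print(expected_segregating_sites(4, 200))
```

Because the harmonic sum grows only logarithmically, the expected number of polymorphic sites in a region increases slowly as the simulated sample size grows.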