Hae-Won Uh, Jeanine J Houwing-Duistermaat, Hein Putter, Hans C van Houwelingen.
Abstract
BACKGROUND: In haplotype-based candidate gene studies, the genotype data are unphased, which results in haplotype ambiguity. The $R_h^2$ measure [1] quantifies haplotype predictability from genotype data. It is computed for each individual haplotype, and a minimum $R_h^2$ value is suggested as a measure of global relative efficiency. Alternatively, we developed methods based directly on the information content of haplotype frequency estimates to obtain global relative efficiency measures: $R_A^2$ and $R_D^2$, based on A- and D-optimality, respectively. All three methods are designed for single populations; they can be applied to cases only, controls only, or the whole data set. Therefore they are not necessarily optimal for haplotype testing in case-control studies.
Year: 2009 PMID: 19751505 PMCID: PMC2760579 DOI: 10.1186/1471-2156-10-54
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Table 1. Haplotype frequency estimates of hipROA data
| 1 | 111 | 0.36 | 0.25 | 0.37 |
| 2 | 112 | 0.08 | 0.15 | 0.07 |
| 3 | 121 | 0.16 | 0.27 | 0.15 |
| 4 | 122 | 0.16 | 0.18 | 0.16 |
| 5 | 211 | 0.20 | 0.12 | 0.21 |
| 6 | 212 | 0.02 | 0.03 | 0.02 |
| 7 | 221 | 0.02 | 0.01 | 0.03 |
Table 2. Global relative efficiency.
| control | 653 | 212 | 32.5 | 59.3 | 86.4 | 89.8 | 82.3 | 92.6 |
| case | 61 | 22 | 36.1 | 77.9 | 81.4 | 78.5 | ||
| control | 500 | 174 | 34.8 | 63.7 | 85.4 | 79.8 | 83.2 | 93.3 |
| case | 500 | 181 | 36.2 | 53.9 | 88.6 | 77.3 | ||
For each group, the min($R_h^2$), $R_A^2$ and $R_D^2$ values are given; for the case-control study, a global value was computed in terms of the power of the global statistic T in (8). The subscript 2,3 indicates the relative efficiency for haplotypes 112 and 121. ¹ Results from one simulated sample.
Table 3. Selection strategy for the subset based on information without taking into account correlations between haplotype frequency estimates.
| Cases | 1HH | 10 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 1.00 | |
| HHH | 7 | 0 | 0.03 | 0.18 | 0.19 | 0.19 | 0.18 | 0.03 | 0.79 | ||
| H1H | 2 | 0.19 | 0.19 | 0 | 0 | 0.19 | 0.19 | 0 | 0.77 | ||
| HH1 | 3 | 0.04 | 0 | 0.04 | 0 | 0.04 | 0 | 0.04 | 0.16 | ||
| no ambiguity | 39 | ||||||||||
| loss per haplotype | 3.00 | 3.07 | 3.85 | 3.83 | 1.85 | 1.62 | 0.31 | 17.52 | |||
| Controls | H1H | 28 | 0.21 | 0.21 | 0 | 0 | 0.21 | 0.21 | 0 | 0.83 | |
| HH1 | 46 | 0.18 | 0 | 0.18 | 0 | 0.18 | 0 | 0.18 | 0.72 | ||
| 1HH | 91 | 0.12 | 0.12 | 0.12 | 0.12 | 0 | 0 | 0 | 0.49 | ||
| HHH | 47 | 0 | 0.04 | 0.06 | 0.10 | 0.10 | 0.06 | 0.04 | 0.40 | ||
| no ambiguity | 441 | ||||||||||
| loss per haplotype | 25.29 | 19.09 | 22.23 | 15.76 | 18.59 | 8.52 | 10.34 | 119.81 | |||
| Cases | 1HH | 83 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 1.00 | |
| HHH | 40 | 0 | 0.03 | 0.11 | 0.13 | 0.13 | 0.11 | 0.03 | 0.55 | ||
| H1H | 26 | 0.11 | 0.11 | 0 | 0 | 0.11 | 0.11 | 0 | 0.44 | ||
| HH1 | 32 | 0.04 | 0 | 0.04 | 0 | 0.04 | 0 | 0.04 | 0.15 | ||
| no ambiguity | 319 | ||||||||||
| loss per haplotype | 24.80 | 24.94 | 26.26 | 26.07 | 9.36 | 7.17 | 2.52 | 121.12 | |||
| Controls | H1H | 25 | 0.23 | 0.23 | 0 | 0 | 0.23 | 0.23 | 0 | 0.93 | |
| HH1 | 36 | 0.21 | 0 | 0.21 | 0 | 0.21 | 0 | 0.21 | 0.83 | ||
| HHH | 43 | 0 | 0.05 | 0.06 | 0.11 | 0.11 | 0.06 | 0.05 | 0.44 | ||
| 1HH | 70 | 0.11 | 0.11 | 0.11 | 0.11 | 0 | 0 | 0 | 0.42 | ||
| no ambiguity | 326 | ||||||||||
| loss per haplotype | 20.68 | 15.23 | 17.65 | 11.89 | 17.86 | 8.61 | 9.55 | 101.47 | |||
The group identifiers denote the genotype at the SNPs, where 1 and 2 stand for the homozygotes 1/1 and 2/2, and H denotes a heterozygote. The order of the groups is determined by the sum of the diagonal elements of the loss matrix ℒ in (3), shown in the column "loss per genotype". Individuals with higher loss will yield a higher information gain if their ambiguity can be resolved. The values in the last row, "loss per haplotype", show the information loss per haplotype. The simulated data set is the same sample data set as in Table 2.
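The bookkeeping described in this note can be retraced with a short sketch. It uses only the rounded values printed for the real-data cases in the first block of the table, and it assumes that the seven value columns correspond to the haplotypes 111, 112, 121, 122, 211, 212, 221 in that order. The names `HAPLOTYPES`, `CASE_GROUPS`, `loss_per_genotype`, `loss_per_haplotype` and `selection_order` are illustrative and do not come from the paper. The sketch is a reading of the table, not the authors' implementation of the loss matrix.

```python
from typing import Dict, List, Tuple

# Assumed column order of the seven value columns (hypothetical labelling).
HAPLOTYPES: List[str] = ["111", "112", "121", "122", "211", "212", "221"]

# Real-data cases: group -> (number of individuals, per-haplotype loss),
# copied from the rounded table entries.
CASE_GROUPS: Dict[str, Tuple[int, List[float]]] = {
    "1HH": (10, [0.25, 0.25, 0.25, 0.25, 0.00, 0.00, 0.00]),
    "HHH": (7, [0.00, 0.03, 0.18, 0.19, 0.19, 0.18, 0.03]),
    "H1H": (2, [0.19, 0.19, 0.00, 0.00, 0.19, 0.19, 0.00]),
    "HH1": (3, [0.04, 0.00, 0.04, 0.00, 0.04, 0.00, 0.04]),
}


def loss_per_genotype(groups: Dict[str, Tuple[int, List[float]]]) -> Dict[str, float]:
    """Sum the per-haplotype losses of one individual in each genotype group."""
    return {name: sum(losses) for name, (_, losses) in groups.items()}


def loss_per_haplotype(groups: Dict[str, Tuple[int, List[float]]]) -> List[float]:
    """Sum each haplotype's loss over all individuals in all groups."""
    totals = [0.0] * len(HAPLOTYPES)
    for count, losses in groups.values():
        for j, loss in enumerate(losses):
            totals[j] += count * loss
    return totals


def selection_order(groups: Dict[str, Tuple[int, List[float]]]) -> List[str]:
    """Order groups by decreasing loss per genotype (correlations ignored)."""
    per_genotype = loss_per_genotype(groups)
    return sorted(per_genotype, key=per_genotype.get, reverse=True)


if __name__ == "__main__":
    print(selection_order(CASE_GROUPS))
    for hap, total in zip(HAPLOTYPES, loss_per_haplotype(CASE_GROUPS)):
        print(hap, round(total, 2))
```

Because the inputs are rounded to two decimals, the recomputed sums can differ from the printed table entries in the second decimal place.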
Figure 1. Forward stepwise selection of informative individuals and the corresponding increase in global relative efficiency. To gain information efficiently, forward stepwise selection of the most informative individuals is employed to maximize the power of the global test T, for the real hipROA data (upper panels: n(case) = 61 and n(control) = 653) and comparable simulated data (lower panels: n(case) = n(control) = 500). (i) Left panels: the points represent the forward stepwise selection. The groups in the y-labels are ordered as in Table 3; for the real data, the upper part (1HH, HHH, H1H, HH1) gives the selection order for cases and the lower part (H1H, HH1, 1HH, HHH) the selection order for controls. Consequently, the jumps between groups, and between cases and controls, are caused by using different methods. (ii) Right panels: the increase in global relative efficiency obtained by resolving phase uncertainty.