Ryan Abo, Jathine Wong, Alun Thomas, Nicola J Camp.
Abstract
BACKGROUND: Genomewide association studies have resulted in a great many genomic regions that are likely to harbor disease genes. Thorough interrogation of these specific regions is the logical next step, including regional haplotype studies to identify risk haplotypes upon which the underlying critical variants lie. Pedigrees ascertained for disease can be powerful for genetic analysis due to the cases being enriched for genetic disease. Here we present a Monte Carlo-based method to perform haplotype association analysis. Our method, hapMC, allows for the analysis of full-length and sub-haplotypes, including imputation of missing data, in resources of nuclear families, general pedigrees, case-control data or mixtures thereof. Both traditional association statistics and transmission/disequilibrium statistics can be performed. The method includes a phasing algorithm that can be used in large pedigrees and optional use of pseudocontrols.
Year: 2010 PMID: 21143908 PMCID: PMC3016409 DOI: 10.1186/1471-2105-11-592
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Haplotype phasing accuracy and timing results for one data set.
| Data | nloci | Phasing type | 0% missing: accuracy | 0% missing: time (s) | 5% missing: accuracy | 5% missing: time (s) | 10% missing: accuracy | 10% missing: time (s) | 15% missing: accuracy | 15% missing: time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| CC | 5 | new* | 0.87 | 1.49 | 0.85 | 1.50 | 0.82 | 1.94 | 0.80 | 2.95 |
| CC | 5 | HAPLORE‡ | 0.87 | 1.75 | 0.85 | 2.09 | 0.82 | 1.74 | 0.80 | 2.4 |
| CC | 5 | GCHap† | 0.87 | 1.16 | 0.85 | 1.28 | 0.82 | 1.66 | 0.80 | 1.44 |
| CC | 10 | new | 0.62 | 5.94 | 0.57 | 10.13 | 0.53 | 15.54 | 0.49 | 30.21 |
| CC | 10 | HAPLORE | 0.61 | 20.44 | 0.56 | 32.0 | 0.52 | 42.73 | 0.49 | 56.51 |
| CC | 10 | GCHap | 0.57 | 2.56 | 0.53 | 4.44 | 0.49 | 5.19 | 0.46 | 7.53 |
| CC | 15 | new | 0.36 | 42.74 | 0.33 | 122.48 | 0.28 | 316.55 | 0.26 | 1260.62 |
| CC | 15 | HAPLORE | 0.36 | 90.38 | 0.32 | 147.25 | 0.27 | 167.37 | 0.21 | 302.84 |
| CC | 15 | GCHap | 0.30 | 4.84 | 0.27 | 8.17 | 0.23 | 11.36 | 0.22 | 17.86 |
| TRIO | 5 | new | 0.98 | 1.53 | 0.95 | 1.72 | 0.92 | 2.19 | 0.90 | 2.49 |
| TRIO | 5 | HAPLORE | 0.98 | 1.47 | 0.95 | 1.50 | 0.92 | 1.17 | 0.90 | 1.47 |
| TRIO | 5 | GCHap | 0.88 | 1.15 | 0.85 | 1.38 | 0.82 | 1.54 | 0.80 | 1.60 |
| TRIO | 10 | new | 0.95 | 2.81 | 0.89 | 4.45 | 0.84 | 6.93 | 0.77 | 13.54 |
| TRIO | 10 | HAPLORE | 0.95 | 4.48 | 0.89 | 7.39 | 0.84 | 10.99 | 0.77 | 35.09 |
| TRIO | 10 | GCHap | 0.59 | 3.36 | 0.55 | 6.53 | 0.51 | 7.44 | 0.47 | 10.69 |
| TRIO | 15 | new | 0.92 | 4.52 | 0.81 | 8.15 | 0.73 | 15.13 | 0.65 | 107.28 |
| TRIO | 15 | HAPLORE | 0.90 | 12.50 | 0.80 | 56.25 | - | - | - | - |
| TRIO | 15 | GCHap | 0.36 | 7.63 | 0.31 | 11.45 | 0.27 | 16.55 | 0.24 | 29.08 |
| ASP | 5 | new | 0.99 | 1.05 | 0.98 | 1.61 | 0.96 | 1.59 | 0.95 | 2.00 |
| ASP | 5 | HAPLORE | 0.99 | 0.61 | 0.98 | 0.74 | 0.96 | 0.60 | 0.95 | 0.67 |
| ASP | 5 | GCHap | 0.89 | 0.98 | 0.86 | 1.35 | 0.84 | 1.40 | 0.81 | 1.43 |
| ASP | 10 | new | 0.97 | 2.22 | 0.95 | 2.47 | 0.92 | 3.30 | 0.89 | 3.53 |
| ASP | 10 | HAPLORE | 0.97 | 2.06 | 0.95 | 2.53 | 0.92 | 3.82 | 0.89 | 4.34 |
| ASP | 10 | GCHap | 0.60 | 2.49 | 0.56 | 3.74 | 0.53 | 5.17 | 0.48 | 6.34 |
| ASP | 15 | new | 0.93 | 2.99 | 0.91 | 3.64 | 0.85 | 4.5 | 0.80 | 7.90 |
| ASP | 15 | HAPLORE | 0.91 | 3.61 | 0.89 | 32.64 | - | - | - | - |
| ASP | 15 | GCHap | 0.37 | 5.31 | 0.31 | 7.55 | 0.28 | 9.66 | 0.24 | 15.59 |
| LP1 | 5 | new | 0.99 | 2.04 | 0.98 | 1.88 | 0.98 | 1.96 | 0.97 | 2.02 |
| LP1 | 5 | HAPLORE | - | - | - | - | - | - | - | - |
| LP1 | 5 | GCHap | 0.87 | 1.60 | 0.86 | 1.64 | 0.85 | 1.69 | 0.85 | 1.82 |
| LP1 | 10 | new | 0.98 | 3.45 | 0.97 | 3.80 | 0.96 | 3.94 | 0.95 | 5.27 |
| LP1 | 10 | HAPLORE | - | - | - | - | - | - | - | - |
| LP1 | 10 | GCHap | 0.63 | 4.90 | 0.61 | 6.46 | 0.59 | 7.00 | 0.59 | 6.54 |
| LP1 | 15 | new | 0.96 | 6.76 | 0.95 | 8.23 | 0.93 | 10.85 | 0.92 | 54.92 |
| LP1 | 15 | HAPLORE | - | - | - | - | - | - | - | - |
| LP1 | 15 | GCHap | 0.45 | 8.17 | 0.42 | 9.29 | 0.40 | 10.72 | 0.39 | 15.35 |
*new pedigree-informed phasing algorithm
‡HAPLORE (pedigree-informed)
† GCHap (pedigree naïve)
CC (Case Control): 500 cases, 500 controls = 1000 individuals (1000 genotyped)
TRIO (both parents and one offspring): 500 trios = 1500 total individuals (1500 genotyped)
ASP (Affected Sib Pairs and parents): 250 ASPs = 1000 total individuals (1000 genotyped)
LP1 (Large Pedigree) = 5 generational pedigree ~5800 total individuals (~1500 genotyped)
- program failed.
Type I error rates and power† for all data sets and statistics.
| Null model or freq risk hap | GRR | CC* EC | TRIO# EC | TRIO# PC | TRIO# TDT | ASP* EC | ASP* PC | ASP* TDT | LP1# EC | LP1# PC | LP2* EC | LP2* PC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pedigree-informed null | | NA | 0.046 | 0.044 | 0.045 | 0.052 | 0.051 | 0.051 | 0.049 | 0.047 | 0.054 | 0.054 |
| pedigree-naïve null | | 0.058 | 0.056 | 0.048 | 0.055 | 0.061 | 0.061 | 0.058 | 0.062 | 0.047 | 0.054 | 0.056 |
| 0.17 | 2.0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 0.17 | 1.5 | 0.910 | 0.840 | 0.846 | 0.846 | 0.904 | 0.914 | 0.916 | 0.970 | 0.974 | 0.900 | 0.892 |
| 0.17 | 1.35 | 0.672 | 0.620 | 0.634 | 0.624 | 0.640 | 0.650 | 0.660 | 0.786 | 0.778 | 0.632 | 0.616 |
| 0.17 | 1.2 | 0.280 | 0.276 | 0.290 | 0.282 | 0.256 | 0.264 | 0.264 | 0.358 | 0.364 | 0.254 | 0.268 |
| 0.10 | 2.0 | 0.994 | 0.994 | 0.994 | 0.994 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 0.10 | 1.5 | 0.754 | 0.712 | 0.722 | 0.726 | 0.728 | 0.732 | 0.738 | 0.896 | 0.900 | 0.762 | 0.742 |
| 0.10 | 1.35 | 0.456 | 0.422 | 0.434 | 0.428 | 0.378 | 0.406 | 0.408 | 0.604 | 0.572 | 0.448 | 0.428 |
| 0.10 | 1.2 | 0.210 | 0.162 | 0.166 | 0.170 | 0.218 | 0.238 | 0.242 | 0.256 | 0.262 | 0.186 | 0.166 |
| 0.07 | 2.0 | 0.990 | 0.966 | 0.968 | 0.966 | 0.994 | 0.994 | 0.992 | 1.000 | 1.000 | 0.998 | 0.996 |
| 0.07 | 1.5 | 0.642 | 0.574 | 0.586 | 0.596 | 0.620 | 0.646 | 0.642 | 0.820 | 0.822 | 0.688 | 0.666 |
| 0.07 | 1.35 | 0.399 | 0.300 | 0.312 | 0.312 | 0.368 | 0.400 | 0.396 | 0.554 | 0.538 | 0.368 | 0.360 |
| 0.07 | 1.2 | 0.168 | 0.146 | 0.161 | 0.166 | 0.143 | 0.151 | 0.154 | 0.252 | 0.252 | 0.146 | 0.140 |
| 0.04 | 2.0 | 0.855 | 0.777 | 0.798 | 0.794 | 0.867 | 0.886 | 0.880 | 0.992 | 0.986 | 0.928 | 0.920 |
| 0.04 | 1.5 | 0.363 | 0.323 | 0.348 | 0.338 | 0.343 | 0.378 | 0.376 | 0.624 | 0.610 | 0.474 | 0.464 |
| 0.04 | 1.35 | 0.245 | 0.158 | 0.184 | 0.174 | 0.196 | 0.236 | 0.230 | 0.304 | 0.317 | 0.241 | 0.211 |
| 0.04 | 1.2 | 0.096 | 0.139 | 0.152 | 0.141 | 0.087 | 0.114 | 0.104 | 0.152 | 0.128 | 0.118 | 0.112 |
P-values between (0.036, 0.0635) for 1000 replicates are consistent with a valid 0.05 type 1 error rate.
EC = Explicit controls, PC = pseudocontrols, TDT = transmission disequilibrium test
* = 500 cases, 500 controls
# = 500 cases, 1,000 controls
† Power is shown for the hapMC with the pedigree-informed MLE estimation
NA = pedigree informed phasing not applicable to case-control.
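The acceptance range quoted in the footnotes for 1000 replicates looks consistent with a normal-approximation 95% interval around the nominal 0.05 rate. The source does not say how the bounds were derived, so the sketch below is an assumption about their origin rather than the authors' calculation; the function name `null_rate_interval` and the `z = 1.96` critical value are illustrative choices.

```python
import math


def null_rate_interval(alpha=0.05, replicates=1000, z=1.96):
    """Normal-approximation interval for an empirical rejection rate.

    Assumes each null replicate rejects independently with probability
    `alpha`, so the observed rate has standard error
    sqrt(alpha * (1 - alpha) / replicates).
    """
    se = math.sqrt(alpha * (1.0 - alpha) / replicates)
    return alpha - z * se, alpha + z * se


low, high = null_rate_interval()
print(f"{low:.4f} to {high:.4f}")
```

Under these assumptions the bounds come out at roughly 0.0365 and 0.0635, close to the (0.036, 0.0635) range stated in the footnote.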
Type I error rates and power† for mixed resource study designs.
| Null model or freq risk hap | GRR | CC EC | TRIO PC | ASP PC | LP1 EC | LP2 EC | TRIOCC | ASPCC | LP1CC | LP2CC |
|---|---|---|---|---|---|---|---|---|---|---|
| Total sample size | | 1,000* | 1,500† | 1,000 | 1,500 | 1,000 | 2,500 | 2,000 | 2,500 | 2,000 |
| pedigree-informed null | | NA | 0.044 | 0.051 | 0.049 | 0.054 | 0.051 | 0.039 | 0.042 | 0.044 |
| pedigree-naive null | | 0.058 | 0.048 | 0.061 | 0.062 | 0.054 | 0.053 | 0.046 | 0.048 | 0.060 |
| 0.17 | 2.0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 0.17 | 1.5 | 0.910 | 0.846 | 0.914 | 0.970 | 0.900 | 0.996 | 0.998 | 0.996 | 0.998 |
| 0.17 | 1.35 | 0.672 | 0.634 | 0.650 | 0.786 | 0.632 | 0.904 | 0.912 | 0.934 | 0.918 |
| 0.17 | 1.2 | 0.280 | 0.290 | 0.264 | 0.358 | 0.254 | 0.516 | 0.518 | 0.566 | 0.500 |
| 0.10 | 2.0 | 0.994 | 0.994 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 0.10 | 1.5 | 0.754 | 0.722 | 0.732 | 0.896 | 0.762 | 0.950 | 0.960 | 0.984 | 0.978 |
| 0.10 | 1.35 | 0.456 | 0.434 | 0.406 | 0.604 | 0.448 | 0.744 | 0.730 | 0.804 | 0.788 |
| 0.10 | 1.2 | 0.210 | 0.166 | 0.238 | 0.256 | 0.186 | 0.324 | 0.390 | 0.372 | 0.358 |
| 0.07 | 2.0 | 0.990 | 0.968 | 0.994 | 1.000 | 0.998 | 1.000 | 1.000 | 1.000 | 1.000 |
| 0.07 | 1.5 | 0.642 | 0.586 | 0.646 | 0.820 | 0.688 | 0.866 | 0.904 | 0.952 | 0.944 |
| 0.07 | 1.35 | 0.399 | 0.312 | 0.400 | 0.554 | 0.368 | 0.598 | 0.674 | 0.752 | 0.678 |
| 0.07 | 1.2 | 0.168 | 0.161 | 0.151 | 0.252 | 0.146 | 0.250 | 0.264 | 0.336 | 0.294 |
| 0.04 | 2.0 | 0.855 | 0.798 | 0.886 | 0.992 | 0.928 | 0.981 | 0.990 | 0.998 | 0.998 |
| 0.04 | 1.5 | 0.363 | 0.348 | 0.378 | 0.624 | 0.474 | 0.618 | 0.616 | 0.746 | 0.726 |
| 0.04 | 1.35 | 0.245 | 0.184 | 0.236 | 0.304 | 0.241 | 0.340 | 0.411 | 0.430 | 0.429 |
| 0.04 | 1.2 | 0.096 | 0.152 | 0.114 | 0.152 | 0.118 | 0.240 | 0.167 | 0.200 | 0.176 |
Results are from the most powerful statistic for each single-resource analysis and for the mixed-resource analyses. All controls within LP1 and LP2 are familial controls.
* = 500 cases, 500 controls
† = 500 cases, 1,000 controls
† Power is shown for the hapMC with the pedigree-informed MLE estimation
NA = pedigree informed phasing not applicable to case-control null.
Example of preprocessing step 1, loading genotype data into the six n-locus bit variables (n = 5).
| | M1 | M2 | M3 | M4 | M5 |
|---|---|---|---|---|---|
| Genotype* | 12 | 00 | 11 | 12 | 22 |
| Homozygous† | 0 | 0 | 1 | 0 | 1 |
| Heterozygous§ | 1 | 0 | 0 | 1 | 0 |
| Unphased** | 1 | 0 | 0 | 1 | 0 |
| Set‡ | 0 | 0 | 1 | 0 | 1 |
| Missing†† | 0 | 1 | 0 | 0 | 0 |
| Value§§ | 0 | 0 | 0 | 0 | 1 |
| Set | 0 | 0 | 1 | 0 | 1 |
| Missing | 0 | 1 | 0 | 0 | 0 |
| Value | 0 | 0 | 0 | 0 | 1 |
*11 indicates a homozygous genotype for the major allele; 12 a heterozygous genotype, and 22 a homozygous genotype for the minor allele; 00 indicates missing genotype.
†Homozygous 0 indicates the positions that are heterozygous or missing and 1 indicates the homozygous positions.
§Heterozygous 0 indicates the positions that are homozygous or missing and 1 indicates the heterozygous positions.
**Unphased 0 indicates the positions that are phased or missing and 1 indicates a heterozygous position without known phase.
‡Set 0 indicates the positions that have not been assigned an allele (i.e. unphased or missing) and 1 indicates positions where the allele value and phase are known.
††Missing 0 indicates the positions that have an observed allele value and 1 indicates positions that are missing.
§§Value 0 indicates the positions that have the major allele or unknown and 1 indicates positions that have the minor allele.
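The loading step described by the table and its footnotes can be sketched as follows. This is a minimal illustration, not hapMC's implementation: the function name `load_bit_variables`, the use of Python integers as bit fields, and the convention that the first marker occupies the most significant bit are all assumptions made here. The table lists Set, Missing and Value twice with identical contents; the sketch returns a single copy of each.

```python
def load_bit_variables(genotypes):
    """Encode genotype strings ("11", "12", "22", "00") as bit variables.

    Hypothetical helper: bit layout and names are illustrative only.
    """
    hom = het = missing = value = 0
    for g in genotypes:
        # Shift left so the first marker ends up in the most significant bit.
        hom, het, missing, value = hom << 1, het << 1, missing << 1, value << 1
        if g == "00":
            missing |= 1            # no observed genotype
        elif g[0] != g[1]:
            het |= 1                # heterozygous, phase not yet known
        else:
            hom |= 1                # homozygous, so the allele is known
            if g == "22":
                value |= 1          # minor allele
    return {
        "homozygous": hom,
        "heterozygous": het,
        "unphased": het,            # at load time nothing has been phased
        "set": hom,                 # only homozygous loci are assigned
        "missing": missing,
        "value": value,
    }


def as_bits(x, n):
    """Render a bit variable as an n-character string, first marker first."""
    return format(x, f"0{n}b")


variables = load_bit_variables(["12", "00", "11", "12", "22"])
for name, bits in variables.items():
    print(name, as_bits(bits, 5))
```

Read left to right, each printed string corresponds to the M1 to M5 columns of the matching table row.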
Inheritance and transmission rules.
| Rule | Description |
|---|---|
| Inheritance | Indicates which haplotype is received by the offspring (characteristic of offspring haplotypes). |
| | The parental source of an offspring haplotype can be established using exclusion. That is, once an offspring haplotype is known not to be from one parent, it must be from the other parent. |
| | Exclusion can be determined if a haplotype has an allele not found within a parent's genotypes or the haplotype does not match either of a parent's set haplotypes. |
| Transmission | Indicates which haplotype is transmitted by the parent (characteristic of parental haplotypes). |
| | Which haplotype is transmitted from a parent to an offspring can be established using exclusion. That is, once a parental haplotype is excluded as being either haplotype in an offspring, then the alternate parental haplotype must be the transmitted one. |
| | A conditional exclusion can be determined by examining the situation where one parental haplotype was transmitted to the offspring and checking whether the complementary haplotype from the offspring's genotypes could be inherited from the other parent. |
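The inheritance rule's exclusion test can be illustrated with a small sketch. It covers only the first criterion in the table (an allele absent from a parent's genotypes) and ignores the comparison against a parent's set haplotypes. The function names, the string encoding of alleles ("1", "2", with "0" for unknown) and of genotypes ("11", "12", "22", "00") are assumptions for illustration, not hapMC's data structures.

```python
def compatible(haplotype, parent_genotypes):
    """Return False if some allele of the haplotype is absent from the
    parent's genotype at that locus; unknown data never excludes."""
    for allele, genotype in zip(haplotype, parent_genotypes):
        if allele == "0" or genotype == "00":
            continue                      # missing data is uninformative
        if allele not in genotype:
            return False                  # parent cannot carry this allele
    return True


def parental_source(haplotype, father_genotypes, mother_genotypes):
    """Assign the haplotype to a parent by exclusion, if possible.

    Returns "father", "mother", or None when neither parent or both
    parents remain possible.
    """
    from_father = compatible(haplotype, father_genotypes)
    from_mother = compatible(haplotype, mother_genotypes)
    if from_father and not from_mother:
        return "father"
    if from_mother and not from_father:
        return "mother"
    return None


# The mother is homozygous "11" at the second locus, so she cannot have
# passed on a haplotype carrying allele "2" there.
print(parental_source("121", ["11", "22", "11"], ["11", "11", "12"]))
```

In this toy case the mother is excluded at the second locus, so the haplotype is attributed to the father.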
Figure 1. Example of a parent-to-offspring homozygous update using bit variables. The parent's homozygous variable (hom_parent) is used to update variables in the offspring. In this example, offspring haplotype 1 has been chosen for update. Variables listed on the left in the trio drawing are the current states for the offspring and parent. Variables listed in the panel on the right and indicated by (new) are the updated states. Panel A. A logical "AND" operation determines which loci are homozygous in the parent and unphased in the offspring. The result (a.) indicates positions (value = 1) where updates can be made to the set_offspring, value_offspring and unphased_offspring variables. In this example, 3 positions can be updated (the 5th, 8th and 9th). Panel B. Similar to panel A, but for the missing_offspring variable. In this example, no positions can be updated for this variable (all positions in b. = 0). Panels C-E. The logical operations "OR", exclusive OR ("XOR") and "AND" are used to determine the new updated versions of the variables set_offspring, unphased_offspring and value_offspring.
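One plausible reading of the bitwise update in the Figure 1 caption is sketched below. The figure itself is not available here, so the bit patterns are invented (chosen only so that the 5th, 8th and 9th positions are updatable, with no missing positions), and the exact way the "AND" result is merged into the offspring's value variable is an assumption. The function name and argument names are likewise hypothetical.

```python
def homozygous_update(hom_parent, value_parent,
                      set_off, unphased_off, value_off, missing_off):
    """Propagate a parent's homozygous loci to one offspring haplotype."""
    a = hom_parent & unphased_off    # Panel A: updatable unphased loci
    b = hom_parent & missing_off     # Panel B: updatable missing loci
    new_set = set_off | a            # OR: mark those loci as assigned
    new_unphased = unphased_off ^ a  # XOR: clear them from the unphased set
    # AND picks the parent's alleles at updatable loci; merging them with
    # the existing offspring values via OR is an assumption of this sketch.
    new_value = value_off | (a & value_parent)
    return {"a": a, "b": b, "set": new_set,
            "unphased": new_unphased, "value": new_value}


def bits(text):
    """Parse a bit string written with the first locus on the left."""
    return int(text, 2)


result = homozygous_update(
    hom_parent=bits("1001100111"),
    value_parent=bits("0001000101"),
    set_off=bits("1011010001"),
    unphased_off=bits("0100101110"),
    value_off=bits("0010000001"),
    missing_off=bits("0000000000"),
)
for name, x in result.items():
    print(name, format(x, "010b"))
```

Because each variable is a single integer, all loci are updated in a handful of machine operations rather than a per-locus loop, which is presumably the motivation for the bit-variable representation.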
Figure 2. Simulated pedigree structures. A. Case-offspring trios (TRIO). B. Affected sib-pairs with parents (ASP). C. Five-generation large pedigrees (LP). Black filled shapes are affected individuals (cases), white filled shapes are unaffected (controls) and grey filled shapes are unknown.