Dingge Ying, Pak Chung Sham, David Keith Smith, Lu Zhang, Yu Lung Lau, Wanling Yang.
Abstract
Recent founder mutations may play important roles in complex diseases and Mendelian disorders. Detecting shared haplotypes that are identical by descent (IBD) could facilitate discovery of these mutations. Several programs address this, but they are usually limited to detecting pair-wise shared haplotypes and do not provide a comparison of cases and controls. We present a novel algorithm and software package, HaploShare, which detects extended haplotypes that are shared by multiple individuals and allows comparisons between cases and controls. Testing on simulated and real cases demonstrated significant improvements in detection power and a reduction in the false positive rate by HaploShare relative to other programs.
Year: 2015 PMID: 25956955 PMCID: PMC4432975 DOI: 10.1186/s13059-015-0662-9
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Flow chart of the steps taken by HaploShare.
Figure 2. Detection of simulated founder haplotypes. The dashed curves are the null distributions of the log likelihood ratio derived from controls (step 6, Figure 1) and the solid curves are those of the simulated founder haplotypes. The dashed line perpendicular to the x-axis marks the cutoff that separates 95% from 5% of the area under the curve of the null distribution. (A) Effect of the age of the simulated founder haplotypes on detection sensitivity. The left panel shows the null distribution of the log likelihood ratio from controls (dashed curves) and the distributions for simulated founder haplotypes of 10, 20, and 30 generations, respectively (solid curves). The bar chart on the right shows the adjusted detection power for founder haplotypes of different ages (generations). The simulations were based on 10 individuals sharing an ancestral haplotype among a total of 1,000 individuals. Data on 1,000 individuals were used as controls. (B) Effect of the number of individuals sharing a simulated founder haplotype on detection sensitivity. The three panels on the left show the separation of the log likelihood ratios for the recent ancestral haplotypes from those of the controls when four, seven, and 10 individuals among 1,000 cases share an ancestral haplotype IBD of 10 generations, respectively. They were also evaluated against 1,000 controls. The bar chart on the right shows how detection sensitivity varies with the number of individuals sharing a common ancestral haplotype. (C) Effect of the total pool of cases considered on detection sensitivity. The three panels on the left show results when five individuals share a common recent ancestral haplotype of 10 generations in age and are evaluated in a pool of 500, 800, and 1,000 cases, respectively. In each case, 1,000 individuals were used as controls in the evaluation process. The bar chart on the right shows the detection sensitivity in the three scenarios.
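The cutoff described in the Figure 2 caption (the point that leaves 5% of the null distribution above it) can be illustrated with a minimal sketch. This is not HaploShare's code: the function names `null_cutoff` and `detection_power` are hypothetical, the cutoff is taken as a simple empirical quantile of the control log likelihood ratios, and the input values are made up for illustration.

```python
def null_cutoff(control_llrs, tail_percent=5):
    """Empirical cutoff on control (null) log likelihood ratios.

    Returns the value at the (100 - tail_percent) percentile of the sorted
    control scores, so that fewer than tail_percent percent of controls lie
    strictly above it. Integer arithmetic avoids rounding surprises.
    """
    ordered = sorted(control_llrs)
    idx = min(len(ordered) - 1, (len(ordered) * (100 - tail_percent)) // 100)
    return ordered[idx]


def detection_power(founder_llrs, cutoff):
    """Fraction of simulated founder haplotypes scoring strictly above the cutoff."""
    if not founder_llrs:
        return 0.0
    return sum(1 for score in founder_llrs if score > cutoff) / len(founder_llrs)


# Illustrative values only (not data from the study).
controls = list(range(100))          # stand-in for control LLRs
founders = [90, 96, 100, 120]        # stand-in for simulated founder LLRs
cutoff = null_cutoff(controls)
power = detection_power(founders, cutoff)
```

Under this reading, power is simply the share of simulated founder haplotypes whose score clears the control-derived cutoff, which is what the solid and dashed curves in the figure are compared on.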
Comparison of adjusted power on detecting pair-wise haplotype-sharing IBD between HaploShare, BEAGLE, GERMLINE, PLINK, DASH, and IBD-Groupon
| | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 1.14 | 45.1 | 47.8 | 30.2 | 38.7 | 36.6 | 30.8 | 0.0 | 41.9 | 36.4 | 29.2 | |
| 40 | 1.28 | 51.9 | 54.6 | 35.9 | 46.9 | 41.4 | 40.8 | 0.0 | 44.7 | 40.3 | 39.3 | |
| 30 | 1.53 | 94.8 | 95.1 | 66.6 | 81.4 | 49.2 | 48.8 | 19.1 | 49.2 | 47.7 | 49.2 | |
| 20 | 2.35 | 97.7 | 97.8 | 80.0 | 90.1 | 73.2 | 73.9 | 39.2 | 61.1 | 62.3 | 73.7 | |
| 10 | 4.18 | 100.0 | 100.0 | 94.6 | 96.9 | 97.1 | 96.2 | 65.6 | 70.1 | 72.6 | 96.5 | |
All results shown here are detections of pair-wise sharing that reached the respective threshold of each program; no comparison to controls was involved.
ᵃ For HaploShare, GERMLINE, DASH, and PLINK, it is the minimum genetic distance (or physical distance) of the haplotypes shared pair-wise to be selected for further analysis; for BEAGLE, it is the prior IBD probability; for BEAGLE fastIBD, it is the fastIBD score; and for IBD-Groupon, it is also the fastIBD score.
ᵇ Haplotypes shared pair-wise that surpass a certain genetic distance (0.5 or 1 cM) are further analyzed for sharing by multiple individuals, which by itself increases the power to detect more individuals sharing an extended haplotype (Step 4, Figure 1).
ᶜ Analysis stops at detecting haplotypes shared pair-wise, without merging pair-wise sharing into sharing by multiple individuals (stops at Step 3, Figure 1).
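The idea in the footnotes above, selecting pair-wise shared haplotypes above a length threshold and then merging them into sharing by multiple individuals, can be sketched as follows. This is only an illustration under stated assumptions, not the procedure HaploShare implements: it swaps in a plain union-find grouping at a single map position, and the function name `sharing_groups`, the tuple layout, and the sample data are all hypothetical.

```python
def sharing_groups(segments, position_cm, min_length_cm=0.5):
    """Group individuals connected by pair-wise shared segments at one position.

    segments: iterable of (individual_a, individual_b, start_cm, end_cm).
    Only segments at least min_length_cm long that cover position_cm are used.
    Returns groups as sorted lists, largest group first.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, start, end in segments:
        long_enough = (end - start) >= min_length_cm
        covers = start <= position_cm <= end
        if long_enough and covers:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for individual in list(parent):
        groups.setdefault(find(individual), set()).add(individual)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical pair-wise segments (positions in cM).
pairs = [
    ("A", "B", 0.0, 1.2),
    ("B", "C", 0.3, 1.0),
    ("D", "E", 0.0, 0.2),   # too short for either threshold
    ("F", "G", 2.0, 3.0),   # does not cover the query position
]
loose = sharing_groups(pairs, position_cm=0.5, min_length_cm=0.5)
strict = sharing_groups(pairs, position_cm=0.5, min_length_cm=1.0)
```

In this toy input, the 0.5 cM threshold links A, B, and C through two pair-wise segments, while the 1 cM threshold keeps only the A and B pair, which mirrors the trade-off the footnotes describe between the two thresholds.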
Adjusted power and rank of the simulated founder haplotypes among all the regions found by HaploShare, BEAGLE fastIBD, and DASH
| | | HaploShare | HaploShare | HaploShare | HaploShare | HaploShare | HaploShare | BEAGLE fastIBD | BEAGLE fastIBD | DASH | DASH | DASH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Threshold | | Pair-wise IBD >1.0 cM | Pair-wise IBD >1.0 cM | Pair-wise IBD >1.0 cM | Pair-wise IBD >0.5 cM | Pair-wise IBD >0.5 cM | Pair-wise IBD >0.5 cM | fastIBD score <10⁻¹⁰ | fastIBD score <10⁻¹² | Pair-wise IBD >0.5 cM | Pair-wise IBD >0.5 cM | Pair-wise IBD >0.5 cM |
| Total number of haplotypes foundᵃ | | 5 found | 5 found | 5 found | 32 found | 32 found | 32 found | 139 found | 15 found | 89 found | 89 found | 89 found |
| Number of samples sharing the simulated haplotype | | 2 | 5 | 10 | 2 | 5 | 10 | 2 | 2 | 2 | 5 | 10 |
| Age of simulated founder haplotype (generations) | 10 | 1 (74) | 1 (83) | 1 (99) | 4 (91) | 3 (95) | 3 (99) | 32 (95) | 11 (35) | 50 (65) | 46 (75) | 38 (82) |
| | 20 | 1 (60) | 1 (78) | 1 (86) | 10 (73) | 8 (85) | 6 (90) | 45 (69) | 13 (21) | 60 (53) | 57 (65) | 54 (71) |
| | 30 | 2 (34) | 2 (47) | 1 (60) | 20 (50) | 19 (64) | 15 (72) | 88 (41) | 14 (16) | 68 (41) | 65 (49) | 64 (52) |
| | 40 | 3 (19) | 3 (26) | 2 (39) | 24 (39) | 23 (47) | 21 (53) | 107 (34) | 15 (12) | 75 (35) | 75 (39) | 75 (45) |
| | 50 | 5 (9) | 4 (13) | 3 (23) | 27 (32) | 27 (39) | 27 (41) | 125 (28) | 15 (11) | 76 (30) | 80 (33) | 80 (37) |
Shown are the rank and (detection power %) of the simulated haplotypes.
ᵃ The total number of extended haplotypes found is the average, over 100 simulations, of all shared extended haplotypes reaching significance, which usually (depending on power) includes one simulated founder haplotype shared by two, five, or 10 samples in each simulation. The total sample size in each simulation is 1,000, and haplotypes shared by 20 samples (2%) or fewer were analyzed. When 1 cM was used as the threshold for selecting haplotypes shared pair-wise, there were five haplotypes reaching significance on average, including the simulated founder haplotype. When 0.5 cM was used as the threshold, there were 32 haplotypes reaching significance. Any haplotype found repeatedly in different simulations was counted only once.
Comparison of computing time between HaploShare, BEAGLE, GERMLINE, PLINK, and DASH
| | | | | | | |
|---|---|---|---|---|---|---|
| Phased genotype | 21 h | 115 h | 4 min | 2 min | 5 min | N/A |
| Unphased genotype | 29 h | 116 h | 38 min | N/A | N/A | 3 min |
Figure 3. Illustration of an extended haplotype shared by multiple individuals. In this example, four individuals share an extended haplotype in a region composed of 23 SNPs with six LD haplotype blocks. Two blocks form the core haplotype that is shared by all four individuals. The shared haplotype in each LD block (middle panel) and their frequencies (lower panel) are displayed below each block. For block 5, three samples share both haplotypes. In this case, the haplotype with a higher frequency is used during the evaluation process.
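The two rules stated in the Figure 3 caption (use the higher-frequency haplotype when a block has two shared haplotypes, and treat blocks shared by all individuals as the core) can be sketched with a small data structure. The layout, the names `pick_block_haplotypes` and `core_blocks`, the allele strings, and the frequencies below are hypothetical and only mimic the shape of the example; the sketch also assumes that competing haplotypes in a block are shared by the same samples, as in block 5 of the figure.

```python
def pick_block_haplotypes(blocks):
    """For each LD block, keep the candidate with the highest population frequency.

    blocks: dict mapping block id -> list of (haplotype, frequency, sharers).
    """
    return {block_id: max(candidates, key=lambda c: c[1])
            for block_id, candidates in blocks.items()}


def core_blocks(chosen, all_samples):
    """Block ids whose chosen haplotype is shared by every sample."""
    everyone = set(all_samples)
    return sorted(block_id
                  for block_id, (_, _, sharers) in chosen.items()
                  if set(sharers) == everyone)


# Made-up example in the spirit of Figure 3: four samples, three of the blocks.
samples = {"s1", "s2", "s3", "s4"}
example_blocks = {
    3: [("ACG", 0.12, {"s1", "s2", "s3", "s4"})],
    4: [("TTA", 0.30, {"s1", "s2", "s3", "s4"})],
    5: [("GCA", 0.08, {"s1", "s2", "s3"}),
        ("GTA", 0.41, {"s1", "s2", "s3"})],   # two haplotypes shared by the same samples
}
chosen = pick_block_haplotypes(example_blocks)
core = core_blocks(chosen, samples)
```

Picking the more frequent haplotype is the conservative choice for evaluation, since a common haplotype is more likely to be shared by chance than a rare one.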
Figure 4. Detection of the haplotypes carrying the RET mutation. The RET mutation region spans 7 Mb in physical distance but only 1 cM in genetic distance, and contains 17 LD blocks. All 14 cases share the core haplotype composed of blocks 8 to 11, and the rest of the region is shared by at least two cases at any given point. The frequency of the shared haplotypes in different blocks varies from less than 1% to 60% in the Hong Kong Chinese population.