Dan He.
Abstract
MOTIVATION: Detecting identity-by-descent (IBD) tracts is an important problem in genetics. Most existing methods detect pairwise IBD tracts and have relatively low power to detect short ones. Methods that detect IBD tracts among multiple individuals simultaneously (group-wise IBD tracts) perform better at detecting short IBD tracts. Group-wise IBD tracts are useful in a wide range of applications, such as disease mapping and pedigree reconstruction. The existing group-wise IBD tract detection method is computationally inefficient and can only handle small datasets, such as 20 or 30 individuals with hundreds of SNPs. It also requires the number of IBD groups (partitions of the individuals in which all individuals in the same partition are IBD with each other) to be specified in advance, which may not be realistic in many cases. Because of scalability issues, the method can only handle a small number of IBD groups, such as two or three. Moreover, it does not take linkage disequilibrium (LD) into consideration.
Year: 2013 PMID: 23812980 PMCID: PMC3694672 DOI: 10.1093/bioinformatics/btt237
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Flow chart of IBD-Groupon. (a) Identify the maximum IBD chunks from pairwise IBD relationships. (b) Running example of IBD-Groupon for the first maximum IBD chunk, where haplotypes 1, 2 and 3 are IBD, haplotypes 2, 3 and 4 are IBD and haplotypes 5 and 6 are IBD. (b.a) The IBD graph. (b.b) The bipartite graph, where node A corresponds to the maximal clique [1,2,3], node B corresponds to the maximal clique [2,3,4] and node C corresponds to the maximal clique [5,6]. (b.c) The state graph. (c) The full HMM built from the pairwise IBD relationships. As there are six chunks, there are six states. Only the first state has two values. Each state emits the pairwise IBD relationships for the corresponding chunk.
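Steps (a) and (b.a) of the flow chart can be sketched as follows. This is a minimal illustration under stated assumptions, not IBD-Groupon's implementation: pairwise IBD tracts are assumed to arrive as hypothetical `(hap_a, hap_b, start, end)` tuples over half-open SNP intervals, a chunk is taken to be a maximal SNP interval on which the set of pairwise IBD relationships does not change, and maximal cliques are enumerated with the Bron–Kerbosch algorithm with pivoting, which the caption does not name. The function names `split_into_chunks` and `maximal_cliques` are hypothetical.

```python
def split_into_chunks(segments, n_snps):
    """Split [0, n_snps) into maximal intervals with a constant set of IBD pairs.

    segments: hypothetical list of (hap_a, hap_b, start, end), half-open [start, end).
    Returns a list of (start, end, frozenset of (hap_lo, hap_hi) pairs).
    """
    cuts = sorted({0, n_snps} | {p for _, _, s, e in segments for p in (s, e)})
    chunks = []
    for lo, hi in zip(cuts, cuts[1:]):
        # lo and hi are adjacent breakpoints, so a tract either covers [lo, hi) or misses it.
        active = frozenset((min(a, b), max(a, b))
                           for a, b, s, e in segments if s <= lo and hi <= e)
        if chunks and chunks[-1][2] == active:
            chunks[-1] = (chunks[-1][0], hi, active)  # merge neighbours with identical pairs
        else:
            chunks.append((lo, hi, active))
    return chunks


def maximal_cliques(pairs):
    """Enumerate maximal cliques of the IBD graph (Bron–Kerbosch with pivoting)."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    if not adj:
        return []
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended: it is maximal
            return
        # Only branch on vertices that are not neighbours of the pivot.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


if __name__ == "__main__":
    # Running example from Fig. 1(b): {1,2,3}, {2,3,4} and {5,6} are IBD.
    pairs = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (5, 6)]
    print(maximal_cliques(pairs))  # [[1, 2, 3], [2, 3, 4], [5, 6]]
```

On the caption's example the three cliques correspond to nodes A, B and C of the bipartite graph in panel (b.b).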
Performance of MCMC versus IBD-Groupon on simulated data
| Method | True positive | Average true positive length | No. of false positive | Average false positive length |
|---|---|---|---|---|
| MCMC | 0.9 | 33.6 | 16 | 16.9 |
| IBD-Groupon (t = 10E-8) | 0 | 0 | 0 | 0 |
| IBD-Groupon (t = 10E-7) | 0.3 | 27.8 | 0 | 0 |
| IBD-Groupon (t = 10E-6) | 0.55 | 27 | 0 | 0 |
| IBD-Groupon (t = 10E-5) | 0.6 | 27 | 0 | 0 |
| IBD-Groupon (t = 10E-4) | 0.75 | 26 | 1 | 13 |
| IBD-Groupon (t = 10E-3) | 0.9 | 22 | 8 | 14 |
Performance of MCMC on real data
| IBD group size | No. of reported IBDs | Average no. of loci per reported IBD |
|---|---|---|
| 3 | 144 | 123 |
| 4 | 237 | 75 |
| 5 | 246 | 52 |
Fig. 2. (a) Running time (s) of IBD-Groupon for different numbers of values per state, using 50 individuals. (b) Running time (s) of IBD-Groupon for different numbers of individuals, using the top-100 values.
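Fig. 2(a) relates running time to the number of values kept for each state. The sketch below illustrates why that number matters, assuming plain Viterbi decoding over the chunk-level HMM with each state's candidate values pruned to the k with the highest emission log-probability. The input format, the `transition` callback and the pruning rule are hypothetical, and IBD-Groupon's own scoring may differ. Under these assumptions, decoding n chunks costs O(n·k²).

```python
import heapq
import math


def viterbi_top_k(candidates, transition, k):
    """Best-scoring sequence of values over chunks, keeping the top-k values per state.

    candidates: one non-empty list per chunk (state) of (value, emission_log_prob) pairs.
    transition: hypothetical callback (prev_value, value) -> log transition probability.
    Returns (best_log_score, decoded_values).
    """
    # Prune each state to its k values with the highest emission log-probability.
    pruned = [heapq.nlargest(k, chunk, key=lambda c: c[1]) for chunk in candidates]
    scores = [emit for _, emit in pruned[0]]
    backptrs = []
    for i in range(1, len(pruned)):
        prev_vals = [v for v, _ in pruned[i - 1]]
        new_scores, ptrs = [], []
        for value, emit in pruned[i]:
            # k predecessors for each of k values: O(k^2) work per chunk.
            best_j = max(range(len(prev_vals)),
                         key=lambda j: scores[j] + transition(prev_vals[j], value))
            new_scores.append(scores[best_j] + transition(prev_vals[best_j], value) + emit)
            ptrs.append(best_j)
        scores = new_scores
        backptrs.append(ptrs)
    # Trace back from the best value in the last chunk.
    best = max(range(len(scores)), key=scores.__getitem__)
    path = [best]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return scores[best], [pruned[i][j][0] for i, j in enumerate(path)]


def stay_or_switch(prev, value):
    """Toy transition model: staying is free, switching costs log(0.1)."""
    return 0.0 if prev == value else math.log(0.1)


if __name__ == "__main__":
    log = math.log
    cands = [[("A", log(0.6)), ("B", log(0.4))], [("A", log(0.4)), ("C", log(0.6))]]
    print(viterbi_top_k(cands, stay_or_switch, k=2)[1])  # ['A', 'A']
    print(viterbi_top_k(cands, stay_or_switch, k=1)[1])  # ['A', 'C']
```

In the toy example, keeping only one value per state forces the switch to C, while keeping two lets the decoder stay on A: a larger k costs more time but can recover better paths.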
Performance of IBD-Groupon on real data for different numbers of individuals
| No. of individuals | Precision (%) | Recall (2) (%) | Recall (3) (%) | Recall (4) (%) |
|---|---|---|---|---|
| 10 | 95 | 100 | 100 | NA |
| 30 | 70 | 100 | 80 | 100 |
| 50 | 65 | 100 | 83 | 100 |
| 70 | 51 | 100 | 83 | 100 |
| 90 | 43 | 100 | 75 | 75 |
Note: The top-100 values are saved for each state. The precision is for all IBD group sizes. The recall is for different IBD group sizes. Beagle threshold = 10E-4.
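The notes distinguish precision, computed over all IBD group sizes, from recall, computed separately for each group size. A minimal sketch of that bookkeeping follows. It assumes, hypothetically, that a reported group counts as correct only when its set of haplotypes exactly matches a true group; the paper's actual matching criterion (for example, any requirement on positional overlap) is not stated here. Group sizes with no true groups are left out of the recall dictionary, which corresponds to an NA entry in the tables. The function name is hypothetical.

```python
from collections import defaultdict


def precision_and_recall_by_size(reported, truth):
    """Overall precision and per-group-size recall for IBD groups.

    reported, truth: iterables of frozensets of haplotype IDs, one per IBD group.
    Assumes exact set equality as the (hypothetical) matching rule.
    """
    reported, truth = set(reported), set(truth)
    # Precision pools all group sizes together.
    precision = len(reported & truth) / len(reported) if reported else None
    # Recall is stratified by the size of the true group.
    true_by_size = defaultdict(set)
    for group in truth:
        true_by_size[len(group)].add(group)
    recall = {size: len(groups & reported) / len(groups)
              for size, groups in sorted(true_by_size.items())}
    return precision, recall
```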
Performance of IBD-Groupon on real data for different numbers of individuals
| No. of individuals | Precision (%) | Recall (2) (%) | Recall (3) (%) | Recall (4) (%) |
|---|---|---|---|---|
| 10 | 100 | 100 | 100 | NA |
| 30 | 96.4 | 50 | 60 | 0 |
| 50 | 96.8 | 87.5 | 66 | 0 |
| 70 | 87 | 83 | 66 | 0 |
| 90 | 85 | 86 | 62 | 0 |
Note: The top-100 values are saved for each state. The precision is for all IBD group sizes. The recall is for different IBD group sizes. Beagle threshold = 10E-8.
Performance of IBD-Groupon on real data with respect to IBD tracts of different lengths
| IBD tract length (SNPs) | Beagle precision (%) | IBD-Groupon precision (%) | Beagle recall (%) | IBD-Groupon recall (%) |
|---|---|---|---|---|
|  | 90 | 34 | 87 | 100 |
|  | 90 | 83 | 73 | 93 |
|  | 90 | 88 | 67 | 67 |
|  | 91 | 86 | 47 | 47 |
Note: The top-100 values are saved for each state. We only consider pairwise IBD tracts. Threshold for Beagle is 10E-8 and for IBD-Groupon is 10E-4.
Expected IBD length versus IBD length reported by IBD-Groupon on real data for 100 individuals, with the top-100 values saved for each state
| IBD group size | 2 | 3 | 4 |
|---|---|---|---|
| Expected length | 236 | 133 | 100 |
| Reported length | 205 | 137 | 102 |
Note: The length of an IBD is the number of SNPs it spans.
Performance of DASH versus IBD-Groupon on real data
| Threshold | DASH precision (%) | DASH recall (2) (%) | DASH recall (3) (%) | DASH recall (4) (%) | IBD-Groupon precision (%) | IBD-Groupon recall (2) (%) | IBD-Groupon recall (3) (%) | IBD-Groupon recall (4) (%) |
|---|---|---|---|---|---|---|---|---|
| 10E-8 | 63 | 62 | 32 | 25 | 85 | 86 | 62 | NA |
| 10E-4 | 43 | 50 | 30 | 13 | 43 | 100 | 75 | 75 |
Note: The top-100 values are saved for each state. The precision is for all IBD group sizes. The recall is for different IBD group sizes. Beagle threshold = 10E-8. No. of individuals = 90.