Xuanping Zhang, Jiayin Wang, Aiyuan Yang, Chunxia Yan, Feng Zhu, Zhongmeng Zhao, Zhi Cao.
Abstract
Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is identifying a set of interacting genetic variants that might contribute to complex traits. Logic regression (LR) is a powerful multivariant association tool, and several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate with regard to accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents runs in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. The swarm algorithm improves accuracy and efficiency by speeding up convergence and preventing the search from getting trapped in local optima. We apply our approach to a real screening dataset and a series of simulation scenarios. Compared to three existing LR-based approaches, our approach has lower type I and type II error rates, identifies more preset causal sites, and runs faster.
Year: 2013 PMID: 23984382 PMCID: PMC3747618 DOI: 10.1155/2013/574735
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Logic tree representation of X1 ∧ X2 ∧ (¬X3 ∨ X4) and three permissible moves for logic trees. The starting tree, T0, is at top left.
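To make the logic tree in Figure 1 concrete, the following minimal sketch encodes the expression X1 ∧ X2 ∧ (¬X3 ∨ X4) and evaluates it on binary inputs. The nested-tuple encoding and the names `TREE`, `evaluate`, `count_leaves`, and `truth_table` are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import product

# Hypothetical nested-tuple encoding (not from the paper):
#   ("var", i)             leaf for the binary variable X_i
#   ("not", child)         negation
#   ("and", left, right)   conjunction
#   ("or", left, right)    disjunction
TREE = (
    "and",
    ("and", ("var", 1), ("var", 2)),
    ("or", ("not", ("var", 3)), ("var", 4)),
)


def evaluate(node, x):
    """Evaluate a logic tree on a mapping from variable index to 0/1."""
    op = node[0]
    if op == "var":
        return bool(x[node[1]])
    if op == "not":
        return not evaluate(node[1], x)
    if op == "and":
        return evaluate(node[1], x) and evaluate(node[2], x)
    if op == "or":
        return evaluate(node[1], x) or evaluate(node[2], x)
    raise ValueError(f"unknown operator: {op!r}")


def count_leaves(node):
    """Number of variable leaves, a simple measure of model size."""
    if node[0] == "var":
        return 1
    return sum(count_leaves(child) for child in node[1:])


def truth_table(tree, indices):
    """Map every 0/1 assignment of the given variables to the tree's output."""
    return {
        bits: evaluate(tree, dict(zip(indices, bits)))
        for bits in product((0, 1), repeat=len(indices))
    }
```

For example, `evaluate(TREE, {1: 1, 2: 1, 3: 1, 4: 0})` is false because the clause ¬X3 ∨ X4 fails when X3 is set and X4 is not.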
Figure 2. Illustration of the “FOLLOW” behavior. When s > s_best (s_best = 4 and s = 6; shown at (a)), the probability of “DEL” operations (green shadow) is larger than the probability of “ALT” operations (purple shadow). When s < s_best (s_best = 6 and s = 3; shown at (d)), the probability of “ADD” operations (green shadow) is larger than the probability of “ALT” operations (purple shadow).
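The move selection described in the Figure 2 caption can be sketched as a weighted random choice. This is only an illustration under stated assumptions: s and s_best are taken to be the sizes of the agent's model and of the best model, the `bias` value of 0.7 is hypothetical, and the choices to give zero weight to the opposite move and to use only “ALT” when s equals s_best are not from the source. The function names are likewise invented for this sketch.

```python
import random


def follow_move_weights(s, s_best, bias=0.7):
    """Return illustrative probabilities for the ADD, DEL and ALT moves.

    `bias` (hypothetical) is the probability of the size-correcting move;
    it must exceed 0.5 so that this move is more likely than ALT.
    """
    if not 0.5 < bias < 1.0:
        raise ValueError("bias must lie strictly between 0.5 and 1")
    if s > s_best:
        # Model is larger than the best one: favor deleting over altering.
        return {"ADD": 0.0, "DEL": bias, "ALT": 1.0 - bias}
    if s < s_best:
        # Model is smaller than the best one: favor adding over altering.
        return {"ADD": bias, "DEL": 0.0, "ALT": 1.0 - bias}
    # Equal sizes are not covered by the caption; assume ALT only.
    return {"ADD": 0.0, "DEL": 0.0, "ALT": 1.0}


def sample_move(s, s_best, rng, bias=0.7):
    """Draw one move name according to the weights above."""
    weights = follow_move_weights(s, s_best, bias)
    moves = list(weights)
    return rng.choices(moves, weights=[weights[m] for m in moves], k=1)[0]
```

With the caption's values, `follow_move_weights(6, 4)` puts more weight on “DEL” than on “ALT”, and `follow_move_weights(3, 6)` puts more weight on “ADD” than on “ALT”.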
Accuracy for different numbers of causal SNPs. The column “Causal” shows the number of causal sites. The type I error rate is the percentage of missed causal sites divided by the number of selected SNPs. The type II error rate is the percentage of wrong selections of noncausal SNPs among all of the SNPs involved in a regression model. For each simulation configuration, the number is computed based on 100 repeats.
| Causal | FSLR Type I | FSLR Type II | MCLR Type I | MCLR Type II | FBLR Type I | FBLR Type II | LogicFS Type I | LogicFS Type II |
|---|---|---|---|---|---|---|---|---|
| 10 | 0.65% | 65.00% | 1.38% | 88.30% | 0.52% | 52.00% | 0.63% | 63.00% |
| 20 | 1.38% | 69.00% | 1.21% | 94.75% | 1.34% | 67.00% | 1.47% | 73.50% |
| 30 | 1.75% | 58.33% | 1.20% | 96.13% | 2.15% | 71.67% | 2.21% | 73.67% |
| 40 | 2.53% | 63.25% | 1.18% | 97.30% | 3.02% | 75.50% | 3.22% | 80.50% |
| 50 | 3.72% | 69.40% | 1.14% | 97.64% | 4.05% | 81.00% | 3.98% | 79.60% |
| 60 | 3.80% | 63.33% | 1.10% | 97.90% | 4.73% | 78.83% | 4.90% | 81.67% |
| 70 | 4.62% | 66.00% | 1.08% | 98.17% | 5.78% | 82.57% | 5.82% | 83.14% |
| 80 | 5.40% | 67.50% | 1.09% | 98.48% | 6.24% | 78.00% | 6.58% | 82.25% |
| 90 | 5.38% | 59.79% | 1.10% | 98.91% | 7.24% | 80.44% | 7.67% | 85.22% |
| 100 | 6.44% | 64.40% | 1.05% | 98.40% | 7.76% | 77.60% | 8.47% | 84.70% |
Comparisons on identifying preset causal sites. The column “Causal” shows the number of causal sites. A column under the name of an approach shows the average number (among 100 repeats) of successfully identified preset causal sites out of the number of causal sites.
| Causal | FSLR | MCLR | FBLR | LogicFS |
|---|---|---|---|---|
| 10 | 3.5 | 1.23 | 4.8 | 3.7 |
| 20 | 6.2 | 1.05 | 6.6 | 5.3 |
| 30 | 12.5 | 1.16 | 8.5 | 7.9 |
| 40 | 14.7 | 1.08 | 9.8 | 7.8 |
| 50 | 12.8 | 1.18 | 9.5 | 10.2 |
| 60 | 22.0 | 1.26 | 12.7 | 11.0 |
| 70 | 23.8 | 1.28 | 12.2 | 11.8 |
| 80 | 26.0 | 1.22 | 17.6 | 14.2 |
| 90 | 36.2 | 0.98 | 17.5 | 13.3 |
| 100 | 35.6 | 1.60 | 22.4 | 15.3 |
Accuracy for different numbers of causal SNPs with risks and noise. The level of risk is equal to the probability of the phenotype being the same as the output of the Boolean expression. The level of noise is equal to the probability of randomly altering an allelic value from wild type to mutation or from mutation to wild type. The type I and II error rates are similar. For each simulation configuration, the number is computed based on 100 repeats.
| Setting | Level | FSLR Type I | FSLR Type II | MCLR Type I | MCLR Type II | FBLR Type I | FBLR Type II | LogicFS Type I | LogicFS Type II |
|---|---|---|---|---|---|---|---|---|---|
| Risk | 5% | 12.8% | 58.80% | 1.16% | 98.88% | 8.70% | 73.60% | 6.80% | 71.80% |
| Risk | 10% | 12.7% | 59.00% | 1.12% | 98.44% | 8.90% | 77.60% | 6.70% | 73.40% |
| Risk | 15% | 12.3% | 59.20% | 1.19% | 98.92% | 8.90% | 74.40% | 6.60% | 73.90% |
| Noise | 1% | 12.5% | 59.00% | 1.17% | 98.56% | 8.90% | 77.80% | 6.70% | 73.40% |
| Noise | 2% | 13.5% | 58.00% | 1.17% | 98.76% | 9.00% | 81.80% | 6.80% | 74.60% |
| Noise | 3% | 14.8% | 56.60% | 1.08% | 99.19% | 8.90% | 78.40% | 7.30% | 76.60% |
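The risk and noise settings defined in the caption of this table can be simulated roughly as follows. This is a minimal sketch under stated assumptions: allelic values are coded as 0 (wild type) and 1 (mutation), phenotypes are binary, and when the phenotype does not match the Boolean expression it takes the opposite value. The source does not specify that last point, and the function names `add_noise` and `draw_phenotype` are hypothetical.

```python
import random


def add_noise(alleles, noise, rng):
    """Flip each binary allelic value with probability `noise`.

    Assumed coding: 0 = wild type, 1 = mutation.
    """
    return [1 - a if rng.random() < noise else a for a in alleles]


def draw_phenotype(expression_output, risk, rng):
    """Return a binary phenotype that equals the Boolean expression's
    output with probability `risk`.

    Assumption (not from the source): otherwise the opposite value is used.
    """
    agrees = rng.random() < risk
    return expression_output if agrees else 1 - expression_output
```

For instance, `add_noise(genotype, 0.02, random.Random(0))` would correspond to the 2% noise row, with each site flipped independently.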
Comparisons on identifying preset causal sites with risks and noise. A column under the name of an approach shows the average number (among 100 repeats) of successfully identified preset causal sites under the particular level of risk or noise.
| Setting | Level | FSLR | MCLR | FBLR | LogicFS |
|---|---|---|---|---|---|
| Risk | 5% | 20.6 | 0.72 | 11.1 | 13.3 |
| Risk | 10% | 21.0 | 0.62 | 9.1 | 13.7 |
| Risk | 15% | 21.7 | 0.50 | 10.8 | 11.7 |
| Noise | 1% | 20.6 | 0.56 | 13.2 | 14.1 |
| Noise | 2% | 20.5 | 1.28 | 11.2 | 13.3 |
| Noise | 3% | 19.9 | 0.54 | 12.8 | 12.5 |
Comparisons on running time. The running time is measured in seconds.
| Causal | FSLR | MCLR | FBLR | LogicFS |
|---|---|---|---|---|
| 10 | 17.43 | 56.59 | 1659 | 12.23 |
| 20 | 18.48 | 53.64 | 1559 | 12.50 |
| 30 | 18.72 | 58.37 | 1603 | 12.12 |
| 40 | 18.96 | 57.76 | 1463 | 11.88 |
| 50 | 19.45 | 58.31 | 1520 | 12.10 |
| 60 | 19.94 | 57.72 | 1418 | 12.43 |
| 70 | 20.69 | 59.58 | 1482 | 12.11 |
| 80 | 22.49 | 58.04 | 1366 | 12.57 |
| 90 | 24.35 | 58.54 | 1466 | 12.79 |
| 100 | 24.35 | 59.13 | 1346 | 12.65 |