Yanni Zhu, Xiaotong Shen, Wei Pan.
Abstract
BACKGROUND: The importance of network-based approaches to identifying biological markers for diagnostic classification and prognostic assessment in the context of microarray data has been increasingly recognized. To our knowledge, there have been few, if any, statistical tools that explicitly incorporate the prior information of gene networks into classifier building. The main idea of this paper is to take full advantage of the biological observation that neighboring genes in a network tend to function together in biological processes and to embed this information into a formal statistical framework.
Year: 2009 PMID: 19208121 PMCID: PMC2648796 DOI: 10.1186/1471-2105-10-S1-S21
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Simulation results for p = 55. The simulation results were averaged over 100 runs for p = 55 (22 informative and 33 noise genes).
| Scenario | Method | Test Error (SE) | | # False Negative (SE) | | Model Size (SE) | |
| 1 | STD | 0.122 (0.002) | 0.096 (0.001) | 0.0 (0.0) | 0.0 (0.0) | 55.0 (0.0) | 55.0 (0.0) |
| | L1 | 0.134 (0.003) | 0.094 (0.002) | 13.1 (0.3) | 10.9 (0.4) | 12.3 (0.6) | 15.3 (0.7) |
| | New ( | 0.156 (0.003) | 0.105 (0.002) | 9.3 (0.4) | 2.4 (0.3) | 17.0 (0.6) | 24.3 (0.6) |
| | New ( | 0.111 (0.003) | 0.068 (0.002) | 1.0 (0.3) | 0.1 (0.1) | 24.7 (0.5) | 25.1 (0.4) |
| | New ( | 0.081 (0.002) | 0.059 (0.002) | 0.0 (0.0) | 0.0 (0.0) | 28.6 (0.8) | 28.2 (0.8) |
| 2 | STD | 0.121 (0.002) | 0.099 (0.001) | 0.0 (0.0) | 0.0 (0.0) | 55.0 (0.0) | 55.0 (0.0) |
| | L1 | 0.133 (0.003) | 0.096 (0.001) | 13.6 (0.3) | 11.1 (0.4) | 11.4 (0.5) | 15.1 (0.7) |
| | New ( | 0.156 (0.003) | 0.105 (0.002) | 9.6 (0.4) | 3.9 (0.3) | 16.3 (0.7) | 24.7 (0.6) |
| | New ( | 0.121 (0.003) | 0.075 (0.002) | 3.0 (0.4) | 0.3 (0.1) | 22.3 (0.6) | 25.2 (0.5) |
| | New ( | 0.083 (0.002) | 0.064 (0.002) | 0.0 (0.0) | 0.0 (0.0) | 28.6 (0.8) | 29.0 (0.8) |
| 3 | STD | 0.162 (0.002) | 0.138 (0.001) | 0.0 (0.0) | 0.0 (0.0) | 55.0 (0.0) | 55.0 (0.0) |
| | L1 | 0.166 (0.003) | 0.131 (0.001) | 13.9 (0.2) | 11.0 (0.3) | 11.2 (0.5) | 16.6 (0.7) |
| | New ( | 0.177 (0.003) | 0.140 (0.002) | 12.4 (0.4) | 7.7 (0.4) | 13.5 (0.6) | 19.9 (0.8) |
| | New ( | 0.164 (0.003) | 0.127 (0.002) | 4.4 (0.5) | 1.2 (0.3) | 21.5 (0.6) | 26.3 (0.7) |
| | New ( | 0.137 (0.003) | 0.114 (0.001) | 0.4 (0.2) | 0.1 (0.1) | 29.8 (0.9) | 33.2 (0.9) |
| 4 | STD | 0.189 (0.002) | 0.157 (0.002) | 0.0 (0.0) | 0.0 (0.0) | 55.0 (0.0) | 55.0 (0.0) |
| | L1 | 0.186 (0.002) | 0.155 (0.002) | 14.2 (0.3) | 10.5 (0.3) | 11.5 (0.6) | 18.1 (0.8) |
| | New ( | 0.198 (0.003) | 0.160 (0.002) | 13.8 (0.3) | 8.6 (0.4) | 11.8 (0.5) | 20.9 (0.9) |
| | New ( | 0.190 (0.003) | 0.147 (0.002) | 7.2 (0.6) | 1.8 (0.4) | 18.8 (0.7) | 30.1 (0.9) |
| | New ( | 0.163 (0.002) | 0.139 (0.002) | 0.2 (0.2) | 0.03 (0.03) | 32.2 (1.0) | 34.8 (1.0) |
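The table entries are reported as a mean over simulation runs with the standard error (SE) in parentheses. A minimal sketch of how such summaries could be computed is given here, assuming that SE is the sample standard deviation divided by the square root of the number of runs, that "model size" counts nonzero coefficients, and that a "false negative" is an informative gene whose coefficient is zero. These definitions, the function names, and the example numbers are illustrative assumptions, not taken from the paper.

```python
import math
import statistics


def mean_and_se(values):
    """Return (mean, standard error of the mean) for per-run values.

    Assumes SE = sample SD / sqrt(number of runs).
    """
    n = len(values)
    mean = statistics.fmean(values)
    # The sample SD needs at least two runs; with one run the SE is set to 0.
    se = statistics.stdev(values) / math.sqrt(n) if n > 1 else 0.0
    return mean, se


def selection_summary(coef, informative):
    """Return (model size, number of false negatives) for one fitted model.

    coef: list of fitted coefficients, one per gene.
    informative: indices of the truly informative genes.
    """
    selected = {j for j, b in enumerate(coef) if b != 0.0}
    model_size = len(selected)
    # Informative genes that were not selected (assumed definition).
    false_negatives = len(set(informative) - selected)
    return model_size, false_negatives


# Hypothetical per-run test errors, for illustration only.
errors = [0.10, 0.12, 0.14, 0.12]
mean_err, se_err = mean_and_se(errors)
print(f"{mean_err:.3f} ({se_err:.3f})")

# Hypothetical coefficient vector with genes 0 and 1 informative.
size, fn = selection_summary([0.5, 0.0, -0.2, 0.0], informative=[0, 1])
```

Applying `mean_and_se` to each metric across the runs of one method and scenario would give one "mean (SE)" cell of the kind shown in the table.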
Coefficient estimates of selected informative genes for p = 55 and n = 100. The mean and the standard deviation (SD) of the coefficient estimates for selected informative genes were calculated from 100 runs.
| | L1 | | New ( | | New ( | | New ( | |
| Scenario | Mean | SD | Mean | SD | Mean | SD | Mean | SD |
| 1 | 0.53 | 0.29 | 0.04 | 0.04 | 0.27 | 0.26 | 0.67 | 0.35 |
| | 0.11 | 0.17 | 0.14 | 0.15 | 0.10 | 0.10 | 0.07 | 0.08 |
| | -0.55 | 0.30 | -0.04 | 0.05 | -0.28 | 0.32 | -0.68 | 0.35 |
| | -0.08 | 0.15 | -0.18 | 0.15 | -0.11 | 0.09 | -0.08 | 0.08 |
| 2 | 0.76 | 0.33 | 0.09 | 0.06 | 0.34 | 0.16 | 0.91 | 0.40 |
| | 0.09 | 0.14 | 0.20 | 0.14 | 0.14 | 0.11 | 0.09 | 0.08 |
| | 0.29 | 0.23 | 0.01 | 0.03 | 0.15 | 0.10 | 0.48 | 0.23 |
| | 0.08 | 0.12 | 0.11 | 0.13 | 0.07 | 0.08 | 0.04 | 0.04 |
| 3 | 0.51 | 0.39 | 0.03 | 0.07 | 0.41 | 0.70 | 0.95 | 0.34 |
| | 0.22 | 0.21 | 0.24 | 0.19 | 0.20 | 0.17 | 0.13 | 0.11 |
| | -0.01 | 0.07 | -0.01 | 0.11 | -0.03 | 0.21 | -0.04 | 0.12 |
| | 0.26 | 0.27 | 0.01 | 0.04 | 0.15 | 0.30 | 0.52 | 0.27 |
| | 0.09 | 0.13 | 0.13 | 0.16 | 0.12 | 0.16 | 0.07 | 0.11 |
| | 0.001 | 0.07 | 0.004 | 0.06 | 0.01 | 0.05 | -0.01 | 0.07 |
| 4 | 0.40 | 0.38 | 0.03 | 0.06 | 0.48 | 0.80 | 0.97 | 0.43 |
| | 0.27 | 0.26 | 0.32 | 0.25 | 0.30 | 0.23 | 0.20 | 0.20 |
| | -0.04 | 0.12 | -0.02 | 0.14 | -0.11 | 0.24 | -0.09 | 0.16 |
| | -0.23 | 0.29 | -0.004 | 0.01 | -0.21 | 0.45 | -0.56 | 0.30 |
| | -0.15 | 0.20 | -0.16 | 0.19 | -0.17 | 0.19 | -0.09 | 0.13 |
| | 0.03 | 0.08 | -0.002 | 0.10 | 0.05 | 0.18 | 0.06 | 0.15 |
Simulation results for p = 550 or 1,100. The simulation results were averaged over 100 runs for p = 550 or 1,100 (22 informative and either 528 or 1,078 noise genes).
| Method | Test Error (SE) | | # False Negative (SE) | | Model Size (SE) | |
| | p = 550 | p = 1,100 | p = 550 | p = 1,100 | p = 550 | p = 1,100 |
| STD | 0.305 (0.003) | 0.354 (0.002) | 0.0 (0.0) | 0.0 (0.0) | 550 (0.0) | 1,100 (0.0) |
| L1 | 0.218 (0.004) | 0.235 (0.004) | 16.6 (0.2) | 17.1 (0.2) | 16.1 (1.0) | 19.2 (1.2) |
| New ( | 0.232 (0.003) | 0.255 (0.004) | 14.9 (0.3) | 15.6 (0.3) | 20.7 (1.1) | 22.6 (1.4) |
| New ( | 0.202 (0.004) | 0.221 (0.004) | 5.7 (0.5) | 6.7 (0.6) | 32.6 (1.5) | 34.6 (1.9) |
| New ( | 0.170 (0.003) | 0.180 (0.004) | 0.7 (0.3) | 1.3 (0.4) | 82.6 (5.4) | 98.9 (7.2) |
Parkinson's disease data: 1,070 genes. A total of 1,070 genes with SD of expression levels across the 105 samples ≥ 15 had network information. The classification error, number of selected disease genes, number of selected genes, and their standard errors (SE in parentheses) were obtained by averaging over 10 runs. Five disease genes were UBE1, PARK2, UBB, SEPT5, and SNCAIP.
| Method | Error | # Disease Genes | # Genes |
| STD | 0.424 (0.016) | 5.0 (0.0) | 1,070.0 (0.0) |
| L1 | 0.464 (0.021) | 0.1 (0.1) | 19.2 (3.8) |
| New ( | 0.476 (0.015) | 0.1 (0.1) | 24.9 (4.3) |
| New ( | 0.480 (0.026) | 0.2 (0.1) | 30.6 (5.2) |
| New ( | 0.451 (0.028) | 0.0 (0.0) | 70.6 (14.1) |
| Final Model | - | 1.0 | 75.0 |
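The caption above describes a gene-filtering step: genes are kept if the SD of their expression levels across samples is at least 15 and they have network information. A rough sketch of such a filter follows, assuming the sample (n - 1) standard deviation and a plain set of gene names standing in for "has network information". The function name `filter_genes`, the gene labels, and the expression values are hypothetical and not from the paper.

```python
import statistics


def filter_genes(expr, network_genes, sd_cutoff=15.0):
    """Keep genes with expression SD >= sd_cutoff that appear in the network.

    expr: dict mapping gene name -> list of expression levels across samples.
    network_genes: set of gene names with network information (assumed form).
    """
    kept = []
    for gene, levels in expr.items():
        # Sample SD is an assumption; the paper may use a different estimator.
        if gene in network_genes and statistics.stdev(levels) >= sd_cutoff:
            kept.append(gene)
    return kept


# Hypothetical expression levels, for illustration only.
expr = {
    "GENE_A": [100.0, 140.0, 120.0, 160.0],  # variable and in the network
    "GENE_B": [50.0, 52.0, 51.0, 49.0],      # in the network, low variability
    "GENE_C": [10.0, 90.0, 40.0, 70.0],      # variable, no network information
}
network_genes = {"GENE_A", "GENE_B"}
kept = filter_genes(expr, network_genes)
print(kept)
```

In this toy input only `GENE_A` passes both conditions; on the real data the same two conditions are what the caption reports as yielding 1,070 genes.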
Figure 1. Parkinson's disease gene subnetworks. Left: PD-1nb-net, including 8 Parkinson disease genes (gray) and their 8 direct neighbors (white). Right: PD-2nb-net, including 8 Parkinson disease genes (gray), their 8 direct and 10 second-order neighbors (white).
First- and second-order-neighbor subnetworks of Parkinson's disease data. The classification error, number of selected disease genes, number of selected genes, and their standard errors (SE in parentheses) were obtained by averaging over 10 runs. Eight disease genes were UBE1, PARK2, UBB, SEPT5, SNCAIP, GPR37, TH, and SNCA.
| Network | Method | Error | # Disease Genes | # Genes |
| PD-1nb-net | STD | 0.476 (0.023) | 8.0 (0.0) | 16.0 (0.0) |
| | L1 | 0.471 (0.017) | 2.8 (0.7) | 6.1 (1.5) |
| | New ( | 0.462 (0.016) | 3.4 (0.8) | 7.3 (1.7) |
| | New ( | 0.462 (0.014) | 3.6 (0.7) | 8.4 (1.5) |
| | New ( | 0.482 (0.015) | 3.0 (1.2) | 7.5 (2.1) |
| | Final Model | - | 8.0 | 16.0 |
| PD-2nb-net | STD | 0.444 (0.016) | 8.0 (0.0) | 26.0 (0.0) |
| | L1 | 0.449 (0.017) | 3.1 (0.5) | 10.9 (2.1) |
| | New ( | 0.464 (0.022) | 5.3 (0.9) | 13.2 (3.2) |
| | New ( | 0.447 (0.023) | 6.1 (0.8) | 13.7 (2.7) |
| | New ( | 0.433 (0.016) | 6.2 (0.9) | 20.0 (2.5) |
| | Final Model | - | 8.0 | 26.0 |
Subnetworks of breast cancer data. The BC-1nb-net/BC-2nb-net had 294/1,718 genes in total including 40/107 cancer genes, and 7/14 cancer genes with mutation frequencies larger than 0.10. The classification error, number of selected cancer genes with mutation frequencies larger than 0.10 (CA-LMF), number of selected cancer genes (CA), number of selected genes, and their standard errors (SE in parentheses) were obtained by averaging over 10 runs.
| Network | Method | Error | # CA-LMF | # CA | # Genes |
| BC-1nb-net | STD | 0.371 (0.014) | 7.0 (0.0) | 40.0 (0.0) | 294.0 (0.0) |
| | L1 | 0.357 (0.014) | 0.3 (0.2) | 4.6 (0.8) | 32.3 (4.8) |
| | New ( | 0.360 (0.014) | 0.4 (0.2) | 3.6 (1.1) | 25.0 (7.0) |
| | New ( | 0.366 (0.012) | 0.6 (0.3) | 4.7 (1.2) | 27.2 (5.2) |
| | New ( | 0.399 (0.012) | 1.2 (0.2) | 7.8 (1.7) | 40.2 (6.5) |
| | Final Model | - | 1.0 | 4.0 | 14.0 |
| BC-2nb-net | STD | 0.351 (0.014) | 14.0 (0.0) | 107.0 (0.0) | 1,718.0 (0.0) |
| | L1 | 0.360 (0.006) | 0.0 (0.0) | 2.4 (0.9) | 42.9 (11.8) |
| | New ( | 0.374 (0.011) | 0.1 (0.1) | 1.9 (0.5) | 51.4 (12.6) |
| | New ( | 0.360 (0.007) | 0.2 (0.1) | 2.5 (0.7) | 41.7 (9.2) |
| | New ( | 0.385 (0.021) | 0.3 (0.2) | 0.7 (0.3) | 34.2 (10.3) |
| | Final Model | - | 1.0 | 2.0 | 23.0 |